mirza.town

05/01/2025

Don’t Use JSON Schemas in Your LLM Prompts

Use them in the response_schema instead, if your LLM supports it of course. :^)

After a long and costly battle with Gemini, I’m calling it quits. I’ve been trying to get a JSON response from Gemini’s flash model for a while now. Most of the time, the JSON response was malformed (often beyond repair), and I had to manually fix it. This required extensive testing to identify where it broke and whether the breakage was consistent enough to address.

Thanks to my team lead, I found out that providing a JSON schema in the prompt was only one way to get a JSON response. The better approach was to create a custom class that represents the JSON response and use that class with the response_schema parameter in the GenerationConfig in Python. I guess tunnel vision got the best of me, and I hadn’t even considered this solution.

Here’s a simplified example:

import typing

import google.generativeai as genai

class ComplexObject(typing.TypedDict):
    color: str
    is_cool: bool

class JSONResponse(typing.TypedDict):
    attributes: list[ComplexObject]

generation_config = genai.GenerationConfig(
    response_mime_type="application/json",
    response_schema=JSONResponse,
)

gemini_model = genai.GenerativeModel(
    "gemini-1.5-flash-latest",
    generation_config=generation_config,
)

payload = """
You are a helpful assistant that outputs
random cool colors and numbers based on user input.
"""

response = gemini_model.generate_content(payload)

With this very simplified example, we can get a JSON response from Gemini without any errors—at least so far in my testing.
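Even with response_schema set, the model's output still arrives as text, so you parse it yourself. Here's a minimal sketch of consuming the result; the raw string below is a made-up stand-in for what response.text would contain:

```python
import json

# Stand-in for response.text; with response_mime_type="application/json",
# the model's reply is a JSON string matching the schema.
raw = '{"attributes": [{"color": "teal", "is_cool": true}]}'

data = json.loads(raw)
for obj in data["attributes"]:
    print(obj["color"], obj["is_cool"])
```

Because the schema is enforced on the model's side, json.loads should no longer blow up on malformed output, which was the whole point of the exercise.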

Nice Tip

If the order of attributes in the JSON response is important, you can add a prefix to the attribute names. For example, to switch the order of the color and is_cool attributes in the JSON response, you can modify the attribute names like this:

class ComplexObject(typing.TypedDict):
    a_is_cool: bool
    b_color: str

This may not seem like a huge deal, but since the model generates its response token by token, attributes that appear earlier in the JSON can influence the values it produces for later ones. Being able to control that order can be a lifesaver.
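One downside of the prefix trick is that your parsed JSON now carries the a_/b_ prefixes. A small sketch of stripping them after parsing; the helper name is my own, and it assumes every key follows the "letter, underscore" pattern:

```python
def strip_order_prefixes(obj: dict) -> dict:
    # Drop the "a_", "b_", ... ordering prefixes, e.g. "a_is_cool" -> "is_cool".
    # Assumes every key starts with a single-letter prefix and an underscore.
    return {key.split("_", 1)[1]: value for key, value in obj.items()}

print(strip_order_prefixes({"a_is_cool": True, "b_color": "teal"}))
# {'is_cool': True, 'color': 'teal'}
```

That way the prefixes only exist at generation time, and the rest of your code can keep using the clean attribute names.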