mirza.town

04/08/2023

Guidance is magic and magic is expensive.

Wouldn’t it be nice to just force ChatGPT to generate text in a specific format? Sometimes it has a mind of its own, and ~prompt engineering~ is not effective enough. Just give me valid JSON and I’ll be on my way!

Let’s say we have a text like this:

Hello! I’d like to buy PIC16F877A I/PT SMD TQFP-44 8-Bit 20 MHz Microcontroller. I’d like to pay with my credit card. My creditcard number is 1234 5678 9012 3456 and the expiration date is 12/34. My name is John Doe and my address is 1234 Main St. Springfield, USA. I’d like to receive my order in 2 days. How is Nancy by the way? Anyways, thank you! Oh, and I’d like tobuy 3 of those!

Without any deviation, any error, any typo whatsoever, we want to get a JSON like this:

{
    "product": "PIC16F877A I/PT SMD TQFP-44 8-Bit 20 MHz Microcontroller",
    "payment": {
        "type": "credit card",
        "number": "1234 5678 9012 3456",
        "expiration": "12/34"
    },
    "address": {
        "name": "John Doe",
        "street": "1234 Main St.",
        "city": "Springfield",
        "country": "USA"
    },
    "quantity": 3
}

Welp, Guidance lets us do just that. But…

How?

First, we need to specify which parts of the JSON schema need to be filled. Simply:

Extract information from given text and fill the JSON.
Text: "{{user_input}}"

JSON:
{
    "product": "{{gen 'name' stop='"'}}",
    "payment": {
        "type": "{{gen 'payment_type' stop='"'}}",
        "number": "{{gen 'credit_card_number' stop='"'}}",
        "expiration": "{{gen 'expiration_date' stop='"'}}"
    },
    "address": {
        "name": "{{gen 'name' stop='"'}}",
        "street": "{{gen 'street' stop='"'}}",
        "city": "{{gen 'city' stop='"'}}",
        "country": "{{gen 'country' stop='"'}}"
    },
    "quantity": "{{gen 'quantity' stop='"'}}"
}
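Wiring this template up in Python looks roughly like the sketch below. The guidance API in the comments is from the pre-1.0 (0.0.x) releases this post is written against, so treat it as an assumption and check the version you have installed; the runnable part only counts the billable {{gen}} expressions.

```python
import re

# The template from above, one {{gen}} expression per JSON field.
# (Raw string so stop='\n' stays a literal backslash-n for guidance.)
TEMPLATE = r"""Extract information from given text and fill the JSON.
Text: "{{user_input}}"

JSON:
{
    "product": "{{gen 'product' stop='"'}}",
    "payment": {
        "type": "{{gen 'payment_type' stop='"'}}",
        "number": "{{gen 'credit_card_number' stop='"'}}",
        "expiration": "{{gen 'expiration_date' stop='"'}}"
    },
    "address": {
        "name": "{{gen 'name' stop='"'}}",
        "street": "{{gen 'street' stop='"'}}",
        "city": "{{gen 'city' stop='"'}}",
        "country": "{{gen 'country' stop='"'}}"
    },
    "quantity": {{gen 'quantity' stop='\n'}}
}"""

# Each {{gen}} is a separate, separately billed generation.
print(len(re.findall(r"\{\{gen ", TEMPLATE)))  # 9

# Running it for real (pre-1.0 guidance API, needs an OpenAI key):
# import guidance
# guidance.llm = guidance.llms.OpenAI("text-davinci-003")
# program = guidance(TEMPLATE)
# result = program(user_input="Hello! I'd like to buy ...")
# print(result["product"], result["quantity"])
```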

And the result will be just the way we want. Neat! But…

What’s the catch?

You pay for every, and I mean every, single {{gen}} expression. Say you have a user input 1000 tokens long, and you are using the text-davinci-003 model, which can extract and correct information fairly well. It costs $0.02 per 1000 tokens, which is not bad. But you have 9 fields that you want to extract and fill. Behind the scenes, Guidance generates the fields in a loop, feeding the output of each generation into the next one, so every {{gen}} re-sends the prompt. You will end up paying at least $0.18 (9 × $0.02) for a single user input. Not so neat.
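As a quick sanity check, the arithmetic behind that $0.18 figure (the price and token counts are the post’s numbers, and this ignores the extra tokens each later call accumulates):

```python
# Back-of-the-envelope cost of the scenario above.
PRICE_PER_1K_TOKENS = 0.02  # text-davinci-003, dollars per 1,000 tokens
PROMPT_TOKENS = 1000        # length of the user input
NUM_FIELDS = 9              # one {{gen}} call per JSON field

# Each of the 9 generations re-sends at least the full 1000-token prompt:
cost = NUM_FIELDS * (PROMPT_TOKENS / 1000) * PRICE_PER_1K_TOKENS
print(f"${cost:.2f}")  # $0.18
```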

An important note: you can’t use the gpt-3.5-turbo model for completion tasks. It’s a shame, since it’s 10 times cheaper than text-davinci-003. With the chat model, you’d have to pray that it outputs just the JSON and not some explanation or a question.

My thoughts

I came across this problem while I was building CV Builder But Better, where I needed to extract key information from a user input. Welp, Guidance is nice, but I opted for standard prompt engineering. If the answer doesn’t start with a { or doesn’t end with a }, I just discard it and ask the user to try again. It’s not perfect, but it’s way cheaper and faster. The Guidance readme states:
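That discard-and-retry check fits in a few lines. A minimal sketch; `parse_llm_json` is a hypothetical helper name, not something from the project:

```python
import json

def parse_llm_json(raw: str):
    """Accept the model's reply only if it looks like, and parses as,
    a JSON object; otherwise return None so the caller can retry."""
    text = raw.strip()
    if not (text.startswith("{") and text.endswith("}")):
        return None  # chatty preamble or trailing explanation: discard
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None  # looked like JSON but wasn't valid: discard

print(parse_llm_json('{"quantity": 3}'))         # {'quantity': 3}
print(parse_llm_json("Sure! Here is the JSON"))  # None
```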

When multiple generation or LLM-directed control flow statements are used in a single Guidance program then we can significantly improve inference performance by optimally reusing the Key/Value caches as we progress through the prompt. This means Guidance only asks the LLM to generate the green text below, not the entire program. This cuts this prompt’s runtime in half vs. a standard generation approach.

I might be dumb, but I don’t see the implications of this. The green text they mention was already the only thing they were asking the LLM to generate. Do they mean it stops generating when needed? Welp, even if it’s 2 times faster for a single {{gen}} expression, wouldn’t it be 2 times slower overall if there are 4 {{gen}} expressions?

If you need a ready-to-use text template, such as a government form, and you want to fill specific, known fields from user input, Guidance is the way to go, since it’s easier to use and more reliable. But if you want to extract information from a user input, I’d say stick with prompt engineering.

Update

07/08/2023

I came across ChatGPT’s new feature, function calling, which can deliver the same results as Guidance for my case, along with more useful functionality.
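For comparison, the same extraction with function calling comes down to declaring a JSON schema and letting the API fill it. The schema below targets the 2023-era `functions` parameter of the Chat Completions API; the function name and field layout are my own illustration, not from the post:

```python
# JSON schema describing the fields to extract; passed via the `functions`
# parameter of the (2023) OpenAI Chat Completions API.
ORDER_FUNCTION = {
    "name": "record_order",  # hypothetical function name
    "description": "Extract order details from a customer message.",
    "parameters": {
        "type": "object",
        "properties": {
            "product": {"type": "string"},
            "payment": {
                "type": "object",
                "properties": {
                    "type": {"type": "string"},
                    "number": {"type": "string"},
                    "expiration": {"type": "string"},
                },
            },
            "address": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "street": {"type": "string"},
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
            },
            "quantity": {"type": "integer"},
        },
        "required": ["product", "quantity"],
    },
}

# Sketch of the call (requires the openai package and an API key):
# response = openai.ChatCompletion.create(
#     model="gpt-3.5-turbo-0613",
#     messages=[{"role": "user", "content": user_text}],
#     functions=[ORDER_FUNCTION],
#     function_call={"name": "record_order"},
# )
```

One call, one bill, and quantity comes back as an integer instead of a string.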

I’m not going to implement it in CVBBB, since it’s specific to OpenAI’s API and I don’t want to be dependent on it. In the end, I want to be able to generate CVs locally, for free.