Learn how to configure an LLM's temperature setting to optimize for consistency or creativity.
A model’s temperature determines how creative its outputs are. The lower the temperature, the more deterministic and consistent the outputs will be. The higher the temperature, the more random and variable the outputs will be.
For legal uses cases – where consistency and logic are paramount – you usually want to set the temperature of the model to the very lowest value, which for OpenAI’s GPT series is 0.
However, there may be times when dialing the temperature up to .5 or even 1 makes sense – for example, when using prompts to draft more casual, freeform text like emails or letters.
Let’s see temperature in action by doing some simple math. First let’s set the temperature to 0:
User Message:
What is two plus two?
Completion:
Four
Now let’s try a temperature of 1:
User Message:
What’s two plus two?
Completion:
Four
Wait… nothing changed?
That’s because while temperature controls if a model’s outputs are more deterministic or random, large language models are fundamentally built around the concept of predicting the next word.
And in the case of a prompt like “What is two plus two?”, the overwhelming majority of data the model is trained on will indicate that the next word can only be four.
Now let’s try a use case where the instructions are more complex and the desired output is more ambiguous. Again, we’ll start with a temperature of 0:
System Message:
1. Carefully read the text provided by the user.
2. Summarize its contents in three numbered sentences.
3. Be sure to include key facts, dates, and names.
User Message:
Subject: Immediate Repayment of Outstanding Loan
Dear Acme Inc.,
I write to you on behalf of XYZ Co. regarding the outstanding loan of $10,000 that your company owes us. Despite our previous correspondences and reminders, we have yet to receive any payment or response from your end. This is not only unprofessional but also a clear violation of the terms agreed upon in our contract.
The loan was due for repayment on January 1, 2023, and as of today, it is seven months late. This delay is unacceptable and has caused significant inconvenience to our operations.
We demand immediate action to rectify this situation. The full amount of $10,000 must be paid to XYZ Co. within the next 10 business days. Failure to comply with this demand will leave us with no choice but to take legal action to recover the debt, including but not limited to filing a lawsuit against Acme Inc.
Please consider this as the final warning. We expect your prompt attention to this matter to avoid further complications.
Kindly confirm receipt of this letter and inform us of your plan of action at the earliest.
From,
XYZ Co.
Completion:
1. XYZ Co. has written to Acme Inc. demanding immediate repayment of an outstanding loan of $10,000, which was due on January 1, 2023, and is now seven months late.
2. Despite previous reminders, XYZ Co. has not received any payment or response from Acme Inc., which they consider unprofessional and a violation of their contract terms.
3. XYZ Co. is demanding full payment within the next 10 business days, warning that failure to comply will result in legal action, including a potential lawsuit against Acme Inc.
With its temperature set to 0, GPT 4 has done an admirable job of following our instructions, providing a summary that captures the key facts of the demand letter and that consists of three numbered sentences.
Let’s see how the results change when we set the temperature to 1.5:
Completion:
1. XYZ Co. has sharply reminded Acme Inc. to repay an outstanding loan amount of $10,000 that is due since January 1, 2023, about a delay that operationally inconvenienced XYZ.
2. XYZ communicated threats of a legal course, including filing a lawsuit if payment is not made within the next ten business days decided by the contract.
3. Concluding with a final warning, XYZ solicits Acme's soonest plan regarding this important matter and requests confirmation in receipt of the letter soon.
While this summary is mostly correct, it contains a few subtle errors. For example, the summary says “XYZ communicated threats of a legal course, including filing a lawsuit if payment is not made within the next ten business days decided by the contract” but nothing in the demand letter text actually states that the ten business day deadline was in the original contract.
Additionally, the entire summary reads awkwardly. Phrases like “XYZ solicits Acme’s soonest plan”, “has sharply reminded Acme”, and “operationally inconvenienced” are not typical and would likely stand out in a real-world communication.
So why would you ever use a higher temperature? The most common reason is when you want the output to change even if the prompt is the same. For example, maybe you want to use a prompt to come up with ideas for how to rewrite a paragraph:
System Message:
Rephrase the text provided by the user to make it more collegial and less aggressive.
User Message:
I write to you on behalf of XYZ Co. regarding the outstanding loan of $10,000 that your company owes us. Despite our previous correspondences and reminders, we have yet to receive any payment or response from your end. This is not only unprofessional but also a clear violation of the terms agreed upon in our contract.
With a temperature of 0, the outputs are nearly identical:
Completion 1:
I'm reaching out to you on behalf of XYZ Co. concerning the outstanding loan of $10,000 that is due from your company. We've noticed that despite our previous communications and reminders, we haven't received any payment or response. We understand that there may be reasons for this delay, but we would like to remind you of the terms agreed upon in our contract. We appreciate your prompt attention to this matter.
Completion 2:
I'm reaching out to you on behalf of XYZ Co. concerning the outstanding loan of $10,000 that is due from your company. We've noticed that despite our previous communications and reminders, we haven't received any payment or response. We understand that there may be reasons for this delay, but we would like to remind you of the terms agreed upon in our contract. We appreciate your prompt attention to this matter.
With a temperature of 1.2, the outputs have much higher variance:
Completion 1:
I'm reaching out on behalf of XYZ Co. to discuss the remaining balance of $10,000 attributed to your organization. We've reached out a few times previously with no response determined on this matter. We encourage improving this communication, as honoring the deliverables agreed upon in our contract not only reinforces duty but professionalism as well.
Completion 2:
I'm composing this note on behalf of XYZ Co. in relation to the amount of $10,000 that is still due from your company. Despite prior communications sent, it seems that we are yet to receive an update or payment from your side. This situation brings us to a deviation from the pre-determined terms outlined in our handling agreement, and we hope we can work together to resolve this issue professionally.
A couple of observations about the examples above.
First, as the temperature is dialed up, we start to see more “hallucinations” creep in. These are instances where the model makes up something out of thin air. For example, the second output mentions a “handling agreement” – something neither the system message nor user message ever reference.
Second, while the “creativity” of the output does appear to go up along with the temperature, this mostly happens on the level of the words being written, not the ideas being conveyed. A good way to conceptualize temperature is to imagine if someone asked you to rewrite the same sentence ten times. Your first rewrite would likely be the most “obvious” or “predictable” version; by the time you reached the tenth rewrite, you would likely be stretching your language skills for new or unusual turns of phrase. But in all of the drafts you wrote of the sentence, the substance of what you were trying to convey would likely be similar.
This is akin to what happens when you use a higher temperature. If you’re goal is to make the substance or ideas outputted by the model more creative, that is better controlled by carefully designing and testing prompts that give you the desired results, rather than turning up the proverbial heat 😊
So keep that temperature at “0” unless you have a very good reason to not!