Use separators to divide prompts | Best practices for legal prompt engineering

Use headers, special characters, XML tags, or system messages as separators to improve prompt clarity and guide LLMs to deliver more accurate, structured responses.

An important way to make sure your instructions are followed is to use separators like headers or special characters. Well used, separators can make all the difference when it comes to prompt engineering.

Let’s take an example where we don’t include separators when asking for the LLM to extract entities like the Investor, Signer, Investment Date, and Investment Amount:

System Message:

Extract the specified entities from the text.

User Message:

THIS CERTIFIES THAT in exchange for the payment by IOU Inc (the “Investor”) of $400,000 (the “Purchase Amount”) on or about 12-31-2022, Logs A-Lot Co., a Delaware corporation (the “Company”), issues to the Investor the right to certain shares of the Company’s Capital Stock, subject to the terms described below.
This Safe is one of the forms available at http://ycombinator.com/documents and the Company and the Investor agree that neither one has modified the form, except to fill in blanks and bracketed terms.  
The “Post-Money Valuation Cap” is $40,000,000.  See Section 2 for certain additional defined terms…
[BODY OF AGREEMENT OMITTED]
IN WITNESS WHEREOF, the undersigned have caused this Safe to be duly executed and delivered.
COMPANY:
By:  
Address:  
Email:  
INVESTOR:
By:  
Name:  
Title:  
Address:  
Email:  
Investor
Text
Signer
Text
Investment
Number
Investment Date
Date

Completion:

Investor: IOU Inc
Payment by Investor: $400,000
Date of Payment: 12-31-2022
Company: Logs A-Lot Co.
Post-Money Valuation Cap: $40,000,000
Form Location: http://ycombinator.com/documents
Note how the completion doesn’t have all the information we asked for. That shouldn’t be surprising: Even an experienced human would have trouble deciphering the instructions above.  

To remedy this, let’s add some simple separators – in this case headings to clearly identify the text we want to analyze versus the entities we want to extract from that text:

System Message:

Extract the specified ENTITIES from the DOCUMENT TEXT using the specified Labels and Data Type.

User Message:

DOCUMENT TEXT:

THIS CERTIFIES THAT in exchange for the payment by IOU Inc (the “Investor”) of $400,000 (the “Purchase Amount”) on or about 12-31-2022, Logs A-Lot Co., a Delaware corporation (the “Company”), issues to the Investor the right to certain shares of the Company’s Capital Stock, subject to the terms described below.
This Safe is one of the forms available at http://ycombinator.com/documents and the Company and the Investor agree that neither one has modified the form, except to fill in blanks and bracketed terms.  
The “Post-Money Valuation Cap” is $40,000,000.  See Section 2 for certain additional defined terms…
[BODY OF AGREEMENT OMITTED]
IN WITNESS WHEREOF, the undersigned have caused this Safe to be duly executed and delivered.
COMPANY:
By:  
Address:  
Email:  
INVESTOR:
By:  
Name:  
Title:  
Address:  
Email:  


ENTITIES:  

Entity #1:
Label: Investor
Data Type: Text
Entity #2:
Label: Signer
Data Type: Text
Entity #3:
Label: Investment
Data Type: Number
Entity #4:
Label: Investment Date
Data Type: Date

With the addition of these simple headers and clarifying labels, our completions are now consistent and contain exactly the information we want:  

Completion:

Entity #1: IOU Inc
Entity #2: Not provided in the document
Entity #3: $400,000
Entity #4: 12-31-2022

Using separators becomes even more important when you are putting a lot of content in your prompt or using a prompt to combine information from multiple sources – for example, document text, case data, and a timeline.

In terms of how to separate out sections of your prompt, there are a couple of options that you can try out:

Headers

This is probably the most straightforward approach. Use headers in all caps like “THIS” followed by a colon and line break to denote different sections of text. This works for many use cases but one thing to watch out for is if you are putting document text into your prompt, the document text may itself contain similarly styled headers, which can occasionally lead to confused or inconsistent outputs.

Special characters

You can also use special characters as separators so long as they are unlikely to appear in any other text you incorporate into your prompts (this guarantees the LLM can consistently pick the right special characters out). OpenAI, for example, recommends using triple quotation marks (like """this""") and then calling this type of notation out in your prompt (“Analyze the text contained in the triple quotation marks”). You could just as easily use other delimiters like “+++this+++” or “<<<this>>>” if you feel confident those won’t show up much in your text or in the model’s training data.

XML tags

A good way to think about LLMs are as a sort of “fuzzy” computer. They let people write in normal sentences and control how software behaves – basically, they let you code without code! That said, writing in the syntax of “code” can sometimes be helpful because, while the LLM won’t actually execute that code, code syntax represents a more formal – and therefore more understandable – way of expressing language, at least for the LLM. Does this mean you need to write code to be a prompt engineer? Absolutely not. But simply by enclosing sections of your prompt in little bits of code syntax called XML tags like <tag-name>this</tag-name>, you can make it easier on the LLM to separate your prompt into logical concepts.  

System Messages

A late entrant to the separator party are system messages, which have recently been added to the prompting capabilities of some models like GPT 4 and GPT 3.5. System messages are covered more fully in the “Element: System Message versus User Message” article but the basic idea in using system messages as a separator is to put your instructions in the system message and the text you want model to perform those instructions on in the user message. In fact, that’s exactly what we did in the earlier example by putting the instructions “Extract the specified ENTITIES from the DOCUMENT TEXT using the specified Labels and Data Type” in the system message.

Additional Reading:

Next articles