Maximum Length | Elements of prompt engineering

Learn how to control the maximum length of model completions and structure your prompts to prevent context window failures.

You may have noticed that when new models are released, they frequently include a suffix like 4k, 8k, 16k, 32k, or even 100k. This number refers to the "context window," i.e., the total number of "tokens" you can include in a given prompt-and-completion pair.

If the length of your prompt plus the generated completion exceeds the model's context window, the request will fail or be cut off prematurely. Maximum length (also called "max tokens") lets you cap how long a completion can be so that such failures don't happen. Note that setting the maximum length does not determine how long the completion will be; it only sets a ceiling on the output. Often, the actual completion is significantly shorter than the maximum.
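As a concrete illustration, here is a minimal sketch of setting that ceiling with OpenAI's Python client, where the parameter is called `max_tokens` (other providers use similar names, such as `max_output_tokens`); the prompt text is just a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Summarize the key deadlines in this contract: ..."}],
    max_tokens=1000,  # ceiling on the completion, not a target length
)

print(response.choices[0].message.content)
```

The reply may come back far shorter than 1,000 tokens; the cap simply keeps the completion from pushing the prompt-plus-completion total past the window, provided your prompt leaves at least that much room.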

To take an example, let's say you are working with GPT-4 8k, which has a context window of precisely 8,192 tokens. Since a token represents about ¾ of a word (or roughly four characters), that window works out to about 24 double-spaced pages (at 250 words per page). If you expect to send a prompt of 20 pages, you will want to set the maximum length of the completion to no more than 1,000 tokens, or roughly three double-spaced pages. By contrast, if you are working with GPT-4o's 128k context window, which holds roughly 380 double-spaced pages, you can set the maximum length much, much higher.
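Rather than estimating by page count, you can measure the budget directly. Here is a minimal sketch using `tiktoken`, OpenAI's tokenizer library (the file name is hypothetical; other model families ship their own tokenizers):

```python
import tiktoken

CONTEXT_WINDOW = 8192  # GPT-4 8k

encoding = tiktoken.encoding_for_model("gpt-4")

# Hypothetical 20-page prompt stored in a local file.
prompt = open("draft_brief.txt").read()

prompt_tokens = len(encoding.encode(prompt))
completion_budget = CONTEXT_WINDOW - prompt_tokens

print(f"Prompt uses {prompt_tokens} tokens; "
      f"set maximum length to at most {completion_budget}.")
```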

For legal tasks that involve analyzing long documents or complicated instructions, a larger context window can make a crucial difference. However, some use cases go beyond even the large context windows offered by GPT-4o and Claude 3.5. In those cases, you need to think about how to "chunk" your prompt text logically so it can be split across multiple prompts.

For example, if you are trying to summarize the entirety of the United States Federal Rules of Civil Procedure so you can then use a single prompt to ask questions about the Rules, a good approach would be to summarize each of the Titles separately. You could then concatenate the summaries into one larger summary and feed that into a prompt where you ask your question.
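A sketch of that divide-and-summarize approach, assuming the Rules have already been split into a `titles` list (one string per Title) and that the follow-up question is just an example:

```python
from openai import OpenAI

client = OpenAI()

def summarize(text: str) -> str:
    """Summarize one Title, capped at 500 tokens."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": "Summarize this Title of the Federal Rules "
                              f"of Civil Procedure:\n\n{text}"}],
        max_tokens=500,
    )
    return response.choices[0].message.content

# Assumed: the full text of each Title, one string apiece.
titles = ["<text of Title I>", "<text of Title II>"]

# Summarize each Title separately, then concatenate the summaries.
combined = "\n\n".join(summarize(title) for title in titles)

# Feed the combined summary into a single question-answering prompt.
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": f"{combined}\n\nBased on the summaries above, "
                          "when must a defendant serve an answer?"}],
    max_tokens=500,
)
print(answer.choices[0].message.content)
```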

For truly massive summarization tasks, you can even chain prompts together by feeding your concatenated summary into another prompt that summarizes that text in turn. Another important technique for dealing with large amounts of text is embeddings, which let you retrieve the most "relevant" passages from a large document corpus so that only those chunks of text are used in a prompt.
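A minimal sketch of that retrieval step, using OpenAI's embeddings endpoint with cosine similarity; the chunk texts, the model choice, and the question are all placeholders:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Return one embedding vector per input string."""
    response = client.embeddings.create(
        model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

# Assumed: the corpus has already been split into chunks.
chunks = ["<chunk 1 text>", "<chunk 2 text>", "<chunk 3 text>"]
chunk_vectors = embed(chunks)

question = "What is the deadline for serving an answer?"
query_vector = embed([question])[0]

# OpenAI embeddings are unit-normalized, so a dot product
# gives cosine similarity directly.
scores = chunk_vectors @ query_vector
top_two = [chunks[i] for i in np.argsort(scores)[::-1][:2]]

# Only the most relevant chunks go into the final prompt.
prompt = "\n\n".join(top_two) + f"\n\nQuestion: {question}"
```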
