Prompt Engineering for Large Language Models
We have been using prompt engineering in an informal way in the last few chapters. In this chapter we look at prompt engineering in more detail. My hope is that, dear reader, you find this chapter to be both a short reference on prompting Large Language Models (LLMs) and a useful collection of prompt examples.
Two Types of LLMs
In order to avoid “model hallucinations” (i.e., models making up answers with no basis in fact) when using Large Language Models (LLMs), it is important to write detailed instructions and to add context text that helps the model answer questions or solve problems.
Base LLMs and Instruction Tuned LLMs are both types of advanced AI language models. While they share some similarities in their underlying architecture and training processes, they have distinct differences in terms of their fine-tuning and usage. Here, we will discuss the similarities and differences in detail.
The similarities are:
- Architecture: Both models are based on the Transformer architecture, which has been shown to be highly effective for various natural language processing tasks. This architecture relies on self-attention mechanisms to process input data and generate outputs.
- Pretraining: Both models undergo a pretraining phase where they learn from a large corpus of text, such as websites, books, and articles. This helps the models acquire a general understanding of language, grammar, facts, and some reasoning abilities.
- Transfer learning: Both models can use transfer learning, which involves fine-tuning the pretrained models on specific tasks or datasets. This process adapts the model’s general knowledge to the target application, improving its performance on the desired task.
The differences are:
- Fine-tuning objectives: The key difference between the two models lies in how they are trained after pretraining. Base Large Language Models are trained simply to predict the next token, so they generate plausible human-like text with minimal explicit guidance, whereas Instruction Tuned Large Language Models receive additional fine-tuning so that they follow instructions provided in the input prompt.
- Task performance: Due to their different fine-tuning objectives, Instruction Tuned Large Language Models tend to perform better on tasks that require following explicit instructions, while Base Large Language Models may produce more diverse and creative responses that may not always adhere to the given instructions.
- Usage: Base Large Language Models can be used for a wide variety of tasks, including text generation, summarization, translation, and question-answering. Instruction Tuned Large Language Models, on the other hand, are designed to be more controllable and responsive to input instructions, making them more suitable for applications that require precise output based on specific instructions.
- Prompt engineering: To get the desired output from a Base Large Language Model, users often need to carefully craft prompts, which can be a trial-and-error process. With Instruction Tuned Large Language Models, users can provide more explicit instructions to achieve the desired output more consistently.
In summary, both Base and Instruction Tuned Large Language Models share the same foundation in terms of architecture and pretraining but differ in their fine-tuning objectives and application strengths. While Base Large Language Models generate more diverse and creative text, Instruction Tuned Large Language Models excel at tasks that require following specific instructions and are more controllable.
Prompt Examples
We have used both OpenAI and Hugging Face models so far in this book. As you practice prompt engineering and learn new techniques, your new knowledge can be applied to both the OpenAI and Hugging Face APIs and models.
As Andrew Ng and Isa Fulford point out in their one hour class ChatGPT Prompt Engineering for Developers, you can craft prompts as if you were instructing an intelligent person who is not necessarily familiar with the problem you are trying to solve. There is a bit of a tradeoff here: you pay per input and output token with the OpenAI APIs, so you may be tempted to compress your context text, instructions, and questions when creating prompts. However, in general, the more context and information you provide, the higher the quality of the resulting output.
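When building prompts programmatically it helps to keep the context text, the instructions, and the question as separate pieces and combine them just before calling a model. The following is a minimal sketch in Common Lisp; the function name and arguments are my own and not part of any library:

(defun make-prompt (context instructions question)
  "Combine context text, instructions, and a question into one prompt string."
  (format nil "~a~%~%~a~%~%~a" context instructions question))

;; Example use:
;; (make-prompt
;;  "Bill Gates started Microsoft. Microsoft is located in Redmond."
;;  "Answer using only the context text above."
;;  "Where is Microsoft located?")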
The following sections show examples of prompt engineering that I use in my own work. To learn more techniques, I also recommend performing a web search for prompt engineering together with a description of the problem you are working on. As an example, I asked Microsoft Bing+ChatGPT in a chat: “I need help with prompt engineering using the OpenAI APIs. My problem is: I have a text file that contains customer names and phone numbers. I need a prompt to list names and phone numbers.” The advice it returned was a useful starting point. In a similar way I received useful advice when chatting with Google Bard.
Find Entities and Relationships
Prompt:
1 The task is to extract as many relevant entities and the relationships between these entities.\n
2 When possible, return the type of an entity using the Wikipedia class system.\n
3 Extract all relevant relationships between identified entities.\n
4 The relationships should follow the Wikipedia schema type.
5 The output of a relationship should be in a form of a triple Head, Relationship, Tail, for example
6 Bill Gates, BORN_AT, Seatle.\n
7 As an example "Bill Gates worked at Microsoft" should have an output with the following format:\n
8 entity
9 Bill Gates, person
10 Microsoft, organization
11
12 relationships
13 Bill Gates, WORKET AT, Microsoft\n
14 Task: process this text:
15 Steve Jobes founded Apple Computer and Bill Gates started Microsoft. Seattle is in the state of Washington.
16 Steve lived in Palo Alto. Palo Alto is located in California.
ChatGPT 4 returns:
1 entities
2 Steve Jobs, person
3 Apple Computer, organization
4 Bill Gates, person
5 Microsoft, organization
6 Seattle, city
7 Washington, administrative territorial entity
8
9 relationships
10 Steve Jobs, FOUNDED, Apple Computer
11 Bill Gates, FOUNDED, Microsoft
12 Seattle, IS_IN, Washington
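Because the relationships are returned as comma separated triples, this output is easy to convert to Lisp data for further processing. Here is a minimal sketch; it only assumes the uiop utilities that ship with ASDF, and the function name is my own:

(defun parse-triple (line)
  "Split a line like \"Steve Jobs, FOUNDED, Apple Computer\" into a list
of three trimmed strings: (head relationship tail)."
  (mapcar (lambda (s) (string-trim " " s))
          (uiop:split-string line :separator ",")))

;; (parse-triple "Steve Jobs, FOUNDED, Apple Computer")
;; => ("Steve Jobs" "FOUNDED" "Apple Computer")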
Disambiguate Relationship Names
Prompt:
1 Given a list of relationships like:
2
3 brother
4 located in
5 sister
6 sibling
7 found at
8 father
9 location
10 mother
11 relative
12
13 Then return a list with an index for each unique relationship, like:
14
15 brother, 1
16 located in, 2
17 sister, 1
18 sibling, 1
19 found at, 2
20 father, 1
21 location, 2
22 mother, 1
23 relative, 1
24
25 Task: process the following text:
26
27 Sam has a brother Mike. Suzy has a sister Sally. Suszy ahs a mother Jill, and a father Bill.
28 Mike is in San Diego. Sam is located is Oakland. Suzy's family all live in Los Angeles.
ChatGPT 4 results are:
1 Based on the provided list of relationships, here is the processed text with an index for each unique relationship:
2
3 Sam has a brother Mike. (brother, 1)
4 Suzy has a sister Sally. (sister, 1)
5 Suzy has a mother Jill, and a father Bill. (mother, 1; father, 1)
6 Mike is in San Diego. (located in, 2)
7 Sam is located in Oakland. (located in, 2)
8 Suzy's family all live in Los Angeles. (location, 2)
9
10 Here, relationships like "brother," "sister," "mother," and "father" are assigned an index of 1, while relationships like "located in" and "location" are assigned an index of 2.
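Downstream code can use the index assignments returned by the model as a lookup table, so that different surface forms like “located in” and “location” map to the same relationship group. This is a minimal sketch with the mapping from the example above hard coded; the variable and function names are my own:

(defparameter *relation-index*
  '(("brother" . 1) ("sister" . 1) ("sibling" . 1) ("father" . 1)
    ("mother" . 1) ("relative" . 1)
    ("located in" . 2) ("location" . 2) ("found at" . 2))
  "Maps each relationship name to the index of its relationship group.")

(defun relation-index (name)
  "Return the group index for a relationship name, or NIL if it is unknown."
  (cdr (assoc name *relation-index* :test #'string-equal)))

;; (relation-index "found at") => 2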
Disambiguate Relationship Names and Rewrite Text
Prompt:
1 We want to disambiguate relationship names and then rewrite text using canonical relationship names.
2
3 Given a list of relationships like:
4
5 brother
6 located in
7 sister
8 sibling
9 found at
10 father
11 location
12 mother
13 relative
14
15 Then generate a list with an index for each unique relationship, like:
16
17 brother, 1
18 located in, 2
19 sister, 1
20 sibling, 1
21 found at, 2
22 father, 1
23 location, 2
24 mother, 1
25 relative, 1
26
27 And then rewrite the original text with the relation indices.
28
29 Task: process the following text:
30
31 Sam has a brother Mike. Suzy has a sister Sally. Suszy ahs a mother Jill, and a father Bill.
32 Mike is in San Diego. Sam is located is Oakland. Suzy's family all live in Los Angeles.
The ChatGPT 4 output is:
1 First, we create the indexed list of unique relationships:
2
3 brother, 1
4 located in, 2
5 sister, 1
6 sibling, 1
7 found at, 2
8 father, 1
9 location, 2
10 mother, 1
11 relative, 1
12
13 Now, we rewrite the original text using the relation indices:
14
15 Sam has a 1 Mike. Suzy has a 1 Sally. Suszy has a 1 Jill, and a 1 Bill.
16 Mike is 2 San Diego. Sam is 2 Oakland. Suzy's family all 2 Los Angeles.
In the first sentence “Sam has a 1 Mike” the number 1 refers to the relationship brother.
Disambiguate Relationship Names and Rewrite Text, Only Output Rewritten Text
Here we ask the model to output only the results, in JSON format.
Prompt:
1 We want to disambiguate relationship names and then rewrite text using canonical relationship names.
2
3 Given a list of relationships like:
4
5 brother
6 located in
7 sister
8 sibling
9 found at
10 father
11 location
12 mother
13 relative
14
15 Then generate a list with an index for each unique relationship, like:
16
17 brother, <1>
18 located in, <2>
19 sister, <1>
20 sibling, <1>
21 found at, <2>
22 father, <1>
23 location, <2>
24 mother, <1>
25 relative, <1>
26
27 And then rewrite the original text using the relation indices, and list relation names with indices. Only output rewritten text and names with indices.
28
29 Task: process the following text outputing JSON format:
30
31 Sam has a brother Mike. Suzy has a sister Sally. Suzy has a mother Jill, and a father Bill.
32 Mike is in San Diego. Sam is located is Oakland. Suzy's family all live in Los Angeles.
The ChatGPT 4 output is:
1 {
2 "relationships": {
3 "brother": "<1>",
4 "located in": "<2>",
5 "sister": "<1>",
6 "sibling": "<1>",
7 "found at": "<2>",
8 "father": "<1>",
9 "location": "<2>",
10 "mother": "<1>",
11 "relative": "<1>"
12 },
13 "rewritten_text": "Sam has a brother Mike. Suzy has a sister Sally. Suzy has a mother Jill, and a father Bill.\nMike is <2> San Diego. Sam is <2> Oakland. Suzy's family all live <2> Los Angeles."
14 }
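Since the model was asked for JSON, the response can be parsed directly. This is a minimal sketch assuming the yason JSON library from Quicklisp (any JSON parser would work equally well); the function name is my own:

(defun parse-relation-response (json-text)
  "Parse the JSON returned by the model and return the rewritten text
and the relationships object."
  (let ((obj (yason:parse json-text))) ; a hash table with string keys
    (values (gethash "rewritten_text" obj)
            (gethash "relationships" obj))))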
Topic Modeling for Document Classification
In this example we show a complete template that includes document text. To use this in your own programs, replace the document text with a marker token that your code substitutes with the text of the document you want to classify (a sketch of this substitution appears at the end of this section).
Prompt:
1 Topic modeling involves reading text and assigning a topic name (also known as a document classification).
2 The topics that you know how to classify are:
3
4 sports
5 health
6 chemistry
7 economy
8 politics
9
10 The output format will be:
11
12 Known topics: <list of all topics>
13 Topic: <topic of text>
14
15 Task: what is the best topic name for the following text:
16
17 John and Sam went to the playing field yesterday. Sam sprained his ankle.
The result is:
1 Known topics: sports, health, chemistry, economy, politics
2
3 Topic: sports
In this prompt template, I used the placeholder document text “John and Sam went to the playing field yesterday. Sam sprained his ankle.” that you would replace with the longer text from the document whose topic or classification you want to determine.
In this template, I like to ask the LLM to repeat the list of topics (or classifications) it knows.
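One way to implement the marker substitution mentioned above is to store the prompt as a template string containing a marker such as {{DOCUMENT}} and replace the marker with the document text before calling the model. The marker name and function name below are my own; this is a minimal sketch using only standard Common Lisp:

(defun fill-template (template document-text &key (marker "{{DOCUMENT}}"))
  "Replace the first occurrence of MARKER in TEMPLATE with DOCUMENT-TEXT."
  (let ((pos (search marker template)))
    (if pos
        (concatenate 'string
                     (subseq template 0 pos)
                     document-text
                     (subseq template (+ pos (length marker))))
        template)))

;; (fill-template "Task: what is the best topic name for the following text: {{DOCUMENT}}"
;;                "John and Sam went to the playing field yesterday. Sam sprained his ankle.")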
Prompt Engineering Wrapup
When you experiment with the examples in this chapter, you can use the ChatGPT web interface to modify and develop prompts, and you can also revisit the chapter on the OpenAI APIs and experiment with prompts programmatically. My personal preference is to perform initial experiments in the ChatGPT web interface but to do most of the iterative prompt development in Emacs with a SLIME REPL.
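For experimenting from a REPL, the following is a minimal sketch of sending a prompt to the OpenAI chat completions endpoint. It assumes the dexador HTTP client from Quicklisp, an OPENAI_API_KEY environment variable, and a current model name (here "gpt-4", which may need updating); in real code you would use a JSON library to build the request body and parse the response:

(defun escape-json-string (s)
  "Escape characters in S so it can be embedded in a JSON string literal."
  (with-output-to-string (out)
    (loop for ch across s
          do (case ch
               (#\" (write-string "\\\"" out))
               (#\\ (write-string "\\\\" out))
               (#\Newline (write-string "\\n" out))
               (t (write-char ch out))))))

(defun openai-chat (prompt &key (model "gpt-4"))
  "POST a single user message to the OpenAI chat completions API and
return the raw JSON response body as a string."
  (dex:post "https://api.openai.com/v1/chat/completions"
            :headers `(("Content-Type" . "application/json")
                       ("Authorization" . ,(concatenate 'string "Bearer "
                                                        (uiop:getenv "OPENAI_API_KEY"))))
            :content (format nil "{\"model\": \"~a\", \"messages\": [{\"role\": \"user\", \"content\": \"~a\"}]}"
                             model (escape-json-string prompt))))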
I mentioned earlier Andrew Ng’s and Isa Fulford’s one hour prompt engineering class. They devote time to, and provide good advice on, interactive prompt development. Their examples use the Python language.