Prompt engineering techniques for applying a model to new tasks without extensive additional training.
Note: The info on this page is mainly summarised from the references listed below, except where indicated otherwise.
Zero-Shot Prompting
- Large language models (LLMs) such as GPT-3.5 Turbo, GPT-4, and Claude 3 are tuned to follow instructions and are trained on large amounts of data.
- Large-scale training makes these models capable of performing some tasks in a “zero-shot” manner.
- Zero-shot prompting: the prompt used to interact with the model won’t contain examples or demonstrations.
- The zero-shot prompt directly instructs the model to perform a task without any additional examples to steer it.
- Removes the need for extensive training data, instead relying on carefully crafted prompts that guide the model toward novel tasks.
- The model receives a task description in the prompt but lacks labelled data for training on specific input-output mappings.
- The model then leverages its pre-existing knowledge to generate predictions based on the given prompt for the new task.
Prompt:
Classify the text into neutral, negative or positive.
Text: I think the vacation is okay.
Sentiment:
Output:
Neutral
- Note that the prompt above does not provide the model with any examples of text alongside their classifications; the LLM already understands “sentiment”.
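As a minimal sketch, a zero-shot prompt like the one above can be assembled from just an instruction and the input text. The function name `build_zero_shot_prompt` is hypothetical and introduced only for illustration; sending the prompt to a model is left out because it depends on the provider.

```python
def build_zero_shot_prompt(text: str, labels: list[str]) -> str:
    """Build a classification prompt with an instruction and input, but no examples."""
    if len(labels) < 2:
        raise ValueError("need at least two labels")
    # Join labels as "a, b or c" to mirror the wording of the example prompt.
    label_list = ", ".join(labels[:-1]) + " or " + labels[-1]
    return (
        f"Classify the text into {label_list}.\n"
        f"Text: {text}\n"
        "Sentiment:"
    )


prompt = build_zero_shot_prompt(
    "I think the vacation is okay.", ["neutral", "negative", "positive"]
)
print(prompt)
```

The prompt ends at "Sentiment:" so that the model's continuation is the label itself.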
References
- Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., & Chadha, A. (2024). A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv preprint arXiv:2402.07927.
- Zero-Shot Prompting – Nextra. (2022). Promptingguide.ai. https://www.promptingguide.ai/techniques/zeroshot
Few-Shot Prompting
- LLMs fall short on more complex tasks when using the zero-shot setting.
- Few-shot prompting can be used as a technique to enable in-context learning.
- We provide demonstrations in the prompt to steer the model toward better performance and to induce an understanding of the given task.
- The demonstrations serve as conditioning for subsequent examples where we would like the model to generate a response.
- Requires additional tokens to include the examples, which may become prohibitive for longer text inputs.
- Tokens:
- Fundamental units of text that the model processes.
- Can be a word, a part of a word (e.g., “programming” might be tokenised into “program” and “ming”), or even a single character.
- Token-based billing:
- Most LLM service providers charge based on the number of tokens processed.
- Includes both input tokens (the text you send to the model) and output tokens (the text the model generates in response). (OpenAI, 2024)
- The selection and composition of prompt examples can significantly influence model behaviour, and biases like favouring frequent words may still affect few-shot results.
- Careful prompt engineering is critical to achieve optimal performance and mitigate unintended model biases.
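The token overhead of few-shot prompting can be roughly estimated as follows. This is only a sketch under loud assumptions: the 4-characters-per-token ratio is a rule of thumb for English text rather than a real tokeniser, the prices are hypothetical placeholders rather than any provider's rates, and the two demonstrations are invented for illustration.

```python
import math

# Assumption: roughly 4 characters per token for English text (a rule of thumb,
# not a real tokeniser).
CHARS_PER_TOKEN = 4

# Hypothetical prices in dollars per 1,000 tokens; not any provider's actual rates.
PRICE_PER_1K_INPUT = 0.50
PRICE_PER_1K_OUTPUT = 1.50


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(prompt: str, output_tokens: int) -> float:
    """Estimated cost in dollars; input and output tokens are both billed."""
    input_tokens = estimate_tokens(prompt)
    return (input_tokens * PRICE_PER_1K_INPUT
            + output_tokens * PRICE_PER_1K_OUTPUT) / 1000


query = "Text: I think the vacation is okay.\nSentiment:"
# Illustrative demonstrations, invented for this sketch.
demonstrations = (
    "Text: The food was wonderful.\nSentiment: Positive\n"
    "Text: The flight was delayed again.\nSentiment: Negative\n"
)

zero_shot_prompt = query
few_shot_prompt = demonstrations + query

for name, prompt in [("zero-shot", zero_shot_prompt), ("few-shot", few_shot_prompt)]:
    print(name, estimate_tokens(prompt), "tokens,", f"${estimate_cost(prompt, 1):.6f}")
```

Every demonstration is resent with each request, so the extra input tokens are paid on every call, not once.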
Prompt:
A “whatpu” is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
To do a “farduddle” means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
Output:
When we won the game, we all started to farduddle in celebration.
- The model has learned how to perform the task from just one example (i.e., 1-shot).
- For more difficult tasks, we can experiment with increasing the demonstrations (e.g., 3-shot, 5-shot, 10-shot, etc.).
Prompt:
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
Output:
You’re right! Let’s add up the odd numbers: 15 + 5 + 13 + 7 + 1 = 41. And indeed, 41 is an even number. It’s an interesting observation!
- With increased demonstrations:
Prompt:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Output:
Let’s examine the last group:
Odd numbers: 15, 5, 13, 7, 1
Sum of odd numbers: 15 + 5 + 13 + 7 + 1 = 41
Since 41 is an odd number, the statement is:
A: The answer is False.
- Standard few-shot prompting works well for many tasks but is still not a perfect technique, especially when dealing with more complex reasoning tasks.
- It might help if we break the problem down into steps and demonstrate that to the model.
- More recently, chain-of-thought (CoT) prompting has been popularised to address more complex arithmetic, common sense, and symbolic reasoning tasks.
Tips
- The label space and the distribution of the input text specified by the demonstrations are both important, regardless of whether the labels are correct for individual inputs.
- Label space: The range of possible outputs the model can generate.
- The demonstrations should cover a diverse range of potential outputs within this label space.
- E.g., if you want the LLM to classify sentiment, the labels could be positive, negative, or neutral.
- Providing examples for each category helps the model understand the different possibilities within the label space. (Google AI, 2024)
- Distribution of the input text: The variety of text formats and styles used in the demonstrations.
- LLMs learn from the patterns and characteristics present in the input text.
- Providing a diverse range of text styles (e.g., formal, informal, different sentence structures) helps the model generalise its knowledge to unseen examples.
- This is important because the model may encounter different writing styles in real-world applications. (Google AI, 2024)
- The format you use also plays a key role in performance; even random labels are much better than no labels at all.
- Format: The way you structure your demonstrations.
- Acts as a guide, helping the model understand the relationship between the input text and the desired output, even if the specific labels are inaccurate.
- Random labels still provide some level of structure and guidance for the model.
- Without any labels, the LLM has no reference point for the expected output, making it more difficult to learn the task effectively. (Google AI, 2024)
- Using random labels from the true distribution of labels (e.g., if the true labels are mostly positive and few negative, using a similar distribution in the random labels) can further enhance the model’s performance.
- Imagine you’re prompting an LLM for sentiment classification with labels like positive, negative, and neutral.
- Using random labels would involve assigning these labels to the demonstration text examples in a completely random manner. (Google AI, 2024)
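Drawing random labels from the true label distribution, as described in the tips above, might look like the sketch below. The demonstration texts and their labels are invented for illustration, and `random_labels_from_distribution` is a hypothetical helper; the sketch only shows the sampling and formatting, not any measured effect on model performance.

```python
import random


def random_labels_from_distribution(true_labels: list[str], rng: random.Random) -> list[str]:
    """Sample one random label per demonstration, weighted by true label frequency."""
    labels = sorted(set(true_labels))
    weights = [true_labels.count(label) for label in labels]
    return rng.choices(labels, weights=weights, k=len(true_labels))


# Illustrative demonstrations: mostly positive, few negative.
texts = [
    "The hotel staff were lovely.",
    "Great views from the room.",
    "Breakfast was excellent.",
    "The pool was closed all week.",
]
true_labels = ["Positive", "Positive", "Positive", "Negative"]

rng = random.Random(0)  # seeded so the sketch is repeatable
sampled_labels = random_labels_from_distribution(true_labels, rng)

# Keep the same input/label format for every demonstration.
demo_block = "\n".join(
    f"Text: {text}\nSentiment: {label}" for text, label in zip(texts, sampled_labels)
)
print(demo_block)
```

The sampled labels may be wrong for individual texts, but every label stays inside the label space and the format is consistent across demonstrations.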
References
- Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., & Chadha, A. (2024). A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv preprint arXiv:2402.07927.
- Few-Shot Prompting – Nextra. (2023). Promptingguide.ai. https://www.promptingguide.ai/techniques/fewshot
- OpenAI. (2024). ChatGPT (Apr 29 version) [Large language model]. https://chat.openai.com/chat
- Google AI. (2024). Gemini (May 1 version) [Large language model]. https://gemini.google.com/app
