Prompt Engineering, Part 2 - New Tasks Without Extensive Training

jashburn8020 · Sat, 15 Jun 2024

Prompt engineering techniques that let a model take on new tasks without extensive training.

@startmindmap
* Prompt Engineering
** New Tasks Without Extensive Training
*** Zero-Shot Prompting
*** Few-Shot Prompting
@endmindmap

Note: The info on this page is mainly summarised from the references listed below, except where indicated otherwise.

Zero-Shot Prompting

  • Large language models (LLMs) such as GPT-3.5 Turbo, GPT-4, and Claude 3 are tuned to follow instructions and are trained on large amounts of data.
    • Large-scale training makes these models capable of performing some tasks in a “zero-shot” manner.
  • Zero-shot prompting: the prompt used to interact with the model won’t contain examples or demonstrations.
    • The zero-shot prompt directly instructs the model to perform a task without any additional examples to steer it.
  • Zero-shot prompting removes the need for extensive training data, relying instead on carefully crafted prompts that guide the model toward novel tasks.
  • The model receives a task description in the prompt but lacks labelled data for training on specific input-output mappings.
  • The model then leverages its pre-existing knowledge to generate predictions based on the given prompt for the new task.

Prompt:

Classify the text into neutral, negative or positive.
Text: I think the vacation is okay.
Sentiment:

Output:

Neutral


  • Note that in the prompt above we didn’t provide the model with any examples of text alongside their classifications; the LLM already understands “sentiment”.
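As a minimal sketch, the zero-shot prompt above can be assembled from nothing more than an instruction and the input text. The helper name build_zero_shot_prompt is hypothetical and not taken from the references; sending the prompt to a model is left out because that depends on the provider.

```python
def build_zero_shot_prompt(text: str, labels: list[str]) -> str:
    """Assemble a zero-shot classification prompt: instruction plus input, no examples."""
    if len(labels) < 2:
        raise ValueError("need at least two labels")
    # Join labels as "a, b or c" to match the wording of the example prompt.
    label_list = ", ".join(labels[:-1]) + " or " + labels[-1]
    return (
        f"Classify the text into {label_list}.\n"
        f"Text: {text}\n"
        "Sentiment:"
    )


prompt = build_zero_shot_prompt(
    "I think the vacation is okay.", ["neutral", "negative", "positive"]
)
print(prompt)
```

The printed text is the same three-line prompt shown above; nothing in it demonstrates the task, which is what makes it zero-shot.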

References

  • Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., & Chadha, A. (2024). A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv preprint arXiv:2402.07927.
  • Zero-Shot Prompting – Nextra. (2022). Promptingguide.ai. https://www.promptingguide.ai/techniques/zeroshot

Few-Shot Prompting

  • LLMs fall short on more complex tasks when using the zero-shot setting.
  • Few-shot prompting can be used as a technique to enable in-context learning.
    • We provide demonstrations in the prompt to steer the model towards better performance and to induce an understanding of the given task.
    • The demonstrations serve as conditioning for subsequent examples where we would like the model to generate a response.
  • Few-shot prompting requires additional tokens to include the examples, which may become prohibitive for longer text inputs.
    • Tokens:
      • Fundamental units of text that the model processes.
      • Can be a word, a part of a word (e.g., “programming” might be tokenised into “program” and “ming”), or even a single character.
    • Token-based billing:
      • Most LLM service providers charge based on the number of tokens processed.
      • Includes both input tokens (the text you send to the model) and output tokens (the text the model generates in response). (OpenAI, 2024)
  • The selection and composition of prompt examples can significantly influence model behaviour, and biases like favouring frequent words may still affect few-shot results.
    • Careful prompt engineering is critical to achieve optimal performance and mitigate unintended model biases.
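To make the token overhead of demonstrations concrete, here is a rough sketch of token-based billing. The prices, the token counts, and the names Pricing and estimate_cost are all assumptions for illustration; they are not real provider rates or a real provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in US dollars (not real provider rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request: input and output tokens are billed separately."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# Assumed figures for illustration only.
pricing = Pricing(input_per_million=1.0, output_per_million=2.0)
zero_shot = estimate_cost(input_tokens=50, output_tokens=10, pricing=pricing)
# Five demonstrations of roughly 40 tokens each are added to the same prompt.
few_shot = estimate_cost(input_tokens=50 + 5 * 40, output_tokens=10, pricing=pricing)
print(f"zero-shot: ${zero_shot:.6f}, few-shot: ${few_shot:.6f}")
```

Under these assumed numbers the demonstrations add 200 input tokens to every request, so the cost grows with each example even though the output length is unchanged.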

Prompt:

A “whatpu” is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.

To do a “farduddle” means to jump up and down really fast. An example of a sentence that uses the word farduddle is:

Output:

When we won the game, we all started to farduddle in celebration.


  • The model has learned how to perform the task from just one example (i.e., 1-shot).
  • For more difficult tasks, we can experiment with increasing the demonstrations (e.g., 3-shot, 5-shot, 10-shot, etc.).

  • Without demonstrations (zero-shot), the model gets the following reasoning task wrong:

Prompt:

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.

Output:

You’re right! Let’s add up the odd numbers: 15 + 5 + 13 + 7 + 1 = 41. And indeed, 41 is an even number. It’s an interesting observation!


  • With increased demonstrations:

Prompt:

The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:

Output:

Let’s examine the last group:

Odd numbers: 15, 5, 13, 7, 1

Sum of odd numbers: 15 + 5 + 13 + 7 + 1 = 41

Since 41 is an odd number, the statement is:

A: The answer is False.
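The few-shot prompt above can be rebuilt programmatically, which also makes it easy to check that each demonstration label is arithmetically correct. This is only a sketch: the helper names odd_sum_is_even and build_few_shot_prompt are hypothetical, and the labels are computed here rather than copied from the prompt.

```python
def odd_sum_is_even(numbers: list[int]) -> bool:
    """True if the odd numbers in the group add up to an even number."""
    return sum(n for n in numbers if n % 2 == 1) % 2 == 0


def build_few_shot_prompt(demos: list[list[int]], query: list[int]) -> str:
    """Format labelled demonstrations followed by the unanswered query."""
    template = "The odd numbers in this group add up to an even number: {}."
    lines = []
    for group in demos:
        lines.append(template.format(", ".join(map(str, group))))
        lines.append(f"A: The answer is {odd_sum_is_even(group)}.")
    # The query uses the same format but leaves the answer for the model.
    lines.append(template.format(", ".join(map(str, query))))
    lines.append("A:")
    return "\n".join(lines)


demos = [
    [4, 8, 9, 15, 12, 2, 1],
    [17, 10, 19, 4, 8, 12, 24],
    [16, 11, 14, 4, 8, 13, 24],
    [17, 9, 10, 12, 13, 4, 2],
]
query = [15, 32, 5, 13, 82, 7, 1]

print(build_few_shot_prompt(demos, query))
print(odd_sum_is_even(query))  # the odd numbers sum to 41, so this prints False
```

The computed labels come out as False, True, True, False, matching the four demonstrations, and the check on the final group agrees with the model's few-shot answer.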


  • Standard few-shot prompting works well for many tasks but is still not a perfect technique, especially when dealing with more complex reasoning tasks.
  • It might help if we break the problem down into steps and demonstrate that to the model.
  • More recently, chain-of-thought (CoT) prompting has been popularised to address more complex arithmetic, common sense, and symbolic reasoning tasks.

Tips

  • The label space and the distribution of the input text specified by the demonstrations are both important, regardless of whether the labels are correct for individual inputs.
    • Label space: The range of possible outputs the model can generate.
      • The demonstrations should cover a diverse range of potential outputs within this label space.
      • E.g., if you want the LLM to classify sentiment, the labels could be positive, negative, or neutral.
      • Providing examples for each category helps the model understand the different possibilities within the label space. (Google AI, 2024)
    • Distribution of the input text: The variety of text formats and styles used in the demonstrations.
      • LLMs learn from the patterns and characteristics present in the input text.
      • Providing a diverse range of text styles (e.g., formal, informal, different sentence structures) helps the model generalise its knowledge to unseen examples.
      • This is important because the model may encounter different writing styles in real-world applications. (Google AI, 2024)
  • The format you use also plays a key role in performance: even random labels are much better than no labels at all.
    • Format: The way you structure your demonstrations.
      • Acts as a guide, helping the model understand the relationship between the input text and the desired output, even if the specific labels are inaccurate.
      • Random labels still provide some level of structure and guidance for the model.
        • Without any labels, the LLM has no reference point for the expected output, making it more difficult to learn the task effectively. (Google AI, 2024)
    • Using random labels from the true distribution of labels (e.g., if the true labels are mostly positive and few negative, using a similar distribution in the random labels) can further enhance the model’s performance.
      • Imagine you’re prompting an LLM for sentiment classification with labels like positive, negative, and neutral.
      • Using random labels would involve assigning these labels to the demonstration text examples in a completely random manner. (Google AI, 2024)
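Two of the tips above can be sketched in code: checking that the demonstrations cover the whole label space, and drawing random labels that follow the true label distribution. The function names and the label frequencies below are assumptions made for illustration, not part of the referenced material.

```python
import random
from collections import Counter


def covers_label_space(demo_labels: list[str], label_space: set[str]) -> bool:
    """True if every label in the label space appears in at least one demonstration."""
    return label_space <= set(demo_labels)


def random_labels_from_distribution(
    true_labels: list[str], k: int, seed: int = 0
) -> list[str]:
    """Draw k labels at random, weighted by how often each occurs in true_labels."""
    counts = Counter(true_labels)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.choices(list(counts), weights=list(counts.values()), k=k)


label_space = {"positive", "negative", "neutral"}
# Assumed label frequencies for illustration: mostly positive, few negative.
true_labels = ["positive"] * 7 + ["neutral"] * 2 + ["negative"]

print(covers_label_space(["positive", "negative"], label_space))
print(covers_label_space(["positive", "negative", "neutral"], label_space))
print(random_labels_from_distribution(true_labels, k=4))
```

The first check fails because no demonstration is labelled neutral, and the second passes. The sampled labels are assigned without looking at the demonstration text, but over many draws they keep roughly the same proportions as the true labels.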

References
