[
  {
    "path": "CODE_OF_CONDUCT.md",
    "content": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nWe as members, contributors, and leaders pledge to make participation in our\ncommunity a harassment-free experience for everyone, regardless of age, body\nsize, visible or invisible disability, ethnicity, sex characteristics, gender\nidentity and expression, level of experience, education, socio-economic status, \nnationality, personal appearance, race, caste, color, religion, or sexual\nidentity and orientation.\n\nWe pledge to act and interact in ways that contribute to an open, welcoming, \ndiverse, inclusive, and healthy community.\n\n## Our Standards\n\nExamples of behavior that contributes to a positive environment for our\ncommunity include:\n\n* Demonstrating empathy and kindness toward other people\n* Being respectful of differing opinions, viewpoints, and experiences\n* Giving and gracefully accepting constructive feedback\n* Accepting responsibility and apologizing to those affected by our mistakes, \n  and learning from the experience\n* Focusing on what is best not just for us as individuals, but for the overall\n  community\n\nExamples of unacceptable behavior include:\n\n* The use of sexualized language or imagery, and sexual attention or advances of\n  any kind\n* Trolling, insulting or derogatory comments, and personal or political attacks\n* Public or private harassment\n* Publishing others' private information, such as a physical or email address, \n  without their explicit permission\n* Other conduct which could reasonably be considered inappropriate in a\n  professional setting\n\n## Enforcement Responsibilities\n\nCommunity leaders are responsible for clarifying and enforcing our standards of\nacceptable behavior and will take appropriate and fair corrective action in\nresponse to any behavior that they deem inappropriate, threatening, offensive, \nor harmful.\n\nCommunity leaders have the right and responsibility to remove, edit, or reject\ncomments, commits, code, 
wiki edits, issues, and other contributions that are\nnot aligned to this Code of Conduct, and will communicate reasons for moderation\ndecisions when appropriate.\n\n## Scope\n\nThis Code of Conduct applies within all community spaces, and also applies when\nan individual is officially representing the community in public spaces.\nExamples of representing our community include using an official e-mail address, \nposting via an official social media account, or acting as an appointed\nrepresentative at an online or offline event.\n\n## Enforcement\n\nInstances of abusive, harassing, or otherwise unacceptable behavior may be\nreported to the community leaders responsible for enforcement at\n[INSERT CONTACT METHOD].\nAll complaints will be reviewed and investigated promptly and fairly.\n\nAll community leaders are obligated to respect the privacy and security of the\nreporter of any incident.\n\n## Enforcement Guidelines\n\nCommunity leaders will follow these Community Impact Guidelines in determining\nthe consequences for any action they deem in violation of this Code of Conduct:\n\n### 1. Correction\n\n**Community Impact**: Use of inappropriate language or other behavior deemed\nunprofessional or unwelcome in the community.\n\n**Consequence**: A private, written warning from community leaders, providing\nclarity around the nature of the violation and an explanation of why the\nbehavior was inappropriate. A public apology may be requested.\n\n### 2. Warning\n\n**Community Impact**: A violation through a single incident or series of\nactions.\n\n**Consequence**: A warning with consequences for continued behavior. No\ninteraction with the people involved, including unsolicited interaction with\nthose enforcing the Code of Conduct, for a specified period of time. This\nincludes avoiding interactions in community spaces as well as external channels\nlike social media. Violating these terms may lead to a temporary or permanent\nban.\n\n### 3. 
Temporary Ban\n\n**Community Impact**: A serious violation of community standards, including\nsustained inappropriate behavior.\n\n**Consequence**: A temporary ban from any sort of interaction or public\ncommunication with the community for a specified period of time. No public or\nprivate interaction with the people involved, including unsolicited interaction\nwith those enforcing the Code of Conduct, is allowed during this period.\nViolating these terms may lead to a permanent ban.\n\n### 4. Permanent Ban\n\n**Community Impact**: Demonstrating a pattern of violation of community\nstandards, including sustained inappropriate behavior, harassment of an\nindividual, or aggression toward or disparagement of classes of individuals.\n\n**Consequence**: A permanent ban from any sort of public interaction within the\ncommunity.\n\n## Attribution\n\nThis Code of Conduct is adapted from the [Contributor Covenant][homepage], \nversion 2.1, available at\n[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].\n\nCommunity Impact Guidelines were inspired by\n[Mozilla's code of conduct enforcement ladder][Mozilla CoC].\n\nFor answers to common questions about this code of conduct, see the FAQ at\n[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at\n[https://www.contributor-covenant.org/translations][translations].\n\n[homepage]: https://www.contributor-covenant.org\n[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html\n[Mozilla CoC]: https://github.com/mozilla/diversity\n[FAQ]: https://www.contributor-covenant.org/faq\n[translations]: https://www.contributor-covenant.org/translations\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "# Contributing to Substation\n\nThank you so much for your interest in contributing to the Brex Prompt Engineering Crash Course! This document contains guidelines to follow when contributing to the project.\n\n## Table Of Contents\n\n[Code of Conduct](#code-of-conduct)\n\n[Submissions](#submissions)\n  + [Changes](#submitting-changes)\n  + [Bugs](#submitting-bugs)\n  + [Enhancements](#submitting-enhancements)\n\n## Code of Conduct\n\nThe Code of Conduct can be reviewed [here](CODE_OF_CONDUCT.md).\n\n## Submissions\n\n### Submitting Changes\n\nPull requests should be submitted using the pull request template. Changes will be validated through automation and by the project maintainers before merging to main.\n\n### Submitting Bugs\n\nBugs should be submitted as issues using the issue template.\n\n### Submitting Enhancements\n\nEnhancements should be submitted as issues using the issue template.\n"
  },
  {
    "path": "CONTRIBUTORS.md",
    "content": "# Contributors\n\nThank you to [all of our contributors](https://github.com/brexhq/prompt-engineering/graphs/contributors). For reviewing per-file contributions, run the following commands:\n\n```sh\ngit blame <file>\ngit log -p <file>\n```\n\n# Brex Team\n* [Steve Krenzel](https://twitter.com/stevekrenzel)\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2023 Brex\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# [Brex's](https://brex.com) Prompt Engineering Guide\n\nThis guide was created by Brex for internal purposes. It's based on\nlessons learned from researching and creating Large Language Model (LLM)\nprompts for production use cases. It covers the history around LLMs as well as\nstrategies, guidelines, and safety recommendations for working with and\nbuilding programmatic systems on top of large language models, like [OpenAI's\nGPT-4](https://openai.com/research/gpt-4).\n\nThe examples in this document were generated with a non-deterministic language\nmodel and the same examples may give you different results.\n\nThis is a living document. The state-of-the-art best practices and strategies\naround LLMs are evolving rapidly every day. Discussion and suggestions for\nimprovements are encouraged.\n\n## Table of Contents\n- [What is a Large Language Model?](#what-is-a-large-language-model-llm)\n  - [A Brief, Incomplete, and Somewhat Incorrect History of Language Models](#a-brief-incomplete-and-somewhat-incorrect-history-of-language-models)\n    - [Pre-2000’s](#pre-2000s)\n    - [Mid-2000’s](#mid-2000s)\n    - [Early-2010’s](#early-2010s)\n    - [Late-2010’s](#late-2010s)\n    - [2020’s](#2020s)\n- [What is a prompt?](#what-is-a-prompt)\n  - [Hidden Prompts](#hidden-prompts)\n  - [Tokens](#tokens)\n  - [Token Limits](#token-limits)\n  - [Prompt Hacking](#prompt-hacking)\n    - [Jailbreaks](#jailbreaks)\n    - [Leaks](#leaks)\n- [Why do we need prompt engineering?](#why-do-we-need-prompt-engineering)\n  - [Give a Bot a Fish](#give-a-bot-a-fish)\n    - [Semantic Search](#semantic-search)\n  - [Teach a Bot to Fish](#teach-a-bot-to-fish)\n    - [Command Grammars](#command-grammars)\n    - [ReAct](#react)\n    - [GPT-4 vs GPT-3.5](#gpt-4-vs-gpt-35)\n- [Strategies](#strategies)\n  - [Embedding Data](#embedding-data)\n    - [Simple Lists](#simple-lists)\n    - [Markdown Tables](#markdown-tables)\n    - [JSON](#json)\n    - [Freeform Text](#freeform-text)\n    - 
[Nested Data](#nested-data)\n  - [Citations](#citations)\n  - [Programmatic Consumption](#programmatic-consumption)\n  - [Chain of Thought](#chain-of-thought)\n    - [Averaging](#averaging)\n    - [Interpreting Code](#interpreting-code)\n    - [Delimiters](#delimiters)\n  - [Fine Tuning](#fine-tuning)\n    - [Downsides](#downsides)\n- [Additional Resources](#additional-resources)\n\n## What is a Large Language Model (LLM)?\n\nA large language model is a prediction engine that takes a sequence of words\nand tries to predict the most likely sequence to come after that sequence[^1].\nIt does this by assigning a probability to likely next sequences and then\nsampling from those to choose one[^2]. The process repeats until some stopping\ncriterion is met.\n\nLarge language models learn these probabilities by training on large corpora\nof text. A consequence of this is that the models will cater to some use cases\nbetter than others (e.g. if a model is trained on GitHub data, it’ll understand the\nprobabilities of sequences in source code really well). Another consequence is\nthat the model may generate statements that seem plausible, but are actually\nfabricated rather than grounded in reality.\n\nAs language models become more accurate at predicting sequences, [many\nsurprising abilities\nemerge](https://www.assemblyai.com/blog/emergent-abilities-of-large-language-models/).\n\n[^1]: Language models actually use tokens, not words. A token roughly maps to a syllable in a word, or about 4 characters.\n[^2]: There are many different pruning and sampling strategies to alter the behavior and performance of the sequences.\n\n### A Brief, Incomplete, and Somewhat Incorrect History of Language Models\n\n> :pushpin: Skip [to here](#what-is-a-prompt) if you'd like to jump past the\n> history of language models. 
This section is for the curious-minded, though it\n> may also help you understand the reasoning behind the advice that follows.\n\n#### Pre-2000’s\n\n[Language models](https://en.wikipedia.org/wiki/Language_model#Model_types)\nhave existed for decades, though traditional language models (e.g. [n-gram\nmodels](https://en.wikipedia.org/wiki/N-gram_language_model)) have many\ndeficiencies, such as an explosion of state space ([the curse of\ndimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality)) and\ndifficulty handling novel phrases that they’ve never seen (sparsity). Plainly, older\nlanguage models can generate text that vaguely resembles the statistics of\nhuman-generated text, but there is no consistency within the output – and a\nreader will quickly realize it’s all gibberish. N-gram models also don’t scale\nto large values of N, so they are inherently limited.\n\n#### Mid-2000’s\n\nIn 2007, Geoffrey Hinton – famous for popularizing backpropagation in the 1980’s –\n[published an important advancement in training neural\nnetworks](http://www.cs.toronto.edu/~fritz/absps/tics.pdf) that unlocked much\ndeeper networks. Applying these simple deep neural networks to language\nmodeling helped alleviate some of the problems with language models – they\nrepresented nuanced, arbitrary concepts in a finite, continuous space,\ngracefully handling sequences not seen in the training corpus. These simple\nneural networks learned the probabilities of their training corpus well, but\nthe output would statistically match the training data and generally not be\ncoherent relative to the input sequence. \n\n#### Early-2010’s\n\nAlthough they were first introduced in 1995, [Long Short-Term Memory (LSTM)\nNetworks](https://en.wikipedia.org/wiki/Long_short-term_memory) found their\ntime to shine in the 2010’s. 
LSTMs allowed models to process arbitrary-length\nsequences and, importantly, alter their internal state dynamically as they\nprocessed the input to remember previous things they saw. This minor tweak led\nto remarkable improvements. In 2015, Andrej Karpathy [famously wrote about\ncreating a character-level\nLSTM](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) that performed\nfar better than it had any right to.\n\nLSTMs have seemingly magical abilities, but struggle with long-term\ndependencies. If you asked one to complete the sentence, “In France, we\ntraveled around, ate many pastries, drank lots of wine, ... lots more text ...\n, but never learned how to speak _______”, the model might struggle with\npredicting “French”. They also process input one token at a time, so they are\ninherently sequential, slow to train, and the `Nth` token only knows about the\n`N - 1` tokens prior to it.\n\n#### Late-2010’s\n\nIn 2017, Google wrote a paper, [Attention Is All You\nNeed](https://arxiv.org/pdf/1706.03762.pdf), that introduced [Transformer\nNetworks](https://en.wikipedia.org/wiki/Transformer_(machine_learning_model))\nand kicked off a massive revolution in natural language processing. Overnight,\nmachines could suddenly do tasks like translating between languages nearly as\nwell as (sometimes better than) humans. Transformers introduce a mechanism,\ncalled “attention”, for the model to efficiently\nplace emphasis on specific parts of the input. Transformers analyze the entire\ninput all at once, in parallel, choosing which parts are most important and\ninfluential. Every output token is influenced by every input token.\n\nTransformers are highly parallelizable, efficient to train, and produce\nastounding results. A downside to transformers is that they have a fixed input\nand output size – the context window – and computation increases\nquadratically with the size of this window (in some cases, memory does as\nwell!) 
[^3].\n\nTransformers are not the end of the road, but the vast majority of recent\nimprovements in natural language processing have involved them. There is still\nabundant active research on various ways of implementing and applying them,\nsuch as [Amazon’s AlexaTM\n20B](https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning),\nwhich outperforms GPT-3 in a number of tasks and is an order of magnitude\nsmaller in its number of parameters.\n\n[^3]: There are more recent variations that make these more compute- and memory-efficient, but this remains an active area of research.\n\n#### 2020’s\n\nWhile technically starting in 2018, the theme of the 2020’s has been\nGenerative Pre-trained Transformers – more famously known as GPT. One\nyear after the “Attention Is All You Need” paper, OpenAI released [Improving\nLanguage Understanding by Generative\nPre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf).\nThis paper established that you can train a large language model on a massive\nset of data without any specific agenda, and then once the model has learned\nthe general aspects of language, you can fine-tune it for specific tasks and\nquickly get state-of-the-art results.\n\nIn 2020, OpenAI followed up with their GPT-3 paper [Language Models are\nFew-Shot\nLearners](https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf),\nshowing that if you scale up GPT-like models by another factor of ~10x, in\nterms of number of parameters and quantity of training data, you no\nlonger have to fine-tune them for many tasks. The capabilities emerge naturally\nand you get state-of-the-art results via text interaction with the model.\n\nIn 2022, OpenAI followed up on their GPT-3 accomplishments by releasing\n[InstructGPT](https://openai.com/research/instruction-following). 
The intent\nhere was to tweak the model to follow instructions, while also making its\noutputs less toxic and biased. The key ingredient here was [Reinforcement\nLearning from Human Feedback (RLHF)](https://arxiv.org/pdf/1706.03741.pdf), a\nconcept co-authored by Google and OpenAI in 2017[^4], which allows humans to\nbe in the training loop to fine-tune the model output to be more in line with\nhuman preferences. InstructGPT is the predecessor to the now-famous\n[ChatGPT](https://en.wikipedia.org/wiki/ChatGPT).\n\nOpenAI has been a major contributor to large language models over the last few\nyears, including the most recent introduction of\n[GPT-4](https://cdn.openai.com/papers/gpt-4.pdf), but they are not alone. Meta\nhas introduced many open-source large language models like\n[OPT](https://huggingface.co/facebook/opt-66b),\n[OPT-IML](https://huggingface.co/facebook/opt-iml-30b) (instruction tuned),\nand [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/).\nGoogle released models like\n[FLAN-T5](https://huggingface.co/google/flan-t5-xxl) and\n[BERT](https://huggingface.co/bert-base-uncased). And there is a huge open-source\nresearch community releasing models like\n[BLOOM](https://huggingface.co/bigscience/bloom) and\n[StableLM](https://github.com/stability-AI/stableLM/).\n\nProgress is now moving so swiftly that every few weeks the state of the art is\nchanging or models that previously required clusters to run now run on\nRaspberry Pis.\n\n[^4]: 2017 was a big year for natural language processing.\n\n## What is a prompt?\n\nA prompt, sometimes referred to as context, is the text provided to a\nmodel before it begins generating output. It guides the model to explore a\nparticular area of what it has learned so that the output is relevant to your\ngoals. As an analogy, if you think of the language model as a source code\ninterpreter, then a prompt is the source code to be interpreted. 
Somewhat\namusingly, a language model will happily attempt to guess what source code\nwill do:\n\n<p align=\"center\">\n  <img width=\"450\" src=\"https://user-images.githubusercontent.com/89960/231946874-be91d3de-d773-4a6c-a4ea-21043bd5fc13.png\" title=\"The GPT-4 model interpreting Python code.\">\n</p>\n\nAnd it *almost* interprets the Python perfectly!\n\nFrequently, prompts will be an instruction or a question, like:\n\n <p align=\"center\">\n  <img width=\"500\" src=\"https://user-images.githubusercontent.com/89960/232413246-81db18dc-ef5b-4073-9827-77bd0317d031.png\">\n</p>\n\nOn the other hand, if you don’t specify a prompt, the model has no anchor to\nwork from and you’ll see that it just **randomly samples from anything it has\never consumed**:\n\n**From GPT-3-Davinci:**\n\n| ![image](https://user-images.githubusercontent.com/89960/232413846-70b05cd1-31b6-4977-93f0-20bf29af7132.png) | ![image](https://user-images.githubusercontent.com/89960/232413930-7d414dcd-87e5-431a-91c8-bb6e0ef54f42.png) | ![image](https://user-images.githubusercontent.com/89960/232413978-59c7f47d-ec20-4673-9458-85471a41fee0.png) |\n| --- | --- | --- |\n\n**From GPT-4:**\n| ![image](https://user-images.githubusercontent.com/89960/232414631-928955e5-3bab-4d57-b1d6-5e56f00ffda1.png) | ![image](https://user-images.githubusercontent.com/89960/232414678-e5b6d3f4-36c6-420f-b38f-2f9c8df391fb.png) | ![image](https://user-images.githubusercontent.com/89960/232414734-c8f09cad-aceb-4149-a28a-33675cde8011.png) |\n| --- | --- | --- |\n\n### Hidden Prompts\n\n> :warning: Always assume that any content in a hidden prompt can be seen by the user.\n\nIn applications where a user is interacting with a model dynamically, such as\nchatting with the model, there will typically be portions of the prompt that\nare never intended to be seen by the user. 
These hidden portions may occur\nanywhere, though there is almost always a hidden prompt at the start of a\nconversation.\n\nTypically, this includes an initial chunk of text that sets the tone, model\nconstraints, and goals, along with other dynamic information that is specific\nto the particular session – user name, location, time of day, etc...\n\nThe model is static and frozen at a point in time, so if you want it to know\ncurrent information, like the time or the weather, you must provide it.\n\nIf you’re using [the OpenAI Chat\nAPI](https://platform.openai.com/docs/guides/chat/introduction), they\ndelineate hidden prompt content by placing it in the `system` role.\n\nHere’s an example of a hidden prompt followed by interactions with the content\nin that prompt:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/232416074-84ebcc10-2dfc-49e1-9f48-a240102877ee.png\" title=\" A very simple hidden prompt.\">\n</p>\n\nIn this example, you can see we explain to the bot the various roles, some\ncontext on the user, some dynamic data we want the bot to have access to, and\nthen guidance on how the bot should respond.\n\nIn practice, hidden prompts may be quite large. Here’s a larger prompt taken\nfrom a [ChatGPT command-line\nassistant](https://github.com/manno/chatgpt-linux-assistant/blob/main/system_prompt.txt):\n\n<details>\n  <summary>From: https://github.com/manno/chatgpt-linux-assistant </summary>\n\n```\nWe are a in a chatroom with 3 users. 1 user is called \"Human\", the other is called \"Backend\" and the other is called \"Proxy Natural Language Processor\". I will type what \"Human\" says and what \"Backend\" replies. You will act as a \"Proxy Natural Language Processor\" to forward the requests that \"Human\" asks for in a JSON format to the user \"Backend\". 
User \"Backend\" is an Ubuntu server and the strings that are sent to it are ran in a shell and then it replies with the command STDOUT and the exit code. The Ubuntu server is mine. When \"Backend\" replies with the STDOUT and exit code, you \"Proxy Natural Language Processor\" will parse and format that data into a simple English friendly way and send it to \"Human\". Here is an example:\n\nI ask as human:\nHuman: How many unedited videos are left?\nThen you send a command to the Backend:\nProxy Natural Language Processor: @Backend {\"command\":\"find ./Videos/Unedited/ -iname '*.mp4' | wc -l\"}\nThen the backend responds with the command STDOUT and exit code:\nBackend: {\"STDOUT\":\"5\", \"EXITCODE\":\"0\"}\nThen you reply to the user:\nProxy Natural Language Processor: @Human There are 5 unedited videos left.\n\nOnly reply what \"Proxy Natural Language Processor\" is supposed to say and nothing else. Not now nor in the future for any reason.\n\nAnother example:\n\nI ask as human:\nHuman: What is a PEM certificate?\nThen you send a command to the Backend:\nProxy Natural Language Processor: @Backend {\"command\":\"xdg-open 'https://en.wikipedia.org/wiki/Privacy-Enhanced_Mail'\"}\nThen the backend responds with the command STDOUT and exit code:\nBackend: {\"STDOUT\":\"\", \"EXITCODE\":\"0\"}\nThen you reply to the user:\nProxy Natural Language Processor: @Human I have opened a link which describes what a PEM certificate is.\n\n\nOnly reply what \"Proxy Natural Language Processor\" is supposed to say and nothing else. Not now nor in the future for any reason.\n\nDo NOT REPLY as Backend. DO NOT complete what Backend is supposed to reply. YOU ARE NOT TO COMPLETE what Backend is supposed to reply.\nAlso DO NOT give an explanation of what the command does or what the exit codes mean. DO NOT EVER, NOW OR IN THE FUTURE, REPLY AS BACKEND.\n\nOnly reply what \"Proxy Natural Language Processor\" is supposed to say and nothing else. 
Not now nor in the future for any reason.\n```\n</details>\n\nYou’ll see some good practices there, such as including lots of examples,\nrepetition for important behavioral aspects, constraining the replies, etc…\n\n> :warning: Always assume that any content in a hidden prompt can be seen by the user.\n\n### Tokens\n\nIf you thought tokens were :fire: in 2022, tokens in 2023 are on a whole\ndifferent plane of existence. The atomic unit of consumption for a language\nmodel is not a “word”, but rather a “token”. You can kind of think of tokens\nas syllables, and on average they work out to about 750 words per 1,000\ntokens. They represent many concepts beyond just alphabetical characters –\nsuch as punctuation, sentence boundaries, and the end of a document.\n\nHere’s an example of how GPT may tokenize a sequence:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/232417569-8d562792-64b5-423d-a7a2-db7513dd4d61.png\" title=\"An example tokenization. You can experiment here: https://platform.openai.com/tokenizer \">\n</p>\n\nYou can experiment with a tokenizer here: [https://platform.openai.com/tokenizer](https://platform.openai.com/tokenizer)\n\nDifferent models will use different tokenizers with different levels of granularity. You could, in theory, just feed a model 0’s and 1’s – but then the model needs to learn the concept of characters from bits, and then the concept of words from characters, and so forth. Similarly, you could feed the model a stream of raw characters, but then the model needs to learn the concept of words, and punctuation, etc… and, in general, the models will perform worse.\n\nTo learn more, [Hugging Face has a wonderful introduction to tokenizers](https://huggingface.co/docs/transformers/tokenizer_summary) and why they need to exist.\n\nThere’s a lot of nuance around tokenization, such as vocabulary size or different languages treating sentence structure meaningfully differently (e.g. 
words not being separated by spaces). Fortunately, language model APIs will almost always take raw text as input and tokenize it behind the scenes – *so you rarely need to think about tokens*.\n\n**Except for one important scenario, which we discuss next: token limits.**\n\n### Token Limits\n\nPrompts tend to be append-only, because you want the bot to have the entire context of previous messages in the conversation. Language models, in general, are stateless and won’t remember anything about previous requests to them, so you always need to include everything they might need to know that is specific to the current session.\n\nA major downside of this is that the leading language model architecture, the Transformer, has a fixed input and output size – at a certain point the prompt can’t grow any larger. The total size of the prompt, sometimes referred to as the “context window”, is model-dependent. For GPT-3, it is 4,096 tokens. For GPT-4, it is 8,192 tokens or 32,768 tokens depending on which variant you use.\n\nIf your context grows too large for the model, the most common tactic is to truncate the context in a sliding-window fashion. If you think of a prompt as `hidden initialization prompt + messages[]`, usually the hidden prompt will remain unaltered, and the `messages[]` array will keep only the last N messages.\n\nYou may also see more clever tactics for prompt truncation – such as\ndiscarding only the user messages first, so that the bot's previous answers\nstay in the context for as long as possible, or asking an LLM to summarize the\nconversation and then replacing all of the messages with a single message\ncontaining that summary. There is no single correct answer here and the solution will\ndepend on your application.\n\nImportantly, when truncating the context, you must truncate aggressively enough to **allow room for the response as well**. OpenAI’s token limits include both the length of the input and the length of the output. 
If your input to GPT-3 is 4,090 tokens, it can only generate 6 tokens in response.\n\n> 🧙‍♂️ If you’d like to count the number of tokens before sending the raw text to the model, the specific tokenizer to use will depend on which model you are using. OpenAI has a library called [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md) that you can use with their models – though there is an important caveat that their internal tokenizer may vary slightly in count, and they may append other metadata, so consider this an approximation.\n> \n> If you’d like an approximation without having access to a tokenizer, `input.length / 4` will give a rough, but better than you’d expect, approximation for English inputs.\n\n### Prompt Hacking\n\nPrompt engineering and large language models are a fairly nascent field, so new ways to hack around them are being discovered every day. The two large classes of attacks are:\n\n1. Make the bot bypass any guidelines you have given it.\n2. Make the bot output hidden context that you didn’t intend for the user to see.\n\nThere are no known mechanisms to comprehensively stop these, so it is important that you assume the bot may do or say anything when interacting with an adversarial user. Fortunately, in practice, these are mostly cosmetic concerns.\n\nThink of prompts as a way to improve the normal user experience. **We design prompts so that normal users don’t stumble outside of our intended interactions – but always assume that a determined user will be able to bypass our prompt constraints.**\n\n#### Jailbreaks\n\nTypically hidden prompts will tell the bot to behave with a certain persona and focus on specific tasks or avoid certain words. 
It is generally safe to assume the bot will follow these guidelines for non-adversarial users, although non-adversarial users may accidentally bypass the guidelines too.\n\nFor example, we can tell the bot:\n\n```\nYou are a helpful assistant, but you are never allowed to use the word \"computer\".\n```\n\nIf we then ask it a question about computers, it will refer to them as a “device used for computing” because it isn’t allowed to use the word “computer”.\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/232420043-ebe5bcf1-25d9-4a31-ba84-13e9e1f62de2.png\" title=\"GPT-4 trying hard to not say the word 'computer'.\">\n</p>\n\nIt will absolutely refuse to say the word:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/232420306-6fcdd6e2-b107-45d5-a1ee-4132fbb5853e.png\">\n</p>\n\nBut we can bypass these instructions and get the model to happily use the word if we trick it by asking it to translate the Pig Latin version of “computer”.\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/232420600-56083a10-b382-46a7-be18-eb9c005b8371.png\">\n</p>\n\nThere are [a number of defensive measures](https://learnprompting.org/docs/prompt_hacking/defensive_measures/overview) you can take here, but typically the best bet is to reiterate your most important constraints as close to the end as possible. For the OpenAI chat API, this might mean including them as a `system` message after the last `user` message. 
Here’s an example:\n\n| ![image](https://user-images.githubusercontent.com/89960/232421097-adcaace3-0b21-4c1e-a5c8-46bb25faa2f7.png) | ![image](https://user-images.githubusercontent.com/89960/232421142-a47e75b4-5ff6-429d-9abd-a78dbc72466e.png) |\n| --- | --- |\n\nDespite OpenAI investing heavily in preventing jailbreaks, there are [very clever workarounds](https://twitter.com/alexalbert__/status/1636488551817965568) being [shared every day](https://twitter.com/zswitten/status/1598088267789787136).\n\n#### Leaks\n\nIf you missed the previous warnings in this doc, **you should always assume that any data exposed to the language model will eventually be seen by the user**.\n\nAs part of constructing prompts, you will often embed a bunch of data in hidden prompts (a.k.a. system prompts). **The bot will happily relay this information to the user**:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/232422860-731c1de2-9e77-4957-b257-b0bbda48558c.png\" title=\"The bot happily regurgitating the information it knows about the user.\">\n</p>\n\nEven if you instruct it not to reveal the information, and it obeys those instructions, there are millions of ways to leak data in the hidden prompt.\n\nHere we have an example where the bot should never mention my city, but a simple reframing of the question gets it to spill the beans.\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/232423121-76568893-fa42-4ad8-b2bc-e1001327fa1e.png\" title=\"The bot refuses to reveal personal information, but we convince it to tell me what city I’m in regardless.\">\n</p>\n\nSimilarly, we get the bot to tell us what word it isn’t allowed to say without ever actually saying the word:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/232423283-1718f822-59d0-4d18-9a4d-22dd3a2672c0.png\" title=\"Technically, the bot never said 'computer', but I was still able to 
get it to tell me everything I needed to know about it.\">\n</p>\n\nYou should think of a hidden prompt as a means to make the user experience better or more in line with the persona you’re targeting. **Never place any information in a prompt that you wouldn’t visually render for someone to read on screen**.\n\n## Why do we need prompt engineering?\n\nUp above, we used an analogy of prompts as the “source code” that a language model “interprets”. **Prompt engineering is the art of writing prompts to get the language model to do what we want it to do** – just like software engineering is the art of writing source code to get computers to do what we want them to do.\n\nWhen writing good prompts, you have to account for the idiosyncrasies of the model(s) you’re working with. The strategies will vary with the complexity of the tasks. You’ll have to come up with mechanisms to constrain the model to achieve reliable results, incorporate dynamic data that the model can’t be trained on, account for limitations in the model’s training data, design around context limits, and many other dimensions.\n\nThere’s an old adage that computers will only do what you tell them to do. **Throw that advice out the window**. Prompt engineering inverts this wisdom. It’s like programming in natural language against a non-deterministic computer that will do anything that you haven’t guided it away from doing.\n\nThere are two broad buckets that prompt engineering approaches fall into.\n\n### Give a Bot a Fish\n\nThe “give a bot a fish” bucket is for scenarios when you can explicitly give the bot, in the hidden context, all of the information it needs to do whatever task is requested of it.\n\nFor example, if a user loaded up their dashboard and we wanted to show them a quick little friendly message about what task items they have outstanding, we could get the bot to summarize it as\n\n> You have 4 receipts/memos to upload. 
The most recent is from Target on March 5th, and the oldest is from Blink Fitness on January 17th. Thanks for staying on top of your expenses!\n\nby providing a list of the entire inbox and any other user context we’d like it to have.\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233465165-e0c6b266-b347-4128-8eaa-73974e852e45.png\" title=\"GPT-3 summarizing a task inbox.\">\n</p>\n\nSimilarly, if you were helping a user book a trip, you could:\n\n- Ask the user their dates and destination.\n- Behind the scenes, search for flights and hotels.\n- Embed the flight and hotel search results in the hidden context.\n- Also embed the company’s travel policy in the hidden context.\n\nAnd then the bot will have real-time travel information + constraints that it\ncan use to answer questions for the user. Here’s an example of the bot\nrecommending options, and the user asking it to refine them:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233465425-9e06320c-b6d9-40ef-b5a4-c556861c1328.png\" title=\"GPT-4 helping a user book a trip.\">\n</p>\n<details>\n\n  <summary>(Full prompt)</summary>\n\n```\nBrex is a platform for managing business expenses. 
\n\nThe following is a travel expense policy on Brex:\n\n- Airline highest fare class for flights under 6 hours is economy.\n- Airline highest fare class for flights over 6 hours is premium economy.\n- Car rentals must have an average daily rate of $75 or under.\n- Lodging must have an average nightly rate of $400 or under.\n- Lodging must be rated 4 stars or higher.\n- Meals from restaurants, food delivery, grocery, bars & nightlife must be under $75.\n- All other expenses must be under $5,000.\n- Reimbursements require review.\n\nThe hotel options are:\n| Hotel Name | Price | Reviews |\n| --- | --- | --- |\n| Hilton Financial District | $109/night | 3.9 stars |\n| Hotel VIA | $131/night | 4.4 stars |\n| Hyatt Place San Francisco | $186/night | 4.2 stars |\n| Hotel Zephyr | $119/night | 4.1 stars |\n\nThe flight options are:\n| Airline | Flight Time | Duration | Number of Stops | Class | Price |\n| --- | --- | --- | --- | --- | --- |\n| United | 5:30am-7:37am | 2hr 7 min | Nonstop | Economy | $248 |\n| Delta | 1:20pm-3:36pm | 2hr 16 min | Nonstop | Economy | $248 |\n| Alaska | 9:50pm-11:58pm | 2hr 8 min | Nonstop | Premium | $512 |\n\nAn employee is booking travel to San Francisco for February 20th to February 25th.\n\nRecommend a hotel and flight that are in policy. Keep the recommendation concise, no longer than a sentence or two, but include pleasantries as though you are a friendly colleague helping me out:\n```\n \n</details>\n\nThis is the same approach that products like Microsoft Bing use to incorporate dynamic data. When you chat with Bing, it asks the bot to generate three search queries. Then they run three web searches and include the summarized results in the hidden context for the bot to use.\n\nIn summary, the trick to making a good experience is to change the context dynamically in response to whatever the user is trying to do.\n\n> 🧙‍♂️ Giving a bot a fish is the most reliable way to ensure the bot gets a fish. 
You will get the most consistent and reliable results with this strategy. **Use this whenever you can.**\n\n#### Semantic Search\n\nIf you just need the bot to know a little more about the world, [a common approach is to perform a semantic search](https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb).\n\nA semantic search is oriented around a document embedding – which you can think of as a fixed-length array[^5] of numbers, where each number represents some aspect of the document (e.g. if it’s a science document, maybe the 843rd number is large, but if it’s an art document the 1,115th number is large – this is overly simplistic, but conveys the idea).[^6]\n\nIn addition to computing an embedding for a document, you can also compute an embedding for a user query using the same function. If the user asks “Why is the sky blue?”, you compute the embedding of that question and, in theory, this embedding will be more similar to embeddings of documents that mention the sky than to embeddings of documents that don’t.\n\nTo find documents related to the user query, you compute the query’s embedding and then find the top-N documents whose embeddings are most similar to it. Then you place these documents (or summaries of these documents) in the hidden context for the bot to reference.\n\nNotably, sometimes user queries are so short that the embedding isn’t particularly valuable. There is a clever technique described in [a paper published in December 2022](https://arxiv.org/pdf/2212.10496.pdf) called a “Hypothetical Document Embedding” or HyDE. Using this technique, you ask the model to generate a hypothetical document in response to the user’s query, and then compute the embedding for this generated document. 
The model fabricates a document out of thin air – but the approach works!\n\nThe HyDE technique uses more calls to the model, but for many use cases it notably improves results.\n\n[^5]: Usually referred to as a vector.\n[^6]: The vector features are learned automatically, and the specific values aren’t directly interpretable by a human without some effort.\n\n### Teach a Bot to Fish\n\nSometimes you’ll want the bot to have the capability to perform actions on the user’s behalf, like adding a memo to a receipt or plotting a chart. Or perhaps we want it to retrieve data in more nuanced ways than semantic search would allow for, like retrieving the past 90 days of expenses.\n\nIn these scenarios, we need to teach the bot how to fish.\n\n#### Command Grammars\n\nWe can give the bot a list of commands for our system to interpret, along with descriptions and examples for the commands, and then have it produce programs composed of those commands.\n\nThere are many caveats to consider when going with this approach. With complex command grammars, the bot will tend to hallucinate commands or arguments that could plausibly exist, but don’t actually. The art of getting this right is enumerating commands that have relatively high levels of abstraction, while giving the bot sufficient flexibility to compose them in novel and useful ways.\n\nFor example, giving the bot a `plot-the-last-90-days-of-expenses` command is not particularly flexible or composable in what the bot can do with it. Similarly, a `draw-pixel-at-x-y [x] [y] [rgb]` command would be far too low-level. 
But giving the bot `plot-expenses` and `list-expenses` commands provides some good primitives that the bot can compose flexibly.\n\nIn the example below, we use this list of commands:\n\n| Command | Arguments | Description |\n| --- | --- | --- |\n| list-expenses | budget | Returns a list of expenses for a given budget |\n| converse | message | A message to show to the user |\n| plot-expenses | expenses[] | Plots a list of expenses |\n| get-budget-by-name | budget_name | Retrieves a budget by name |\n| list-budgets | | Returns a list of budgets the user has access to |\n| add-memo | inbox_item_id, memo message | Adds a memo to the provided inbox item |\n\nWe provide this table to the model in Markdown format, which the language model handles incredibly well – presumably because OpenAI trains heavily on data from GitHub.\n\nIn the example below, we ask the model to output the commands in [Reverse Polish notation](https://en.wikipedia.org/wiki/Reverse_Polish_notation)[^7].\n\n[^7]: The model handles the simplicity of RPN astoundingly well.\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233505150-aef4409c-03ba-4669-95d7-6c48f3c2c3ea.png\" title=\"A bot happily generating commands to run in response to user queries.\">\n</p>\n\n> 🧠 There are some interesting subtle things going on in that example, beyond just command generation. When we ask it to add a memo to the “shake shack” expense, the model knows that the command `add-memo` takes an expense ID. But we never tell it the expense ID, so it looks up “Shake Shack” in the table of expenses we provided it, then grabs the ID from the corresponding ID column, and then uses that as an argument to `add-memo`.\n\nGetting command grammars working reliably in complex situations can be tricky. The best levers we have here are to provide lots of descriptions, and as **many examples** of usage as we can. 
Large language models are [few-shot learners](https://en.wikipedia.org/wiki/Few-shot_learning_(natural_language_processing)), meaning that they can learn a new task by being provided just a few examples. In general, the more examples you provide, the better off you’ll be – but that also eats into your token budget, so it’s a balance.\n\nHere’s a more complex example, with the output specified in JSON instead of RPN, and with TypeScript used to define the return types of commands.\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233505696-fc440931-9baf-4d06-80e7-54801532d63f.png\" title=\"A bot happily generating commands to run in response to user queries.\">\n</p>\n\n<details>\n\n  <summary>(Full prompt)</summary>\n  \n~~~\nYou are a financial assistant working at Brex, but you are also an expert programmer.\n\nI am a customer of Brex.\n\nYou are to answer my questions by composing a series of commands.\n\nThe output types are:\n\n```typescript\ntype LinkedAccount = {\n    id: string,\n    bank_details: {\n        name: string,\n        type: string,\n    },\n    brex_account_id: string,\n    last_four: string,\n    available_balance: {\n        amount: number,\n        as_of_date: Date,\n    },\n    current_balance: {\n        amount: number,\n        as_of_date: Date,\n    },\n}\n\ntype Expense = {\n  id: string,\n  memo: string,\n  amount: number,\n}\n\ntype Budget = {\n  id: string,\n  name: string,\n  description: string,\n  limit: {\n    amount: number,\n    currency: string,\n  }\n}\n```\n\nThe commands you have available are:\n\n| Command | Arguments | Description | Output Format |\n| --- | --- | --- | --- |\n| nth | index, values[] | Return the nth item from an array | any |\n| push | value | Adds a value to the stack to be consumed by a future command | any |\n| value | key, object | Returns the value associated with a key | any |\n| values | key, object[] | Returns an array of values pulled from the 
corresponding key in array of objects | any[] |\n| sum | value[] | Sums an array of numbers | number |\n| plot | title, values[] | Plots the set of values in a chart with the given title | Plot |\n| list-linked-accounts |  | \"Lists all bank connections that are eligible to make ACH transfers to Brex cash account\" | LinkedAccount[] |\n| list-expenses | budget_id | Given a budget id, returns the list of expenses for it | Expense[] |\n| get-budget-by-name | name | Given a name, returns the budget | Budget |\n| add-memo | expense_id, message | Adds a memo to an expense | bool |\n| converse | message | Send the user a message | null |\n\nOnly respond with commands.\n\nOutput the commands in JSON as an abstract syntax tree.\n\nIMPORTANT - Only respond with a program. Do not respond with any text that isn't part of a program. Do not write prose, even if instructed. Do not explain yourself.\n\nYou can only generate commands, but you are an expert at generating commands.\n~~~\n\n</details>\n\nThis version is a bit easier to parse and interpret if your language of choice has a `JSON.parse` function.\n\n> 🧙‍♂️ There is no industry-established best format for defining a DSL for the model to generate programs. So consider this an area of active research. You will bump into limits. And as we overcome these limits, we may discover more optimal ways of defining commands.\n\n#### ReAct\n\nIn October 2022 (revised in March 2023), Princeton and Google released a paper “[ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/pdf/2210.03629.pdf)”, where they introduce a variant of command grammars that allows for fully autonomous interactive execution of actions and retrieval of data.\n\nThe model is instructed to return a `thought` and an `action` that it would like to perform. Another agent (e.g. our client) then performs the `action` and returns the result to the model as an `observation`. 
The model will then loop to return more thoughts and actions until it returns an `answer`.\n\nThis is an incredibly powerful technique, effectively allowing the bot to be its own research assistant and possibly take actions on behalf of the user. Combined with a powerful command grammar, the bot should rapidly be able to answer a massive set of user requests.\n\nIn this example, we give the model a small set of commands related to getting employee data and searching Wikipedia:\n\n| Command | Arguments | Description |\n| --- | --- | --- |\n| find_employee | name | Retrieves an employee by name |\n| get_employee | id | Retrieves an employee by ID |\n| get_location | id | Retrieves a location by ID |\n| get_reports | employee_id | Retrieves a list of employee IDs that report to the employee associated with `employee_id`. |\n| wikipedia | article | Retrieves a Wikipedia article on a topic. |\n\nWe then ask the bot a simple question, “Is my manager famous?”\n\nWe see that the bot:\n\n1. First looks up our employee profile.\n2. From our profile, gets our manager’s ID and looks up their profile.\n3. Extracts our manager’s name and searches for them on Wikipedia.\n    - I chose a fictional character for the manager in this scenario.\n4. The bot reads the Wikipedia article and concludes that it can’t be my manager since it is a fictional character.\n5. The bot then modifies its search to include “(real person)”.\n6. Seeing that there are no results, the bot concludes that my manager is not famous.\n\n| ![image](https://user-images.githubusercontent.com/89960/233506839-5c8b2d77-1d78-464d-bc33-a725e12f2624.png) | ![image](https://user-images.githubusercontent.com/89960/233506870-05fc415d-efa2-48b7-aad9-b5035e535e6d.png) |\n| --- | --- |\n\n<details>\n<summary>(Full prompt)</summary>\n\n~~~\nYou are a helpful assistant. You run in a loop, seeking additional information to answer a user's question until you are able to answer the question.\n\nToday is June 1, 2025. 
My name is Fabian Seacaster. My employee ID is 82442.\n\nThe commands to seek information are:\n\n| Command | Arguments | Description |\n| --- | --- | --- |\n| find_employee | name | Retrieves an employee by name |\n| get_employee | id | Retrieves an employee by ID |\n| get_location | id | Retrieves a location by ID |\n| get_reports | employee_id | Retrieves a list of employee ids that report to the employee associated with `employee_id`. |\n| wikipedia | article | Retrieves a wikipedia article on a topic. |\n\nYour response will be in JSON and will include a \"Thought\" + \"Action\" to retrieve data that you need in order to answer the question, or it will include the \"Answer\". When data has been retrieved, it will be included as an \"Observation\".\n\nYou will continue generating thoughts and actions until you get to an answer, or conclude that you can't.\n\nExample 1:\n```\nUser: What is the population of Philadelphia?\n\nAssistant: {\n  \"thought\": \"Wikipedia likely has this information. I'm looking it up...\",\n  \"action\": {\"command\": \"wikipedia\", \"article\": \"Philadelphia\"}\n}\n\nAssistant: {\n  \"observation\": \"Philadelphia, often called Philly, is the largest city in the Commonwealth of Pennsylvania and the second-largest city in both the Northeast megalopolis and Mid-Atlantic regions after New York City. It is one of the most historically significant cities in the United States and served as the nation's capital city until 1800. Philadelphia is the nation's sixth-largest city with a population of 1,603,797 as of the 2020 census. Since 1854, the city has been coextensive with Philadelphia County, the most populous county in Pennsylvania and the urban core of the Delaware Valley, the nation's seventh-largest and one of the world's largest metropolitan regions with 6.245 million residents in 2020. 
Philadelphia is known for its extensive contributions to American history and for its role in the life sciences, business and industry, art, literature, and music.\"\n}\n\nAssistant: {\n  \"answer\": \"The population of Philadelphia, as of the 2020 census, is 1,603,797 people.\"\n}\n```\n\nExample 2:\n```\nUser: What is my manager's city?\n\nAssistant: {\n  \"thought\": \"Let me look up who your manager is. First, I need to look up your profile.\",\n  \"action\": {\"command\": \"get_employee\", \"id\": 92352}\n}\n\nAssistant: {\n  \"observation\": {\n    \"id\": 78334,\n    \"name\": \"Ms. Manager\",\n    \"location_id\": 8832\n  }\n}\n\nAssistant: {\n  \"thought\": \"Your manager is Ms. Manager. I'm looking up their location.\",\n  \"action\": {\"command\": \"get_location\", \"id\": 8832}\n}\n\nAssistant: {\n  \"observation\": {\n    \"id\": 8832,\n    \"name\": \"Philadelphia\"\n  }\n}\n\nAssistant: {\n  \"answer\": \"Your manager lives in Philadelphia.\"\n}\n```\n~~~\n</details>\n\n#### GPT-4 vs GPT-3.5\n\nIn most of the examples in this doc, the difference between GPT-3.5 and GPT-4 is negligible, but for “teaching a bot to fish” scenarios the difference between the models is notable.\n\nNone of the command grammar examples above, for instance, work without meaningful modifications for GPT-3.5. At a minimum, you have to provide a number of examples (at least one usage example per command) before you get any reasonable results. And, for complex sets of commands, it may hallucinate new commands or create fictional arguments.\n\nWith a sufficiently thorough hidden prompt, you should be able to overcome these limitations. GPT-4 is capable of far more consistent and complex logic with far simpler prompts (and can get by with zero or small numbers of examples – though it is always beneficial to include as many as possible).\n\n## Strategies\n\nThis section contains examples and strategies for specific needs or problems. 
For successful prompt engineering, you will need to combine some subset of the strategies enumerated in this document. Don’t be afraid to mix and match things – or invent your own approaches.\n\n### Embedding Data\n\nIn hidden contexts, you’ll frequently want to embed all sorts of data. The specific strategy will vary depending on the type and quantity of data you are embedding.\n\n#### Simple Lists\n\nFor one-off objects, enumerating fields + values in a normal bulleted list works pretty well:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233507156-0bdbc0af-d977-44e0-a8d5-b30538c5bbd9.png\" title=\"GPT-4 extracting Steve’s occupation from a list of attributes.\">\n</p>\n\nIt will also work for larger sets of things, but there are other formats for lists of data that GPT handles more reliably. Regardless, here’s an example:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233507223-9cda591e-62f3-4339-b227-a07c37b90724.png\" title=\"GPT-4 answering questions about a set of expenses.\">\n</p>\n\n#### Markdown Tables\n\nMarkdown tables are great for scenarios where you have many items of the same type to enumerate.\n\nFortunately, OpenAI’s models are exceptionally good at working with Markdown tables (presumably from the tons of GitHub data they’ve trained on).\n\nWe can reframe the above using Markdown tables instead:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233507313-7ccd825c-71b9-46d3-80c9-30bf97a8e090.png\" title=\"GPT-4 answering questions about a set of expenses from a Markdown table.\">\n</p>\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233507395-b8ecb641-726c-4e57-b85e-13f6b7717f22.png\" title=\"GPT-4 answering questions about a set of expenses from a Markdown table.\">\n</p>\n\n> 🧠 Note that in this last example, the items in the table have 
an explicit date, February 2nd. In our question, we asked about “today”, and earlier in the prompt we mentioned that today was Feb 2. The model correctly handled the transitive inference – converting “today” to “February 2nd” and then looking up “February 2nd” in the table.\n\n#### JSON\n\nMarkdown tables work really well for many use cases and should be preferred for their density and because the model handles them reliably. However, you may run into scenarios where there are so many columns that the model struggles, or where every item has some custom attributes and it doesn’t make sense to have dozens of columns of mostly empty data.\n\nIn these scenarios, JSON is another format that the model handles really well. The close proximity of `keys` to their `values` makes it easy for the model to keep the mapping straight.\n\nHere is the same example from the Markdown table, but with JSON instead:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233507559-26e6615d-4896-4a2c-b6ff-44cbd7d349dc.png\" title=\"GPT-4 answering questions about a set of expenses from a JSON blob.\">\n</p>\n\n#### Freeform Text\n\nOccasionally you’ll want to include freeform text in a prompt that you would like to delineate from the rest of the prompt – such as embedding a document for the bot to reference. In these scenarios, surrounding the document with triple backticks, ```, works well[^8].\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233507684-93222728-e216-47b4-8554-04acf9ec6201.png\" title=\"GPT-4 answering questions about a set of expenses from a JSON blob.\">\n</p>\n\n[^8]: A good rule of thumb for anything you’re doing in prompts is to lean heavily on things the model would have learned from GitHub.\n\n#### Nested Data\n\nNot all data is flat and linear. Sometimes you’ll need to embed data that is nested or has relations to other data. 
In these scenarios, lean on `JSON`:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233507758-7baffcaa-647b-4869-9cfb-a7cf8849c453.png\" title=\"GPT-4 handles nested JSON very reliably.\">\n</p>\n\n<details>\n<summary>(Full prompt)</summary>\n\n~~~\nYou are a helpful assistant. You answer questions about users. Here is what you know about them:\n\n{\n  \"users\": [\n    {\n      \"id\": 1,\n      \"name\": \"John Doe\",\n      \"contact\": {\n        \"address\": {\n          \"street\": \"123 Main St\",\n          \"city\": \"Anytown\",\n          \"state\": \"CA\",\n          \"zip\": \"12345\"\n        },\n        \"phone\": \"555-555-1234\",\n        \"email\": \"johndoe@example.com\"\n      }\n    },\n    {\n      \"id\": 2,\n      \"name\": \"Jane Smith\",\n      \"contact\": {\n        \"address\": {\n          \"street\": \"456 Elm St\",\n          \"city\": \"Sometown\",\n          \"state\": \"TX\",\n          \"zip\": \"54321\"\n        },\n        \"phone\": \"555-555-5678\",\n        \"email\": \"janesmith@example.com\"\n      }\n    },\n    {\n      \"id\": 3,\n      \"name\": \"Alice Johnson\",\n      \"contact\": {\n        \"address\": {\n          \"street\": \"789 Oak St\",\n          \"city\": \"Othertown\",\n          \"state\": \"NY\",\n          \"zip\": \"67890\"\n        },\n        \"phone\": \"555-555-2468\",\n        \"email\": \"alicejohnson@example.com\"\n      }\n    },\n    {\n      \"id\": 4,\n      \"name\": \"Bob Williams\",\n      \"contact\": {\n        \"address\": {\n          \"street\": \"135 Maple St\",\n          \"city\": \"Thistown\",\n          \"state\": \"FL\",\n          \"zip\": \"98765\"\n        },\n        \"phone\": \"555-555-8642\",\n        \"email\": \"bobwilliams@example.com\"\n      }\n    },\n    {\n      \"id\": 5,\n      \"name\": \"Charlie Brown\",\n      \"contact\": {\n        \"address\": {\n          \"street\": \"246 Pine St\",\n          
\"city\": \"Thatstown\",\n          \"state\": \"WA\",\n          \"zip\": \"86420\"\n        },\n        \"phone\": \"555-555-7531\",\n        \"email\": \"charliebrown@example.com\"\n      }\n    },\n    {\n      \"id\": 6,\n      \"name\": \"Diane Davis\",\n      \"contact\": {\n        \"address\": {\n          \"street\": \"369 Willow St\",\n          \"city\": \"Sumtown\",\n          \"state\": \"CO\",\n          \"zip\": \"15980\"\n        },\n        \"phone\": \"555-555-9512\",\n        \"email\": \"dianedavis@example.com\"\n      }\n    },\n    {\n      \"id\": 7,\n      \"name\": \"Edward Martinez\",\n      \"contact\": {\n        \"address\": {\n          \"street\": \"482 Aspen St\",\n          \"city\": \"Newtown\",\n          \"state\": \"MI\",\n          \"zip\": \"35742\"\n        },\n        \"phone\": \"555-555-6813\",\n        \"email\": \"edwardmartinez@example.com\"\n      }\n    },\n    {\n      \"id\": 8,\n      \"name\": \"Fiona Taylor\",\n      \"contact\": {\n        \"address\": {\n          \"street\": \"531 Birch St\",\n          \"city\": \"Oldtown\",\n          \"state\": \"OH\",\n          \"zip\": \"85249\"\n        },\n        \"phone\": \"555-555-4268\",\n        \"email\": \"fionataylor@example.com\"\n      }\n    },\n    {\n      \"id\": 9,\n      \"name\": \"George Thompson\",\n      \"contact\": {\n        \"address\": {\n          \"street\": \"678 Cedar St\",\n          \"city\": \"Nexttown\",\n          \"state\": \"GA\",\n          \"zip\": \"74125\"\n        },\n        \"phone\": \"555-555-3142\",\n        \"email\": \"georgethompson@example.com\"\n      }\n    },\n    {\n      \"id\": 10,\n      \"name\": \"Helen White\",\n      \"contact\": {\n        \"address\": {\n          \"street\": \"852 Spruce St\",\n          \"city\": \"Lasttown\",\n          \"state\": \"VA\",\n          \"zip\": \"96321\"\n        },\n        \"phone\": \"555-555-7890\",\n        \"email\": \"helenwhite@example.com\"\n      }\n    }\n  
]\n}\n~~~\n</details>\n\nIf using nested `JSON` winds up being too verbose for your token budget, fall back to `relational tables` defined with `Markdown`:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233507968-a378587b-e468-4882-a1e8-678d9f3933d3.png\" title=\"GPT-4 handles relational tables pretty reliably too.\">\n</p>\n\n<details>\n<summary>(Full prompt)</summary>\n\n~~~\nYou are a helpful assistant. You answer questions about users. Here is what you know about them:\n\nTable 1: users\n| id (PK) | name          |\n|---------|---------------|\n| 1       | John Doe      |\n| 2       | Jane Smith    |\n| 3       | Alice Johnson |\n| 4       | Bob Williams  |\n| 5       | Charlie Brown |\n| 6       | Diane Davis   |\n| 7       | Edward Martinez |\n| 8       | Fiona Taylor  |\n| 9       | George Thompson |\n| 10      | Helen White   |\n\nTable 2: addresses\n| id (PK) | user_id (FK) | street      | city       | state | zip   |\n|---------|--------------|-------------|------------|-------|-------|\n| 1       | 1            | 123 Main St | Anytown    | CA    | 12345 |\n| 2       | 2            | 456 Elm St  | Sometown   | TX    | 54321 |\n| 3       | 3            | 789 Oak St  | Othertown  | NY    | 67890 |\n| 4       | 4            | 135 Maple St | Thistown  | FL    | 98765 |\n| 5       | 5            | 246 Pine St | Thatstown  | WA    | 86420 |\n| 6       | 6            | 369 Willow St | Sumtown  | CO    | 15980 |\n| 7       | 7            | 482 Aspen St | Newtown   | MI    | 35742 |\n| 8       | 8            | 531 Birch St | Oldtown   | OH    | 85249 |\n| 9       | 9            | 678 Cedar St | Nexttown  | GA    | 74125 |\n| 10      | 10           | 852 Spruce St | Lasttown | VA    | 96321 |\n\nTable 3: phone_numbers\n| id (PK) | user_id (FK) | phone       |\n|---------|--------------|-------------|\n| 1       | 1            | 555-555-1234 |\n| 2       | 2            | 555-555-5678 |\n| 3       | 3            | 
555-555-2468 |\n| 4       | 4            | 555-555-8642 |\n| 5       | 5            | 555-555-7531 |\n| 6       | 6            | 555-555-9512 |\n| 7       | 7            | 555-555-6813 |\n| 8       | 8            | 555-555-4268 |\n| 9       | 9            | 555-555-3142 |\n| 10      | 10           | 555-555-7890 |\n\nTable 4: emails\n| id (PK) | user_id (FK) | email                 |\n|---------|--------------|-----------------------|\n| 1       | 1            | johndoe@example.com   |\n| 2       | 2            | janesmith@example.com |\n| 3       | 3            | alicejohnson@example.com |\n| 4       | 4            | bobwilliams@example.com |\n| 5       | 5            | charliebrown@example.com |\n| 6       | 6            | dianedavis@example.com |\n| 7       | 7            | edwardmartinez@example.com |\n| 8       | 8            | fionataylor@example.com |\n| 9       | 9            | georgethompson@example.com |\n| 10      | 10           | helenwhite@example.com |\n\nTable 5: cities\n| id (PK) | name         | state | population | median_income |\n|---------|--------------|-------|------------|---------------|\n| 1       | Anytown     | CA    | 50,000     | $70,000      |\n| 2       | Sometown    | TX    | 100,000    | $60,000      |\n| 3       | Othertown   | NY    | 25,000     | $80,000      |\n| 4       | Thistown    | FL    | 75,000     | $65,000      |\n| 5       | Thatstown   | WA    | 40,000     | $75,000      |\n| 6       | Sumtown     | CO    | 20,000     | $85,000      |\n| 7       | Newtown     | MI    | 60,000     | $55,000      |\n| 8       | Oldtown     | OH    | 30,000     | $70,000      |\n| 9       | Nexttown    | GA    | 15,000     | $90,000      |\n| 10      | Lasttown    | VA    | 10,000     | $100,000     |\n~~~\n\n</details>\n\n> 🧠 The model works well with data in [3rd normal form](https://en.wikipedia.org/wiki/Third_normal_form), but may struggle with too many joins. 
In experiments, it seems to do okay with at least three levels of nested joins. In the example above, the model successfully joins from `users` to `addresses` to `cities` to infer the likely income for George – $90,000.\n\n### Citations\n\nFrequently, a natural language response isn’t sufficient on its own, and you’ll want the model’s output to cite where it is getting data from.\n\nOne useful thing to note here is that anything you might want to cite should have a unique ID. The simplest approach is to just ask the model to link to anything it references:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233509069-1dcbffa2-8357-49b5-be43-9791f93bd0f8.png\" title=\"GPT-4 will reliably link to data if you ask it to.\">\n</p>\n\n### Programmatic Consumption\n\nBy default, language models output natural language text, but frequently we need to interact with this result in a programmatic way that goes beyond simply printing it out on screen. You can achieve this by asking the model to output the results in your favorite serialization format (JSON and YAML seem to work best).\n\nMake sure you give the model an example of the output format you’d like. Building on the travel example from earlier, we can augment our prompt to tell it:\n\n~~~\nProduce your output as JSON. 
The format should be:\n```\n{\n    message: \"The message to show the user\",\n    hotelId: 432,\n    flightId: 831\n}\n```\n\nDo not include the IDs in your message.\n~~~\n\nAnd now we’ll get interactions like this:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233509174-be0c3bc5-08e3-4d1a-8841-52c401def770.png\" title=\"GPT-4 providing travel recommendations in an easy to work with format.\">\n</p>\n\nYou could imagine the UI for this rendering the message as normal text, but then also adding discrete buttons for booking the flight + hotel, or auto-filling a form for the user.\n\nAs another example, let’s build on the [citations](#citations) example – but move beyond Markdown links. We can ask it to produce JSON with a normal message along with a list of items used in the creation of that message. In this scenario you won’t know exactly where in the message the citations were leveraged, but you’ll know that they were used somewhere.\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233509280-59d9ff46-0e95-488a-b314-a7d2b7c9bfa3.png\" title=\"Asking the model to provide a list of citations is a reliable way to programmatically know what data the model leaned on in its response.\">\n</p>\n\n> 🧠 Interestingly, in the model’s response to “How much did I spend at Target?” it provides a single value, $188.16, but **importantly** in the `citations` array it lists the individual expenses that it used to compute that value.\n\n### Chain of Thought\n\nSometimes you will bang your head on a prompt trying to get the model to output reliable results, but, no matter what you do, it just won’t work. This will frequently happen when the bot’s final output requires intermediate thinking, but you ask the bot only for the output and nothing else.\n\nThe answer may surprise you: ask the bot to show its work. 
In January 2022, Google released the paper “[Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/pdf/2201.11903.pdf)”, which showed that if, in your hidden prompt, you give the bot examples of answering questions by showing the work, then when you ask the bot to answer something, it will show its work and produce more reliable answers.\n\nA few months after that paper was published, in May 2022, the University of Tokyo and Google released the paper “[Large Language Models are Zero-Shot Reasoners](https://openreview.net/pdf?id=e2TBb5y0yFf)”, which showed that you don’t even need to provide examples – **you simply have to ask the bot to think step-by-step**.\n\n#### Averaging\n\nHere is an example where we ask the bot to compute the average expense, excluding Target. The actual answer is $136.77, and the bot comes close with $136.43.\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233509534-2b32c8dd-a1ee-42ea-82fb-4f84cfe7e9ba.png\" title=\"The model **almost** gets the average correct, but is a few cents off.\">\n</p>\n\nIf we simply add “Let’s think step-by-step”, the model gets the correct answer:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233509608-6e53995b-668b-47f6-9b5e-67afad89f8bc.png\" title=\"When we ask the model to show its work, it gets the correct answer.\">\n</p>\n\n#### Interpreting Code\n\nLet’s revisit the Python example from earlier and apply chain-of-thought prompting to our question. As a reminder, when we asked the bot to evaluate the Python code, it got it slightly wrong. The correct answer is `Hello, Brex!!Brex!!Brex!!!`, but the bot gets confused about the number of !'s to include. 
In the example below, it outputs `Hello, Brex!!!Brex!!!Brex!!!`:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233509724-8f3302f8-59eb-4d3b-8939-53d7f63b0299.png\" title=\"The bot almost interprets the Python code correctly, but is a little off.\">\n</p>\n\nIf we ask the bot to show its work, then it gets the correct answer:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233509790-2a0f2189-d864-4d27-aacb-cfc936fad907.png\" title=\"The bot correctly interprets the Python code if you ask it to show its work.\">\n</p>\n\n#### Delimiters\n\nIn many scenarios, you may not want to show the end user all of the bot’s thinking and instead just want to show the final answer. You can ask the bot to separate the final answer from its thinking. There are many ways to do this, but let’s use JSON to make it easy to parse:\n\n<p align=\"center\">\n  <img width=\"550\" src=\"https://user-images.githubusercontent.com/89960/233509865-4f3e7265-6645-4d43-8644-ecac5c0ca4a7.png\" title=\"The bot showing its work while also delimiting the final answer for easy extraction.\">\n</p>\n\nUsing Chain-of-Thought prompting will consume more tokens, resulting in increased price and latency, but the results are noticeably more reliable for many scenarios. It’s a valuable tool to use when you need the bot to do something complex as reliably as possible.\n\n### Fine Tuning\n\nSometimes, no matter what tricks you throw at the model, it just won’t do what you want it to do. In these scenarios, you can **sometimes** fall back to fine-tuning. 
This should, in general, be a last resort.\n\n[Fine-tuning](https://platform.openai.com/docs/guides/fine-tuning) is the process of taking an already trained model and then giving it thousands (or more) of example `input:output` pairs.\n\nFine-tuning does not eliminate the need for hidden prompts, because you still need to embed dynamic data, but it may make the prompts smaller and more reliable.\n\n#### Downsides\n\nThere are many downsides to fine-tuning. If at all possible, take advantage of language models being [zero-shot, one-shot, and few-shot learners](https://en.wikipedia.org/wiki/Few-shot_learning_(natural_language_processing)) by teaching them to do something in their prompt rather than fine-tuning them.\n\nSome of the downsides include:\n\n- **Not possible**: [GPT-3.5/GPT-4 aren’t fine-tunable](https://platform.openai.com/docs/guides/chat/is-fine-tuning-available-for-gpt-3-5-turbo), and they are the primary models / API we’ll be using, so we simply can’t lean on fine-tuning.\n- **Overhead**: Fine-tuning requires manually creating tons of data.\n- **Velocity**: The iteration loop becomes much slower – every time you want to add a new capability, instead of adding a few lines to a prompt, you need to create a bunch of fake data, run the fine-tuning process, and then switch to the newly fine-tuned model.\n- **Cost**: It is up to 60x more expensive to use a fine-tuned GPT-3 model vs the stock `gpt-3.5-turbo` model, and 2x more expensive to use a fine-tuned GPT-3 model vs the stock GPT-4 model.\n\n> ⛔️ If you fine-tune a model, **never use real customer data**. Always use synthetic data. 
The model may memorize portions of the data you provide and may regurgitate private data to other users who shouldn’t be seeing it.\n>\n> If you never fine-tune a model, you don’t have to worry about accidentally leaking data into it.\n\n## Additional Resources\n- :star2: [OpenAI Cookbook](https://github.com/openai/openai-cookbook) :star2:\n- :technologist: [Prompt Hacking](https://learnprompting.org/docs/category/-prompt-hacking) :technologist:\n- :books: [Dair.ai Prompt Engineering Guide](https://github.com/dair-ai/Prompt-Engineering-Guide) :books:\n"
  }
]