Table of Contents
- What Is Prompt Engineering for Generative AI? Key Techniques from Humanize by James Phoenix & Mike Taylor (ChatGPT + Midjourney)
- Recommendation
- Take-Aways
- Summary
- Follow basic principles to optimize your AI model.
- Learning from huge amounts of data, large language models (LLMs) can output human-like text.
- Specify context and experiment with different output formats to maximize ChatGPT’s results.
- “LangChain” can help you address more complex generative AI issues.
- Large language models can function as “autonomous agents.”
- Diffusion models are effective for generating images from text.
- For more creative control over image outputs, consider training your model for specific tasks.
- Using these prompting principles, you can build an exhaustive content-writing AI.
- About the Authors
What Is Prompt Engineering for Generative AI? Key Techniques from Humanize by James Phoenix & Mike Taylor (ChatGPT + Midjourney)
Learn practical prompt engineering methods to get reliable outputs from ChatGPT and diffusion image models. This review of Humanize by James Phoenix & Mike Taylor covers clarity, context, formats like JSON/YAML, LangChain, agents, and image prompting.
Keep reading for copy-and-paste prompt templates, formatting examples (JSON/YAML), and a simple checklist you can use today to get more accurate, consistent results from ChatGPT and image generators.
Recommendation
Generative artificial intelligence models like ChatGPT, trained as they are on large language or diffusion models, are on track to radically transform the way people live and work. But to get the accurate and relevant results you want and need from your AI model, you have to provide the right kinds of inputs — which is where prompt engineering skills come in. In this practical guide, packed with concrete examples, data scientist James Phoenix and generative AI instructor Mike Taylor teach the ins and outs of crafting text- and image-based prompts that will yield desirable outputs.
Take-Aways
- Follow basic principles to optimize your AI model.
- Learning from huge amounts of data, large language models (LLMs) can output human-like text.
- Specify context and experiment with different output formats to maximize ChatGPT’s results.
- “LangChain” can help you address more complex generative AI issues.
- Large language models can function as “autonomous agents.”
- Diffusion models are effective for generating images from text.
- For more creative control over image outputs, consider training your model for specific tasks.
- Using these prompting principles, you can build an exhaustive content-writing AI.
Summary
Follow basic principles to optimize your AI model.
Prompt engineering describes the techniques by which a user develops prompts that spur an AI model like ChatGPT to generate a desirable output. Well-crafted prompts provide sets of instructions in text — either to large language models (LLMs) like ChatGPT or to image-generating diffusion models like Midjourney. Good prompt engineering yields consistently accurate and relevant outputs. In general, LLMs are trained on essentially the entire internet but can be refined. Generic inputs produce predictable outputs, while carefully fashioned prompts provoke more precise and compelling responses.
“There are many different ways to ask an AI model to do the same task, and even slight changes can make a big difference.”
Large language models essentially predict what comes next in a sequence — beginning with the prompt. Prompt engineering, as an art and a skill, observes a few basic principles:
- Provide clarity as to the kind of response you seek — If, for instance, you want a list of product names, tell the AI the category of the product and provide any additional context that can help boost output accuracy and relevance — for example, ask it to imitate a particular famous innovator or ask it to follow certain best practices for product names.
- Clarify the general format you have in mind — The AI will only know the format in which to respond, whether ordinary language, code, or structured data, if you make it clear.
- Give specific examples — Providing more examples will increase output accuracy, but too many examples — particularly when they lack diversity — can hinder the creativity of the responses you receive.
- Assess the quality of the responses — Assessing responses can be more difficult when the requests are more subjective, or when they involve more technical issues, such as scientific questions. You might want to involve other AI models and other databases to evaluate the accuracy and potency of the response.
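The four principles above can be baked into a reusable prompt builder. This is a minimal sketch (the `build_prompt` helper and its wording are illustrative, not from the book): it states the task clearly, pins down the output format, and appends a few examples.

```python
# Hypothetical helper that applies the basic prompting principles:
# context and clarity, an explicit output format, and examples.

def build_prompt(task, output_format, examples, context=""):
    """Assemble a prompt string from the basic prompting principles."""
    lines = []
    if context:
        lines.append(f"Context: {context}")  # principle: provide context for relevance
    lines.append(f"Task: {task}")            # principle: clarity about the response
    lines.append(f"Respond strictly in this format: {output_format}")  # principle: format
    for i, (sample_input, sample_output) in enumerate(examples, 1):    # principle: examples
        lines.append(f"Example {i}: {sample_input} -> {sample_output}")
    return "\n".join(lines)

prompt = build_prompt(
    task="Suggest three names for a reusable water bottle brand",
    output_format="a JSON array of strings",
    examples=[("running shoe brand", '["Stride", "Fleetfoot", "Pacer"]')],
    context="Names should evoke sustainability.",
)
print(prompt)
```

Keeping the prompt assembly in one place makes it easy to vary examples or formats and compare the model's responses, which is the core loop of prompt engineering.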
Learning from huge amounts of data, large language models (LLMs) can output human-like text.
The recent emphasis in AI research has been on large language models. LLMs like ChatGPT learn from vast amounts of data, and those learnings apply to a variety of functions and capabilities, such as software development. Most prominently at this point, LLMs enable the generation of text that is remarkably coherent, context-sensitive, and human-sounding.
“Text generation models utilize advanced algorithms to understand the meaning in text and produce outputs that are often indistinguishable from human work.”
Natural language processing systems and LLMs compress linguistic units of varying sizes — words or parts of words, phrases, or complete sentences — into “tokens.” The amount of data an AI model is trained on is measured in tokens. Tokens can be produced by byte-pair encoding (BPE), which initially treats text as a sequence of characters and then condenses frequently co-occurring characters into single tokens. Each token is mapped to a numeric “vector,” and the transformer architecture uses these vectors to capture both the meaning and the structure of the text. LLMs like ChatGPT are first trained on massive amounts of data to instill a broad, flexible understanding of language, then fine-tuned to adapt to more specialized areas and tasks.
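One round of the BPE idea can be sketched in a few lines of pure Python: count adjacent character pairs, then merge the most frequent pair into a single token. This is an illustration of the principle, not a production tokenizer.

```python
# Illustrative sketch of one byte-pair-encoding (BPE) merge step, the idea
# behind how LLM tokenizers condense frequent character pairs into tokens.
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

text = list("low lower lowest")   # start from individual characters
pair = most_frequent_pair(text)   # the pair ('l', 'o') appears three times
tokens = merge_pair(text, pair)
```

Real tokenizers repeat this merge step thousands of times over a large corpus, so common words and word fragments end up as single tokens.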
Specify context and experiment with different output formats to maximize ChatGPT’s results.
Prompting techniques are everything, especially when it comes to generating text in appropriate formats from an LLM. You can work with ChatGPT iteratively to generate the text you want. For instance, start by asking ChatGPT to generate a list related to the topic at hand — such as a list of 20th-century visual artists or 21st-century mathematicians. You might then have to refine and narrow your request, as ChatGPT might provide irrelevant names or unwanted commentary. If you want more detailed, nested results, like an outline for an article, requesting a hierarchical structure can be useful.
“As you work to extract more structured data from LLM responses, relying solely on regular expressions can make the control flow increasingly complicated.”
To address potential problems with LLM outputs, you might try one of several alternative output formats. Requests made in JSON (JavaScript Object Notation) format, for instance, can result in more elegant and straightforward responses. The LLM, however, may inadvertently wrap the JSON in extra text, which can break applications that expect strictly valid JSON. The serialization language YAML, on the other hand, tends to generate highly readable responses and is ideal for contexts in which you’re collaborating with others. ChatGPT can work with formats other than JSON and YAML as needed.
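The failure mode above — a JSON answer wrapped in conversational prose — can often be handled more simply than with regular expressions: slice out the span between the first `{` and the last `}` and parse it. A minimal sketch, assuming the reply contains a single JSON object:

```python
# Extract a JSON object from an LLM reply that may include extra text
# around it, instead of fighting the wrapper prose with regexes.
import json

def extract_json(reply):
    """Pull the first {...} block out of an LLM reply and parse it."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])

reply = (
    "Sure! Here is the data you asked for:\n"
    '{"names": ["Evergreen", "Tidewell"]}\n'
    "Let me know if you need more."
)
data = extract_json(reply)
print(data["names"])
```

For replies with multiple or nested objects you would need a more careful scan, but this covers the common case of a single wrapped object.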
Advanced LLMs like GPT-4 aren’t just mechanical text generators. Within a limited scope, they also have a certain amount of autonomous agency — and can make recommendations. If the model’s response is in some way inadequate, users may need to provide more context. Make a habit of asking the AI what context it needs to answer your question; then, once it responds, apply that information to your prompt to generate more accurate outputs.
“LangChain” can help you address more complex generative AI issues.
For many tasks, relatively straightforward prompt engineering approaches will work just fine. But for more complicated generative AI issues, you may need to equip yourself with something more specialized and powerful. This need may arise if you’re looking for a precise book summary, attempting to construct an entire narrative, or asking the LLM to engage in sophisticated reasoning.
“To skillfully tackle such complex generative AI challenges, becoming acquainted with LangChain, an open source framework, is highly beneficial.”
LangChain is an open-source framework that helps integrate LLMs into other applications; packages are available for Python and JavaScript. LangChain’s principal goal is to enable fluid interactions between LLMs and data sources and to give the model the capacity to engage with and shape its environment. As an overarching framework, LangChain makes it easier to work with multiple LLM providers and indeed encourages experimentation with a variety of models, such as Anthropic’s models, Google’s Vertex AI, and Amazon Bedrock. As your LLM applications grow larger, it’s worthwhile to use LangChain’s prompt templates, which make it easy to validate inputs, combine multiple prompts, and customize prompts in various ways.
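The input validation that LangChain's prompt templates perform can be illustrated with a stripped-down, pure-Python analogue (this `SimplePromptTemplate` class is a sketch of the idea, not the LangChain API itself): the template declares its placeholders, and formatting fails loudly if any value is missing.

```python
# Pure-Python sketch of a validating prompt template, mimicking the idea
# behind LangChain's templates: fail before sending an incomplete prompt.
import string

class SimplePromptTemplate:
    def __init__(self, template):
        self.template = template
        # Collect the {placeholder} names declared in the template.
        self.variables = {
            name
            for _, name, _, _ in string.Formatter().parse(template)
            if name
        }

    def format(self, **kwargs):
        missing = self.variables - kwargs.keys()
        if missing:
            raise KeyError(f"missing template variables: {sorted(missing)}")
        return self.template.format(**kwargs)

tpl = SimplePromptTemplate("Write a {tone} product description for {product}.")
print(tpl.format(tone="playful", product="a solar lantern"))
```

Catching a missing variable at format time, rather than sending a prompt with a hole in it to the model, is exactly the kind of safeguard that matters as applications grow.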
Large language models can function as “autonomous agents.”
As AI evolves, it’s important for large language models to engage with, think through, and solve complicated problems autonomously. Indeed, the capacity of LLMs to work through complex problems is crucial for creating viable and useful new applications. LLMs can address complex problems through “chain-of-thought (CoT) reasoning,” in which multifaceted problems are broken down into smaller constituent parts that are simpler to deal with. Chain-of-thought reasoning typically involves the LLM explaining its individual recommendations and strategizing the steps for arriving at a conclusion. It’s especially important for the LLM to move through the necessary steps deliberately and meticulously, one after the next, before arriving at a conclusion. As always, adequate context is necessary if you want a useful response.
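In its simplest form, chain-of-thought prompting is just the same question with an added instruction to reason step by step before answering. A sketch, where the exact wording is an illustration rather than a fixed recipe:

```python
# Contrast a direct prompt with a chain-of-thought (CoT) prompt: same
# question, but the CoT version asks the model to reason in stages first.
def direct_prompt(question):
    return f"{question}\nAnswer:"

def cot_prompt(question):
    steps = (
        "Think through the problem step by step.\n"
        "1. Break the problem into smaller parts.\n"
        "2. Solve each part, explaining your reasoning.\n"
        "3. Only then state the final answer."
    )
    return f"{question}\n{steps}\nAnswer:"

q = "A train leaves at 9:40 and arrives at 11:05. How long is the trip?"
print(cot_prompt(q))
```

On multi-step problems like this one, the staged version tends to produce both the intermediate reasoning and a more reliable final answer.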
“Generative AI models have given rise to an agent-based architecture.”
Agents perceive their environment and act or make decisions in pursuit of specific goals within that environment. An LLM agent will have a given set of inputs — such as text, images, or audio files — a goal, defined rewards for achieving the goal, and actions that are permitted. In the case of self-driving cars, for instance, the input is the car’s myriad sensors providing streaming data about the environment it’s moving through, the goal is arriving at a destination safely and observing driving laws, and the permitted actions are acceleration, deceleration, turning, and so forth, all within the context of safety and the law.
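The input/goal/action framing above can be reduced to a toy agent: a thermostat that perceives a temperature (input), aims for a set point (goal), and chooses among three permitted actions. This is a hypothetical illustration of the agent pattern, not an example from the book.

```python
# Toy agent: maps a perceived input (temperature) to one of a fixed set
# of permitted actions, in pursuit of a goal (hold the set point).
def thermostat_agent(temperature, target=21.0, tolerance=0.5):
    """Choose a permitted action based on the perceived temperature."""
    if temperature < target - tolerance:
        return "heat"
    if temperature > target + tolerance:
        return "cool"
    return "idle"

readings = [18.0, 20.8, 23.5]
actions = [thermostat_agent(t) for t in readings]
print(actions)  # prints ['heat', 'idle', 'cool']
```

An LLM agent follows the same loop — perceive, decide, act — but its "policy" is a prompted model call rather than a hand-written rule, and its action space might include tool calls or API requests.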
Diffusion models are effective for generating images from text.
Diffusion models are generative AI models that are especially adept at generating high-quality images from text inputs. Examples include DALL-E 3, which has been integrated into ChatGPT; open-source models like Stable Diffusion; and Midjourney. In a process inspired by the physics of diffusion, these models are trained by introducing “noise” into images and then, based on an image description, predicting how to remove that noise. The model essentially takes pure noise and turns it into an image that matches the verbal description.
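The forward ("noising") process can be made concrete with a small numeric sketch: under a noise schedule, the fraction of the original image that survives at each step (often written as the cumulative product alpha-bar) decays toward zero, and reversing that decay is what the model learns. The linear schedule values below are illustrative defaults, not a specification of any particular model.

```python
# Numeric sketch of a forward diffusion noise schedule: the cumulative
# product of (1 - beta_t) decays from ~1 (original image intact) toward
# ~0 (almost pure noise) over the diffusion steps.
def alpha_bar_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bar, out = 1.0, []
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        alpha_bar *= 1.0 - beta
        out.append(alpha_bar)
    return out

schedule = alpha_bar_schedule(1000)
# Early on the image dominates; by the last step it is almost pure noise.
print(schedule[0], schedule[-1])
```

Generation runs this process in reverse: starting from pure noise, the model removes a little noise at each step, guided by the text description, until an image remains.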
“Within the domain of diffusion models, prompt engineering can be seen as navigating the latent space, searching for an image that matches your vision, out of all the possible images available.”
Diffusion models are trained on massive data sets derived from the internet, and for that reason, they can imitate most artistic styles. This feature has inspired alarm among people concerned with copyright infringement, but the generated images aren’t literal copies of existing images or styles — they are derived from patterns detected across a vast array of images.
Based on a modified version of GPT-3, OpenAI’s DALL-E was revolutionary, but it wasn’t open source and had built-in limitations — initially, you weren’t allowed to generate images with people in them, and certain sensitive words were banned as prompts. Released shortly after DALL-E 2, Midjourney, by contrast, was community driven from the beginning: most prompts and generated images were shared on an open channel, available for other users to learn from. However, at this point, the focus of AI image generation is likely to move away from text-to-image generation toward text-to-video and image-to-video generation.
For more creative control over image outputs, consider training your model for specific tasks.
The best way to prompt diffusion models to get the image you’re looking for will depend upon the model you’re working with: Every model has its own idiosyncrasies, depending on its structure and the data it was trained on. Many artists and researchers have worked with the various models to develop reliable prompting tools.
The most rudimentary, first-level approach to AI image generation is to specify the desired image format. Generative AI image models can produce a number of formats — stock press photos, for instance, or traditional oil paintings. And you can prompt the AI to apply any number of well-known artistic techniques to a generated image, altering how it looks by, say, making it appear to have visible brush strokes. AI image models can replicate just about any known art style, whether that’s Baroque, Impressionism, or Cubism. Given copyright issues, be mindful when generating images based on the style of a living artist. For those with limited knowledge of art and art history, Midjourney lets you reverse engineer a prompt from an image: You upload a sample image and ask the AI to describe it to you. You can then use that description to craft a prompt that creates another image in the sample’s style.
“Most work with AI images only requires simple prompt engineering techniques, but there are more powerful tools available.”
Since it’s an open-source image generation model, you can run Stable Diffusion for free — locally on your own computer or in the cloud — and customize it to better suit your needs. But the coding involved in such customization can be complicated and is best undertaken only by advanced users. For serious Stable Diffusion users, the AUTOMATIC1111 web user interface can be especially appealing because it connects people with the broader Stable Diffusion open-source community, which has invested heavily in integrating advanced features. AUTOMATIC1111 also makes it possible to upscale images to higher resolutions with fine-grained controls. Finally, using DreamBooth, you can fine-tune your model so it understands unfamiliar ideas and concepts that don’t appear in its training data.
Using these prompting principles, you can build an exhaustive content-writing AI.
If you want an AI-generated blog post on a particular topic, it would be easy enough to just prompt ChatGPT to generate one. You would likely get back something coherent but lacking any singular or illuminating point of view. What you probably want is an AI-generated blog post that mimics your writing style and tone, and that articulates roughly the way you see the world. For that, you’ll need a substantial prompt: at a minimum, one that describes the appropriate writing tone and provides keywords.
“Now that you have your script working end to end, you probably want to make it a little easier to work with, and maybe even get it into the hands of people who can give you feedback.”
You should say something specific about the tone and format you want, though ChatGPT may ignore overly detailed instructions. In this instance, you’re “blind prompting”: it’s difficult for the model to evaluate its own quality, and without breaking the prompt into subsections, it’s hard to evaluate success at all. Providing at least one example (“one-shot” prompting) is likely to dramatically improve your blog’s quality. By chaining LLM components with LangChain, you can build a system that generates content that sounds like you and expresses your point of view. This involves a detailed interview with the LLM, an outline of the blog post, generating the text — and real attention to writing style. You might want to add images, too.
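The interview, outline, and drafting stages described above form a pipeline of chained model calls. The sketch below shows only the control flow; the `fake_llm` stub stands in for a real model call, and the step names and prompt wording are illustrative placeholders, not the book's implementation.

```python
# Sketch of a chained content pipeline: interview -> outline -> draft.
# `fake_llm` is a stub standing in for a real LLM API call.
def fake_llm(prompt):
    """Stand-in for a model call; echoes the first line of the prompt."""
    return f"[response to: {prompt.splitlines()[0]}]"

def interview(topic):
    """Have the model gather the author's views on the topic."""
    return fake_llm(f"Ask me 5 questions about my views on {topic}.")

def outline(topic, interview_notes):
    """Turn interview notes into a post outline."""
    return fake_llm(f"Outline a blog post on {topic}.\nNotes: {interview_notes}")

def draft(outline_text, tone="conversational"):
    """Expand the outline into a full draft in the requested tone."""
    return fake_llm(f"Write the post in a {tone} tone.\nOutline: {outline_text}")

topic = "remote work"
post = draft(outline(topic, interview(topic)))
print(post)
```

Because each stage is a separate call with its own prompt, you can evaluate and refine the interview, outline, and draft independently — exactly what blind prompting with one giant prompt makes impossible.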
About the Authors
James Phoenix has taught more than 60 data science courses at General Assembly and through his company, Vexpower. Mike Taylor created the international marketing agency Ladder and also teaches generative AI through Vexpower.