The Machinery of Learning
Machines are good at pattern-matching, but they need to be taught what to look for.
Anonymous
Chapter 1, “Ten Breakthroughs That Made Generative AI Possible,” walks through the major advances that made generative AI what it is today. Each step forward was accompanied by new concepts and new terms that described new possibilities, new algorithms, new architectures, and their components. Although machine learning and GenAI are still moving full steam ahead and coining new words and acronyms almost daily, the field has matured enough that its key foundational concepts and terms can be organized into a well-defined taxonomy.
GenAI builds upon a rich legacy of machine learning methods, and large language models (LLMs) leverage tools and concepts that have been refined over decades. The terminology can be confusing, as common words like model or learning often carry different meanings in academic research than they do in the media. In addition, terms have evolved and taken on new significance as new breakthroughs have redefined their usage. Mastering AI vocabulary is essential not only to deepen your grasp of GenAI techniques but also to better understand where and how different approaches are used. In addition to clarifying key terminology and taxonomy, this chapter introduces the main categories of machine learning techniques: supervised, unsupervised, and reinforcement learning.
Types of Learning
All machine learning techniques share the same basic idea: A computer program learns and improves by observing or measuring data. This learning is then used to make predictions when new data is presented to the program. The process involves three key elements that work together: a model, a form of learning (training) that is applied to the model, and actions on the model once it is trained (inference). The type, or family, of AI or machine learning that you are working with depends on the goal of this process. This taxonomy can be a bit complex because technical breakthroughs are sometimes remembered as their own family categories, even though they are simply more efficient ways of achieving some well-known goals. However, the most common way to organize the machine learning families is to distinguish between supervised learning, unsupervised learning, and reinforcement learning techniques.
Supervised Learning
AI techniques are about prediction and probabilities. Just as in real life, predictions with AI can be about things we don’t know or that we don’t know yet. For example, we might want to predict the future price of a stock, the time it would take for a car to stop after pressing the brake pedal, or whether an image (a collection of pixels for a computer) represents a cat or a dog. In each case, an expert could use knowledge and experience to come up with a prediction. This is also how a machine learns to provide an answer: We provide a large amount of training data, representing the parameters available to the expert, along with the correct answer for each case. The machine then learns the relationships between the parameters and the correct answer in the data. This type of AI technique is known as supervised learning.
In a way, supervised learning is about teaching a machine how to predict the right answer to a future question based on past training data. Intuitively, the concept is similar to how we learn many topics at school. We see multiple examples and end up forming mental rules that help us recognize patterns in order to link the description of a problem to the right answer. The more we study, the better we get at answering questions.
Many AI techniques detailed in this book use supervised learning either directly or as part of a more complex process. For example, generative adversarial networks (GANs, described in Chapter 5, “Neural Network Architectures”), the early versions of DALL-E, and LLMs that predict and then generate words in a sentence all use supervised learning in some ways.
Supervised learning is broken down into two classes of algorithms: regression and classification. Regression focuses on predicting a continuous value, such as the price of a stock or a house (and takes its name from a common statistical technique used in this context). In both classes, the learning process is similar to how humans learn: by observation. When you were young, you likely learned to distinguish between cats and dogs by seeing many examples over time and learning “what makes a cat a cat.” If you work in real estate, you have seen enough houses for sale to be able to estimate the likely value of a house after inspection. AI systems learn the same way: Once they have been exposed to thousands of examples, they internalize the measurable features that are characteristic of a particular category. They can tell which pixels are necessary for an image to be labeled a dog, and they can estimate the likely price of a house. The key difference between how humans do this and how AI does it is scale. AI systems can process millions of examples, allowing them to detect even very subtle patterns that might escape human notice.
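The regression idea can be made concrete with a minimal Python sketch: ordinary least squares fits a straight line to a handful of labeled (size, price) examples, and the fitted line is then used for inference on an unseen input. The numbers below are invented for illustration and chosen to lie exactly on a line.

```python
# A minimal sketch of supervised regression: learn the relationship
# between one input (house size) and a continuous output (price) from
# labeled examples, then use it for inference on an unseen input.
def fit_line(xs, ys):
    """Ordinary least squares fit of y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of (x, y) divided by variance of x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

# Training data (invented): house sizes in square meters and sale
# prices in thousands of dollars, lying exactly on price = 2*size + 10.
sizes = [50, 80, 120, 200]
prices = [110, 170, 250, 410]

w, b = fit_line(sizes, prices)
predicted = w * 100 + b  # inference: likely price of an unseen 100 m2 house
```

Real regression models handle many input features at once, but the principle is the same: the model's parameters are adjusted so that its predictions match the labeled answers as closely as possible.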
Classification, on the other hand, is about determining whether a given input belongs to one category or another. The outcome can be binary (for example, yes or no) or drawn from a larger set of predefined classes. Consider a medical scenario, with a diagnosis about a tumor. There are usually only two answers of interest: malignant or benign. The goal is not to describe the tumor in detail but to classify it correctly based on observed patterns.
Like all other supervised learning, classification relies on labeled data that trains a system to learn relationships between inputs and known correct answers. The system examines many examples where the correct category is already known (baseball statistics, different flower species, or spam versus legitimate emails) and learns to identify the patterns that distinguish between these categories.
Sometimes, classification does not involve just predicting which group something belongs to but also defining the boundaries between groups. Boundaries between groups can be thought of as the borders between countries on a map (although in AI these borders are typically in a high-dimensional space). Techniques for identifying these boundaries are often named for the mathematical approaches they use, such as support vector machines (SVMs), Gaussian mixture models (GMMs), or linear discriminant analysis. These terms and methods are common in the GenAI field and are often employed in voice and speech generation to find a pattern that most closely matches samples of a person’s voice or text.
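The boundary idea can be illustrated with a method far simpler than an SVM: a nearest-centroid rule, which places the boundary halfway between the averages of two labeled groups of points. The tumor measurements below are invented for illustration.

```python
import math

# A minimal sketch of binary classification with a nearest-centroid rule:
# the boundary between the two classes is the set of points equidistant
# from the two class centroids (a straight line in this 2-D example).
# This is a toy stand-in for boundary-finding methods such as SVMs.

def centroid(points):
    """Mean of a list of points, computed coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def classify(point, centroids):
    # Assign the point to the class whose centroid is closest.
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Labeled training data (invented): (tumor size, cell irregularity).
benign = [(1.0, 0.2), (1.5, 0.4), (2.0, 0.3)]
malignant = [(4.0, 1.8), (4.5, 2.0), (5.0, 1.7)]

centroids = {"benign": centroid(benign), "malignant": centroid(malignant)}
label = classify((1.8, 0.5), centroids)  # an unseen sample
```

An SVM refines this idea by choosing, among all possible boundaries, the one that leaves the widest margin between the two groups.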
Unsupervised Learning
In some cases, the answer to a problem is not known in advance, and a prediction cannot be easily made. For example, imagine a botanist with an extensive collection of flower images who wants to uncover patterns that could help categorize the flowers into families. In a different field, a bank processing millions of transactions each day might want to detect whether a specific credit card charge is normal or potentially fraudulent. Or consider a streaming service that wants to analyze a user’s viewing history to recommend the next best movie. In these situations, even a domain expert might struggle to give a definitive answer, and when the problem scales to millions of flowers, transactions, or users, relying on human experts quickly becomes impractical and costly.
Training a machine to uncover such patterns is the natural solution. Unlike in the earlier cases, the goal here is not to learn a fixed relationship between inputs and a known correct answer. Often, there is no single “correct” outcome; there are only patterns that emerge from the data. For example, with fraud detection, unusual transactions might signal fraud, but what counts as “unusual” can vary significantly from one person to another. This type of analysis involves examining multiple parameters to identify common trends and then flagging data points that deviate from those norms; such points are known in AI as outliers.
This branch of artificial intelligence is called unsupervised learning because it lets the machine find patterns and deviations from common properties on its own; it does not teach the machine “the right answer.” The learning is unsupervised because it does not rely on a dataset in which parameters are associated with a label or a value that indicates the correct answer (for example, a group of pixels and a label like “cat” or “dog”). Rather, the human merely instructs the machine to form clusters of entries whose parameters have roughly the same values and to flag entries that fall far from any cluster. The machine forms these groups without further supervision.
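The clustering-and-outlier idea can be sketched in a few lines of Python with a toy k-means (k = 2) on invented transaction data; production systems use many more dimensions and more robust algorithms, but the mechanics are the same.

```python
import math

# A minimal sketch of unsupervised clustering: k-means with k = 2 and no
# labels. The algorithm alternates between assigning points to their
# nearest center and moving each center to the mean of its assigned
# points. A point far from every center can then be flagged as an outlier.

def kmeans(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (this toy version assumes no group ever becomes empty).
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
    return centers

# Unlabeled transactions (invented): (amount, hour of day).
# Two natural groups emerge without anyone labeling them.
transactions = [(5, 9), (7, 10), (6, 11), (80, 21), (90, 22), (85, 23)]
centers = kmeans(transactions, centers=[(0, 0), (100, 24)])

# Outlier check: a transaction far from every learned cluster center.
suspicious = (500, 3)
min_dist = min(math.dist(suspicious, c) for c in centers)
```

The machine was never told what “normal” looks like; the clusters, and therefore the notion of an outlier, emerged entirely from the data.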
Forming clusters and finding elements that go well together is fundamental to many generative AI techniques. For image creation, techniques examined in Chapter 5, like variational autoencoders (VAEs) and Stable Diffusion, leverage clustering to learn common patterns in shapes. In this way, they learn that “whiskers” are series of lines radiating from a common center (a nose or snout). Large language models also commonly use unsupervised learning when they learn patterns in text. For example, if we give a model like BERT a segment of a sentence that includes “Apollo 11 landed on the…,” it will guess “Moon.” This guess is possible because the model learned through clustering that the terms “Apollo 11” and “Moon” commonly appear in the same sentence. LLMs also use unsupervised learning to group together words and sentences with the same general meaning, which makes them efficient at summarizing and finding synonyms.
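A toy illustration of this intuition: co-occurrence statistics gathered from raw, unlabeled text are enough to fill in a missing word. This is not how BERT actually works internally (BERT learns dense vector representations with a neural network), and the corpus and helper function below are invented, but the principle is the same: no one labeled the answer; it emerged from patterns in the text.

```python
from collections import Counter

# Count which words co-occur in sentences containing a cue phrase, then
# guess the missing word as the most frequent co-occurring candidate.
corpus = [
    "apollo 11 landed on the moon in 1969",
    "the moon mission apollo 11 carried three astronauts",
    "apollo 11 returned safely from the moon",
    "the sun rises in the east",
]

def guess_missing(cue, corpus, stopwords=("the", "on", "in", "from")):
    counts = Counter()
    for sentence in corpus:
        if cue in sentence:
            counts.update(w for w in sentence.split()
                          if w not in cue.split() and w not in stopwords)
    return counts.most_common(1)[0][0]

word = guess_missing("apollo 11", corpus)  # fill in "apollo 11 landed on the ..."
```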
Reinforcement Learning
In some cases, we need machines to learn, but neither supervised nor unsupervised learning is applicable. This happens when the machine needs to learn a skill, such as playing chess or picking up an object of variable shape from a conveyor belt. The goal is not to find patterns or describe relationships between some parameters and a label but to interact with an environment in near real time and make movement decisions that have consequences. Make the wrong move, and you lose the chess game or crush the object on the conveyor belt.
One approach to teaching machines such skills is to program all possible moves for all possible scenarios. Researchers tried this idea in many ways in the 20th century, only to conclude each time that it was not feasible: There are too many possibilities, requiring too much memory and processing power to be realistic…and, of course, there is a high likelihood of “this corner case we did not think about” showing up at the most critical moment. A better approach (discussed as one of the key breakthroughs in Chapter 1) is simply to teach the machine the way we teach humans: Try things and keep (that is, remember and then prefer next time) those that work. This technique is called reinforcement learning because each interaction with the world reinforces the learning: An attempt succeeds or fails, or it proves better, simpler, or faster than the previous one. Reinforcement learning is particularly powerful because it does not require a preexisting dataset to train a model. The dataset is literally generated in real time as the model explores its environment. This mechanism allows continuous training of the model, and it also allows the model to find optimal outcomes that could never be discovered through a prepared dataset.
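The try-and-reinforce loop can be sketched with tabular Q-learning, one of the classic reinforcement learning algorithms, on a toy environment: a corridor of five cells with a reward at one end. The environment, rewards, and hyperparameters are invented for illustration; note that no training dataset exists in advance, and the agent generates its own experience by trial and error.

```python
import random

# A minimal sketch of reinforcement learning: tabular Q-learning on a
# five-cell corridor. The agent starts at cell 0, the reward sits at
# cell 4, and each step moves left (-1) or right (+1).
random.seed(0)  # fixed seed so the run is reproducible

GOAL, ACTIONS = 4, (-1, +1)
Q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for _ in range(500):  # episodes of trial and error
    state = 0
    while state != GOAL:
        # Explore occasionally; otherwise exploit the best known action.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt = min(max(state + action, 0), GOAL)  # walls clamp the position
        reward = 1.0 if nxt == GOAL else 0.0
        # Reinforce: nudge Q toward reward plus discounted future value.
        best_next = 0.0 if nxt == GOAL else max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state = nxt

# The learned greedy policy: the preferred action in each non-goal cell.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)]
```

After training, the policy prefers moving right from every cell, even though the agent was never told the layout of the corridor; the reward alone shaped its behavior.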
For example, suppose a car manufacturer wants to build a self-driving system. With reinforcement learning, the manufacturer can implement a pilot in a virtual car in a simulated landscape and reward the algorithm when useful actions are taken (such as avoiding a pedestrian or stopping at a red light). Negative actions (such as hitting virtual walls and people) cause the algorithm to lose points. Even without specific driving instructions, the algorithm will soon learn the rules of the road, such as speed limits, priorities at intersections, and efficient parking techniques. The manufacturer can then deploy the algorithm in a real vehicle with a human assistant driver, whose corrective actions also serve as reinforcement inputs.
This approach is distinctive enough that reinforcement learning forms its own family of learning. It is extremely practical for highly interactive applications and complex decision-making scenarios, where preexisting datasets, especially optimized ones, may not exist. It is also useful as a complement to other techniques. For example, each time you provide feedback to an LLM (with the thumbs-up or thumbs-down button) or to an image generator (“I prefer this image,” or you simply stop asking for refinements), your input is fed back into the algorithm through reinforcement learning to teach the machine what you (as a particular person or as a user in general) really wanted and to make the model more efficient for the next query of the same type. This technique is called reinforcement learning from human feedback (RLHF) and is examined in detail in Chapter 12, “Fine-Tuning LLMs.” Large cloud LLM companies have entire teams dedicated to using RLHF to teach their models to behave in socially acceptable ways (for example, do not teach people how to make bombs even if they ask, prefer positive answers, do not denigrate or insult users).
