The Machine Learning Family Tree
The three families of learning we have discussed—supervised learning, unsupervised learning, and reinforcement learning—can describe just about any technique in the field of machine learning, but you will find that many practitioners categorize algorithms by their goals instead of by learning family. Mapping those goals back to the families helps you see which family each approach is most likely to draw on, and Figure 2-1 provides such a map for the most common use cases.
An AI project often combines two or all three types of learning. For example, unsupervised learning is used during the pretraining phase for LLMs, where the model learns language patterns and semantic relationships. Supervised learning is then often employed during fine-tuning, where the model is trained on labeled datasets to perform specific tasks, such as summarization or question answering. Finally, reinforcement learning is used in the final stages to align the model’s behavior with human preferences, helping it respond in ways that appear more helpful, safe, and “human-like.”
Beyond LLMs, some language processing tools use supervised learning to transcribe sounds into syllables and then use unsupervised learning to group common sound and syllable structures when training for speech recognition. An online recommender system (for example, the next movie suggestion in your favorite streaming service) might combine supervised data (such as ratings and reviews) with unsupervised data (such as click patterns) to find common points and make recommendations. Such hybrid techniques are unsurprisingly called semi-supervised learning. With this type of learning, a model is trained on a dataset that contains both labeled and unlabeled data, with the goal of leveraging the unlabeled data to improve performance.
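To make the idea concrete, here is a minimal sketch of semi-supervised learning using scikit-learn's SelfTrainingClassifier. The dataset, split sizes, and choice of base classifier are all illustrative assumptions, not part of any particular production system.

```python
# Semi-supervised sketch: a classifier trained on a mix of labeled and
# unlabeled data. All numbers here are invented for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Pretend only the first 30 samples are labeled; mark the rest with -1,
# scikit-learn's convention for "unlabeled".
y_partial = y.copy()
y_partial[30:] = -1

# The wrapped classifier is retrained on its own confident predictions,
# gradually pulling the unlabeled points into the training set.
model = SelfTrainingClassifier(GaussianNB())
model.fit(X, y_partial)

print(model.predict(X[:5]))
```

The key detail is the `-1` marker: the model sees every row's features, but only a tenth of the labels, and must leverage the unlabeled rows to improve.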
The organization of machine learning families is well established. Unfortunately, many sources list families of machine learning methods according to the technique, algorithm, or equations they use rather than organizing them by goal. Worse, some sources mix everything together, listing algorithms and goals as different families within the same structure. This type of subdivision can be confusing because different algorithms might achieve the same goal, making the taxonomy ambiguous. Some of the most confusing subdivisions also try to cram field names (like “natural language processing”) into the same structure. The result is a taxonomy of AI families that makes the algorithms appear as a random collection of goals, techniques, and fields, all listed as if they were equivalent. With such mixing, an AI technique may appear under different names or in different “families,” as if the algorithm and its field of application had to be linked under a special name.
In this book, we have tried to avoid such confusion. However, you will see in the other chapters that in many cases a technique implements a particular variant of one of the three families, and it is useful to become familiar with these variants. Figure 2-2 provides the names of the learning families, their goals, and some popular algorithms in each category.
Supervised learning techniques within the regression subfamily tend to be organized based on the type of relationship they explore. For example, when the relationship between an input and an output can be modeled as a straight line, it is referred to as linear regression. However, if the relationship is more complex, such as when modeling the link between many intricate parameters and a single outcome in the real world, it is called non-linear regression. GenAI techniques use both.
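The difference is easy to see on synthetic data. The sketch below, with an invented quadratic relationship, fits a plain linear model and then a polynomial (non-linear) model to the same points; the exact dataset and degree are assumptions made for illustration.

```python
# Linear vs. non-linear regression on a deliberately curved relationship.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)  # curved, plus noise

# A straight line cannot capture the curve...
linear = LinearRegression().fit(X, y)
# ...but adding squared features lets the same linear machinery fit it.
nonlinear = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print(f"linear R^2:     {linear.score(X, y):.2f}")
print(f"non-linear R^2: {nonlinear.score(X, y):.2f}")
```

Because the underlying relationship is symmetric around zero, the straight-line fit explains almost none of the variance, while the degree-2 model captures it well.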
In some cases, predictive techniques do not strictly involve numeric data but also use categorical data to guide decision-making. Consider predicting outcomes based on questionnaire responses with multiple-choice questions. In this scenario, the algorithm builds a tree structure by mapping the possible answers, allowing it to predict the most likely outcome based on a few key responses. This approach to predictive modeling based on tree structures is known as a decision tree. Trees are less common in GenAI, but they are important for many other applications of machine learning, such as spam detection, risk assessment, and loan approvals.
The classification branch of supervised learning has a richer vocabulary than the regression branch, partly because there are many mathematical methods for determining whether two elements belong to the same category. One common approach is probabilistic learning, which relies on probability theory to make informed guesses about classification. This approach makes assumptions about the structure of a group and evaluates whether new data fits those assumptions. The field draws from the work of the 18th-century statistician Thomas Bayes, and many modern probabilistic techniques bear his name, such as Naïve Bayes, Bayesian networks, and Bayesian optimization. Although the math dates from the 18th century, the AI methods leveraging it remain highly relevant in today’s GenAI landscape.
In a supervised learning context, Naïve Bayes is often used to classify credit card transactions as safe or fraudulent. Probabilistic models are also widely used in generative applications, where the system doesn’t just classify existing content but generates new content based on patterns it has learned.
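The fraud example can be sketched in a few lines with scikit-learn's Gaussian Naïve Bayes. The features (transaction amount, hour of day, distance from home) and their distributions are invented for illustration; a real fraud system would use far richer signals.

```python
# Toy fraud screen with Gaussian Naive Bayes. Features and numbers are
# fabricated: amount in dollars, hour of day, distance from home in km.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
safe = np.column_stack([rng.normal(40, 15, 200),    # small amounts
                        rng.normal(14, 3, 200),     # daytime hours
                        rng.normal(5, 2, 200)])     # close to home
fraud = np.column_stack([rng.normal(900, 200, 20),  # large amounts
                         rng.normal(3, 1, 20),      # middle of the night
                         rng.normal(400, 100, 20)]) # far from home
X = np.vstack([safe, fraud])
y = np.array([0] * 200 + [1] * 20)

# Naive Bayes models each feature's distribution per class, then asks
# which class makes a new transaction most probable.
clf = GaussianNB().fit(X, y)
print(clf.predict([[35, 15, 4], [1000, 2, 500]]))  # prints [0 1]
```

The model never sees a rule like "large nighttime transactions are suspicious"; it infers that pattern from the per-class probability distributions, which is exactly the assumption-checking behavior described above.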
For example, consider a model trained on a large collection of short stories. Each story includes variables like genre, setting, main character type, plot arc, and ending style. A Bayesian network trained on this data can uncover how different variables interact. When asked to generate a new story, the model might determine that if the genre is “mystery,” there is a high probability that the main character would be a detective, and the plot would center around solving a crime. If the genre is “romance,” different narrative elements emerge with higher probability. Such inferred relationships form the essence of many generative models. The model does not just reproduce a story it was trained on; it assembles a new one by drawing from a network of statistically likely choices. The outcome may feel creative, but it is based on learned probabilities.
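The story example can be reduced to a hand-rolled sketch in the spirit of a Bayesian network: conditional probability tables linking genre to character and plot, sampled in sequence. The tables below are invented for illustration; a real Bayesian network would learn them from data.

```python
# Conditional sampling sketch: genre influences character and plot.
# All probability tables here are made up for illustration.
import random

random.seed(0)

p_character_given_genre = {
    "mystery": {"detective": 0.8, "journalist": 0.2},
    "romance": {"artist": 0.6, "chef": 0.4},
}
p_plot_given_genre = {
    "mystery": {"solve a crime": 0.9, "clear a name": 0.1},
    "romance": {"an unlikely meeting": 0.7, "a second chance": 0.3},
}

def sample(table):
    # Draw one outcome according to its conditional probability.
    outcomes, weights = zip(*table.items())
    return random.choices(outcomes, weights=weights)[0]

def generate_story(genre):
    character = sample(p_character_given_genre[genre])
    plot = sample(p_plot_given_genre[genre])
    return f"A {genre} story: a {character} faces {plot}."

print(generate_story("mystery"))
```

Each run assembles a new combination rather than replaying a stored story, which is the point: the output is novel, but every choice is drawn from learned (here, hand-written) probabilities.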
This probabilistic approach also extends beyond text. In image generation, if a character is described as wearing sunglasses, the model will infer with a high probability that the outdoor scene should be sunny. By learning such latent relationships, generative systems can produce outputs that feel coherent, even humanlike, without ever being explicitly programmed to do so.
Bayesian methods are not the dominant architecture behind modern LLMs or diffusion models, but they embody a critical concept in GenAI: Generative models rely on internal representations of patterns and probabilities.


