Video accessible from your Account page after purchase.
10+ Hours of Video Instruction
Overview
Learn how modern LLMs power and perform natural language processing.
Introduction to Transformer Models for NLP, Second Edition is designed to provide you with a deep understanding of how modern LLMs process natural language.
Lesson Descriptions
Lesson 1, Introduction to Attention and Language Models: Lesson 1 lays the groundwork for the entire course by tracing the path from traditional NLP to the attention revolution. Sinan explores how attention mechanisms solve the bottlenecks of earlier architectures. He also examines the encoder-decoder framework that enabled sequence-to-sequence learning and explains how language models fundamentally process text.
Lesson 2, How Transformers Use Attention to Process Text: In Lesson 2, Sinan dives into the mechanics that power modern LLMs. He breaks down tokenization and embeddings, how text becomes numbers, and then explores scaled dot-product attention, the elegant computation at the transformer's core. You learn how multi-headed attention captures diverse relationships, how masked attention mechanisms enable autoregressive generation, and how modern advancements push attention mechanisms even further.
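To make scaled dot-product attention concrete, here is a minimal NumPy sketch of the computation Lesson 2 describes. All function and variable names below are illustrative, not taken from the course materials:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the core transformer operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block disallowed positions
    # Row-wise softmax (subtracting the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Three toy token vectors with d_k = 4
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V
```

With Q, K, and V derived from the same token matrix, this is self-attention: each output row is a weighted mix of all token vectors, and each row of the attention weights sums to 1.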
Lesson 3, LLM Pre-Training and Recipes: Lesson 3 examines how raw neural networks become capable language models. Sinan walks you through the three-phase training pipeline and then contrasts BERT's masked language modeling with GPT's next-token prediction, two fundamentally different approaches that shape the field of natural language processing. You'll trace BERT's evolution to ModernBERT and explore the scaling laws that guide today's industry toward frontier systems.
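The two pre-training objectives Lesson 3 contrasts can be sketched in a few lines of Python. The token ids and mask id below are made up purely for illustration:

```python
# Toy token ids standing in for a tokenized sentence (ids are invented).
tokens = [5, 12, 7, 9, 3]
MASK_ID = 0

# GPT-style next-token prediction: the input is the sequence, and the
# target is the same sequence shifted left by one position.
gpt_inputs = tokens[:-1]   # [5, 12, 7, 9]
gpt_targets = tokens[1:]   # [12, 7, 9, 3]

# BERT-style masked language modeling: hide some positions and ask the
# model to recover only the hidden tokens.
masked_positions = [1, 3]
mlm_inputs = [MASK_ID if i in masked_positions else t
              for i, t in enumerate(tokens)]           # [5, 0, 7, 0, 3]
mlm_targets = {i: tokens[i] for i in masked_positions}  # {1: 12, 3: 9}
```

The shifted targets force GPT to predict left to right, while the masked targets let BERT attend to context on both sides of each gap.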
Lesson 4, Natural Language Generation with GPT and More: This lesson maps the generative LLM landscape. Sinan surveys the closed-source leaders, including GPT, Claude, and Gemini, examining their capabilities, pricing, and positioning. You'll then explore the open-weight ecosystem: Llama, Qwen, DeepSeek, Mistral, Kimi, and other models that have rapidly closed the performance gap. You will leave with the right context to choose the right model for your applications.
Lesson 5, Prompt Engineering as Craft: This lesson elevates prompting from casual instruction writing to systematic practice. Sinan explores why prompting remains underrated, even as models improve. Then he covers structured outputs and prompt chaining for complex workflows. You'll master chain-of-thought and few-shot techniques and learn how inference parameters like temperature and top-p shape model behavior. Prompting well is a skill.
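As a rough sketch of how the temperature and top-p parameters mentioned in Lesson 5 interact at inference time, here is a toy sampling function of our own, not code from the course:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Sample a token id after temperature scaling and nucleus (top-p) filtering."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature  # low T sharpens
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Nucleus sampling: keep the smallest set of tokens whose mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    filtered /= filtered.sum()
    return int(rng.choice(len(probs), p=filtered))
```

Lowering the temperature concentrates probability on the top logit (near-greedy decoding), while lowering top-p drops the low-probability tail entirely.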
Lesson 6, Alignment and Post-Training: Lesson 6 explores how base models become helpful assistants. Sinan examines supervised training on instruction data and then dives deep into RLHF, the original preference-optimization method. You learn about DPO, GRPO, and the modern post-training landscape that's making alignment more accessible than ever. The lesson closes by examining the alignment spectrum, from minimal safety to highly constrained behavior.
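For readers who want a feel for the preference-optimization math behind Lesson 6, here is a toy single-pair version of the DPO loss, written from its published formulation; the function name and arguments are ours:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    The margin rewards raising the chosen response's log-likelihood
    relative to a frozen reference model, and lowering the rejected one's.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference exactly, the margin is zero and the loss is log 2; favoring the chosen response over the rejected one drives the loss down.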
Lesson 7, Fine-Tuning Fundamentals: Lesson 7 equips you with practical fine-tuning skills. Sinan covers transfer learning principles and then focuses on LoRA and other parameter-efficient methods that make fine-tuning accessible without massive compute spend. You'll learn about model distillation for creating smaller, faster models and develop a framework for choosing the right architecture (encoder, decoder, or encoder-decoder) for your specific task.
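The core idea of LoRA from Lesson 7, a trainable low-rank update added to a frozen weight matrix, fits in a few lines of NumPy. Sizes and names below are toy assumptions, not course code:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                          # hidden size and LoRA rank, r << d
W = rng.normal(size=(d, d))          # pretrained weight, frozen during fine-tuning
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized
alpha = 16                           # LoRA scaling hyperparameter

def lora_forward(x):
    """Forward pass with the low-rank update: (W + (alpha/r) * B @ A) @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# With B zero-initialized, the adapted layer starts identical to the base layer,
# and only A and B (2*r*d parameters instead of d*d) are trained.
```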
Lesson 8, Multimodal Transformers: Lesson 8 extends beyond text to vision, audio, and unified multimodal systems. Sinan traces the path from the vision transformer to today's models that seamlessly handle images, text, and more. You explore current multimodal trends and come to understand how the same transformer principles that revolutionized NLP are also transforming how AI perceives and generates across modalities.
Lesson 9, Reasoning Models: Lesson 9 examines the new frontier of AI reasoning. Sinan explores how models like DeepSeek, GPT, and Claude with extended thinking actually work, trading inference speed for deeper problem solving. You learn how to benchmark reasoning capabilities and understand when these slower, more deliberate models justify their additional cost and latency.
Lesson 10, Deploying Transformer Models: Lesson 10 bridges the gap from prototype to production. It covers MLOps fundamentals for AI systems and then dives into quantization formats that enable local inference on consumer hardware. You'll learn about architectural patterns for putting models into production and the infrastructure considerations for serving them reliably at scale.
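The quantization idea behind Lesson 10's local-inference formats can be illustrated with symmetric int8 quantization: store one float scale plus 8-bit codes in place of 32-bit weights. This is a simplified sketch of our own, not the course's code:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization."""
    scale = np.abs(weights).max() / 127.0            # map the largest weight to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)   # ~1/4 the memory of float32 storage
```

Rounding bounds the per-weight error at half a quantization step; production formats refine this with per-channel or per-block scales.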
Lesson 11, RAG Plus Agents: This lesson covers the systems that extend LLMs beyond their training data. You build retrieval-augmented generation (RAG) pipelines from scratch and then explore AI agents for dynamic workloads. You learn about MCP for connecting agents to external tools and how to architect multi-agent systems for complex tasks. Then you learn how to implement the reasoning-and-action (ReAct) agent pattern that powers modern agent loops.
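The retrieval half of a RAG pipeline like the one built in Lesson 11 can be sketched with cosine similarity over document embeddings. The documents and random vectors below are placeholders; in a real pipeline the vectors would come from an embedding model:

```python
import numpy as np

# A hypothetical in-memory document store with made-up embeddings.
docs = ["Paris is the capital of France.",
        "The transformer was introduced in 2017.",
        "LoRA enables parameter-efficient fine-tuning."]
rng = np.random.default_rng(1)
doc_vecs = rng.normal(size=(len(docs), 8))

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are most cosine-similar to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def build_prompt(question, query_vec):
    """Augment the prompt with retrieved context before calling the LLM."""
    context = "\n".join(retrieve(query_vec, k=1))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The augmented prompt grounds the model's answer in retrieved text instead of relying solely on its training data.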
Lesson 12, Evaluating LLMs and AI Systems: This lesson addresses one of the hardest problems in applied AI: knowing whether your AI system actually works. Sinan explores why evaluation is harder than it seems, covers techniques for assessing generative outputs, and examines the LLM-as-a-judge approach alongside human evaluation. You learn how to navigate benchmarks and leaderboards and how to track the production metrics that actually matter, like cost, latency, and reliability.
Lesson 13, The Future of AI: In this final lesson, Sinan looks beyond current paradigms to what might come next. He examines the limits of autoregressive generation and explores alternatives like diffusion LLMs that generate tokens in parallel. You'll learn about world models that predict without generating and gain perspective on the trajectories that will shape AI's next chapter.
Lesson 1: Introduction to Attention and Language Models
1.1 A brief history of modern NLP
1.2 Paying attention with attention
1.3 Encoder-decoder architectures
1.4 How language models look at text
Lesson 2: How Transformers Use Attention to Process Text
2.1 Tokenization and embeddings
2.2 Scaled dot product attention
2.3 Multi-headed attention
2.4 Masked attention
2.5 Modern advancements in attention
Lesson 3: LLM Pre-Training and Recipes
3.1 The LLM training recipe book
3.2 How BERT is pre-trained and the path to ModernBERT
3.3 How GPT is pre-trained: Next token prediction at scale
3.4 Scaling laws and modern pre-training
Lesson 4: Natural Language Generation with GPT and More
4.1 The closed-source generative LLM landscape
4.2 The open-weight generative LLM landscape
Lesson 5: Prompt Engineering as Craft
5.1 Why prompting is still underrated
5.2 Structured outputs and prompt chaining
5.3 Chain-of-thought and few-shot prompting
5.4 LLM prompting and inference parameters
Lesson 6: Alignment and Post-Training
6.1 Supervised post-training
6.2 RLHF: The original preference post-training
6.3 DPO, GRPO, and the modern post-training landscape
6.4 The alignment spectrum
Lesson 7: Fine-Tuning Fundamentals
7.1 Introduction to transfer learning
7.2 LoRA and efficient fine-tuning
7.3 Model distillation
7.4 Choosing the best AI architecture for the task
Lesson 8: Multimodal Transformers
8.1 From ViT to unified multimodal architectures
8.2 Multimodal AI trends
Lesson 9: Reasoning Models
9.1 How reasoning models work
9.2 Benchmarking reasoning models
Lesson 10: Deploying Transformer Models
10.1 Introduction to MLOps
10.2 Quantization formats and local inference
10.3 Putting models in production
Lesson 11: RAG Plus Agents
11.1 Introduction to retrieval augmented generation
11.2 Building a RAG pipeline from scratch
11.3 AI agents for dynamic workloads
11.4 MCP: Connecting agents to the world
11.5 Architecting multi-agent systems
Lesson 12: Evaluating LLMs and AI Systems
12.1 Why evaluation is harder than you think
12.2 LLMs-as-judges versus human evaluation
12.3 Benchmarks and leaderboards
Lesson 13: The Future of AI
13.1 The limits of autoregression
13.2 Diffusion and state space LLMs
13.3 What comes next
