- Introduction
- Case Study 3: From RAG to Agents
- When Should You Use Workflows Versus Agents?
- Case Study 4: A (Nearly) End-to-End SDR
- Evaluating Agents
- Conclusion
Case Study 4: A (Nearly) End-to-End SDR
A sales development representative (SDR) is a sales professional (usually a human) focused on finding and qualifying potential customers for a company. SDRs are the initial point of contact for leads, engaging with them and determining whether they are a good fit for a product or service. SDRs nurture leads, often passing qualified ones to other sales team members for closing. In this case study, we will attempt to automate as much of this process as possible using real-world systems like Hubspot (a CRM), Resend (an email-sending service), Google (web search), and Firecrawl (a web-crawling API).
Agent 1: Lead Generation
The idea we’ll use as a guiding star is to create an agent for each section of the SDR’s job. Our first agent’s job will be to go on the internet and find potentially good leads. All agents will have a generic prompt indicating how they are part of an SDR function. The lead generation agent is also given a specific prompt, as shown in Prompt 4.1.
Prompt 4.1 gives the agent step-by-step instructions and reminds it of things that a human might find obvious, such as following the steps in order and not skipping any. It essentially tells the agent to find people who might use my book (the standard prompt has basic information about my book in it) and record their information in the CRM.
Figure 4.12 visualizes this agent and three of the tools I would want it to potentially have. However, before giving the agent access to tools like web search, web crawling, and contact management in the CRM, we should consider an increasingly popular method for letting agents discover tools—MCP.
Figure 4.12 The lead generation agent is tasked with finding potentially qualified leads by using simple web crawling and web searching tools alongside tools for updating the CRM.
MCP for Flexible Tool Discovery
In November 2024, Anthropic released the Model Context Protocol (MCP) to not much initial fanfare. Anthropic was attempting to standardize the way we introduce tools (and other resources—but mostly tools) to AI models. The idea behind MCP (visualized in Figure 4.13) is actually quite simple: The MCP server is an API server (in Python, JavaScript, or some other language—it doesn’t matter) that houses, among other things, definitions for tools and the capacity to execute the tools. The MCP server has API endpoints to list tools, execute tools, and so on. Put another way, MCP is a standardized API format placed in front of the tool definitions and execution code that makes it easier to develop agents across frameworks, programming languages, and businesses.
Figure 4.13 An agent is told about one or more MCP servers. On initialization, the agent reaches out to each one and asks which tools it offers. From there, everything is the same as far as the LLM is concerned: It has tools and functions it can decide to call, no matter where they came from.
When an AI agent “wakes up,” it reaches out to the MCP servers included in a list that it was given and asks each of those servers which tools it can offer. Each MCP server then tells the agent the names of the functions and the arguments they take. The code to actually execute each tool lives on the server, too. Once the AI agent grabs the tool definitions from the server, everything is effectively the same to the LLM—calling tools, writing content, and so on.
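Under the hood, this discovery handshake is just a pair of JSON-RPC exchanges. The sketch below uses illustrative payloads (simplified from the real protocol, with a hypothetical search query) to show roughly what the agent sends and what a server like ours answers with:

```python
# A simplified sketch of the MCP discovery handshake. The payloads below are
# illustrative JSON-RPC messages, trimmed from the real protocol for clarity.

# 1. On startup, the agent asks each configured server for its tools.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# 2. The server answers with tool names, descriptions, and argument schemas.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "web_search",
                "description": "Search the web for information.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }
        ]
    },
}

# 3. When the LLM decides to use a tool, the agent sends a call request;
#    the execution code lives entirely on the server.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "web_search", "arguments": {"query": "business school syllabi"}},
}

tool_names = [t["name"] for t in list_response["result"]["tools"]]
print(tool_names)  # ['web_search']
```

From the LLM's point of view, only step 2's output matters: once the tool definitions are in its context, it neither knows nor cares which server (or language) will execute the call.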
In summary, MCP is an agreed-upon standard that, if everyone follows it, can facilitate the development of tools for AI systems around the world. A developer in California can write an MCP server for a tool that they made (or didn’t!) and put it on GitHub, where a developer in Mumbai can find it and use it. In fact, in this case study, I wrote just the first two MCP servers; the third was created by a third-party team with no involvement from me. I just got to use it for free—how amazing!
There are caveats to MCP, of course. Clearly, tool design and selection are critical, as we saw with our SQL agent. Using the right tool can be efficient and is often a necessary step. Because MCP servers give an AI model access to a set of unknown external tools, however, the challenges will be compounded. The agent will inevitably encounter new tools with descriptions of varying quality. We can mitigate this risk by making sure the descriptions we write for our tools are sufficiently contextful, but even a single bad tool description can send agents down a completely wrong path.
With that warning, let’s define our first two MCP servers. These are the MCP servers that I wrote specifically for this case study.
MCP 1: Web Search + Web Crawling
The Python template for creating an MCP server generally has three parts:
An API route to list the tools available on the server (names, arguments, etc.)
An API route to accept arguments to execute a tool call
Logic to call and execute each tool
Listing 4.5 shows an abbreviated code snippet of our first MCP server, which delivers the ability to Google something (using the SerpApi product) and to crawl most websites on the planet (using Firecrawl’s web-crawling API to make this easier). Note that we are using the official Python MCP implementation here: https://github.com/modelcontextprotocol/python-sdk. Using this official implementation means that we just have to write the Python functions correlating to listing tools, calling and executing tools, and so on.
Listing 4.5 Creating the web research MCP server
import asyncio
import os
from typing import Any, Dict, List

from mcp.server.models import InitializationOptions
from mcp.server import NotificationOptions, Server
from mcp.types import Tool, TextContent
import mcp.types as types
...

# Import required libraries
from serpapi import GoogleSearch
from firecrawl import FirecrawlApp

# Create server instance
server = Server("research-mcp-server")

def search_with_serpapi(query: str) -> str:
    """Search the web for the query using SerpApi."""
    api_key = os.environ.get("SERP_API_KEY")
    if not api_key:
        return "Error: SERP_API_KEY environment variable is required"
    try:
        search = GoogleSearch({
            "q": query,
            "api_key": api_key,
            ...
        return "\n".join(formatted_results)
    except Exception as e:
        return f"Error performing search: {str(e)}"

def scrape_with_firecrawl(url: str, format_type: str = "markdown") -> str:
    """Scrape a webpage using Firecrawl."""
    api_key = os.environ.get("FIRECRAWL_API_KEY")
    if not api_key:
        return "Error: FIRECRAWL_API_KEY environment variable is required"
    if format_type not in ["markdown", "links"]:
        format_type = "markdown"
    try:
        firecrawl = FirecrawlApp(api_key=api_key)
        ...
        return response if response else "No content found"
    except Exception as e:
        return f"Error scraping URL: {str(e)}"

@server.list_tools()
async def handle_list_tools() -> List[Tool]:
    """List available research tools."""
    tools = []
    tools.append(Tool(
        name="web_search",
        description=(
            "Search the web for information using SerpApi. "
            "Returns top 3 search results with titles, snippets, and URLs."
        ),
        inputSchema={
            "type": "object",
            ...
    return tools

@server.call_tool()
async def handle_call_tool(name: str, arguments: Dict[str, Any]) -> List[types.TextContent]:
    """Handle tool calls for research operations."""
    if name == "web_search":
        query = arguments.get("query")
        if not query:
            return [types.TextContent(type="text", text="Error: Query is required")]
        result = search_with_serpapi(query)
        return [types.TextContent(type="text", text=result)]
    elif name == "scrape_website":
        url = arguments.get("url")
        format_type = arguments.get("format", "markdown")
        result = scrape_with_firecrawl(url, format_type)
        return [types.TextContent(type="text", text=result)]
    else:
        return [types.TextContent(type="text", text=f"Unknown tool: {name}")]

async def main():
    ...  # code to actually run the server

if __name__ == "__main__":
    asyncio.run(main())
That was a lot of code, but don’t worry: I won’t be showing much more MCP code from here on out (it’s all in the book’s GitHub). In Listing 4.5, you can see the three main parts: functions to run each of our two tools, a route to list the tools, and a route to accept arguments and execute a tool. With that, we have our first MCP server, giving an AI agent the ability to look things up and visit web pages.
Figure 4.14 visualizes our first MCP server with two tools. A lead generation tool will certainly need to look things up and visit web pages. It will also need a system of record to write everything down in.
Figure 4.14 Our simple homegrown MCP server has only two tools: web search via the Serp API (Googling something as an API) and web crawling (a service provided by Firecrawl).
MCP 2: Hubspot Management
This isn’t our first discussion about a system of record for an AI agent. In Chapter 3, we watched an AI system improve at a task over time after we gave it a notepad to write things down in. In this case, we don’t really need the agent to get better over time (at least for now), but we should be able to audit what is going on in the CRM. I chose Hubspot for this purpose (feel free to choose a different CRM) mostly because the APIs were easy to pick up. Figure 4.15 shows the five tools I wrote to connect to the Hubspot API:
Create contact: Create a new contact in the CRM.
Update contact: Update an existing contact.
Add note to contact: Add a free text note to a contact.
Retrieve notes for contact: List all notes for a given contact.
Fetch contacts: Use search criteria to grab lists of contacts.
Figure 4.15 Our more complicated homegrown MCP server has five tools for interacting with the CRM so the agents can keep us in the loop about their work.
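The five tools map naturally onto five MCP tool definitions. The sketch below is a hypothetical outline of how the CRM server might declare them; the tool names, descriptions, and required fields are my own illustrative choices, not taken from Hubspot's actual API or the book's codebase.

```python
# A hypothetical sketch of the CRM MCP server's tool list. Names and required
# arguments are illustrative stand-ins, not Hubspot's real API surface.
CRM_TOOLS = [
    {"name": "create_contact",
     "description": "Create a new contact in the CRM.",
     "required": ["email", "first_name", "last_name"]},
    {"name": "update_contact",
     "description": "Update an existing contact's properties.",
     "required": ["contact_id", "properties"]},
    {"name": "add_note_to_contact",
     "description": "Attach a free-text note to a contact.",
     "required": ["contact_id", "note"]},
    {"name": "get_notes_for_contact",
     "description": "List all notes recorded for a given contact.",
     "required": ["contact_id"]},
    {"name": "fetch_contacts",
     "description": "Search contacts by criteria (e.g., lifecycle stage).",
     "required": ["search_criteria"]},
]

print(len(CRM_TOOLS))  # 5
```

Note how every description spells out what the tool does in plain language: as discussed earlier, these descriptions are the only thing the agent has to go on when choosing a tool.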
I did try to find a Hubspot MCP server on the internet, but the few I found either didn’t work as written or didn’t have the tools I needed (in particular, the ability to create a contact). So, in this case, I asked an AI agent to write an MCP server for me given the first MCP I wrote as an example—and it worked great!
Now we have an agent with seven tools, and Figure 4.16 shows the current state of our lead generator agent. Now, we need to qualify these leads.
Figure 4.16 The Lead Generator agent gets its seven tools from two MCP servers.
Agent 2: Lead Qualification
Once the lead generation agent adds a new contact to the CRM, the lead qualifying agent (pictured in Figure 4.17) will take over. It will both double-check the work of the lead generation agent and check some more criteria that I added into the special prompt area. At a glance, it might seem as if the lead qualifier agent is redundant given the lead generation agent. They have the same tools and basically the same goal (to identify qualified leads), but the key difference lies in the context of the agent. Chapter 7 formally dives into the idea of context engineering—the concept of providing an AI model or agent with the necessary information, context, and tools to perform a task effectively.
Figure 4.17 Our lead qualifying agent will have the same tools as the lead generation agent. It will double-check the lead generator’s work while also doing more research.
The lead generator is given a small set of rules by which to judge candidates and is asked to pull potential leads from the wide world of the open internet. In contrast, the qualifier agent is given both a single person’s information and a longer set of requirements to check before moving on to the next stage. In other words, the lead generator has the easier job of pointing a finger at someone and saying, “They seem right”—a task that can be performed by a smaller, cheaper, faster LLM. The cost of a false positive in this case is low because we know a second agent will double-check its results. The lead qualifier agent should be a larger, slower, more expensive LLM because its job is arguably harder: It must go through several pieces of information; read and understand the candidate’s syllabus, CV, and other data; and make the final judgment whether the person should receive an email. The cost of a false positive is high here, because I don’t want to bother people who aren’t good fits for my book.
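One way to encode this cost/accuracy trade-off is a per-agent model configuration. The sketch below is hypothetical: the model names are placeholders for whatever small/fast and large/careful LLMs you have available, and the routing function is an illustrative helper, not part of the case study's codebase.

```python
# Hypothetical per-agent model routing. Model names are placeholders; the
# point is that the cheap, high-recall job and the expensive, high-precision
# job can be served by different LLMs.
AGENT_MODELS = {
    # High volume, low cost of a false positive: small and fast.
    "lead_generator": {"model": "small-fast-llm", "temperature": 0.7},
    # Final judgment, high cost of a false positive: large and careful.
    "lead_qualifier": {"model": "large-careful-llm", "temperature": 0.0},
}

def pick_model(agent_name: str) -> str:
    """Return the model configured for a given agent."""
    return AGENT_MODELS[agent_name]["model"]

print(pick_model("lead_qualifier"))  # large-careful-llm
```

Because each agent is a separate component, swapping the qualifier's model later (say, after evaluation shows it is over- or under-qualifying) is a one-line change that cannot regress the generator.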
Like the lead generation agent, the qualifying agent will have some extra information in its system prompt. A snippet of this information can be seen in Prompt 4.2.
Figure 4.18 shows that this agent will have the same MCP server access as the lead generation agent. In some cases, it might need to double-check some information online and update information/notes.
Figure 4.18 Our lead qualifier has the same MCP servers as the lead generator. Its job is to do deeper research on the leads and make sure they are good candidates.
At this point, the lead generator has pulled in some leads, and the lead qualifier has double-checked the information and marked the lead as qualifying. Now it’s time for a third agent to take over and send the initial cold email.
Agent 3: Lead Emailing
The lead emailing agent (Figure 4.19) involves more than just a system prompt change: It also gets a single-shot example (recall from Chapter 1 that a k-shot/few-shot prompt places in-context examples in the prompt to guide the AI) showing roughly how I want the email to sound (as seen in Prompt 4.3). The exact format of the email in the single-shot example is not as important here because we expect the agent to use a tool via MCP with structured inputs. What matters more is that the agent writes HTML-encoded emails and subject lines and sends them to the right tool.
Figure 4.19 The lead emailing agent needs to update the CRM, but also requires new capabilities to email leads on my behalf.
A quick note on Agent 3: A decent argument can be made for making this final agent a workflow. For one thing, Agent 3 isn’t given much agency. We know the lead is qualified given the previous two agents’ work, and all this agent has to do is write the email and call a few APIs in order—that sounds like a predefined pathway. The reason I wanted to make this a true agent rather than a workflow was simple: I assume the task is straightforward enough that a system prompt with clear instructions will suffice. In Chapter 5, we will start to put some of these assumptions to the test. For now, by choosing an agent over a workflow, we are effectively saying either “This task requires the agent to make decisions on the fly that are too complicated or near impossible to code in a predefined pathway” or “This task is easy and straightforward enough that we can save development cycles by simply attaching preexisting MCP servers to an LLM with proper context.”
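For contrast, here is a rough sketch of what the workflow version of Agent 3 could look like: a fixed pipeline where the only LLM involvement is drafting the email text. The function names, the `llm` callable, and the stub implementations are hypothetical stand-ins, not code from the case study.

```python
# A hypothetical workflow version of Agent 3: a predefined pathway where the
# only model "decision" is writing the email body. Everything else is plain code.
from typing import Callable, Dict

def email_workflow(lead: Dict[str, str],
                   llm: Callable[[str], str],
                   send_email: Callable[[str, str, str], None],
                   add_crm_note: Callable[[str, str], None]) -> str:
    # Step 1: draft the email with the LLM.
    body = llm(f"Write a short, friendly cold email to {lead['name']} "
               f"about the book, in HTML.")
    # Step 2: send it through the email API.
    send_email(lead["email"], "A book you might like", body)
    # Step 3: record the outreach in the CRM.
    add_crm_note(lead["id"], "Sent initial cold email.")
    return body

# Stubs so the sketch runs end to end.
sent = []
body = email_workflow(
    {"id": "42", "name": "Ada", "email": "ada@example.com"},
    llm=lambda prompt: "<p>Hi Ada!</p>",
    send_email=lambda to, subject, html: sent.append(to),
    add_crm_note=lambda contact_id, note: None,
)
print(sent)  # ['ada@example.com']
```

In the workflow version, the steps can never run out of order and never be skipped; the trade-off is that every step must be coded by hand rather than delegated to the model's judgment.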
Assuming this job is left up to an agent and not a workflow, this agent will need a way to send emails. MCP comes to the rescue once more.
MCP 3: Resend for Email
If you’re not familiar with it, Resend is a developer-friendly email platform with APIs that simplify sending emails at scale. What’s even cooler is that it has an official MCP server on its GitHub. Granted, at the time of writing there is just a single tool on that MCP server, but it’s the tool I most care about: send-email. I could write my own MCP server for Resend, just as I did with the CRM. However, I want to showcase that our system is agnostic regarding who writes MCP servers and in which language. The code launches the official Resend MCP server (written in TypeScript, not Python), and our agent will be able to use both it and our homegrown Pythonic MCP servers.
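Because MCP standardizes only the wire protocol, mixing languages is just a matter of configuration. The sketch below is a hypothetical server registry an agent could use to launch all three servers; the commands and file paths are illustrative, not the actual paths from the book's repository.

```python
# Hypothetical registry of the three MCP servers. Commands and paths are
# illustrative; the point is that the agent does not care what language a
# server is written in, only how to launch and talk to it.
MCP_SERVERS = {
    "research": {            # our homegrown Python server (search + crawl)
        "command": "python",
        "args": ["research_mcp_server.py"],
    },
    "hubspot": {             # our homegrown Python CRM server
        "command": "python",
        "args": ["hubspot_mcp_server.py"],
    },
    "resend": {              # the official TypeScript server, run via Node
        "command": "node",
        "args": ["resend-mcp/build/index.js"],
    },
}

languages = {name: cfg["command"] for name, cfg in MCP_SERVERS.items()}
print(languages["resend"])  # node
```

From the agent's perspective, all three entries look identical once the tools/list handshake completes: a set of tool names and schemas it can call.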
Figure 4.20 shows the topline view of our third agent. The web research MCP server has been replaced with our new MCP server (in theory, web research won’t be needed, given the past two agents’ work), and the agent is ready to send emails.
Figure 4.20 Our final MCP server has a single tool and was developed by the same team that created the email sending service Resend.
When to Use a Multi-Agent Versus a Single Agent
Why didn’t we just create a single agent with one long set of instructions walking through the whole process of generation, qualifying, and emailing? Certainly, we could have: We could just make that agent wake up and attempt to find, qualify, and email a new lead. In theory, we could spin up a hundred of them, giving each a different location and industry to focus on.
I chose a multi-agent system in this case study for the following reasons:
To reduce potential errors in which the agent emailed someone before qualifying them and adding notes to the CRM. In other words, what if the agent “forgets a step” along the way?
To minimize overlap in duties. What if two agents that aren’t talking to each other or are in some race condition end up trying to email the same person at the same time? That lead would not appreciate the double email and would probably be lost.
To leave room for experimenting with different flavors of LLMs (including LLMs fine-tuned for specific tasks) for each of the different tasks. Maybe GPT-5 is slower but better at emailing, whereas Llama 4 Scout is optimal for quicker lead generation.
It’s absolutely true that these tasks could be handled by a single agent, but by splitting them up, I gain more control over specific aspects of the funnel. This way, if qualifying is going poorly but everything else is okay, I can tweak and experiment on qualification alone without much risk to the other portions of the pipeline. With a single agent, every prompt change risks a regression in another area. For this reason, a fair analogy to multi-agent systems would be a microservice architecture. A change to a single service can be made in much greater isolation as long as we are aware of how that single service (agent) fits into the larger picture.
We have now designed and coded three agents, each handling a different portion of the sales process (Figure 4.21). However, we lack a mechanism to test this engine or to keep it running, constantly finding new leads, qualifying them, and emailing them. We will tackle the latter issue in Chapter 5. To gut-check the functionality of our agents, though, we should chat with them to see how they do.
Figure 4.21 The 10,000-foot view of our multi-agent task delegation. The lead generator will fill the top of the funnel, the qualifying agent will get enough information to be able to email qualified leads, and the emailing agent will send off that first cold email on my behalf.
Streamlit for Ad-Hoc Testing of the Three Agents
In the codebase for the multi-agent system, I created a visual interface where you can select one of the three agents and chat with it, asking it to perform tasks on demand. Figure 4.22 shows an example in which I asked an agent to list contacts in the CRM and it did so correctly (displaying the two fake contacts Hubspot created during account creation).
Figure 4.22 A Streamlit app (found in the book’s GitHub) showing basic capabilities of the agents for interacting with the CRM.
When I asked the lead generation agent to find a contact at a specific business school, it did. (This example isn’t shown here because it would invade that person’s privacy.) When I asked the qualifying agent to run on that contact, it changed the status and added a note to confirm the contact’s qualification. When I asked the email sending agent to send an email, it did; it then changed the contact’s status again, as expected.
At this point, I’m satisfied with the agents’ individual performances from basic ad-hoc testing. However, if we are to trust this system at scale, we need to consider a few more factors.
