AI-Powered Automation in Bug Hunting
You’ve probably heard the phrase “the sky’s the limit.” When it comes to using AI for bug bounty hunting, that statement couldn’t be more accurate. The possibilities are amazing, and AI has the potential to revolutionize the way vulnerabilities are detected and exploited.
Figure 11-1 illustrates a simplified hierarchy of how bug bounty workflow entities are related.
At the top of the hierarchy are the bug bounty platforms like Bugcrowd, HackerOne, Intigriti, and others. Next, the program is the specific bug bounty program offered by a company or organization on the platform. For example, a company may set up a program with Bugcrowd to allow researchers to test its web applications for vulnerabilities.
The root domain represents the main domain or scope of the program. For example, if the company’s main website is websploit.org, this would be the root domain in scope for bug testing. Programs often list which root domains are in scope for testing.
Beneath the root domain are subdomains, which are more specific areas of the website (such as api.websploit.org or admin.websploit.org). Bug bounty programs may specify which subdomains are part of the scope because certain subdomains may expose different services or applications.
The IP address represents the network layer, where subdomains are resolved to one or more IP addresses. This allows ethical hackers to probe network-level configurations, services, and open ports that could be vulnerable.
The URL represents specific web pages or API endpoints within the subdomain. These URLs may have parameters, authentication mechanisms, or other inputs that hackers target for testing vulnerabilities like XSS or SQL injection.
Ports represent the specific network ports on the IP address that could be open and potentially expose services. For example, port 80 (HTTP) and port 443 (HTTPS) are often tested.
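To make the hierarchy concrete, the following is a minimal sketch that splits a single URL into the levels described above (root domain, subdomain, port, path, and parameters). The function name decompose_url and the two-label root domain rule are assumptions made for illustration; a real tool would consult the Public Suffix List and the program's published scope.

```python
from urllib.parse import urlsplit, parse_qs

# Default ports per scheme, used when the URL does not name a port explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def decompose_url(url: str) -> dict:
    """Split a URL into the hierarchy levels described in the text."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # ASSUMPTION: the root domain is the last two labels of the host name.
    # This is wrong for suffixes such as "co.uk".
    root_domain = ".".join(labels[-2:]) if len(labels) >= 2 else host
    return {
        "root_domain": root_domain,
        "subdomain": host,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        # Parameter names are the inputs a tester would probe (XSS, SQLi, ...).
        "params": sorted(parse_qs(parts.query)),
    }


info = decompose_url("https://api.websploit.org/login?user=alice&next=/home")
print(info)
```

Each key in the returned dictionary corresponds to one level of the hierarchy, which is also how the database tables later in this section are organized.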
Let’s assume that you now want to automate the discovery, scanning, enumeration, and processing of these entities using AI agents. You can use platforms like LangGraph and LangGraph Studio to create agents that help accelerate these tasks with low-code requirements. Let’s take a look at the diagram in Figure 11-2.
Figure 11-2 represents the architecture of the building blocks for a system that automates reconnaissance and vulnerability discovery in bug bounty programs.
The platforms table represents the various bug bounty platforms such as Bugcrowd, HackerOne, and similar platforms. The following are the platforms fields:
id: A unique identifier for each platform (primary key)
name: The name of the platform (such as Bugcrowd, HackerOne)
The programs table represents individual bug bounty programs run by different organizations on the platforms. Each program specifies the scope and rules for identifying and reporting vulnerabilities. These are specific bug bounties listed under the platforms. The following are the programs fields:
id: A unique identifier for each program (primary key)
name: The name of the bug bounty program
platformId: A foreign key that links the program to a specific platform
The root domains are the primary domains that are in scope for a bug bounty program. Each program may cover one or multiple root domains that you are allowed to test. The following are the root domains fields:
id: A unique identifier for each root domain (primary key)
name: The name of the root domain (such as websploit.org)
programId: A foreign key linking the root domain to a specific bug bounty program
The subdomains table stores the subdomains under the root domains that are also in scope for the bug bounty. A root domain like websploit.org may have subdomains such as api.websploit.org, which are part of the same scope for vulnerability testing. The following are the subdomains fields:
id: A unique identifier for each subdomain (primary key)
name: The name of the subdomain
rootDomainId: A foreign key linking the subdomain to a specific root domain
The IPs table stores IP addresses associated with subdomains. These IP addresses are the targets that may be probed for vulnerabilities such as open ports, services, and misconfigurations. The following are the fields in the IPs table:
id: A unique identifier for each IP address of the corresponding subdomain
address: The IP address of the subdomain
subdomainId: A foreign key linking the IP address to a specific subdomain
The ports table contains information about open ports on specific IP addresses. Identifying open ports is critical for understanding the services running on the target, which can reveal vulnerabilities such as misconfigurations or exposure of sensitive services. The following are the fields for ports:
id: A unique identifier for each port
number: The port number (such as 80, 443, 22)
ipId: A foreign key linking the port to a specific IP
The URLs table tracks specific web addresses within the subdomains that may need further inspection for vulnerabilities such as XSS, CSRF, or SQL injection. The following are the URLs fields:
id: A unique identifier for each URL
address: The specific URL address (such as https://api.websploit.org/login)
ipId: A foreign key linking the URL to a specific IP address
The vulnerabilities table stores information about specific vulnerabilities found during the bug hunting process. This is one of the core elements of this automated system because it represents the outcomes of ethical hacking and bug hunting efforts. The following are the vulnerabilities fields:
id: A unique identifier for each vulnerability (primary key)
description: A textual description of the vulnerability (such as “SQL Injection in the login page”).
ipId: A foreign key linking the vulnerability to a specific IP
How can AI help with bug hunting in this architecture? AI can correlate information between different components, such as IPs, URLs, flagged vulnerabilities, and open ports, to paint a complete picture of potential attack surfaces. It can even analyze historical data from earlier scans of similar applications. AI could also predict which services or ports are more likely to have security flaws based on the type of system, configuration, or past exploitation data.
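The correlation step can be sketched without any model at all. The example below groups hypothetical port, URL, and vulnerability rows by IP address and ranks the IPs with a hand-picked weighting. The sample data, the function names, and the weights are all assumptions for illustration; a trained model would replace the scoring function.

```python
from collections import defaultdict

# Hypothetical scan results, shaped like rows of the ports, URLs, and
# vulnerabilities tables (each row points at an IP address).
ports = [("203.0.113.10", 80), ("203.0.113.10", 443), ("203.0.113.20", 22)]
urls = [("203.0.113.10", "https://api.websploit.org/login")]
vulnerabilities = [("203.0.113.10", "SQL Injection in the login page")]


def correlate_by_ip(ports, urls, vulnerabilities):
    """Group port, URL, and vulnerability rows under the IP they belong to."""
    summary = defaultdict(lambda: {"ports": [], "urls": [], "vulnerabilities": []})
    for ip, number in ports:
        summary[ip]["ports"].append(number)
    for ip, address in urls:
        summary[ip]["urls"].append(address)
    for ip, description in vulnerabilities:
        summary[ip]["vulnerabilities"].append(description)
    return dict(summary)


def rank_attack_surface(summary):
    """Order IPs from most to least interesting."""
    def score(data):
        # ASSUMPTION: a flagged vulnerability counts three times as much as
        # an open port or a known URL. These weights are arbitrary.
        return 3 * len(data["vulnerabilities"]) + len(data["urls"]) + len(data["ports"])
    return sorted(summary, key=lambda ip: score(summary[ip]), reverse=True)


summary = correlate_by_ip(ports, urls, vulnerabilities)
ranking = rank_attack_surface(summary)
print(ranking)
```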
AI models can be trained to identify common web vulnerabilities, such as XSS or SQL injection, by scanning and analyzing the URLs, parameters, and request-response patterns. AI can automate the process of identifying malicious input or output behavior, improving detection accuracy and reducing false positives.
AI can accelerate subdomain enumeration by using predictive models that recognize patterns in domain naming conventions, automatically suggesting potential subdomains or previously undiscovered targets. AI-driven DNS reconnaissance tools can quickly uncover new subdomains linked to root domains by cross-referencing large datasets from DNS history and certificates.
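As a rough illustration of pattern-based suggestion, the following sketch derives candidate subdomains from names that are already known. It is a rule-based stand-in for a predictive model, not a model itself, and the function name and the list of environment prefixes are assumptions. Every candidate would still need to be resolved and checked against the program's scope before any testing.

```python
def suggest_subdomains(known, root_domain, environments=("dev", "staging", "test")):
    """Suggest unseen subdomains by recombining known labels with common
    environment names (for example, api -> dev-api, api-dev)."""
    suffix = "." + root_domain
    # Keep only the label part of each known name, e.g. "api" from "api.websploit.org".
    labels = sorted({name[: -len(suffix)] for name in known if name.endswith(suffix)})
    candidates = set()
    for label in labels:
        for env in environments:
            candidates.add(f"{env}-{label}{suffix}")
            candidates.add(f"{label}-{env}{suffix}")
    # Do not suggest names that were already discovered.
    return sorted(candidates - set(known))


known = ["api.websploit.org", "admin.websploit.org"]
candidates = suggest_subdomains(known, "websploit.org")
print(candidates)
```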
What if we give the explanation of Figure 11-2 to tools such as Cursor, GitHub Copilot, or OpenAI’s models such as o1? I used OpenAI’s o1, and it created Python code that uses SQLAlchemy, an object-relational mapping (ORM) library. Example 11-2 shows the Python code. Additionally, the code includes functions to process and interact with the data, such as adding records and querying vulnerabilities.
EXAMPLE 11-2 AI-Generated Code to Implement the Database to Support the Use Case of Figure 11-2
from sqlalchemy import (
    create_engine, Column, Integer, String, ForeignKey, Text
)
# declarative_base is imported from sqlalchemy.orm (SQLAlchemy 1.4 and later)
from sqlalchemy.orm import sessionmaker, relationship, declarative_base

# Define the database
engine = create_engine('sqlite:///bug_bounty.db')
Base = declarative_base()

# Define the platforms
class Platform(Base):
    __tablename__ = 'platforms'
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)
    programs = relationship('Program', back_populates='platform')


class Program(Base):
    __tablename__ = 'programs'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    platform_id = Column(Integer, ForeignKey('platforms.id'))
    platform = relationship('Platform', back_populates='programs')
    root_domains = relationship('RootDomain', back_populates='program')


class RootDomain(Base):
    __tablename__ = 'root_domains'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    program_id = Column(Integer, ForeignKey('programs.id'))
    program = relationship('Program', back_populates='root_domains')
    subdomains = relationship('Subdomain', back_populates='root_domain')


class Subdomain(Base):
    __tablename__ = 'subdomains'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    root_domain_id = Column(Integer, ForeignKey('root_domains.id'))
    root_domain = relationship('RootDomain', back_populates='subdomains')
    ips = relationship('IP', back_populates='subdomain')


class IP(Base):
    __tablename__ = 'ips'
    id = Column(Integer, primary_key=True)
    address = Column(String)
    subdomain_id = Column(Integer, ForeignKey('subdomains.id'))
    subdomain = relationship('Subdomain', back_populates='ips')
    ports = relationship('Port', back_populates='ip')
    urls = relationship('URL', back_populates='ip')
    vulnerabilities = relationship('Vulnerability', back_populates='ip')


class Port(Base):
    __tablename__ = 'ports'
    id = Column(Integer, primary_key=True)
    number = Column(Integer)
    ip_id = Column(Integer, ForeignKey('ips.id'))
    ip = relationship('IP', back_populates='ports')


class URL(Base):
    __tablename__ = 'urls'
    id = Column(Integer, primary_key=True)
    address = Column(String)
    ip_id = Column(Integer, ForeignKey('ips.id'))
    ip = relationship('IP', back_populates='urls')


class Vulnerability(Base):
    __tablename__ = 'vulnerabilities'
    id = Column(Integer, primary_key=True)
    description = Column(Text)
    ip_id = Column(Integer, ForeignKey('ips.id'))
    ip = relationship('IP', back_populates='vulnerabilities')

# Create all tables
Base.metadata.create_all(engine)
# Create a session
Session = sessionmaker(bind=engine)
session = Session()

# Functions to process data
def add_platform(name):
    platform = Platform(name=name)
    session.add(platform)
    session.commit()
    print(f"Added platform: {name}")


def add_program(name, platform_name):
    platform = session.query(Platform).filter_by(name=platform_name).first()
    if not platform:
        print(f"Platform {platform_name} not found.")
        return
    program = Program(name=name, platform=platform)
    session.add(program)
    session.commit()
    print(f"Added program: {name} under platform {platform_name}")


def add_root_domain(name, program_name):
    program = session.query(Program).filter_by(name=program_name).first()
    if not program:
        print(f"Program {program_name} not found.")
        return
    root_domain = RootDomain(name=name, program=program)
    session.add(root_domain)
    session.commit()
    print(f"Added root domain: {name} under program {program_name}")


def add_subdomain(name, root_domain_name):
    root_domain = session.query(RootDomain).filter_by(name=root_domain_name).first()
    if not root_domain:
        print(f"Root domain {root_domain_name} not found.")
        return
    subdomain = Subdomain(name=name, root_domain=root_domain)
    session.add(subdomain)
    session.commit()
    print(f"Added subdomain: {name} under root domain {root_domain_name}")


def add_ip(address, subdomain_name):
    subdomain = session.query(Subdomain).filter_by(name=subdomain_name).first()
    if not subdomain:
        print(f"Subdomain {subdomain_name} not found.")
        return
    ip = IP(address=address, subdomain=subdomain)
    session.add(ip)
    session.commit()
    print(f"Added IP: {address} under subdomain {subdomain_name}")


def add_port(number, ip_address):
    ip = session.query(IP).filter_by(address=ip_address).first()
    if not ip:
        print(f"IP {ip_address} not found.")
        return
    port = Port(number=number, ip=ip)
    session.add(port)
    session.commit()
    print(f"Added port: {number} to IP {ip_address}")


def add_url(address, ip_address):
    ip = session.query(IP).filter_by(address=ip_address).first()
    if not ip:
        print(f"IP {ip_address} not found.")
        return
    url = URL(address=address, ip=ip)
    session.add(url)
    session.commit()
    print(f"Added URL: {address} to IP {ip_address}")


def add_vulnerability(description, ip_address):
    ip = session.query(IP).filter_by(address=ip_address).first()
    if not ip:
        print(f"IP {ip_address} not found.")
        return
    vulnerability = Vulnerability(description=description, ip=ip)
    session.add(vulnerability)
    session.commit()
    print(f"Added vulnerability to IP {ip_address}: {description}")


def get_vulnerabilities_by_program(program_name):
    program = session.query(Program).filter_by(name=program_name).first()
    if not program:
        print(f"Program {program_name} not found.")
        return []
    # Walk the hierarchy: vulnerability -> IP -> subdomain -> root domain
    vulnerabilities = (
        session.query(Vulnerability)
        .join(IP)
        .join(Subdomain)
        .join(RootDomain)
        .filter(RootDomain.program == program)
        .all()
    )
    return vulnerabilities


# Example usage
if __name__ == '__main__':
    # Add data
    add_platform('HackerOne')
    add_program('Acme Corp Bug Bounty', 'HackerOne')
    add_root_domain('acme.com', 'Acme Corp Bug Bounty')
    add_subdomain('api.acme.com', 'acme.com')
    add_ip('192.168.1.1', 'api.acme.com')
    add_port(80, '192.168.1.1')
    add_port(443, '192.168.1.1')
    add_url('https://api.acme.com/login', '192.168.1.1')
    add_vulnerability('SQL Injection in login page', '192.168.1.1')

    # Retrieve vulnerabilities for a program
    vulns = get_vulnerabilities_by_program('Acme Corp Bug Bounty')
    for vuln in vulns:
        print(f"Vulnerability ID: {vuln.id}, Description: {vuln.description}")
The code in Example 11-2 provides a foundational structure for your automated system. You can extend it by adding more processing functions, integrating actual reconnaissance tools, and implementing vulnerability scanning methods.
AI Capabilities of Bug Bounty Platforms
Bug bounty platforms such as Bugcrowd are leveraging AI to enhance their services, streamline operations, and offer additional security solutions to their clients. Integrating AI into these platforms enables continuous monitoring, automated vulnerability detection, and intelligent threat analysis, among other capabilities. For example, Bugcrowd has an AI-driven service called Continuous Attack Surface Penetration Testing (CASPT).
CASPT uses AI models to continuously scan and identify new or modified assets within an organization’s digital footprint. This effort includes detecting new applications, services, or infrastructure components that may have been added or altered since the last assessment.
AI helps establish a baseline of the organization’s attack surface by analyzing historical data. It then monitors for deviations or changes so that any new vulnerabilities introduced by modifications are promptly identified. Based on the continuous monitoring of assets, AI determines optimal times to initiate penetration tests. Doing so helps ensure that testing is both timely and relevant, focusing on areas with the highest risk or the most recent changes.
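The baseline-and-deviation idea can be illustrated with a plain set comparison. This is only a sketch of the concept and not how CASPT is implemented. The asset representation as (host, port) pairs, the sample data, and the function name are assumptions.

```python
def diff_assets(baseline: set, current: set) -> dict:
    """Compare two asset snapshots and report what appeared or disappeared."""
    return {
        "added": sorted(current - baseline),    # new exposure, candidate for testing
        "removed": sorted(baseline - current),  # retired or no longer reachable
    }


# Hypothetical snapshots: each asset is a (host, port) pair.
baseline = {("api.websploit.org", 443), ("admin.websploit.org", 443)}
current = {("api.websploit.org", 443), ("api.websploit.org", 8080)}

changes = diff_assets(baseline, current)
print(changes)
```

In a fuller system, a nonempty "added" list could be the trigger for scheduling a new test against the changed assets.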
Bugcrowd’s acquisition of a company named Informer added advanced external attack surface management (EASM) capabilities to its platform. AI algorithms merge detailed asset information from Informer with Bugcrowd’s existing vulnerability databases. You can initiate new penetration tests directly from the EASM dashboards, with AI managing the integration and ensuring that tests are aligned with the current state of the attack surface.
A Comprehensive View of an Organization’s External Risk Exposure
An organization’s attack surface extends far beyond what is readily visible. A typical modern ecosystem consists of web domains, subdomains, IP addresses, cloud services, APIs of hundreds of applications (including AI applications), and more, each potentially serving as an entry point for an attacker. The first step in automated external attack surface management is to gain visibility into both known and unknown assets that an adversary might exploit. AI-powered tools can scan, map, and inventory these assets, giving you an “attacker’s perspective.”
Figure 11-3 illustrates a process flow or a framework for external attack surface discovery and exploitation. The figure is structured with concentric circles, which could symbolize different layers or steps in the process. Each layer is associated with one of four core concepts: discovering assets, monitoring changes, getting actionable insights, and amplifying security testing.
The first goal is to find and identify all external assets, known and unknown, that could be part of an organization’s footprint. Then you continuously watch for any alterations or updates in the external infrastructure or applications, ensuring that no new exposures go unnoticed. Subsequently, you turn the data gathered through monitoring into meaningful insights, which you can use to formulate your attack plan. You could integrate this process with red teaming, penetration testing, and crowdsourced bug bounties.
Vulnerability Prioritization Using AI
AI can assist in prioritizing vulnerabilities based on potential exploitability and risk. By analyzing factors such as Common Vulnerability Scoring System (CVSS) scores, the affected services, historical exploit data, and business impact, AI can help security teams focus on the most critical vulnerabilities first.
Let’s start by defining CVSS, the Exploit Prediction Scoring System (EPSS), and CISA’s Known Exploited Vulnerabilities (KEV) Catalog, and then examine how AI can assist in vulnerability prioritization using these systems.
A Quick Refresher About CVSS, EPSS, and CISA’s KEV
CVSS is an open framework for communicating the characteristics and severity of software vulnerabilities. It provides a way to capture the principal technical characteristics of a vulnerability and produce a numerical score reflecting its severity (0–10). The CVSS score can then be translated into a qualitative representation (such as low, medium, high, and critical) to help organizations properly assess and prioritize their vulnerability management processes. You can access the latest CVSS specification at https://www.first.org/cvss.
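The translation from a numerical score to a qualitative rating is a simple range lookup. The following sketch uses the qualitative severity rating scale published in the CVSS specification (None, Low, Medium, High, Critical); the function name is an assumption.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS score (0.0-10.0) to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 - 3.9
    if score < 7.0:
        return "Medium"    # 4.0 - 6.9
    if score < 9.0:
        return "High"      # 7.0 - 8.9
    return "Critical"      # 9.0 - 10.0


for example_score in (0.0, 3.9, 6.9, 7.0, 9.8):
    print(example_score, cvss_severity(example_score))
```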
CVSS version 4.0 consists of four metric groups: Base, Threat, Environmental, and Supplemental. Base metrics represent the intrinsic qualities of a vulnerability. The Threat metric group represents the traits of a vulnerability connected to potential threats, which can evolve over time, though they may remain consistent across different user environments.
The Environmental metric group captures the features of a vulnerability that are specific and relevant to an individual consumer’s environment. This includes factors such as the existence of security controls that could reduce or eliminate the impact of a successful attack, as well as the criticality of the affected system within the overall technology infrastructure. The Supplemental metric group consists of metrics that offer additional context and measure external attributes of a vulnerability. The answers to these metrics are defined by the CVSS consumer, enabling them to use a risk analysis system tailored to their environment to assign locally relevant severity to the metrics and values.
EPSS is a data-driven effort for estimating the probability that a software vulnerability will be exploited in the wild. Unlike CVSS, which focuses on the technical severity of a vulnerability, EPSS aims to predict the likelihood of a vulnerability being exploited based on real-world data and machine learning techniques. You can access the latest EPSS specification at https://www.first.org/epss.
The following are a few key features of EPSS:
Provides a probability score between 0 and 1 (0% to 100%)
Uses current threat information and real-world exploit data
Updates scores daily based on new information
Helps prioritize vulnerabilities based on exploitation likelihood rather than just technical severity
The KEV Catalog is maintained by the Cybersecurity and Infrastructure Security Agency (CISA) and serves as an authoritative source of vulnerabilities that have been exploited in the wild. It is designed to help organizations prioritize remediation efforts for vulnerabilities that pose significant risks. You can access CISA’s KEV at https://www.cisa.gov/kev.
KEV focuses on vulnerabilities for which there is reliable evidence of active exploitation in the wild, and it provides a remediation due date for each entry. The catalog is updated regularly as newly exploited vulnerabilities are identified.
How AI Can Help Bug Hunters and Organizations Prioritize Vulnerabilities Using CVSS, EPSS, and KEV
AI can help integrate data from CVSS, EPSS, and KEV along with other relevant sources to create a comprehensive view of vulnerability risks. Machine learning algorithms can analyze this data to identify patterns and correlations that humans might miss, providing a more nuanced understanding of vulnerability priorities. Figure 11-4 shows how you could build AI systems to analyze and prioritize vulnerabilities on a large scale using data from CVSS, EPSS, and KEV.
By leveraging historical data from CVSS, EPSS, and KEV, AI can develop predictive models to estimate the likelihood of future exploits. AI can consider the specific context of an organization or bug hunter’s environment when prioritizing vulnerabilities. By analyzing factors such as the organization’s network architecture, deployed assets, and security controls, you can use AI to provide tailored prioritization recommendations to your client that go beyond generic CVSS or EPSS scores.
AI-enabled systems can continuously monitor threat intelligence feeds, including updates to the KEV Catalog, and automatically adjust vulnerability discovery priorities based on new information. You can leverage GenAI with NLP capabilities to analyze vulnerability descriptions, security advisories, and exploit discussions in forums to extract additional context and severity indicators. This information can be used to refine prioritization beyond what CVSS, EPSS, and KEV provide numerically.
By analyzing the characteristics of vulnerabilities that have progressed from CVSS scoring to EPSS high-probability to KEV listing, you can potentially predict which newly discovered vulnerabilities are most likely to follow a similar path. This analysis can help prioritize vulnerabilities that may not yet have high EPSS scores or KEV entries but have characteristics suggesting they may soon become critical.
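Before any model is involved, the three signals can be combined with a simple ordering rule, which also serves as a baseline to compare a trained model against. The sketch below ranks findings by KEV membership first, then EPSS probability, then CVSS score. The ordering rule, the Finding class, and the placeholder identifiers and scores are assumptions for illustration and do not describe real vulnerabilities.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    cve_id: str    # placeholder identifier, not a real CVE
    cvss: float    # severity, 0.0 - 10.0
    epss: float    # probability of exploitation, 0.0 - 1.0
    in_kev: bool   # listed in CISA's KEV Catalog


def prioritize(findings):
    """Order findings from most to least urgent.

    ASSUMPTION: known exploitation (KEV) outranks predicted exploitation
    (EPSS), which outranks technical severity (CVSS).
    """
    return sorted(findings, key=lambda f: (f.in_kev, f.epss, f.cvss), reverse=True)


findings = [
    Finding("CVE-EXAMPLE-A", cvss=9.8, epss=0.02, in_kev=False),
    Finding("CVE-EXAMPLE-B", cvss=7.5, epss=0.91, in_kev=False),
    Finding("CVE-EXAMPLE-C", cvss=6.5, epss=0.40, in_kev=True),
]

ordered = prioritize(findings)
for finding in ordered:
    print(finding.cve_id, finding.in_kev, finding.epss, finding.cvss)
```

Note how the finding with the highest CVSS score ends up last: it is severe on paper, but neither known nor predicted to be exploited.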
Based on the vulnerability type and target system, AI could generate basic exploit templates or code skeletons, which an ethical hacker could then refine and customize. AI can parse and analyze large volumes of technical documentation, APIs, and code comments to identify potential weak points or unintended functionality that could be exploited.
AI-Created Scanner Templates
The ProjectDiscovery Nuclei Scanner is a fast and extensible open-source vulnerability scanner designed to identify security vulnerabilities, so that they can be mitigated, across modern applications, infrastructure, cloud platforms, and networks.
Nuclei uses a template-driven approach, where each template is a YAML file that defines the steps needed to detect specific vulnerabilities. This approach allows for highly customizable and targeted scans. It also supports a massive library of community-curated templates, which are continually updated to include the latest vulnerabilities in many systems.
The ProjectDiscovery Cloud Platform (PDCP) is a cloud-based security solution targeted at delivering ongoing visibility into your external attack surface by identifying exploitable vulnerabilities and misconfigurations. It is designed to address multiple use cases and can scale to support the essential workflows that bug bounty hunters and security teams require to find vulnerabilities. You can access PDCP at https://cloud.projectdiscovery.io.
Figure 11-5 shows the PDCP dashboard.
As previously mentioned, Nuclei supports a massive library of community-curated templates. Figure 11-6 shows an example of a community template.
The Nuclei template in Figure 11-6 is designed to identify Azure OpenAI Service instances that are not configured to use private endpoints. It highlights that failing to use private endpoints can leave OpenAI service instances exposed to external attacks.
Figure 11-7 shows the interface of ProjectDiscovery’s Cloud Platform, where Nuclei templates can be created using AI. In this example, the interface displays a vulnerability scan template related to an insecure direct object reference (IDOR) vulnerability.
Example 11-3 shows the full template generated by the AI feature illustrated in Figure 11-7.
EXAMPLE 11-3 The AI-Generated Nuclei Scanner Template
id: userprofile-idor

info:
  name: Insecure Direct Object Reference (IDOR) in User Profile Page
  author: ProjectDiscoveryAI
  severity: high
  description: |
    The application exposes sensitive information of a user (ID: 2) who is not the
    authenticated user (session: abcd1234), leading to an IDOR vulnerability.

http:
  - raw:
      - |
        GET /profile?id=2 HTTP/1.1
        Host: {{Hostname}}
        User-Agent: Mozilla/5.0
        Cookie: session=abcd1234

    matchers-condition: and
    matchers:
      - type: word
        words:
          - "Welcome, otheruser"

      - type: status
        status:
          - 200
The AI model used in Example 11-3 generated a proof-of-concept (PoC) template that provides an HTTP request to potentially find IDORs. The generated request targets a profile page (/profile?id=2), indicating that by manipulating the id parameter in the URL, a user could access another user’s profile.
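The decision the template makes can be restated in a few lines of Python. This sketch mirrors the two matchers and the "and" condition from Example 11-3; it is not how Nuclei evaluates templates internally, and the function name is an assumption. It also assumes the word matcher is applied to the response body.

```python
def idor_template_matches(status_code: int, body: str) -> bool:
    """Return True when both matchers from Example 11-3 are satisfied."""
    word_match = "Welcome, otheruser" in body   # matcher of type "word"
    status_match = status_code == 200           # matcher of type "status"
    # matchers-condition: and -> every matcher must succeed
    return word_match and status_match


print(idor_template_matches(200, "<h1>Welcome, otheruser</h1>"))
print(idor_template_matches(403, "<h1>Welcome, otheruser</h1>"))
print(idor_template_matches(200, "<h1>Access denied</h1>"))
```

Because the session cookie and the expected greeting are hard-coded, a match only demonstrates the issue for that specific pair of accounts; the values would need to be adapted to the target application.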
After the template is generated, you can use it to scan a target application (websploit.org in the example shown in Figure 11-8).
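If you prefer to drive the scan from a script rather than the cloud interface, a minimal sketch could look like the following. It assumes the nuclei command-line tool is installed locally and that the template from Example 11-3 was saved as userprofile-idor.yaml (a hypothetical file name). It uses only the -u (target) and -t (template) options, and it should only be pointed at targets that are in scope.

```python
import shutil
import subprocess


def build_nuclei_command(target: str, template_path: str) -> list:
    """Build the argument list for scanning one target with one template."""
    return ["nuclei", "-u", target, "-t", template_path]


def run_scan(target: str, template_path: str) -> subprocess.CompletedProcess:
    """Run the scan and return the completed process (stdout holds the findings)."""
    if shutil.which("nuclei") is None:
        raise RuntimeError("nuclei is not installed or not on PATH")
    return subprocess.run(
        build_nuclei_command(target, template_path),
        capture_output=True,
        text=True,
        check=False,
    )


# Only the command is built and printed here; call run_scan() to execute it.
command = build_nuclei_command("https://websploit.org", "userprofile-idor.yaml")
print(" ".join(command))
```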








