Leveraging AI to Automate STIX Document Creation for Threat Intelligence
Standards such as Structured Threat Information eXpression (STIX) and Trusted Automated Exchange of Intelligence Information (TAXII) have been developed to provide a common language and secure transport for threat data. You can use AI to automatically generate STIX documents from unstructured threat data, reducing the manual effort involved in collecting, processing, and sharing intelligence.
Understanding STIX and TAXII
STIX is a standardized language designed to represent cyber threat intelligence in a consistent, machine-readable format. It allows organizations to describe entities such as indicators, threat actors, campaigns, and observed data, providing rich context and relationships that facilitate automated analysis and sharing. By using STIX, analysts can translate diverse threat information into a common format that both humans and machines can process effectively.
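To make the format concrete, the following minimal sketch builds a single STIX 2.1 Indicator object as a Python dictionary and serializes it to JSON. It uses only the standard library. The helper names (stix_timestamp, make_indicator) and the placeholder hash are assumptions made for illustration. The properties shown are the ones STIX 2.1 requires for an Indicator, plus an optional name. In practice, a library such as the OASIS stix2 Python package can create and check these objects for you.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt):
    """Format an aware datetime as a STIX timestamp: UTC, millisecond precision, 'Z' suffix."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def make_indicator(sha256, name):
    """Build a STIX 2.1 Indicator for a file SHA-256 hash (hypothetical helper)."""
    now = stix_timestamp(datetime.now(timezone.utc))
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",  # STIX ids are "<type>--<UUID>"
        "created": now,
        "modified": now,
        "name": name,  # optional, human-readable label
        "pattern": f"[file:hashes.'SHA-256' = '{sha256}']",  # STIX patterning language
        "pattern_type": "stix",
        "valid_from": now,
    }


if __name__ == "__main__":
    placeholder_hash = "a" * 64  # placeholder value, not a real malware sample
    print(json.dumps(make_indicator(placeholder_hash, "Example malicious file"), indent=2))
```

The pattern property is what makes the object machine-actionable: a detection tool can evaluate it against observed files without parsing any prose.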
TAXII, on the other hand, is an application-layer protocol that specifies how to exchange cyber threat intelligence (CTI) over HTTPS. TAXII 2.1 defines a request/response model built around collections, which clients use to retrieve and add STIX objects, and it reserves a publish/subscribe model called channels for a future version of the specification. These services allow organizations to securely share STIX-formatted threat intelligence with trusted partners.
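As a sketch of what a TAXII 2.1 exchange looks like on the wire, the following code builds (but does not send) a request for the objects in one collection. The API root URL and collection ID are placeholders, and build_objects_request is a hypothetical helper. The /collections/{id}/objects/ path, the application/taxii+json;version=2.1 media type, and the added_after and match[type] filters come from the TAXII 2.1 specification.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Media type that TAXII 2.1 clients send in the Accept header.
TAXII_MEDIA_TYPE = "application/taxii+json;version=2.1"


def build_objects_request(api_root, collection_id, added_after=None, object_type=None):
    """Build a TAXII 2.1 'Get Objects' request for one collection (not sent here)."""
    url = f"{api_root.rstrip('/')}/collections/{collection_id}/objects/"
    params = {}
    if added_after:
        params["added_after"] = added_after  # only objects added after this timestamp
    if object_type:
        params["match[type]"] = object_type  # e.g., "indicator"
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Accept": TAXII_MEDIA_TYPE}, method="GET")


if __name__ == "__main__":
    req = build_objects_request("https://taxii.example.com/api1/", "example-collection-id",
                                object_type="indicator")
    print(req.get_method(), req.full_url)
```

To send the request, you would pass it to urllib.request.urlopen() after adding whatever credentials your TAXII server requires, and then read the STIX objects from the objects list in the JSON response envelope. A client library such as the OASIS taxii2-client package handles these details, including paging, for you.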
Together, STIX and TAXII enable a robust, interoperable ecosystem for threat intelligence sharing, ensuring that critical information is both standardized and securely transmitted across various platforms and communities. You can find more information about the STIX and TAXII specifications at https://oasis-open.github.io/cti-documentation.
Using AI to Create STIX Documents
Advanced natural language processing (NLP) and transformer-based models have significantly changed how threat intelligence data is collected, processed, and shared. AI can:
Extract Key Information: By processing unstructured text from blogs, reports, dark web forums, and other sources, AI models can identify key indicators of compromise (IoCs), threat actor details, attack patterns, and contextual information.
Map Unstructured Data to STIX Objects: AI algorithms can leverage predefined mapping rules to convert extracted data into standardized STIX objects—such as indicators, observed data, and threat actors—thus automatically generating comprehensive STIX bundles.
Automate Document Generation: Transformer models (for example, OpenAI's GPT models, Anthropic's Claude, and Meta's Llama), particularly when fine-tuned on or prompted with cybersecurity content, can generate threat intelligence in STIX format. This capability is illustrated in Figure 4-5.
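The extraction and mapping steps can be illustrated with a deliberately simple, rule-based sketch. Here regular expressions stand in for the AI model in the extraction step; a real pipeline would use an NLP or LLM-based extractor that also captures context such as threat actor names. A small table of mapping rules then turns each extracted value into a STIX pattern. The function names, regexes, and rule tables are assumptions made for this example.

```python
import re

# Extraction rules (stand-in for an AI extractor): IoC category -> regular expression.
IOC_REGEXES = {
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
}

# Mapping rules: IoC category -> STIX 2.1 pattern template.
STIX_PATTERN_TEMPLATES = {
    "sha256": "[file:hashes.'SHA-256' = '{value}']",
    "ipv4": "[ipv4-addr:value = '{value}']",
}


def extract_iocs(text):
    """Return {category: sorted unique lowercase values} found in unstructured text."""
    return {
        category: sorted({match.lower() for match in regex.findall(text)})
        for category, regex in IOC_REGEXES.items()
    }


def map_to_stix_patterns(iocs):
    """Apply the mapping rules to produce one STIX pattern per extracted value."""
    return [
        STIX_PATTERN_TEMPLATES[category].format(value=value)
        for category, values in iocs.items()
        for value in values
    ]


if __name__ == "__main__":
    report = "The loader contacted 203.0.113.45 and dropped a file with an unknown hash."
    for pattern in map_to_stix_patterns(extract_iocs(report)):
        print(pattern)
```

Keeping the rules in data (two dictionaries) rather than in code makes it easy to add new IoC types, such as domain names or URLs, without touching the functions.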
A typical AI-driven process for creating STIX documents (illustrated, in large part, in Figure 4-5) might involve the following steps:
Data Collection: Gather unstructured threat intelligence from multiple sources (for example, security blogs, social media, dark web data).
Data Processing: Use AI models to extract relevant entities and attributes from the text.
Mapping to STIX: Apply mapping logic to convert these extracted elements into STIX-compliant objects.
Bundle Generation: Assemble the STIX objects into a coherent STIX bundle (a collection of interconnected threat intelligence elements).
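The mapping and bundle-generation steps can be sketched as follows. The code creates a Malware object, an Indicator for a file hash, and an indicates Relationship that connects them, and then wraps all three in a STIX 2.1 bundle. The helper names and sample values are hypothetical, and only the Python standard library is used.

```python
import json
import uuid
from datetime import datetime, timezone


def _now():
    """Current UTC time as a STIX timestamp with millisecond precision."""
    dt = datetime.now(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def _stix_object(obj_type, **props):
    """Create a STIX 2.1 object with the common required properties."""
    ts = _now()
    return {"type": obj_type, "spec_version": "2.1", "id": f"{obj_type}--{uuid.uuid4()}",
            "created": ts, "modified": ts, **props}


def build_bundle(family, sha256):
    """Map one malware family and sample hash to a STIX 2.1 bundle."""
    # STIX 2.1 requires is_family on Malware; name is required when is_family is true.
    malware = _stix_object("malware", name=family, is_family=True)
    indicator = _stix_object(
        "indicator",
        name=f"{family} sample hash",
        pattern=f"[file:hashes.'SHA-256' = '{sha256}']",
        pattern_type="stix",
        valid_from=_now(),
    )
    # Relationship object: the indicator "indicates" the malware family.
    relationship = _stix_object("relationship", relationship_type="indicates",
                                source_ref=indicator["id"], target_ref=malware["id"])
    # A bundle is only a container; in STIX 2.1 it has no spec_version property.
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}",
            "objects": [malware, indicator, relationship]}


if __name__ == "__main__":
    print(json.dumps(build_bundle("ExampleFamily", "a" * 64), indent=2))
```

The relationship object is what turns a flat list of indicators into connected intelligence: a consumer can follow source_ref and target_ref to learn which malware family a hash belongs to.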
The Python script in Example 4-1 uses OpenAI's models to generate STIX JSON documents from malware entries retrieved from the Malware Bazaar API: it fetches the latest entries and then asks the OpenAI API to generate a STIX JSON document for each one.
Example 4-1 Automatically Creating STIX JSON Documents Using AI
# Import Required Libraries
import os
import json
import requests
from openai import OpenAI

# Retrieve your OpenAI API key from environment variables.
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise ValueError("Please set your OPENAI_API_KEY environment variable.")

# Instantiate the OpenAI client.
client = OpenAI(api_key=api_key)

# Malware Bazaar API endpoint.
MALWARE_BAZAAR_API_URL = "https://mb-api.abuse.ch/api/v1/"

# Optional abuse.ch Auth-Key, sent in the Auth-Key header. abuse.ch APIs now
# expect one; the environment variable name used here is this listing's choice.
MALWARE_BAZAAR_AUTH_KEY = os.environ.get("MALWARE_BAZAAR_AUTH_KEY")


def get_recent_malware_entries(limit=5):
    """
    Retrieve recent malware entries from Malware Bazaar using the "selector": "100"
    (which returns the latest 100 additions) and then return only the first `limit`
    entries.
    """
    payload = {
        "query": "get_recent",
        "selector": "100",  # Using "100" to get the latest 100 additions.
    }
    headers = {"Auth-Key": MALWARE_BAZAAR_AUTH_KEY} if MALWARE_BAZAAR_AUTH_KEY else {}
    try:
        response = requests.post(
            MALWARE_BAZAAR_API_URL, data=payload, headers=headers, timeout=15
        )
        response.raise_for_status()
        data = response.json()
        if data.get("query_status") == "ok" and "data" in data:
            return data["data"][:limit]
        print("Malware Bazaar returned an error or no data:", data.get("query_status"))
        return []
    except requests.RequestException as e:
        print("Error contacting Malware Bazaar API:", e)
        return []


def generate_stix_document(malware_entry):
    """
    Use OpenAI's GPT model (via the new client interface) to generate a STIX 2.1
    JSON document from a single malware entry.
    """
    prompt = (
        "Convert the following malware intelligence entry into a valid STIX 2.1 "
        "JSON document. Include relevant STIX objects such as Malware, Indicator, "
        "and Observed Data with proper relationships. Ensure the output is valid "
        "JSON and conforms to STIX 2.1 standards.\n\n"
        "Malware Entry:\n"
        f"{json.dumps(malware_entry, indent=2)}\n\n"
        "Output the complete STIX JSON document."
    )
    try:
        chat_completion = client.chat.completions.create(
            model="gpt-4o-mini",  # Or any other larger or newer model.
            messages=[
                {
                    "role": "system",
                    "content": "You are an expert in cyber threat intelligence and STIX 2.1.",
                },
                {"role": "user", "content": prompt},
            ],
            temperature=0.1,
            # Newer reasoning models expect max_completion_tokens instead.
            max_tokens=16000,
        )
        # Use dot notation to access the response content.
        stix_json = chat_completion.choices[0].message.content
        return stix_json
    except Exception as e:
        print("Error generating STIX document:", e)
        return None


def main():
    # Retrieve the last 5 malware entries from Malware Bazaar.
    recent_entries = get_recent_malware_entries(limit=5)
    if not recent_entries:
        print("No recent malware entries found.")
        return
    for entry in recent_entries:
        sha256 = entry.get("sha256_hash", "unknown")
        print("Processing malware entry with SHA256:", sha256)
        stix_doc = generate_stix_document(entry)
        if stix_doc:
            file_name = f"stix_{sha256}.json"
            with open(file_name, "w", encoding="utf-8") as f:
                f.write(stix_doc)
            print(f"Saved STIX document to {file_name}\n")
        else:
            print("Failed to generate STIX document for this entry.\n")
    print("Completed processing recent malware entries.")


if __name__ == "__main__":
    main()
A copy of the script in Example 4-1 and its output are available in the GitHub repository at https://github.com/The-Art-of-Hacking/h4cker/tree/master/threat_intelligence. The following is a breakdown of what each part of the script in Example 4-1 does:
The first part of the script imports the required libraries: os for accessing environment variables, json for handling JSON data, requests for making HTTP requests, and the OpenAI class from the openai package. It then retrieves the OpenAI API key from the OPENAI_API_KEY environment variable and raises an error if the key isn't set.
An instance of the OpenAI client is created using the retrieved API key. This client will be used to interact with the OpenAI API for generating the STIX documents.
The script defines a constant MALWARE_BAZAAR_API_URL, which points to the Malware Bazaar API endpoint. The get_recent_malware_entries(limit=5) function sends a get_recent query with the selector "100", which asks the API for its 100 most recent additions, and returns only the first limit entries (five by default).
The generate_stix_document(malware_entry) function converts a single malware entry into a valid STIX 2.1 JSON document.
A prompt is constructed that includes instructions to convert the malware entry into a valid STIX JSON document. The prompt instructs the model to include relevant STIX objects like Malware, Indicator, and Observed Data.
The malware entry is added to the prompt in a nicely formatted JSON string.
A request is then made to OpenAI's API using the chat.completions.create method, specifying the model (gpt-4o-mini here, although a larger or newer model can be used), a system message and a user message to guide the model, and a low temperature (0.1) to make the output more consistent.
The function returns the text of the model's reply, which is expected to contain the generated STIX document, or None if an error occurs. The script does not validate this text, so a saved file may contain Markdown formatting or JSON that does not conform to STIX 2.1.
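The main() function retrieves the five most recent entries, calls generate_stix_document() for each one, and saves each result to a file named stix_<sha256>.json. Finally, the script checks whether it's being run as the main program (that is, not imported as a module) and, if so, calls main() to execute the workflow.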
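Because generate_stix_document() returns the model's text without checking it, a post-processing step is worth adding before a file is saved. The following sketch strips a Markdown code fence if the model wrapped its answer in one, parses the JSON, and performs a few basic structural checks on a STIX bundle. The function name and the specific checks are assumptions, not part of Example 4-1, and they fall well short of full STIX 2.1 validation; the OASIS stix2-validator tool checks far more thoroughly.

```python
import json
import re

# Optional Markdown code fence around the model output: three backticks, optional "json" tag.
FENCE_RE = re.compile(r"^\s*`{3}(?:json)?\s*\n(.*?)\n?\s*`{3}\s*$", re.DOTALL)
# STIX identifiers have the form "<object-type>--<UUID>".
ID_RE = re.compile(r"^([a-z0-9-]+)--[0-9a-fA-F-]{36}$")


def parse_stix_output(text):
    """Parse model output into a dict and apply basic STIX bundle checks.

    Raises ValueError if the output is not usable.
    """
    match = FENCE_RE.match(text)
    if match:
        text = match.group(1)
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc
    if not isinstance(doc, dict) or doc.get("type") != "bundle":
        raise ValueError("Top-level object is not a STIX bundle")
    objects = doc.get("objects")
    if not isinstance(objects, list) or not objects:
        raise ValueError("Bundle contains no objects")
    for obj in objects:
        if not isinstance(obj, dict):
            raise ValueError("Bundle contains a non-object entry")
        id_match = ID_RE.match(str(obj.get("id", "")))
        # The id prefix must match the object's declared type.
        if not id_match or id_match.group(1) != obj.get("type"):
            raise ValueError(f"Object has a missing or malformed id: {obj.get('id')}")
    return doc
```

In main(), you could call parse_stix_output(stix_doc) before writing the file, write json.dumps(doc, indent=2) instead of the raw text, and skip any entry that raises ValueError.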
Integrating AI with established standards in this way can improve efficiency and help keep threat intelligence timely, accurate, and actionable, provided that the generated documents are validated before they are shared.

