Skip to content
Archwarden
Go back
Lab Completed

AI Threat Intel Agent — Part 2: Stack Awareness

Skills PythonNVD APICISA KEVAI AgentsThreat IntelligenceAutomationOllamaProxmox

CVE Agent

Overview

Part 1 ended with DANEEL running daily on a Proxmox LXC, generating a briefing, and emailing it every morning. The pipeline worked. The briefings were not useful.

The first real email arrived with a list of CVEs, a recommendation to “immediately assess and address critical findings,” and zero indication of whether any of it was relevant to anything I actually run. It was exactly the kind of generic noise I built this to replace.

Part 2 fixes that. DANEEL now knows what software is installed on my Mac, compares CVE affected version ranges against what I have, and classifies each match as VULNERABLE, PATCHED, NO PATCH, or UNCONFIRMED. The email only sends when there is something worth acting on.


The Problem

The first briefing arrived. The macOS version matched two CVEs. The recommendation was to update macOS. I was already on the latest version.

A daily email that tells you to update software you already updated is noise. After a few days of that, you start ignoring the emails — which defeats the entire point.

The fix requires three things:

  1. Inventory — know what software is installed and what version
  2. Version comparison — check whether the installed version is actually in the affected range
  3. Smart delivery — only send email when there is a real action to take

Building the Mac Inventory

The inventory system is built as a pluggable collector architecture. Each host type — macOS, Linux, Proxmox, TrueNAS — gets its own collector class. Adding a new host later is one entry in a config file; the agent handles the rest.

The Mac collector pulls from three sources:

The brew leaves distinction matters. brew list --versions returns every package in the dependency tree — brotli, c-ares, libnghttp2, and 30 other libraries nobody cares about for CVE purposes. brew leaves returns only what was explicitly installed. That takes the list from 51 entries to 23 actionable ones.

The inventory script runs on the Mac and writes stack.json. That file gets pushed to daneel before the morning run, giving the agent a live picture of what is installed.

# brew leaves -- only user-installed packages, not their dependencies
leaves_result = subprocess.run(["brew", "leaves"], capture_output=True, text=True)
top_level = {line.strip() for line in leaves_result.stdout.splitlines()}

First Test — Two Problems Immediately

Wiring the stack into the agent and running it produced 30 matches and a briefing full of CVEs from 2021.

Problem one: 30 matches. The keyword “code” from VS Code was matching “code execution” in every RCE description. “Node” was matching network CVEs. The keyword matching was far too broad.

First stack test -- 30 false positive matches and hallucinated CVEs

The fix: a stoplist of generic words that appear everywhere and mean nothing for matching purposes, plus a minimum keyword length of 5 characters. “code”, “node”, “manager”, “stream”, “browser” — all excluded.

Problem two: hallucinations. The briefing contained Log4j CVEs from 2021 with accurate-looking descriptions. Mistral was pulling from its training data instead of analyzing the CVE list we passed it. The system prompt needed a harder constraint:

CRITICAL: Only reference CVEs explicitly listed in the user message.
Do not add, invent, or recall CVEs from memory or training data.

After the fixes, matches dropped from 30 to 1.


The Generic Advice Problem

One match, correct CVE IDs — but the briefing read like a security awareness training module.

To mitigate these risks, it is essential to keep your software and systems up-to-date with the latest security patches and follow secure coding practices during development.

Second test -- correct CVEs, useless generic advice

Mistral 7B tends toward this when given an open-ended prompt. The fix is to stop asking it to write prose and instead give it a rigid template to fill in. Each section specifies the exact format, line count, and what information to include. The model becomes a form-filler rather than an essayist.

Third test -- structured output, specific findings

The briefing was now producing real CVE IDs with specific version numbers and actionable steps. Still not perfect — the LLM was still occasionally adding CVEs from the general list into the Stack Matches section — but the structure was working.


Version-Aware Matching

The keyword matcher tells you that a CVE mentions software in your stack. It does not tell you whether you are actually vulnerable.

NVD includes configurations data in each CVE entry with CPE (Common Platform Enumeration) records that specify affected version ranges:

{
  "vulnerable": true,
  "criteria": "cpe:2.3:o:apple:macos:*:*:*:*:*:*:*:*",
  "versionStartIncluding": "26.0",
  "versionEndExcluding": "26.5"
}

versionEndExcluding: "26.5" means 26.5 is the first safe version. If you are on 26.5, you are patched. If you are on 26.4, you are vulnerable. If there is no versionEndExcluding at all, no patch exists yet.

This produces four status values:

StatusMeaning
VULNERABLEYour version is in the affected range. A fix exists. Update now.
PATCHEDA fix exists and you are already on it. No action needed.
NO_PATCHNo fix version in NVD. A patch may not exist yet. Apply mitigations.
NO_DATANVD has not yet populated version ranges. Common for brand-new CVEs. Verify manually.

The matching and classification live in a separate matcher.py module to keep the agent code clean. Adding version comparison for a new host type later means extending the matcher, not rewriting the agent.

Fourth test -- version classification working, but Stack Matches still has LLM additions


Taking the Stack Matches Section Away from the LLM

The LLM kept adding entries from the general CVE list into the Stack Matches section even after multiple prompt revisions. Every instruction to “only include CVEs from the STACK MATCHES section above” got ignored.

I finally just decided to stop asking.

The Stack Matches section is now built entirely in Python from the classified match data. The LLM never sees it as something to write — it is already written before the LLM gets involved. The LLM only generates Critical Findings, Notable CVEs, and the Recommendation, where its analysis actually adds value.

# Stack Matches built in code -- no LLM, no hallucination possible
stack_section = self._build_stack_matches_section(partitioned)
llm_output    = self._call_llm(context, matches)

full_briefing = stack_section + "\n\n" + llm_output

Fifth test -- Stack Matches clean, no LLM additions

Stack Matches now contains exactly what the classifier found. Nothing else.


Platform Capability Keywords

The Bluetooth LE CVE (CVE-2026-8676, CVSS 8.8) did not match on the first pass. Bluetooth is not a piece of software in the installed apps list — it is a hardware capability of the Mac. The keyword “bluetooth” was not in the stack.

Every Mac has Bluetooth. Every Mac with Bluetooth LE is potentially affected by a Bluetooth LE CVE. The inventory system needed to account for OS-level capabilities alongside installed software.

The solution: when the host OS is macOS, the software map automatically includes capability keywords that map to the OS version for version comparison:

MACOS_CAPABILITY_KEYWORDS = [
    "macos", "mac", "apple", "safari", "webkit",
    "bluetooth", "wifi", "kernel", "thunderbolt",
    "gatekeeper", "filevault", "coreaudio",
]

All of these resolve to the macOS version for comparison. A CVE mentioning “bluetooth” in the context of macOS now correctly matches and compares against the installed macOS version.


Smart Email Logic

With version classification in place, sending an email every morning regardless of findings made no sense. The email logic now gates on status:

When the email arrives, it means something. A [REVIEW NEEDED] subject means look at the NVD links. An [ACTION REQUIRED] subject means update something today.

Days where everything is already patched produce no email and a saved briefing on daneel for the log.


Current State

The full stack-aware pipeline:

Future updates will extend inventory collection to the homelab — Proxmox, daneel, and TrueNAS — so the briefing covers the full environment, not just the Mac.


Key Takeaways



Next
AI Threat Intel Agent — Part 1: Building DANEEL