How I Use Wazuh + AI to Turn SIEM Alerts into Actionable Playbooks with LLMs

I describe a practical pipeline I built to move raw security signals into clear, repeatable steps that my team can run during incidents. I focus on hands-on setup and what I actually run in production, not just theory.

I rely on file integrity monitoring to spot new or changed files, then trigger a signature scan and containment. I combine those detections with an artificial intelligence model to add context that matters to the business.

My flow ties server and endpoint logs together so analysts see correlated evidence fast. I sketch the components, show where cloud and local options fit, and explain how LLMs speed triage while keeping humans in the review loop.

Covered Points

  • Practical pipeline: FIM → signature scan → automated response → enrichment.
  • Enrichment from a model reduces repetitive work and clarifies severity for analysts.
  • Logs are the backbone for correlation and faster hunting across Linux and Windows.
  • Balance cloud and local models to weigh privacy, cost, and latency.
  • The result: fewer false positives and faster mean time to respond.

Why I’m turning Wazuh alerts into AI-powered playbooks right now

The problem I solve is translating high-volume signals into precise, safe actions for responders. I process large amounts of security information so my team can focus on real risk, not noise.

How this helps: a model speeds identification by synthesizing logs and spotting subtle anomalies that humans can miss. That reduces false positives and shortens the detection-to-response loop.

I map clear roles so everyone knows who validates recommendations, who executes changes, and who documents outcomes. I also use search over historical logs to place a suspicious file or process in context before escalation.

  • I prioritize file integrity events and changes in sensitive directory paths.
  • I ask the model for concise mitigation steps, likely TTPs, and indicators for deeper hunting.
  • I prevent common error patterns by scoping automated response and forbidding broad deletions.

Result: narrower Wazuh agent configs, higher signal quality, and faster, better-documented responses that strengthen security posture over time.

My reference architecture: Wazuh server, agents, dashboards, and LLM options

I map a compact reference architecture that keeps logs flowing, models accessible, and endpoints monitored for fast triage.

I run central components (server, indexer, and dashboard) on Ubuntu 24.04. The baseline build is 4.12.0 with at least 16 GB RAM and 4 CPUs to support scalable log ingestion and efficient search.

I enroll Ubuntu and Windows 11 endpoint agents to capture diverse telemetry. Agent policies target key directories and file types so FIM events are consistent across OS types.

Model choices and deployment constraints

Models include a cloud option (OpenAI ChatGPT), a local Llama 3 via Ollama for private inference, and Claude Haiku via Amazon Bedrock integrated into the dashboard UI. I size local models to match CPU/RAM limits to keep response times predictable.

  • Logs and storage: store indexed events so detection rules and enrichment can reference historical records.
  • Security: separate keys from code and restrict connector privileges.
  • Validation: stage curl health checks and POST requests to verify connectors and model endpoints (see the sketch after the table below).

| Component | Example | Notes |
| --- | --- | --- |
| Central server | Ubuntu 24.04, 16 GB RAM, 4 CPUs | Holds server, indexer, dashboard for ingestion and search |
| Endpoint | Ubuntu / Windows 11 agents | Monitored directories standardized across OS |
| Model | OpenAI ChatGPT, Llama 3 (Ollama), Claude Haiku (Bedrock) | Mix of cloud and local inference; name groups for maintenance |
| Checks | curl health and _post tests | Validates connector responses and expected formats |
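
A minimal sketch of those staged checks, assuming default ports and lab credentials; replace hosts, users, and certificate handling for production:

```bash
# Indexer health (default port 9200; -k skips TLS verification in a lab only).
curl -sk -u admin:<INDEXER_PASSWORD> "https://127.0.0.1:9200/_cluster/health?pretty"

# Wazuh manager API (default port 55000): grab a token, then list enrolled agents.
TOKEN=$(curl -sk -u <API_USER>:<API_PASSWORD> -X POST \
  "https://127.0.0.1:55000/security/user/authenticate?raw=true")
curl -sk -H "Authorization: Bearer $TOKEN" "https://127.0.0.1:55000/agents?pretty=true"
```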

Finally, I document the minimal install and the follow-on steps so teams can replicate the setup. I keep names clear for models, connectors, and groups to simplify later updates and audits.

Preparing endpoints: Wazuh agent configuration and file monitoring scope

I set a tight monitoring surface so my team sees high-quality signals and fewer false positives. Small, intentional changes to agent scope cut noise and speed validation.

Ubuntu agent: syscheck directories and realtime monitoring

I add this snippet to /var/ossec/etc/ossec.conf to monitor home directories in real time:

<directories realtime="yes">/home</directories>

After editing, I restart the agent to apply configuration changes and watch for any error in the startup logs.
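
On a systemd-based Ubuntu agent that looks like:

```bash
sudo systemctl restart wazuh-agent            # apply the ossec.conf change
sudo tail -n 50 /var/ossec/logs/ossec.log     # look for syscheck/realtime errors on startup
```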

Windows agent: monitoring Users\Downloads and permissions

On Windows I add:

<directories realtime="yes">C:\Users\*\Downloads</directories>

I confirm service permissions so the agent can read long file names and nested paths. Then I run Restart-Service -Name wazuh.

Verifying logs: locating archives and parsing fields

Archives live in /var/ossec/logs/archives as date-based JSON or JSON.GZ files. I validate that new events appear there and on the Wazuh server.

I check parameters in ossec.conf to confirm the directory scope and then parse fields like file path, directory, and rule name for downstream automation.
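
As a quick illustration, here is a short Python pass over the archive that prints the fields I use downstream; exact field names such as syscheck.path and rule.description can vary by event type, so treat this as a starting point rather than a canonical schema:

```python
import gzip
import json
from pathlib import Path

ARCHIVE = Path("/var/ossec/logs/archives/archives.json")  # or a dated .json.gz file

def open_archive(path: Path):
    # Archives rotate into .gz; open either form transparently.
    return gzip.open(path, "rt") if path.suffix == ".gz" else path.open()

with open_archive(ARCHIVE) as fh:
    for line in fh:
        event = json.loads(line)
        syscheck = event.get("syscheck", {})
        if not syscheck:
            continue  # keep only FIM events for this pass
        print(
            event.get("agent", {}).get("name"),
            syscheck.get("path"),
            event.get("rule", {}).get("description"),
        )
```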

Active response pipeline: YARA detections enriched by ChatGPT for immediate action

I built a compact active response flow that ties YARA detection to enrichment and immediate remediation. The goal is clear: detect a suspicious file, enrich the finding, attempt a measured response, and write a consistent log entry for the server and analysts.

Installing YARA and rules

I install YARA from source, fetch community rules via curl and valhallaAPI, then set ownership to root:wazuh and permissions to 750. I validate that each rule contains description metadata so enrichment can reference human-friendly context.
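
For orientation, the commands look roughly like this; the YARA version, download URL, and rules directory are examples to substitute, and the rule fetch itself happens via curl or the valhallaAPI client as noted:

```bash
# Build YARA from source (example version; check the VirusTotal/yara releases page).
sudo apt-get install -y automake libtool make gcc pkg-config libssl-dev
curl -LO https://github.com/VirusTotal/yara/archive/refs/tags/v4.5.2.tar.gz
tar -xzf v4.5.2.tar.gz && cd yara-4.5.2
./bootstrap.sh && ./configure && make && sudo make install && sudo ldconfig

# Place rules where the active response script expects them, then lock down ownership.
sudo mkdir -p /var/ossec/ruleset/yara/rules
# ...fetch community rules here via curl or the valhallaAPI client...
sudo chown -R root:wazuh /var/ossec/ruleset/yara
sudo chmod -R 750 /var/ossec/ruleset/yara
```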

Ubuntu: yara.sh behavior

The yara.sh script reads parameters, waits for file writes to complete, runs YARA, captures output, and handles error conditions. On a match it attempts deletion and posts results to the model endpoint. All combined YARA and model text is appended to logs/active-responses.log.
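
A trimmed sketch of that script's shape, not the production version: it assumes the scanned file path arrives as the first argument and the API key sits in a root-readable file; newer Wazuh releases deliver active response input as JSON on stdin, so adapt the argument handling to your version:

```bash
#!/bin/bash
# yara.sh (sketch) -- scan the flagged file, enrich via the model API, attempt deletion, log.
FILE_PATH="$1"
RULES="/var/ossec/ruleset/yara/rules/combined.yar"        # assumed rules location
LOG="/var/ossec/logs/active-responses.log"
API_KEY="$(cat /var/ossec/etc/openai.key)"                # key stored outside the script

sleep 2   # give the file write time to complete before scanning

MATCHES="$(/usr/local/bin/yara -w -r "$RULES" "$FILE_PATH" 2>&1)"
[ -z "$MATCHES" ] && exit 0   # no match: nothing to enrich or remove

# Build the enrichment request safely with jq, then call the chat completions endpoint.
PAYLOAD="$(jq -n --arg f "$FILE_PATH" --arg m "$MATCHES" \
  '{model: "gpt-4o-mini",
    messages: [{role: "user",
                content: ("Summarize risk and concise mitigation for a YARA match on " + $f + ": " + $m)}]}')"
ENRICH="$(curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "$PAYLOAD" | jq -r '.choices[0].message.content // "enrichment unavailable"')"

if rm -f -- "$FILE_PATH"; then DELETED="yes"; else DELETED="no"; fi

echo "wazuh-yara: INFO - scanned_file: $FILE_PATH deleted: $DELETED chatgpt_response: $ENRICH" >> "$LOG"
```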

Windows: yara.py → yara.exe

I install Python and YARA, download Valhalla rules, and convert yara.py to yara.exe via PyInstaller for the Active Response bin. The script sends a POST with headers (Authorization, Content-Type), handles 401 invalid key responses, logs the event, and records deletion attempts.
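
The Windows helper follows the same pattern; here is a condensed Python sketch of the POST, the 401 handling, and the deletion record (endpoint, argument layout, key source, and log path are assumptions for illustration):

```python
import os
import sys

import requests

API_URL = "https://api.openai.com/v1/chat/completions"   # enrichment endpoint used in this sketch
LOG_PATH = r"C:\Program Files (x86)\ossec-agent\active-response\active-responses.log"  # assumed agent path

def enrich(matches: str, api_key: str) -> str:
    """POST the YARA result for enrichment; never let a bad key stall the pipeline."""
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    body = {"model": "gpt-4o-mini",
            "messages": [{"role": "user",
                          "content": f"Summarize risk and concise mitigation for: {matches}"}]}
    resp = requests.post(API_URL, headers=headers, json=body, timeout=30)
    if resp.status_code == 401:
        return "ERROR: invalid API key"
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def main() -> None:
    scanned_file, matches = sys.argv[1], sys.argv[2]       # assumed argument layout
    api_key = os.environ.get("OPENAI_API_KEY", "")         # key kept outside the source tree
    summary = enrich(matches, api_key)
    try:
        os.remove(scanned_file)
        deleted = "yes"
    except OSError:
        deleted = "no"
    with open(LOG_PATH, "a", encoding="utf-8") as fh:
        fh.write(f"wazuh-yara: INFO - scanned_file: {scanned_file} "
                 f"deleted: {deleted} chatgpt_response: {summary}\n")

if __name__ == "__main__":
    main()
```

PyInstaller then produces the single-file binary, for example with pyinstaller --onefile yara.py.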

| Platform | Script | Key behavior |
| --- | --- | --- |
| Ubuntu | yara.sh | External key, flags invalid key, writes logs |
| Windows | yara.exe | Headers, 401 handling, audit logging |
| Server | Log store | Harmonized fields for decoders |

Security notes

I store the API key outside source, explicitly handle invalid key responses so the pipeline never stalls, and verify consistent logs per endpoint. These steps keep the response predictable and auditable.

Server-side intelligence: custom decoders, rules, and the active response module

I centralize server-side parsing so each detection yields structured fields an analyst can trust.

Decoders in local_decoder.xml extract many named values from YARA text. I parse log_type, rule_name, rule_description, author, reference, date, threat_score, api_customer, file_hash, tags, minimum_YARA_version, scanned_file, chatgpt_response, and deletion indicators.

Decoders and rule mapping

I add custom decoders that extract the rule name, description, tags, and threat_score. I also capture the model-generated chatgpt_response field for downstream review.
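
A cut-down illustration of the shape of those decoders in local_decoder.xml, keyed to the log prefix the endpoint scripts write; the prefix and field layout are assumptions carried over from the script sketches above, so match them to the real active-responses.log lines:

```xml
<decoder name="yara">
  <prematch>wazuh-yara:</prematch>
</decoder>

<decoder name="yara_fields">
  <parent>yara</parent>
  <regex type="pcre2">wazuh-yara: (\S+) - scanned_file: (\S+) deleted: (\S+) chatgpt_response: (.+)</regex>
  <order>log_type,scanned_file,deleted,chatgpt_response</order>
</decoder>
```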

Rules and monitored paths

I write rules in local_rules.xml to trigger on FIM events in /home and C:\Users\*\Downloads (IDs 100300–100303) and on YARA groups (108000–108003).
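
For illustration, the FIM portion of local_rules.xml looks roughly like this; 550 and 554 are the stock syscheck rule IDs for modified and added files, and the path patterns and descriptions are assumptions to tune:

```xml
<group name="syscheck,">
  <rule id="100300" level="7">
    <if_sid>550,554</if_sid>
    <field name="file">/home</field>
    <description>File created or changed under /home - run YARA scan.</description>
  </rule>

  <rule id="100301" level="7">
    <if_sid>550,554</if_sid>
    <field name="file" type="pcre2">(?i)\\Users.+\\Downloads</field>
    <description>File created or changed under Users\Downloads - run YARA scan.</description>
  </rule>
</group>

<group name="yara,">
  <rule id="108000" level="0">
    <decoded_as>yara</decoded_as>
    <description>Grouping rule for YARA active response output.</description>
  </rule>
</group>
```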

Active response linkage

The active response module is configured in ossec.conf to run YARA and orchestrate the response when those rules fire. This ties detection to remediation and logging.
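
The wiring in the manager's ossec.conf takes roughly this shape; command and script names follow the earlier sections, and a matching command for yara.exe covers the Windows agents:

```xml
<command>
  <name>yara_linux</name>
  <executable>yara.sh</executable>
  <timeout_allowed>no</timeout_allowed>
</command>

<active-response>
  <command>yara_linux</command>
  <location>local</location>
  <rules_id>100300,100301</rules_id>
</active-response>
```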

“Version your configuration so changes to name mappings or expected values can be rolled back if an error appears.”

  • I verify logs after deployment to confirm normalized fields for dashboards and searches.
  • I specify the role each rule plays: detection, enrichment, or action.
  • I include checks for model fields so the pipeline stays consistent when enrichment is offline.

Wazuh + AI: Turn SIEM Alerts into Actionable Playbooks with LLMs

My approach starts by shaping short prompts that extract impact, scope, and clear fixes from each detection. I keep prompts focused so the model returns a compact mitigation suggestion, a risk value, and a short rationale.

Designing the prompt strategy

I use templates that feed the model: rule name, file path, file hash, timestamps, and a short excerpt from the detection message. These fields force the model to ground its content in the evidence I provide.
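
A stripped-down template of that kind, with illustrative wording and placeholders in braces:

```text
You are assisting a SOC analyst. Ground every statement in the evidence below.
Evidence:
- rule_name: {rule_name}
- file_path: {file_path}
- file_hash: {file_hash}
- first_seen / last_seen: {timestamps}
- detection_excerpt: {excerpt}
Return: a one-line summary, a risk rating (low/medium/high) with a one-sentence
rationale, and 2-3 remediation steps an operator can run on the affected host.
```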

Structuring outputs

Standardized outputs include a one-line response, key parameters, a value statement about risk reduction, and 2–3 remediation steps. Consistent structure reduces analyst decisions and speeds action.
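
In practice I ask for something close to this shape; the field names are my own convention for parsing, not a standard schema:

```json
{
  "summary": "One-line description of what the detection means for this host",
  "risk": "high",
  "risk_rationale": "Why this severity, tied to the evidence provided",
  "parameters": { "host": "endpoint-01", "file_path": "/home/user/payload.bin" },
  "remediation_steps": [
    "Isolate the host from the network",
    "Quarantine or delete the flagged file",
    "Search logs for the same file hash on other endpoints"
  ]
}
```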

Linking detection-to-action

I map each rule hit to a specific role and endpoint. That mapping becomes the playbook step so the operator sees who acts, which host to target, and what to run.

Documenting results

I log every model query and response, group changes, and playbook name versions. This creates an audit trail for security reviews and supports iterative improvements.

“Keep prompts short, explicit, and tied to evidence so recommendations remain practical and verifiable.”

  • I include a lightweight query step to confirm assumptions before any automated action.
  • I document naming conventions for playbooks and file contexts to aid search and maintenance.
  • I require the model to return parameters in a fixed JSON-like format to simplify parsing and logging.

Running LLMs locally: Llama 3 with Ollama for private, fast threat hunting

I host the model on my server so semantic search runs close to the logs and returns fast results.

Ollama setup and resource checks

I install Ollama using the curl installer, then pull llama3 and confirm CPU and RAM are sufficient for responsive inference.
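
The install and resource check, roughly as I run them; the one-line installer is Ollama's documented Linux route, and it is worth reviewing the script before piping it to a shell:

```bash
curl -fsSL https://ollama.com/install.sh | sh   # install Ollama on the server
ollama pull llama3                              # fetch the Llama 3 weights
ollama run llama3 "ping"                        # quick sanity check that inference responds
free -h && nproc                                # confirm RAM and CPU headroom before indexing
```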

threat_hunter.py: embeddings and FAISS

I enable archives at /var/ossec/logs/archives/archives.json. threat_hunter.py loads those files, builds embeddings with all-MiniLM-L6-v2, and creates a FAISS vector store for fast semantic search.
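
A minimal sketch of the indexing step inside threat_hunter.py, assuming sentence-transformers and faiss-cpu are installed; the real script adds reload commands, time filters, and the chat loop:

```python
import json

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

ARCHIVE = "/var/ossec/logs/archives/archives.json"

# Load raw events; keep a compact text form per event for embedding.
texts = []
with open(ARCHIVE) as fh:
    for line in fh:
        event = json.loads(line)
        texts.append(json.dumps({k: event.get(k) for k in ("rule", "agent", "syscheck", "data")}))

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(texts, convert_to_numpy=True, normalize_embeddings=True)

index = faiss.IndexFlatIP(embeddings.shape[1])   # inner product equals cosine on normalized vectors
index.add(embeddings.astype(np.float32))

# Semantic search: embed the analyst's question and pull the closest events.
query = model.encode(["unexpected executable written to a user's Downloads folder"],
                     convert_to_numpy=True, normalize_embeddings=True)
scores, ids = index.search(query.astype(np.float32), k=5)
for score, idx in zip(scores[0], ids[0]):
    print(round(float(score), 3), texts[idx][:200])
```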

WebSocket chatbot and commands

I run a WebSocket chatbot that accepts /help, /reload, /set days, and /stat. Short queries and focused messages yield better output and faster retrieval.
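
A skeleton of the command handling using the Python websockets library; the search() stub stands in for the FAISS lookup sketched above, and the host, port, and reload behavior are placeholders:

```python
import asyncio

import websockets

state = {"days": 7}

def search(question: str, days: int) -> str:
    # Placeholder for the FAISS-backed lookup shown earlier.
    return f"(top matches for: {question!r}, last {days} days)"

async def handle(websocket):
    async for message in websocket:
        if message == "/help":
            await websocket.send("Commands: /help /reload /set days <n> /stat, or ask a question")
        elif message == "/reload":
            await websocket.send("Archives reloaded")          # re-read archives.json in the real script
        elif message.startswith("/set days"):
            state["days"] = int(message.split()[-1])
            await websocket.send(f"Window set to {state['days']} days")
        elif message == "/stat":
            await websocket.send(f"Indexed window: {state['days']} days")
        else:
            await websocket.send(search(message, state["days"]))

async def main():
    async with websockets.serve(handle, "127.0.0.1", 8765):
        await asyncio.Future()   # run until cancelled

asyncio.run(main())
```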

Remote mode and secure access

Remote mode uses SSH with group permissions assigned to wazuh. I limit machine access, audit messages, and verify endpoint permissions before exposing file reads.

“Keep queries concise and ground them in evidence so retrieval surfaces meaningful clusters across time windows.”

  • I test that model output surfaces suspicious file activity and that different endpoints affect relevance.
  • I record initialization messages and validate the environment before running as a daemon.

LLM in the Wazuh dashboard: integrating Claude 3.5 Haiku via OpenSearch Assistant

I explain the steps to enable Claude 3.5 Haiku in Bedrock, wire it into OpenSearch Assistant, and make the chat available in the Wazuh dashboard.

First, I create an IAM user, generate access keys, and attach AmazonBedrockFullAccess plus a custom marketplace policy so the model can be invoked. I keep keys out of source and record them in a secrets store.

On the host I copy the OpenSearch dashboard plugins (observabilityDashboards, mlCommonsDashboards, assistantDashboards), set ownership and permissions, and enable assistant.chat.enabled. On the indexer I install opensearch-flow-framework and opensearch-skills so model calls can route correctly.

Connector, model group, deploy, and test

Using DevTools I set ML to run on any node, then POST a connector with secure headers, access_key, secret_key, region (use us-west-2 for Haiku), anthropic_version, and model parameters. I register a model group, deploy the model, and test with _predict to verify output and response time.
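
Condensed DevTools calls for that sequence follow. Treat them as a sketch against the ML Commons Bedrock connector blueprint rather than copy-paste requests: the Claude 3.5 Haiku model ID, the request_body template, and the placeholder IDs are assumptions to verify against current AWS and OpenSearch documentation.

```
PUT _cluster/settings
{ "persistent": { "plugins.ml_commons.only_run_on_ml_node": false } }

POST /_plugins/_ml/connectors/_create
{
  "name": "bedrock-claude-3-5-haiku",
  "protocol": "aws_sigv4",
  "credential": { "access_key": "<ACCESS_KEY>", "secret_key": "<SECRET_KEY>" },
  "parameters": { "region": "us-west-2", "service_name": "bedrock",
                  "anthropic_version": "bedrock-2023-05-31" },
  "actions": [{
    "action_type": "predict",
    "method": "POST",
    "url": "https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-3-5-haiku-20241022-v1:0/invoke",
    "headers": { "content-type": "application/json" },
    "request_body": "{\"anthropic_version\":\"${parameters.anthropic_version}\",\"max_tokens\":512,\"messages\":[{\"role\":\"user\",\"content\":\"${parameters.prompt}\"}]}"
  }]
}

POST /_plugins/_ml/model_groups/_register
{ "name": "bedrock_models", "description": "Models served through Amazon Bedrock" }

POST /_plugins/_ml/models/_register
{ "name": "claude-3-5-haiku", "function_name": "remote",
  "model_group_id": "<GROUP_ID>", "connector_id": "<CONNECTOR_ID>" }

POST /_plugins/_ml/models/<MODEL_ID>/_deploy

POST /_plugins/_ml/models/<MODEL_ID>/_predict
{ "parameters": { "prompt": "Reply with OK if you can read this." } }
```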

  • I register an agent, map that agent to OpenSearch Assistant, and refresh the Wazuh dashboard UI so analysts see the chat near logs.
  • I watch connector logs to catch authorization or region error responses and retry in us-west-2 if needed.
  • I maintain audit records of keys, group assignments, and model deployments so intelligence is traceable and compliant.

“Test _predict and logs first; a healthy connector produces consistent output and clear response codes.”

From alerts to repeatable runbooks: how I operationalize AI guidance

I operationalize model recommendations by turning them into precise, auditable runbooks. Each playbook lists the actor, the file or directory target, and the exact commands to run. This makes the guidance repeatable and reviewable.

Codifying actions: active response module steps, curl-based checks, and rollback plans

I map model output into an active response module entry that contains clear steps. Each step includes a dry-run command, a curl-based health check for the server, and a final remediation command for the matching file.

I always include an explicit rollback and approval gate for high-impact actions. Low-risk types get automated deletion when YARA matches and the model confidence is high. High-risk changes require human sign-off before the response executes.
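
One such entry, reduced to a sketch; the host, file path, rules path, and manager address are placeholders, and the destructive step sits behind the approval gate:

```bash
#!/bin/bash
# Runbook sketch: suspicious file under /home on endpoint-01 (illustrative values).
TARGET_HOST="endpoint-01"
TARGET_FILE="/home/user/payload.bin"

# 1. Dry run: rescan the file and show what would be removed; change nothing.
ssh "$TARGET_HOST" "yara -w -r /var/ossec/ruleset/yara/rules/combined.yar '$TARGET_FILE'"

# 2. Health check: confirm the manager API answers before acting.
curl -sk -o /dev/null -w '%{http_code}\n' "https://wazuh-manager:55000/"

# 3. Approval gate: high-impact actions stop here until a human signs off.
read -r -p "Approve remediation on $TARGET_HOST? (yes/no) " APPROVE
[ "$APPROVE" = "yes" ] || exit 1

# 4. Remediation, then the rollback note: restore from backup or quarantine if removal was wrong.
ssh "$TARGET_HOST" "rm -f -- '$TARGET_FILE'"
echo "Rollback: restore $TARGET_FILE from the endpoint backup or quarantine store."
```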

Measuring impact: fewer false positives, faster response, clearer mitigation playbooks

I track metrics on log volume, time-to-detect, and time-to-contain to demonstrate the value of each change. The Llama 3 threat_hunter.py app speeds log review, while Claude Haiku provides in-dashboard Q&A for analysts.

  • I codify a single type of action per scenario and map it to the responsible role and endpoint.
  • I standardize server-side validation so file remediation runs reliably across platforms.
  • I schedule periodic Wazuh restart windows to deploy tuned rules without disrupting operations.
  • I review prompts, decoders, and rules quarterly to reduce false positives and keep playbooks crisp.

“Automate small, reversible changes first; widen scope only after you measure consistent, low-risk outcomes.”

Conclusion

This method ties raw events to specific roles, machines, and commands so work is consistent and traceable. I combined detection, enrichment, and action to produce repeatable runbooks that teams can execute with confidence.

I keep disciplined configuration, schedule periodic Wazuh restart windows, and store each key and secret in a dedicated vault. I log all attempts, approvals, and the resulting log entries to satisfy audits.

Users see in-dashboard content and chat interactions tied to agent group mappings. I capture host and event context for every file or directory action. Human oversight remains central: operators validate suggestions, refine prompts, and maintain role clarity.

Lightweight post-checks and guardrails limit unintended changes. Next, I expect tighter feedback loops, broader coverage, and steady, measurable improvement across the pipeline.

FAQ

What is my goal when I combine Wazuh and artificial intelligence to create playbooks?

I aim to convert detection events into clear, repeatable remediation steps. I use large language models to enrich alerts with context, proposed commands, and role-based tasks so analysts can act faster and with confidence.

Why am I prioritizing AI-enhanced playbooks now?

I want to reduce mean time to resolution and lower cognitive load for security teams. With more telemetry and complex detections, I find automated intelligence helps triage, suggest safe fixes, and maintain audit trails.

What core components make up my reference architecture?

I run a server with the manager, an indexer (OpenSearch/Elasticsearch), and the dashboard on Ubuntu. Agents on Ubuntu and Windows report logs and file integrity events. For models I evaluate OpenAI ChatGPT, local Llama 3 via Ollama, and Anthropic Claude via Amazon Bedrock.

How do I configure monitored endpoints and file monitoring scope?

On Ubuntu agents I set syscheck directories and enable real-time monitoring for critical paths. On Windows I monitor Users\Downloads and other high-risk folders and ensure permissions allow the agent to read target files. I apply configuration changes centrally and push them to agents.

How do I verify logs and locate relevant fields?

I inspect archived logs and look for key fields: log, file, name, directory, server, and type. I use the dashboard and CLI tools to filter by agent ID, timestamp, and event type to confirm telemetry completeness before enrichment.

What does my active response pipeline for YARA detections look like?

I deploy YARA rules on endpoints, capture detections, and pass metadata to an LLM for enrichment. The pipeline runs an Ubuntu shell helper (yara.sh) or a Windows helper (yara.py → yara.exe) to gather context, call the model API, and take safe actions such as quarantining or alerting.

How do I install YARA and manage rule requirements?

I install YARA via package manager or build from source, ensure executable permissions, and include required rule metadata. I use curl to fetch rules or valhallaAPI integrations and validate rule format and file permissions before deployment.

What parameters and error handling do I include in endpoint scripts?

My scripts accept parameters like extra_args, output paths, and message payloads. I add robust error handling for timeouts and permission errors, log all attempts, and report stdout/stderr so the manager and dashboard show precise results.

How do Windows helpers handle API calls and delete attempts?

The Windows helper includes API key headers, builds POST requests, and retries safe delete attempts if allowed. I log API responses and increment counters for failed deletions so I can audit automatic remediation actions.

How do I protect API keys and sensitive data in this workflow?

I store keys in secure vaults or use OS-level protected files with strict permissions. I limit key scope, rotate keys regularly, and monitor audit logs for invalid key responses or suspicious usage.

How do I extract useful fields with custom decoders?

I write decoders to parse events and extract rule_name, description, tags, threat_score, and chatgpt_response. Those fields feed rules and the assistant so the model sees structured context instead of raw text blobs.

What rule strategy do I use to trigger actions?

I configure rules to trigger on FIM events under /home and Users\Downloads with a level appropriate to the threat score. Higher levels can auto-run the active response module; lower ones generate analyst prompts for review.

How do I link detections to orchestration and response modules?

When a rule fires, I map it to a module that executes scripts, calls the model, and writes a consistent response object. That object includes parameters, suggested commands, and a severity tag so operators can follow playbook steps.

How do I design prompts so models return consistent remediation guidance?

I structure prompts to include role, endpoint type, detection metadata, and desired output format. I enforce a response schema with fields for steps, commands, rollback, and rationale so outputs are predictable and machine-parsable.

How do I document and version playbooks and results?

I store playbooks and enrichment outputs in a versioned repository or index. I log playbook runs, group actions by agent and incident, and snapshot model outputs to allow audits and repeated testing.

Can I run models locally for private threat hunting?

Yes. I run Llama 3 via Ollama on the same server to reduce latency and avoid external data exfiltration. I install and pull models, manage resource constraints, and isolate the runtime for security.

How do I build a threat hunting assistant with embeddings and FAISS?

I create embeddings from archives.json and store vectors in a FAISS index. My threat_hunter.py loads vectors, executes similarity searches, and formulates queries to the local model for focused hunting sessions.

What features does the WebSocket chatbot provide for operators?

My chatbot supports commands like /help, /reload, /set days, and /stat. It maintains conversation flow, reloads context from archives, and allows running targeted queries while maintaining audit logs.

How do I enable secure remote access and permissions for agents?

I use SSH with key-based auth, restrict sudo where possible, and ensure agents run under least-privilege accounts. I audit access logs and revoke credentials when not needed.

How do I integrate Claude Haiku via OpenSearch Assistant and Bedrock?

I create an IAM user with restricted policies, enable the model in the targeted region, and register the model group in OpenSearch Assistant. Then I create a connector, deploy the predictor, and test using the _predict endpoint before mapping it to agents in the dashboard.

What OpenSearch plugins and connectors are required?

I enable mlCommons, skills, observability, and the assistant plugin. I create connectors for data sources, register model groups, and deploy them so the assistant can return enriched predictions inside the dashboard.

How do I register agents and map them to the assistant in the dashboard?

I register agents in the OpenSearch Assistant, map them to model groups, and refresh the dashboard index. This allows direct agent context to appear in assistant queries and links detections to conversation snippets.

How do I codify actions into repeatable runbooks?

I convert model suggestions into active response steps that include curl-based checks, safe commands, and rollback plans. Each runbook includes preconditions, impact statements, and manual approval gates where needed.

How do I measure the impact of AI-augmented playbooks?

I track metrics like mean time to acknowledge, mean time to remediate, false positive rates, and number of automated vs. manual interventions. These metrics show reduced noise and faster, clearer mitigations over time.
