Research Reference
Curated papers, reports, and resources across AI security, agentic AI, detection engineering, and LLM security.
Universal and Transferable Adversarial Attacks on Aligned Language Models
Demonstrates that suffix-based adversarial attacks can reliably jailbreak aligned LLMs, challenging the robustness of RLHF-based safety training.
OWASP Top 10 for Large Language Model Applications (2025)
The definitive practitioner reference for LLM security risks — prompt injection, insecure output handling, supply chain vulnerabilities, and more.
Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
First systematic analysis of prompt injection in real-world LLM integrations, introducing the concept of indirect injection via external data sources.
Baseline Defenses for Adversarial Attacks Against Aligned Language Models
Evaluates detection and mitigation strategies against adversarial prompt attacks, including perplexity filtering and paraphrasing.
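One of the evaluated defenses, perplexity filtering, rests on the observation that adversarial suffixes look like gibberish to a language model. A minimal sketch of the idea, using a toy add-one-smoothed unigram model in place of an LLM's token perplexity (the corpus, model, and threshold here are illustrative, not from the paper):

```python
import math
from collections import Counter

def unigram_perplexity(text: str, corpus: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model
    fit on `corpus`. Real filters use an LLM's own token perplexity."""
    corpus_tokens = corpus.split()
    counts = Counter(corpus_tokens)
    vocab = len(counts) + 1          # +1 slot for unseen tokens
    total = len(corpus_tokens)
    tokens = text.split()
    log_prob = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / max(len(tokens), 1))

def is_suspicious(prompt: str, corpus: str, threshold: float) -> bool:
    """Flag prompts whose perplexity exceeds a calibrated threshold."""
    return unigram_perplexity(prompt, corpus) > threshold
```

Adversarial suffixes score far above the threshold because their tokens are rare under any fluent-text model; the paper's caveat is that attackers can respond by optimizing for low-perplexity attack strings.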
Securing LLM Systems Against Prompt Injection
Practical defense architectures for LLM deployments, covering sandboxing, output validation, and privilege separation patterns.
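The privilege-separation idea boils down to deny-by-default validation of anything the model proposes before it reaches a tool. A sketch under assumed names (the tool allowlist and schema here are hypothetical, not from the referenced guidance):

```python
# Hypothetical allowlist: tool name -> permitted argument keys.
ALLOWED_TOOLS = {
    "search": {"query"},
    "calculator": {"expression"},
}

def validate_tool_call(tool: str, args: dict) -> bool:
    """Reject any model-proposed tool call whose name or argument keys
    fall outside the declared allowlist (deny by default)."""
    allowed_args = ALLOWED_TOOLS.get(tool)
    if allowed_args is None:
        return False                      # unknown tool: refuse
    return set(args) <= allowed_args      # no undeclared arguments
```

The point is architectural: the gate runs outside the model's trust boundary, so an injected prompt can change what the model asks for but not what the system will execute.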
Red-Teaming Large Language Models
Anthropic's framework for systematic adversarial evaluation of LLMs, covering threat taxonomies and structured attack methodologies.
AI Security Risk Assessment Framework
NIST guidance for assessing and managing risks specific to AI systems across the full development and deployment lifecycle.
SaTML '24: Security and Privacy of Machine Learning
Proceedings covering adversarial robustness, privacy attacks on ML models, and emerging AI security research from the 2024 IEEE SaTML conference.
OWASP Top 10 for Agentic AI Applications
Security risks specific to autonomous AI agents — unsafe tool invocation, goal drift, multi-agent trust exploitation, and data exfiltration patterns.
ReAct: Synergizing Reasoning and Acting in Language Models
Foundational paper introducing the ReAct pattern that enables LLMs to interleave reasoning and action, forming the basis of modern agentic architectures.
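The Thought → Action → Observation control loop at the heart of ReAct can be sketched in a few lines; here the `model` is a scripted stand-in for an LLM call and the action format is simplified from the paper's:

```python
def react_loop(question, model, tools, max_steps=5):
    """Interleave reasoning and acting: feed the growing transcript to
    the model, execute any Action it emits against a tool, append the
    Observation, and stop when it emits a final Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)          # next Thought/Action/Answer text
        transcript += step + "\n"
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip()
        if step.startswith("Action:"):
            name, _, arg = step[len("Action:"):].strip().partition(" ")
            observation = tools[name](arg)
            transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted
```

Security-wise, this loop is exactly where indirect prompt injection bites: the Observation text flows back into the transcript with the same authority as the user's question.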
Identifying the Risks of LM Agents with an LM-Emulated Sandbox
ToolEmu: automated evaluation of LM agent risks across tool use scenarios, revealing systematic failure modes in production-like environments.
Agent Security Bench: Evaluating the Security of LLM Agents
Comprehensive benchmark covering 10 agent attack types across 55 test cases, providing a structured framework for measuring agentic AI security posture.
Attacking Vision-Language Computer Use Agents with Indirect Prompt Injection
Demonstrates indirect prompt injection attacks against computer-use agents via manipulated screen content, highlighting a new attack surface.
The Danger of Fully Autonomous AI Agents
Analysis of goal misalignment and catastrophic risk in fully autonomous AI systems, with recommendations for oversight mechanisms.
Model Context Protocol (MCP) Security Considerations
Security analysis of the MCP standard for AI tool integration, covering trust boundaries, authorization risks, and recommended mitigations.
Multi-Agent Systems Security: Trust and Verification Challenges
Research on authentication and trust propagation in multi-agent pipelines, covering orchestrator compromise and inter-agent injection vectors.
MITRE ATT&CK: Design and Philosophy
The foundational paper describing the ATT&CK framework's design and its intended use for adversary behavior modeling and detection gap analysis.
Sigma: Generic Signature Format for SIEM Systems
Original Sigma specification describing the YAML-based signature format for SIEM-agnostic detection rule authoring.
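The core of Sigma is a portable detection block that backends compile into a target SIEM's query language. A toy sketch of that compilation step (the rule content and the output query dialect are illustrative; real backends handle conditions, wildcards, and field mappings far more completely):

```python
# A minimal Sigma-style rule as a Python dict; field names are examples.
rule = {
    "title": "Suspicious PowerShell Download",
    "logsource": {"product": "windows", "category": "process_creation"},
    "detection": {
        "selection": {
            "Image|endswith": "\\powershell.exe",
            "CommandLine|contains": "DownloadString",
        },
        "condition": "selection",
    },
}

def render_query(rule: dict) -> str:
    """Translate the 'selection' map into a naive AND-joined query,
    the way a Sigma backend compiles a rule for one target SIEM."""
    clauses = []
    for key, value in rule["detection"]["selection"].items():
        field, _, modifier = key.partition("|")
        op = {"endswith": "ENDSWITH", "contains": "CONTAINS"}.get(modifier, "=")
        clauses.append(f'{field} {op} "{value}"')
    return " AND ".join(clauses)
```

Writing the logic once and rendering it per backend is what makes the format SIEM-agnostic.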
Detection Engineering Maturity Matrix
Kyle Bailey's framework for measuring and advancing detection engineering capability maturity across people, process, and technology dimensions.
Alerting and Detection Strategy Framework
Palantir's ADS framework for structuring detection hypotheses, covering goal, categorization, technical context, and response guidance.
The Pyramid of Pain
David Bianco's classic model illustrating the relative difficulty of denying adversaries different types of IOCs — from hash values to TTPs.
Detection Engineering with PySpark at Scale
Engineering patterns for implementing distributed behavioral detection pipelines using Apache Spark — handling class imbalance, windowing, and UDFs.
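The per-entity sliding-window pattern such pipelines are built on can be shown in plain Python (a Spark job would express the same logic with `groupBy` plus `window`; the event schema and thresholds here are illustrative):

```python
from collections import defaultdict, deque

def failed_login_bursts(events, window_s=60, threshold=5):
    """Flag users with >= threshold failed logins inside any sliding
    window_s-second window. `events` is (timestamp, user, success)."""
    recent = defaultdict(deque)   # user -> failure timestamps in window
    alerts = []
    for ts, user, success in sorted(events):
        if success:
            continue
        q = recent[user]
        q.append(ts)
        while q and ts - q[0] > window_s:
            q.popleft()           # expire events outside the window
        if len(q) >= threshold:
            alerts.append((user, ts))
    return alerts
```

The distributed version partitions by entity so each window's state stays on one executor, which is also where the class-imbalance and skew problems the article discusses show up.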
Evasion Attacks Against Machine Learning at Test Time
Foundational adversarial ML paper on evasion attacks — essential for understanding how detection ML models can be evaded.
SOC Prime Threat Detection Marketplace: Community Detection Patterns
Analysis of production detection rule patterns across enterprise deployments, covering rule quality metrics and coverage distribution.
Common Sense Guide to Mitigating Insider Threats (7th Edition)
CERT/CC's authoritative guide to insider threat program development, incident analysis, and technical controls for detection.
Insider Threat Indicator Ontology
CERT research defining a structured ontology for insider threat indicators across technical and behavioral signal categories.
User and Entity Behavior Analytics (UEBA): Baseline Construction and Drift Detection
Statistical methods for building behavioral baselines, detecting drift, and managing false positive rates in enterprise UEBA deployments.
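The simplest per-entity baseline is a mean/variance profile with a z-score alert rule; a sketch (production UEBA adds seasonality, population baselining, and drift re-fitting, none of which is shown here):

```python
import statistics

def zscore_anomalies(baseline, observed, z_threshold=3.0):
    """Flag observations more than z_threshold standard deviations
    from the baseline mean -- a minimal per-entity behavioral baseline."""
    mu = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline) or 1e-9  # guard zero variance
    return [x for x in observed if abs(x - mu) / sigma > z_threshold]
```

Drift management is then the question of when to fold recent observations back into `baseline` without letting a slow-moving attacker teach the model that their behavior is normal.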
Detecting Malicious Insider Threat: A Survey
Comprehensive survey of insider threat detection techniques including anomaly detection, machine learning, and psychological indicators.
Graph-Based Anomaly Detection for Insider Threat
Using graph analytics on enterprise activity data to detect lateral movement and privilege abuse patterns indicative of insider threats.
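Two of the cheapest graph signals over (user, resource) access edges are never-before-seen edges and sudden out-degree spikes; a sketch with made-up thresholds (real systems use richer features such as community structure and edge weights):

```python
from collections import Counter

def graph_anomalies(historical, today, degree_factor=3):
    """Flag (user, resource) edges absent from the historical graph,
    and users whose daily out-degree exceeds degree_factor times
    their historical count (historical treated as one day's activity)."""
    seen = set(historical)
    new_edges = [edge for edge in today if edge not in seen]
    hist_deg = Counter(user for user, _ in historical)
    today_deg = Counter(user for user, _ in today)
    hot_users = [user for user, deg in today_deg.items()
                 if deg > degree_factor * max(hist_deg.get(user, 0), 1)]
    return new_edges, hot_users
```

New edges catch first-time privilege abuse; degree spikes catch bulk staging before exfiltration.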
CERT Insider Threat Dataset (CERT/CC)
Reference documentation for the widely-used CERT synthetic insider threat dataset, covering log types, scenario descriptions, and evaluation methodology.
Handling Class Imbalance in Security Anomaly Detection
Techniques for training anomaly detectors under extreme positive-class scarcity — oversampling, cost-sensitive learning, and evaluation metrics suited to skewed data.
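Random oversampling, the most basic of these techniques, can be sketched as follows (at security-scale imbalance, cost-sensitive class weights are usually preferable to duplicating rows, but this shows the mechanics):

```python
import random

def oversample_minority(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both
    classes have equal counts. Assumes binary labels 0/1."""
    rng = random.Random(seed)
    pos = [(x, label) for x, label in zip(X, y) if label == 1]
    neg = [(x, label) for x, label in zip(X, y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    resampled = minority + [rng.choice(minority)
                            for _ in range(len(majority) - len(minority))]
    data = majority + resampled
    rng.shuffle(data)
    Xb, yb = zip(*data)
    return list(Xb), list(yb)
```

The evaluation caveat matters as much as the resampling: accuracy is meaningless at 1000:1 imbalance, so precision/recall or cost curves are the right yardsticks.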
Prompt Injection Attacks Against GPT-3
First systematic study of prompt injection in deployed GPT-3 applications, introducing direct and indirect injection taxonomy.
Jailbroken: How Does LLM Safety Training Fail?
Analysis of failure modes in RLHF safety training, identifying competing objectives and generalization failures that enable jailbreaks.
Backdoor Attacks on Language Models
Survey of training-time backdoor attacks against LLMs — poisoning datasets to create triggered behaviors that bypass safety evaluations.
Privacy Side Channels in Machine Learning Systems
How LLMs memorize and leak training data through inference-time attacks — membership inference, extraction attacks, and mitigations.
Do Anything Now (DAN): Jailbreak Taxonomy and Defenses
Comprehensive taxonomy of LLM jailbreak techniques in the wild, with analysis of defense effectiveness across attack categories.
Extracting Training Data from Large Language Models
Demonstrates that LLMs memorize and reproduce verbatim training sequences, enabling PII and sensitive data extraction at scale.
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Randomized smoothing defense against adversarial prompt attacks — the first provably robust defense for aligned LLMs.
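The mechanism can be sketched as perturb-and-vote: run several randomly character-perturbed copies of the prompt and take a majority vote on whether the response is jailbroken. In this sketch `judge` is a stand-in for the model-plus-safety-classifier pipeline, and the perturbation scheme is a simplified version of the paper's:

```python
import random

def smoothed_is_jailbroken(prompt, judge, n_copies=8, swap_frac=0.1, seed=0):
    """Randomly swap a fraction of characters in each of n_copies of
    the prompt, judge each copy, and return the majority verdict."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    votes = 0
    for _ in range(n_copies):
        chars = list(prompt)
        k = max(1, int(len(chars) * swap_frac))
        for i in rng.sample(range(len(chars)), k):
            chars[i] = rng.choice(alphabet)  # random character swap
        if judge("".join(chars)):
            votes += 1
    return votes > n_copies // 2
```

The defense works because optimized adversarial suffixes are brittle: a few character swaps break the attack in most copies, while benign prompts are judged the same way with or without the noise.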
LLM Security: Threat Modeling for Production Deployments
Practical threat modeling guide for LLM-integrated applications, covering attack surfaces, trust boundaries, and monitoring recommendations.