Research Reference
Curated papers, reports, and resources across AI security, agentic AI, detection engineering, and LLM security.
Universal and Transferable Adversarial Attacks on Aligned Language Models
Demonstrates that suffix-based adversarial attacks can reliably jailbreak aligned LLMs, challenging the robustness of RLHF-based safety training.
OWASP Top 10 for Large Language Model Applications (2025)
The definitive practitioner reference for LLM security risks — prompt injection, insecure output handling, supply chain vulnerabilities, and more.
Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
First systematic analysis of prompt injection in real-world LLM integrations, introducing the concept of indirect injection via external data sources.
Baseline Defenses for Adversarial Attacks Against Aligned Language Models
Evaluates detection and mitigation strategies against adversarial prompt attacks, including perplexity filtering and paraphrasing.
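One of the evaluated defenses, perplexity filtering, rests on the observation that adversarial suffixes look like gibberish to a language model. A minimal sketch of the idea, using a toy add-one-smoothed unigram model in place of an LLM's token perplexity (the corpus, model, and threshold here are illustrative, not from the paper):

```python
import math
from collections import Counter

def unigram_perplexity(text: str, corpus: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model
    fit on `corpus`. Real filters use an LLM's own token perplexity."""
    corpus_tokens = corpus.split()
    counts = Counter(corpus_tokens)
    vocab = len(counts) + 1          # +1 slot for unseen tokens
    total = len(corpus_tokens)
    tokens = text.split()
    log_prob = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / max(len(tokens), 1))

def is_suspicious(prompt: str, corpus: str, threshold: float) -> bool:
    """Flag prompts whose perplexity exceeds a calibrated threshold."""
    return unigram_perplexity(prompt, corpus) > threshold
```

Adversarial suffixes score far above the threshold because their tokens are rare under any fluent-text model; the paper's caveat is that attackers can respond by optimizing for low-perplexity attack strings.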
Securing LLM Systems Against Prompt Injection
Practical defense architectures for LLM deployments, covering sandboxing, output validation, and privilege separation patterns.
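The privilege-separation idea boils down to deny-by-default validation of anything the model proposes before it reaches a tool. A sketch under assumed names (the tool allowlist and schema here are hypothetical, not from the referenced guidance):

```python
# Hypothetical allowlist: tool name -> permitted argument keys.
ALLOWED_TOOLS = {
    "search": {"query"},
    "calculator": {"expression"},
}

def validate_tool_call(tool: str, args: dict) -> bool:
    """Reject any model-proposed tool call whose name or argument keys
    fall outside the declared allowlist (deny by default)."""
    allowed_args = ALLOWED_TOOLS.get(tool)
    if allowed_args is None:
        return False                      # unknown tool: refuse
    return set(args) <= allowed_args      # no undeclared arguments
```

The point is architectural: the gate runs outside the model's trust boundary, so an injected prompt can change what the model asks for but not what the system will execute.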
Red-Teaming Large Language Models
Anthropic's framework for systematic adversarial evaluation of LLMs, covering threat taxonomies and structured attack methodologies.
AI Security Risk Assessment Framework
NIST guidance for assessing and managing risks specific to AI systems across the full development and deployment lifecycle.
SaTML '24: Security and Privacy of Machine Learning
Proceedings covering adversarial robustness, privacy attacks on ML models, and emerging AI security research from the 2024 IEEE SaTML conference.
OWASP Top 10 for Agentic AI Applications
Security risks specific to autonomous AI agents — unsafe tool invocation, goal drift, multi-agent trust exploitation, and data exfiltration patterns.
ReAct: Synergizing Reasoning and Acting in Language Models
Foundational paper introducing the ReAct pattern that enables LLMs to interleave reasoning and action, forming the basis of modern agentic architectures.
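The Thought → Action → Observation control loop at the heart of ReAct can be sketched in a few lines; here the `model` is a scripted stand-in for an LLM call and the action format is simplified from the paper's:

```python
def react_loop(question, model, tools, max_steps=5):
    """Interleave reasoning and acting: feed the growing transcript to
    the model, execute any Action it emits against a tool, append the
    Observation, and stop when it emits a final Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)          # next Thought/Action/Answer text
        transcript += step + "\n"
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip()
        if step.startswith("Action:"):
            name, _, arg = step[len("Action:"):].strip().partition(" ")
            observation = tools[name](arg)
            transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted
```

Security-wise, this loop is exactly where indirect prompt injection bites: the Observation text flows back into the transcript with the same authority as the user's question.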
Identifying the Risks of LM Agents with an LM-Emulated Sandbox
ToolEmu: automated evaluation of LM agent risks across tool use scenarios, revealing systematic failure modes in production-like environments.
Agent Security Bench: Evaluating the Security of LLM Agents
Comprehensive benchmark covering 10 agent attack types across 55 test cases, providing a structured framework for measuring agentic AI security posture.
Attacking Vision-Language Computer Use Agents with Indirect Prompt Injection
Demonstrates indirect prompt injection attacks against computer-use agents via manipulated screen content, highlighting a new attack surface.
The Danger of Fully Autonomous AI Agents
Analysis of goal misalignment and catastrophic risk in fully autonomous AI systems, with recommendations for oversight mechanisms.
Model Context Protocol (MCP) Security Considerations
Security analysis of the MCP standard for AI tool integration, covering trust boundaries, authorization risks, and recommended mitigations.
Multi-Agent Systems Security: Trust and Verification Challenges
Research on authentication and trust propagation in multi-agent pipelines, covering orchestrator compromise and inter-agent injection vectors.
MITRE ATT&CK: Design and Philosophy
The foundational paper describing the ATT&CK framework's design and its intended use for adversary behavior modeling and detection gap analysis.
Sigma: Generic Signature Format for SIEM Systems
Original Sigma specification describing the YAML-based signature format for SIEM-agnostic detection rule authoring.
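The core of Sigma is a portable detection block that backends compile into a target SIEM's query language. A toy sketch of that compilation step (the rule content and the output query dialect are illustrative; real backends handle conditions, wildcards, and field mappings far more completely):

```python
# A minimal Sigma-style rule as a Python dict; field names are examples.
rule = {
    "title": "Suspicious PowerShell Download",
    "logsource": {"product": "windows", "category": "process_creation"},
    "detection": {
        "selection": {
            "Image|endswith": "\\powershell.exe",
            "CommandLine|contains": "DownloadString",
        },
        "condition": "selection",
    },
}

def render_query(rule: dict) -> str:
    """Translate the 'selection' map into a naive AND-joined query,
    the way a Sigma backend compiles a rule for one target SIEM."""
    clauses = []
    for key, value in rule["detection"]["selection"].items():
        field, _, modifier = key.partition("|")
        op = {"endswith": "ENDSWITH", "contains": "CONTAINS"}.get(modifier, "=")
        clauses.append(f'{field} {op} "{value}"')
    return " AND ".join(clauses)
```

Writing the logic once and rendering it per backend is what makes the format SIEM-agnostic.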
Detection Engineering Maturity Matrix
Kyle Bailey's framework for measuring and advancing detection engineering capability maturity across people, process, and technology dimensions.
Alerting and Detection Strategy Framework
Palantir's ADS framework for structuring detection hypotheses, covering goal, categorization, technical context, and response guidance.
The Pyramid of Pain
David Bianco's classic model illustrating the relative difficulty of denying adversaries different types of IOCs — from hash values to TTPs.
Detection Engineering with PySpark at Scale
Engineering patterns for implementing distributed behavioral detection pipelines using Apache Spark — handling class imbalance, windowing, and UDFs.
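The per-entity sliding-window pattern such pipelines are built on can be shown in plain Python (a Spark job would express the same logic with `groupBy` plus `window`; the event schema and thresholds here are illustrative):

```python
from collections import defaultdict, deque

def failed_login_bursts(events, window_s=60, threshold=5):
    """Flag users with >= threshold failed logins inside any sliding
    window_s-second window. `events` is (timestamp, user, success)."""
    recent = defaultdict(deque)   # user -> failure timestamps in window
    alerts = []
    for ts, user, success in sorted(events):
        if success:
            continue
        q = recent[user]
        q.append(ts)
        while q and ts - q[0] > window_s:
            q.popleft()           # expire events outside the window
        if len(q) >= threshold:
            alerts.append((user, ts))
    return alerts
```

The distributed version partitions by entity so each window's state stays on one executor, which is also where the class-imbalance and skew problems the article discusses show up.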
Evasion Attacks Against Machine Learning at Test Time
Foundational adversarial ML paper on evasion attacks — essential for understanding how detection ML models can be evaded.
SOC Prime Threat Detection Marketplace: Community Detection Patterns
Analysis of production detection rule patterns across enterprise deployments, covering rule quality metrics and coverage distribution.
Common Sense Guide to Mitigating Insider Threats (7th Edition)
CERT/CC's authoritative guide to insider threat program development, incident analysis, and technical controls for detection.
Insider Threat Indicator Ontology
CERT research defining a structured ontology for insider threat indicators across technical and behavioral signal categories.
User and Entity Behavior Analytics (UEBA): Baseline Construction and Drift Detection
Statistical methods for building behavioral baselines, detecting drift, and managing false positive rates in enterprise UEBA deployments.
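The simplest per-entity baseline is a mean/variance profile with a z-score alert rule; a sketch (production UEBA adds seasonality, population baselining, and drift re-fitting, none of which is shown here):

```python
import statistics

def zscore_anomalies(baseline, observed, z_threshold=3.0):
    """Flag observations more than z_threshold standard deviations
    from the baseline mean -- a minimal per-entity behavioral baseline."""
    mu = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline) or 1e-9  # guard zero variance
    return [x for x in observed if abs(x - mu) / sigma > z_threshold]
```

Drift management is then the question of when to fold recent observations back into `baseline` without letting a slow-moving attacker teach the model that their behavior is normal.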
Detecting Malicious Insider Threat: A Survey
Comprehensive survey of insider threat detection techniques including anomaly detection, machine learning, and psychological indicators.
Graph-Based Anomaly Detection for Insider Threat
Using graph analytics on enterprise activity data to detect lateral movement and privilege abuse patterns indicative of insider threats.
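Two of the cheapest graph signals over (user, resource) access edges are never-before-seen edges and sudden out-degree spikes; a sketch with made-up thresholds (real systems use richer features such as community structure and edge weights):

```python
from collections import Counter

def graph_anomalies(historical, today, degree_factor=3):
    """Flag (user, resource) edges absent from the historical graph,
    and users whose daily out-degree exceeds degree_factor times
    their historical count (historical treated as one day's activity)."""
    seen = set(historical)
    new_edges = [edge for edge in today if edge not in seen]
    hist_deg = Counter(user for user, _ in historical)
    today_deg = Counter(user for user, _ in today)
    hot_users = [user for user, deg in today_deg.items()
                 if deg > degree_factor * max(hist_deg.get(user, 0), 1)]
    return new_edges, hot_users
```

New edges catch first-time privilege abuse; degree spikes catch bulk staging before exfiltration.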
CERT Insider Threat Dataset (CERT/CC)
Reference documentation for the widely-used CERT synthetic insider threat dataset, covering log types, scenario descriptions, and evaluation methodology.
Handling Class Imbalance in Security Anomaly Detection
Techniques for training anomaly detectors under extreme positive-class scarcity — oversampling, cost-sensitive learning, and evaluation metrics suited to skewed data.
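Random oversampling, the most basic of these techniques, can be sketched as follows (at security-scale imbalance, cost-sensitive class weights are usually preferable to duplicating rows, but this shows the mechanics):

```python
import random

def oversample_minority(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both
    classes have equal counts. Assumes binary labels 0/1."""
    rng = random.Random(seed)
    pos = [(x, label) for x, label in zip(X, y) if label == 1]
    neg = [(x, label) for x, label in zip(X, y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    resampled = minority + [rng.choice(minority)
                            for _ in range(len(majority) - len(minority))]
    data = majority + resampled
    rng.shuffle(data)
    Xb, yb = zip(*data)
    return list(Xb), list(yb)
```

The evaluation caveat matters as much as the resampling: accuracy is meaningless at 1000:1 imbalance, so precision/recall or cost curves are the right yardsticks.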
Prompt Injection Attacks Against GPT-3
First systematic study of prompt injection in deployed GPT-3 applications, introducing direct and indirect injection taxonomy.
Jailbroken: How Does LLM Safety Training Fail?
Analysis of failure modes in RLHF safety training, identifying competing objectives and generalization failures that enable jailbreaks.
Backdoor Attacks on Language Models
Survey of training-time backdoor attacks against LLMs — poisoning datasets to create triggered behaviors that bypass safety evaluations.
Privacy Side Channels in Machine Learning Systems
How LLMs memorize and leak training data through inference-time attacks — membership inference, extraction attacks, and mitigations.
Do Anything Now (DAN): Jailbreak Taxonomy and Defenses
Comprehensive taxonomy of LLM jailbreak techniques in the wild, with analysis of defense effectiveness across attack categories.
Extracting Training Data from Large Language Models
Demonstrates that LLMs memorize and reproduce verbatim training sequences, enabling PII and sensitive data extraction at scale.
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Randomized smoothing defense against adversarial prompt attacks — the first provably robust defense for aligned LLMs.
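The mechanism can be sketched as perturb-and-vote: run several randomly character-perturbed copies of the prompt and take a majority vote on whether the response is jailbroken. In this sketch `judge` is a stand-in for the model-plus-safety-classifier pipeline, and the perturbation scheme is a simplified version of the paper's:

```python
import random

def smoothed_is_jailbroken(prompt, judge, n_copies=8, swap_frac=0.1, seed=0):
    """Randomly swap a fraction of characters in each of n_copies of
    the prompt, judge each copy, and return the majority verdict."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    votes = 0
    for _ in range(n_copies):
        chars = list(prompt)
        k = max(1, int(len(chars) * swap_frac))
        for i in rng.sample(range(len(chars)), k):
            chars[i] = rng.choice(alphabet)  # random character swap
        if judge("".join(chars)):
            votes += 1
    return votes > n_copies // 2
```

The defense works because optimized adversarial suffixes are brittle: a few character swaps break the attack in most copies, while benign prompts are judged the same way with or without the noise.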
LLM Security: Threat Modeling for Production Deployments
Practical threat modeling guide for LLM-integrated applications, covering attack surfaces, trust boundaries, and monitoring recommendations.