Research

Research Reference

Curated papers, reports, and resources across AI security, agentic AI, detection engineering, and LLM security.

AI Security8 references

Universal and Transferable Adversarial Attacks on Aligned Language Models

arXiv

Demonstrates that suffix-based adversarial attacks can reliably jailbreak aligned LLMs, challenging the robustness of RLHF-based safety training.

OWASP Top 10 for Large Language Model Applications (2025)

standard

The definitive practitioner reference for LLM security risks — prompt injection, insecure output handling, supply chain vulnerabilities, and more.

Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications

arXiv

First systematic analysis of prompt injection in real-world LLM integrations, introducing the concept of indirect injection via external data sources.

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

arXiv

Evaluates detection and mitigation strategies against adversarial prompt attacks including perplexity filtering and paraphrasing.

Securing LLM Systems Against Prompt Injection

blog

Practical defense architectures for LLM deployments, covering sandboxing, output validation, and privilege separation patterns.

Red-Teaming Large Language Models

arXiv

Anthropic's framework for systematic adversarial evaluation of LLMs, covering threat taxonomies and structured attack methodologies.

AI Security Risk Assessment Framework

report

NIST guidance for assessing and managing risks specific to AI systems across the full development and deployment lifecycle.

SaTML '24: Security and Privacy of Machine Learning

conference

Proceedings covering adversarial robustness, privacy attacks on ML models, and emerging AI security research from the 2024 IEEE SaTML conference.

Agentic AI8 references

OWASP Top 10 for Agentic AI Applications

standard

Security risks specific to autonomous AI agents — unsafe tool invocation, goal drift, multi-agent trust exploitation, and data exfiltration patterns.

ReAct: Synergizing Reasoning and Acting in Language Models

arXiv

Foundational paper introducing the ReAct pattern that enables LLMs to interleave reasoning and action, forming the basis of modern agentic architectures.

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

arXiv

ToolEmu: automated evaluation of LM agent risks across tool use scenarios, revealing systematic failure modes in production-like environments.

Agent Security Bench: Evaluating the Security of LLM Agents

arXiv

Comprehensive benchmark covering 10 agent attack types across 55 test cases, providing a structured framework for measuring agentic AI security posture.

Attacking Vision-Language Computer Use Agents with Indirect Prompt Injection

arXiv

Demonstrates indirect prompt injection attacks against computer-use agents via manipulated screen content, highlighting a new attack surface.

The Danger of Fully Autonomous AI Agents

arXiv

Analysis of goal misalignment and catastrophic risk in fully autonomous AI systems, with recommendations for oversight mechanisms.

Model Context Protocol (MCP) Security Considerations

standard

Security analysis of the MCP standard for AI tool integration, covering trust boundaries, authorization risks, and recommended mitigations.

Multi-Agent Systems Security: Trust and Verification Challenges

arXiv

Research on authentication and trust propagation in multi-agent pipelines, covering orchestrator compromise and inter-agent injection vectors.

Detection Engineering8 references
Insider Threat7 references
LLM Security8 references