Detection Library
medium · experimental · Linux · AI/ML · T1565.001
Unexpected Process Editing Embedding Or Retrieval Data Store
Detects non-LLM system utilities (sed, awk, Python, Perl) writing to vector database or embedding store directories. This indicates out-of-band modification of the retrieval layer, a key data poisoning vector for RAG-based systems.
Updated Jan 15, 2025 · Detection Engineering Team
llm · data-poisoning · linux · vector-db · owasp-llm04
Problem Statement
Vector and embedding stores are the retrieval backbone of RAG systems. Out-of-band modification by unexpected processes can inject poisoned documents that cause the LLM to produce attacker-controlled outputs.
Sample Logs
{"timestamp":"2025-01-15T03:44:22Z","computer_name":"llm-host-01","user":"opc","image":"/usr/bin/python3","target_filename":"/opt/llm/vector/chroma/data.sqlite","event_type":"file_modify"}
Required Fields
image
target_filename
user
computer_name
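Events that lack any of these fields cannot be evaluated reliably, so it may help to check for them before applying detection logic. The following is a minimal sketch, assuming events arrive as JSON objects shaped like the sample log; the helper name `missing_fields` is hypothetical and not part of any specific logging product.

```python
import json

# Field names taken from the Required Fields list.
REQUIRED_FIELDS = ("image", "target_filename", "user", "computer_name")

def missing_fields(event: dict) -> list:
    """Return the required fields that are absent or empty in the event (hypothetical helper)."""
    return [name for name in REQUIRED_FIELDS if not event.get(name)]

# The sample log entry, parsed from JSON.
sample = json.loads(
    '{"timestamp":"2025-01-15T03:44:22Z","computer_name":"llm-host-01",'
    '"user":"opc","image":"/usr/bin/python3",'
    '"target_filename":"/opt/llm/vector/chroma/data.sqlite",'
    '"event_type":"file_modify"}'
)

incomplete = {"image": "/usr/bin/sed"}
```

Here `missing_fields(sample)` returns an empty list, while `missing_fields(incomplete)` names the three fields that a pipeline would need to supply before the event can be scored.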
False Positives
- Approved embedding update pipelines that use Python scripts to refresh the vector store
Tuning Guidance
Create a process allowlist specific to the vector store management tooling. Alert on any process outside this list writing to embedding paths.
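The allowlist approach can be sketched as follows. This is an illustration under stated assumptions, not the rule's actual implementation: the path prefix is inferred from the sample log, and the allowlisted binary `/opt/llm/bin/vector-sync` is a hypothetical placeholder for whatever tooling manages the vector store in a given environment.

```python
import posixpath

# Assumption: embedding stores live under this prefix (inferred from the sample log).
EMBEDDING_PATH_PREFIXES = ("/opt/llm/vector/",)

# Hypothetical allowlist entry; replace with the approved vector store tooling.
ALLOWED_IMAGES = {"/opt/llm/bin/vector-sync"}

def is_suspicious(event: dict) -> bool:
    """Return True when a process outside the allowlist modifies an embedding path."""
    if event.get("event_type") != "file_modify":
        return False
    # Normalize the path so segments like "../" cannot disguise the real target.
    target = posixpath.normpath(event.get("target_filename", ""))
    in_store = any(target.startswith(prefix) for prefix in EMBEDDING_PATH_PREFIXES)
    return in_store and event.get("image", "") not in ALLOWED_IMAGES

# Event mirroring the sample log: python3 writing to the Chroma database file.
sample = {
    "computer_name": "llm-host-01",
    "user": "opc",
    "image": "/usr/bin/python3",
    "target_filename": "/opt/llm/vector/chroma/data.sqlite",
    "event_type": "file_modify",
}

# Same write performed by the (hypothetical) approved tool.
approved = dict(sample, image="/opt/llm/bin/vector-sync")
```

With these values, `is_suspicious(sample)` is true and `is_suspicious(approved)` is false. Matching on the full image path rather than the interpreter name means that an approved Python pipeline would need its own wrapper binary or an additional criterion (such as the service account in `user`) to be distinguished from ad hoc `python3` invocations.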