medium · experimental · Linux · AI/ML · T1565.001

Unexpected Process Editing Embedding Or Retrieval Data Store

Detects general-purpose utilities and interpreters that are not part of the LLM stack (sed, awk, Python, Perl) writing to vector database or embedding store directories. Such writes can indicate out-of-band modification of the retrieval layer, a key data poisoning vector for RAG-based systems.

Updated Jan 15, 2025 · Detection Engineering Team

llm · data-poisoning · linux · vector-db · owasp-llm04

Problem Statement

Vector and embedding stores are the retrieval backbone of RAG systems. Out-of-band modification by unexpected processes can inject poisoned documents that cause the LLM to produce attacker-controlled outputs.

Sample Logs

{"timestamp":"2025-01-15T03:44:22Z","computer_name":"llm-host-01","user":"opc","image":"/usr/bin/python3","target_filename":"/opt/llm/vector/chroma/data.sqlite","event_type":"file_modify"}
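The matching described above can be sketched against this sample event. This is a minimal illustration under stated assumptions, not the rule's actual implementation: the name pattern `SUSPECT_UTILITY` is derived from the utilities named in the description, and the prefix `/opt/llm/vector/` in `VECTOR_STORE_PREFIXES` is inferred from the sample log path and will differ per deployment.

```python
import json
import posixpath
import re

# Assumption: names taken from the rule description (sed, awk, Python, Perl),
# allowing versioned binaries such as python3 or python3.11.
SUSPECT_UTILITY = re.compile(r"^(sed|awk|perl[\d.]*|python[\d.]*)$")

# Assumption: embedding store root inferred from the sample log.
VECTOR_STORE_PREFIXES = ("/opt/llm/vector/",)


def is_suspicious(event: dict) -> bool:
    """Return True when a listed utility modifies a file under a vector store path."""
    if event.get("event_type") != "file_modify":
        return False
    image_name = posixpath.basename(event.get("image", ""))
    # Normalize so that "/opt/llm/vector/../vector/x" style paths still match.
    target = posixpath.normpath(event.get("target_filename", ""))
    return bool(SUSPECT_UTILITY.match(image_name)) and target.startswith(
        VECTOR_STORE_PREFIXES
    )


sample = json.loads(
    '{"timestamp":"2025-01-15T03:44:22Z","computer_name":"llm-host-01",'
    '"user":"opc","image":"/usr/bin/python3",'
    '"target_filename":"/opt/llm/vector/chroma/data.sqlite",'
    '"event_type":"file_modify"}'
)
print(is_suspicious(sample))
```

The sample event matches because the image basename is `python3` and the target sits under the assumed store prefix.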

Required Fields

image

target_filename

user

computer_name
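Events that lack any of these fields cannot be evaluated reliably, so it may help to check for them before matching. A small sketch, assuming events arrive as dictionaries; the helper name `missing_fields` is hypothetical:

```python
# Field names copied from the Required Fields list above.
REQUIRED_FIELDS = ("image", "target_filename", "user", "computer_name")


def missing_fields(event: dict) -> list:
    """Return the required field names that are absent or empty in the event."""
    return [name for name in REQUIRED_FIELDS if not event.get(name)]


# A partial event: only two of the four required fields are present.
partial_event = {"image": "/usr/bin/python3", "user": "opc"}
print(missing_fields(partial_event))
```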

False Positives

· Approved embedding update pipelines that use Python scripts to refresh the vector store

Tuning Guidance

Create a process allowlist specific to the vector store management tooling. Alert on any process outside this list writing to embedding paths.
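One way to express this allowlist is sketched below. Every entry is hypothetical: the service account `svc-embed`, the binary `/opt/llm/bin/embedding-refresh`, and the prefix `/opt/llm/vector/` stand in for whatever your environment actually uses. The sketch pairs `image` with `user` because an approved Python pipeline and an ad hoc script both report the same interpreter path as their image, so allowlisting the interpreter alone would suppress the alerts this rule exists to raise.

```python
import posixpath

# Hypothetical allowlist of (image, user) pairs for approved store tooling.
ALLOWED_WRITERS = frozenset(
    {
        ("/opt/llm/bin/embedding-refresh", "svc-embed"),
        ("/usr/bin/python3", "svc-embed"),
    }
)

# Assumption: directories that hold embedding or vector store data.
EMBEDDING_PATHS = ("/opt/llm/vector/",)


def should_alert(image: str, user: str, target_filename: str) -> bool:
    """Alert when a writer outside the allowlist touches an embedding path."""
    target = posixpath.normpath(target_filename)
    if not target.startswith(EMBEDDING_PATHS):
        return False  # Write is outside the monitored directories.
    return (image, user) not in ALLOWED_WRITERS


print(should_alert("/usr/bin/python3", "opc", "/opt/llm/vector/chroma/data.sqlite"))
print(should_alert("/usr/bin/python3", "svc-embed", "/opt/llm/vector/chroma/data.sqlite"))
```

Matching on the full image path rather than the basename means a renamed or relocated copy of an approved binary does not inherit the exemption.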