high · experimental · Linux · AI/ML · T1565.001

LLM Service Replacing Retrieval Corpus Files

Detects LLM service processes writing to knowledge base, corpus, or RAG document index directories. Replacement of the retrieval corpus is a direct mechanism for injecting misinformation into RAG-grounded LLM responses.

Updated Jan 15, 2025 · Detection Engineering Team

llm · misinformation · linux · corpus-replace · owasp-llm09

Problem Statement

The retrieval corpus is the knowledge source that grounds RAG model responses. Replacing or modifying corpus files allows an attacker to inject false facts that the model will confidently cite as retrieved evidence.

Sample Logs

{"timestamp":"2025-01-15T04:55:20Z","computer_name":"llm-host-01","user":"llm_svc","image":"/opt/llm/app/corpus_updater.py","target_filename":"/opt/llm/rag/knowledge/company_policy.txt","event_type":"file_modify"}

Required Fields

image

target_filename

user

computer_name

False Positives

· Approved knowledge base update pipelines that refresh RAG document stores

Tuning Guidance

Corpus updates should follow a controlled pipeline with content validation and review. Alert on any write that occurs outside an approved deployment window or that comes from an unapproved service account.
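
One way to express this tuning is sketched below. It suppresses an alert only when the write comes from an approved account and also falls inside the deployment window. The account name `corpus_deploy` and the 02:00 to 04:00 UTC window are hypothetical placeholders, and `should_alert` is an illustrative helper rather than part of any particular SIEM.

```python
from datetime import datetime, time, timezone

# Hypothetical allowlist; replace with the accounts your pipeline really uses.
APPROVED_ACCOUNTS = {"corpus_deploy"}
# Hypothetical deployment window in UTC (start inclusive, end exclusive).
APPROVED_WINDOW = (time(2, 0), time(4, 0))


def should_alert(event):
    """Return True unless the write is by an approved account inside the window."""
    # Timestamps in the sample log are ISO 8601 with a trailing "Z".
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    ts_utc = ts.astimezone(timezone.utc).time()
    start, end = APPROVED_WINDOW
    in_window = start <= ts_utc < end
    approved_user = event.get("user") in APPROVED_ACCOUNTS
    return not (in_window and approved_user)
```

With these placeholder values, the sample event (user `llm_svc` at 04:55:20Z) would still alert, because it fails both conditions. A window that crosses midnight would need a different comparison than the simple `start <= t < end` used here.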