Detection Library
high · experimental · Linux · AI/ML · T1565.001
LLM Training Or Fine-Tune Data Files Modified
Detects LLM service processes modifying training dataset files (JSONL, Parquet, CSV, Arrow) in training or fine-tuning directories. Runtime modification of training data outside an approved pipeline is a strong indicator of data poisoning.
Updated Jan 15, 2025 · Detection Engineering Team
llm · data-poisoning · linux · training-data · owasp-llm04
Problem Statement
Modifying training or fine-tuning datasets at runtime can cause subsequent model updates to produce biased, backdoored, or attacker-aligned outputs, representing a persistent and difficult-to-detect form of model compromise.
Sample Logs
{"timestamp":"2025-01-15T06:20:33Z","computer_name":"llm-host-01","user":"llm_svc","image":"/opt/llm/app/data_processor.py","target_filename":"/opt/llm/datasets/training/instructions.jsonl","event_type":"file_modify"}
Required Fields
image
target_filename
user
computer_name
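The matching logic implied by the description and the required fields can be sketched in Python as follows. This is a minimal illustration and not the rule's actual query: the function name, the set of directory names in TRAINING_DIR_NAMES, and the reliance on an event_type value of "file_modify" (taken from the sample log) are all assumptions.

```python
import json
from pathlib import PurePosixPath

# Fields the rule lists as required.
REQUIRED_FIELDS = ("image", "target_filename", "user", "computer_name")

# Dataset formats named in the rule description.
DATASET_EXTENSIONS = {".jsonl", ".parquet", ".csv", ".arrow"}

# Assumption: directory names that mark training or fine-tuning data.
# Adjust to the layout of your own hosts.
TRAINING_DIR_NAMES = {"training", "fine-tuning", "fine_tuning", "finetune"}


def is_training_data_modification(event: dict) -> bool:
    """Return True if the event looks like a write to a training dataset file."""
    # Skip events that lack any of the required fields.
    if any(not event.get(field) for field in REQUIRED_FIELDS):
        return False
    # Assumption: modifications are labeled "file_modify", as in the sample log.
    if event.get("event_type") != "file_modify":
        return False
    path = PurePosixPath(event["target_filename"])
    if path.suffix.lower() not in DATASET_EXTENSIONS:
        return False
    # Match when any parent directory carries a training-related name.
    return any(part.lower() in TRAINING_DIR_NAMES for part in path.parent.parts)


sample = json.loads(
    '{"timestamp":"2025-01-15T06:20:33Z","computer_name":"llm-host-01",'
    '"user":"llm_svc","image":"/opt/llm/app/data_processor.py",'
    '"target_filename":"/opt/llm/datasets/training/instructions.jsonl",'
    '"event_type":"file_modify"}'
)
matched = is_training_data_modification(sample)
```

Applied to the sample log above, the check matches because the target is a .jsonl file under a directory named training and all four required fields are present.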
False Positives
- Approved active learning or online fine-tuning pipelines that update training data
Tuning Guidance
Training data directories should be immutable during inference. Any write event outside a designated training window should alert.
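One way to express the training-window exception is a small time check applied to each matched event. The sketch below assumes a daily window defined in UTC; the 01:00 to 03:00 window, the constant name TRAINING_WINDOW, and both function names are hypothetical and should be replaced with your own schedule.

```python
from datetime import datetime, time, timezone

# Hypothetical daily training window in UTC (start inclusive, end exclusive).
TRAINING_WINDOW = (time(1, 0), time(3, 0))


def in_training_window(timestamp: str, window=TRAINING_WINDOW) -> bool:
    """Return True if an ISO 8601 timestamp falls inside the daily window."""
    # Normalize a trailing "Z" so older Python versions can parse it.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    current = parsed.astimezone(timezone.utc).time()
    start, end = window
    if start <= end:
        return start <= current < end
    # The window wraps past midnight, e.g. 23:00 to 01:00.
    return current >= start or current < end


def should_alert(event: dict) -> bool:
    """Alert on any matched write event outside the designated window."""
    return not in_training_window(event["timestamp"])
```

With this window, the sample event at 06:20:33Z falls outside the permitted period and would alert, while a write at 02:00:00Z would be suppressed.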