high · experimental · Linux · AI/ML · T1565.001
LLM Dataset Replaced From Temporary Or User Home Path
Detects cp, mv, or rsync operations that replace training or embedding datasets with files sourced from temporary or user home directories. Staging a file in such a location and then copying it over a dataset is a common pattern in data poisoning attacks.
Updated Jan 15, 2025 · Detection Engineering Team
llm · data-poisoning · linux · dataset-replace · owasp-llm04
Problem Statement
Overwriting a training dataset with a file from a staging area is often the last step of a data poisoning attack before the tampered data is consumed. Detecting the file operation gives responders a chance to quarantine the dataset before the next model fine-tuning run uses it.
Sample Logs
{"timestamp":"2025-01-15T04:02:55Z","computer_name":"llm-host-03","user":"opc","image":"/bin/cp","command_line":"cp /tmp/poisoned_data.jsonl /opt/llm/datasets/training/instructions.jsonl"}
Required Fields
image
command_line
user
computer_name
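The matching logic described above can be sketched in Python against an event shaped like the sample log. This is a minimal illustration, not the rule's actual implementation: the tool list, the staging prefixes (/tmp/, /var/tmp/, /home/, /root/), and the dataset prefix (taken from the sample log) are assumptions, and the function name is hypothetical.

```python
import json
import shlex
from pathlib import PurePosixPath

# Assumed values: the rule description does not enumerate exact binaries or paths.
COPY_TOOLS = {"cp", "mv", "rsync"}
STAGING_PREFIXES = ("/tmp/", "/var/tmp/", "/home/", "/root/")
DATASET_PREFIXES = ("/opt/llm/datasets/",)  # taken from the sample log


def is_suspicious_replacement(event: dict) -> bool:
    """Return True when a copy tool writes from a staging path into a dataset path."""
    tool = PurePosixPath(event.get("image", "")).name
    if tool not in COPY_TOOLS:
        return False
    try:
        args = shlex.split(event.get("command_line", ""))
    except ValueError:
        # Unbalanced quotes: the command line cannot be parsed reliably.
        return False
    # Skip the program name and option flags; treat the rest as paths.
    paths = [arg for arg in args[1:] if not arg.startswith("-")]
    if len(paths) < 2:
        return False
    *sources, destination = paths
    return destination.startswith(DATASET_PREFIXES) and any(
        source.startswith(STAGING_PREFIXES) for source in sources
    )


sample = json.loads(
    '{"timestamp":"2025-01-15T04:02:55Z","computer_name":"llm-host-03",'
    '"user":"opc","image":"/bin/cp","command_line":"cp /tmp/poisoned_data.jsonl '
    '/opt/llm/datasets/training/instructions.jsonl"}'
)
print(is_suspicious_replacement(sample))  # True
```

The sketch assumes the destination is the last positional argument, so it misses forms such as `cp -t <dir> <src>`, relative paths, and paths containing `..`. A production rule would need to normalize paths first.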
False Positives
- Legitimate data preparation scripts that stage files in /tmp before moving them to dataset paths
Tuning Guidance
Cross-reference the source file with known approved data pipeline outputs. Alert on any replacement of core instruction-tuning or RLHF datasets.
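One way to apply this guidance is to compare the source file's SHA-256 digest against a set of digests published by the approved pipeline, while always alerting when the destination is a core dataset. The sketch below assumes such a digest set exists. The `CORE_DATASET_MARKERS` values and both function names are hypothetical, and matching core datasets by filename is only a placeholder for whatever inventory an environment actually maintains.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical filename markers for core instruction-tuning or RLHF datasets.
CORE_DATASET_MARKERS = ("instructions", "rlhf")


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large datasets are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_alert(source_path: str, dest_path: str, approved_hashes: set) -> bool:
    """Decide whether a dataset replacement should raise an alert."""
    dest_name = PurePosixPath(dest_path).name.lower()
    # Core datasets always alert, even when the source is an approved output.
    if any(marker in dest_name for marker in CORE_DATASET_MARKERS):
        return True
    try:
        return sha256_of(source_path) not in approved_hashes
    except OSError:
        # Source unreadable (for example, already moved): cannot verify, so alert.
        return True
```

Hashing has to happen close to the time of the event, because a source file that was moved or deleted afterwards can no longer be verified. The sketch treats that case as an alert.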