medium · experimental · Linux · AI/ML · T1562.001
LLM Service Updating Policy Or Moderation Rules Before Serving
Detects LLM service processes writing to moderation, policy, safety rule, or response filter files. Runtime modification of these controls can disable safety guardrails, enabling the model to produce harmful or misleading outputs.
Updated Jan 15, 2025 · Detection Engineering Team
llm · misinformation · linux · policy-tamper · owasp-llm09
Problem Statement
Moderation and safety rules are often the last control between an LLM and a harmful output. A process that rewrites these files at runtime can weaken or remove those controls without touching the model itself, allowing the model to produce content it would otherwise refuse.
Sample Logs
{"timestamp":"2025-01-15T11:08:30Z","computer_name":"llm-host-02","user":"llm_svc","image":"/opt/llm/app/config_updater.py","target_filename":"/opt/llm/config/moderation_rules.yaml","event_type":"file_modify"}
Required Fields
image
target_filename
user
computer_name
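As a rough illustration of how these fields could be used, the sketch below checks that an event carries the required fields and that the written file looks like a moderation or policy file. The keyword list and the function names (`missing_fields`, `is_policy_file_write`) are assumptions made for this example, not part of the published rule, and scoping to specific LLM service processes via `image` is left out.

```python
import json
from pathlib import PurePosixPath

# Required fields, as listed in this entry.
REQUIRED_FIELDS = ("image", "target_filename", "user", "computer_name")

# Assumption: illustrative filename keywords for moderation/policy files.
# A real deployment would tune this list to its own file layout.
POLICY_KEYWORDS = ("moderation", "policy", "safety", "filter")


def missing_fields(event: dict) -> list:
    """Return the required fields that are absent or empty in the event."""
    return [field for field in REQUIRED_FIELDS if not event.get(field)]


def is_policy_file_write(event: dict) -> bool:
    """True if a complete file_modify event targets a policy-like file."""
    if missing_fields(event):
        return False
    if event.get("event_type") != "file_modify":
        return False
    # Match on the file name only, so directory names do not trigger a hit.
    name = PurePosixPath(event["target_filename"]).name.lower()
    return any(keyword in name for keyword in POLICY_KEYWORDS)


# Same values as the sample log in this entry.
sample = json.loads(
    '{"timestamp":"2025-01-15T11:08:30Z","computer_name":"llm-host-02",'
    '"user":"llm_svc","image":"/opt/llm/app/config_updater.py",'
    '"target_filename":"/opt/llm/config/moderation_rules.yaml",'
    '"event_type":"file_modify"}'
)
print(is_policy_file_write(sample))
```

Matching on the file name rather than the full path is a design choice for this sketch; if policy files are identified by directory in a given environment, a path-prefix check may fit better.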
False Positives
- Legitimate moderation rule updates deployed via the approved configuration management pipeline
Tuning Guidance
Moderation and safety rule files should be treated as immutable during serving. Alert on any write outside a defined maintenance window.
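One way to express the maintenance-window condition is sketched below. The window boundaries (02:00 to 04:00 UTC, daily) and the names `in_maintenance_window` and `should_alert` are hypothetical values chosen for illustration; substitute the window your change process actually defines.

```python
from datetime import datetime, time, timezone

# Assumption: a daily maintenance window in UTC (hypothetical values).
WINDOW_START = time(2, 0)
WINDOW_END = time(4, 0)


def in_maintenance_window(timestamp: str) -> bool:
    """True if an ISO 8601 timestamp falls inside the daily UTC window."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    utc_time = parsed.astimezone(timezone.utc).time()
    # Start is inclusive, end is exclusive.
    return WINDOW_START <= utc_time < WINDOW_END


def should_alert(event: dict) -> bool:
    """Alert on any policy-file write outside the maintenance window."""
    return not in_maintenance_window(event["timestamp"])


# Timestamp taken from the sample log in this entry.
print(should_alert({"timestamp": "2025-01-15T11:08:30Z"}))
```

This sketch assumes the window does not cross midnight; a window such as 23:00 to 01:00 would need the comparison split into two ranges.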