Detection Library
medium · experimental · Linux · AI/ML · T1562.001

LLM Service Updating Policy Or Moderation Rules Before Serving

Detects LLM service processes writing to moderation, policy, safety rule, or response filter files. Runtime modification of these controls can disable safety guardrails, enabling the model to produce harmful or misleading outputs.

Updated Jan 15, 2025 · Detection Engineering Team

llm · misinformation · linux · policy-tamper · owasp-llm09

Problem Statement

Moderation and safety rule files define what the service filters or refuses, which makes them the last control between the model and the user. Modifying these files at runtime can weaken or disable those controls, allowing the model to produce content it would otherwise refuse.

Sample Logs

{"timestamp":"2025-01-15T11:08:30Z","computer_name":"llm-host-02","user":"llm_svc","image":"/opt/llm/app/config_updater.py","target_filename":"/opt/llm/config/moderation_rules.yaml","event_type":"file_modify"}

Required Fields

image
target_filename
user
computer_name
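
As an illustration of how these fields could drive the match, the following is a minimal sketch rather than the rule's actual logic. The filename keywords, the `/opt/llm/` image prefix (taken from the sample log), and the function name `is_policy_write` are assumptions to adapt to your environment.

```python
import json
import re

# Assumed keywords for rule-file names; adjust to the files your deployment uses.
SENSITIVE_NAME = re.compile(r"(moderation|policy|safety|filter)", re.IGNORECASE)
# Assumed install prefix for LLM service processes, based on the sample log.
LLM_IMAGE_PREFIX = "/opt/llm/"
REQUIRED_FIELDS = ("image", "target_filename", "user", "computer_name")


def is_policy_write(event: dict) -> bool:
    """Return True when an LLM service process modifies a rule-like file."""
    # Skip events that lack any of the required fields.
    if any(not event.get(field) for field in REQUIRED_FIELDS):
        return False
    if event.get("event_type") != "file_modify":
        return False
    if not event["image"].startswith(LLM_IMAGE_PREFIX):
        return False
    # Match on the file name only, so directory names do not trigger the rule.
    filename = event["target_filename"].rsplit("/", 1)[-1]
    return bool(SENSITIVE_NAME.search(filename))


sample = json.loads(
    '{"timestamp":"2025-01-15T11:08:30Z","computer_name":"llm-host-02",'
    '"user":"llm_svc","image":"/opt/llm/app/config_updater.py",'
    '"target_filename":"/opt/llm/config/moderation_rules.yaml",'
    '"event_type":"file_modify"}'
)
print(is_policy_write(sample))
```

Matching on the base name keeps the check narrow; a deployment that stores rules under a dedicated directory may prefer a path-prefix match instead.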

False Positives

  • Legitimate moderation rule updates deployed via the approved configuration management pipeline
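
One way to suppress this false positive is an allowlist of the identities the approved pipeline writes as. The sketch below assumes a hypothetical service account `deploy_svc` and a hypothetical binary path `/usr/local/bin/config-deploy`; neither comes from this rule, so substitute the values your pipeline really uses.

```python
# Hypothetical (user, image) pairs for the approved pipeline; replace with real values.
APPROVED_WRITERS = {
    ("deploy_svc", "/usr/local/bin/config-deploy"),
}


def is_approved_update(event: dict) -> bool:
    """Return True when the write comes from an allowlisted user and image pair."""
    return (event.get("user"), event.get("image")) in APPROVED_WRITERS


pipeline_event = {
    "user": "deploy_svc",
    "image": "/usr/local/bin/config-deploy",
    "target_filename": "/opt/llm/config/moderation_rules.yaml",
}
service_event = {
    "user": "llm_svc",
    "image": "/opt/llm/app/config_updater.py",
    "target_filename": "/opt/llm/config/moderation_rules.yaml",
}
print(is_approved_update(pipeline_event), is_approved_update(service_event))
```

Pairing the user with the image is a deliberate choice in this sketch: allowlisting on the user alone would also exempt any other process running under that account.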

Tuning Guidance

Moderation and safety rule files should be treated as immutable during serving. Alert on any write outside a defined maintenance window.
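
The maintenance-window check can be sketched as follows. The 02:00 to 04:00 UTC window and the function name `outside_maintenance_window` are assumptions for illustration, and the timestamp format follows the sample log.

```python
from datetime import datetime, time, timezone

# Assumed daily maintenance window in UTC; hypothetical values.
WINDOW_START = time(2, 0)
WINDOW_END = time(4, 0)


def outside_maintenance_window(timestamp: str) -> bool:
    """Return True when an ISO 8601 timestamp falls outside the window."""
    # Replace a trailing "Z" so older Python versions can parse the value.
    moment = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    utc_time = moment.astimezone(timezone.utc).time()
    return not (WINDOW_START <= utc_time < WINDOW_END)


print(outside_maintenance_window("2025-01-15T11:08:30Z"))
```

A window that crosses midnight would need a different comparison, since this sketch assumes the start time is earlier than the end time on the same day.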