medium · experimental · Linux · AI/ML · OCI · T1105
OCI CLI Writing New Training Data From Object Storage
Detects the OCI CLI being spawned by an LLM service process to download data from object storage into training or dataset paths. This pattern may indicate an attempt to replace or supplement training data with poisoned content from external storage.
Updated Jan 15, 2025 · Detection Engineering Team
llm · data-poisoning · linux · oci · owasp-llm04
Problem Statement
An LLM service downloading training data from object storage at runtime, outside an approved pipeline, suggests an attempt to introduce poisoned data that will bias or backdoor model behaviour after retraining.
Sample Logs
{"timestamp":"2025-01-15T05:30:11Z","computer_name":"llm-host-02","user":"llm_svc","image":"/usr/local/bin/oci","command_line":"oci os object bulk-download --bucket-name ext-data --dest-dir /opt/llm/datasets/training/","parent_image":"/opt/llm/app/data_sync.py"}
Required Fields
image
command_line
parent_image
user
computer_name
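As a rough illustration of how these fields could be combined, the sketch below evaluates one process event shaped like the sample log. It is not the rule's actual query. The parent path prefix, the dataset path markers, and the set of download subcommands are assumptions chosen to match the sample and would need adjusting per environment.

```python
import shlex

# Assumptions for illustration only; none of these values come from the rule itself.
LLM_PARENT_PREFIXES = ("/opt/llm/",)                  # where LLM service code is assumed to live
DATASET_PATH_MARKERS = ("/datasets/", "/training/")   # path fragments treated as training data
DOWNLOAD_SUBCOMMANDS = {"get", "bulk-download"}       # `oci os object` download verbs considered


def is_suspicious(event: dict) -> bool:
    """Return True if the event looks like an LLM service using the OCI CLI
    to download objects into a training or dataset path."""
    # The spawned binary must be the OCI CLI.
    if event.get("image", "").rsplit("/", 1)[-1] != "oci":
        return False
    # The parent must be an LLM service process.
    if not event.get("parent_image", "").startswith(LLM_PARENT_PREFIXES):
        return False
    try:
        tokens = shlex.split(event.get("command_line", ""))
    except ValueError:
        return False  # unparseable command line (e.g. unbalanced quotes)
    # Look for `os object <download-subcommand>` anywhere in the arguments.
    for i in range(len(tokens) - 2):
        if tokens[i:i + 2] == ["os", "object"] and tokens[i + 2] in DOWNLOAD_SUBCOMMANDS:
            break
    else:
        return False
    # Some argument must point at a training or dataset path.
    return any(marker in tok for tok in tokens for marker in DATASET_PATH_MARKERS)


sample_event = {
    "computer_name": "llm-host-02",
    "user": "llm_svc",
    "image": "/usr/local/bin/oci",
    "command_line": "oci os object bulk-download --bucket-name ext-data "
                    "--dest-dir /opt/llm/datasets/training/",
    "parent_image": "/opt/llm/app/data_sync.py",
}
print(is_suspicious(sample_event))
```

The user and computer_name fields are not part of the match in this sketch; they are carried through for triage context.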
False Positives
- Approved data pipeline jobs that pull updated training datasets from OCI Object Storage
Tuning Guidance
Baseline the expected OCI bucket names and destination paths for approved data pipelines. Alert on any bucket or destination path outside this baseline.
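One possible way to apply such a baseline is sketched below: parse the bucket and destination out of command_line and compare them to an allowlist. The bucket name and directory in the allowlist are hypothetical placeholders, and the parser only handles the flag spellings seen in the sample log (--bucket-name and --dest-dir); any other spellings in use would need to be added.

```python
import posixpath
import shlex
from typing import List, Optional

# Hypothetical baseline; replace with the buckets and paths your approved pipelines use.
APPROVED_BUCKETS = {"approved-training-data"}
APPROVED_DEST_DIRS = ("/opt/llm/datasets/approved",)


def flag_value(tokens: List[str], flag: str) -> Optional[str]:
    """Return the value given for `flag`, as `--flag value` or `--flag=value`."""
    for i, tok in enumerate(tokens):
        if tok == flag and i + 1 < len(tokens):
            return tokens[i + 1]
        if tok.startswith(flag + "="):
            return tok[len(flag) + 1:]
    return None


def is_under(path: str, base: str) -> bool:
    """True if `path`, after normalising `..` segments, is `base` or inside it."""
    path = posixpath.normpath(path)
    return path == base or path.startswith(base + "/")


def baseline_violations(command_line: str) -> List[str]:
    """List the reasons a command line falls outside the approved baseline."""
    tokens = shlex.split(command_line)
    bucket = flag_value(tokens, "--bucket-name")
    dest = flag_value(tokens, "--dest-dir")
    reasons = []
    if bucket not in APPROVED_BUCKETS:
        reasons.append(f"bucket {bucket!r} not in baseline")
    if dest is None or not any(is_under(dest, base) for base in APPROVED_DEST_DIRS):
        reasons.append(f"destination {dest!r} not in baseline")
    return reasons


cmd = ("oci os object bulk-download --bucket-name ext-data "
       "--dest-dir /opt/llm/datasets/training/")
for reason in baseline_violations(cmd):
    print(reason)
```

An empty list means both the bucket and the destination match the baseline. Normalising the path before comparison keeps a destination such as /opt/llm/datasets/approved/../training from passing as approved.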