medium · experimental · Linux · AI/ML · OCI · T1105

OCI CLI Writing New Training Data From Object Storage

Detects the OCI CLI being spawned by an LLM service process to download objects from OCI Object Storage into training or dataset paths. This pattern may indicate an attempt to replace or supplement training data with poisoned content from external storage.

Updated Jan 15, 2025 · Detection Engineering Team

llm · data-poisoning · linux · oci · owasp-llm04

Problem Statement

An LLM service that downloads training data from object storage at runtime, outside an approved pipeline, suggests an attempt to introduce poisoned data that could bias or backdoor model behaviour after retraining.

Sample Logs

{"timestamp":"2025-01-15T05:30:11Z","computer_name":"llm-host-02","user":"llm_svc","image":"/usr/local/bin/oci","command_line":"oci os object bulk-download --bucket-name ext-data --dest-dir /opt/llm/datasets/training/","parent_image":"/opt/llm/app/data_sync.py"}

Required Fields

image

command_line

parent_image

user

computer_name
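
To make the matching logic concrete, here is a minimal Python sketch of how the fields above could be combined into a predicate. It is an illustration, not the rule's actual query. The constants LLM_PARENT_PREFIXES and DATASET_PATH_MARKERS and the helper flag_value are hypothetical names, and their values are inferred from the sample log. Only the subcommand and flags that appear in the sample log are handled, so other download subcommands or destination flags would need to be added after checking them against the OCI CLI version in use.

```python
import json
import posixpath
import shlex

# Assumption: illustrative values inferred from the sample log, not part of the rule.
LLM_PARENT_PREFIXES = ("/opt/llm/",)
DATASET_PATH_MARKERS = ("/datasets/", "/training/")


def flag_value(tokens, flag):
    """Return the value of `--flag value` or `--flag=value`, or None if absent."""
    for i, tok in enumerate(tokens):
        if tok == flag and i + 1 < len(tokens):
            return tokens[i + 1]
        if tok.startswith(flag + "="):
            return tok[len(flag) + 1:]
    return None


def is_suspicious(event: dict) -> bool:
    """True when the OCI CLI, spawned by an LLM service process,
    bulk-downloads objects into a training or dataset path."""
    # The spawned binary must be the OCI CLI.
    if posixpath.basename(event.get("image", "")) != "oci":
        return False
    # The parent must live under an LLM service directory (hypothetical prefixes).
    if not event.get("parent_image", "").startswith(LLM_PARENT_PREFIXES):
        return False
    tokens = shlex.split(event.get("command_line", ""))
    # Only the subcommand seen in the sample log is covered here.
    if not {"os", "object", "bulk-download"}.issubset(tokens):
        return False
    # Destination flag name taken from the sample log.
    dest = flag_value(tokens, "--dest-dir")
    return dest is not None and any(m in dest for m in DATASET_PATH_MARKERS)


SAMPLE_LOG = (
    '{"timestamp":"2025-01-15T05:30:11Z","computer_name":"llm-host-02",'
    '"user":"llm_svc","image":"/usr/local/bin/oci",'
    '"command_line":"oci os object bulk-download --bucket-name ext-data '
    '--dest-dir /opt/llm/datasets/training/",'
    '"parent_image":"/opt/llm/app/data_sync.py"}'
)

if __name__ == "__main__":
    print(is_suspicious(json.loads(SAMPLE_LOG)))
```

Tokenising the command line with shlex rather than substring matching avoids false matches on paths that merely contain a flag name, at the cost of missing obfuscated invocations.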

False Positives

· Approved data pipeline jobs that pull updated training datasets from OCI Object Storage

Tuning Guidance

Baseline the expected OCI bucket names and destination paths for approved data pipelines. Alert on any bucket or destination path outside this baseline.
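
The baseline check described above could be sketched as follows. This is a hedged illustration only: the APPROVED mapping, the bucket name approved-training-data, and the helper names are hypothetical, and the flags parsed are the ones that appear in the sample log. A real baseline would be built from observed pipeline activity.

```python
import posixpath
import shlex

# Assumption: hypothetical baseline mapping each approved bucket to its
# approved destination directory. Replace with values from your own pipelines.
APPROVED = {
    "approved-training-data": "/opt/llm/datasets/training",
}


def flag_value(tokens, flag):
    """Return the value of `--flag value` or `--flag=value`, or None if absent."""
    for i, tok in enumerate(tokens):
        if tok == flag and i + 1 < len(tokens):
            return tokens[i + 1]
        if tok.startswith(flag + "="):
            return tok[len(flag) + 1:]
    return None


def outside_baseline(command_line: str) -> bool:
    """True when the bucket or destination path is not in the approved baseline."""
    tokens = shlex.split(command_line)
    bucket = flag_value(tokens, "--bucket-name")
    dest = flag_value(tokens, "--dest-dir")
    if bucket is None or dest is None:
        # Cannot verify the transfer, so treat it as outside the baseline.
        return True
    approved_dest = APPROVED.get(bucket)
    if approved_dest is None:
        return True
    # Normalise so trailing slashes and ".." segments cannot dodge the comparison.
    dest_norm = posixpath.normpath(dest)
    inside = dest_norm == approved_dest or dest_norm.startswith(approved_dest + "/")
    return not inside


if __name__ == "__main__":
    cmd = ("oci os object bulk-download --bucket-name ext-data "
           "--dest-dir /opt/llm/datasets/training/")
    print(outside_baseline(cmd))
```

Pairing each bucket with its destination, rather than keeping two independent allowlists, means an approved bucket written to an unapproved path still alerts.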