Annals of Telecommunications, 2024

Large Language Models and Unsupervised Feature Learning: Implications for Log Analysis

Exploring how LLM embeddings, combined with unsupervised learning, can distinguish behaviors in log files for anomaly detection.

Published in Annals of Telecommunications

Authors

Egil Karlsen, Xiao Luo, Nur Zincir-Heywood, Malcolm I. Heywood

Abstract

Log file analysis using large language models provides mechanisms for discovering embeddings that distinguish between different behaviors present in system logs. This work investigates unsupervised learning approaches for discriminating between normal and anomalous behaviors.

Key Contributions

  1. Unsupervised Anomaly Detection: We explore how LLM-derived embeddings can be used for unsupervised anomaly detection without requiring labeled training data.
  2. Feature Learning Analysis: We analyze how different LLM architectures learn features relevant to security log analysis.
  3. Practical Guidelines: We provide recommendations for practitioners deploying unsupervised log analysis systems.

Methodology

  • Extract embeddings from pre-trained and fine-tuned LLMs
  • Apply clustering algorithms to identify behavioral patterns
  • Evaluate anomaly detection performance across multiple datasets
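
The three steps above can be illustrated with a minimal, standard-library sketch. It rests on several assumptions that do not come from the paper: toy 4-dimensional vectors stand in for real LLM embeddings, DBSCAN is chosen here only as one example of a clustering algorithm (this summary does not name the algorithms the authors used), and the helper names `dbscan`, `jittered`, and `precision_recall` as well as the `eps` and `min_pts` values are hypothetical. Points that belong to no dense cluster are treated as anomalies, and the flagged set is then scored against the known labels of the synthetic data.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Density-based clustering: labels 0..k-1 for clusters, -1 for noise."""
    n = len(points)
    # Neighborhoods include the point itself, as in the usual DBSCAN definition.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


def precision_recall(flagged, truth):
    """Precision and recall of flagged indices against known anomaly indices."""
    flagged, truth = set(flagged), set(truth)
    hits = len(flagged & truth)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def jittered(center, n, rng, spread=0.5):
    """Return n toy vectors within +/- spread of center on every axis."""
    return [tuple(c + rng.uniform(-spread, spread) for c in center) for _ in range(n)]


# Step 1 (assumed): embeddings would come from an LLM; synthetic vectors are used here.
rng = random.Random(0)
normal_a = jittered((0.0, 0.0, 0.0, 0.0), 30, rng)
normal_b = jittered((10.0, 10.0, 10.0, 10.0), 30, rng)
outliers = [(0.0, 10.0, 0.0, 10.0), (10.0, 0.0, 10.0, 0.0), (-10.0, -10.0, -10.0, -10.0)]
embeddings = normal_a + normal_b + outliers
true_anomalies = [60, 61, 62]  # known only because the data is synthetic

# Step 2: cluster the embeddings; noise points are the anomaly candidates.
labels = dbscan(embeddings, eps=2.5, min_pts=5)
flagged = [i for i, label in enumerate(labels) if label == -1]
n_clusters = len(set(labels) - {-1})

# Step 3: evaluate against the held-back labels.
precision, recall = precision_recall(flagged, true_anomalies)
print(f"clusters found: {n_clusters}, flagged indices: {flagged}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On real logs the separation is unlikely to be this clean. The neighborhood radius and density threshold would need tuning per dataset, and labels would be used only for evaluation, never for fitting.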

Key Findings

LLM embeddings capture semantic relationships in log data that enable effective unsupervised learning. The learned representations naturally separate normal operational patterns from anomalous events without explicit supervision.

Applications

This approach is particularly valuable in scenarios where labeled anomaly data is scarce or unavailable, which is common in real-world security operations.

Cite This Work

@article{Karlsen2024LLMUnsupervised,
    title     = {Large language models and unsupervised feature learning: implications for log analysis},
    author    = {Karlsen, Egil and Luo, Xiao and Zincir-Heywood, Nur and Heywood, Malcolm I.},
    journal   = {Annals of Telecommunications},
    volume    = {79},
    pages     = {711--729},
    year      = {2024},
    doi       = {10.1007/s12243-024-01028-2}
}