Abstract
Security log analysis is critical for detecting threats and anomalies in modern systems. This work investigates how Large Language Models of differing architectures can be applied to the analysis of application and system log files for security purposes.
Key Contributions
- LLM4Sec Pipeline: We propose and implement LLM4Sec, a new experimentation pipeline for applying LLMs to log analysis and for evaluating and comparing the resulting models.
- Comprehensive Benchmarking: We deploy and benchmark 60 fine-tuned language models across six datasets from web application and system log sources.
- State-of-the-Art Results: Our best-performing fine-tuned model (DistilRoBERTa) achieves an average F1-Score of 0.998, outperforming previous approaches.
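The F1-Score cited above is the harmonic mean of precision and recall over per-line anomaly labels. A minimal, self-contained sketch of the metric (the labels below are illustrative, not from the paper's datasets):

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall (positive class = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical per-line labels: 1 = anomalous log line, 0 = benign.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [1, 0, 1, 0, 0, 0, 1, 0]
print(round(f1_score(truth, preds), 3))  # → 0.857
```

An average F1 of 0.998 therefore means the models make almost no false-positive or false-negative calls across the six benchmark datasets.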
Models Evaluated
- BERT
- RoBERTa
- DistilRoBERTa
- GPT-2
- GPT-Neo
Key Findings
The results demonstrate that LLMs can perform log analysis effectively, and that fine-tuning is particularly important for adapting a model to a specific log type. The transformer-based architectures show a strong capability for capturing the semantic structure of log messages.
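One property that makes log lines amenable to language models is their templated structure: a fixed message skeleton with variable parameters (IPs, ports, counters). A toy illustration of that structure, using hypothetical log lines and a simple regex normalization of the kind often applied before or alongside model input (not the paper's preprocessing):

```python
import re

# Masks common variable fields so lines sharing a template collapse together.
PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),   # IPv4 addresses
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),           # hex identifiers
    (re.compile(r"\b\d+\b"), "<NUM>"),                      # remaining numbers
]

def normalize(line: str) -> str:
    for pattern, token in PATTERNS:
        line = pattern.sub(token, line)
    return line

# Hypothetical SSH-style log lines: same template, different parameters.
lines = [
    "Accepted password for root from 10.0.0.5 port 52412",
    "Accepted password for root from 192.168.1.9 port 40133",
]
templates = {normalize(l) for l in lines}
print(templates)  # both lines reduce to a single template
```

Because many distinct lines share a few underlying templates, a fine-tuned transformer can learn which templates (and which parameter patterns) are characteristic of benign versus anomalous behavior.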
Impact
This work gives security and ML practitioners deeper insight into selecting features and algorithms for log analysis tasks, and establishes benchmarks for future research in this domain.