Exploring Syntactical Features for Anomaly Detection in Application Logs

Abstract

This work analyzes how lightweight syntactical feature extraction techniques from the field of information retrieval affect log abstraction in information security applications.

Key Contributions

Feature Extraction Techniques: We evaluate three different syntactical feature extraction methods for log analysis.

Clustering Algorithm Comparison: We compare three clustering algorithms for anomaly detection on extracted features.

Multi-Dataset Evaluation: We evaluate on four different security datasets to ensure generalizability.

Feature Extraction Methods

Traditional vector space models (TF-IDF)
Log template extraction
N-gram based features
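
As a rough illustration of the three feature types listed above, the following minimal sketch uses only the Python standard library. The log lines are invented, the function names (tokenize, word_ngrams, to_template, tfidf) are hypothetical, and the TF-IDF weighting shown is one common smoothed variant; none of this is taken from the paper's implementation. The template step is a deliberately naive stand-in that masks digit-bearing tokens, not a dedicated log parser.

```python
import math
import re
from collections import Counter


def tokenize(line):
    """Lowercase a log line and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", line.lower())


def word_ngrams(tokens, n=2):
    """Return contiguous word n-grams, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def to_template(line):
    """Naive template: replace whitespace-separated tokens containing a digit with <*>."""
    return " ".join(
        "<*>" if any(ch.isdigit() for ch in tok) else tok for tok in line.split()
    )


def tfidf(docs):
    """Compute TF-IDF weights for a list of token lists.

    Assumed weighting: tf = count / len(doc),
    idf = ln((1 + N) / (1 + df)) + 1 (a common smoothed variant).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({
            term: (c / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return vectors


# Hypothetical log lines, invented for illustration only.
logs = [
    "GET /index.html 200",
    "GET /about.html 200",
    "GET /index.html 200",
    "POST /admin/login 401",
]

unigram_docs = [tokenize(line) for line in logs]
bigram_docs = [word_ngrams(tokens, 2) for tokens in unigram_docs]

unigram_vectors = tfidf(unigram_docs)   # vector space model over single tokens
bigram_vectors = tfidf(bigram_docs)     # same weighting over word bigrams
templates = [to_template(line) for line in logs]
```

In this toy data, a token that occurs in only one line (such as "post") receives a higher weight than one shared by most lines (such as "get"), which is the property that makes rare events stand out in the resulting vectors.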

Clustering Approaches

K-means clustering
DBSCAN
Hierarchical clustering
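
To make the clustering step concrete, here is a minimal sketch of one of the listed algorithms, DBSCAN, applied to sparse feature vectors with cosine distance, where points labelled as noise are treated as candidate anomalies. It is written from scratch with the standard library for illustration. The vectors, the eps and min_pts values, and the choice of cosine distance are all assumptions and do not reflect the paper's configuration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points, eps, min_pts, dist=cosine_distance):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force region query; includes the point itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical term-weight vectors, invented for illustration only.
vectors = [
    {"get": 1.0, "index": 1.0, "200": 1.0},
    {"get": 1.0, "about": 1.0, "200": 1.0},
    {"get": 1.0, "index": 1.0, "200": 1.0},
    {"post": 1.0, "admin": 1.0, "401": 1.0},
]

labels = dbscan(vectors, eps=0.5, min_pts=2)  # eps and min_pts chosen arbitrarily
anomalies = [i for i, label in enumerate(labels) if label == -1]
```

With these toy vectors the three similar GET entries fall into one cluster and the lone POST entry is left as noise. K-means and hierarchical clustering do not produce a noise label directly, so an anomaly criterion such as distance to the nearest centroid or cluster size would have to be chosen separately.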

Key Findings

Lightweight syntactical features provide a good balance between computational efficiency and detection performance, which makes them particularly suitable for resource-constrained environments and real-time analysis.

Practical Impact

The techniques explored are deployable in production security operations centers where computational resources and latency are important considerations.

Cite This Work

@article{Copstein2022Syntactical,
  title   = {Exploring syntactical features for anomaly detection in application logs},
  author  = {Copstein, Rafael and Karlsen, Egil and Schwartzentruber, Jeff and Zincir-Heywood, Nur and Heywood, Malcolm},
  journal = {it - Information Technology},
  volume  = {64},
  number  = {1-2},
  pages   = {15--27},
  year    = {2022},
  doi     = {10.1515/itit-2021-0064}
}

Authors