it - Information Technology, 2022

Exploring Syntactical Features for Anomaly Detection in Application Logs

Analyzing lightweight syntactical feature extraction techniques from information retrieval for log abstraction in security.

Published at it - Information Technology

Authors

Rafael Copstein, Egil Karlsen, Jeff Schwartzentruber, Nur Zincir-Heywood, Malcolm I. Heywood

Abstract

This work analyzes the effect of lightweight syntactical feature extraction techniques from the field of information retrieval for log abstraction in information security applications.

Key Contributions

  1. Feature Extraction Techniques: We evaluate three different syntactical feature extraction methods for log analysis.
  2. Clustering Algorithm Comparison: We compare three clustering algorithms for anomaly detection on extracted features.
  3. Multi-Dataset Evaluation: We evaluate on four different security datasets to ensure generalizability.

Methods

  • Traditional vector space models (TF-IDF)
  • Log template extraction
  • N-gram based features
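
To give a feel for the three feature families listed above, here is a minimal standard-library sketch. It is not the paper's implementation: the tokenizer, the smoothed IDF formula, the digit-masking rule used as a stand-in for template extraction, and the sample log lines are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(line):
    # Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", line.lower()) if t]


def char_ngrams(line, n=3):
    # Character n-grams: a sliding window of n characters over the raw line.
    return [line[i:i + n] for i in range(len(line) - n + 1)]


def to_template(line):
    # Crude stand-in for log template extraction (an assumption, not a real parser):
    # any whitespace-delimited token that contains a digit is treated as variable.
    return re.sub(r"\S*\d\S*", "<*>", line)


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = ln((1 + N) / (1 + df)) + 1   (a common smoothed variant, assumed here)
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero on empty documents
        vectors.append({
            term: (c / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return vectors


# Hypothetical log lines, invented for this example.
logs = [
    "GET /index.html 200",
    "GET /about.html 200",
    "POST /login 500",
]

word_vectors = tfidf([tokenize(line) for line in logs])      # TF-IDF over words
ngram_vectors = tfidf([char_ngrams(line) for line in logs])  # TF-IDF over 3-grams
templates = [to_template(line) for line in logs]             # masked templates
```

In this toy corpus the term `post` occurs in only one line, so it receives a higher weight than `get`, which occurs in two. Rare terms standing out in this way is what makes such features usable for anomaly detection.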

Clustering Approaches

  • K-means clustering
  • DBSCAN
  • Hierarchical clustering
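
As an illustration of how a clustering algorithm can flag anomalies, the sketch below implements a small DBSCAN in pure Python and treats noise points as anomaly candidates. That decision rule, the `eps` and `min_pts` values, and the 2-D toy points (standing in for extracted feature vectors) are assumptions for this example and are not taken from the paper.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


# Toy 2-D vectors: two dense groups and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10),
    (50, 50),
]
labels = dbscan(points, eps=1.5, min_pts=3)

# Assumed decision rule: points that join no cluster are anomaly candidates.
anomalies = [i for i, label in enumerate(labels) if label == -1]
```

Here the isolated point at index 7 is the only one labeled as noise. K-means and hierarchical clustering need a different rule, for example distance to the nearest centroid or membership in very small clusters.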

Key Findings

Lightweight syntactical features balance computational efficiency against detection performance, which makes them well suited to resource-constrained environments and real-time analysis.

Practical Impact

The techniques explored can be deployed in production security operations centers, where computational resources and latency are limiting factors.

Cite This Work

@article{Copstein2022Syntactical,
    title     = {Exploring syntactical features for anomaly detection in application logs},
    author    = {Copstein, Rafael and Karlsen, Egil and Schwartzentruber, Jeff and Zincir-Heywood, Nur and Heywood, Malcolm},
    journal   = {it - Information Technology},
    volume    = {64},
    number    = {1-2},
    pages     = {15--27},
    year      = {2022},
    doi       = {10.1515/itit-2021-0064}
}