CSNet2023

Exploring Semantic vs. Syntactic Features for Unsupervised Learning on Application Log Files

Comparing semantic and syntactic feature extraction approaches for unsupervised anomaly detection in application logs.

Published at the 2023 7th Cyber Security in Networking Conference (CSNet)

Authors

Egil Karlsen, Rafael Copstein, Xiao Luo, Jeff Schwartzentruber, Bradley Niblett, Andrew Johnston, Malcolm I. Heywood, Nur Zincir-Heywood

Abstract

Application log files contain rich information about system behavior, but extracting meaningful features for anomaly detection remains challenging. This work compares semantic and syntactic approaches to feature extraction for unsupervised learning on log data.

Key Contributions

  1. Feature Extraction Comparison: We systematically compare semantic (transformer-based) and syntactic (pattern-based) feature extraction methods.
  2. Unsupervised Evaluation: We evaluate both approaches in unsupervised settings across multiple log datasets.
  3. Practical Recommendations: We provide guidance on when to use each approach based on log characteristics.

Methods Compared

  • Semantic Features: Transformer-based embeddings that capture meaning
  • Syntactic Features: Pattern-based extraction using log templates and structure

Key Findings

Semantic features excel at capturing complex behavioral patterns, while syntactic features provide more interpretable results. The optimal choice depends on the specific use case and requirements for explainability.

Datasets

Experiments were conducted on application logs from various sources, including web servers and system services.

Cite This Work

@inproceedings{Karlsen2023SemanticSyntactic,
    author    = {Karlsen, Egil and Copstein, Rafael and Luo, Xiao and Schwartzentruber, Jeff and Niblett, Bradley and Johnston, Andrew and Heywood, Malcolm I. and Zincir-Heywood, Nur},
    title     = {Exploring Semantic vs. Syntactic Features for Unsupervised Learning on Application Log Files},
    booktitle = {2023 7th Cyber Security in Networking Conference (CSNet)},
    year      = {2023},
    pages     = {219--225},
    doi       = {10.1109/CSNet59123.2023.10339765}
}