Large Scale Malicious Domain Classification with Storm, Random Forests, and Markov Models

At Endgame, we have been working on a system for large-scale malicious DNS detection, and John Munro and I recently presented some of this work at FloCon.

Abstract:

Clairvoyant Squirrel: Large Scale Malicious Domain Classification

Large-scale classification of domain names has many applications in network monitoring, intrusion detection, and forensics. The goal of this research is to predict a domain's maliciousness based solely on the domain string itself, and to perform this classification in real time on domains seen on high-traffic networks, giving network administrators insight into possible intrusions. Our classification model uses the Random Forest algorithm with a 22-feature vector of domain string characteristics. Most of these features are numeric and quick to calculate. The model is currently trained offline on a corpus that combines highly malicious domains, gathered from DNS traffic originating in a malware execution sandbox, with benign, popular domains observed by a high-traffic DNS sensor. For stream classification, we use an internally developed platform for distributed, high-speed event processing built on Twitter's recently open-sourced Storm project. We discuss the system architecture; the logic behind our model's features and sampling techniques, which have led to 97% classification accuracy on our dataset; and the model's performance within our streaming environment.
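The abstract does not list the 22 features or describe how the Markov models are used, so what follows is only a minimal Python sketch under stated assumptions. It computes a few hypothetical lexical features of the kind the abstract describes (numeric and cheap to calculate: length, label count, digit and vowel ratios, hyphen count, Shannon entropy) plus a character-bigram Markov score trained on benign domains, on the assumption that random-looking generated names produce less likely character transitions. The names `domain_features`, `shannon_entropy`, and `BigramMarkovModel`, the alphabet, and the add-one smoothing are illustrative choices, not Endgame's implementation.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def domain_features(domain: str) -> dict:
    """Cheap numeric features of a domain string (hypothetical subset, not the original 22)."""
    domain = domain.lower().rstrip(".")
    labels = domain.split(".")
    # Naively treat everything left of the last label as the "name" part;
    # multi-part public suffixes such as "co.uk" are not handled.
    name = ".".join(labels[:-1]) if len(labels) > 1 else domain
    letters = [ch for ch in name if ch.isalpha()]
    return {
        "length": len(domain),
        "num_labels": len(labels),
        "digit_ratio": sum(ch.isdigit() for ch in name) / max(len(name), 1),
        "vowel_ratio": sum(ch in VOWELS for ch in letters) / max(len(letters), 1),
        "hyphen_count": name.count("-"),
        "entropy": shannon_entropy(name),
    }


class BigramMarkovModel:
    """First-order character Markov chain with add-one (Laplace) smoothing (assumed design)."""

    def __init__(self, alphabet: str = "abcdefghijklmnopqrstuvwxyz0123456789-."):
        self.alphabet = alphabet
        # transitions[a][b] counts how often character b follows character a.
        self.transitions = {ch: Counter() for ch in alphabet}

    def fit(self, domains):
        for d in domains:
            s = d.lower()
            for a, b in zip(s, s[1:]):
                if a in self.transitions and b in self.transitions:
                    self.transitions[a][b] += 1
        return self

    def avg_log_prob(self, domain: str) -> float:
        """Mean natural-log transition probability; lower means less like the training set."""
        s = domain.lower()
        pairs = [(a, b) for a, b in zip(s, s[1:])
                 if a in self.transitions and b in self.transitions]
        if not pairs:
            return float("nan")  # no scorable transitions; the caller decides how to impute
        v = len(self.alphabet)
        total = 0.0
        for a, b in pairs:
            row = self.transitions[a]
            total += math.log((row[b] + 1) / (sum(row.values()) + v))
        return total / len(pairs)


if __name__ == "__main__":
    benign = ["google.com", "facebook.com", "amazon.com", "wikipedia.org", "youtube.com"]
    model = BigramMarkovModel().fit(benign)
    for d in ["googlebook.com", "xkqzjvwpq.com"]:
        feats = domain_features(d)
        feats["markov_avg_log_prob"] = model.avg_log_prob(d)
        print(d, feats)
```

In a setup like the one described, vectors like these would be the input to a Random Forest trained offline and then applied to domains as they stream through Storm; neither the classifier nor the topology is shown here. Keeping every feature a simple pass over the string is what makes per-domain scoring cheap enough for high-traffic streams.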

Here are the slides in case you’re interested.

–Jason

January 31, 2013