A short listing of cyber security data science research papers I’ve discovered recently. Each of them uses machine learning or enables ML (e.g., by providing training data or making it possible to create training data) to solve various security use cases, and many provide open-source code as well.
- BODMAS: An Open Dataset for Learning based Temporal Analysis of PE Malware. [data]. Other malware-related training data can be found here.
- Compromised or Attacker-Owned: A Large Scale Classification and Study of Hosting Domains of Malicious URLs. [code] is referenced in the paper, but was not live as of 4/24/2021.
- DeepHunter: A Graph Neural Network Based Approach for Robust Cyber Threat Hunting. This uses an open-source EDR tool named BLUESPAWN that I had not heard of before.
- DeepReflect: Discovering Malicious Functionality through Binary Reconstruction. [code]
- Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers. [code]
- EXTRACTOR: Extracting Attack Behavior from Threat Reports. [code]
- On Generating and Labeling Network Traffic with Realistic, Self-Propagating Malware.
- Stratosphere: Finding Vulnerable Cloud Storage Buckets. [code]
If you’re interested in finding more papers like these, use the method I outlined here.
The “short links” format was inspired by O’Reilly’s Four Short Links series.