May 05, 2019 • Jason Trost

Security Data Science Learning Resources

This short post catalogs resources that may be useful to those interested in security data science. It is a curated list to help you get started, not an exhaustive one.

Staying Current with Security Data Science

Here is my strategy for keeping up with security data science research. It leans more heavily toward academic research, since that is what interests me at the moment.

Google Scholar Publication alerts on known respected researchers.
Google Scholar Citation alerts on interesting or noteworthy papers.
Follow security ML researchers on Twitter and Medium. They frequently share interesting and cutting edge research papers / videos / blogs.
Periodically review proceedings from noteworthy security conferences.
Skim published security conference videos from Irongeek looking for topics of interest.

Google Scholar alerts

Citation Alerts on these papers:

“Acing the IOC game: Toward automatic discovery and analysis of open-source cyber threat intelligence”
“AI^2: training a big data machine to defend”
“APT Infection Discovery using DNS Data”
“Beehive: Large-scale log analysis for detecting suspicious activity in enterprise networks”
“Deep neural network based malware detection using two dimensional binary program features”
“Detecting malicious domains via graph inference”
“Detecting malware based on DNS graph mining”
“Detecting structurally anomalous logins in Enterprise Networks”
“Discovering malicious domains through passive DNS data graph analysis”
“EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models”
“Enabling network security through active DNS datasets”
“Feature-based transfer learning for network security”
“Gotcha-Sly Malware!: Scorpion A Metagraph2vec Based Malware Detection System”
“Guilt by association: large scale malware detection by mining file-relation graphs”
“Identifying suspicious activities through dns failure graph analysis”
“Polonium: Tera-scale graph mining and inference for malware detection”
“Segugio: Efficient behavior-based tracking of malware-control domains in large ISP networks”

New article alerts on these authors with the bolded being the most relevant / interesting to me.

Alina Oprea - heavily focused on operational security ML.
Josh Saxe, Rich Harang, and Konstantin Berlin - heavily focused on malware detection/analytics using ML. Josh Saxe is also a published book author.
Manos Antonakakis and Roberto Perdisci - heavily focused on network security analytics using ML with a specialty in DNS traffic.
Balduzzi Marco
Battista Biggio
Chaz Lever
Christopher Kruegel
Damon McCoy
David Dagon
David Freeman
Gianluca Stringhini
Giovanni Vigna
Guofei Gu
Han Yufei
Hossein Siadati
Issa Khalil
Jason (Iasonas) Polakis
Michael Donald Bailey
Michael Iannacone
Nick Feamster
Niels Provos
Nir Nissim
Patrick McDaniel
Stefan Savage
Steven Noel
Terry Nelms
Ting-Fang Yen
Vern Paxson
Wenke Lee
Yacin Nadji
Yanfang (Fanny) Ye
Yizheng Chen
Yuval Elovici

Twitter

Twitter can be a gold mine for new and relevant ideas, blogs, presentations, etc. for security data science. You just need to make sure you keep following the right folks. Here is a short list of thought leaders in this space (if I left you off, it is my oversight, so please don’t take offense).

For a more exhaustive list of others I would recommend following on Twitter, see this gist. This list is focused on Threat Intel, Threat Hunting, Detection Engineering, IR, and Security Engineering. It is not exhaustive, but is a good start.

Conferences

Below are several interesting security conferences where research on security data science topics is published. It is a good idea to be on the lookout for the proceedings from these events.

This page is also an excellent general resource for top academic security conferences: Top Academic Security conferences list. The major industry-focused security conferences like Blackhat, RSA, Defcon, BSides*, DerbyCon, and ShmooCon all frequently have talks relevant to security data science, but it is not their primary focus, so they are not explicitly called out above.

Learning Resources

These resources will help you build a baseline of knowledge in Cyber Security and Machine Learning.

Books

Security:

Extrusion Detection: Security Monitoring for Internal Intrusions by Richard Bejtlich
Intelligence-Driven Incident Response: Outwitting the Adversary by Scott J. Roberts and Rebekah Brown
Counter Hack Reloaded: A Step-by-Step Guide to Computer Attacks and Effective Defenses (2nd Edition) by Edward Skoudis and Tom Liston

Machine Learning / Data Science:

Network Security Through Data Analysis: Building Situational Awareness by Michael S Collins
Malware Data Science: Attack Detection and Attribution by Joshua Saxe and Hillary Sanders
Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow, 2nd Edition by Sebastian Raschka and Vahid Mirjalili
Deep Learning with Python by Francois Chollet

Courses

I hope this is helpful, and I would be interested to hear about other resources that you find useful. Please leave a message here, on Medium, or @ me on Twitter!

–Jason
@jason_trost

August 14, 2017 • Jason Trost

6 Short Links on PDNS Graph Analytics for Security

A short listing of research papers I’ve read or plan to read that use passive DNS (PDNS) data and graph analytics for identifying malicious domains.

Host-Domain Graphs

Host-domain graphs are bipartite graphs mapping hosts/IPs to the domains that they either resolved (passive DNS) or visited (web proxy logs). These graphs are used heavily in operational security machine learning papers on network threat hunting, as they provide insight into behavioral patterns across an enterprise or ISP.
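To make the structure concrete, here is a minimal sketch of a host-domain graph built from (host, domain) observations, followed by a toy guilt-by-association score. The function names, the sample records, and the scoring heuristic are my own illustrative assumptions; this is not the method of any paper listed below.

```python
from collections import defaultdict


def build_host_domain_graph(records):
    """Build a bipartite graph from (host, domain) observations.

    Returns two adjacency maps: host -> domains and domain -> hosts.
    """
    host_to_domains = defaultdict(set)
    domain_to_hosts = defaultdict(set)
    for host, domain in records:
        domain = domain.lower().rstrip(".")  # normalize case and trailing dot
        host_to_domains[host].add(domain)
        domain_to_hosts[domain].add(host)
    return host_to_domains, domain_to_hosts


def score_domains(domain_to_hosts, known_bad):
    """Toy heuristic (an assumption, not a published algorithm): score each
    unlabeled domain by the fraction of its hosts that also contacted a
    known-bad domain. known_bad is expected to be normalized already."""
    suspect_hosts = {h for d in known_bad for h in domain_to_hosts.get(d, ())}
    scores = {}
    for domain, hosts in domain_to_hosts.items():
        if domain in known_bad:
            continue
        scores[domain] = len(hosts & suspect_hosts) / len(hosts)
    return scores


# Hypothetical sample data, e.g. parsed from passive DNS or proxy logs.
records = [
    ("10.0.0.1", "evil.example"),
    ("10.0.0.1", "cdn.example"),
    ("10.0.0.1", "dropper.example"),
    ("10.0.0.2", "cdn.example"),
    ("10.0.0.3", "Dropper.example."),
    ("10.0.0.3", "evil.example"),
]
host_to_domains, domain_to_hosts = build_host_domain_graph(records)
scores = score_domains(domain_to_hosts, known_bad={"evil.example"})
```

In this sample, every host that contacted dropper.example also contacted the known-bad domain, so it scores higher than cdn.example. Real deployments would need far more care, for example filtering popular domains before scoring.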

Detecting Malicious Domains via Graph Inference. P. K. Manadhata, S. Yadav, P. Rao, and W. Horne. In Proceedings of 19th European Symposium on Research in Computer Security, Wroclaw, Poland, September 7-11, 2014.

Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data. Alina Oprea, Zhou Li, Ting-Fang Yen, Sang H. Chin, and Sumyah Alrwais. In Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015.

Segugio: Efficient Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks. Babak Rahbarinia and Manos Antonakakis. In Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015.

Domain Resolution Graphs (Domain-IP Graphs)

A domain resolution graph is an undirected bipartite graph of domains and IP addresses, with an edge for each domain-to-IP DNS resolution observed in passive DNS data.
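As a rough illustration of this definition, the sketch below builds a domain resolution graph from (domain, IP) pairs and then projects it onto domains by counting shared IPs. The function names and sample resolutions are hypothetical, and the shared-IP projection is just one simple way to use such a graph, not a reproduction of any paper listed below.

```python
from collections import defaultdict
from itertools import combinations


def build_resolution_graph(resolutions):
    """Build a bipartite domain-IP graph from (domain, ip) observations."""
    domain_to_ips = defaultdict(set)
    ip_to_domains = defaultdict(set)
    for domain, ip in resolutions:
        domain_to_ips[domain].add(ip)
        ip_to_domains[ip].add(domain)
    return domain_to_ips, ip_to_domains


def shared_ip_pairs(ip_to_domains):
    """Project the graph onto domains: for each pair of domains, count how
    many IPs both of them resolved to."""
    pair_counts = defaultdict(int)
    for domains in ip_to_domains.values():
        for a, b in combinations(sorted(domains), 2):
            pair_counts[(a, b)] += 1
    return dict(pair_counts)


# Hypothetical passive DNS observations (documentation IP ranges).
resolutions = [
    ("a.example", "192.0.2.10"),
    ("a.example", "192.0.2.11"),
    ("b.example", "192.0.2.10"),
    ("b.example", "192.0.2.11"),
    ("c.example", "192.0.2.11"),
    ("d.example", "198.51.100.7"),
]
domain_to_ips, ip_to_domains = build_resolution_graph(resolutions)
pairs = shared_ip_pairs(ip_to_domains)
```

Here a.example and b.example share two IPs, which makes them more strongly associated than either is with c.example. Shared hosting and CDNs put many unrelated domains on the same IPs, so raw shared-IP counts are noisy on real data.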

Notos: Building a Dynamic Reputation System for DNS. M. Antonakakis, R. Perdisci, D. Dagon, W. Lee, and N. Feamster. In Proceedings of the 19th USENIX Security Symposium, Washington, DC, USA, August 11-13, 2010.

EXPOSURE: Finding Malicious Domains using Passive DNS Analysis. L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi. In Proceedings of the Network and Distributed System Security Symposium, San Diego, California, USA, February 2011.

Discovering Malicious Domains through Passive DNS Data Graph Analysis. Issa Khalil, Ting Yu, and Bei Guan. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security (ASIA CCS ’16), 2016.

–Jason
@jason_trost

The “short links” format was inspired by O’Reilly’s Four Short Links series.

August 08, 2017 • Jason Trost

7 Short Links on Operational Security Machine Learning

Beehive: Large-Scale Log Analysis for Detecting Suspicious Activity in Enterprise Networks. Ting-Fang Yen, Alina Oprea, Kaan Onarlioglu, Todd Leetham, William Robertson, Ari Juels, and Engin Kirda. In Proceedings of Annual Computer Security Applications Conference (ACSAC), 2013.

An Epidemiological Study of Malware Encounters in a Large Enterprise. Ting-Fang Yen, Victor Heorhiadi, Alina Oprea, Michael K. Reiter, and Ari Juels. In Proceedings of ACM Conference on Computer and Communications Security (CCS), 2014.

Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data. Alina Oprea, Zhou Li, Ting-Fang Yen, Sang H. Chin, and Sumyah Alrwais. In Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015.

Segugio: Efficient Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks. Babak Rahbarinia and Manos Antonakakis. In Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015.

Malicious Behavior Detection using Windows Audit Logs. Konstantin Berlin, David Slater, and Joshua Saxe. In Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security (AISec), 2015.

Operational security log analytics for enterprise breach detection. Zhou Li and Alina Oprea. In Proceedings of the First IEEE Cybersecurity Development Conference (SecDev), 2016.

Lens on the endpoint: Hunting for malicious software through endpoint data analysis. Ahmet Buyukkayhan, Alina Oprea, Zhou Li, and William Robertson. In Proceedings of Recent Advances in Intrusion Detection (RAID), 2017

–Jason
@jason_trost

PS …

many of these papers were found via Alina Oprea’s home page.
The “short links” format was inspired by O’Reilly’s Four Short Links series.

January 01, 2017 • Jason Trost

The Definitive Security Data Science and Machine Learning Guide

This is the Definitive Security Data Science and Machine Learning Guide. It includes books, tutorials, presentations, blog posts, and research papers about solving security problems using data science.

Machine Learning and Security Papers

Intrusion Detection Papers

Malware Papers

Data Collection Papers

Vulnerability Analysis/Reversing Papers

Anonymity/Privacy/OPSEC/Censorship Papers

Data Mining Papers

Cyber Crime Papers

CND/CNA/CNE/CNO Papers

Deep Learning and Security Papers

Deep Learning and Security Presentations

Security Data Science Blogs

Blogs that frequently cover topics on security data science, machine learning, etc. These are recommended for your RSS feed.

Security Data Science Blogposts / Tutorials

Security Data Science Projects

Open source projects and code applying data science/machine learning to security problems.

Clearcut - a tool that uses machine learning to help you focus on the log entries that really need manual review
Click Security’s Data Hacking Project
Combine - Tool to gather Threat Intelligence indicators from publicly available sources
dga_predict - Predicting Domain Generation Algorithms using LSTMs.
mlsec.org - Various Machine Learning and Computer Security Research projects from mlsec.org.
tiq-test - Threat Intelligence Quotient Test - Dataviz and Statistical Analysis of TI feeds.
CuckooML - Machine Learning for Cuckoo Sandbox (https://honeynet.github.io/cuckooml/)

Security Data

Collection of Security and Network Data Resources.

See Covert.io Data Page
See Covert.io Threat Intelligence Page
See also secrepo.com, which is more comprehensive and should be checked as well.

Security Data Science Books

Security Data Science Presentations / Talks

Misc

awesome-ml-for-cybersecurity

December 29, 2016 • Jason Trost

Deep Learning Security Papers

Update (1/1/2017): I will no longer update this page. All updates will instead go to The Definitive Security Data Science and Machine Learning Guide (see the Deep Learning and Security Papers section).

This is another quick post. Over the past few months I started researching deep learning to determine if it may be useful for solving security problems. This post on The Unreasonable Effectiveness of Recurrent Neural Networks was what got me interested in this topic, and I highly recommend reading it in its entirety.

Throughout this research, I came across several academic and professional research papers on security topics that use deep learning. What follows is a list of the papers/slides/videos that I found, which may be useful to others. If you have others that you think should be added to this list, please ping me: @jason_trost.

covert.io

security + big data + machine learning

Deep Learning Papers on Security

Deep Learning Presentations on Security

Security Machine Learning Resources:

General Deep Learning Resources: