Security Data Science Learning Resources

This short post catalogs some resources that may be useful for those interested in security data science. It is not meant to be an exhaustive list; rather, it is a curated list to help you get started.

Staying Current with Security Data Science

Here is my current strategy for staying up to date with security data science research. It leans more heavily toward academic research, since that is what interests me at the moment.

Google Scholar publication alerts on well-known, respected researchers.
Google Scholar Citation alerts on interesting or noteworthy papers.
Follow security ML researchers on Twitter and Medium. They frequently share interesting, cutting-edge research papers, videos, and blog posts.
Periodically review proceedings from noteworthy security conferences.
Skim published security conference videos from Irongeek looking for topics of interest.

Google Scholar alerts

Citation Alerts on these papers:

“Acing the IOC game: Toward automatic discovery and analysis of open-source cyber threat intelligence”
“AI^2: training a big data machine to defend”
“APT Infection Discovery using DNS Data”
“Beehive: Large-scale log analysis for detecting suspicious activity in enterprise networks”
“Deep neural network based malware detection using two dimensional binary program features”
“Detecting malicious domains via graph inference”
“Detecting malware based on DNS graph mining”
“Detecting structurally anomalous logins in Enterprise Networks”
“Discovering malicious domains through passive DNS data graph analysis”
“EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models”
“Enabling network security through active DNS datasets”
“Feature-based transfer learning for network security”
“Gotcha-Sly Malware!: Scorpion A Metagraph2vec Based Malware Detection System”
“Guilt by association: large scale malware detection by mining file-relation graphs”
“Identifying suspicious activities through dns failure graph analysis”
“Polonium: Tera-scale graph mining and inference for malware detection”
“Segugio: Efficient behavior-based tracking of malware-control domains in large ISP networks”

New article alerts on these authors, with the annotated entries at the top being the most relevant and interesting to me.

Alina Oprea - heavily focused on operational security ML.
Josh Saxe, Rich Harang, and Konstantin Berlin - heavily focused on malware detection/analytics using ML. Josh Saxe is also a published book author.
Manos Antonakakis and Roberto Perdisci - heavily focused on network security analytics using ML with a specialty in DNS traffic.
Balduzzi Marco
Battista Biggio
Chaz Lever
Christopher Kruegel
Damon McCoy
David Dagon
David Freeman
Gianluca Stringhini
Giovanni Vigna
Guofei Gu
Han Yufei
Hossein Siadati
Issa Khalil
Jason (Iasonas) Polakis
Michael Donald Bailey
Michael Iannacone
Nick Feamster
Niels Provos
Nir Nissim
Patrick McDaniel
Stefan Savage
Steven Noel
Terry Nelms
Ting-Fang Yen
Vern Paxson
Wenke Lee
Yacin Nadji
Yanfang (Fanny) Ye
Yizheng Chen
Yuval Elovici

Twitter

Twitter can be a gold mine of new and relevant ideas, blog posts, presentations, etc., for security data science. You just need to make sure you continually follow the right people. Here is a short list of thought leaders in this space (if I left you off, it is my oversight, so please don’t take offense).

For a longer list of other people I would recommend following on Twitter, see this gist. That list focuses on Threat Intel, Threat Hunting, Detection Engineering, IR, and Security Engineering. It is not exhaustive, but it is a good start.

Conferences

Below are several interesting security conferences where research on security data science topics is published. It is a good idea to be on the lookout for the proceedings from these events.

This page is also an excellent general resource on top academic security conferences: Top Academic Security conferences list. The major industry-focused security conferences like Black Hat, RSA, DEF CON, BSides*, DerbyCon, and ShmooCon all frequently have talks relevant to security data science, but since this is not their primary focus, they are not explicitly called out above.

Learning Resources

These resources will help you build a baseline of knowledge in cybersecurity and machine learning.

Books

Security:

Extrusion Detection: Security Monitoring for Internal Intrusions by Richard Bejtlich
Intelligence-Driven Incident Response: Outwitting the Adversary by Scott J. Roberts and Rebekah Brown
Counter Hack Reloaded: A Step-by-Step Guide to Computer Attacks and Effective Defenses (2nd Edition) by Edward Skoudis and Tom Liston

Machine Learning / Data Science:

Network Security Through Data Analysis: Building Situational Awareness by Michael S. Collins
Malware Data Science: Attack Detection and Attribution by Joshua Saxe and Hillary Sanders
Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow, 2nd Edition by Sebastian Raschka and Vahid Mirjalili
Deep Learning with Python by François Chollet

Courses

I hope this is helpful, and I would be interested to hear about other resources that you find useful. Please leave a message here, on Medium, or @ me on Twitter!

–Jason
@jason_trost