This short post catalogs some resources that may be useful for those interested in security data science. It is not meant to be an exhaustive list. It is meant to be a curated list to help you get started.

Staying Current with Security Data Science

Here is my current strategy for staying current with security data science research. It leans heavier towards academic research since this is what interests me at the moment.

  1. Google Scholar Publication alerts on known respected researchers.
  2. Google Scholar Citation alerts on interesting or noteworthy papers.
  3. Follow security ML researchers on Twitter and Medium. They frequently share interesting and cutting edge research papers / videos / blogs.
  4. Periodically review proceedings from noteworthy security conferences.
  5. Skim published security conference videos from Irongeek looking for topics of interest.

Google Scholar alerts

Citation Alerts on these papers:

  • “Acing the IOC game: Toward automatic discovery and analysis of open-source cyber threat intelligence”
  • “AI^ 2: training a big data machine to defend”
  • “APT Infection Discovery using DNS Data”
  • “Beehive: Large-scale log analysis for detecting suspicious activity in enterprise networks”
  • “Deep neural network based malware detection using two dimensional binary program features”
  • “Detecting malicious domains via graph inference”
  • “Detecting malware based on DNS graph mining”
  • “Detecting structurally anomalous logins in Enterprise Networks”
  • “Discovering malicious domains through passive DNS data graph analysis”
  • “EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models”
  • “Enabling network security through active DNS datasets”
  • “Feature-based transfer learning for network security”
  • “Gotcha-Sly Malware!: Scorpion A Metagraph2vec Based Malware Detection System”
  • “Guilt by association: large scale malware detection by mining file-relation graphs”
  • “Identifying suspicious activities through dns failure graph analysis”
  • “Polonium: Tera-scale graph mining and inference for malware detection”
  • “Segugio: Efficient behavior-based tracking of malware-control domains in large ISP networks”

New article alerts on these authors with the bolded being the most relevant / interesting to me.

  • Alina Oprea - heavily focused on operational security ML.
  • Josh Saxe, Rich Harang, and Konstantin Berlin - heavily focused on Malware detection/analytics using ML. Also a published book author.
  • Manos Antonakakis and Roberto Perdisci - heavily focused on network security analytics using ML with a specialty in DNS traffic.
  • Balduzzi Marco
  • Battista Biggio
  • Chaz Lever
  • Christopher Kruegel
  • Damon McCoy
  • David Dagon
  • David Freeman
  • Gianluca Stringhini
  • Giovanni Vigna
  • Guofei Gu
  • Han Yufei
  • Hossein Siadati
  • Issa Khalil
  • Jason (Iasonas) Polakis
  • Michael Donald Bailey
  • Michael Iannacone
  • Nick Feamster
  • Niels Provos
  • Nir Nissim
  • Patrick McDaniel
  • Stefan Savage
  • Steven Noel
  • Terry Nelms
  • Ting-Fang Yen
  • Vern Paxson
  • Wenke Lee
  • Yacin Nadji
  • Yanfang (Fanny) Ye
  • Yizheng Chen
  • Yuval Elovici

Twitter

Twitter can be a gold mine for new and relevant ideas, blogs, presentations, etc for security data science. You just need to make sure you continually follow the right folks. Here is a short list of thought leaders in this space (if I left you off it is my oversight so please don’t take offense).

For a more exhaustive list of others I would recommend following on Twitter, see this gist. This list is focused on Threat Intel, Threat Hunting, Detection Engineering, IR, and Security Engineering. It is not exhaustive, but is a good start.

Conferences

Below are several interesting security conferences where research is published on security data science topics. It is a good idea to be on the look out for the proceedings from these events.

This page is also an excellent resource in general for top academic security conferences: Top Academic Security conferences list. The major industry focused security conferences like Blackhat, RSA, Defcon, BSides*, DerbyCon, and ShmooCon all frequently have talks relevant to security data science, but this is not their primary focus, so they are not explicitly called out above.

Learning Resources

These resources will help you build a baseline of knowledge in Cyber Security and Machine Learning.

Books

Security:

Machine Learning / Data Science:

Courses


I hope this is helpful, and I would be interested to hear about other resources that you find useful. Please leave a message here, on Medium, or @ me on twitter!

–Jason
@jason_trost

A short listing of research papers I’ve read or plan to read that use passive DNS (PDNS) data and graph analytics for identifying malicious domains.

Host-Domain Graphs

Host domain graphs are bipartite graphs mapping hosts/IPs to domains that they either resolved (passive DNS) or visited (web proxy logs). These graphs are used heavily in operational security machine learning papers on network threat hunting as they provide insight into the behavioral patterns across an enterprise or ISP.

Detecting Malicious Domains via Graph Inference P. K. Manadhata, S. Yadav, P. Rao, and W. Horne. In Proceedings of 19th European Symposium on Research in Computer Security, Wroclaw, Poland, September 7-11, 2014.

Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data Alina Oprea, Zhou Li, Ting-Fang Yen, Sang H. Chin, and Sumyah Alrwais In Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015.

Segugio: Efficient Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks Babak Rahbarinia and Manos Antonakakis In Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015

Domain Resolution Graphs (Domain-IP Graphs)

A domain resolution graph is an undirected bipartite graph representing observed domain->IP DNS resolution from Passive DNS data.

Notos: Building a Dynamic Reputation System for DNS M. Antonakakis, R. Perdisci, D. Dagon, W. Lee, and N. Feamster. In the Proceedings of the 19th USENIX Security Symposium, Washington, DC, USA, August 11-13, 2010.

EXPOSURE: Finding Malicious Domains using Passive DNS Analysis L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi. In Proceedings of the Network and Distributed System Security Symposium, San Diego, California, USA, February 2011.

Discovering Malicious Domains through Passive DNS Data Graph Analysis Issa Khalil, Ting Yu, and Bei Guan. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security (ASIA CCS ‘16), 2016.

–Jason
@jason_trost

The “short links” format was inspired by O’Reilly’s Four Short Links series.

Beehive: Large-Scale Log Analysis for Detecting Suspicious Activity in Enterprise Networks Ting-Fang Yen, Alina Oprea, Kaan Onarlioglu, Todd Leetham, William Robertson, Ari Juels, and Engin Kirda In Proceedings of Annual Computer Security Applications Conference (ACSAC), 2013

An Epidemiological Study of Malware Encounters in a Large Enterprise Ting-Fang Yen, Victor Heorhiadi, Alina Oprea, Michael K. Reiter, and Ari Juels In Proceedings of ACM Conference on Computer and Communications Security (CCS), 2014

Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data Alina Oprea, Zhou Li, Ting-Fang Yen, Sang H. Chin, and Sumyah Alrwais In Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015

Segugio: Efficient Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks Babak Rahbarinia and Manos Antonakakis In Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015

Malicious Behavior Detection using Windows Audit Logs Konstantin Berlin, David Slater, Joshua Saxe In Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security (AISec) 2015

Operational security log analytics for enterprise breach detection Zhou Li and Alina Oprea In Proceedings of the First IEEE Cybersecurity Development Conference (SecDev), 2016

Lens on the endpoint: Hunting for malicious software through endpoint data analysis. Ahmet Buyukkayhan, Alina Oprea, Zhou Li, and William Robertson. In Proceedings of Recent Advances in Intrusion Detection (RAID), 2017

–Jason
@jason_trost

PS …

This is the Definitive Security Data Science and Machine Learning Guide. It includes books, tutorials, presentations, blog posts, and research papers about solving security problems using data science.

Table of Contents

Machine Learning and Security Papers

Intrusion Detection Papers

Malware Papers

Data Collection Papers

Vulnerability Analysis/Reversing Papers

Anonymity/Privacy/OPSEC/Censorship Papers

Data Mining Papers

Cyber Crime Papers

CND/CNA/CNE/CNO Papers

Deep Learning and Security Papers

Deep Learning and Security Presentations

Security Data Science Blogs

Blogs that frequently cover topics on security data science, machine learning, etc. These are recommended for your RSS feed.

Security Data Science Blogposts / Tutorials

Security Data Science Projects

Open source projects and code applying data science/machine learning to security problems.

Security Data

Collection of Security and Network Data Resources.

Security Data Science Books

Security Data Science Presentations / Talks

Misc

Update (1/1/2017): I will not be updating this page and instead will make all updates to this page: The Definitive Security Data Science and Machine Learning Guide (see Deep Learning and Security Papers section).

This is another quick post. Over the past few months I started researching deep learning to determine if it may be useful for solving security problems. This post on The Unreasonable Effectiveness of Recurrent Neural Networks was what got me interested in this topic, and I highly recommend reading it in its entirety.

Throughout this research, I came across several security related academic and professional research papers on security topics that use Deep Learning as part of their research. What follows is a list of the papers/slides/videos that I found, and these may be useful to others. If you have others that you think should be added to this list, please ping me: @jason_trost.

Deep Learning Papers on Security

Deep Learning Presentations on Security

Security Machine Learning Resources:

General Deep Learning Resources:

–Jason
@jason_trost