Mining DNS MX Records for Fun and Profit

If you have read my blog before, you may realize that I really love DNS data and DNS analytics. In this post, I share some experiences using mostly DNS data to identify the visible footprint of popular email security providers.

This may not be terribly novel, but it was an interesting exploration during a time of boredom for me. This work was initially motivated by two events:

When the Proofpoint email protection machine learning vulnerability (CVE-2019-20634) was announced by Will Pearce and Nick Landers, I got to wondering how large Proofpoint's deployment footprint was and how one could figure this out, and
A friend at another company mentioned that they were using a specific startup email security provider, and I wondered whether I could determine which other companies were using the same provider.

Here is the methodology I devised for this:

Collect a large sample of MX records
Enrich MX records with IP intelligence and useful metadata
Sift through the enriched records and identify recognizable email providers' domains through OSINT (whois, PDNS, Google) and market research.
Profit?!?!?

For step one, I downloaded the Alexa top 1M domains, Quantcast top 1M domains (from the Wayback Machine), Domcop top 10M domains, Majestic Million domains, and Cisco Umbrella top 1M domains. I identified the registered domain for each of these using tldextract and then combined them into a single de-duplicated list. This resulted in ~8.3M unique domain names. I then performed bulk MX lookups using adnshost against my own bind9 recursive nameserver. In my experience, adnshost works pretty well for bulk DNS resolution at this scale, and it will perform both the lookup requested (MX) and a resolution of the returned domain (A lookup). When performing bulk DNS lookups at this scale, it is important to add retry logic for failed resolutions, since failures happen often enough to be a problem. I did this using a simple bash script that retried failed lookups up to three times.
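The retry idea can be sketched in a few lines. This is a minimal Python illustration, not the bash script used for the study; the `resolve` callable and its "return `None` on failure" contract are assumptions made for the example, and any bulk resolver (adnshost or otherwise) could sit behind it.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical resolver signature: takes a domain and returns a list of MX
# hostnames, or None when the lookup itself failed (timeout, SERVFAIL, ...).
# An empty list means "resolved fine, but there are no MX records".
Resolver = Callable[[str], Optional[List[str]]]


def resolve_with_retries(
    domains: Iterable[str],
    resolve: Resolver,
    max_retries: int = 3,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Resolve every domain, re-running only the failures after each pass."""
    results: Dict[str, List[str]] = {}
    pending = list(dict.fromkeys(domains))  # de-duplicate, keep input order
    # One initial pass plus up to max_retries retry passes.
    for _ in range(1 + max_retries):
        if not pending:
            break
        failed = []
        for domain in pending:
            answer = resolve(domain)
            if answer is None:
                failed.append(domain)
            else:
                results[domain] = answer
        pending = failed
    # Whatever is still pending never resolved successfully.
    return results, pending
```

Retrying in whole passes rather than immediately per domain gives a struggling upstream nameserver some time to recover before the same name is asked again.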

For step two, I developed a simple Jupyter notebook to parse the adnshost logs and perform the enrichments using tldextract, PTR lookups (also using adnshost), MaxMind ASN, MaxMind City, Alexa ranking, and cloud provider IP ranges for AWS, Azure, and GCP. Side note: I also attempted to perform SOA lookups on the /24 networks of each IP after noticing some useful patterns with failed PTR lookups. This appears potentially useful for identifying how some of the cloud providers' IP space is being used, but it turned into a rabbit hole since adnshost appears to crash when trying to handle some of the results it received.
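Two of these enrichments are easy to illustrate with the Python standard library: tagging an IP with its cloud provider, and deriving the reverse zone name for the /24 that an SOA lookup would target. This is a sketch under assumptions, not the notebook code: the CIDR blocks below are RFC 5737 documentation ranges standing in for the providers' published range lists, and the function names are made up for the example.

```python
import ipaddress
from typing import Dict, List, Optional

# Placeholder ranges from the RFC 5737 documentation blocks, NOT real cloud
# prefixes. In practice these would be loaded from each provider's
# published IP range list.
CLOUD_RANGES: Dict[str, List[str]] = {
    "aws": ["192.0.2.0/24"],
    "azure": ["198.51.100.0/24"],
    "gcp": ["203.0.113.0/25"],
}

_NETWORKS = [
    (provider, ipaddress.ip_network(cidr))
    for provider, cidrs in CLOUD_RANGES.items()
    for cidr in cidrs
]


def cloud_provider(ip: str) -> Optional[str]:
    """Return the provider whose range contains ip, or None if none does."""
    addr = ipaddress.ip_address(ip)
    for provider, network in _NETWORKS:
        if addr.version == network.version and addr in network:
            return provider
    return None


def reverse_zone_24(ip: str) -> str:
    """Return the in-addr.arpa zone name for the /24 containing an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)
    # reverse_pointer looks like "10.2.0.192.in-addr.arpa"; drop the host octet.
    return addr.reverse_pointer.split(".", 1)[1]
```

The linear scan is fine for a handful of prefixes; with the full published lists (thousands of prefixes) and millions of IPs, a prefix tree or sorted-interval lookup would likely be worth the extra code.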

For step three, I did the following:

Performed market research on the top email security providers as well as emerging and niche providers. This site was helpful as well as just googling around and exploring PDNS/Whois data from PassiveTotal and SecurityTrails.
Scrutinized the top MX server registered domains and ASNs and tried to identify potential security providers.
Sifted through the remaining results trying to identify any obvious providers with “malware”, “phish”, “spam”, or “security” in their domain names.
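The keyword pass in the last step amounts to a substring filter over the MX base domains. A minimal sketch, with a hypothetical function name and the four keywords taken from the list above:

```python
from typing import Dict, Iterable, List

# Keywords from the text; matching is a plain case-insensitive substring test.
KEYWORDS = ("malware", "phish", "spam", "security")


def keyword_hits(base_domains: Iterable[str]) -> Dict[str, List[str]]:
    """Map each base domain to the keywords it contains, skipping non-matches."""
    hits: Dict[str, List[str]] = {}
    for domain in base_domains:
        name = domain.lower().rstrip(".")
        matched = [kw for kw in KEYWORDS if kw in name]
        if matched:
            hits[name] = matched
    return hits
```

A filter like this only produces candidates; each hit still needs a manual look (whois, PDNS, the provider's website) before it goes into a mapping.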

I used this to build two mappings to email security providers: MX server base domains and ASN names. The mappings can be found here. Then I summarized the overall dataset and those results are presented below.

Elephants in the room: I purposefully did not include Microsoft, Google, and some of the bigger tech companies that provide email service as part of these mappings since I don’t consider them email security companies. This may be debatable since these companies do provide security features through their offerings.

Brief Intro to MX records

For those of you who may not be familiar with DNS MX records, these are DNS Resource Records (RRs) used to map a domain name to the Mail Exchange (MX) servers responsible for accepting email for that domain. MX records are used by Mail Transfer Agents (MTAs) to identify where email should be sent for a given recipient email address. Below we use the command line utility “dig” to perform an MX lookup on gmail.com to find its Mail Exchange servers. As you can see, at the time of this writing, there are five MX domains that can accept email for gmail.com.
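To make the record layout concrete, here is a small Python sketch that parses MX answer lines in the usual `owner TTL class MX preference exchange` presentation format and orders them the way an MTA would try them (lowest preference value first). The sample lines are hypothetical `example.com` data, not the gmail.com output, and the `MXRecord` type and function name are invented for the illustration.

```python
from typing import List, NamedTuple


class MXRecord(NamedTuple):
    owner: str
    ttl: int
    preference: int
    exchange: str


def parse_mx_lines(text: str) -> List[MXRecord]:
    """Parse 'owner TTL IN MX preference exchange' lines, lowest preference first."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blanks, comments (dig prefixes them with ';') and non-MX rows.
        if len(fields) != 6 or fields[0].startswith(";") or fields[3].upper() != "MX":
            continue
        owner, ttl, _klass, _rtype, preference, exchange = fields
        records.append(
            MXRecord(
                owner.rstrip(".").lower(),
                int(ttl),
                int(preference),
                exchange.rstrip(".").lower(),
            )
        )
    # MTAs try the numerically lowest preference first.
    return sorted(records, key=lambda r: r.preference)


# Hypothetical answer section, not real data.
SAMPLE = """\
; hypothetical answer section, not real data
example.com.  3600  IN  MX  20  mx2.mail.example.net.
example.com.  3600  IN  MX  10  mx1.mail.example.net.
example.com.  3600  IN  A   192.0.2.1
"""

for record in parse_mx_lines(SAMPLE):
    print(record.preference, record.exchange)
```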

Besides being critical for identifying where email should be sent, MX records are also useful for mapping out infrastructure and can sometimes be used to identify which email security providers are being used by a company of interest. Below is an example for Florida State University (go Noles!) that reveals that, at the time of this writing, they are using Proofpoint to receive their email. How do we know this? Their mail exchanges are hosted on subdomains of pphosted.com, which is owned by Proofpoint.
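The inference above is essentially a suffix match of the MX host name against a table of known provider domains. A minimal sketch follows; only the `pphosted.com` entry comes from the text, while the function name and the host names used to exercise it are hypothetical.

```python
from typing import Dict, Optional

# Only the pphosted.com entry comes from the text above; a real mapping
# would carry many more rows.
PROVIDER_BY_MX_DOMAIN: Dict[str, str] = {
    "pphosted.com": "Proofpoint",
}


def provider_for_mx(
    mx_host: str, mapping: Dict[str, str] = PROVIDER_BY_MX_DOMAIN
) -> Optional[str]:
    """Return the provider whose domain is a label-aligned suffix of mx_host."""
    host = mx_host.lower().rstrip(".")
    labels = host.split(".")
    # Walk from the full host name toward the TLD; the first matching suffix wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in mapping:
            return mapping[candidate]
    return None
```

Matching on whole labels rather than with a bare `endswith` keeps a look-alike such as `notpphosted.com` from being attributed to the provider.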

Some companies obscure their security providers by first receiving their email to other mail exchanges such as ones hosted in their own data center or ones hosted by Google or Microsoft. In this blog, we explore a large DNS dataset to identify interesting info about the visible footprint / market share of email security companies.

All code and data for this study can be found in this Github Repo: https://github.com/covert-labs/mx-intel.

Observations:

Email security provider OPSEC is remarkably bad in a lot of cases and it is often easy to determine which provider is being used. Anyone who works in cyber security knows it is generally not a good idea to broadcast which cyber security products you are using since it may provide information that can be exploited by the adversary. This is especially true when vulnerabilities are announced in security products.
Since email exchanges can be chained together, only the outermost layer is visible in DNS MX records. For this reason, this research will underestimate the size of each provider’s market share.
Some security providers supply very specialized services (like anti-phishing only) and because of this they are often not the first layer in the email exchange chain. They will be dramatically underrepresented in this study.

Results:

Summary:

8,395,595 domains (derived from several top domain lists)
12,910,550 unique MX records (from 5,994,452 unique domains)
2,901,843 unique mail server domains
1,940,993 unique mail server base domains
25,733 unique mail server ASNs
56 unique security providers identified
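Two ratios fall out of these counts. The arithmetic below assumes that the 5,994,452 domains with MX records are a subset of the 8,395,595 input domains, which the summary implies but does not state outright.

```python
total_domains = 8_395_595
domains_with_mx = 5_994_452
unique_mx_records = 12_910_550

# Share of input domains that returned at least one MX record.
mx_coverage = domains_with_mx / total_domains
# Average number of unique MX records per domain that had any.
records_per_domain = unique_mx_records / domains_with_mx

print(f"domains with at least one MX record: {mx_coverage:.1%}")   # 71.4%
print(f"average MX records per such domain: {records_per_domain:.2f}")  # 2.15
```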

Analytics:

Here are the questions I was hoping to answer with the tables presented below:

Who are the market leading security companies reflected in the data?
What is the visible market share of email security providers as reflected in DNS records?
What can be inferred from publicly available MX records about email security?
Which email security providers are leveraging cloud hosting? And which cloud hosting environments are used most?
Who are the visible customers of provider X?

Note: All tables below show the count of domains hosted, NOT companies; companies can own many domains. Fortune 1000 domains are from 2015 and are based on this file created by Bob Rudis.
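Counting domains rather than MX rows matters because one domain usually has several MX records pointing at the same provider. Here is a sketch of that de-duplicated count; the column names and sample rows are hypothetical and may not match the layout of the enriched CSV in the repo.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple

# Hypothetical rows and column names; the real enriched CSV may differ.
SAMPLE_CSV = """\
domain,mx_base_domain,provider
a.example,pphosted.com,Proofpoint
b.example,pphosted.com,Proofpoint
a.example,pphosted.com,Proofpoint
c.example,mail.example,
"""


def domains_per_provider(csv_text: str) -> List[Tuple[str, int]]:
    """Count distinct domains per provider, most common provider first."""
    seen = set()
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        provider = row["provider"]
        key = (row["domain"], provider)
        # Skip rows with no identified provider and repeat (domain, provider) pairs.
        if not provider or key in seen:
            continue
        seen.add(key)
        counts[provider] += 1
    return counts.most_common()


print(domains_per_provider(SAMPLE_CSV))
```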

Top Email Security Providers Overall

Fortune 1000 Email security providers

Fortune 100 domain, MX base domain, email security provider

Note: I ended up adding Google and Microsoft to this table since they were very well represented. As you can see, Proofpoint and self-hosting dominate the Fortune 100.

Alexa 1000 Email security providers

Alexa 100 domain, MX base domain, email security provider

Note: I ended up adding Google and Microsoft to this table since they were very well represented. As you can see, self-hosting, Google and Microsoft dominate the Alexa 100. Almost all of these domains are from large technology / web companies so this isn’t so surprising, but it is interesting as compared to the Fortune 100.

Top Email Security Providers Hosted in AWS

Many large email security companies are operating from AWS.

Top Email Security Providers Hosted in Azure

Only a small number of identifiable email security companies were operating from Azure.

Top Self-hosted Email Security Providers

Misc Findings

When mining this data I discovered a few interesting items.

Linode / CSC Digital Brand Services

One of the more popular email security providers, “CSC Digital Brand Services” (which services multiple Fortune 100 companies), uses Linode for their hosting. This was surprising since Linode seems like a much smaller player in the cloud hosting market.

googlemial[.]com

When I initially collected this data, freecodecamp.org had a misconfigured MX record pointing to googlemial[.]com. This sketchy domain is not owned by Google, although it resolved to a GCP IP. Upon further inspection, this IP appears to be hosting a parking page for unregistered domains owned by GoDaddy. A quick PDNS check of other domains resolving to this IP reveals roughly 4.2M domains, and resolving arbitrary subdomains of those domains shows that they all resolve to the same IP.
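The "any subdomain resolves to the same IP" check can be automated by resolving a few random labels that should not exist. This is a rough sketch rather than the commands used at the time; the function names are invented, and the resolver is injectable so the logic can be exercised without touching the network.

```python
import secrets
import socket
from typing import Callable, Optional, Set


def default_resolver(name: str) -> Optional[str]:
    """Resolve a name to one IPv4 address via the system resolver, or None."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def wildcard_ips(
    domain: str,
    resolve: Callable[[str], Optional[str]] = default_resolver,
    probes: int = 3,
) -> Set[str]:
    """Resolve a few random subdomains that should not exist.

    An empty set suggests no wildcard record; a single IP suggests every
    subdomain is answered by the same host, as with a parking page.
    """
    ips: Set[str] = set()
    for _ in range(probes):
        label = secrets.token_hex(8)  # 16 random hex characters
        ip = resolve(f"{label}.{domain}")
        if ip is not None:
            ips.add(ip)
    return ips
```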

adnshost logs for freecodecamp.org

Future Work

I am not sure if I will return to this research or not, but I had some ideas that may be worth pursuing at some point, maybe during the next pandemic :)

Perform similar work at a much larger scale, using all major zone files (COM, NET, ORG) and ICANN’s CZDS as the inputs.
Or perform similar work using the Rapid7 Opendata DNS data sets.
Determine if port scans against MX servers could be useful to augment this.
Automate PDNS queries and analysis against the MX records found to identify other domains not found in the top domain lists.
Perform similar work, but collect SPF records and see what interesting insights could be gleaned about email sending trust (and whether vulns could be identified – like AWS IPs in the SPF that are stale and potentially obtainable).
Completely automate this entire process and use it to generate weekly reports.
Identify providers hidden by the first-layer mail exchange. It may be possible to do this at scale (but only for some companies) if the companies send bounce notifications to external email senders for non-existent recipients. These bounce messages often contain all the SMTP headers of the original message sent, and these headers can reveal security products. This technique was used on a targeted basis by Will Pearce and Nick Landers in their DerbyCon research on Proofpoint. Trying to do this at scale may draw a lot of attention or get my research box put on some blacklists. It would also likely take a lot more effort to identify the SMTP headers associated with different security providers.
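For the SPF idea in the list above, the first step would be pulling the `ip4`, `ip6`, and `include` mechanisms out of each record so the addresses can be compared against cloud provider ranges. A minimal sketch, assuming the TXT record has already been fetched and joined into one string; the function name and the sample record in the usage note are hypothetical, and macros, `redirect=`, `a`, and `mx` mechanisms are deliberately ignored.

```python
import ipaddress
from typing import Dict, List


def parse_spf(txt: str) -> Dict[str, List[str]]:
    """Split an SPF TXT record into ip4/ip6 networks and include targets."""
    terms = txt.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    parsed: Dict[str, List[str]] = {"ip4": [], "ip6": [], "include": []}
    for term in terms[1:]:
        # Drop an optional qualifier (+, -, ~, ?) before the mechanism name.
        mechanism = term.lstrip("+-~?")
        name, sep, value = mechanism.partition(":")
        name = name.lower()
        if sep and name in parsed:
            if name != "include":
                # Normalize to CIDR form; strict=False tolerates set host bits.
                value = str(ipaddress.ip_network(value, strict=False))
            parsed[name].append(value)
    return parsed
```

For example, `parse_spf("v=spf1 ip4:192.0.2.10 include:_spf.example.net ~all")` yields one `/32` network under `ip4` and one include target; following the includes recursively would be the next step.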

Resources

Data:

all-registered-domains.txt.gz - base domains extracted by combining several popular domain lists and then de-duplicating.
all-popular-domains-MX-20200620.txt.unique.gz - adnshost logs from performing MX lookups on domains from all-registered-domains.txt.gz.
mailserver_registered_domain-NS-20200620.txt.gz - adnshost logs from performing NS lookups on all the MX base domains; used for enrichment.
mx-intel-enriched.csv.gz - the final enriched output from this work.

Notebooks, Code, and summary results: https://github.com/covert-labs/mx-intel.

–Jason
@jason_trost