Keywords

1 Introduction

Secure Sockets Layer (SSL) and its successor Transport Layer Security (TLS) are security protocols that provide security and data integrity for network communications (we refer to SSL/TLS as SSL for brevity in this paper). An X.509 certificate plays an important role in SSL Public Key Infrastructure, which is the basis of the SSL encryption framework. When establishing SSL connection, the server or/and the client is required to provide a certificate to the peer to prove its identity. Since the widespread use of SSL, issues of certificate come out one after another as well, such as compromised CAs, weak public key algorithm, forged certificates, and so on. In this paper, we focus on forged certificates that mainly used in MITM attacks on HTTPS web services.

When carrying out an SSL man-in-the-middle (MITM) attack, the attacker usually uses a forged certificate to pretend the compromised or malicious server and deceive careless users. And a victim’s negligence would then lead to the privacy disclosure and property loss. Since attempts to MITM attacks on https-encrypted web sites have never stopped, it’s necessary to conduct a comprehensive analysis to study the status quo of forged certificates in the real world.

Many researchers have published their work on the certificate ecosystem, providing different aspects view of X.509 certificates. And studies that try to reveal the negative side of the certificate have never been stopped. However, there’re few works focused on forged certificate used by MITM attacks. In this paper, we conduct a comprehensive study of forged certificates in the wild, and the contributions of our work are as follows: First, we implement a 20-month passive measurement to collect the real-world SSL certificates on two research networks to explore the forged certificate issue, which is up to now the largest scale long term study. Second, we analyze the forged certificates against Alexa top 10 thousand web sites by combining both the passive measurement results and the public certificate transparency (CT) logs [14, 18], which is highly representative and comprehensively. Third, we reveal the reasons, preferences of forged certificates, as well as distinct differences between the forged certificates and the benign ones in several attributes, offering valuable insights to researches on SSL/TLS security. Finally, several IP addresses are discovered which are probably used to carry out MITM attacks through a series of forged certificates tracing and traffic behavior analysis.

The remainder of this paper is structured as follows. Section 2 elaborates the related works. Section 3 describes our measurement and dataset used in this paper. We analyze and elaborate how forged certificates performed in the wild in Sect. 4 and compare to benign ones in Sect. 5. In Sect. 6, we try to trace and identify SSL MITM attacks and we conclude our work in Sect. 7.

2 Related Works

In recent years, many researchers have focused on the measurement of SSL encrypted traffic, and there’re two ways to implement this: active measurement and passive measurement. [8] performed 110 scans of the IPv4 address space on port 443 over 14 months to study the HTTPS certificate ecosystem. [7] also implemented scans of the public IPv4 address space to collect data, with the help of ZMap [9]. And the certificate they collected could be found in Censys [1]. What’s more, [19] found that a combination of Censys data and CT logs could account for more than 99% of their observed certificates. There’re many other active scans which provide datasets of certificates, such as Rapid7 SSL [13]. [10] implemented both active and passive measurement to present a comprehensive analysis of X.509 certificates in the wild. The authors conducted HTTPS scans of popular HTTPS servers listed in Alexa Top 1 Million over 1.5 years from 9 locations distributed over the world. They also monitored SSL traffic on a 10 Gbps uplink of a research network.

There’re also many works focused on the negative side of the certificate. [16] identified web-fraud using attributes extracted from certificates. [5] implemented an empirical study of certificates for depository institutions and showed the bad condition of bank websites in disposing SSL. [6] proposed a machine-learning approach to detect phishing websites utilizing features from their certificates. [2] studied invalid certificates in the wild, and revealed that most of the invalid certificates could be attributed to the reissues of a few types of end-user devices. Comparison between valid and invalid certificates was conducted as well. Several studies [4, 11] detected SSL MITM attacks using different methods. However, they didn’t focus on forged certificates used by MITM attackers. [12] implemented a method to detect the occurrence of SSL MITM attacks on Facebook, and their results indicated that 0.2% of the SSL connection they analyzed were tampered with forged SSL certificates. Their work only concentrated on Facebook, could not provide an overall view of forged certificates.

3 Measurement and Datasets

In this section, we describe our passive measurement and the datasets. The methodology of identifying forged certificates is elaborated as well.

3.1 Passive Measurement

In order to study the forged certificates, we implemented a passive measurement on two large research networks from November 2015 to June 2017. These networks could provide 100 Gbps bandwidth. Our program collected certificates and SSL sessions statistical information after an anonymous processing. Useful data would be added into the corresponding datasets, namely DsCrt and DsCnn respectively.

Table 1. Overview of certificate dataset

3.2 Datasets

Based on the measurement, we made up two datasets to store the certificates information and SSL sessions statistical information separately.

DsCrt contained all certificates we collected during the 20-month long measurement. Excluding tiny errors due to the high-speed network environment, we totally obtained 188,064,507 unique certificates, including 3,359,040 CAs (both root and intermediate) and 184,705,467 leaves. We extracted and identified these leaf certificates using the methodology mentioned in Sect. 3.3 and harvested 2,867,286 forged ones in 4,777,671 certificates that claimed to belong to Alexa Top 10k domains. After the identification, all of the gathered certificates were parsed into json format completely, referring to Censys data format [1]. The parsed attributes of a certificate include but are not limited to sha1 value, signature algorithm, public key information, issuer, subject, validation period, extensions, and so on.

DsCnn recorded the statistical information of SSL sessions detected during our measurement. For each SSL session, we stored server IP, server port, and some basic statistics such as bytes, packets, and packet interval, and of course the corresponding certificate SHA1 string. Server IP and port might help to trace the suspicious MITM attacks, and the basic statistics could be used to train machine-learning models in the future work.

3.3 Identifying Methodology

When identifying a leaf certificate was benign or not, we utilized the CAs in the Chrome root store (as of July 1, 2017). Since the measurement lasted such a long time, we ignored validation errors only due to expiration time. Thus, we recognized a leaf certificate was benign if the root CA of the corresponding certificate chain was credible. Otherwise it’s not. For the latter one, we then checked if it was self-signed, and then labeled the certificate using “is_benign” and “is_selfsigned” attributes.

Since many web service providers use self-signed certificates due to the balance of cost and safety, and the compromise or abuse of root and intermediate CAs, it’s really hard to identify whether a certificate was forged or not, especially for a self-signed one or a website in obscurity. Hence we chose the domains listed in Alexa Top 10k as target, and studied forged certificates of these well-known web services (if provided SSL encryption) picking the public CT logs [18] as a benchmark. For a certain certificate, if it wasn’t included in any public CT logs, we regarded it as a forged one. Based on this constraint, we extracted 4,777,671 certificates which claimed to belong to Alexa Top 10k domains, and verified them with the help of CT logs included in Chrome. Finally we harvested 2,867,286 forged certificates of 4,165 different web services or domains after the verification.

3.4 Ethical Considerations

Considering the privacy and ethical issues in the passive measurement, we implement an anonymous process while dealing with the data. The client IP of each connection has been anonymized before our collection in the measurement system. Thus, we do not know the real client IP address of each SSL session. We focus on certificates and corresponding servers, but not the user privacy.

Table 2. Top 20 issuers of forged certificates

4 Forged Certificates Status in Quo

Based on the measurement and identifying methodology, we obtained 2,867,286 forged certificates of 4,165 different web services/domains. These forged certificates contained 107,306 self-signed ones of 922 different subjects. Others belonged to 215,236 different issuers of 3,988 unique subjects. Details could be seen in Table 1.

In this section, we studied the forged certificates comprehensively, including issuers, subjects, public key, validity period, lifetime, and so on. We also compared these features of forged certificates to the benign ones’, tried to reveal the significant difference between them.

4.1 Issuers of Forged Certificates

We firstly analyzed the issuers of these forged certificates and tried to find out (or determine) the main causes. Table 2 lists the top 20 issuers CommonName (CN) [3] of forged certificates. No. 1 issuer CN indicated 189,912 issuers whose CNs satisfied the regular expression of [0-9a-z]{16}. No. 4 issuer CN indicated all self-signed forged certificates. No. 10 indicated the corresponding certificates didn’t have the attribute of issuer CN.

Table 3. Rough classification of issuers

These issuers could be divided into several classes. Some issuers are related to security products or anti-virus software, such as FortiGate CA (firewall), Cisco Umbrella Secondary SubCA *-SG (secure internet gateway), and www.netspark.com (content filter). Meanwhile, some others might be used for malicious services like MITM attacks, such as “[0-9a-z]{16}”. According to the attributes of issuers, the certificates they’ve issued, and the corresponding servers, we roughly classified these issuers into 4 classes: SecureService, Research, Proxy, and Suspicious, which were presented in Table 3. Issuers classified as SecureService were those deployed in security products for security protection or content audit. A Research issuer mainly claimed to belong to a research institute or a university, or indicated to a famous tool used to analyze HTTPS traffic. Actually, these tools might be used for malicious services, but we roughly classified the corresponding issuers to Research considering the mainly usage and the diversity of certificates faked by them. We simply considered an issuer claimed to be a proxy (mainly contained the word “proxy” in the Common Name) as Proxy. This category had the lowest priority due to its simplest classification basis. Suspicious issuers referred to those we suspected faking certificates for MITM attacks. The reason we named it Suspicious but not Malicious was that we could not directly prove that the corresponding services were malicious, and it’s really hard to confirm. We determined an issuer to be a suspicious one based on three aspects: (1) the certificates faked by this issuer limited to several species, mainly for financial types; (2) the account of corresponding SSL sessions was much less than the average; (3) the account of corresponding servers for an issuer (or a series of similar issuers) was much less than the average. Examples of these classes could be seen in Table 3 as well.

Fig. 1.
figure 1

Preference of Forged Certs. Abscissa axis lists the Top 10 subjects of forged certificates in different classes, vertical axis indicates the percentage of all forged certificates owned by each subject.

4.2 Preference of Forged Certificates

In addition from the issuer’s perspective, we also analyzed the forged certificate in the view of the subject, trying to figure out the preference of forged certificates. We analyzed subjects in four classes mentioned above, and found the preference of Suspicious issuers performed differently from the others. As showed in Fig. 1(c), more than 96% forged certificates issued by Suspicious issuers are related to three shopping websites which belonged to one e-commerce company Alibaba. *.tmall.com and *.taobao.com are mainly for domestic users, and *.aliexpress.com mainly provides service for global consumers. Without considering Suspicious issuers, the top 10 forged certificates each represents a well-known web service in different fields, and they perform relatively average in the count, as showed in Fig. 1(b). The reason for this result is obvious: criminals are more interested in the wallets of victims. And the other classes of issuers do not have a clear tendency when forging certificates. Their preferences only related to the popularity of each web service.

5 Comparison with Benign Certificates

We studied several attributes of forged certificates, including security attributes (such as certificate version, signature algorithm, and public key information), validity period, and lifetime. Compared to benign certificates, the forged ones performed extremely different in many aspects.

5.1 Security Attributes

We selected version, signature algorithm, public key algorithm and public key length to characterize the security of a certificate. Figure 2 shows the comparison of these attributes between forged and benign certificates.

Fig. 2.
figure 2

Comparison of security attributes between forged and benign certs

Version: There’re three versions of the X.509 certificate: v1, v2, and v3. v1 certificates were deprecated long time ago, considering the security. And v2 certificates were even not widely deployed on the Internet. v3 certificate is currently the most widely used in the wild, and our measurement result confirmed this. Forged and benign certificates performed similarly in the statistical characteristics of the version attribute, both of which had more than 99% of v3 certificates. The tiny difference is that compared to benign ones, more forged certificates were v1 (nearly 1%). What’s more, we find 102 benign certificates and 18 forged ones using v4. Since the subscript of version started from 0, which meant that “version = 0” indicated the v1 certificates, we speculated that most of the v4 certificates might be ascribed to misoperation.

Signature Algorithm: It’s well known that MD5 and SHA1 algorithms were both cracked many years ago. Especially the SHA1 algorithm, which was once widely used, is still widely used in the wild, due to our measurement result. As shown in Fig. 2, more than 90% benign certificates used SHA256, while only 6.43% used SHA1. Well for the forged certificates, only 58.05% used SHA256, much less than the benign ones. And surprisingly to us, 41.23% of the forged certificates was still using SHA1 algorithm. It was obvious that forged certificates had a much lower security in the attribute of the signature algorithm. We then analyzed forged certificates which used SHA1 algorithm, and found out more than 80% SHA1 certificates could be attributed to mitmproxy. mitmproxy [17] is a free and open source HTTPS proxy. It’s widely used for HTTPS proxies, security experiments, and of course MITM attacks. We found nearly 950,000 forged certificates signed by mitmproxy with the signature algorithm of SHA1 and 1024-bit RSA public key, while others used the signature algorithm of SHA256. Figure 3 is an example of certificate signed by mitmproxy. We found that though most of this kind of certificates used SHA1 certificates, they have changed to use SHA256 recently. We speculate that this may be attributed to version updates or source code rewrites.

Fig. 3.
figure 3

An example of mitmproxy signed certificate

Public Key Information: The situation of public key information is similar to the signature algorithm to some extent, as shown in Fig. 2. Though 1024-bit RSA key was not secure any more several years ago, 72.62% of forged certificates still use it to encrypt their SSL connection. Similarly, 45% of these insecure forged certificates can be attributed to mitmproxy, due to the reason mentioned above. And 48% of them were issued by ([0-9a-z]{16}). What’s more, we found that all forged certificates issued by ([0-9a-z]{16}) had 1024-bit RSA public keys and were signed by SHA1 signature algorithm. For benign certificates, 2048-bit RSA public key accounts for the mainstream with a percentage of 93.92%. And different from the status of signature algorithm, 5.05% proportion of ECDSA,256 in benign certificates indicates the trend of more secure public key algorithm. Actually, many famous companies, such as Google, Facebook, and Alibaba, have changed their public key algorithm to the more secure Elliptic curve cryptography (ECC) algorithm.

According to the results above, we can conclude that the security attributes of forged certificates performed much worse than the benign ones. Since forged certificates rarely considering security issues, the conclusion is in line with our expectation. What’s more, we found that the attributes of forged certificates are closely related to the top issuers listed in Table 2. And this conclusion also applied to the attribute of the certificate validity period.

5.2 Certificate Validity Period

We calculated the validity period (the time between the certificates’ Not Before and Not After attributes) of each certificate, and found a significant difference between forged certificates and the benign ones (shown in Fig. 4(a)). For benign certificates, most of their validity period were located in three intervals: 2–3 months, 6 months–1 year, and 1–2 years. While for forged ones, most of their validity period were located in two intervals: 1–2 months and 6 months–1 year. In detail, 46.07% forged certificates were valid for 32 days, and 11.86% were valid for 1 year. We then studied the three significant differences shown in Fig. 4(a), revealed that different validity periods of certificates might be caused by different issuers. We found that almost all forged certificates (more than 99.99%) which owned a 32-day validity period were issued by mitmproxy. What’s more, 99.99% of benign certificates with the validity period of 84 day were issued by Google Internet Authority G2, and more than 99.96% with the validity period of 90-day were issued by Let’s Encrypt Authority X3. Certificates validated for 2 years (730 days) cloud be attributed to issuers belonged to DigiCert (46.44%), Symantec (12.98%), VeriSign (7.27%), and so on.

Fig. 4.
figure 4

Validity period and lifetime of certs

5.3 Certificate Lifetime

In order to compare the lifetime (the days between the first time and the last time a certificate exposed in our sight) of forged and benign certificates, we selected a three-month period, from April 1 to June 30, to implement continuous observation. During the 91-day observation, we obtained that the average lifetime of forged certificates was 3.59 days, while for authorized ones it was 12.02 days. The result demonstrated that forged certificates had a much shorter lifetime than the benign ones, as shown in Fig. 4(b). We speculated that the MITM attackers need to update their forged certificates frequently to evade the detection of security products like IDS and firewall, or to replace the blacklisted ones. Figure 4(b) also shows that more than 85% forged certificates only appeared once in our 91-day observation. This could be attributed to security tests or researches which only performed once.

5.4 Conclusion

According to the comparisons mentioned above, we cloud conclude that most forged certificates didn’t care about security as the widely use of unsafe signature algorithm and public key algorithm. However, what they most concerned was evading the detection of anti-virus software. Hence the lifetime of forged certificates was much shorter than benign ones, as attackers required to update certificates frequently.

6 Tracking MITM Attacks

While the use of forged certificates can be diverse, MITM attacks directly threatened users’ privacy and security. Thus tracking MITM attacks is necessary, and many researchers have focused on this issue. In this paper, we discovered a MITM attack and performed a tracking with the help of forged certificate attributes and SSL session statistics.

Table 4. Suspicious servers

When performing a deep analysis, forged certificates with similar issuers that satisfied the regular expression of [0-9a-z]{16} caused our attention. According to our analysis, more than 96% certificates faked by these issuers belonged to three famous e-commerce websites in China, 52.38% for *.tmall.com, 40.72% for *.taobao.com, and 2.93% for *.aliexpress.com. And all three websites belonged to the same company, Alibaba, the most famous e-commerce company in China. The first two websites mainly served domestic users, while the last one provided global online shopping services. And that’s why the forged certificates of this website were far less than the former ones. According to the reasons above, we speculated that these issuers were related to the MITM attacks targeting the online shopping service of Alibaba. Thus we analyzed the statistical SSL session information of corresponding certificates to verify our suspicion.

Table 5. Information of suspicious servers

We selected 3 months connection data from 04/01/2017 to 06/30/2017, and extracted server information of the forged certificates which met the above conditions. Finally, we obtained 230 unique servers from 1,597,532 SSL sessions. Surprisingly, most sessions (99.98%) and certificates (99.77%) belonged to four server IPs, as showed in Table 4. We then looked up the information of these IPs with the help of MAXMIND [15], finding that they might locate in a same position and belonged to a same organization, as showed in Table 5. Thus, we suspected these IP addresses should be attributed to MITM attacks for the reasons below:

  1. 1.

    These IPs covered almost all forged certificates and SSL sessions.

  2. 2.

    Forged certificates used by these IPs are mainly related to 3 domains, which provided online shopping services. Obviously, MITM attackers cared more about the victim’s wallet.

  3. 3.

    Each IP was served on 500 different and contiguous ports, and the number of sessions for each (IP, port, cert) triple per day is much less than proxy-used forged certificates.

  4. 4.

    These IPs belong to a same organization and locate in the same country.

  5. 5.

    When searching poneytelecom.eu on Google, many results indicated that this domain/organization was related to web fraud behaviors.

7 Conclusion

In this paper, we implemented a 20-month passive measurement to study the status quo of forged certificates in the wild. Based on CT logs provided by Chrome, we identified forged certificates of the web services listed in Alexa Top 10k, and finally gathered 2,867,286 forged ones. We analyzed the forged certificates in the view of both issuer and subject. Based on the analysis of issuers, we revealed the causes of forged certificates by roughly classifying them into four categories. SecureService certificates were mainly deployed in security products for security protection or content audit. Research indicated issuers that claimed to belong to a research institute or a university, or famous tools used to analyze HTTPS traffic. Proxy might be used for proxy servers, and Suspicious issuers referred to those we suspected faking certificates for MITM attacks. The study of the subject showed us the preference of forged certificates. While Suspicious mainly focused on financial related services, others preferences only related to the popularity of each web service. When comparing forged certificates to the benign ones, we found significant differences in security attributes, validity period, and lifetime. Benign certificates used more secure signature and public key algorithm to ensure the security of SSL encrypted connection, while forged ones performed quite terrible. Forged certificates also had a much shorter validity period and lifetime. What’s more, we found the validity period of a certificate, no matter forged or benign, was related to its issuer. At last, we traced a series of Suspicious forged certificates and harvested four malicious server IP addresses through traffic behavior analysis.