
Measuring Cloud Service Health Using NetFlow/IPFIX: The WikiLeaks Case


Abstract

The increasing trend of outsourcing services to cloud providers is changing the way computing power is delivered to enterprises and end users. Although cloud services offer several advantages, they also make cloud consumers strongly dependent on providers. Consumers therefore have a vital interest in being immediately informed about any problems in their services. This paper takes a first step toward a network-based approach to monitoring cloud services. We focus on severe problems that affect most services, such as outages or extreme server overload, and propose a method to monitor these problems that relies solely on the traffic exchanged between users and cloud providers. Our proposal is entirely based on NetFlow/IPFIX data and, therefore, explicitly targets high-speed networks. By combining a methodology to reassemble and classify flow records with stochastic estimations, our proposal has the distinct characteristic of being applicable to both sampled and non-sampled data. We validate our proposal and show its applicability using data collected at both the University of Twente and an international backbone during the WikiLeaks Cablegate. Our results show that, in contrast to Anonymous’ claims, the users of the targeted services were only marginally affected by the attacks.




Footnotes

  1. Only midstream traffic without any flags has been observed.

  2. Note that Algorithm 1 ignores the cases where \(\hat{n}_h > \hat{n}\) by chance.

  3. Some bias will remain when \(n_i > 0\) for \(i > 2\), similarly to Sect. 2.2.2.

  4. Bro has other states that should not be reached when all packets are observed. Less than 0.1% of the connections in our datasets are terminated in those states owing to packet loss.

  5. Since timeouts do not impact the results, other values would lead to similar conclusions.

  6. Similar figures would be obtained with other typical setups, such as the Cisco NetFlow default timeouts (15 and 1,800 s, respectively).


References

  1. Hajjat, M., Sun, X., Sung, Y.W.E., Maltz, D., Rao, S., Sripanidkulchai, K., Tawarmalani, M.: Cloudward bound: planning for beneficial migration of enterprise applications to the cloud. SIGCOMM Comput. Commun. Rev. 40(4), 243–254 (2010)


  2. Rish, I., Brodie, M., Odintsova, N., Ma, S., Grabarnik, G.: Real-Time Problem Determination in Distributed Systems Using Active Probing. In: Proceedings of the IEEE/IFIP Network Operations and Management Symposium, NOMS’04, pp. 133–146 (2004)

  3. Xu, K., Wang, F., Wang, H.: Lightweight and informative traffic metrics for data center monitoring. J. Netw. Syst. Manag. 20, 226–243 (2012)


  4. Clarke, R.: How reliable is cloudsourcing? A review of articles in the technical media 2005–11. Comput. Law Secur. Rev. 28(1), 90–95 (2012)


  5. Gehlen, V., Finamore, A., Mellia, M., Munafò, M.M.: Uncovering the Big Players of the Web. In: Proceedings of the 4th International Conference on Traffic Monitoring and Analysis, TMA’12, pp. 15–28 (2012)

  6. Claise, B.: Cisco Systems NetFlow Services Export Version 9. RFC 3954 (Informational) (2004)

  7. Claise, B.: Specification of the IP flow information export (IPFIX) protocol for the exchange of IP traffic flow information. RFC 5101 (Standards Track) (2008)

  8. Garcia-Dorado, J., Finamore, A., Mellia, M., Meo, M., Munafò, M.M.: Characterization of ISP traffic: trends, user habits, and access technology impact. IEEE Trans. Netw. Serv. Manag. 9(2), 142–155 (2012)


  9. Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, J., Jahanian, F.: Internet Inter-domain Traffic. In: Proceedings of the ACM SIGCOMM 2010 Conference, SIGCOMM’10, pp. 75–86 (2010)

  10. Mansfield-Devine, S.: Anonymous: serious threat or mere annoyance? Netw. Secur. 2011(1), 4–10 (2011)


  11. Sommer, R., Feldmann, A.: NetFlow: Information Loss or Win? In: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, IMW’02, pp. 173–174 (2002)

  12. Paxson, V.: Bro: a system for detecting network intruders in real-time. Comput. Netw. 31(23–24), 2435–2463 (1999)


  13. Duffield, N., Lund, C., Thorup, M.: Estimating flow distributions from sampled flow statistics. IEEE/ACM Trans. Netw. 13(5), 933–946 (2005)


  14. Draper, N., Guttman, I.: Bayesian estimation of the binomial parameter. Technometrics 13(3), 667–673 (1971)


  15. Tang, V.K.T., Sindler, R.B., Shirven, R.M.: Bayesian estimation of n in a binomial distribution. Tech. Rep. CRM 87–185, Center for Naval Analyses (1987)

  16. Tang, V.K.T., Sindler, R.B.: Confidence interval for parameter n in a binomial distribution. Tech. Rep. CRM 86–265, Center for Naval Analyses (1987)

  17. Finamore, A., Mellia, M., Meo, M., Munafò, M.M., Rossi, D.: Experiences of Internet traffic monitoring with Tstat. IEEE Netw. 25(3), 8–14 (2011)


  18. Haag, P.: Watch your flows with NfSen and NFDUMP. 50th RIPE Meeting. (2005). Accessed June 2013

  19. Fullmer, M., Romig, S.: The OSU Flow-tools Package and CISCO NetFlow Logs. In: Proceedings of the 14th USENIX conference on System administration, LISA’00, pp. 291–304 (2000)

  20. Inacio, C.M., Trammell, B.: YAF: Yet Another Flowmeter. In: Proceedings of the 24th International Conference on Large Installation System Administration, LISA’10, pp. 1–16 (2010)

  21. MaxMind: GeoIP Organization. (2013). Accessed June 2013

  22. Limmer, T., Dressler, F.: Flow-Based TCP Connection Analysis. In: Proceedings of the 2nd IEEE International Workshop on Information and Data Assurance, WIDA’09, pp. 376–383 (2009)

  23. Arlitt, M., Williamson, C.: An analysis of TCP reset behaviour on the Internet. SIGCOMM Comput. Commun. Rev. 35(1), 37–44 (2005)


  24. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)


  25. van Rijsbergen, C.: Information Retrieval, 2nd edn. Butterworth, London (1979)


  26. Lampert, R.T., Sommer, C., Munz, G., Dressler, F.: Vermont—A Versatile Monitoring Toolkit for IPFIX and PSAMP. In: Proceedings of the IEEE/IST Workshop on Monitoring, Attack Detection and Mitigation, MonAM’06 (2006)

  27. Čeleda, P., Kováčik, M., Koníř, T., Krmíček, V., Špringl, P., Žádník, M.: FlowMon Probe. Tech. Rep., CESNET (2007)

  28. Deri, L.: nProbe: An Open Source NetFlow Probe for Gigabit Networks. In: Proceedings of the Terena, TNC’03 (2003)

  29. Zseby, T., Molina, M., Duffield, N., Niccolini, S., Raspall, F.: Sampling and Filtering Techniques for IP Packet Selection. RFC 5475 (Standards Track) (2009)

  30. Estan, C., Varghese, G.: New directions in traffic measurement and accounting. SIGCOMM Comput. Commun. Rev. 32(4), 323–336 (2002)


  31. Estan, C., Keys, K., Moore, D., Varghese, G.: Building a Better NetFlow. In: Proceedings of the ACM SIGCOMM 2004 Conference, SIGCOMM’04, pp. 245–256 (2004)

  32. Dropbox: DropboxOps. (2013). Accessed June 2013

  33. Twitter: Status. (2013). Accessed June 2013

  34. Google: Apps Status Dashboard. (2013). Accessed June 2013

  35. Drago, I., Mellia, M., Munafò, M.M., Sperotto, A., Sadre, R., Pras, A.: Inside Dropbox: Understanding Personal Cloud Storage Services. In: Proceedings of the 12th ACM Internet Measurement Conference, IMC’12, pp. 481–494 (2012)

  36. Pras, A., Sperotto, A., Moura, G.C.M., Drago, I., Barbosa, R.R.R., Sadre, R., de Oliveira Schmidt, R., Hofstede, R.: Attacks by “Anonymous” WikiLeaks Proponents not Anonymous. Tech. Rep. TR-CTIT-10-41, CTIT, University of Twente, Enschede (2010)

  37. Kossmann, D., Kraska, T., Loesing, S.: An Evaluation of Alternative Architectures for Transaction Processing in the Cloud. In: Proceedings of the ACM SIGMOD International Conference on Management of data, SIGMOD’10, pp. 579–590 (2010)

  38. Lenk, A., Menzel, M., Lipsky, J., Tai, S., Offermann, P.: What Are You Paying for? Performance Benchmarking for Infrastructure-as-a-Service Offerings. In: Proceedings of the 4th IEEE International Conference on Cloud Computing, CLOUD’11, pp. 484–491 (2011)

  39. Li, A., Yang, X., Kandula, S., Zhang, M.: CloudCmp: Comparing Public Cloud Providers. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, IMC’10, pp. 1–14 (2010)

  40. Meng, S., Iyengar, A.K., Rouvellou, I.M., Liu, L., Lee, K., Palanisamy, B., Tang, Y.: Reliable State Monitoring in Cloud Datacenters. In: Proceedings of the 5th IEEE International Conference on Cloud Computing, CLOUD’12, pp. 951–958 (2012)

  41. Meng, S., Liu, L.: Enhanced monitoring-as-a-service for effective cloud management. IEEE Trans. Comput. (2012)

  42. Hu, W., Yang, T., Matthews, J.N.: The good, the bad and the ugly of consumer cloud storage. SIGOPS Oper. Syst. Rev. 44(3), 110–115 (2010)


  43. Wang, G., Ng, T.E.: The Impact of Virtualization on Network Performance of Amazon EC2 Data Center. In: Proceedings of the 29th Conference on Information Communications, INFOCOM’10, pp. 1–9 (2010)

  44. Zhang, Q., Cheng, L., Boutaba, R.: Cloud computing: state-of-the-art and research challenges. J. Internet Serv. Appl. 1, 7–18 (2010)


  45. Glatz, E., Dimitropoulos, X.: Classifying Internet One-Way Traffic. In: Proceedings of the 12th ACM Internet Measurement Conference, IMC’12, pp. 37–50 (2012)

  46. Schatzmann, D., Leinen, S., Kögel, J., Mühlbauer, W.: FACT: Flow-Based Approach for Connectivity Tracking. In: Proceedings of the 12th International Conference on Passive and Active Network Measurement, PAM’11, pp. 214–223 (2011)

  47. Caracas, A., Kind, A., Gantenbein, D., Fussenegger, S., Dechouniotis, D.: Mining Semantic Relations using NetFlow. In: Proceedings of the 3rd IEEE/IFIP International Workshop on Business-driven IT Management, BDIM’08, pp. 110–111 (2008)

  48. Bermudez, I., Mellia, M., Munafò, M.M., Keralapura, R., Nucci, A.: DNS to the Rescue: Discerning Content and Services in a Tangled Web. In: Proceedings of the 12th ACM Internet Measurement Conference, IMC’12, pp. 413–426 (2012)

  49. Quittek, J., Bryant, S., Claise, B., Aitken, P., Meyer, J.: Information Model for IP Flow Information Export. RFC 5102 (Standards Track) (2008)

  50. Trammell, B., Boschi, E.: An introduction to IP flow information export (IPFIX). Commun. Mag. 49(4), 89–95 (2011)


  51. Lewis, D.D., Gale, W.A.: A Sequential Algorithm for Training Text Classifiers. In: Proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR’94, pp. 3–12 (1994)



Acknowledgements

This work has been carried out in the context of the FP7 FLAMINGO Network of Excellence Project (CNECT-ICT-318488), the EU FP7-257513 UniverSelf Collaborative Project and the IOP GenCom project Service Optimization and Quality (SeQual). SeQual is supported by the Dutch Ministry of Economic Affairs, Agriculture and Innovation via its agency Agentschap NL.

Author information



Corresponding author

Correspondence to Idilio Drago.


Appendix 1: Background Flow Monitoring


We assume flow records to be defined as in NetFlow/IPFIX [7]: a unidirectional set of packets crossing an observation point, sharing a common set of attributes (the flow key). NetFlow v9 [6] uses a fixed flow key composed of source and destination IP addresses and port numbers, IP protocol number, IP type of service and the input interface index. IPFIX offers more flexibility by supporting variable flow definitions [7]. More importantly for this work, NetFlow and IPFIX [6, 49] define the following reasons for a flow record to be exported:

  1. Idle timeout: No packets belonging to a flow have been observed for a specified period of time;

  2. Active timeout: A flow has been active for a specified period of time;

  3. End of flow detected: The end of a flow has been identified. For instance, TCP flags can be used to expire flow records;

  4. Forced end: An external event, such as a shutdown of the exporting device, forces all flow records to be exported;

  5. Lack of resources: Special heuristics can expire flow records prematurely in case of resource constraints on the exporting device.

After expiration, flow records are sent to flow collectors. IPFIX specifies only the protocol for transporting records from exporters to collectors. All details related to flow measurement, including the configuration of expiration rules, are considered implementation decisions [50]. Therefore, flow data can differ per exporting device. Since our method aims to be robust against different flow definitions, it must handle data exported according to any of these rules. See Sect. 3 for a discussion of the consequences of different flow definitions for our methodology.
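The expiration rules above can be pictured as a simple flow cache. The sketch below is illustrative only: all names (`FlowRecord`, `on_packet`, `expire`) and the timeout values are our own assumptions, not any exporter's actual implementation, and only rules 1–3 are modeled.

```python
# Illustrative flow-cache sketch of expiration rules 1-3 above.
# Names and timeout values are hypothetical; real exporters
# (e.g., those in [26-28]) implement far more machinery.

IDLE_TIMEOUT = 30     # rule 1: seconds without packets
ACTIVE_TIMEOUT = 120  # rule 2: maximum flow duration in seconds

class FlowRecord:
    def __init__(self, key, now):
        self.key = key        # e.g., (src_ip, dst_ip, sport, dport, proto)
        self.start = now      # timestamp of the first packet
        self.last_seen = now  # timestamp of the most recent packet
        self.packets = 0

def on_packet(cache, exported, key, now, tcp_flags=0):
    """Update the cache for one observed packet."""
    rec = cache.get(key)
    if rec is None:
        rec = cache[key] = FlowRecord(key, now)
    rec.last_seen = now
    rec.packets += 1
    if tcp_flags & 0x05:      # rule 3: FIN (0x01) or RST (0x04) ends the flow
        exported.append(cache.pop(key))

def expire(cache, exported, now):
    """Export records that hit the idle or active timeout."""
    for key, rec in list(cache.items()):
        if now - rec.last_seen > IDLE_TIMEOUT or now - rec.start > ACTIVE_TIMEOUT:
            exported.append(cache.pop(key))
```

Note how a long-lived connection repeatedly hitting the active timeout is exported as several records sharing the same flow key, which is precisely why a reassembly step such as the one in Sect. 3 is needed.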

Why Raw Flow Records are Not Sufficient

All NetFlow/IPFIX expiration rules can cause TCP connections to be split into multiple flow records. However, since TCP is bidirectional, one might expect those rules to produce the same number of records in both traffic directions for a healthy service. Figure 10 shows that this reasoning does not hold, using traffic to Google and Facebook in our network.

Fig. 10 Incoming/outgoing flow records for healthy Web services. a Google (non-sampled), b Facebook (non-sampled), c Google (sampled, p = 0.01), d Facebook (sampled, p = 0.01)

Figure 10a, b depict the number of flow records when packet sampling is not applied. The packet traces used in our validations are converted into NetFlow records, with idle and active timeouts set to the typical values of our flow exporters (30 and 120 s, respectively; see footnote 6). The difference between the two quantities is also shown, with negative values representing more incoming records. In general, more outgoing than incoming records are seen. This difference is mainly caused by the widespread use of TCP RST to terminate connections (see [23]), which can result in an extra flow record in either traffic direction, depending on the timeout parameters.

Interestingly, an opposite pattern is seen when sampling is applied. For illustration, Fig. 10c, d depict the results of applying independent random sampling with probability p = 0.01 to the same packet traces. The figures show that the number of incoming records tends to be higher than the number of outgoing records. This happens because servers normally send more packets to clients, increasing the probability of sampling incoming flows.
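This asymmetry follows directly from the sampling model: under independent per-packet sampling with probability \(p\), a flow direction carrying \(k\) packets is observed at all with probability \(1-(1-p)^k\). A minimal sketch, with invented packet counts purely for illustration:

```python
# Probability that at least one packet of a k-packet flow direction
# survives independent per-packet sampling with probability p.
def p_observed(k, p=0.01):
    return 1.0 - (1.0 - p) ** k

# Invented counts: servers typically send more packets than clients,
# so the incoming (server-to-client) direction is sampled more often.
client_to_server, server_to_client = 10, 40
print(p_observed(client_to_server))  # ~0.096
print(p_observed(server_to_client))  # ~0.331
```

With these numbers, the incoming direction is roughly three times more likely to appear in the sampled data than the outgoing one, matching the pattern in Fig. 10c, d.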

Appendix 2: The Performance of Classification Models

As defined by [24], “a classification model (or classifier) is a mapping from instances to predicted classes”. In the classification problem studied in Sect. 3, the instances are TCP connections, which are mapped to discrete labels (healthy, unhealthy, unmatched). This appendix reviews the background on performance metrics commonly used to compare classifiers.

Confusion Matrix

The confusion matrix is a way of presenting results when evaluating classifiers [24]. It presents the original number of instances per class (i.e., the ground truth in a test set) versus the total number of instances that are predicted to belong to each class. Table 3 shows an example of such a matrix M, in a problem composed of n classes (n = 3 in our case). The diagonal contains correctly classified instances, whereas the remaining cells contain the errors per class.

Table 3 Confusion matrix M when classifying instances in n classes

Note that, in our problem, no instance can be correctly classified as unmatched, since no connections of this class exist in the ground truth set. This artificial state, however, is needed because the output of our approach may have different cardinality than the ground truth set. Instances in the unmatched state penalize our method when it wrongly merges flow records.

Performance Metrics

Several performance metrics to compare classification schemes can be derived from the confusion matrix. The simplest is the accuracy \(A\), defined as the overall fraction of correctly classified instances:

$$ A=\frac{\sum\nolimits _{i=1}^{n}M_{ii}} {\sum\nolimits _{i=1}^{n}\sum\nolimits _{j=1}^{n}M_{ij}} $$

The accuracy, however, fails to provide insight into the source of errors of a classifier. This is particularly the case when the classes in the classification problem are unbalanced, that is, when the probability of observing instances of a specific class is much lower than that of observing the remaining classes. Very often, the rarest class is exactly the most important one to classify correctly. In our case, instances of the healthy class are much more frequent, accounting for almost 90 % of the samples in our test set. Any classification model could reach high accuracy by predicting that all instances belong to the healthy class, although such a model would be of no use. Metrics that capture the performance of a classifier at a finer granularity are therefore needed.
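This degenerate case is easy to reproduce numerically. The class proportions below are illustrative, chosen only to mirror the roughly 90/10 split in our test set:

```python
# A classifier that always predicts "healthy" on a 90%/10% split.
truth = ["healthy"] * 90 + ["unhealthy"] * 10
predictions = ["healthy"] * 100

accuracy = sum(t == p for t, p in zip(truth, predictions)) / len(truth)
print(accuracy)  # 0.9, although the classifier never detects a problem
```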

Three metrics are of primary interest in our analysis: precision (\(P_i\)), recall (\(R_i\)), and F-measure (\(F_i\)) [24, 25]. Assuming \(\varvec{\omega}\) to be the classes in a classification problem (i.e., \({\varvec{\omega}}=(healthy,unhealthy,unmatched)\)), these metrics are defined as a function of the class \(\omega_i\). The precision \(P_i\) is the fraction of instances classified as \(\omega_i\) that indeed belong to class \(\omega_i\). It can be calculated from the confusion matrix as follows.

$$ P_{i}=\frac{M_{ii}}{\sum\nolimits _{j=1}^{n}M_{ji}} $$

The recall \(R_i\) is the fraction of all original instances of class \(\omega_i\) that are correctly classified:

$$ R_{i}=\frac{M_{ii}}{\sum\nolimits _{j=1}^{n}M_{ij}} $$

As with the overall accuracy, tuning a classifier to optimize only one of these metrics for a specific class may produce degenerate models. For instance, a classifier that labels all instances as \(\omega_i\) has \(R_i = 1\), but its \(P_i\) equals the prior probability of \(\omega_i\). To overcome this, the F-measure (\(F_i\)) can be applied to assess the classifier:

$$ F_{i}=\frac{P_{i}R_{i}}{(1-\alpha)P_{i}+\alpha R_{i}}\quad 0\leq\alpha\leq1 $$

The F-measure is a combination of precision and recall that increases faster when both metrics increase simultaneously. The parameter \(\alpha\) weights the importance of each metric in the result. This can be used, for instance, when failing to identify instances of a class has a different cost than making classification mistakes. In all our experiments, we calculate \(F_i\) using \(\alpha = 0.5\), which gives precision and recall equal importance. We refer the readers to [25] for a deeper discussion and to [51] for a practical example of the use of the F-measure in another domain.
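The three metrics can be computed directly from the confusion matrix \(M\), with rows as true classes and columns as predicted classes, matching the definitions above. The matrix entries below are invented for illustration only:

```python
# Metrics from a 3x3 confusion matrix M, classes ordered as
# (healthy, unhealthy, unmatched). Entries are illustrative.
M = [
    [85, 3, 2],   # true healthy
    [ 4, 5, 1],   # true unhealthy
    [ 0, 0, 0],   # unmatched: no true instances exist (see Table 3)
]

def precision(M, i):
    col = sum(M[j][i] for j in range(len(M)))   # all predicted as class i
    return M[i][i] / col if col else 0.0

def recall(M, i):
    row = sum(M[i])                             # all true instances of class i
    return M[i][i] / row if row else 0.0

def f_measure(M, i, alpha=0.5):
    p, r = precision(M, i), recall(M, i)
    denom = (1 - alpha) * p + alpha * r
    return (p * r) / denom if denom else 0.0

# With alpha = 0.5 this reduces to the harmonic mean of precision
# and recall (the familiar F1 score).
print(round(f_measure(M, 0), 3))
```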


Cite this article

Drago, I., Hofstede, R., Sadre, R. et al. Measuring Cloud Service Health Using NetFlow/IPFIX: The WikiLeaks Case. J Netw Syst Manage 23, 58–88 (2015).



Keywords

  • Cloud computing
  • Performance
  • Measurements