Data summarization: a survey

  • Mohiuddin Ahmed
Survey Paper

Abstract

Summarization has proven to be a useful and effective technique for supporting the analysis of large amounts of data. Knowledge discovery from data (KDD) is time consuming, and summarization is an important step that expedites KDD tasks by intelligently reducing the size of the data to be processed. In this paper, different summarization techniques for structured and unstructured data are discussed. The key finding of this survey is that not all summarization techniques create a summary suitable for further analysis. Sampling techniques are highlighted as a viable way of creating a summary for further knowledge discovery, such as anomaly detection from the summary. Different summary evaluation metrics are also discussed.

Keywords

Summarization, Structured data, Unstructured data, Machine learning, Statistics, Semantics, Natural language processing, Cyber security

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Department of ICT and Library Studies, Canberra Institute of Technology, Reid, Australia
