Related Work and Concepts

Cordeiro, Robson L. F.; Faloutsos, Christos; Traina Júnior, Caetano

doi:10.1007/978-1-4471-4890-6_2

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

This chapter presents the main background knowledge relevant to the book. Sections 2.1 and 2.2 describe the areas of complex data processing and of knowledge discovery in traditional databases. The task of clustering complex data is discussed in Sect. 2.3, while the task of labeling this kind of data is described in Sect. 2.4. Section 2.5 introduces the MapReduce framework, a promising tool for large-scale data analysis that has proven to offer valuable support for executing data mining algorithms in parallel processing environments. Section 2.6 concludes the chapter.
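The abstract credits MapReduce with supporting the parallel execution of data mining algorithms. As a rough illustration only (this preview does not show the authors' own method), the sketch below simulates the map, shuffle and reduce phases in a single Python process and uses them to run one assignment-and-update iteration of k-means clustering. The helpers `map_phase`, `shuffle`, `reduce_phase` and `kmeans_step` are hypothetical names chosen for this example. A real job would run these phases on a framework such as Hadoop, with the shuffle performed by the framework across machines.

```python
from collections import defaultdict
from math import dist  # requires Python 3.8+


def map_phase(point, centroids):
    """Map (hypothetical): emit (index of nearest centroid, (point, count=1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def shuffle(pairs):
    """Shuffle: group mapped values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce (hypothetical): sum coordinates and counts, then average them."""
    dims = len(values[0][0])
    sums = [0.0] * dims
    count = 0
    for coords, n in values:
        for d in range(dims):
            sums[d] += coords[d]
        count += n
    return key, tuple(s / count for s in sums)


def kmeans_step(points, centroids):
    """One k-means iteration expressed as map -> shuffle -> reduce."""
    mapped = [map_phase(p, centroids) for p in points]
    reduced = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
    # A centroid that received no points keeps its previous position.
    return [reduced.get(i, c) for i, c in enumerate(centroids)]


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
    print(kmeans_step(data, [(0.0, 0.0), (10.0, 10.0)]))
    # prints [(1.25, 1.5), (8.5, 8.25)]
```

Emitting a count alongside each point is a deliberate choice: because the reducer sums both coordinates and counts before dividing, a combiner could pre-aggregate partial sums on each mapper without changing the result, which reduces the data moved during the shuffle.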

Author information

Authors and Affiliations

Computer Science Department (ICMC), University of São Paulo, Av. do Trabalhador Saocarlense 400, São Carlos, SP, 13566-590, Brazil
Robson L. F. Cordeiro & Caetano Traina Júnior
Department of Computer Science, Carnegie Mellon University, Forbes Ave. 5000, Pittsburgh, PA, 15213, USA
Christos Faloutsos

Corresponding author

Correspondence to Robson L. F. Cordeiro.

About this chapter

Cite this chapter

Cordeiro, R.L.F., Faloutsos, C., Traina Júnior, C. (2013). Related Work and Concepts. In: Data Mining in Large Sets of Complex Data. SpringerBriefs in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-4890-6_2

DOI: https://doi.org/10.1007/978-1-4471-4890-6_2
Published: 11 January 2013
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4889-0
Online ISBN: 978-1-4471-4890-6
eBook Packages: Computer Science, Computer Science (R0)