Abstract
Most enterprises are generating data at an unprecedented rate. At the same time, traditional consumers are becoming digital consumers as individuals adopt social media and social networks in large numbers. Because the volume of transactions on these sites is huge and growing rapidly, social networks have become a new target for many business applications. Big Data mining deals with tapping large amounts of complex data spanning a wide variety of data types, and with providing actionable insights at the right time. Search and mining applications over Big Data have led to the development of new kinds of technologies, platforms, and frameworks. This chapter introduces the notion of search and data mining in the Big Data context and the technologies supporting Big Data. We also present some data mining techniques that deal with the scalability and heterogeneity of large data. We further discuss clustering social networks using topology discovery and address the problem of evaluating and managing text-based sentiments from social network media. Finally, this chapter highlights some of the open source tools for Big Data mining.
Exercises
1. Define (a) Big Data search and (b) Big Data mining.
2. What type of intermediate data does MapReduce store? Where does MapReduce store it?
3. Write Mapper and Reducer functions for the k-means clustering algorithm.
4. Give the algorithm for finding the page ranking.
5. List ideas to improve or tune existing text processing and mining approaches to support Big Data scale.
6. (a) Explain the significance of social networks and their role in the context of Big Data.
   (b) List the challenges of Big Data in supporting social network analytics, and discuss approaches to handle them with justification.
7. How are centrality measures and the structure of social networks useful in analyzing social networks?
8. What is community detection? Discuss various community detection approaches.
9. Explain the social network clustering algorithm (using topology discovery) that allows overlapping clusters.
10. Explain the concepts of active learning and concept drift. How are these concepts useful for Big Data search and mining?
11. What is sentiment mining? Describe an approach for extracting sentiments from a given text, with examples.
12. Develop a suitable architecture for supporting real-time sentiment mining and discuss its components.
13. List open source tools for performing Big Data analytics, along with their characteristics.
14. Discuss various alternative mechanisms to MapReduce, along with their merits.
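As a possible starting point for Exercise 3, one k-means iteration can be phrased in map/reduce style: the map step assigns each point to its nearest centroid, and the reduce step averages the points that share a centroid. The sketch below is a minimal in-memory illustration in plain Python, not a Hadoop job and not the chapter's own solution. The names `mapper`, `reducer`, and `run_iteration` are hypothetical, and the grouping loop only stands in for the shuffle phase a real framework would perform.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mapper(point: Point, centroids: List[Point]) -> Tuple[int, Point]:
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: squared_distance(point, centroids[i]),
    )
    return nearest, point


def reducer(cluster_id: int, points: Iterable[Point]) -> Tuple[int, Point]:
    """Reduce step: average all points assigned to one centroid."""
    pts = list(points)
    dim = len(pts[0])
    mean = tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))
    return cluster_id, mean


def run_iteration(points: List[Point], centroids: List[Point]) -> List[Point]:
    """Run one k-means iteration; assumption: all data fits in memory."""
    # Stand-in for the shuffle phase: group mapper output by key.
    groups: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        key, value = mapper(p, centroids)
        groups[key].append(value)

    # A centroid that attracted no points keeps its previous position.
    new_centroids = list(centroids)
    for key, pts in groups.items():
        _, mean = reducer(key, pts)
        new_centroids[key] = mean
    return new_centroids
```

Calling `run_iteration` repeatedly until the centroids stop moving gives the full algorithm. In a distributed setting the current centroids would have to be made available to every mapper, and a combiner emitting partial sums and counts would reduce the data shuffled between the two steps.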
Copyright information
© 2015 Springer India
About this chapter
Cite this chapter
Radha Krishna, P. (2015). Big Data Search and Mining. In: Mohanty, H., Bhuyan, P., Chenthati, D. (eds) Big Data. Studies in Big Data, vol 11. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2494-5_4
DOI: https://doi.org/10.1007/978-81-322-2494-5_4
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2493-8
Online ISBN: 978-81-322-2494-5
eBook Packages: Engineering, Engineering (R0)