Big Data Search and Mining

Part of the book series: Studies in Big Data (SBD, volume 11)

Abstract

Most enterprises are generating data at an unprecedented rate. At the same time, traditional consumers are becoming digital consumers as individuals adopt social media and social networks in large numbers. Because the volume of transactions on these sites is huge and growing rapidly, social networks have become a new target for many business applications. Big Data mining deals with tapping large amounts of complex data spanning a wide variety of data types, and with providing actionable insights at the right time. Search and mining applications over Big Data have led to the development of new kinds of technologies, platforms, and frameworks. This chapter introduces the notion of search and data mining in the Big Data context and the technologies that support Big Data. We also present some data mining techniques that address the scalability and heterogeneity of large data. We further discuss clustering social networks using topology discovery and address the problem of evaluating and managing text-based sentiments from social network media. Finally, the chapter highlights some of the open source tools for Big Data mining.

Author information

Corresponding author

Correspondence to P. Radha Krishna.

Exercises

  1. Define (a) Big Data search and (b) Big Data mining.

  2. What type of intermediate data does MapReduce store? Where does MapReduce store them?

  3. Write Mapper and Reducer functions for the k-means clustering algorithm.

  4. Give the algorithm to find the page ranking.

  5. List ideas to improve/tune (existing) text processing and mining approaches to support Big Data scale.

  6. (a) Explain the significance of social networks and their role in the context of Big Data.

     (b) List the challenges of Big Data in supporting social network analytics and discuss approaches to handle them, with justification.

  7. How are the centrality measures and the structure of social networks useful in analyzing social networks?

  8. What is community detection? Discuss various community detection approaches.

  9. Explain the social network clustering algorithm (using topology discovery) that allows overlapping clusters.

  10. Explain the concepts of active learning and concept drift. How are these concepts useful for Big Data search and mining?

  11. What is sentiment mining? Describe an approach for extracting sentiments from a given text, with examples.

  12. Develop a suitable architecture for supporting real-time sentiment mining and discuss its components.

  13. List open source tools for Big Data analytics and their characteristics.

  14. Discuss various alternative mechanisms to MapReduce along with their merits.

Copyright information

© 2015 Springer India

About this chapter

Cite this chapter

Radha Krishna, P. (2015). Big Data Search and Mining. In: Mohanty, H., Bhuyan, P., Chenthati, D. (eds) Big Data. Studies in Big Data, vol 11. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2494-5_4

  • DOI: https://doi.org/10.1007/978-81-322-2494-5_4

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2493-8

  • Online ISBN: 978-81-322-2494-5

  • eBook Packages: Engineering, Engineering (R0)
