LDA Topic Modeling Based Dataset Dependency Matrix Prediction

Bhattacharya, Hindol; Bhattacharya, Arnab; Chattopadhyay, Samiran; Chattopadhyay, Matangini

doi:10.1007/978-981-13-8581-0_5


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1031)

Included in the following conference series:

International Conference on Computational Intelligence, Communications, and Business Analytics


Abstract

Classification of text-based datasets has many applications in computer science, including scientific article recommendation, news article tagging, and multimedia content search assistance. We are interested in the problem of placing text-based datasets in a distributed storage system. Distributed data placement entails placing related data together at a local site, so separating related data from unrelated data is a prerequisite for any such data placement system. Datasets can be classified using information supplied to the system about the relatedness of each pair of datasets. When such information is not available, however, the relatedness of each pair of datasets must be inferred from the content of the datasets themselves. In the literature, topic modeling has been used to measure the similarity between text documents and to classify documents according to that similarity. We intend to develop a novel classification system for text-based datasets using topic modeling, as a precursor to a data placement scheme for distributed data storage systems.
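The abstract does not say how topic distributions are turned into a dataset dependency matrix. As a minimal sketch under stated assumptions: suppose an LDA implementation has already inferred a topic-proportion vector θ for each dataset. The hypothetical helpers below (`hellinger_similarity`, `dependency_matrix`, `related_groups`) score each pair of datasets with one minus the Hellinger distance, mark a pair as related when the score reaches an assumed threshold of 0.5, and then gather related datasets into groups that could be co-located at one site. The similarity measure, the threshold, and the grouping rule are illustrative choices, not the authors' method.

```python
import math
from typing import List, Sequence


def hellinger_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Return 1 - Hellinger distance between two topic distributions.

    Assumption: similarity is measured this way; the paper may use another measure.
    The result lies in [0, 1]; 1 means identical topic mixtures.
    """
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    # Bhattacharyya coefficient: sum over topics k of sqrt(p_k * q_k).
    bc = sum(math.sqrt(pk * qk) for pk, qk in zip(p, q))
    # Hellinger distance H = sqrt(1 - BC); clamp tiny negative rounding errors.
    distance = math.sqrt(max(0.0, 1.0 - bc))
    return 1.0 - distance


def dependency_matrix(theta: Sequence[Sequence[float]],
                      threshold: float = 0.5) -> List[List[int]]:
    """Build a symmetric 0/1 matrix; entry [i][j] is 1 if datasets i and j
    are judged related. The 0.5 threshold is a hypothetical default."""
    n = len(theta)
    dep = [[0] * n for _ in range(n)]
    for i in range(n):
        dep[i][i] = 1  # every dataset is trivially related to itself
        for j in range(i + 1, n):
            if hellinger_similarity(theta[i], theta[j]) >= threshold:
                dep[i][j] = dep[j][i] = 1
    return dep


def related_groups(dep: List[List[int]]) -> List[List[int]]:
    """Group datasets into connected components of the dependency matrix
    (hypothetical grouping rule), using an iterative depth-first search."""
    n = len(dep)
    seen = [False] * n
    groups: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        stack, group = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if dep[i][j] and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    # Made-up topic proportions for three datasets over three topics.
    theta = [
        [0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1],
        [0.05, 0.05, 0.9],
    ]
    dep = dependency_matrix(theta)
    print(dep)                  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
    print(related_groups(dep))  # [[0, 1], [2]]
```

Grouping by connected components makes relatedness transitive, which is one simple way to turn pairwise decisions into placement groups; a real placement scheme might instead cluster the matrix or account for site capacity.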

Hindol Bhattacharya would like to thank the Department of Science and Technology, Ministry of Science and Technology, Government of India for supporting this research work under the DST-INSPIRE AORC fellowship scheme, vide number: DST/INSPIRE Fellowship/[160562] Dated: June 9, 2017.




Author information

Authors and Affiliations

Jadavpur University, Kolkata, India
Hindol Bhattacharya, Samiran Chattopadhyay & Matangini Chattopadhyay
Indian Institute of Technology, Kanpur, India
Arnab Bhattacharya


Corresponding author

Correspondence to Hindol Bhattacharya.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
Jyotsna Kumar Mandal
Department of Computer Science and Engineering, Assam University, Silchar, Assam, India
Somnath Mukhopadhyay
Department of Computer and Systems Sciences, Visva Bharati University, Santiniketan, West Bengal, India
Paramartha Dutta
Department of Computer Science and Engineering, Kalyani Government Engineering College, Kalyani, West Bengal, India
Kousik Dasgupta


About this paper

Cite this paper

Bhattacharya, H., Bhattacharya, A., Chattopadhyay, S., Chattopadhyay, M. (2019). LDA Topic Modeling Based Dataset Dependency Matrix Prediction. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2018. Communications in Computer and Information Science, vol 1031. Springer, Singapore. https://doi.org/10.1007/978-981-13-8581-0_5


DOI: https://doi.org/10.1007/978-981-13-8581-0_5
Published: 26 June 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8580-3
Online ISBN: 978-981-13-8581-0
eBook Packages: Computer Science, Computer Science (R0)
