Abstract
With its rapid development, the mobile Internet has entered the era of big data. Demand for data analysis of mobile applications keeps growing, which raises the requirements for collecting mobile application information. Because the number of applications is so large, almost all third-party app stores display only a small fraction of them, and most of the information is hidden in Deep Web databases behind query forms. Existing crawler strategies cannot meet this demand. To address these problems, this paper proposes a collection method based on category keyword queries that improves the crawl rate and completeness of information collection from mobile app stores. First, a vertical crawler collects information from the application interfaces that list the various kinds of applications. Next, the TF-IDF algorithm extracts keywords that represent each application category from the application names and descriptions. Finally, incremental crawling is performed by querying the store with these keywords. The results show that this collection method effectively improves information completeness and acquisition efficiency.
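The keyword extraction and incremental crawling steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `category_keywords` and `incremental_crawl`, the `search` callable standing in for a store's query form, the simple lowercase ASCII tokenizer, and the plain `tf * log(N / df)` weighting are all assumptions made for illustration. Each category's names and descriptions are treated as one document, so a term scores highly when it is frequent in one category and rare in the others.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase ASCII words; a real store may need word segmentation.
    return re.findall(r"[a-z]+", text.lower())


def category_keywords(docs_by_category, top_k=3):
    """Rank the terms of each category by TF-IDF, one document per category."""
    term_counts = {
        cat: Counter(tok for doc in docs for tok in tokenize(doc))
        for cat, docs in docs_by_category.items()
    }
    n_categories = len(term_counts)

    # Document frequency: in how many categories does each term occur?
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())

    keywords = {}
    for cat, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_categories / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[cat] = ranked[:top_k]
    return keywords


def incremental_crawl(keywords, search, seen=None):
    """Query with each keyword and return only app ids not collected before.

    `search` is a hypothetical callable mapping a keyword to an iterable of
    app ids; it stands in for submitting the store's query form.
    """
    seen = set() if seen is None else seen
    new_apps = []
    for keyword in keywords:
        for app_id in search(keyword):
            if app_id not in seen:
                seen.add(app_id)
                new_apps.append(app_id)
    return new_apps


# Hypothetical seed data, as a vertical crawler might return it.
seed = {
    "games": ["Puzzle Quest puzzle game", "Racing game with cars"],
    "finance": ["Budget tracker app", "Stock trading app with budget tools"],
}
print(category_keywords(seed, top_k=2))
```

In this toy data the word "with" appears in both categories, so its inverse document frequency is zero and it never ranks as a category keyword. Passing the set of already collected app ids as `seen` is what makes repeated runs incremental: only previously unseen applications are returned.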
Acknowledgement
This research is supported by the National Key R&D Program of China (No. 2018YFC0806900), the Beijing Engineering Laboratory for Security Emulation & Hacking and Defense of IoV, and the National Secrecy Scientific Research Program of China (No. BMKY2018802-1).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, G. et al. (2019). Method of Deep Web Collection for Mobile Application Store Based on Category Keyword Searching. In: Wang, G., Feng, J., Bhuiyan, M., Lu, R. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2019. Lecture Notes in Computer Science, vol 11611. Springer, Cham. https://doi.org/10.1007/978-3-030-24907-6_25
DOI: https://doi.org/10.1007/978-3-030-24907-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24906-9
Online ISBN: 978-3-030-24907-6
eBook Packages: Computer Science, Computer Science (R0)