Internet Data Extraction and Analysis for Profile Generation
Almost everything is stored on the Internet nowadays. Relying on the Internet to store data has become common in recent years, which has directly increased the value of data retrieval. Through the Internet, data scientists can now find ways to access the data stored online and turn it into useful information. Because people entrust so much data to the Internet, they sometimes overlook the fact that all of it can be easily extracted, even when they believe their information is safe or unavailable. In this article, we propose a system in which several data extraction techniques are analysed. The aim is to give an overview of how much data about a person can be extracted from the Internet, and to show how that data can be turned into information with added value. The proposed system is capable of retrieving large volumes of data about a person and processing it with Artificial Intelligence techniques. It classifies the content and, once the content has been analysed, generates a personal profile containing all of the information. This research focuses on generating profiles of people from Spain, but the approach could be implemented for any other country. The proposed system has been implemented and tested on different people, and the results were satisfactory.
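The abstract does not describe how classified content is merged into a profile. The following is a minimal sketch of that fusion step. It assumes that each retrieved item has already been reduced to plain text. The keyword classifier is a hypothetical stand-in for the Artificial Intelligence classifier mentioned above. The category names, keywords, and the names `ExtractedItem`, `classify` and `build_profile` are illustrative assumptions, not the authors' system.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ExtractedItem:
    source: str  # where the item was retrieved from (e.g. a URL)
    text: str    # plain-text content of the item


# Hypothetical categories and keywords; a stand-in for a trained AI classifier.
CATEGORY_KEYWORDS = {
    "education": {"university", "degree", "school"},
    "employment": {"engineer", "company", "job"},
    "location": {"madrid", "salamanca", "city"},
}


def classify(item: ExtractedItem) -> str:
    """Return the category whose keywords appear most often in the item.

    Ties go to the category listed first; items with no keyword hits
    are labelled "other".
    """
    words = set(re.findall(r"\w+", item.text.lower()))
    best, best_hits = "other", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def build_profile(name: str, items: list[ExtractedItem]) -> dict:
    """Fuse classified items into a single profile grouped by category."""
    categories: dict[str, list[str]] = defaultdict(list)
    for item in items:
        categories[classify(item)].append(item.text)
    return {"name": name, "categories": dict(categories)}


if __name__ == "__main__":
    items = [
        ExtractedItem("example.org/a", "Studied a degree at the University of Salamanca"),
        ExtractedItem("example.org/b", "Works as an engineer at a software company"),
    ]
    print(build_profile("Jane Doe", items))
```

Here the profile is simply a grouping of texts by category. A real system would probably also keep each item's source for traceability and resolve conflicting facts, but those design choices are not specified in the abstract.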
Keywords: Information recovery · Information fusion · Big Data · Profile generation
This research has been partially supported by the European Regional Development Fund (FEDER) within the framework of the Interreg program V-A Spain-Portugal 2014–2020 (PocTep) under the IOTEC project grant 0123 IOTEC 3 E.