Learning Resources Based on Analysis of Digital Newspaper Data
Digital newspapers are a source of information with very particular characteristics, among them: they collect heterogeneous information on a wide variety of subjects, they can be considered truthful sources of information, their HTML pages generally have a regular structure, they are permanently archived, they are quickly indexed by the main search engines, and the format of their URLs is regular. This article describes work carried out in the context of a final degree project at the School of Computer Science of the Complutense University of Madrid. The objective of the work was to analyze the problems involved in retrieving information from an online newspaper, which tools can be used to extract that information, and how the retrieved information can be exploited to generate value-added services. To illustrate the results, a prototype application was developed in Python.
I would like to thank the student Luis Felipe de Oliveira Mesa, who carried out the coding of the application described in this paper. This work has been partially supported by the projects Santander-UCM GR3/14 (group 962022) and eLITE-CM S2015/HUM-3426.