Learning Resources Based on Analysis of Digital Newspaper Data

  • Antonio Sarasa Cabezuelo
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 13)


Digital newspapers are a source of information with very particular characteristics: they collect heterogeneous information on a wide variety of subjects, they can be considered reliable sources of information, their HTML pages generally have a regular structure, they are permanently archived, they are quickly indexed by the main search engines, and their URLs follow a regular format, … This article describes work carried out as a final degree project at the School of Computer Science of the Complutense University of Madrid. The objective was to analyze the problems involved in retrieving information from an online newspaper, to determine which tools can be used to extract that information, and to study how the retrieved information can be exploited to generate value-added services. To illustrate the results, a prototype application was developed in Python.
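The abstract singles out two regularities that make newspapers easy to mine: predictable URLs and a regular HTML structure. The following is a minimal sketch of how both can be exploited with only the Python standard library (`re` and `html.parser`). It assumes a hypothetical site whose article URLs look like `https://<host>/<section>/<YYYY>/<MM>/<DD>/<slug>.html` and whose story body sits inside an `<article>` element; the URL layout, the page template, and the helper names `parse_article_url` and `extract_article` are illustrative assumptions, not the prototype described in the paper.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from html.parser import HTMLParser

# Hypothetical URL layout: https://<host>/<section>/<YYYY>/<MM>/<DD>/<slug>.html
# Real newspapers use their own patterns; adapt the regex per site.
ARTICLE_URL = re.compile(
    r"^https?://[^/]+/(?P<section>[\w-]+)/"
    r"(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/(?P<slug>[\w-]+)\.html$"
)


def parse_article_url(url):
    """Return (section, publication date, slug), or None if the URL does not match."""
    m = ARTICLE_URL.match(url)
    if m is None:
        return None
    pub = date(int(m["y"]), int(m["m"]), int(m["d"]))
    return m["section"], pub, m["slug"]


@dataclass
class Article:
    headline: str = ""
    paragraphs: list = field(default_factory=list)


class ArticleParser(HTMLParser):
    """Collect the <h1> headline and the <p> texts found inside <article>.

    Assumes a hypothetical, well-formed template (every <p> is closed);
    text outside <article> (menus, footers) is ignored.
    """

    def __init__(self):
        super().__init__()
        self.article = Article()
        self._in_article = 0   # nesting depth of <article> elements
        self._capture = None   # "h1" or "p" while inside a captured element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._in_article += 1
        elif tag in ("h1", "p") and self._in_article:
            self._capture = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "article" and self._in_article:
            self._in_article -= 1
        elif tag == self._capture:
            # Normalise whitespace across inline tags such as <b> or <a>.
            text = " ".join("".join(self._buf).split())
            if tag == "h1":
                self.article.headline = text
            elif text:
                self.article.paragraphs.append(text)
            self._capture = None

    def handle_data(self, data):
        if self._capture:
            self._buf.append(data)


def extract_article(html):
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.article


# Usage on a made-up page (illustrative data only).
page = """<html><body><nav><p>Menu</p></nav>
<article><h1>Budget  approved</h1>
<p>The council <b>approved</b> the budget.</p><p>Details follow.</p></article>
</body></html>"""

print(parse_article_url("https://news.example.com/politics/2017/03/15/budget-approved.html"))
# ('politics', datetime.date(2017, 3, 15), 'budget-approved')
print(extract_article(page).headline)  # Budget approved
```

The standard-library parser keeps the sketch dependency-free; a production scraper would more likely rely on a dedicated HTML library and per-site selectors, since templates differ between newspapers and change over time.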



I would like to thank the student Luis Felipe de Oliveira Mesa, who coded the application described in this paper. This work has been partially supported by the projects Santander-UCM GR3/14 (group 962022) and eLITE-CM S2015/HUM-3426.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Universidad Complutense de Madrid, Madrid, Spain
