Abstract

In recent years, the World Wide Web has become a major platform for online news publication. Many sources, such as television channels, magazines, and newspapers, have started publishing digital versions of their news articles online, reaching a vast audience through a variety of devices. The volume of available news articles can be very large, and recommendation systems can help readers find relevant news by filtering articles against predefined criteria or a similarity measure, e.g. with a collaborative filtering or content-based filtering approach. This paper presents a named-entity-based similarity measure for linking digital news stories published in various newspapers during the preservation process in a digital news stories archive, to ensure future accessibility. The study compares the similarity of news articles as judged by humans with a similarity value computed automatically using the proposed technique. The results are generalized by defining a threshold value based on multiple experiments using datasets of different sizes.
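As a rough illustration of the idea in the abstract, the sketch below scores two news stories by the overlap of their named entities and links them when the score reaches a threshold. The abstract does not give the formula, so Jaccard overlap is swapped in here as an assumption. The function names, the sample entities, and the threshold of 0.5 are likewise hypothetical and not taken from the paper. The entity sets are written by hand; a real pipeline would obtain them from a named entity recognizer.

```python
def entity_similarity(entities_a, entities_b):
    """Jaccard overlap of two named-entity collections (an assumed measure)."""
    # Normalize case and whitespace so "Lahore" and "lahore " count as the same entity.
    a = {e.strip().lower() for e in entities_a}
    b = {e.strip().lower() for e in entities_b}
    if not a and not b:
        return 0.0  # no entities on either side: treat the stories as unrelated
    return len(a & b) / len(a | b)


# Hypothetical cut-off; the paper derives its own threshold experimentally.
THRESHOLD = 0.5


def should_link(entities_a, entities_b, threshold=THRESHOLD):
    """Link two stories when their entity overlap reaches the threshold."""
    return entity_similarity(entities_a, entities_b) >= threshold


# Hand-written entity sets standing in for the output of an NER step.
story_1 = {"Lahore", "Pakistan", "World Bank", "Punjab"}
story_2 = {"lahore", "Pakistan", "World Bank", "Karachi"}

score = entity_similarity(story_1, story_2)
linked = should_link(story_1, story_2)
print(f"similarity={score:.2f}, linked={linked}")
```

A set-based measure ignores how often an entity occurs and where in the story it appears. A weighted variant would need those counts as additional input.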

Author information

Corresponding author

Correspondence to Muzammil Khan.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Khan, M., Ur Rahman, A., Ullah, M., Naseem, R. (2020). The Role of Named Entities in Linking News Articles During Preservation. In: Bouhlel, M., Rovetta, S. (eds) Proceedings of the 8th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT’18), Vol.1. SETIT 2018. Smart Innovation, Systems and Technologies, vol 146. Springer, Cham. https://doi.org/10.1007/978-3-030-21005-2_5
