Abstract
In recent years, the World Wide Web has become a platform for online news publication. Many sources, such as television channels, magazines, and newspapers, have started publishing digital versions of news articles online, reaching a vast audience through a variety of devices. The volume of available news articles can be very large, and recommendation systems can help readers find relevant news by filtering articles against predefined criteria or a similarity measure, e.g. using a collaborative filtering or content-based filtering approach. This paper presents a named-entity-based similarity measure for linking digital news stories published in various newspapers during the preservation process in a digital news stories archive, to ensure future accessibility. The study compares the similarity of news articles as judged by humans with a similarity value computed automatically using the proposed technique. The results are generalized by defining a threshold value based on multiple experimental results using datasets of different sizes.
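The idea of linking two stories when their named entities overlap strongly enough can be illustrated with a minimal sketch. The abstract does not state the paper's actual formula, so the Jaccard overlap used here is an assumption, as are the function names, the 0.5 threshold, and the regex-based extractor, which is only a crude stand-in for a real named entity recognizer.

```python
import re


def extract_entities(text):
    """Crude placeholder for a named entity recognizer (assumption).

    Treats every run of capitalized words as one entity, so it also
    picks up sentence-initial words; a real system would use a trained
    recognizer instead.
    """
    candidates = re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", text)
    return {c.lower() for c in candidates}


def entity_similarity(entities_a, entities_b):
    """Jaccard overlap between two entity sets, in the range 0.0 to 1.0.

    Jaccard is an illustrative choice, not necessarily the measure
    proposed in the paper.
    """
    if not entities_a and not entities_b:
        return 0.0
    shared = entities_a & entities_b
    combined = entities_a | entities_b
    return len(shared) / len(combined)


def are_linked(entities_a, entities_b, threshold=0.5):
    """Link two stories when their similarity reaches the threshold.

    The default threshold is hypothetical; the paper derives its value
    from experiments on several datasets.
    """
    return entity_similarity(entities_a, entities_b) >= threshold


story_a = {"islamabad", "imran khan", "pakistan"}
story_b = {"islamabad", "pakistan", "lahore"}
score = entity_similarity(story_a, story_b)
linked = are_linked(story_a, story_b)
```

In this sketch the two stories share two of four distinct entities, so they are linked at a threshold of 0.5 but not at a stricter one. Comparing such automatic decisions against human similarity judgments is what would guide the choice of threshold.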
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Khan, M., Ur Rahman, A., Ullah, M., Naseem, R. (2020). The Role of Named Entities in Linking News Articles During Preservation. In: Bouhlel, M., Rovetta, S. (eds) Proceedings of the 8th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT’18), Vol.1. SETIT 2018. Smart Innovation, Systems and Technologies, vol 146. Springer, Cham. https://doi.org/10.1007/978-3-030-21005-2_5
DOI: https://doi.org/10.1007/978-3-030-21005-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21004-5
Online ISBN: 978-3-030-21005-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)