Abstract
Similarity measures play an important role in information processing and have been widely investigated in computer science. With the growth of social media such as YouTube, Wikipedia, and Facebook, a huge number of entries have been posted on these portals. These entries are often described by short texts or sets of words. Discovering similar entries based on such texts has become a challenge in building information search and filtering engines and has attracted considerable research interest. In this paper, we first introduce a model of entries posted on media or entertainment portals, based on their features: title, category, tags, and content. We then present a novel similarity measure among entries that incorporates these features. The experimental results show that our combined similarity measure outperforms the other measures compared.
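The abstract does not state how the four features are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a weighted sum of per-feature scores, with Jaccard overlap for title, tags, and content and an exact match for category. The names `Entry`, `tokens`, `jaccard`, `entry_similarity`, and the values in `WEIGHTS` are all hypothetical and chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """A portal entry described by the four features named in the abstract."""
    title: str
    category: str
    tags: set[str] = field(default_factory=set)
    content: str = ""


def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization (an illustrative assumption)."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two sets; two empty sets score 0 by convention here."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical weights (not from the paper); they sum to 1 so the
# combined score stays in [0, 1].
WEIGHTS = {"title": 0.3, "category": 0.2, "tags": 0.2, "content": 0.3}


def entry_similarity(x: Entry, y: Entry, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-feature similarities (assumed combination rule)."""
    per_feature = {
        "title": jaccard(tokens(x.title), tokens(y.title)),
        "category": 1.0 if x.category.lower() == y.category.lower() else 0.0,
        "tags": jaccard({t.lower() for t in x.tags}, {t.lower() for t in y.tags}),
        "content": jaccard(tokens(x.content), tokens(y.content)),
    }
    return sum(weights[name] * score for name, score in per_feature.items())
```

For example, calling `entry_similarity(a, b)` on two `Entry` objects returns a score between 0 and 1, where identical entries score 1 and entries sharing no feature values score 0. Any per-feature measure, such as a semantic sentence similarity for titles, could be swapped in for the Jaccard overlap used here.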
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Nguyen, T.H., Tran, D.Q., Dam, G.M., Nguyen, M.H. (2017). Multi-feature Based Similarity Among Entries on Media Portals. In: Akagi, M., Nguyen, TT., Vu, DT., Phung, TN., Huynh, VN. (eds) Advances in Information and Communication Technology. ICTA 2016. Advances in Intelligent Systems and Computing, vol 538. Springer, Cham. https://doi.org/10.1007/978-3-319-49073-1_41
DOI: https://doi.org/10.1007/978-3-319-49073-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49072-4
Online ISBN: 978-3-319-49073-1
eBook Packages: Engineering, Engineering (R0)