A Text Mining-Based Approach for Analyzing Information Retrieval in Spanish: Music Data Collection as a Case Study
This paper presents a text mining-based search approach aimed at information retrieval in the Spanish language. For this purpose, a tool has been developed to facilitate and automate analysis and retrieval. It allows the user to apply different analyzers when running a query, to index and delete documents stored in the system, and to evaluate the retrieval process. To this end, a dataset consisting of 27 songs has been used as a case study. Different queries were run to investigate which approaches best fit the Spanish language and how their suitability depends on the query text.
Keywords: Text mining, Information retrieval, Stemming, Spanish
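To make the workflow summarized in the abstract concrete, the following is a minimal sketch, not the tool described in the paper. It assumes a toy in-memory inverted index with a pluggable analyzer, and it supports indexing, deletion, querying, and a precision/recall check. The names `TinyIndex`, `standard_analyzer`, `light_stem_analyzer`, and `precision_recall`, the suffix list, and the three sample texts are all hypothetical and chosen for illustration only. The suffix stripper is a crude stand-in, not the Snowball Spanish stemmer.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical toy suffix list, longest first. NOT the Snowball Spanish stemmer.
SUFFIXES = ("aciones", "acion", "mente", "ando", "iendo",
            "es", "as", "os", "a", "o", "s")


def tokenize(text):
    """Lowercase the text and extract runs of (possibly accented) letters."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def strip_accents(token):
    """Remove diacritics, e.g. 'canción' -> 'cancion' (this also maps 'ñ' to 'n')."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def standard_analyzer(text):
    """Tokenize only: no accent folding, no stemming."""
    return tokenize(text)


def light_stem_analyzer(text):
    """Fold accents, then strip at most one suffix, keeping a stem of 3+ letters."""
    terms = []
    for token in tokenize(text):
        token = strip_accents(token)
        for suffix in SUFFIXES:
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        terms.append(token)
    return terms


class TinyIndex:
    """In-memory inverted index whose analyzer is chosen at construction time."""

    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.postings = defaultdict(set)  # term -> set of document ids
        self.docs = {}                    # document id -> original text

    def index(self, doc_id, text):
        self.delete(doc_id)  # re-indexing replaces any previous version
        self.docs[doc_id] = text
        for term in self.analyzer(text):
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        if self.docs.pop(doc_id, None) is None:
            return
        for ids in self.postings.values():
            ids.discard(doc_id)

    def search(self, query):
        """Return ids of documents matching any analyzed query term (OR query)."""
        hits = set()
        for term in self.analyzer(query):
            hits |= self.postings.get(term, set())
        return hits


def precision_recall(retrieved, relevant):
    """Set-based precision and recall; 0.0 when the denominator is empty."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up sample texts (not taken from the paper's 27-song dataset).
SAMPLE_DOCS = {
    1: "Canciones de amor bajo la luna",
    2: "La canción del mar",
    3: "Bailando toda la noche",
}

plain = TinyIndex(standard_analyzer)
stemmed = TinyIndex(light_stem_analyzer)
for doc_id, text in SAMPLE_DOCS.items():
    plain.index(doc_id, text)
    stemmed.index(doc_id, text)

relevant = {1, 2}  # assumed relevance judgments for the query "cancion"
print(precision_recall(plain.search("cancion"), relevant))    # (0.0, 0.0)
print(precision_recall(stemmed.search("cancion"), relevant))  # (1.0, 1.0)
```

In this toy setting the unaccented query "cancion" matches nothing under the plain analyzer, while accent folding plus suffix stripping maps "Canciones" and "canción" to the same term. This is the kind of analyzer-dependent difference the evaluation step is meant to expose.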
This work has been supported by the project MOVIURBAN: Máquina social para la gestión sostenible de ciudades inteligentes: movilidad urbana, datos abiertos, sensores móviles (SA070U 16), co-financed by the Junta de Castilla y León, Consejería de Educación, and FEDER funds. In addition, the research of Juan Ramos González has been co-financed by the European Social Fund and the Junta de Castilla y León (Operational Programme 2014-2020 for Castilla y León, BOCYL EDU/602/2016).