Abstract
This paper addresses the clustering of study fields using information extracted from semi-structured documents, namely documents describing the KRK competences of each study field. KRK competences are specialized descriptions of the qualifications that students gain upon graduating from a given study field. The proposed method extracts and processes KRK competences from diverse types of semi-structured documents. It consists of two stages: (1) entity extraction from documents (building a vector of KRK competences for each study field), and (2) clustering of study fields using those competence representations. Polish KRK competence files, describing almost 3000 study fields in Poland, were used as the corpus. The method and its stages are thoroughly analyzed. The results make it possible to compare study fields and to identify similar ones according to their final educational outcomes.
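The two stages can be illustrated with a minimal sketch. It is not the chapter's implementation: the competence code format (such as "T1A_W01"), the sample documents, the Jaccard similarity, the 0.5 threshold, and the single-link grouping are all assumptions chosen for illustration, and the function names are hypothetical. Competence sets stand in for binary competence vectors.

```python
import re

# Assumed code format such as "T1A_W01"; real KRK documents may use other layouts.
CODE_RE = re.compile(r"\b[A-Z]\d[A-Z]_[WUK]\d{2}\b")

def extract_competences(text):
    """Stage 1: collect the set of competence codes mentioned in a document."""
    return set(CODE_RE.findall(text))

def jaccard(a, b):
    """Similarity of two competence sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_fields(profiles, threshold=0.5):
    """Stage 2: single-link grouping of fields whose similarity reaches the threshold."""
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Made-up sample documents, not taken from the Polish KRK corpus.
docs = {
    "Computer Science": "Effects: T1A_W01, T1A_W02, T1A_U01, T1A_K01",
    "Informatics": "Effects: T1A_W01, T1A_W02, T1A_U01, T1A_K02",
    "Sociology": "Effects: S1A_W01, S1A_U03, S1A_K01",
}
profiles = {name: extract_competences(text) for name, text in docs.items()}
clusters = cluster_fields(profiles)
print(clusters)
```

In this toy input the two technical fields share three of five distinct codes and land in one group, while the third field shares none and stays alone. A real corpus would need an extraction step tailored to each document type.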
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Kozlowski, M. (2016). Study Fields Clustering Using KRK Competences. In: Ryżko, D., Gawrysiak, P., Kryszkiewicz, M., Rybiński, H. (eds) Machine Intelligence and Big Data in Industry. Studies in Big Data, vol 19. Springer, Cham. https://doi.org/10.1007/978-3-319-30315-4_4
DOI: https://doi.org/10.1007/978-3-319-30315-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30314-7
Online ISBN: 978-3-319-30315-4
eBook Packages: Engineering, Engineering (R0)