Study Fields Clustering Using KRK Competences

Kozlowski, Marek

doi:10.1007/978-3-319-30315-4_4

Chapter
First Online: 25 March 2016

Part of the book series: Studies in Big Data (SBD, volume 19)

Abstract

This paper addresses the clustering of study fields using information extracted from semi-structured documents, namely documents describing the KRK competences of each study field. KRK competences are specialized descriptions of the qualifications that students gain upon graduating from a given study field. The proposed method extracts and processes KRK competences from diverse types of semi-structured documents. It consists of two stages: (1) entity extraction from documents (building a vector of KRK competences for each study field), and (2) clustering of study fields using those competence representations. Polish KRK competence files, describing almost 3000 study fields in Poland, were used as the corpus. The method and its stages are thoroughly analyzed. The results make it possible to compare study fields and identify similar ones according to their final educational outcomes.
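
The two stages named in the abstract can be illustrated with a minimal sketch. The abstract does not say which vector representation, similarity measure, or clustering algorithm the chapter uses, so everything below is an assumption made for illustration: the field names and competence codes are invented, the vectors are binary, similarity is Jaccard, and clustering is a simple single-link grouping with a hypothetical threshold of 0.5.

```python
from itertools import combinations

# Hypothetical toy data: study field -> set of KRK competence codes.
# Field names and codes are invented for illustration only.
FIELDS = {
    "Computer Science": {"K_W01", "K_W02", "K_U01", "K_U02"},
    "Informatics": {"K_W01", "K_W02", "K_U01", "K_U03"},
    "Economics": {"K_W10", "K_W11", "K_U10"},
    "Finance": {"K_W10", "K_W11", "K_U11"},
}


def build_vectors(fields):
    """Stage 1 (simplified): one binary competence vector per study field."""
    vocabulary = sorted(set().union(*fields.values()))
    vectors = {
        name: [1 if code in codes else 0 for code in vocabulary]
        for name, codes in fields.items()
    }
    return vocabulary, vectors


def jaccard(u, v):
    """Jaccard similarity of two binary vectors of equal length."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def cluster(vectors, threshold=0.5):
    """Stage 2 (simplified): single-link grouping using union-find.

    Two fields end up in the same cluster when a chain of pairs with
    similarity >= threshold connects them. The threshold is an assumption.
    """
    parent = {name: name for name in vectors}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(vectors, 2):
        if jaccard(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in vectors:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


vocabulary, vectors = build_vectors(FIELDS)
clusters = cluster(vectors, threshold=0.5)
print(clusters)
```

On this toy input the two computing fields share three of five distinct codes and the two economic fields share two of four, so each pair forms its own cluster. A real pipeline would first have to extract the competence entries from the semi-structured source documents, which this sketch skips entirely.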

Author information

Authors and Affiliations

National Information Processing Institute, Warsaw, Poland
Marek Kozlowski

Corresponding author

Correspondence to Marek Kozlowski.

Editor information

Editors and Affiliations

Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
Dominik Ryżko
Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
Piotr Gawrysiak
Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
Marzena Kryszkiewicz
Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
Henryk Rybiński

About this chapter

Cite this chapter

Kozlowski, M. (2016). Study Fields Clustering Using KRK Competences. In: Ryżko, D., Gawrysiak, P., Kryszkiewicz, M., Rybiński, H. (eds) Machine Intelligence and Big Data in Industry. Studies in Big Data, vol 19. Springer, Cham. https://doi.org/10.1007/978-3-319-30315-4_4

DOI: https://doi.org/10.1007/978-3-319-30315-4_4
Published: 25 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30314-7
Online ISBN: 978-3-319-30315-4
eBook Packages: Engineering, Engineering (R0)
