Study Fields Clustering Using KRK Competences

Part of the book series: Studies in Big Data (SBD, volume 19)

Abstract

The paper addresses the clustering of study fields using information extracted from semi-structured documents, namely documents describing the KRK competences of each study field. KRK competences are specialized descriptions of the qualifications that students gain on graduating from a given study field. The proposed method extracts and processes KRK competences from diverse types of semi-structured documents. It consists of two stages: (1) entity extraction from documents (building a vector of KRK competences for each study field), and (2) clustering of study fields using those competence representations. Polish KRK competence files, describing almost 3000 study fields in Poland, were used as the corpus. The method and its stages are thoroughly analyzed. The results make it possible to compare study fields and to identify similar ones according to their final educational outcomes.
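
The two stages can be illustrated with a minimal sketch. The abstract does not name the extraction technique, the vector weighting, or the clustering algorithm, so everything below is an assumption made for illustration: the competence codes and field names are invented, each study field is represented as a plain set of codes (standing in for the output of stage 1), and stage 2 is approximated by single-link grouping on Jaccard similarity with a hypothetical threshold of 0.5. It is not the chapter's method.

```python
from itertools import combinations

# Hypothetical stage-1 output: study field -> set of KRK competence codes.
# All names and codes are invented for illustration only.
FIELDS = {
    "Computer Science A": {"K_W01", "K_W02", "K_U01", "K_U02"},
    "Computer Science B": {"K_W01", "K_W02", "K_U01", "K_U03"},
    "History": {"H_W01", "H_W02", "H_U01"},
    "Archaeology": {"H_W01", "H_W02", "H_U02"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_fields(fields, threshold=0.5):
    """Single-link grouping: fields with similarity >= threshold share a cluster."""
    parent = {name: name for name in fields}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Stage 2 (assumed): merge every pair of fields that is similar enough.
    for a, b in combinations(fields, 2):
        if jaccard(fields[a], fields[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in fields:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


clusters = cluster_fields(FIELDS)
print(clusters)
```

With this toy input the two computer science programmes share three of five distinct codes (similarity 0.6) and the two humanities programmes share two of four (similarity 0.5), so each pair forms one cluster. A real corpus of almost 3000 study fields would likely need weighted vectors and a more scalable clustering algorithm than this pairwise comparison.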


Author information

Corresponding author

Correspondence to Marek Kozlowski.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Kozlowski, M. (2016). Study Fields Clustering Using KRK Competences. In: Ryżko, D., Gawrysiak, P., Kryszkiewicz, M., Rybiński, H. (eds) Machine Intelligence and Big Data in Industry. Studies in Big Data, vol 19. Springer, Cham. https://doi.org/10.1007/978-3-319-30315-4_4

  • DOI: https://doi.org/10.1007/978-3-319-30315-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30314-7

  • Online ISBN: 978-3-319-30315-4

  • eBook Packages: Engineering, Engineering (R0)
