Improving Information-Carrying Data Capacity in Text Mining

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9330)


This article shows how the choice of textual data representation affects text mining quality. To this end, the information-carrying capacity of data is formalized, and a procedure for comparing the information-carrying capacity of data with different structures is described. A method is also presented for preparing the γ-gram representation of a text, using machine learning methods and an ontology created by a domain expert. The method integrates expert knowledge with automatic methods to extend traditional text-mining technology, which cannot understand text semantics. A representation built in this way can improve the quality of text mining, as was shown in the test research.
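As a rough illustration of why the choice of representation matters, the sketch below compares two vector space representations of the same pair of sentences: word unigrams (bag of words) and word bigrams. It is not the γ-gram method of the paper, whose definition is not given in this abstract; plain word n-grams are used here only as a stand-in, and the function names `ngram_counts` and `vectorize` are hypothetical.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count contiguous word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


def vectorize(docs, n):
    """Map each document to a count vector over the shared n-gram vocabulary."""
    counts = [ngram_counts(doc, n) for doc in docs]
    # The vocabulary is the sorted union of all n-grams seen in any document.
    vocabulary = sorted(set().union(*counts))
    return [[c.get(term, 0) for term in vocabulary] for c in counts]


# Two sentences with the same words but opposite meanings (illustrative data).
docs = ["the dog bites the man", "the man bites the dog"]

unigram_vectors = vectorize(docs, 1)
bigram_vectors = vectorize(docs, 2)

# Unigrams ignore word order, so both documents get the same vector.
print(unigram_vectors[0] == unigram_vectors[1])
# Bigrams keep some word order, so the two vectors differ.
print(bigram_vectors[0] == bigram_vectors[1])
```

In this toy case the unigram representation cannot separate the two sentences at all, while the bigram representation can, which is one simple sense in which a representation may carry more or less information for a mining task.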


Keywords: Text mining, Information-carrying data capacity, Vector space model, Text documents representation





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Information Systems Engineering, Faculty of Computer Science, West Pomeranian University of Technology in Szczecin, Szczecin, Poland
