Autonomous and Adaptive Identification of Topics in Unstructured Text

  • Conference paper
Knowledge-Based and Intelligent Information and Engineering Systems (KES 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6882)

Abstract

Existing topic identification techniques must tackle an important problem: they depend on human intervention, thus incurring major preparation costs and lacking operational flexibility when facing novelty. To resolve this issue, we propose an adaptable and autonomous algorithm that discovers topics in unstructured text documents. The algorithm is based on principles that differ from existing natural language processing and artificial intelligence techniques. These principles involve the retrieval, activation and decay of general-purpose lexical knowledge, inspired by how the brain may process information when someone reads. The algorithm handles words sequentially in a single document, contrary to the usual corpus-based bag-of-words approach. Empirical results demonstrate the potential of the new algorithm.
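
The abstract names three mechanisms (retrieval, activation and decay of lexical knowledge) applied to words read one at a time from a single document. The following is only a minimal illustrative sketch of how such a loop could look, not the paper's algorithm: the toy lexicon, the function name `identify_topics`, and the `decay`, `boost` and `top_k` parameters are all assumptions made for this example, and a real system would presumably draw on a general-purpose lexical resource instead of a hand-written dictionary.

```python
from collections import defaultdict

# Hypothetical toy lexicon (assumption): maps a word to related concepts.
# It stands in for a general-purpose lexical knowledge source.
TOY_LEXICON = {
    "bank": ["finance", "river"],
    "loan": ["finance"],
    "interest": ["finance", "curiosity"],
    "water": ["river"],
}


def identify_topics(words, lexicon=TOY_LEXICON, decay=0.8, boost=1.0, top_k=2):
    """Read words in order, keeping a decaying activation level per concept.

    Illustrative only: the update rule and parameter values are assumptions,
    not taken from the paper.
    """
    activation = defaultdict(float)
    for word in words:
        # Decay: every concept activated so far fades a little at each step.
        for concept in activation:
            activation[concept] *= decay
        # Retrieval and activation: look the word up and reinforce its concepts.
        for concept in lexicon.get(word.lower(), []):
            activation[concept] += boost
    # The most active concepts at the end of the document are reported as topics.
    ranked = sorted(activation.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:top_k]]


if __name__ == "__main__":
    text = "The bank approved the loan with low interest"
    print(identify_topics(text.split()))
```

In this sketch the document is never turned into a bag of words: the order of the words matters, because a concept activated early and never reinforced decays away, while a concept that recurs throughout the text stays highly active.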

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Massey, L. (2011). Autonomous and Adaptive Identification of Topics in Unstructured Text. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science, vol 6882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23863-5_1

  • DOI: https://doi.org/10.1007/978-3-642-23863-5_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23862-8

  • Online ISBN: 978-3-642-23863-5

  • eBook Packages: Computer Science, Computer Science (R0)
