Autonomous and Adaptive Identification of Topics in Unstructured Text

Massey, Louis

doi:10.1007/978-3-642-23863-5_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6882)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

Existing topic identification techniques must tackle an important problem: they depend on human intervention, thus incurring major preparation costs and lacking operational flexibility when facing novelty. To resolve this issue, we propose an adaptable and autonomous algorithm that discovers topics in unstructured text documents. The algorithm is based on principles that differ from existing natural language processing and artificial intelligence techniques. These principles involve the retrieval, activation and decay of general-purpose lexical knowledge, inspired by how the brain may process information when someone reads. The algorithm handles words sequentially in a single document, contrary to the usual corpus-based bag-of-words approach. Empirical results demonstrate the potential of the new algorithm.
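
The abstract names three mechanisms (retrieval, activation and decay of lexical knowledge) applied to the words of a single document in reading order. As a rough illustration only, and not the algorithm evaluated in the paper, the idea might be sketched as follows. The lexicon contents, the function name `identify_topics`, and the `decay`, `boost` and `top_k` parameters are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy lexicon (an assumption for this sketch): each word maps to
# a few general-purpose concepts. A real system would retrieve these from a
# lexical resource rather than a hand-written dictionary.
TOY_LEXICON = {
    "bank": ["finance", "river"],
    "loan": ["finance", "debt"],
    "interest": ["finance", "attention"],
    "rate": ["finance", "speed"],
}


def identify_topics(words, lexicon, decay=0.8, boost=1.0, top_k=2):
    """Read words in order; boost related concepts and decay the others."""
    activation = defaultdict(float)
    for word in words:
        # Decay: every concept seen so far loses part of its activation.
        for concept in activation:
            activation[concept] *= decay
        # Retrieval and activation: concepts linked to the word are boosted.
        for concept in lexicon.get(word, []):
            activation[concept] += boost
    # Rank by activation (highest first), breaking ties alphabetically.
    ranked = sorted(activation.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:top_k]]


if __name__ == "__main__":
    text = "the bank raised the loan interest rate"
    print(identify_topics(text.split(), TOY_LEXICON))
```

In this sketch a concept that keeps being reinforced by successive words ("finance") stays highly activated, while a concept touched once ("river") fades as reading continues. No corpus statistics are involved, which loosely mirrors the contrast with bag-of-words approaches drawn in the abstract.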

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Royal Military College, Kingston, Canada, K7K 7B4
Louis Massey

Editor information

Editors and Affiliations

Institute of Integrated Sensor Systems, University of Kaiserslautern, Erwin-Schroedinger-str. 12, 67663, Kaiserslautern, Germany
Andreas König
Knowledge-Based Systems Group, Department of Computer Science, University of Kaiserslautern, P.O. Box 3049, 67653, Kaiserslautern, Germany
Andreas Dengel
School of Business, University of Applied Sciences Northwestern Switzerland, Riggenbachstr. 16, 4600, Olten, Switzerland
Knut Hinkelmann
Graduate School of Engineering, Osaka Prefecture University, 1-1 Gakuen-cho, 599-8531, Sakai, Osaka, Japan
Koichi Kise
KES International, P.O. Box 2115, BN43 9AF, Shoreham-by-sea, UK
Robert J. Howlett
University of South Australia, Adelaide, 5095, Mawson Lakes, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Massey, L. (2011). Autonomous and Adaptive Identification of Topics in Unstructured Text. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science, vol 6882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23863-5_1

DOI: https://doi.org/10.1007/978-3-642-23863-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23862-8
Online ISBN: 978-3-642-23863-5
eBook Packages: Computer Science, Computer Science (R0)