Analyzing Entities and Topics in News Articles Using Statistical Topic Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3975)

Abstract

Statistical language models can learn relationships between topics discussed in a document collection and persons, organizations and places mentioned in each document. We present a novel combination of statistical topic models and named-entity recognizers to jointly analyze entities mentioned (persons, organizations and places) and topics discussed in a collection of 330,000 New York Times news articles. We demonstrate an analytic framework which automatically extracts from a large collection: topics; topic trends; and topics that relate entities.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Newman, D., Chemudugunta, C., Smyth, P., Steyvers, M. (2006). Analyzing Entities and Topics in News Articles Using Statistical Topic Models. In: Mehrotra, S., Zeng, D.D., Chen, H., Thuraisingham, B., Wang, FY. (eds) Intelligence and Security Informatics. ISI 2006. Lecture Notes in Computer Science, vol 3975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11760146_9

  • DOI: https://doi.org/10.1007/11760146_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34478-0

  • Online ISBN: 978-3-540-34479-7

  • eBook Packages: Computer Science (R0)
