Two-Way Parsimonious Classification Models for Evolving Hierarchies

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9822)

Abstract

There is an increasing volume of semantically annotated data available, in particular due to the emerging use of knowledge bases to annotate or classify dynamic data on the web. This is challenging because these knowledge bases have a dynamic hierarchical or graph structure, which demands robustness against changes in the data structure over time. In general, this requires us to develop models of the hierarchical classes that capture all, and only, the essential, solid features of each class: features that remain valid even as the structure changes. We propose hierarchical significant words language models of textual objects at the intermediate levels of hierarchies as robust models for hierarchical classification that take the hierarchical relations into account. We conduct extensive experiments on richly annotated parliamentary proceedings that link every speech to the respective speaker, their political party, and their role in the parliament. Our main findings are as follows. First, we define hierarchical significant words language models as an iterative estimation process across the hierarchy, resulting in tiny models that capture only well-grounded text features at each level. Second, we apply the resulting models to party membership and party position classification across time periods in which the structure of the parliament changes, and find that the models transfer dramatically better across time periods than the baselines do.
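The first finding describes estimating compact class models by iteratively stripping away terms that are better explained elsewhere in the hierarchy. The abstract does not give the estimation equations, so the following is only a minimal sketch, assuming a three-component mixture in the spirit of parsimonious language models (Hiemstra, Robertson and Zaragoza, SIGIR 2004). A node's term counts are explained by a significant-words model, a general model (assumed here to come from the parent or the collection) and a specific model (assumed here to aggregate the children). EM re-estimates only the significant-words component and prunes terms whose probability falls below a threshold. The function name `two_way_parsimonious`, the mixture weights, the threshold and the toy data are all hypothetical; this is not the authors' implementation.

```python
from typing import Dict, Tuple


def two_way_parsimonious(
    tf: Dict[str, int],
    p_general: Dict[str, float],
    p_specific: Dict[str, float],
    lam: Tuple[float, float, float] = (0.4, 0.4, 0.2),
    iters: int = 50,
    threshold: float = 1e-4,
) -> Dict[str, float]:
    """Hypothetical sketch: EM estimate of a significant-words model for one node.

    The node's term counts are modelled as the mixture
        lam_sig * p_sig + lam_gen * p_general + lam_spec * p_specific,
    where p_general (assumed: parent/collection model) and p_specific
    (assumed: aggregated children) stay fixed and only p_sig is re-estimated.
    """
    lam_sig, lam_gen, lam_spec = lam
    vocab = [t for t, c in tf.items() if c > 0]
    total = sum(tf[t] for t in vocab)
    # Initialise with the maximum-likelihood estimate of the node's own text.
    p_sig = {t: tf[t] / total for t in vocab}

    for _ in range(iters):
        # E-step: expected count of each term generated by the significant component.
        expected = {}
        for t in vocab:
            num = lam_sig * p_sig.get(t, 0.0)
            den = num + lam_gen * p_general.get(t, 0.0) + lam_spec * p_specific.get(t, 0.0)
            expected[t] = tf[t] * num / den if den > 0 else 0.0
        norm = sum(expected.values())
        if norm == 0:
            break  # nothing left to estimate
        # M-step: normalise, then prune low-probability terms (parsimony).
        p_sig = {t: e / norm for t, e in expected.items() if e / norm >= threshold}
        if not p_sig:
            break  # the threshold pruned every term
        z = sum(p_sig.values())
        p_sig = {t: p / z for t, p in p_sig.items()}
    return p_sig


if __name__ == "__main__":
    # Toy counts for one hypothetical node (e.g. a party) in the hierarchy.
    tf = {"the": 6, "budget": 6, "tax": 6, "farmer": 2}
    # Assumed general model: dominated by function words.
    p_general = {"the": 0.9, "budget": 0.04, "tax": 0.04, "farmer": 0.02}
    # Assumed specific model: terms typical of a single child node.
    p_specific = {"farmer": 0.8, "the": 0.2}
    model = two_way_parsimonious(tf, p_general, p_specific)
    print(sorted(model.items()))
```

In this toy run, "the" is largely explained by the general model and "farmer" by the specific model, so EM drives their significant-words probability toward zero and pruning removes them, leaving a tiny model over "budget" and "tax". Holding the general and specific models fixed keeps each EM step cheap; the paper itself may estimate these components differently.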

Acknowledgments

This research is funded in part by the Netherlands Organization for Scientific Research (NWO) through the Exploratory Political Search project (ExPoSe, NWO CI # 314.99.108), and by the Digging into Data Challenge through the Digging Into Linked Parliamentary Data project (DiLiPaD, NWO Digging into Data # 600.006.014).

Author information


Correspondence to Mostafa Dehghani.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Dehghani, M., Azarbonyad, H., Kamps, J., Marx, M. (2016). Two-Way Parsimonious Classification Models for Evolving Hierarchies. In: Fuhr, N., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2016. Lecture Notes in Computer Science, vol 9822. Springer, Cham. https://doi.org/10.1007/978-3-319-44564-9_6

  • DOI: https://doi.org/10.1007/978-3-319-44564-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44563-2

  • Online ISBN: 978-3-319-44564-9

  • eBook Packages: Computer Science, Computer Science (R0)
