Two-Way Parsimonious Classification Models for Evolving Hierarchies

Dehghani, Mostafa; Azarbonyad, Hosein; Kamps, Jaap; Marx, Maarten

doi:10.1007/978-3-319-44564-9_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9822)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

There is an increasing volume of semantically annotated data available, in particular due to the emerging use of knowledge bases to annotate or classify dynamic data on the web. This is challenging because these knowledge bases have a dynamic hierarchical or graph structure, which demands robustness against changes in the data structure over time. In general, this requires us to develop appropriate models for the hierarchical classes that capture all, and only, the essential solid features of the classes which remain valid even as the structure changes. We propose hierarchical significant words language models of textual objects in the intermediate levels of hierarchies as robust models for hierarchical classification by taking the hierarchical relations into consideration. We conduct extensive experiments on richly annotated parliamentary proceedings that link every speech to the respective speaker, their political party, and their role in the parliament. Our main findings are the following. First, we define hierarchical significant words language models as an iterative estimation process across the hierarchy, resulting in tiny models capturing only well-grounded text features at each level. Second, we apply the resulting models to party membership and party position classification across time periods, where the structure of the parliament changes, and find that the models transfer dramatically better across time periods than the baselines.
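
To make the "parsimonious" idea in the title concrete, the sketch below shows a standard single-level parsimonious language model estimated with expectation-maximization: terms that a background model already explains well lose probability mass, and terms below a threshold are pruned, which is what yields a tiny model. This is only an illustration under stated assumptions. It is not the hierarchical, two-way estimation procedure of the paper, and the function name `parsimonious_lm`, the parameters `lam` and `threshold`, and the toy data are all hypothetical.

```python
from collections import Counter


def parsimonious_lm(doc_tokens, background, lam=0.1, iters=50, threshold=1e-4):
    """EM estimate of P(t|D) that discounts terms explained by `background`.

    `background` maps each term to its probability under a general model;
    `lam` is the (assumed) mixture weight of the document-specific model.
    """
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    # Start from the maximum-likelihood estimate of the document model.
    p_doc = {t: c / total for t, c in tf.items()}
    for _ in range(iters):
        # E-step: expected count of each term attributed to the document model.
        expected = {}
        for t, p in p_doc.items():
            doc_part = lam * p
            bg_part = (1.0 - lam) * background.get(t, 0.0)
            expected[t] = tf[t] * doc_part / (doc_part + bg_part)
        # M-step: renormalise, pruning terms that fall below the threshold.
        norm = sum(expected.values())
        p_doc = {t: v / norm for t, v in expected.items() if v / norm >= threshold}
        # Renormalise once more so the pruned model still sums to one.
        norm = sum(p_doc.values())
        p_doc = {t: v / norm for t, v in p_doc.items()}
    return p_doc


# Toy usage: "the" is frequent in the speech but also dominates the background.
speech = ["the", "the", "the", "budget", "budget", "reform"]
general = {"the": 0.9, "budget": 0.05, "reform": 0.05}
model = parsimonious_lm(speech, general)
print(sorted(model.items(), key=lambda kv: -kv[1]))
```

In this toy run the common word is pushed out of the model while the more specific terms keep the probability mass. Applying such an estimation step repeatedly along a hierarchy, against both more general and more specific levels, is one way to read the iterative process the abstract describes, but the details here are an assumption.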

Acknowledgments

This research is funded in part by the Netherlands Organization for Scientific Research through the Exploratory Political Search project (ExPoSe, NWO CI # 314.99.108), and by the Digging into Data Challenge through the Digging Into Linked Parliamentary Data project (DiLiPaD, NWO Digging into Data # 600.006.014).

Author information

Authors and Affiliations

Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands
Mostafa Dehghani & Jaap Kamps
Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
Hosein Azarbonyad & Maarten Marx

Corresponding author

Correspondence to Mostafa Dehghani.

Editor information

Editors and Affiliations

Universität Duisburg-Essen, Duisburg, Germany
Norbert Fuhr
Universidade de Évora, Évora, Portugal
Paulo Quaresma
University of Évora, Évora, Portugal
Teresa Gonçalves
Aalborg University Copenhagen, Copenhagen, Denmark
Birger Larsen
University of Stavanger, Stavanger, Norway
Krisztian Balog
University of Glasgow, Glasgow, United Kingdom
Craig Macdonald
University of Padua, Padua, Italy
Linda Cappellato
University of Padua, Padua, Italy
Nicola Ferro

About this paper

Cite this paper

Dehghani, M., Azarbonyad, H., Kamps, J., Marx, M. (2016). Two-Way Parsimonious Classification Models for Evolving Hierarchies. In: Fuhr, N., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2016. Lecture Notes in Computer Science, vol 9822. Springer, Cham. https://doi.org/10.1007/978-3-319-44564-9_6

DOI: https://doi.org/10.1007/978-3-319-44564-9_6
Published: 23 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44563-2
Online ISBN: 978-3-319-44564-9
eBook Packages: Computer Science; Computer Science (R0)
