Abstract
The volume of semantically annotated data is growing, in particular due to the emerging use of knowledge bases to annotate or classify dynamic data on the web. This is challenging because these knowledge bases have a dynamic hierarchical or graph structure, which demands robustness against changes in the data structure over time. In general, this requires models for the hierarchical classes that capture all, and only, the essential features of each class, that is, the features that remain valid even as the structure changes. We propose hierarchical significant words language models of textual objects in the intermediate levels of hierarchies as robust models for hierarchical classification that take the hierarchical relations into consideration. We conduct extensive experiments on richly annotated parliamentary proceedings that link every speech to its speaker, the speaker's political party, and the speaker's role in the parliament. Our main findings are the following. First, we define hierarchical significant words language models as an iterative estimation process across the hierarchy, resulting in tiny models that capture only well-grounded text features at each level. Second, we apply the resulting models to party membership and party position classification across time periods in which the structure of the parliament changes, and find that the models transfer across time periods dramatically better than the baselines.
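To give a rough feel for what an iterative estimation that yields "tiny models" can look like, the following is a minimal sketch of single-level parsimonious language model estimation by expectation-maximization. It is a generic, related technique and not the authors' hierarchical procedure: the function name `parsimonious_lm`, the mixing weight `lam`, the pruning `threshold`, and the toy counts are all assumptions made for illustration.

```python
def parsimonious_lm(term_counts, background, lam=0.1, iterations=50, threshold=1e-4):
    """Estimate a sparse class model P(t | class) against a fixed background model.

    term_counts: dict mapping term -> raw count in the class text (assumed input).
    background:  dict mapping term -> P(t | background), e.g. a parent or corpus model.
    lam:         assumed weight of the class-specific model in the mixture.
    """
    total = sum(term_counts.values())
    # Start from the maximum-likelihood estimate of the class model.
    model = {t: c / total for t, c in term_counts.items()}

    for _ in range(iterations):
        # E-step: expected count of each term explained by the class model
        # rather than by the background model.
        expected = {}
        for t, p in model.items():
            specific = lam * p
            general = (1.0 - lam) * background.get(t, 1e-9)
            expected[t] = term_counts[t] * specific / (specific + general)

        # M-step: renormalize, dropping terms whose probability falls below the threshold.
        norm = sum(expected.values())
        model = {t: e / norm for t, e in expected.items() if e / norm >= threshold}

        # Renormalize once more so the pruned model still sums to one.
        z = sum(model.values())
        model = {t: p / z for t, p in model.items()}

    return model


# Toy example (made-up numbers): frequent general terms versus class-specific terms.
counts = {"the": 50, "of": 30, "pension": 10, "budget": 10}
background = {"the": 0.5, "of": 0.3, "pension": 0.001, "budget": 0.002}
model = parsimonious_lm(counts, background)
print(sorted(model, key=model.get, reverse=True))
```

In this toy setting the terms that the background already explains well lose probability mass at every iteration and are eventually pruned, leaving a small model concentrated on class-specific terms. Applying the same idea repeatedly across the levels of a hierarchy is one plausible reading of the abstract, but the details of the authors' procedure are not given here.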
Acknowledgments
This research is funded in part by the Netherlands Organization for Scientific Research through the Exploratory Political Search project (ExPoSe, NWO CI # 314.99.108), and by the Digging into Data Challenge through the Digging Into Linked Parliamentary Data project (DiLiPaD, NWO Digging into Data # 600.006.014).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Dehghani, M., Azarbonyad, H., Kamps, J., Marx, M. (2016). Two-Way Parsimonious Classification Models for Evolving Hierarchies. In: Fuhr, N., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2016. Lecture Notes in Computer Science, vol 9822. Springer, Cham. https://doi.org/10.1007/978-3-319-44564-9_6
DOI: https://doi.org/10.1007/978-3-319-44564-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44563-2
Online ISBN: 978-3-319-44564-9
eBook Packages: Computer Science, Computer Science (R0)