International Conference of the Cross-Language Evaluation Forum for European Languages

Experimental IR Meets Multilinguality, Multimodality, and Interaction pp 215-221

Are Topically Diverse Documents Also Interesting?

  • Hosein Azarbonyad
  • Ferron Saan
  • Mostafa Dehghani
  • Maarten Marx
  • Jaap Kamps
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9283)

Abstract

Text interestingness measures the quality of a document from the users’ perspective: it reflects how willing they are to read it. Different approaches have been proposed for measuring the interestingness of texts. Most of these approaches assume that interesting texts are also topically diverse, and estimate interestingness using topical diversity. In this paper, we investigate the relation between interestingness and topical diversity, using the Dutch and Canadian parliamentary proceedings. We apply an existing measure of interestingness, which is based on structural properties of the proceedings (e.g., how much interaction there is between speakers in a debate). We then compute the correlation between this measure of interestingness and topical diversity.

Our main findings are twofold. First, in general there is a relatively low correlation between interestingness and topical diversity. Second, there are two extreme categories of documents: highly interesting but hardly diverse documents (focused interesting documents), and highly diverse but not interesting documents. When we remove these two extreme types of documents, there is a positive correlation between interestingness and diversity.
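The analysis described above can be illustrated with a minimal Python sketch. It assumes each document already has an interestingness score and a topical diversity score, both scaled to [0, 1]. The function names, the `low` and `high` thresholds, and the toy scores are hypothetical and chosen only for illustration; the abstract does not state how the paper identifies the two extreme categories, nor which correlation coefficient it uses (Pearson is assumed here).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def drop_extremes(interest, diversity, low=0.25, high=0.75):
    """Drop documents in the two extreme categories.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    kept = [
        (i, d)
        for i, d in zip(interest, diversity)
        if not (i >= high and d <= low)   # highly interesting, hardly diverse
        and not (d >= high and i <= low)  # highly diverse, not interesting
    ]
    return [i for i, _ in kept], [d for _, d in kept]


# Toy scores invented for illustration only; they are not data from the paper.
interest = [0.2, 0.4, 0.5, 0.7, 0.9, 0.1]
diversity = [0.3, 0.4, 0.6, 0.7, 0.1, 0.9]

r_all = pearson(interest, diversity)
kept_interest, kept_diversity = drop_extremes(interest, diversity)
r_kept = pearson(kept_interest, kept_diversity)

print(f"all documents:    r = {r_all:.3f}")
print(f"extremes removed: r = {r_kept:.3f}")
```

On this toy input the last two documents fall into the extreme categories, so the second correlation is computed over the remaining four. A rank-based coefficient could be substituted for `pearson` without changing the rest of the sketch.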

Keywords

Text interestingness, Text topical diversity, Parliamentary proceedings

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Hosein Azarbonyad (1)
  • Ferron Saan (1)
  • Mostafa Dehghani (1)
  • Maarten Marx (1)
  • Jaap Kamps (1)

  1. University of Amsterdam, Amsterdam, The Netherlands
