TopicsRanksDC: Distance-Based Topic Ranking Applied on Two-Class Data

  • Conference paper
Database and Expert Systems Applications (DEXA 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1285)

Abstract

In this paper, we introduce TopicsRanksDC, a novel approach for ranking topics based on the distance between the two clusters that each topic generates. We assume that the data consists of text documents, each associated with one of two classes. Our approach ranks each topic contained in these documents by its significance for separating the two classes. First, the algorithm detects topics using Latent Dirichlet Allocation (LDA). The words defining each topic are then represented as two clusters, each associated with one of the classes. We compute four distance metrics between the clusters: Single Linkage, Complete Linkage, Average Linkage, and the distance between the centroids. We compare the results for LDA topics with those for random topics. The results show that LDA topics rank much higher than random topics. The results of the TopicsRanksDC tool are promising for future work on enabling search engines to suggest related topics.
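The four cluster distances named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each cluster is a list of equal-length numeric vectors, that Euclidean distance is the underlying point-to-point measure, and that a larger distance between the two class clusters means a higher rank. The function names (`cluster_distances`, `rank_topics`) and the toy data are hypothetical.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def cluster_distances(cluster_a, cluster_b):
    """Return the four distances between two clusters of vectors."""
    # All point-to-point distances across the two clusters.
    pairwise = [euclidean(a, b) for a, b in product(cluster_a, cluster_b)]
    return {
        "single": min(pairwise),                   # closest pair
        "complete": max(pairwise),                 # farthest pair
        "average": sum(pairwise) / len(pairwise),  # mean over all pairs
        "centroid": euclidean(centroid(cluster_a), centroid(cluster_b)),
    }


def rank_topics(topic_clusters, metric="average"):
    """Sort topics by the chosen distance, largest first.

    Assumption: a larger distance between the two class clusters
    indicates a topic that separates the classes better.
    """
    scores = {
        name: cluster_distances(a, b)[metric]
        for name, (a, b) in topic_clusters.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: topic name -> (class-1 cluster, class-2 cluster).
toy_topics = {
    "topic_0": ([[0.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [3.0, 1.0]]),
    "topic_1": ([[0.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [2.0, 0.0]]),
}

for name, score in rank_topics(toy_topics, metric="average"):
    print(name, round(score, 3))
```

In this toy example the two clusters of `topic_0` lie far apart while those of `topic_1` overlap, so `topic_0` ranks first under each of the four metrics.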

This work has been partially supported by the “Wachstumskern Qurator – Corporate Smart Insights” project (03WKDA1F) funded by the German Federal Ministry of Education and Research (BMBF).

Notes

  1. Joachims, T.: A Statistical Learning Model of Text Classification with Support Vector Machines. In: Proceedings of the Conference on Research and Development in Information Retrieval, SIGIR (2001).

  2. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006).

  3. https://archive.org/details/stackexchange.

Corresponding author

Correspondence to Malik Yousef.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Yousef, M., Qundus, J.A., Peikert, S., Paschke, A. (2020). TopicsRanksDC: Distance-Based Topic Ranking Applied on Two-Class Data. In: Kotsis, G., et al. Database and Expert Systems Applications. DEXA 2020. Communications in Computer and Information Science, vol 1285. Springer, Cham. https://doi.org/10.1007/978-3-030-59028-4_2

  • DOI: https://doi.org/10.1007/978-3-030-59028-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59027-7

  • Online ISBN: 978-3-030-59028-4

  • eBook Packages: Computer Science, Computer Science (R0)
