An Interactive Independent Topic Analysis for a Mass Document Review Service

Nishigaki, Takahiro; Nitta, Katsumi; Onoda, Takashi

doi:10.1007/s12626-018-0018-5

Article
Published: 26 April 2018

Volume 12, pages 47–69, (2018)

The Review of Socionetwork Strategies

Abstract

In this paper, we propose an interactive constrained independent topic analysis for text data mining. Independent topic analysis (ITA) is a method that uses independent component analysis to extract independent topics from document data; it extracts the topics that are most independent of one another. Extracting independent topics makes large collections of text documents easier to manage in document access support systems and document management systems. However, the topics extracted by ITA often differ from the topics a user requests. For the system to be of service to users, it must be interactive and reflect the user's requests. We therefore propose an interactive ITA that responds to those requests. For example, given three topics A, B, and C, a user who wants the content of topics A and B together can merge them into a single topic D. Likewise, a user who wants to analyze topic A in more detail can separate it into topics E and F. To that end, we define two kinds of user request: Merge Link constraints and Separate Link constraints. A Merge Link constraint merges two topics into one topic. A Separate Link constraint splits one topic into two topics. We propose a method for extracting highly independent topics that satisfy these constraints. We conducted evaluation experiments on the proposed method, and the results show the effectiveness of our approach.
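
The A/B/C example in the abstract can be made concrete with a small sketch. The code below is only an illustration of the bookkeeping that the two constraint types imply for the set of topic labels; it is not the authors' algorithm and does not re-estimate any topics or measure independence. The names MergeLink, SeparateLink, and apply_constraints are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class MergeLink:
    """Hypothetical request: merge two existing topics into one new topic."""
    first: str
    second: str
    merged: str


@dataclass(frozen=True)
class SeparateLink:
    """Hypothetical request: split one existing topic into two new topics."""
    source: str
    parts: Tuple[str, str]


Constraint = Union[MergeLink, SeparateLink]


def apply_constraints(topics: List[str], constraints: List[Constraint]) -> List[str]:
    """Return the topic labels after applying the requests in order.

    Only the labels are updated; a real system would then re-extract
    topic contents subject to the constraints.
    """
    result = list(topics)
    for c in constraints:
        if isinstance(c, MergeLink):
            if c.first not in result or c.second not in result:
                raise ValueError(f"cannot merge unknown topics: {c}")
            # Put the merged topic where the earlier of the two topics was.
            position = min(result.index(c.first), result.index(c.second))
            result = [t for t in result if t not in (c.first, c.second)]
            result.insert(position, c.merged)
        elif isinstance(c, SeparateLink):
            if c.source not in result:
                raise ValueError(f"cannot separate unknown topic: {c}")
            # Replace the source topic with its two parts, in place.
            position = result.index(c.source)
            result[position:position + 1] = list(c.parts)
        else:
            raise TypeError(f"unsupported constraint: {c!r}")
    return result


merged = apply_constraints(["A", "B", "C"], [MergeLink("A", "B", "D")])
separated = apply_constraints(["A", "B", "C"], [SeparateLink("A", ("E", "F"))])
print(merged)     # ['D', 'C']
print(separated)  # ['E', 'F', 'B', 'C']
```

Representing each request as a small immutable record keeps the user's intent separate from whatever extraction step later enforces it, which is one plausible way to structure an interactive loop of this kind.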

Acknowledgements

This work was supported by the Japan Science and Technology (JST) agency under the EMS-CREST program.

Author information

Authors and Affiliations

Aoyama Gakuin University, Kanagawa, Japan
Takahiro Nishigaki & Takashi Onoda
Tokyo Institute of Technology, Kanagawa, Japan
Katsumi Nitta

Corresponding author

Correspondence to Takahiro Nishigaki.

About this article

Cite this article

Nishigaki, T., Nitta, K. & Onoda, T. An Interactive Independent Topic Analysis for a Mass Document Review Service. Rev Socionetwork Strat 12, 47–69 (2018). https://doi.org/10.1007/s12626-018-0018-5

Received: 06 September 2017
Accepted: 30 March 2018
Published: 26 April 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s12626-018-0018-5
