Multi-Label Classification of Historical Documents by Using Hierarchical Attention Networks


The quantitative analysis of digitized historical documents has begun in earnest in recent years. Text classification is particularly important for quantitative historical analysis because it enables efficient literature search and helps determine the important subjects of a particular age. Although numerous historians have worked together to classify large-scale historical documents, consistent classification among individual researchers has not been achieved. In this study, we present a classification method for large-scale historical data that uses a recently developed supervised learning algorithm, the Hierarchical Attention Network (HAN). By applying various classification methods to the Annals of the Joseon Dynasty (AJD), we show that HAN is more accurate than conventional techniques based on word-frequency features. HAN quantifies the extent to which a particular sentence or word contributes to the classification through a value called 'attention'. Using this attention mechanism, we extract representative keywords from various categories and trace their evolution over the 472-year span of the AJD. Our results reveal two main groups of event categories in the AJD: in one group, the representative keywords of the categories remained stable over long periods, while in the other group the keywords varied rapidly, reflecting the repeatedly changing character of those categories. Observing such macroscopic changes in representative words may provide insight into how a particular topic changes over a historical period.
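The attention value mentioned in the abstract can be illustrated with a minimal sketch of HAN-style attention pooling as described by Yang et al. [10]: each encoder output is scored against a learned context vector, the scores are normalized with a softmax into attention weights, and the weighted sum gives the sentence (or document) representation. This is an illustrative NumPy sketch with random parameters, not the authors' implementation; the names `attention_pool`, `W`, `b`, and `u` are assumptions for exposition.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, W, b, u):
    """HAN-style attention pooling.

    H : (n, d) encoder outputs (one row per word or sentence)
    W, b : projection to a hidden representation
    u : learned context vector used to score each row
    Returns the attention-weighted sum of H and the weights themselves.
    """
    U = np.tanh(H @ W + b)      # (n, d_a) hidden representation
    alpha = softmax(U @ u)      # (n,) attention weights, sum to 1
    return alpha @ H, alpha     # pooled vector (d,), weights (n,)

rng = np.random.default_rng(0)
n_words, d, d_a = 5, 8, 4
H = rng.normal(size=(n_words, d))   # word-encoder outputs for one sentence
W = rng.normal(size=(d, d_a))
b = rng.normal(size=d_a)
u = rng.normal(size=d_a)            # word-level context vector

s, alpha = attention_pool(H, W, b, u)
```

In the full HAN, the same pooling is applied twice: word vectors are pooled into sentence vectors, and sentence vectors into a document vector; the weights `alpha` are the per-word (or per-sentence) attention values used in the paper to extract representative keywords.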



  1. D. J. Hopkins and G. King, Am. J. Political Sci. 54, 229 (2010).

  2. J. Grimmer and B. M. Stewart, Polit. Anal. 21, 267 (2013).

  3. J. B. Michel et al., Science 331, 176 (2011).

  4. S. Klingenstein, T. Hitchcock and S. DeDeo, Proc. Natl. Acad. Sci. U.S.A. 111, 9419 (2014).

  5. S. Hochreiter and J. Schmidhuber, Neural Comput. 9, 1735 (1997).

  6. Y. Wu et al., arXiv:1609.08144.

  7. D. Tang et al., in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Baltimore, Maryland, USA, June 23-25, 2014), Vol. 1, pp. 1555–1565.

  8. Y. Kim, arXiv:1408.5882.

  9. X. Zhang, J. Zhao and Y. LeCun, in Advances in Neural Information Processing Systems (Montreal, Canada, December 7-12, 2015), pp. 649–657.

  10. Z. Yang et al., in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (San Diego, CA, USA, June 12-17, 2016), pp. 1480–1489.

  11. B. Lee, D. Kim, D. Kim and H. Jeong, New Phys.: Sae Mulli 66, 502 (2016).

  12. J. Bak and A. Oh, in Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences and Humanities (LaTeCH) (Beijing, China, July 30, 2015), pp. 10–14.

  13. J. Bak and A. Oh, in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (Brussels, Belgium, October 31-November 4, 2018), pp. 956–961.

  14. The Annals of the Joseon Dynasty,

  15. The Daily Records of Royal Secretariat of Joseon Dynasty,

  16. R. Rehurek and P. Sojka, in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (Valletta, Malta, May 22, 2010), pp. 45–50.

  17. T. Mikolov, K. Chen, G. Corrado and J. Dean, arXiv:1301.3781.

  18. D. Bahdanau, K. Cho and Y. Bengio, arXiv:1409.0473.

  19. K. Xu et al., in International Conference on Machine Learning (Lille, France, July 6-11, 2015), pp. 2048–2057.

  20. D. P. Kingma and J. Ba, arXiv:1412.6980.

  21. A. Paszke et al., in 31st Conference on Neural Information Processing Systems (Long Beach, CA, USA, December 4-9, 2017).

  22. G. Salton and M. J. McGill, Introduction to Modern Information Retrieval (McGraw-Hill, New York, NY, USA, 1983).

  23. S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (Pearson Education Limited, Malaysia, 2016).

  24. F. Pedregosa et al., J. Mach. Learn. Res. 12, 2825 (2011).

  25. G. Tsoumakas and I. Katakis, Int. J. Data Warehous. Min. 3, 1 (2007).

  26. The ratio of people's names to verbs and nouns in each category is as follows: Royal 0.11, Military 0.13, Diplomacy 0.18, Finance 0.10, Agriculture 0.10, Science 0.01, Politics 0.58, Administration 0.30, Personnel 0.60, Jurisdiction 0.51, Rebellion 0.60, Philosophy 0.25 and History 0.42.



This work was supported by the National Research Foundation of Korea (Grant No. 2017R1A2B3006930).

Author information



Corresponding author

Correspondence to Hawoong Jeong.


About this article


Cite this article

Kim, D. K., Lee, B., Kim, D. et al. Multi-Label Classification of Historical Documents by Using Hierarchical Attention Networks. J. Korean Phys. Soc. 76, 368–377 (2020).



  • Deep learning
  • Recurrent neural network
  • Text analysis
  • Big data
  • History