Towards Compiling Textbooks from Wikipedia

  • Ditty Mathew
  • Sutanu Chakraborti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11320)


In this paper, we explore the challenges of compiling a pedagogic resource such as a textbook on a given topic from relevant Wikipedia articles, and present an approach to assist humans in this task. We present an algorithm that attempts to suggest a textbook structure from Wikipedia, based on a set of seed concepts (chapters) provided by the user. We also conceptualize a decision support system in which users can interact with the proposed structure and the corresponding Wikipedia content to improve its pedagogic value. The proposed algorithm is implemented and evaluated against the outlines of online textbooks on five different subjects. We also propose a measure to quantify the pedagogic value of the suggested textbook structure.



We thank Prof. Marti A. Hearst for the fruitful discussion and feedback, and the members of AIDB lab for their insightful comments. This work is partially funded by TCS Research Scholar Program, India.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Artificial Intelligence and Databases Lab, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India
