Neural Computing and Applications, Volume 26, Issue 4, pp 929–939

Improving reading comprehension step by step using Online-Boost text readability classification system

  • Lei La
  • Nan Wang
  • Dong-ping Zhou
Original Article


Online reading exercises have become a common tool in a wide variety of second language learning systems. Readability sorting is a key step in presenting suitable reading materials to learners. Traditional text readability classification techniques do not fully meet the needs of online learning, because they cannot classify in real time and have no access to information about learners’ language levels. This paper presents a novel framework for online reading exercises based on the Online-Boost text readability classification algorithm. We first modify the multinomial Naïve Bayes model to assign each reading material an initial readability level. We then propose an Online-Boost algorithm that updates text readability and evaluates learners’ reading comprehension according to the rate of correct answers that learners give on each text. Finally, the system delivers reading materials of different difficulty to learners with different levels of reading ability in real time. The experimental results indicate that the method is easy to use and can significantly improve the performance of second language learners.
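The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the multinomial Naïve Bayes model is modified or how Online-Boost works, so the code below uses a standard multinomial Naïve Bayes classifier with add-one smoothing for the initial readability level, and a simple error-driven update as a stand-in for the Online-Boost step. The names `MultinomialNB`, `update_difficulty`, `target_rate` and `step`, as well as the toy training texts, are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Standard multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.class_counts[c] / n_docs)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / (total + v))
            if score > best_score:
                best, best_score = c, score
        return best


def update_difficulty(difficulty, correct_rate, target_rate=0.7, step=0.5):
    """Hypothetical online update (a stand-in, not Online-Boost itself).

    Raises the difficulty estimate when learners answer correctly less
    often than target_rate, and lowers it when they do better.
    """
    return difficulty + step * (target_rate - correct_rate)


# Toy usage: level 1 = easy, level 2 = hard (assumed labels).
train_docs = [
    "the cat sat on the mat",
    "the dog ran to the cat",
    "epistemological frameworks necessitate rigorous scrutiny",
    "quantum decoherence necessitate rigorous formalism",
]
train_levels = [1, 1, 2, 2]
model = MultinomialNB().fit(train_docs, train_levels)
initial_level = model.predict("the cat ran")
revised_level = update_difficulty(float(initial_level), correct_rate=0.2)
```

In this sketch the classifier supplies only a starting estimate, and the observed correct rate then moves that estimate as learners answer questions, which mirrors the initial-classification-then-online-update structure the abstract describes.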


Readability sorting; Text classification; Online learning; Reading comprehension; Boosting; Naïve Bayes



Copyright information

© The Natural Computing Applications Forum 2014

Authors and Affiliations

  1. First Research Institute of Ministry of Public Security, Beijing, People’s Republic of China
