International Symposium on String Processing and Information Retrieval

SPIRE 2015: String Processing and Information Retrieval, pp. 362–373

Feasibility of Word Difficulty Prediction

  • Ricardo Baeza-Yates
  • Martí Mayo-Casademont
  • Luz Rello
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9309)


We present a machine learning algorithm to predict how difficult a word is for a person with dyslexia. To train the algorithm we used a data set of words labeled as easy or difficult. The algorithm correctly classifies slightly more than 72% of our instances, showing the feasibility of building a predictive solution for this problem. The main purpose of our work is to weight words in order to perform lexical simplification in texts read by people with dyslexia. Since the main feature used by the classifier, and the only one that is not computed in constant time, is the number of similar words in a dictionary, we studied the existing methods for computing this feature efficiently. This algorithmic comparison is interesting in its own right and shows that two algorithms can solve the problem in less than a second.
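To make the main feature concrete, the following is a minimal sketch of counting the similar words of a query word in a dictionary. It is not the authors' implementation: the abstract does not define "similar", so the sketch assumes it means Levenshtein edit distance of at most `max_distance` (default 1), and it uses a plain linear scan rather than any of the efficient methods the paper compares. The names `edit_distance`, `count_similar_words`, and `toy_dictionary` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # iterate the inner loop over the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def count_similar_words(word: str, dictionary: list[str], max_distance: int = 1) -> int:
    """Count dictionary entries within max_distance edits of word (assumed similarity)."""
    count = 0
    for candidate in dictionary:
        if candidate == word:
            continue  # the word itself is not counted as its own neighbor
        if abs(len(candidate) - len(word)) > max_distance:
            continue  # a length gap this large already exceeds the threshold
        if edit_distance(word, candidate) <= max_distance:
            count += 1
    return count


# Toy word list for illustration only; not data from the paper.
toy_dictionary = ["cat", "cot", "coat", "cart", "dog", "cats", "at"]
print(count_similar_words("cat", toy_dictionary))
```

The scan costs one distance computation per dictionary entry that passes the length filter, which is why this feature, unlike the constant-time ones, motivates a comparison of faster lookup methods.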


Down Syndrome; Word Length; Edit Distance; Similar Word; Word Complexity
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ricardo Baeza-Yates (1, 2)
  • Martí Mayo-Casademont (2)
  • Luz Rello (3)

  1. Yahoo Labs, New York, USA
  2. DTIC, Universitat Pompeu Fabra, Barcelona, Spain
  3. HCI Institute, Carnegie Mellon University, Pittsburgh, USA
