Feasibility of Word Difficulty Prediction

  • Conference paper
String Processing and Information Retrieval (SPIRE 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9309)

Included in the following conference series:

  • International Symposium on String Processing and Information Retrieval

Abstract

We present a machine learning algorithm to predict how difficult a word is for a person with dyslexia. To train the algorithm we used a data set of words labeled as easy or difficult. The algorithm correctly classifies slightly more than 72% of our instances, which shows that building a predictive solution for this problem is feasible. The main purpose of our work is to weight words so that lexical simplification can be applied to texts read by people with dyslexia. Since the main feature used by the classifier, and the only one that is not computed in constant time, is the number of similar words in a dictionary, we studied the existing methods for computing this feature efficiently. This algorithmic comparison is interesting in its own right and shows that two algorithms can solve the problem in less than a second.
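
As an illustration of the "number of similar words in a dictionary" feature, the following is a minimal sketch rather than the authors' implementation. The abstract does not define "similar", so the sketch assumes it means Levenshtein distance at most `k` (default 1), excluding the word itself. It shows a linear-scan baseline and a Burkhard-Keller tree as one classic index for this kind of query; the names `levenshtein`, `count_similar_scan`, and `BKTree` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def count_similar_scan(word: str, dictionary: Iterable[str], k: int = 1) -> int:
    """Baseline: compare the word against every dictionary entry."""
    return sum(1 for w in dictionary if w != word and levenshtein(word, w) <= k)


# A node is (word, children), where children maps a distance to a child node.
Node = Tuple[str, Dict[int, "Node"]]


class BKTree:
    """Burkhard-Keller tree over a metric (here, the assumed edit distance)."""

    def __init__(self, words: Iterable[str]) -> None:
        self.root: Optional[Node] = None
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def count_similar(self, word: str, k: int = 1) -> int:
        """Count stored words at distance 1..k from `word`."""
        if self.root is None:
            return 0
        count = 0
        stack = [self.root]
        while stack:
            node_word, children = stack.pop()
            d = levenshtein(word, node_word)
            if 0 < d <= k:
                count += 1
            # Triangle inequality: matches can only lie in subtrees whose
            # edge label is within k of d.
            for dist, child in children.items():
                if d - k <= dist <= d + k:
                    stack.append(child)
        return count


toy_dictionary = ["cat", "cot", "coat", "cart", "dog", "cast", "at"]
tree = BKTree(toy_dictionary)
print(count_similar_scan("cat", toy_dictionary), tree.count_similar("cat"))
```

On this toy dictionary both approaches count the same neighbours of "cat"; the tree simply prunes subtrees that the triangle inequality rules out, which is the kind of saving that matters when the dictionary is large.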

Author information

Corresponding author

Correspondence to Ricardo Baeza-Yates.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Baeza-Yates, R., Mayo-Casademont, M., Rello, L. (2015). Feasibility of Word Difficulty Prediction. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds) String Processing and Information Retrieval. SPIRE 2015. Lecture Notes in Computer Science, vol 9309. Springer, Cham. https://doi.org/10.1007/978-3-319-23826-5_34

  • DOI: https://doi.org/10.1007/978-3-319-23826-5_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23825-8

  • Online ISBN: 978-3-319-23826-5

  • eBook Packages: Computer Science, Computer Science (R0)
