Abstract
We present a machine learning algorithm that predicts how difficult a word is for a person with dyslexia. To train the algorithm we used a data set of words labeled as easy or difficult. The algorithm correctly classifies slightly more than 72% of our instances, which shows that building a predictive solution for this problem is feasible. The main purpose of our work is to weight words in order to perform lexical simplification of texts read by people with dyslexia. Because the main feature used by the classifier, and the only one that is not computed in constant time, is the number of similar words in a dictionary, we studied the existing methods for computing this feature efficiently. This algorithmic comparison is interesting in its own right and shows that two algorithms can solve the problem in less than a second.
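To make the main feature concrete, the following is a minimal sketch of counting the similar words in a dictionary. The abstract does not define similarity, so the sketch assumes it means Levenshtein (edit) distance of at most 1. It uses a naive linear scan and is not one of the algorithms compared in the paper. The names edit_distance, count_similar_words and max_distance are hypothetical.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def count_similar_words(word: str, dictionary: Iterable[str],
                        max_distance: int = 1) -> int:
    """Count dictionary entries within max_distance edits of word.

    Assumption: "similar" means edit distance <= max_distance, and the
    word itself is not counted as its own neighbour.
    """
    count = 0
    for candidate in dictionary:
        if candidate == word:
            continue
        # A length gap larger than the threshold already rules the pair out.
        if abs(len(candidate) - len(word)) > max_distance:
            continue
        if edit_distance(word, candidate) <= max_distance:
            count += 1
    return count
```

This scan costs one distance computation per dictionary entry that passes the length filter, which is why the feature is not computed in constant time and why index-based methods are worth comparing.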
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Baeza-Yates, R., Mayo-Casademont, M., Rello, L. (2015). Feasibility of Word Difficulty Prediction. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds) String Processing and Information Retrieval. SPIRE 2015. Lecture Notes in Computer Science, vol 9309. Springer, Cham. https://doi.org/10.1007/978-3-319-23826-5_34
DOI: https://doi.org/10.1007/978-3-319-23826-5_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23825-8
Online ISBN: 978-3-319-23826-5
eBook Packages: Computer Science (R0)