Using Twitter to learn about the autism community

  • Adham Beykikhoshk
  • Ognjen Arandjelović
  • Dinh Phung
  • Svetha Venkatesh
  • Terry Caelli
Original Article


Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder are crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of the carers of ASD-afflicted individuals, who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD: their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work.


Keywords: Social media, Big data, Asperger’s, Mental health, Health care, Public health, ASD



The authors would like to express their sincere gratitude to the anonymous reviewers whose constructive feedback on our original work (Beykikhoshk et al. 2014) greatly helped shape the present paper. In particular, we are thankful for their suggestions for additional experiments, which are described herein.


  1. Agarwal A, Xie B, Vovsha I, Rambow O, Passonneau R (2011) Sentiment analysis of Twitter data. In: Proceedings of the workshop on language in social media, pp 30–38
  2. American Psychiatric Association (2013) Autism spectrum disorder fact sheet. American Psychiatric Publishing, Arlington
  3. Asur S, Huberman BA (2010) Predicting the future with social media. In: Proceedings of the IEEE/ACM international conference on web intelligence and intelligent agent technology, pp 492–499
  4. Arandjelović O (2010) Automatic attribution of ancient Roman imperial coins. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1728–1734
  5. Arandjelović O (2012) Object matching using boundary descriptors. In: Proceedings of the British machine vision conference. doi: 10.5244/C.26.85
  6. Arandjelović O (2012) Reading ancient coins: automatically identifying denarii using obverse legend seeded retrieval. In: Proceedings of the European conference on computer vision, vol 4, pp 317–330
  7. Baucom E, Sanjari A, Liu X, Chen M (2013) Mirroring the real world in social media: Twitter, geolocation, and sentiment analysis. In: Proceedings of the international workshop on mining unstructured big data using natural language processing, pp 61–68
  8. Baxter AJ, Brugha TS, Erskine HE, Scheurer RW, Vos T, Scott JG (2015) The epidemiology and global burden of autism spectrum disorders. Psychol Med 45(3):601–613
  9. Beykikhoshk A, Arandjelović O, Phung D, Venkatesh S, Caelli T (2014) Data-mining Twitter and the autism spectrum disorder: a pilot study. In: Proceedings of the IEEE/ACM international conference on advances in social network analysis and mining, pp 349–356
  10. Beykikhoshk A, Arandjelović O, Phung D, Venkatesh S (2015) Hierarchical Dirichlet process for tracking complex topical structure evolution and its application to autism research literature. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining, vol 1, pp 550–562
  11. Bifet A, Frank E (2010) Sentiment knowledge discovery in Twitter streaming data. In: Proceedings of the international conference on discovery science, pp 1–15
  12. Bishop CM (2007) Pattern recognition and machine learning. Springer, New York
  13. Bollen J, Mao H, Pepe A (2011) Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. In: Proceedings of the international conference on weblogs and social media, pp 450–453
  14. Bouchaud JP, Mézard M (2000) Wealth condensation in a simple model of economy. Phys A 282(3):536–545
  15. Clauset A, Shalizi CR, Newman ME (2009) Power-law distributions in empirical data. SIAM Rev 51(4):661–703
  16. Chew C, Eysenbach G (2010) Pandemics in the age of Twitter: content analysis of Tweets during the 2009 H1N1 outbreak. PLoS One 5(11):e14118
  17. Culotta A (2010) Towards detecting influenza epidemics by analyzing Twitter messages. In: Proceedings of the ACM workshop on social media analytics, pp 115–122
  18. Danial JT, Wood JJ (2013) Cognitive behavioral therapy for children with autism: review and considerations for future research. J Dev Behav Pediatr 34(9):702–715
  19. Dumais ST (2004) Latent semantic analysis. Ann Rev Inf Sci Technol 38(1):188–230
  20. Fombonne E (2009) Epidemiology of pervasive developmental disorders. J Pediatr Res 65(6):591–598
  21. Geier DA, Kern JK, Davis G, King PG, Adams JB, Young JL, Geier MR (2011) A prospective double-blind, randomized clinical trial of levocarnitine to treat autism spectrum disorders. J Med Sci Monit 17(6):PI15–PI23
  22. Gray DE (1993) Perceptions of stigma: the parents of autistic children. Sociol Health Illn 15(1):102–120
  23. Harshavardhan A, Gandhe A, Lazarus R, Yu SH, Liu B (2011) Predicting flu trends using Twitter data. In: Proceedings of the IEEE conference on computer communications, pp 702–707
  24. Harrington JW, Rosen L, Garnecho A, Patrick PA (2006) Parental perceptions and use of complementary and alternative medicine practices for children with autistic spectrum disorders in private practice. J Dev Behav Pediatr 27(2):S156–S161
  25. Himelboim I, Han JY (2014) Cancer talk on Twitter: community structure and information sources in breast and prostate cancer social networks. J Health Commun 19(2):210–225
  26. Jashinsky J, Burton SH, Hanson CL, West J, Giraud-Carrier C, Barnes MD, Argyle T (2014) Tracking suicide risk factors through Twitter in the US. Crisis 35(1):51–59
  27. Jiang L, Yu M, Zhou M, Liu X, Zhao T (2011) Target-dependent Twitter sentiment classification. In: Proceedings of the annual meeting of the association for computational linguistics, pp 151–160
  28. Kanner L (1946) Irrelevant and metaphorical language in early infantile autism. Am J Psychiatry 103(2):242–246
  29. Levy SE, Mandell DS, Schultz RT (2009) Autism. Lancet 374(9701):1627–1638
  30. Lewis DD, Ringuette M (1994) A comparison of two learning algorithms for text categorization. In: Proceedings of the annual symposium on document analysis and information retrieval, vol 33, pp 81–93
  31. Li J, Cardie C (2013) Early stage influenza detection from Twitter. arXiv preprint: 1309.7340
  32. Miles JH (2011) Autism spectrum disorders—a genetics review. Genet Med 13:278–294
  33. Mitchell L, Frank MR, Harris KD, Dodds PS, Danforth CM (2013) The geography of happiness: connecting Twitter sentiment and expression, demographics, and objective characteristics of place. PLoS One 8(5):e64417
  34. Newton AT, Kramer ADI, McIntosh DN (2009) Autism online: a comparison of word usage in bloggers with and without autism spectrum disorders. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp 463–466
  35. Owoputi O, O’Connor B, Dyer C, Gimpel K, Schneider N, Smith NA (2013) Improved part-of-speech tagging for online conversational text with word clusters. In: Proceedings of the North American chapter of the association for computational linguistics conference on human language technologies, pp 380–390
  36. Paul MJ, Dredze M (2011) You are what you tweet: analyzing Twitter for public health. In: Proceedings of the international conference on weblogs and social media, pp 265–272
  37. Paul MJ, Dredze M (2012) A model for mining public health topics from Twitter. Health 11:16–16
  38. Perkins J (2010) Python text processing with NLTK 2.0 cookbook. Packt Publishing, Birmingham
  39. Prier KW, Smith MS, Giraud-Carrier C, Hanson CL (2011) Identifying health-related topics on Twitter, an exploration of tobacco-related tweets as a test topic. In: Proceedings of the international conference on social computing, behavioral-cultural modeling, and prediction, pp 18–25
  40. Robertson S (2004) Understanding inverse document frequency: on theoretical arguments for IDF. J Doc 60(5):503–520
  41. Robillard JM, Johnson TW, Hennessey C, Beattie BL, Illes J (2013) Aging 2.0: health information about dementia on Twitter. PLoS One 8(7):e69861
  42. Robinson B, Power R, Cameron M (2013) An evidence based earthquake detector using Twitter. In: Proceedings of the workshop on language processing and crisis information, pp 1–9
  43. Richardson LF (1948) Variation of the frequency of fatal quarrels with magnitude. J Am Stat Assoc 43(244):523–546
  44. Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proceedings of the international conference on world wide web, pp 851–860
  45. Scanfeld D, Scanfeld V, Larson EL (2010) Dissemination of health information through social networks: Twitter and antibiotics. Am J Infect Control 38(3):182–188
  46. Trembath D, Balandin S, Rossi C (2005) Cross-cultural practice and autism. J Intellect Dev Disabil 30(4):240–242
  47. Verma S, Vieweg S, Corvey WJ, Palen L, Martin JH, Palmer H, Schram A, Anderson KM (2011) Natural language processing to the rescue? Extracting “situational awareness” tweets during mass emergency. In: Proceedings of the international conference on weblogs and social media, pp 385–392
  48. Wakefield AJ, Murch SH, Anthony A, Linnell J, Casson DM, Malik M, Berelowitz M, Dhillon AP, Thomson MA, Harvey P (1998) RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. Lancet 351(9103):637–641
  49. Warren Z, McPheeters ML, Sathe N, Foss-Feig JH, Glasser A, Veenstra-VanderWeele J (2011) A systematic review of early intensive intervention for autism spectrum disorders. Pediatrics 127(5):e1303–e1311
  50. Yu H-F, Huang F-L, Lin C-J (2011) Dual coordinate descent methods for logistic regression and maximum entropy models. Mach Learn 85(1–2):41–75

Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  • Adham Beykikhoshk (1)
  • Ognjen Arandjelović (1)
  • Dinh Phung (1)
  • Svetha Venkatesh (1)
  • Terry Caelli (1)

  1. Centre for Pattern Recognition and Data Analytics (PRaDA), School of Information Technology, Deakin University, Geelong, Australia
