Planning for Text Analytics

  • Murugan Anandarajan
  • Chelsey Hill
  • Thomas Nolan
Chapter
Part of the Advances in Analytics and Data Science book series (AADS, volume 2)

Abstract

This chapter encourages readers to consider the reason for their analysis in order to chart the correct path for conducting it. It outlines how to plan the text analytics process, starting by asking the analyst to consider the objective, data availability, cost, and desired outcome. Analysis paths are then presented as possible ways to achieve the goal.

Keywords

Text analytics, Text mining, Planning, Sampling

References

  1. Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the natural language toolkit. O’Reilly Media, Inc.
  2. Boudah, D. J. (2011). Identifying a research problem and question and searching relevant literature. In Conducting educational research: Guide to completing a major project. Thousand Oaks: SAGE Publications.
  3. Cukier, K. (2010). Data, data everywhere: A special report on managing information. Economist Newspaper.
  4. Feinerer, I., Hornik, K., & Meyer, D. (2008). Text mining infrastructure in R. Journal of Statistical Software, 25(5), 1–54. http://www.jstatsoft.org/v25/i05/.
  5. Feldman, R., & Sanger, J. (2007). The text mining handbook: Advanced approaches in analyzing unstructured data. Cambridge: Cambridge University Press.
  6. Granello, D. H., & Wheaton, J. E. (2004). Online data collection: Strategies for research. Journal of Counseling & Development, 82(4), 387–393.
  7. Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114(2), 211–244.
  8. Kabanoff, B. (1996). Computers can read as well as count: How computer-aided text analysis can benefit organisational research. Trends in Organizational Behavior, 3, 1–22.
  9. Krippendorff, K. (2004). Reliability in content analysis: Some common misconceptions and recommendations. Human Communication Research, 30(3), 411–433.
  10. Krippendorff, K. (2012). Content analysis: An introduction to its methodology. Thousand Oaks: Sage.
  11. Krippendorff, K., & Bock, M. A. (2009). The content analysis reader. Thousand Oaks: Sage.
  12. Kroenke, D. M., & Auer, D. J. (2010). Database processing (Vol. 6). Upper Saddle River: Prentice Hall.
  13. Lin, F. R., Hsieh, L. S., & Chuang, F. T. (2009). Discovering genres of online discussion threads via text mining. Computers & Education, 52(2), 481–495.
  14. Marshall, M. N. (1996). Sampling for qualitative research. Family Practice, 13(6), 522–526.
  15. Neuendorf, K. A. (2016). The content analysis guidebook. Sage.
  16. Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211–218.
  17. Rahm, E., & Do, H. H. (2000). Data cleaning: Problems and current approaches. IEEE Data Engineering Bulletin, 23(4), 3–13.
  18. Scheaffer, R. L., Mendenhall, W., III, Ott, R. L., & Gerow, K. G. (2011). Elementary survey sampling. Boston: Cengage Learning.
  19. Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1–47.
  20. Shapiro, G., & Markoff, J. (1997). A matter of definition. In C. W. Roberts (Ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts. Mahwah, NJ: Lawrence Erlbaum Associates.
  21. Silge, J., & Robinson, D. (2016). tidytext: Text mining and analysis using tidy data principles in R. Journal of Open Source Software, 1(3).
  22. Stepchenkova, S. (2012). Content analysis. In L. Dwyer et al. (Eds.), Handbook of research methods in tourism: Quantitative and qualitative approaches (pp. 443–458). Edward Elgar Publishing.
  23. Stone, P. J. (1997). Thematic text analysis. In C. W. Roberts (Ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts (pp. 35–54). Mahwah, NJ: Lawrence Erlbaum Associates.
  24. Ur-Rahman, N., & Harding, J. A. (2012). Textual data mining for industrial knowledge management and text classification: A business oriented approach. Expert Systems with Applications, 39(5), 4729–4739.
  25. Webb, L. M., & Wang, Y. (2014). Techniques for sampling online text-based data sets. In Big data management, technologies, and applications (pp. 95–114). Hershey: IGI Global.
  26. Wiedemann, G. (2013). Opening up to big data: Computer-assisted analysis of textual data in social sciences. Historical Social Research/Historische Sozialforschung, 38(4), 332–357.
  27. Yang, Y. (1996). Sampling strategies and learning efficiency in text categorization. In M. Hearst & H. Hirsh (Eds.), AAAI spring symposium on machine learning in information access (pp. 88–95). Menlo Park: AAAI Press.
  28. Yu, C. H., Jannasch-Pennell, A., & DiGangi, S. (2011). Compatibility between text mining and qualitative research in the perspectives of grounded theory, content analysis, and reliability. The Qualitative Report, 16(3), 730.
  29. Zanasi, A. (2005). Text mining tools. In Text mining and its applications to intelligence, CRM and knowledge management (pp. 315–327). Southampton, Boston: WIT Press.
  30. Zhai, C., & Massung, S. (2016). Text data management and analysis: A practical introduction to information retrieval and text mining. San Rafael: Morgan & Claypool.

Further Reading

  1. For more thorough coverage of the research problem and question, see Boudah (2011). Database management, processing, and querying are beyond the scope of this book. For more comprehensive coverage of these topics, see Kroenke and Auer (2010). Web scraping is very important, but also beyond the scope of this book. For more detailed information and instructions, see Munzert et al. (2014) for web scraping using R or Mitchell (2015) for web scraping using Python.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Murugan Anandarajan (1)
  • Chelsey Hill (2)
  • Thomas Nolan (3)

  1. LeBow College of Business, Drexel University, Philadelphia, USA
  2. Feliciano School of Business, Montclair State University, Montclair, USA
  3. Mercury Data Science, Houston, USA
