Planning for Text Analytics

Practical Text Analytics

Abstract

This chapter encourages readers to consider the reason for their analysis so that they can chart the correct path for conducting it. It outlines how to plan a text analytics project, starting by asking the analyst to consider the objective, data availability, cost, and desired outcome. Possible analysis paths for achieving that goal are then presented.

Notes

  1.

    In Microsoft Excel, random numbers can be generated with the function =RANDBETWEEN, which takes the minimum and maximum values as its two arguments. In the example, the formula would be =RANDBETWEEN(1,20), copied into four cells to produce four random integers between 1 and 20.
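    For readers working in Python rather than Excel, the footnote's procedure can be sketched with the standard-library `random` module. This is a minimal sketch, not the authors' method, and the helper names `randbetween`, `draw_with_replacement`, and `draw_without_replacement` are hypothetical. Like RANDBETWEEN, `random.randint(a, b)` includes both endpoints. Because each copied RANDBETWEEN cell is computed independently, the four numbers can repeat; if the aim is to select four distinct items out of 20 (an assumption about the chapter's example), `random.sample` draws without replacement instead.

```python
import random


def randbetween(bottom: int, top: int) -> int:
    """Rough analogue of Excel's =RANDBETWEEN(bottom, top) (hypothetical helper)."""
    # random.randint includes both endpoints, like RANDBETWEEN.
    return random.randint(bottom, top)


def draw_with_replacement(k: int, bottom: int, top: int) -> list[int]:
    """Like copying =RANDBETWEEN(bottom, top) into k cells: values may repeat."""
    return [randbetween(bottom, top) for _ in range(k)]


def draw_without_replacement(k: int, bottom: int, top: int) -> list[int]:
    """Pick k distinct integers from bottom..top, e.g. k document numbers."""
    # random.sample never returns the same element twice;
    # it raises ValueError if k exceeds the population size.
    return random.sample(range(bottom, top + 1), k)


if __name__ == "__main__":
    random.seed(2019)  # fix the seed so the same draw can be reproduced
    print(draw_with_replacement(4, 1, 20))     # four values in 1..20, repeats possible
    print(draw_without_replacement(4, 1, 20))  # four distinct values in 1..20
```

    Fixing the seed makes the draw reproducible, which a worksheet does not offer by default, since RANDBETWEEN recalculates whenever the sheet does.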

References

  • Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc.

  • Boudah, D. J. (2011). Identifying a research problem and question and searching relevant literature. In Conducting educational research: Guide to completing a major project. Thousand Oaks: SAGE Publications.

  • Cukier, K. (2010). Data, data everywhere: A special report on managing information. Economist Newspaper.

  • Feinerer, I., Hornik, K., & Meyer, D. (2008). Text mining infrastructure in R. Journal of Statistical Software, 25(5), 1–54. http://www.jstatsoft.org/v25/i05/.

  • Feldman, R., & Sanger, J. (2007). The text mining handbook: Advanced approaches in analyzing unstructured data. Cambridge: Cambridge University Press.

  • Granello, D. H., & Wheaton, J. E. (2004). Online data collection: Strategies for research. Journal of Counseling & Development, 82(4), 387–393.

  • Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114(2), 211–244.

  • Kabanoff, B. (1996). Computers can read as well as count: How computer-aided text analysis can benefit organisational research. Trends in Organizational Behavior, 3, 1–22.

  • Krippendorff, K. (2004). Reliability in content analysis: Some common misconceptions and recommendations. Human Communication Research, 30(3), 411–433.

  • Krippendorff, K. (2012). Content analysis: An introduction to its methodology. Thousand Oaks: Sage.

  • Krippendorff, K., & Bock, M. A. (2009). The content analysis reader. Thousand Oaks: Sage.

  • Kroenke, D. M., & Auer, D. J. (2010). Database processing (Vol. 6). Upper Saddle River: Prentice Hall.

  • Lin, F. R., Hsieh, L. S., & Chuang, F. T. (2009). Discovering genres of online discussion threads via text mining. Computers & Education, 52(2), 481–495.

  • Marshall, M. N. (1996). Sampling for qualitative research. Family Practice, 13(6), 522–526.

  • Neuendorf, K. A. (2016). The content analysis guidebook. Sage.

  • Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211–218.

  • Rahm, E., & Do, H. H. (2000). Data cleaning: Problems and current approaches. IEEE Data Engineering Bulletin, 23(4), 3–13.

  • Scheaffer, R. L., Mendenhall, W., III, Ott, R. L., & Gerow, K. G. (2011). Elementary survey sampling. Boston: Cengage Learning.

  • Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1–47.

  • Shapiro, G., & Markoff, J. (1997). A matter of definition. In C. W. Roberts (Ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts. Mahwah, NJ: Lawrence Erlbaum Associates.

  • Silge, J., & Robinson, D. (2016). tidytext: Text mining and analysis using tidy data principles in R. Journal of Open Source Software, 1(3).

  • Stepchenkova, S. (2012). Content analysis. In L. Dwyer et al. (Eds.), Handbook of research methods in tourism: Quantitative and qualitative approaches (pp. 443–458). Edward Elgar Publishing.

  • Stone, P. J. (1997). Thematic text analysis. In C. W. Roberts (Ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts (pp. 35–54). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Ur-Rahman, N., & Harding, J. A. (2012). Textual data mining for industrial knowledge management and text classification: A business oriented approach. Expert Systems with Applications, 39(5), 4729–4739.

  • Webb, L. M., & Wang, Y. (2014). Techniques for sampling online text-based data sets. In Big data management, technologies, and applications (pp. 95–114). Hershey: IGI Global.

  • Wiedemann, G. (2013). Opening up to big data: Computer-assisted analysis of textual data in social sciences. Historical Social Research/Historische Sozialforschung, 38(4), 332–357.

  • Yang, Y. (1996). Sampling strategies and learning efficiency in text categorization. In M. Hearst & H. Hirsh (Eds.), AAAI spring symposium on machine learning in information access (pp. 88–95). Menlo Park: AAAI Press.

  • Yu, C. H., Jannasch-Pennell, A., & DiGangi, S. (2011). Compatibility between text mining and qualitative research in the perspectives of grounded theory, content analysis, and reliability. The Qualitative Report, 16(3), 730.

  • Zanasi, A. (2005). Text mining tools. In Text mining and its applications to intelligence, CRM and knowledge management (pp. 315–327). Southampton, Boston: WIT Press.

  • Zhai, C., & Massung, S. (2016). Text data management and analysis: A practical introduction to information retrieval and text mining. San Rafael: Morgan & Claypool.

Further Reading

  • For more thorough coverage of the research problem and question, see Boudah (2011). Database management, processing, and querying are beyond the scope of this book; for more comprehensive coverage of these topics, see Kroenke and Auer (2010). Web scraping, although important, is also beyond the scope of this book. For detailed information and instructions, see Munzert et al. (2014) for web scraping using R or Mitchell (2015) for web scraping using Python.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Anandarajan, M., Hill, C., Nolan, T. (2019). Planning for Text Analytics. In: Practical Text Analytics. Advances in Analytics and Data Science, vol 2. Springer, Cham. https://doi.org/10.1007/978-3-319-95663-3_3
