Abstract
This chapter encourages readers to consider the reason for their analysis in order to chart the correct path for conducting it. It outlines how to plan a text analytics project, beginning by asking the analyst to consider the objective, data availability, cost, and desired outcome. Analysis paths are then presented as possible ways to achieve the goal.
Notes
1. In Microsoft Excel, random numbers can be generated using the function =RANDBETWEEN. The function requires minimum and maximum values as inputs. In the example, the function would be =RANDBETWEEN(1,20), and it would need to be copied to four cells to produce four random numbers between 1 and 20.
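For readers working outside Excel, the same draw can be sketched in Python with the standard library. This is a minimal illustration, not the chapter's own method: the function names `draw_random_numbers` and `draw_distinct_numbers`, and the `seed` parameter, are assumptions introduced here. Like copying RANDBETWEEN to four cells, the first function may return repeated values; the second is a variant that guarantees distinct values, which may be preferable when the numbers index documents to sample.

```python
import random


def draw_random_numbers(count, low, high, seed=None):
    """Draw `count` integers between low and high (inclusive), repeats allowed.

    Comparable to copying =RANDBETWEEN(low, high) into `count` cells.
    """
    rng = random.Random(seed)  # seed is optional; set it to make draws repeatable
    return [rng.randint(low, high) for _ in range(count)]


def draw_distinct_numbers(count, low, high, seed=None):
    """Draw `count` distinct integers between low and high (inclusive)."""
    rng = random.Random(seed)
    return rng.sample(range(low, high + 1), count)


# Four random numbers between 1 and 20, as in the note's example.
with_repeats = draw_random_numbers(4, 1, 20, seed=1)
without_repeats = draw_distinct_numbers(4, 1, 20, seed=1)
print(with_repeats)
print(without_repeats)
```

Omitting the seed gives a fresh draw on each run, which matches how RANDBETWEEN recalculates whenever the worksheet changes.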
References
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc.
Boudah, D. J. (2011). Identifying a research problem and question and searching relevant literature. In Conducting educational research: Guide to completing a major project. Thousand Oaks: SAGE Publications.
Cukier, K. (2010). Data, data everywhere: A special report on managing information. Economist Newspaper.
Feinerer, I., Hornik, K., & Meyer, D. (2008). Text mining infrastructure in R. Journal of Statistical Software, 25(5), 1–54. http://www.jstatsoft.org/v25/i05/.
Feldman, R., & Sanger, J. (2007). The text mining handbook: Advanced approaches in analyzing unstructured data. Cambridge: Cambridge University Press.
Granello, D. H., & Wheaton, J. E. (2004). Online data collection: Strategies for research. Journal of Counseling & Development, 82(4), 387–393.
Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114(2), 211–244.
Kabanoff, B. (1996). Computers can read as well as count: How computer-aided text analysis can benefit organisational research. Trends in organizational behavior, 3, 1–22.
Krippendorff, K. (2004). Reliability in content analysis: Some common misconceptions and recommendations. Human communication research, 30(3), 411–433.
Krippendorff, K. (2012). Content analysis: An introduction to its methodology. Thousand Oaks: Sage.
Krippendorff, K., & Bock, M. A. (2009). The content analysis reader. Thousand Oaks: Sage.
Kroenke, D. M., & Auer, D. J. (2010). Database processing (Vol. 6). Upper Saddle River: Prentice Hall.
Lin, F. R., Hsieh, L. S., & Chuang, F. T. (2009). Discovering genres of online discussion threads via text mining. Computers & Education, 52(2), 481–495.
Marshall, M. N. (1996). Sampling for qualitative research. Family Practice, 13(6), 522–526.
Neuendorf, K. A. (2016). The content analysis guidebook. Sage.
Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211–218.
Rahm, E., & Do, H. H. (2000). Data cleaning: Problems and current approaches. IEEE Data Engineering Bulletin, 23(4), 3–13.
Scheaffer, R. L., Mendenhall, W., III, Ott, R. L., & Gerow, K. G. (2011). Elementary survey sampling. Boston: Cengage Learning.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM computing surveys (CSUR), 34(1), 1–47.
Shapiro, G., & Markoff, J. (1997). A matter of definition. In C. W. Roberts (Ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts. Mahwah, NJ: Lawrence Erlbaum Associates.
Silge, J., & Robinson, D. (2016). tidytext: Text mining and analysis using tidy data principles in R. Journal of Open Source Software, 1(3).
Stepchenkova, S. (2012). Content analysis. In L. Dwyer et al. (Eds.), Handbook of research methods in tourism: Quantitative and qualitative approaches (pp. 443–458). Edward Elgar Publishing.
Stone, P. J. (1997). Thematic text analysis. In C. W. Roberts (Ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts (pp. 35–54). Mahwah, NJ: Lawrence Erlbaum Associates.
Ur-Rahman, N., & Harding, J. A. (2012). Textual data mining for industrial knowledge management and text classification: A business oriented approach. Expert Systems with Applications, 39(5), 4729–4739.
Webb, L. M., & Wang, Y. (2014). Techniques for sampling online text-based data sets. In Big data management, technologies, and applications (pp. 95–114). Hershey: IGI Global.
Wiedemann, G. (2013). Opening up to big data: Computer-assisted analysis of textual data in social sciences. Historical Social Research/Historische Sozialforschung, 38(4), 332–357.
Yang, Y. (1996). Sampling strategies and learning efficiency in text categorization. In M. Hearst & H. Hirsh (Eds.), AAAI spring symposium on machine learning in information access (pp. 88–95). Menlo Park: AAAI Press.
Yu, C. H., Jannasch-Pennell, A., & DiGangi, S. (2011). Compatibility between text mining and qualitative research in the perspectives of grounded theory, content analysis, and reliability. The Qualitative Report, 16(3), 730.
Zanasi, A. (2005). Text mining tools. In Text mining and its applications to intelligence, CRM and knowledge management (pp. 315–327). Southampton, Boston: WIT Press.
Zhai, C., & Massung, S. (2016). Text data management and analysis: A practical introduction to information retrieval and text mining. San Rafael: Morgan & Claypool.
Further Reading
For more thorough coverage of the research problem and question, see Boudah (2011). Database management, processing, and querying are beyond the scope of this book; for more comprehensive coverage of these topics, see Kroenke and Auer (2010). Web scraping is an important way to collect text data, but it is also beyond the scope of this book. For more detailed information and instructions, see Munzert et al. (2014) for web scraping using R or Mitchell (2015) for web scraping using Python.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Anandarajan, M., Hill, C., Nolan, T. (2019). Planning for Text Analytics. In: Practical Text Analytics. Advances in Analytics and Data Science, vol 2. Springer, Cham. https://doi.org/10.1007/978-3-319-95663-3_3
DOI: https://doi.org/10.1007/978-3-319-95663-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95662-6
Online ISBN: 978-3-319-95663-3
eBook Packages: Business and Management (R0)