Quality & Quantity, Volume 48, Issue 4, pp 2277–2294

Extending monitoring methods to textual data: a research agenda

  • Triss Ashton
  • Nicholas Evangelopoulos
  • Victor Prybutok

Abstract

Textual data has become increasingly common in business analytic data sets. While concept-based text mining offers a method of extracting meaningful information from text data, methods for monitoring customer perceptions of business processes and products, as discussed in customer-generated documents, are not readily available. We explore the results of two text-mining algorithms and review issues observed in the data that affect uploading the results onto a newly proposed monitoring platform analogous to statistical process control charts. Finally, we discuss several topics for future research in text mining.

Keywords

Latent semantic analysis, Latent Dirichlet allocation, Process monitoring, Control charts

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Triss Ashton (1)
  • Nicholas Evangelopoulos (2)
  • Victor Prybutok (2)

  1. College of Business, The University of Texas-Pan American, Edinburg, USA
  2. College of Business, University of North Texas, Denton, USA