Complexity of Textual Data in Entrepreneurship and Innovation Research

  • Beth-Anne Schuelke-Leech
  • Betsy L. Barry
Part of the FGF Studies in Small Business and Entrepreneurship book series (FGFS)


Innovation and entrepreneurship are complex activities. They are also primarily language- and relationship-based: it is largely through verbal communication (speech and text) that ideas are developed and business is transacted. New methods are changing how we understand and investigate innovation and entrepreneurship. Big Data analytics allow researchers to uncover relationships and meaning in text documents, using a mix of quantitative and qualitative methods. This chapter shows that the complexity of text-based research on innovation and entrepreneurship comes from three sources. The first is technical complexity. The second is the complexity of language itself. The third is the complexity of the concept itself. Each of these is discussed in detail. Complexity can be addressed either by simplifying the data or by finding a mechanism for dealing with the complexity. A method of text data analytics using corpus and computational linguistics deals with the complexity without eliminating data, allowing for a more nuanced investigation of innovation and entrepreneurship. The methodology is demonstrated by investigating how technological innovation and entrepreneurship are discussed in the United States Congress, using a corpus from 1981 to 2014.
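
As a rough illustration of what a corpus-linguistic look at a term can involve, the sketch below builds a keyword-in-context concordance and counts the words that occur near a keyword. It is a minimal Python sketch under stated assumptions, not the chapter's actual method: the function names (tokenize, concordance, collocates), the window size, and the two-sentence sample text are all hypothetical and are not drawn from the Congressional corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def concordance(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


def collocates(tokens, keyword, window=3):
    """Count the words that appear within `window` tokens of the keyword.

    Sentence boundaries are ignored here, so a window can span two sentences.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


# Hypothetical sample text (an assumption for illustration only).
sample = (
    "Small business innovation drives job growth. "
    "Federal research funding supports innovation in small business."
)
tokens = tokenize(sample)

for left, word, right in concordance(tokens, "innovation", window=2):
    print(f"{left:>20} [{word}] {right}")

print(collocates(tokens, "innovation", window=2).most_common(3))
```

Nothing is filtered out before counting, which loosely mirrors the abstract's point about handling complexity without eliminating data; a real study would work with far larger texts and more careful tokenization.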


Big data, Linguistics, Text analytics, Unstructured data



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. John Glenn College of Public Affairs, The Ohio State University, Columbus, USA
  2. BDataSmart, Columbus, USA
