On developing indicators with text analytics: exploring concept vectors applied to English and Chinese texts

  • Original Article
  • Information Systems and e-Business Management

Abstract

This paper investigates how high-quality, vocabulary-based classifiers, useful for competitive intelligence, can be found for relatively small corpora of publicly available documents. Two corpora of recent annual reports are examined and compared, one in English and one in Chinese. The paper tests whether vocabularies can predict whether firms are relatively innovative or not, examining vocabularies of both content words and function words. We find that indeed the tested vocabularies do produce effective indicators or classifiers and, surprisingly, that function words are especially effective. The paper also provides extensive conceptual and theoretical background to frame the investigation in the context of an EMCUT problematic, that of mapping entities to classification schemes using information derived from text.
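The idea of a vocabulary-based classifier can be illustrated with a minimal sketch. It is not the authors' procedure: the word list, the toy training sentences, the labels, and every function name below are hypothetical, and nearest-centroid classification is swapped in here only as one simple way to turn vocabulary frequencies into an indicator.

```python
import re
from collections import Counter

# Hypothetical function-word vocabulary; NOT taken from the paper.
FUNCTION_WORDS = ["we", "will", "but", "if", "not"]

def concept_vector(text, vocabulary):
    """Relative frequency of each vocabulary word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in vocabulary]

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(text, centroids, vocabulary):
    """Return the label whose centroid is nearest to the text's vector."""
    vector = concept_vector(text, vocabulary)
    return min(centroids, key=lambda label: squared_distance(vector, centroids[label]))

# Toy, made-up training texts standing in for labelled annual reports.
training = {
    "innovative": ["we will explore if we can", "we will not wait"],
    "other": ["the report lists assets", "costs were stable but not low"],
}
centroids = {
    label: centroid([concept_vector(t, FUNCTION_WORDS) for t in texts])
    for label, texts in training.items()
}

print(classify("we will see if we will", centroids, FUNCTION_WORDS))
```

The regular-expression tokenizer only handles Latin-alphabet text; a Chinese corpus would first need a word-segmentation step, which this sketch leaves out.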

Notes

  1. We are aware of the distinction that is made in parts of the relevant literature between assignment and matching, where matching is assignment when the entities on all sides are players in a game. Such, for example, is the case in the two-sided matching problem (Gale and Shapley 1962). We note, however, that although we might substitute assignment for matching in the name of our problem, the resulting acronym is less felicitous, there is little risk of confusion in keeping the present name, and indeed there will often be an element of strategic interaction in the composition of the relevant texts.

References

  • Andrew JP, Manget J, Michael D, Taylor A, Zablit H (2010) Innovation 2010: a return to prominence—and the emergence of a new world order. Boston Consulting Group, Boston, MA

  • Bird S, Klein E, Loper E (2009) Natural language processing with python. O’Reilly, Sebastopol, CA

  • Blair DC, Kimbrough SO (2002) Exemplary documents: a foundation for information retrieval design. Inf Process Manage 38(3):363–379

  • Blair DC, Maron ME (1985) An evaluation of retrieval effectiveness for a full-text document-retrieval system. Commun ACM 28(3):289–299

  • Bowman EH (1973) Corporate social responsibility and the investor. J Contemp Bus 2:21–43

  • Bowman EH (1984) Content analysis of annual reports for corporate strategy and risk. Interfaces 14(1):61–71

  • Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. CRC Press, Boca Raton, FL

  • Camiciottoli BC (2010) Discourse connectives in genres of financial disclosure: earnings presentations versus earnings releases. J Pragmat 42(3):650–663

  • Chang LC, Hsu CH, Chang YY (2012) The construction of Taiwan’s financial early warning system: the text-mining technique-based analysis. Taiwan Bank Q 63(1):182–217. In Chinese. http://www.bot.com.tw/Publications/Quarterly/Documents/63_1/63_1_7.pdf

  • Chen GT, Kimbrough S, Lee T (2004) A note on automated support for product application discovery. In: Dutta A, Goes P (eds) Proceedings of the fourteenth annual workshop on information technologies and systems (WITS2004), Washington, DC, pp 128–133

  • Chen R, Sharman R, Rao H, Upadhyaya S (2007) Design principles for critical incident response systems. Inf Syst E-Bus Manage 5:201–227. doi:10.1007/s10257-007-0046-0

  • Chou CH, Sinha AP, Zhao H (2008) A text mining approach to Internet abuse detection. Inf Syst e-Bus Manage 6(4):419–439

  • D’Aveni RA, MacMillan IC (1990) Crisis and content of managerial communications: a study of the focus of attention of top managers in surviving and failing firms. Adm Sci Q 35:634–657

  • den Hertog P, van der Aa W, de Jong MW (2010) Capabilities for managing service innovation: towards a conceptual framework. J Serv Manage 21(4):490–514

  • Forsman H, Temel S (2011) Innovation and business performance in small enterprises: an enterprise-level analysis. Int J Innov Manage 15(3):641–665

  • Gale D, Shapley LS (1962) College admissions and the stability of marriage. Am Math Mon 69(1):9–15

  • Gebauer J, Tang Y, Baimai C (2008) User requirements of mobile technology: results from a content analysis of user reviews. Inf Syst E-Bus Manage 6(4):361–384

  • Gottschalk LA (1995) Content analysis of verbal behavior: new findings and clinical applications. Lawrence Erlbaum Associates, Hillsdale, NJ

  • Gottschalk LA, Gleser GC (1969) The measurement of psychological states through the content analysis of verbal behavior. University of California Press, Berkeley, CA

  • Gottschalk LA, Winget CN, Gleser GC (1969) Manual of instructions for using the Gottschalk-Gleser content analysis scales: anxiety, hostility, and social alienation—personal disorganization. University of California Press, Berkeley, CA

  • He ZL, Wong PK (2004) Exploration versus exploitation: an empirical test of the ambidexterity hypothesis. Organ Sci 15(4):481–494

  • Kabanoff B, Keegan J (2007) Studying strategic cognition by content analysis of annual reports: a validation involving firm innovation. In: Chapman R (ed) Proceedings of managing our intellectual and social capital: 21st ANZAM 2007 Conference, Sydney, Australia, pp 1–14

  • Kimbrough MR, Kimbrough SO, Murphy P (2011) On using text analytics for event studies. In: Proceedings of the 2011 international conference on artificial intelligence and law (ICAIL 2011)

  • Kimbrough SO, Lee TY, Oktem U (2012) On deriving indicators from texts. In: Dolk D, Granat J (eds) Modeling for decision support in network-based services, Lecture Notes in Business Information Processing, vol 42. Springer, Berlin, pp 196–225

  • Kimbrough SO, MacMillan I, Ranieri J (2007) Process and system for matching products and markets. United States Patent 7,257,568 http://www.uspto.gov

  • Kimbrough SO, MacMillan I, Ranieri J, Thompson JD (2011) Categorized document bases. United States Patent 7,917,519 http://www.uspto.gov

  • Krippendorff K (2004) Content analysis: an introduction to its methodology, 2nd edn. Sage Publications, Thousand Oaks, CA

  • Li H, Cai Z, Graesser AC, Duan Y (2012) A comparative study on English and Chinese word uses with LIWC. In: Proceedings of the twenty-fifth international Florida artificial intelligence research society conference, Association for the Advancement of Artificial Intelligence, pp 238–243

  • Loewenstein J, Ocasio W, Jones C (2012) Vocabularies and vocabulary structure: a new approach linking categories, practices, and institutions. The Academy of Management Annals. Available online 13 March 2012. doi:10.1080/19416520.2012.660763

  • Lukas BA, Ferrell O (2000) The effect of market orientation on product innovation. J Acad Mark Sci 28(2):239–247

  • Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge, UK

  • March JG (1991) Exploration and exploitation in organizational learning. Organ Sci 2:71–87 http://proxy.library.upenn.edu:2054/login.aspx?direct=true&db=epref&AN=OS.B.GA.MARCH.EEOL&site=ehost-live

  • Mitchell T (1997) Machine learning. McGraw-Hill, New York, NY

  • Morris R (1994) Computerized content analysis in management research: a demonstration of advantages and limitations. J Manage 20(4):903–931

  • Muller E, Zenker A (2001) Business services as actors of knowledge transformation: the role of KIBS in regional and national innovation systems. Res Policy 30(9):1501–1516

  • Neuendorf KA (2002) The content analysis guidebook. Sage Publications, Thousand Oaks, CA

  • Newman ML, Pennebaker JW, Berry DS, Richards JM (2003) Lying words: predicting deception from linguistic styles. Pers Soc Psychol Bull 29(5):665–675

  • OECD-EUROSTAT (1997) Proposed guidelines for collecting and interpreting technological innovation data. Oslo Manual, 2nd edn. OECD-EUROSTAT, Paris

  • Oliveira MD, Murphy P (2009) The leader as the face of a crisis: Philip Morris’ CEO’s speeches during the 1990s. J Public Relat Res 21(4):361–380

  • Pennebaker JW (2011) The secret life of pronouns: what our words say about us. Bloomsbury Press, New York, NY

  • Prester J, Bozac MG (2012) Are innovative organizational concepts enough for fostering innovation? Int J Innov Manage 16(1):1250005

  • Raisch S, Birkinshaw J (2008) Organizational ambidexterity: antecedents, outcomes, and moderators. J Manage 34(3):375–409

  • Shadish WR, Cook TD, Campbell DT (2001) Experimental and quasi-experimental designs for generalized causal inference, 2nd edn. Wadsworth Publishing, New York, NY

  • Tausczik YR, Pennebaker JW (2010) The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol 29(1):24–54

  • Turney PD, Pantel P (2010) From frequency to meaning: vector space models of semantics. J Artif Intell Res 37:141–188

  • Uotila J, Maula M, Keil T, Zahra SA (2009) Exploration, exploitation, and financial performance: analysis of S&P 500 corporations. Strateg Manage J 30(2):221–231

  • Vagnani G (2012) Exploration and long-run organizational performance: the moderating role of technological interdependence. J Manage (forthcoming). Published online 6 December 2012 at http://jom.sagepub.com/content/early/2012/12/04/0149206312466146.

  • Walter F, Battiston S, Yildirim M, Schweitzer F (2012) Moving recommender systems from on-line commerce to retail stores. Inf Syst E-Bus Manage 10:367–393. doi:10.1007/s10257-011-0170-8

  • Wang HY, Liao C, Kao CH (2012) A credit assessment mechanism for wireless telecommunication debt collection: an empirical study. Inf Syst E-Bus Manage 1–19. doi:10.1007/s10257-012-0192-x

  • Weber RP (1990) Basic content analysis, 2nd edn. Sage Publications, Newbury Park, CA

  • Wei CP, Chen YM, Yang CS, Yang C (2010) Understanding what concerns consumers: a semantic approach to product feature extraction from consumer reviews. Inf Syst E-Bus Manage 8:149–167. doi:10.1007/s10257-009-0113-9

  • Wei CP, Lin YT, Yang CC (2011) Cross-lingual text categorization: conquering language boundaries in globalized environments. Inf Process Manage 47(5):786–804

  • Yang HC, Hsiao HW, Lee CH (2011) Multilingual document mining and navigation using self-organizing maps. Inf Process Manage 47(5):647–666

  • Yen CC, Chi DJ, Lin SJ (2008) A study for detecting enterprise financial statement fraud. Asian J Manag Humanit Sci 3(1–4):15–30. In Chinese. http://www.asia.edu.tw/ajmhs/vol%203/02.pdf

  • Yen CC, Lo LK, Chi DJ, Huang YJ (2009) The integrated methodology of classification and regression trees and random forest for information disclosure prediction: consideration of corporate governance indicator. In: Sixth conferences on operations research society of Taiwan. In Chinese. http://edoc.ypu.edu.tw:8080/paper/antai/2009%E5%B9%B4--%E7%AC%AC%E5%85%AD%E5%B1%86%E5%8F%B0%E7%81%A3%E4%BD%9C%E6%A5%AD%E7%A0%94%E7%A9%B6%E5%AD%B8%E6%9C%83%E7%90%86%E8%AB%96%E8%88%87%E5%AF%A6%E5%8B%99%E5%AD%B8%E8%A1%93%E7%A0%94%E8%A8%8E%E6%9C%83/(42)%E6%95%B4%E5%90%88%E5%88%86%E9%A1%9E%E8%BF%B4%E6%AD%B8%E6%A8%B9%E8%88%87%E9%9A%A8%E6%A9%9F%E6%A3%AE%E6%9E%97%E6%96%BC%E8%B3%87%E8%A8%8A%E6%8F%AD%E9%9C%B2%E9%A0%90%E6%B8%AC%E4%B9%8B%E7%A0%94%E7%A9%B6.pdf

Acknowledgments

We thank several people at KSRI (the Karlsruhe Service Research Institute), and the general environment there, for discussions that helped to clarify the EMCUT concept and its use in applications pertaining to framework validation and to assessments of well-being. In particular, we thank Niels Feldman, Margeret Hall, and Marc Kohler. We also thank the anonymous referees and the handling editor for constructive comments that improved the clarity of the paper, and we gratefully acknowledge financial support for this research from the National Science Council of Taiwan (award NSC 101-2410-H-259-079-).

Author information

Corresponding author

Correspondence to Steven O. Kimbrough.

About this article

Cite this article

Kimbrough, S.O., Chou, C., Chen, YT. et al. On developing indicators with text analytics: exploring concept vectors applied to English and Chinese texts. Inf Syst E-Bus Manage 12, 385–415 (2014). https://doi.org/10.1007/s10257-013-0228-x

  • DOI: https://doi.org/10.1007/s10257-013-0228-x
