Understanding Benefits and Limitations of Unstructured Data Collection for Repurposing Organizational Data

  • Arturo Castellanos
  • Alfred Castillo
  • Roman Lukyanenko
  • Monica Chiarini Tremblay
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 300)


With the growth of machine learning and other computationally intensive techniques for analyzing data, new opportunities are emerging to repurpose organizational information sources. In this study, we explore how effective unstructured data entry formats are in repurposing organizational data to solve new tasks and draw novel business insights. Unstructured data accounts for more than 80% of organizational data. Our research analyzes the implications of using unstructured data entry formats for the propagation of organizational styles. We study this phenomenon in the context of case management in foster care. Using natural language processing and machine learning, we show that unstructured data formats foster the entrenchment and propagation of individual organizational styles and deviations from industry norms. Our findings have important implications for the theory and practice of business analytics, conceptual modeling, organizational theory, and general data management.
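The abstract does not specify the NLP pipeline used. As a hedged illustration of the general idea behind stylometric analysis (one of the listed keywords), the sketch below builds a character-trigram frequency profile for each organization and attributes a new case note to the organization whose profile is most similar by cosine similarity. If held-out notes can be attributed to their source organization well above chance, that suggests each organization writes in a recognizable style. This is a generic nearest-centroid technique chosen for illustration, not the authors' method; the function names, the trigram size, and the agency names and notes in the usage example are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: generic stylometric attribution via character n-gram
# profiles and nearest-centroid (cosine) matching. Not the paper's pipeline.


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased, whitespace-normalized text."""
    t = " ".join(text.lower().split())
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def normalize(counts: Counter) -> dict:
    """Turn raw n-gram counts into relative frequencies."""
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(notes_by_org: dict, n: int = 3) -> dict:
    """Pool each organization's notes into a single style profile (its centroid)."""
    profiles = {}
    for org, notes in notes_by_org.items():
        pooled = Counter()
        for note in notes:
            pooled.update(char_ngrams(note, n))
        profiles[org] = normalize(pooled)
    return profiles


def attribute(note: str, profiles: dict, n: int = 3) -> str:
    """Return the organization whose style profile is most similar to the note."""
    vec = normalize(char_ngrams(note, n))
    return max(profiles, key=lambda org: cosine(vec, profiles[org]))


# Usage with made-up case notes from two hypothetical agencies.
notes = {
    "agency_a": [
        "Child seen at home visit. Placement stable. No concerns noted.",
        "Home visit completed. Placement stable; youth attending school.",
    ],
    "agency_b": [
        "Per CM, foster parent reports child doing well w/ meds.",
        "Per CM, FP states bio mom missed visitation again.",
    ],
}
profiles = build_profiles(notes)
print(attribute("Per CM, FP reports child doing well.", profiles))
```

Character n-grams are a common stylometric feature because they capture abbreviation and punctuation habits (such as "Per CM" or "w/") without requiring a tokenizer. A real study would measure attribution accuracy on many held-out notes rather than a single example.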


Keywords: Systems analysis and design; Text mining; Stylometry; Unstructured data; Institutional theory; Case management

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Arturo Castellanos (1)
  • Alfred Castillo (2)
  • Roman Lukyanenko (3)
  • Monica Chiarini Tremblay (4)

  1. Baruch College (CUNY), New York City, USA
  2. Cal Poly, San Luis Obispo, USA
  3. University of Saskatchewan, Saskatoon, Canada
  4. College of William and Mary, Williamsburg, USA