Dealing with Data Deluge at National Funding Agencies: An Investigation of User Needs for Understanding and Managing Research Investments
This paper provides in-depth, applied and contextualized insights about the particular challenges members of federal government funding agencies face when dealing with data deluge. We present the findings of qualitative research conducted with members of a federal US funding agency. The findings point out specific needs for understanding investment portfolios broadly and tracking the evolution and impact of ideas. They show limitations of existing solutions and their negative effects on labor, time, and personal stress. Based on these findings, we make specific suggestions for the design of automated tools that can help funding agencies understand and manage their portfolios.
Keywords: Understanding users, User research, Knowledge management, Data deluge, Funding agencies, Research investments, Portfolio mining
Government funding agencies play an important role in both knowledge advancement and economic development in the United States and elsewhere. They set the agenda and make investments that stimulate research in areas that are considered of strategic importance for the country. As is the case with any kind of investment, it is important for government funding agencies to be able to understand investments, track their impact, and assess return on investment (ROI). However, given the nature of the output of such investments (knowledge and knowledge products), results are often hard to quantify and assess. As a result, many government funding agencies in the United States struggle with understanding their large investment portfolios. The National Science Foundation charged a subcommittee with investigating this problem. A report released in 2010 pointed out that portfolio management is done mostly manually, which requires tremendous amounts of time and effort that only increase as the quantity of information grows. The report called for the development of automated tools to help funding agencies understand and manage their investment portfolios. This paper contributes to addressing this need by presenting data about the users of such portfolio management tools. The goal of this research is to provide a better understanding of the environment in which decision makers at governmental funding agencies work and of the problems they struggle with on a daily basis. This user research leads to guidelines and requirements that can inform the development of effective, user-centered [4, 12] portfolio management tools.
The importance of being able to understand, manage, and assess knowledge about investment portfolios cannot be overstated. In the United States alone, the investment in research and development is measured in billions (about $140 billion annually for the past five years). Government funding agencies need to be accountable to taxpayers and to demonstrate the impact of these investments. Moreover, understanding the impact of investments is necessary for future decision-making and strategic planning about what areas of research to stimulate through funding.
The intellectual products that result from federal funding present a particular type of challenge for portfolio management. The National Science Foundation alone funds more than 10,000 awards a year, and the sheer quantity and diversity of the resulting products (research papers, tools, patents, learning materials, etc.) make it very difficult to systematically assess and quantify impact. As more and more data is being generated, agencies face the need to manage knowledge generated from this data: insights about the nature of the intellectual properties being produced, the social and community processes of producing them, and their adoption and practical applications. Governmental agencies are not alone in this struggle. The problem of too much data and information is known as data deluge and has been documented widely. Knowledge management can provide a solution to this problem.
2 Related Work
2.1 The Problem of Data Deluge
With computer hardware development following Moore’s law and the rapidly growing adoption and speed of the Internet, a huge influx of data is being generated from various sources. This is a problem not only in scientific research and the business world, where data overflow is not news, but also for federal funding agencies, which face spectacular increases in data volume. For federal funding agencies, the data deluge comes from the growing body of historical and continuously incoming funding documents, including funding solicitations, submitted proposals, awarded project information, published papers, and project evaluations. This data requires aggregation and interpretation to form knowledge that supports agencies’ decision making.
In the knowledge pyramid, knowledge is formed through integration and making sense of data and information [6, 7, 17]. In this view, data refers to raw and objective entities collected directly; information is data processed into meaningful patterns; and knowledge is an aggregation of information that can guide people’s actions in certain work conditions [7, 13, 16, 17]. Tuomi [26] suggested an inverse view of this hierarchy, claiming that knowledge influences the process of collecting and processing data.
Ideally, more data can improve decision making, as data-supported insights rather than intuition can bring more rationality into the decision-making process. However, as the cost of collecting data decreases, many organizations find that the overwhelming quantity of available data makes it difficult to process it into information and knowledge. When data grows at an unprecedented rate and cannot be managed, accessed, analyzed, and integrated into actionable knowledge, the availability of even more data can cause the “file-drawer” issue: large amounts of data remain in virtual “file drawers” and are never seen or used. Opportunities to derive actionable insights from these data are missed. In a world where knowledge has been recognized as a core competence for an organization [3, 14, 15, 21], the difficulty of transforming data into knowledge can be a serious liability. Knowledge is recognized as an important organizational asset that influences an organization’s strategies and decision-making processes. So, how can organizations such as national funding agencies address this problem? In the next section, we review existing knowledge management systems (KMS) and evaluate their fit for large governmental funding agencies.
2.2 Knowledge Management in Organizations
As discussed in the previous section, data deluge in an organization is becoming a knowledge management (KM) problem. To address the problem, researchers have studied knowledge management systems (KMS) intensively.
Knowledge management is defined as “the generation, representation, storage, transfer, transformation, application, embedding and protecting of organizational knowledge” (p. 218). It aims to increase an organization’s capacity for innovation and responsiveness.
Knowledge management nowadays needs to do much more than just provide storage and access to data. It is expected to support exploration and easier distribution, and to help with organizational problem solving, decision-making, and strategic planning. The environment of organizational decision-making has been recognized as complex and ill-structured due to the high volume of data, with many semi-structured problems occurring at the strategic planning stage of decision-making [5, 20, 25]. This situation requires a solution that existing search technologies do not provide. The challenge posed by the data deluge and the failure of knowledge management is not that of finding a particular item; rather, the challenge is to make sense of knowledge developments and to be able to characterize the field of knowledge: its growth, evolution, and impact. Researchers have pointed out the need to make sense of the body of knowledge and to derive actionable insights by investigating the knowledge that is buried in the overflow of data in order to know more about “what is known, how, and by whom” (p. 721). In this context, knowledge management tools should offer exploitation of the known as well as exploration of the unknown, which could help organizational members maintain their existing mental models and build new ones.
Within this framework, researchers have studied the inclusion of knowledge management into decision support systems [5, 27, 29]. For example, KMS was found to cater to high-level executive decision-making, and to promote sharing and amplifying individual knowledge. However, despite all these efforts on behalf of business organizations, federal funding agencies have been left behind: they are for the most part still buried in massively abundant data and lack tools for exploring their own funding portfolios. Funding agencies might require different approaches to knowledge management, and they have called for solutions customized to their own organizations’ complex needs.
One notable example of a project that undertakes knowledge management for funding agencies is Science and Technology for America’s Reinvestment (STAR) Metrics [33]. Development and adoption of the ambitious STAR Metrics project has been difficult and slow, and at the time of data collection for this research project it had yet to meet the needs of decision makers at funding agencies.
System adoption often depends on multiple factors such as perceived value and usability. Designing usable systems that provide meaningful and pleasant user experiences and help users achieve their goals requires not only a theoretical understanding of the problem of data deluge and knowledge management, but also applied, nuanced, and local understandings of the people who struggle with this issue on a daily basis and of their work environment [3, 6]. The design of KMS especially needs insight into the operations and structures of the target organization. According to Gold et al. [10], there are three major infrastructures of an organization that influence the capability and efficiency of creating and utilizing new knowledge: the technical, the structural, and the cultural. Thus, examining the existing technical solutions and the organization’s norms and contexts could help with system design. Furthermore, as pointed out by Yim et al., positions at different hierarchical levels might require different categories and representations of knowledge. As suggested by Faniel and Zimmerman, in-depth investigation of how researchers manage their data can provide useful insights for improving the design of systems to share and reuse data. User research needs to be carried out first in order to understand federal funding agencies’ difficulties in dealing with big data. To the best of our knowledge, there is no similar user study addressing funding agencies’ perspectives. However, we did find one study reflecting researchers’ considerations in building infrastructure to manage large data. Beagrie and Rowlands conducted a qualitative study with online surveys and interviews in 2008 [1]. This study revealed the importance of local data management for researchers as well as concerns regarding secure storage of and access control to research data.
While members of funding agencies may share these needs and concerns, more research is needed to understand their particular situation before designing technical solutions that will be both useful and usable. To address this need, we conducted user research inside a federal US funding agency. The methods and results of this research effort that benefited from unprecedented access to decision makers inside a federal funding agency are presented next.
We conducted in-depth focus group interviews with decision makers and support staff at a US federal funding agency. Data was collected over two months in the fall of 2011. A total of 31 participants, including five members of the higher administration, 19 program officers, and seven support staff (science analysts) from different parts of the organization, participated in the focus groups. The participants were selected through key informant sampling facilitated by a member of the organization who invited selected colleagues to participate in this effort. The selection was based on the participants’ likelihood of being information-rich cases who could share rich insights on the issue at hand. The focus group discussions resulted in 476 minutes of recordings, which were transcribed under a confidentiality contract by a professional transcription service. All research abided by the strict confidentiality guidelines of the funding agency and was approved by the Institutional Review Boards of the authors’ institutions.
Focus group transcripts were subjected to thematic analysis [2]. Four researchers read through the transcripts repeatedly in order to identify recurring codes in the data and organize them into themes. Findings were discussed until agreement was reached upon a set of major themes, presented next.
We organize the results around three major themes that describe how decision makers at this federal agency cope with the problem of data deluge. The first major theme focuses on users’ information needs. The second one discusses the inadequacy of existing systems to meet information needs. The third theme addresses the personal and institutional stress associated with the need to derive actionable insights from large data.
4.1 Theme 1. Information Needs: Beyond Search
We found that the most pressing information needs were not for individual items that could be retrieved through search. They were for qualitative, contextual, or historical knowledge that would describe and characterize the existing state of the institution’s investments and thus provide a foundation for strategic decision-making.
During focus group discussions, agency members mentioned repeatedly the need to visualize and understand the current state of their funding portfolios. They expressed the desire to be able to quickly grasp the big picture of funding portfolios in order to understand the distribution of institutional funds across geographic districts, institution types, research problems, and so on.
And so being able to both get a 40,000-foot view of what a portfolio is, to understand it in broader terms, and then to be able to dig down very deeply… is something that program officers need, but cannot currently do with available tools.
And we don’t have that. We have no way of aggregating data, we don’t even have similar measures across projects, or any way of determining equivalencies so you can begin to aggregate.
This need was even more pressing for organization members who held short-term positions with the institution. As opposed to the permanent employees who had historical knowledge by virtue of their long tenure, short-term employees struggled with understanding the portfolios they had taken over and were in charge of managing for a limited period of time. They mentioned spending a lot of time and effort trying to gain an overall understanding of their portfolios before they felt they could make informed decisions as to future research agendas the institution should encourage. Short-term employees relied on conversations with their colleagues for historical information on funding trends, as this quote from a program officer illustrates:
But I think, too, as a [short-term employee], I mean, I rely very heavily on permanent staff. You know, because from a historical perspective, they have the most knowledge about sort of the trends and what’s been funded and where, you know, where they see things going. So that’s a critical resource for me.
Both short-term and permanent employees provided similar reasons why they thought big-picture portfolio characterization was needed. One of the reasons was the need to understand funding trends and to ensure portfolio diversification by avoiding repeated funding of the same topic or idea. Another reason was the need to be able to assess impact efficiently. The agency officers hoped that a global understanding of funding portfolios could help them ascertain what projects, topics, practices, and ideas were successful and made an impact. In other words, they needed better methods to evaluate the return on funding investments. This impact data is used for institutional evaluation, reporting, and public accountability, but also for informing strategic directions for future growth, as this quote from a program officer shows:
I think it would help us to be able to reflect a little bit on, you know, getting back to what are some of the big changes that [this organization] has helped push and sort of having a better version of those stories and what really was it that started it and maybe to realize that $100,000 and $150,000 proposal was maybe what got something started, but that there were several more steps and several other things that had to line up. And so what are realistic expectations for how transformative one funding decision might be or how does that fit into the context of what else is going on. That would sort of help us think about the impacts of our decisions.
In summary, the greatest information needs of decision makers at this federal agency were actionable knowledge: clear, insightful analyses of existing data that would showcase trends and provide an easy to grasp description of the overall current state of affairs. The institution employs several analysts who are in charge of producing such insights manually, since automated solutions do not exist at the moment. However, because of the inadequacy of existing systems, the analysts end up spending their time inefficiently instead of focusing on higher-level tasks. The inadequacy of existing systems is the second major theme we identified.
4.2 Theme 2. Inadequacy of Existing Systems
Currently, the federal agency we studied houses information in several databases accessible through various user-facing systems. The databases are accessible by decision makers, but because they are cumbersome and time-consuming, program officers usually ask analysts to pull up information and prepare reports for them. The analysts’ usual work process consists of several time-consuming stages that require a lot of manual processing and filtering of results before any analysis can be performed. This is explained by some major inadequacies we identified with existing systems, as follows:
Existing systems support search and filtering of results, but not analytics and visualizations. Those need to be performed manually. Even though manual analysis should address difficult, sophisticated problems that machines cannot solve reliably, a lot of the analysts’ time is spent refining search results. This problem stems from the second inadequacy we identified.
Existing systems cannot cope with growing data, shifting categories, and ambiguous concepts. In most of these systems, information is stored and retrieved according to predefined database categories. As the types of information change, they may not fit existing categories. Also, the limited number of categories includes only major characteristics that database designers identified as relevant when they planned the system. The system is incapable of automatically adapting to growing data and data types and to new categories that may become relevant. Also, the system is not designed to reconcile linguistic differences between similar concepts, thus producing either repetitive or incomplete search results. Interviewees made several comments about the difficulty of capturing shifting categories in taxonomies:
What happens is that you don’t have a specific category because that thing is something that is just arising and you don’t know if it’s going to be anything of importance, so you don’t have that. So then like remote, use of remote instruments, so you don’t bother to put it down because you think this guy is doing it, but then all of the sudden you have 20 of them or 10 of them, so do you go back and comb the old ones to see if any of them did it.
The consequence of these limitations is that a lot of manual filtering is required before analysts can obtain a reliable and complete list of search results on which to run analyses. Besides making this work time-consuming, existing systems display a third inadequacy: they are difficult to use.
Manual filtering of search results is difficult and requires both topic expertise and institutional knowledge. Therefore, analysts who are not topic experts need to spend a lot of time communicating with decision makers in order to gain an understanding of the key terms and concepts they should search for, and to be able to make judgment calls about the relevance of search results. In addition to topic knowledge, analysts also need institutional knowledge. They need to understand the institution’s processes and procedures and the way these are coded and recorded in the various databases. Without such understanding, they run the risk of producing unreliable search results. For example, when calculating a funding rate, they need to know how to address the situation of a principal investigator transfer. In the absence of solid institutional knowledge, proposals that were transferred to a different investigator could be counted twice, thus producing incorrect funding rates. Many such cases need to be filtered manually, and analysts often spend days reviewing search results one by one and deciding whether to include them in the list of valid search results on which they will perform analyses. In the next quote, an analyst explains how long it took to filter thousands of research proposals in order to identify the 250 documents that were needed for an analysis:
For that specific dataset that I pulled up because of the limitations I have in being able to filter out proposals I had to do it broadly. Like I searched by program code, which pulls up a lot of unnecessary proposals that are false hits, so that took me about two and a half weeks of just searching through.
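The principal investigator transfer case mentioned above can be made concrete with a small sketch. It assumes, purely for illustration, that a transfer creates a second database record pointing back at the one it replaced; the `ProposalRecord` type and its `transferred_from` field are hypothetical and do not describe the agency's actual schema. The sketch contrasts a naive funding rate with one that collapses each transfer chain into a single proposal.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ProposalRecord:
    record_id: str
    awarded: bool
    # Hypothetical field: id of the record this one replaced after a PI transfer.
    transferred_from: Optional[str] = None


def naive_funding_rate(records: Iterable[ProposalRecord]) -> float:
    """Counts every database record, so transferred proposals are counted twice."""
    records = list(records)
    if not records:
        return 0.0
    return sum(r.awarded for r in records) / len(records)


def funding_rate(records: Iterable[ProposalRecord]) -> float:
    """Counts each chain of transferred records as one proposal."""
    records = list(records)
    by_id: Dict[str, ProposalRecord] = {r.record_id: r for r in records}

    def root_id(record: ProposalRecord) -> str:
        # Follow transfer links back to the original record; `seen` guards against cycles.
        seen = set()
        while record.transferred_from in by_id and record.record_id not in seen:
            seen.add(record.record_id)
            record = by_id[record.transferred_from]
        return record.record_id

    # A proposal counts as awarded if any record in its transfer chain was awarded.
    awarded_by_root: Dict[str, bool] = {}
    for r in records:
        key = root_id(r)
        awarded_by_root[key] = awarded_by_root.get(key, False) or r.awarded

    if not awarded_by_root:
        return 0.0
    return sum(awarded_by_root.values()) / len(awarded_by_root)
```

With four records of which two are the same award before and after a transfer, the naive calculation reports two awards out of four records, while the deduplicated calculation reports one award out of three proposals. The rule applied here is one possible reading of the institutional knowledge the analysts describe, not a documented agency procedure.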
The extent of manual labor needed to track down impact for a particular topic is illustrated by this analyst’s story:
It’s just getting us a list of 2,000 proposals or awards that could potentially implement best practices and have good outcomes. The next step would be to separate out all these 2,000 proposals I’ve pulled out by program manager and then beg each program manager to look at this 20 that they get to see if they remember these proposals coming in and if they know what has been done subsequently with what they’ve researched.
The difficulties posed by working within the limitations of existing systems create personal and institutional stress, as explained in the third major theme we identified.
4.3 Theme 3. Personal and Institutional Stress
As a result of these difficulties, decision makers state that they often forgo information requests. They understand how time-consuming these requests are for analysts, and that they have to be very judicious with them in order to avoid overloading the analysts and to allow them time for high-priority tasks. Decision makers employed by this federal agency, whether short or long term, are highly capable, well-trained individuals who have demonstrated expertise in their own areas of scholarly research. Yet they depend upon analysts even for relatively simple reports, and this is a cause of stress and dissatisfaction.
Another major cause of stress and dissatisfaction is captured by a phrase research participants used repeatedly during focus group discussions: “firefighting mode.” Decision makers felt that a lot of their time was spent solving problems rapidly, on an emergency basis. The fact that producing reports and analyses is so time-consuming compounds this problem and leaves little to no time for reflection and achieving broad, contextual understanding that participants wished for, as this quote from a program officer illustrates:
We don’t really even have time to anticipate, kind of think about and anticipate the kinds of questions we might be asked or the kinds of things we would like to know because we – there really isn’t that time set aside that we can spend really doing that self-reflection.
5 Discussion and Design Implications
The fundamental insights emerging from our study point to the urgent need for a system that is designed primarily with these constituents in mind. A simple search engine that provides a list of search results is not sufficient to meet these users’ needs, as Theme 1 shows. Our own analyses have shown that there are existing efforts at the national level that attempt to characterize the portfolio of projects that form the research ecosystem of federal agencies. Yet these tools are not used daily by decision makers at funding agencies (as reflected in Theme 2 of our results section). The results presented in the previous section point to several fundamental design implications:
First, regardless of how sophisticated a tool for knowledge management and portfolio analysis may be, the user-facing interface needs to be simple and easy enough to use that decision makers will actually consider its affordances. The design of the tool itself needs to make affordances obvious while shielding end-users from the complexities of data mining and visualization. One of the core problems with many of the current generation of knowledge management tools is that they are designed by experts in data mining with minimal consideration of how complex it may be for users to operate on a daily basis. Therefore they create decision makers’ dependency upon research analysts. As our results show, this is a source of stress and organizational inefficiency.
Theme 3 that was raised in our analyses also supports this requirement. Potential users of knowledge management and analysis systems are constantly in “firefighting mode.” The results show that they are hard-pressed for time and are under constant pressure to deliver. Adding a tool or a set of tools that will require them to spend an enormous amount of time learning the systems and their functionalities will not allow rapid diffusion. Therefore, the tools themselves need to feed into the users’ day-to-day workflow. If the users have to visit yet another site and go through extensive training (or other time consuming activities) before the tools become valuable to them, the portfolio mining tools would have failed in their design.
In many instances, current tools make the assumption that providing a search box is sufficient to allow exploration by the end users. Our study reveals that in most cases potential users of knowledge management systems do not even know where to begin their exploration process. Theme 2 in our analyses also showed that the data mining systems currently used in the funding agency are inefficient in that the organization depends heavily on tacit knowledge (i.e., personal knowledge, which is hard to communicate and spread) rather than explicit knowledge [5, 24]. Collecting and codifying tacit knowledge into explicit knowledge has been recognized as a core step for an organization to lower the cost of creating new knowledge. A good knowledge management system should be able to facilitate this process and leverage tacit knowledge within an organization. Therefore, the second, equally significant design direction that emerges is that the tool not only needs to be simple for users to understand, but also needs to provide suitable explorative vantage points for the end users and to learn over time from patterns of usage.
Third, results from our analyses point out that these users need tools that not only capture current information in usable forms, but also engage in historical time-slice based exploration. In essence, they look for a more epistemic approach to their portfolios rather than a static view of their research ecosystem as it exists currently. A successful tool would have to support dynamic exploration of historical data in order to enable the identification of trends and assessment of impact.
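A time-slice view of this kind can be sketched as a simple aggregation. The record fields used below (`year`, `topic`, `amount`) and the five-year slice width are assumptions made for illustration, not properties of any agency database. The sketch groups award amounts by time slice and topic so that slices can be compared side by side.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def funding_by_time_slice(
    awards: Iterable[Mapping], slice_years: int = 5
) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Sum award amounts per topic within fixed-width time slices.

    Each award is assumed to be a mapping with hypothetical keys
    "year" (int), "topic" (str), and "amount" (float).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for award in awards:
        # Align each year to the start of its slice, e.g. 2003 -> 2000 for 5-year slices.
        start = award["year"] - award["year"] % slice_years
        totals[(start, start + slice_years - 1)][award["topic"]] += award["amount"]
    # Return plain dicts ordered by slice so the result is easy to print or chart.
    return {span: dict(topics) for span, topics in sorted(totals.items())}
```

Comparing the per-topic totals of adjacent slices is one simple way to surface the funding trends that participants described; a production tool would presumably let users change the slice width and the grouping dimension interactively.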
Impact assessment and evaluation of ROI emerged as important topics in focus group discussions and they lead to a fourth design implication: Users at funding agencies need tools that show the connections between knowledge products and can trace the development of a funded proposal into papers, publications, patents, materials, commercialization, as well as their citation and adoption rates. These are some of the ways that the tool could help assess impact and ROI, although the operational definition of these concepts is still in need of more research.
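One way to picture this tracing requirement is as a walk over a graph that links each award to the products derived from it. The sketch below assumes such links already exist as a mapping from an item to the items that build on it; the identifiers and the `links` structure are hypothetical, and assembling those links from real records is the hard part that this sketch does not address.

```python
from collections import deque
from typing import Dict, List, Tuple


def trace_outcomes(links: Dict[str, List[str]], award_id: str) -> List[Tuple[str, int]]:
    """Breadth-first walk from an award to every product reachable from it.

    `links` maps an item id to the ids of items derived from it (hypothetical data).
    Returns (item id, number of steps from the award) in order of discovery.
    """
    seen = {award_id}
    queue = deque([(award_id, 0)])
    reached: List[Tuple[str, int]] = []
    while queue:
        node, depth = queue.popleft()
        for child in links.get(node, []):
            if child not in seen:  # report each product once, at its shortest distance
                seen.add(child)
                reached.append((child, depth + 1))
                queue.append((child, depth + 1))
    return reached
```

The step count gives a rough sense of how directly a product follows from a funding decision, which echoes the program officer's observation that an initial award may only be the first of several steps toward a larger change.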
Implicit in the results is also the need for the stakeholders to have an open discourse with each other about the characteristics of their portfolios, which points to a fifth design implication: A community aspect is needed for these tools. As much as the design allows, the tools need to provide venues for co-construction of vocabulary and knowledge artifacts among the community of users. It is in this context that folksonomies play a critical role. Traditionally, portfolio-mining systems provide or have constructed static taxonomies for end users to consider. This approach works to a certain degree in certain scientific contexts. For example, medical research lends itself better to rigid classification. However, research in other fields can be much more fluid and therefore far more difficult to categorize and classify. Therefore, a tool needs to consider these complexities carefully in its design. One of the best ways to approach this scenario is to allow human expertise to play a significant role in shaping and evolving the vocabulary and terminologies that are part of the toolsets, which points to the sixth design implication we identify: Portfolio mining tools need to be intelligent enough to evolve folksonomies bottom-up from existing information and to continuously learn from users in order to improve categorization of knowledge products. The system developed based on these design requirements is discussed in [34, 35].
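As a rough illustration of what evolving a folksonomy bottom-up could mean in practice, the sketch below counts free-form tags that users attach to proposals, merges simple spelling variants, and surfaces tags that have become frequent but are missing from the existing category list. The normalization rule, the threshold, and the data layout are all assumptions for illustration; this is not the system described in [34, 35].

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def normalize(tag: str) -> str:
    """Lowercase a tag and collapse punctuation and spacing variants."""
    return re.sub(r"[^a-z0-9]+", " ", tag.lower()).strip()


def emerging_categories(
    tagged_items: Dict[str, Iterable[str]],
    known_categories: Iterable[str],
    min_count: int = 3,
) -> List[Tuple[str, int]]:
    """Return user tags applied to at least `min_count` items that are not yet categories.

    `tagged_items` maps a proposal id to the free-form tags users gave it (hypothetical data).
    """
    counts: Counter = Counter()
    for tags in tagged_items.values():
        # Count each normalized tag at most once per item.
        counts.update({normalize(t) for t in tags})
    known = {normalize(c) for c in known_categories}
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count and tag not in known]
```

This mirrors the situation an interviewee described, where a practice such as the use of remote instruments starts with one project and only later turns out to deserve its own category. Reconciling true synonyms, as opposed to spelling variants, would need more than the string normalization used here.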
This paper set out to provide in-depth, applied and contextualized insights about the particular challenges members of federal government funding agencies face when dealing with data deluge. We presented the findings of qualitative research conducted with members of a funding agency. The findings point out specific needs for understanding investment portfolios broadly and tracking the evolution and impact of ideas. They show limitations of existing solutions and their negative effects on labor, time, and personal stress. Based on these findings, we make specific suggestions for the design of automated tools that can help funding agencies understand and manage their portfolios.
Further research is needed to explore creative technical solutions that would address these challenges and to evaluate their viability and potential utility for solving the particular problems of decision makers at funding agencies. Further research can investigate the difficulties scholars who do not make funding decisions encounter as they make sense of growing amounts of knowledge in their respective fields. That information would inform the development of similar knowledge mining tools for researchers.
This research is supported by NSF awards TUES-1123108, TUES-1122609, TUES-1123340, TUES-1122650.
- 1. Beagrie, N., Beagrie, R., Rowlands, I.: Research data preservation and access: the views of researchers. Ariadne (2009). http://www.ariadne.ac.uk/issue60/beagrie-et-al/
- 2. Boyatzis, R.E.: Transforming Qualitative Information: Thematic Analysis and Code Development. Sage Publications, Inc., New York (1998)
- 4. Cooper, A., Reimann, R., Cronin, D.: About Face 3: The Essentials of Interaction Design. Wiley, Hoboken (2007)
- 6. Davenport, T.H., Prusak, L.: Working Knowledge: How Organizations Manage What They Know. Harvard Business Press, Boston (2000)
- 10. Gold, A.H., Malhotra, A., Segars, A.H.: Knowledge management: an organizational capabilities perspective. J. Manage. Inf. Syst. 18(1), 185–214 (2001)
- 11. Hackbarth, G.: The impact of organizational memory on IT systems. In: Proceedings of the Fourth Americas Conference on Information Systems, pp. 588–590 (1998)
- 12. Hartson, R., Pyla, P.: The UX Book: Process and Guidelines for Ensuring a Quality User Experience. Morgan Kaufmann, San Francisco (2012)
- 13. Hemsley, J., Mason, R.M.: Knowledge and knowledge management in the social media age. J. Organ. Comput. Electron. Commer. 23(1–2), 138–167 (2013)
- 19. NSF: NSF FY 2013 Budget Request to Congress (2012)
- 22. Rogers, E.M., Rogers, E.: Diffusion of Innovations, 5th edn. Free Press, New York (2003)
- 26. Tuomi, I.: Data is more than knowledge: implications of the reversed knowledge hierarchy for knowledge management and organizational memory. In: Proceedings of the 32nd Annual Hawaii International Conference on System Sciences, HICSS-32, p. 12 (1999)
- 28. Weinberger, D.: Everything Is Miscellaneous: The Power of the New Digital Disorder. Times Books, New York (2007)
- 30. Zack, M.H.: Managing codified knowledge. Sloan Manage. Rev. 40(4), 45–58 (1999)
- 31. NSF: Discovery in a Research Portfolio: Tools for Structuring, Analyzing, Visualizing and Interacting with Proposal and Award Portfolios (2010)
- 32. NSF NCSES: Proposed Federal R&D Funding for FY 2011 Dips to $143 Billion, with Cuts in National Defense R&D. http://www.nsf.gov/statistics/infbrief/nsf10327/
- 33. STAR METRICS. https://www.starmetrics.nih.gov/
- 34. Liu, Q., Vorvoreanu, M., Madhavan, K.P.C., McKenna, A.F.: Designing discovery experience for big data interaction: a case of web-based knowledge mining and interactive visualization platform. In: Marcus, A. (ed.) DUXU 2013, Part IV. LNCS, vol. 8015, pp. 543–552. Springer, Heidelberg (2013)