Introduction

Countless scientific discoveries are anticipated from data-intensive research and from the application of computational methods and visualization tools to vast and growing data stores. These opportunities allow scientists to generate new questions, expose unseen or novel patterns, and answer questions that challenge our global societies (Newman et al. 2003; Kelling et al. 2009). Data-intensive research depends on the acquisition, organization, and long-term management of these data collections. Such research also depends on the development and implementation of tools and systems for data integration, retrieval, and analysis. The emergent fields of informatics and data curation aim to meet these challenges, and each will require a skilled, professional workforce to meet the needs of the 21st century scientific enterprise.

Informatics is “the science of information”; its focus of study is “the representation, processing, and communication of information in natural and artificial systems” (Fourman 2002). An emerging application area for informatics research lies within the Earth and space sciences, where it is sometimes referred to as geoinformatics. The American Geophysical Union (AGU), a scientific society of 50,000 Earth and space science researchers, has developed a focus group “concerned with issues of data management and analysis, large-scale computational experimentation and modeling, and hardware and software infrastructure needs, which ultimately provide the capability to change data systems into knowledge systems that support the range of Earth and space science interests.” (Footnote 1)

In August of 2009 a group of graduate students and established researchers met for a three-day geoinformatics workshop. The workshop had two goals. The first was to further identify and extend the geoinformatics community. The expertise needed for such a community to flourish ranges from the Earth and space sciences to computer science, library science, and information systems. To this end, the workshop had a strong focus on bringing together as diverse a group of attendees as possible. Of particular interest for participation were graduate students who would serve as the next generation of geoinformatists. The second goal of the workshop was to identify the infrastructural needs and common problems facing the geoinformatics community. We feel that the workshop was successful in achieving both of these goals, and the identified needs and problems are briefly outlined in subsequent sections. This special issue highlights selected presentations from the workshop, and the contributions of these papers are summarized in the following sections.

Informatics as design science

Software as an instrument

In the Earth and space sciences, instruments are built to test and validate scientific theories. This instrumentation is deployed on spacecraft and at various locations around the world in order to collect data relevant to a particular hypothesis. Weigel (2009) argues that software is a scientific product analogous to an instrument. In geoinformatics, and in informatics more generally, software allows us to test not only how something might work, but also why it works. Just as an Earth scientist needs to understand an instrument in order to make sense of its data, a geoinformatist needs to understand the software that was deployed in a given environment. Weigel concludes that prevailing viewpoints need to change, that software needs to be considered research, and that methods need to be put in place for software to live on long after a project ends.

Such thinking echoes the Design Science research of Hevner et al. (2004) and March and Smith (1995), and such an approach to geoinformatics should be encouraged. The geoinformatics field is not only technical, but also socio-technical. One must understand IT systems and also the environment in which they operate. Building, evaluating, and justifying, concepts familiar to Earth and space scientists, need to be brought to the forefront in software development and deployment.

Software reuse

Fundamental to the survival of geoinformatics is the reuse of established knowledge and practices. A key theme repeated throughout the workshop was software reuse. This theme involves the necessity of software repositories, documentation, and an understanding of how and why software works. To this end, Marshall et al. (2009) advocate software readiness levels and demonstrate this approach in an Earth science software reuse portal.

Data infrastructure, trust, and organizational implications

Downs and Chen (2009) argue for better integration between data providers and the existing infrastructure. One solution they recommend is that data providers and archives work together early and often. This echoes the general agreement across the information professions that new kinds of collaborations must happen at the local and institutional levels (Green and Gutmann 2007), as well as at the national level (Choudhury et al. 2009). National-level collaboration is now being undertaken through the National Science Foundation’s DataNet initiative. (Footnote 2)

Emerging education initiatives

Another significant theme evident throughout the workshop was education, and the need for new programs to train the informatics and data curation workforce. Branch et al. (2009) presented research showing that gaps in current undergraduate education, particularly in areas related to geospatial technologies, are harming the efficiency and flow of the scientific enterprise, because the missing skills must then be acquired on the job. Borne’s (2009) discussion of “astroinformatics” presented the links between research and education, as well as a new program on Data Science. Importantly, it was noted that faculty are using a great deal more “real” data in their teaching. Librarians engaged in current data curation research projects have also identified this change.

The need for specialized programs to address current and emergent problems in informatics is congruent with the need for similar educational programs in data curation. Data curation is the active and ongoing management of data through its lifecycle of interest and usefulness to scholarship, science, and education. Many organizations have begun to address these needs, and graduate programs are now being established. One such program is the data curation specialization in the Master of Science degree program at the Graduate School of Library and Information Science (GSLIS) at the University of Illinois. (Footnote 3) The Data Curation Education Program is a specialized curriculum that focuses on data collection and management, knowledge representation, digital preservation and archiving, data standards, and policy. Data curation includes not only data archiving and digital preservation, but also active management and appraisal of data over the lifecycle of scientific interest. Students in the program are expected to enter the workforce ready to take responsibility for assimilation and management of data in ways that add value and promote sharing across institutions and disciplinary specializations.

Summary and conclusions

The workshop brought together people with a wide range of experiences and research efforts, and out of this diversity emerged several fundamental issues and overlapping objectives. The geoinformatics community is still nascent, yet we hope that through the workshop we have begun to identify the key infrastructural challenges and research needs, as well as some of the people who will address them. In this special issue we report on the state of geoinformatics infrastructure from a set of diverse viewpoints, and we believe that the collaborations, discussions, and research that resulted from the workshop will pave the way for addressing future challenges. We encourage readers to view all presentations from the workshop at: http://essi.gsfc.nasa.gov/presentations.html