Introduction

Seven years ago the U.S. National Academies convened a ‘Symposium on Electronic Scientific, Technical, and Medical Journal Publishing and Its Implications’ (National Research Council 2004). Most of the discussions and conclusions from that meeting revolved around changing business models in academic publishing and existing structures for peer review and quality control. However, some remarks touched on the changing face of research itself and how it will affect and be affected by new technological advances and new means of disseminating research methods, data, and findings. Since then many technologies have evolved, particularly those related to cyberinfrastructures, and what was then only a vision of ubiquitous and comprehensive digital research environments is now with us. It is clear that future scholarly communication will be electronic, offering possibilities to blur the boundaries between research activities and research publishing in highly integrated, interactive electronic environments, and will rest on a radically different economic business model (Brown et al. 2007).

While society at large has made fundamental changes in its operations, such as online business, location-based services, and eGovernment, much of the potential of the above-mentioned technologies remains untapped in day-to-day research activities. In the geospatial domain we see technical solutions emerging for some of the main components needed to realize the geospatial web and cyberinfrastructure (Scharl 2007; Harvey and Raskin 2011), including open and distributed development of analytical tools (Yang et al. 2008) and access to public, private, and volunteered geographic data disseminated through web portals (Goodchild 2007; Goodchild and Glennon 2010). There are also remaining issues where early frameworks need to mature further into implementation, for example the need for broader participatory approaches to spatial analysis and decision making (Kingston 2007; Rinner et al. 2008), support for evolving and distributed ontologies (Mika 2007), feedback mechanisms from data user to producer (van Oort et al. 2010), and the emergence of spatial semantic web communities (Tummarello and Morbidoni 2008).

The research paper goes ‘live’

Our focus in this guest editorial is to offer a vision of how the research paper can be extended in digital research environments. The new technologies that have so profoundly changed the way we “do science”, as alluded to above, have not had the same dramatic influence on the way we “report science” or on the peer-review process (Lackes et al. 2009). Jahnke and Koch (2009) discuss the effect of Web 2.0 on academia and claim that it changes education little but research considerably more. There are publications about the potential of Web 2.0 for research and education in academia (c.f. Boulos et al. 2006; Greenhow et al. 2009; Lankshear and Knobel 2007; Ullrich et al. 2008). But only a limited number of works deal with ‘live’ paper-writing that fully considers the potential of Web 2.0 technologies (Stuart 2009) for spatial data sharing, editing, and analysis, merging research and publication to revitalize scientific interactions. Most work so far has focused on augmenting existing papers with annotations or linkages to other resources on the web (c.f. Ceol et al. 2008; Pafilis et al. 2009), and some even embed support for visual analysis tools (Attwood et al. 2010). Still, Lackes et al. (2009) argue that existing research database and communication websites do not support a fully bidirectional flow of information between author(s) and reader(s). They suggest a conceptual design for a framework that supports networking, tagging, project, discussion, literature, evaluation, and user information management, and search. These ideas provide provocative insights into new modes of scholarly communication, and we see the relevance to GIScience of further investigating the possibilities for a multi-directional process of data provision, analysis, and writing as part of the scientific research process and conversation. As openness, academic freedom, and rigorous academic scrutiny need to remain fundamental pillars of scholarly writing, we argue that new technologies enable a new paradigm for scientific writing and communication, one in which the background literature, the researcher’s thinking, motivation, data, methods, experiment, results, interpretation, and conclusions can come closer to a direct dialog between authors and readers.

Making research live: the live paper

The goal of the live research report we outline here is a richer engagement with science, beginning with the presentation and discussion of research in the form of a research paper. By turning the paper into a more interactive environment, researchers and readers alike would be able to communicate and discuss a study without being restricted to the unidirectional flow of written narrative, static two-dimensional graphics, and web-page comments.

We suggest that this enhancement of research engagement can be achieved through recent developments in information architectures based on “web services”, which provide a standardized framework for linking together computers running different software applications such that they can interoperate and be combined, like Lego blocks, into new solutions. These so-called Web 2.0 (O’Reilly 2007) technologies have effectively transformed web pages from being rather static and one-directional in their communication into highly interactive and user-centered platforms for collaboration and two-way communication. Existing Web 2.0 solutions exploit, for example, social interaction (e.g., Facebook.com and LinkedIn.com), data capture and distribution (e.g., Google Maps and ArcGIS Online), and project workflows (e.g., pipes.yahoo.com). Using Facebook or LinkedIn, users exchange everything from everyday chat to business-related data. Google Maps and Google Earth enable users to create and add data—either geometry or attribute data—through cloud-based services, and to distribute their maps to other users online. Pipes (pipes.yahoo.com) provides users with tools to collect, aggregate, manipulate, geocode, and translate existing data from other users’ “pipes” on the web.
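To make the service-composition idea concrete, here is a minimal, Pipes-style aggregation sketch in Python, assuming two hypothetical journal RSS feeds and a keyword filter; the feedparser-based approach is our own illustration of the pattern, not how Yahoo Pipes itself is implemented.

```python
# A Pipes-style aggregation sketch: pull entries from two syndication
# feeds, keep those whose titles mention a keyword, and merge them into
# one stream. Feed URLs and keyword are hypothetical placeholders.
import feedparser

FEEDS = [
    "https://example.org/journal-a/rss",  # hypothetical feed
    "https://example.org/journal-b/rss",  # hypothetical feed
]
KEYWORD = "geospatial"

def aggregate(feed_urls, keyword):
    """Collect entries from several feeds whose titles mention `keyword`."""
    matched = []
    for url in feed_urls:
        feed = feedparser.parse(url)  # fetch and parse the feed
        for entry in feed.entries:
            if keyword.lower() in entry.get("title", "").lower():
                matched.append(
                    {"title": entry.get("title", ""), "link": entry.get("link", "")}
                )
    return matched

for item in aggregate(FEEDS, KEYWORD):
    print(item["title"], "->", item["link"])
```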

We organize the following presentation of the live paper concept around two guiding questions: how can each section of a traditional research report (the introduction, methods and data description, analysis, results, and conclusion) be enriched by web services? And what implications would such environments have on the practice, culture, and economy of scholarly work? Our presentation follows a relatively typical research report outline with four to five generic sections: Introduction, Data and Methods, Analysis/Experiment, Results, and Summary/Discussion/Conclusion, each preceded by the prefix “The live…” to indicate that it describes the potential for such a report section to deliver more than just the traditional text narrative.

The live introduction

The purpose of a research paper introduction is primarily to provide a background for the presented research such that we can see that the authors have considered other relevant research and how the presented study relates to existing work. In this way an introduction serves to contextualize the paper. A regular paper does so through a narrative with references to related work, but a reader often has to invest significant time looking up citations in order to construct a mental image of how the authors have framed their research. Also, even though well-written papers may come close to communicating the authors’ ideas, the format of written narrative still imposes significant restrictions that make it hard to encapsulate many important aspects of the reported research, such as the process of interpretation, cross-linking, and evaluation that is part of a typical literature analysis performed by the researcher (c.f. Pike and Gahegan 2007). So, what are the possibilities for using the above-mentioned technologies to enable narratives that become omnidirectional, so that readers can interactively trace the origins of thoughts back through the referenced literature and critically engage with the way that literature has been interpreted by the author and by others?

While online access to published work has enabled direct linking to cited works, a reader is still typically left with the author’s narrative of how these works relate to each other and to the presented research. Existing citation databases contain much inherent information that is not yet exploited by an electronic paper. The value of exploring scholarly citation networks for patterns and gaining knowledge about scientific publications in general has been known for some time (Boyack et al. 2002; Cronin 2001; Garfield 1979; Leydesdorff 1994), and there is vibrant research into improving our understanding of science in general through insights offered by mining, for example, the inter-linkages between publications, authors, institutions, funding agencies, places, and knowledge domains (Börner and Scharnhorst 2009). Some attention has also been paid to revealing “narratives of science” through networks between publications (Cronin 2001) by semi-quantitatively summarizing the position of a particular paper in its research domain, its arguments and claims in relation to cited works, or its relationships with other domains of science (c.f. Uren et al. 2006). With the increasing sophistication of Web 2.0 technologies and their ability to personalize the web experience, it is clear that bibliography management can do more to support the act of doing research, rather than just the task, typical of most popular software such as BibTeX and EndNote, of compiling bibliographic entries for a printed journal (Wilde et al. 2008).

Li et al. (2002) developed an early prototype, ClaiMaker, to manually annotate claims in documents and provide typed relations between papers in a way that enabled inference about debate and ideas in the literature. The AI literature has recently devoted significant attention to argumentation-based methods (Bench-Capon and Dunne 2007). Börner et al. (2010) discuss challenges in facilitating the use of “computational scientometrics” and mention at least nineteen existing toolkits that support acquisition, analysis, and visualization, for example CiteSpace (Chen 2006), the Citation Mapping Tool of ISI Web of Knowledge (Thomson Reuters 2010), and the Network Workbench Tool (NWB Team 2006). Typically these tools access either online or local network databases, such as ISI Thomson Scientific or the National Science Foundation (NSF) award database, and perform an array of advanced network analyses, modeling, and visualization of research metadata in physics, biomedicine, and social science (Herr et al. 2006). Many outputs come as terse visual summary graphics, often with interactive capabilities such that users can navigate the network and explore other perspectives on, for example, collaborator structures or funding resources. For a complete picture, it is obviously necessary to support full interoperability across a variety of citation databases such as ISI, Scopus, and Google Scholar. Alternatively, a “unified citation index” has been called for in the scientometrics field (Cronin 2001).
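As a small illustration of what these toolkits do at their core, the following Python sketch builds a toy citation network with the networkx library and ranks papers by citation count and PageRank; the papers and citation links are invented, not drawn from ISI, Scopus, or any real database.

```python
# A toy citation network analyzed with networkx: edge A -> B means
# "paper A cites paper B". All papers and links are invented.
import networkx as nx

citations = [
    ("Smith2009", "Jones2005"),
    ("Smith2009", "Lee2003"),
    ("Kim2010", "Smith2009"),
    ("Kim2010", "Jones2005"),
    ("Park2011", "Kim2010"),
]
G = nx.DiGraph(citations)

# In-degree approximates raw citation counts; PageRank weights each
# citation by the standing of the citing paper.
counts = dict(G.in_degree())
rank = nx.pagerank(G)

for paper in sorted(G.nodes, key=rank.get, reverse=True):
    print(f"{paper}: cited {counts[paper]} times, PageRank {rank[paper]:.3f}")
```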

Another way to enrich the context of a research report is to clarify the personal and institutional relationships that may exist among the authors and citations presented in a research paper. The importance people in general attach to social networks is manifested by the surge of social networking sites on the web, e.g., Facebook, MySpace, and LinkedIn, and interest seems to increase by the day. While many of these platforms, at least potentially, allow users to track down people they know (or would want to know), all of them are essentially data silos with little or no possibility for cross-linkage. Some existing software packages have started to address this need and seek to leverage web techniques and collaboration abilities, such as RefWorks (http://www.refworks.com/), ShaRef (http://www.dret.net/projects/sharef/), and Mendeley (http://www.mendeley.com). But we argue that Web 2.0 can do better, and several technologies have been specifically developed to support annotation of personal and citation networks. In particular, the Semantically-Interlinked Online Communities (SIOC) initiative (http://www.sioc-project.org), together with the Friend of a Friend (FOAF) vocabulary (http://www.foaf-project.org), has recently achieved increased adoption as a formal and open standard for expressing personal profile and social network information. Similar to how HTML describes linkages between sites and resources on the web, these standards allow users to annotate anything they create, use, or comment on with links to their own identity. It is perfectly feasible to use these social mark-ups to generate social and affiliation networks from individual web pages, through which researchers’ influences and institutional settings can be mined and explored in a manner similar to what is presently done with citation networks. This would open possibilities to investigate even further some of the social construction behind research, such as particular views promoted by groups of scientists, and lead to richer insight into the complexities of the social dynamics of “doing science” (Kuhn 1970), including through visual analytics (Skupin 2009). For example, to answer questions about what institutional heritage may influence a researcher’s motivation and perspective, we could use these social networks as an indicator of how ideas and research traditions are passed on from mentor to advisees. A particular lab or school may spawn many scholars who now operate in new environments but still carry on a legacy of their adviser and the institution that formed their thinking. Currently, it takes long exposure to a field for someone to recognize such provenance. Enhanced means to expose such dependencies in a live paper would increase the transparency of research reporting.
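To show what such social markup might look like in practice, the sketch below emits a few FOAF triples with the rdflib library; the people, URIs, and mentorship link are fictitious, and a live paper could publish similar statements for its authors so that readers and tools can mine the resulting graph.

```python
# Emit FOAF social markup with rdflib. People, URIs, and the mentorship
# link are fictitious stand-ins for a live paper's author metadata.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
author = URIRef("https://example.org/people/jane-doe")     # hypothetical
advisor = URIRef("https://example.org/people/john-smith")  # hypothetical

g.add((author, RDF.type, FOAF.Person))
g.add((author, FOAF.name, Literal("Jane Doe")))
g.add((author, FOAF.knows, advisor))  # a social/mentorship link
g.add((advisor, RDF.type, FOAF.Person))
g.add((advisor, FOAF.name, Literal("John Smith")))

# Serialize to Turtle, ready to be attached to a live paper and mined
# alongside other researchers' profiles.
print(g.serialize(format="turtle"))
```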

The live data and methods

The data section of a research report typically provides readers with a detailed description of the data used in the researcher’s analysis. Information may include the type of data, where it is from, how and when it was collected, the geographical boundaries or dispersal of the data, and all other necessary information (e.g., resolution, parameters, fields, and summaries). In that sense it serves the same purpose as the metadata that often accompanies databases (e.g., following the FGDC metadata standard). In the live research paper, the data can become a key resource for scientific exchange. This description of the data should, ideally, be sufficient to arm readers with the information required to repeat the experiment. The data description should also provide readers with additional contextualizing information that helps to further describe and frame the researcher’s assumptions and hypotheses. Usually, the data themselves, such as satellite images, maps, analysis records, and survey results, are not supplied as part of the traditional publication, for the obvious reason that this was infeasible in print. With the prevalence of electronic publication and Internet resource access, direct provision of actual data has become increasingly feasible. Of course this refers strictly to practical feasibility, as there remains a question of the researcher’s willingness to provide raw data to the research community, let alone to the general public. Yet, with the contemporary progression toward the availability of diverse and expansive spatial data on the World Wide Web, it is possible to access, display, and/or manipulate raw data in a live report.
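As one illustration of such direct data provision, the sketch below requests features from a hypothetical OGC Web Feature Service (WFS) endpoint and reads them as GeoJSON; the URL and layer name are placeholders for whatever data portal a live report would actually cite.

```python
# Fetch raw features for a live report from a (hypothetical) OGC Web
# Feature Service endpoint, requesting GeoJSON output.
import requests

WFS_URL = "https://example.org/geoserver/wfs"  # hypothetical endpoint
params = {
    "service": "WFS",
    "version": "1.1.0",
    "request": "GetFeature",
    "typeName": "demo:study_area",             # hypothetical layer
    "outputFormat": "application/json",
}

response = requests.get(WFS_URL, params=params, timeout=30)
response.raise_for_status()
features = response.json()["features"]

# Each GeoJSON feature carries a geometry plus its attribute record.
print(f"retrieved {len(features)} features")
if features:
    print(features[0]["properties"])
```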

The need for geographic data portals became a top priority during the 1980s, when many national and international organizations launched Spatial Data Infrastructure (SDI) programs in order to provide better access to geographic information (Maguire and Longley 2005). The results of these efforts are manifested in the many geoportals and SDIs that provide easy access to data documented by metadata. However, for complete reproducibility of an experiment, neither the data nor the metadata alone is sufficient. Researchers also applied a methodology to the data, whether to conduct the experiment or to collect material for analysis in the field. Traditionally, the methods of the experiment are described in detail, or previous works on the employed methodologies are referenced. By complementing the live Data with live Methods, the data become manipulable by readers based on an author-defined set of built-in parameters or criteria. The live Methods might allow readers, for example, to run code written by the authors, to initiate a processing chain designed by the authors on top of web services, or to replace existing built-in parameters and/or routines with the readers’ own specifications. The experiment could then, for example, be applied to data that are more recent, cover another study area, or have finer resolution than the original, with the results obtained, synthesized, and interpreted by the readers. As an example of one such existing virtual research environment, in myExperiment (http://www.myexperiment.org) users can share “digital research objects” such as “workflows” and “files” to facilitate studies by multiple different researchers (Goble et al. 2010). Benel and Lejeune (2009) provide another example, with experiments showing how live data addition and analysis using tag information in texts could benefit from Web 2.0. By adopting a service-oriented architecture (Foster 2005), a live data-and-methods section becomes a set of services offered by interconnected servers and service chains (Friis-Christensen et al. 2009).
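The following sketch illustrates this service-oriented idea under stated assumptions: a stand-in analysis (a simple moving average over a toy dataset) is exposed as a web service with the Flask library, and a reader can override the author’s default parameter through the query string. The endpoint, method, and data are all invented for illustration, not a prescription for how a live Methods section must be built.

```python
# Expose a stand-in analysis as a web service with Flask so readers can
# rerun it with their own parameter. Dataset and method are invented.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the paper's dataset; a real service would draw on the
# live Data section instead.
OBSERVATIONS = [3.1, 2.9, 3.4, 4.0, 3.8, 4.2, 4.5]

def moving_average(values, window):
    """The paper's 'method': a simple moving average."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

@app.route("/run")
def run_experiment():
    # Readers override the author's default via the query string,
    # e.g. GET /run?window=3
    window = request.args.get("window", default=2, type=int)
    if not 1 <= window <= len(OBSERVATIONS):
        return jsonify(error="window out of range"), 400
    return jsonify(window=window, result=moving_average(OBSERVATIONS, window))

if __name__ == "__main__":
    app.run(port=5000)
```

A reader could then request, say, /run?window=3 to rerun the experiment with a wider smoothing window, without ever touching the author’s code.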

Thus, readers can “use” the paper, and their interactions with the live Data and Methods can allow them to ask and answer questions and to explore alternative conditions or scenarios through direct and instantaneous engagement with the experiment. The live research paper environment allows these operations to be examined, ideally step by step, by reviewers and readers alike, and to be executed under the readers’ control, accepting their interactions and parameter inputs. In a sense, a paper written and implemented with these live elements resembles a software application.

The live discussion

Similar to the introduction, the discussion section of a paper gives the author(s) an opportunity to reflect on their findings and to tie their arguments back into the existing literature. Because of this, we anticipate that mechanisms similar to those presented in the “live introduction” section above apply to a live discussion. But more importantly, the idea of a live, ongoing discussion around a research report would have a much more fundamental impact on the research communication process. There is an ongoing debate about the merits of peer-reviewed journals versus open-contribution sites. The live paper approach offers both. Papers can be submitted for peer review and not published until the data are properly vetted. Papers can also be published without review and stand on their own merits. In both cases, comments from readers can be solicited in the form of discussion boards, but in a live paper environment the data and experiments can become part of that discussion, since all users will have the ability to create their own modifications and make these available for scrutiny and discussion as well.

Once a live paper is published, crowd-sourcing techniques and statistics (Brabham 2008) can be added to the traditional peer review. Web 2.0 social media protocols can be leveraged to provide academic publishing with two types of review: that of experts and that of the general public. Crowd-sourcing could also be used to highlight exemplary papers and cross-pollinate them across disciplines. In the same way that websites such as digg.com provide a valuable filter when drinking from the internet fire hose, crowd-sourcing can sift scientific data into manageable streams. Researchers, as well as institutions, could also benefit from usage data. For instance, just as amazon.com knows how many people looked at purchasing Darwin’s Origin of Species, institutions would have tangible data on the impact of their publications, culled from site hits and reference metrics.

This still leaves us with what we have identified as the two big obstacles for a live paper system: authentication and copyright. Much of the functionality of the live paper system, as outlined above, depends on authentication, which in the case of access to citations is rooted in journal subscriptions and copyright. Authentication within an institution is relatively easy, but problems arise when people want to collaborate across different institutions and thus different authentication domains. Without a federation of authentication servers or a neutral third party, it may be difficult to pass authentication requests across domain boundaries.
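As a sketch of how federated authentication might pass a vouched identity across domain boundaries, the example below issues and verifies short-lived signed tokens with the PyJWT library; the institutions, audience, and shared secret are invented, and a real federation would more likely rely on public-key trust or an existing scheme such as Shibboleth rather than a shared secret.

```python
# Federated authentication sketch with PyJWT: a home institution issues
# a short-lived signed token for its user, and another domain verifies
# it before granting access. Issuer, audience, and key are invented.
import time
import jwt

SHARED_KEY = "demo-secret"  # placeholder only; real federations would
                            # use asymmetric keys, not a shared secret

def issue_token(user_id, home_institution):
    """The home institution vouches for its user."""
    claims = {
        "sub": user_id,
        "iss": home_institution,
        "aud": "live-paper-network",
        "exp": int(time.time()) + 300,  # expires in five minutes
    }
    return jwt.encode(claims, SHARED_KEY, algorithm="HS256")

def verify_token(token):
    """The receiving institution checks signature, audience, and expiry."""
    return jwt.decode(
        token, SHARED_KEY, algorithms=["HS256"], audience="live-paper-network"
    )

token = issue_token("jane.doe", "https://uni-a.example.org")
print(verify_token(token))  # decoded claims if valid; raises if not
```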

Related to this obstacle is the copyright issue. To maintain a copyright-based system, a scheme will have to be agreed upon for live papers within the system, and access to previously published and copyrighted materials will also have to be arranged. This arrangement would have to address both access to older material and encapsulation of material for offline reading, possibly by consumers who do not hold a subscription. There is also a blurring of the boundary between ‘original’ and ‘duplicated’ works; if somebody takes your analysis workflow and replaces the input data with their own, is that enough to call it a new paper?

One route forward would be to take academic publishing in-house. This might sound like an expensive, unfunded mandate. Universities and research institutes, though, pay millions of dollars in subscription and acquisition fees to provide faculty and students with access to academic books and journals (Houghton et al. 2009). These fees could be transitioned to pay for infrastructure, both machine and human. The difference between the existing model and our proposed system is that, after the funds for a live paper system have been spent, the research belongs to the university at which it was created and is available to the public free of charge. The live paper model also scales more efficiently than the traditional paper system. If ten big universities, or even eleven, pooled their funds, they could create redundant data centers using internet pipes they already own. Their staffs are already used to working on distributed servers across the hall and even across the state, so physical location would not be a serious issue. There are technical issues that will have to be overcome, but the critical roadblocks to live paper publishing will be institutional.

In the introduction we mentioned the possibility of tracking and evaluating research traditions, heritage, and legacy through social and institutional markups. With traceable data and methods, this could have a profound impact on reward, tenure, and promotion processes. It is well known that different scientific fields put different emphases on open access to data and methods. One of the reasons to protect one’s own dataset from open scrutiny is the embedded issue of tenure and promotion, which all too often rests on the number of publications one manages to produce, and this is sometimes intimately linked with making the most of a hard-earned dataset. At an individual level it makes perfect sense to be protective of a primary dataset, but from the perspective of the greater good it is likely more beneficial if it can be made openly available for anyone to further our knowledge. Part of the solution may be to better reward the creation of important data for others to conduct research on, but unless there are ways to reliably measure the impact of such a dataset, this is not likely to happen. Again, the framework put forth here can provide a means to that end, since many datasets could be made openly accessible through a license agreement and the use of a key that not only unlocks the data resource but also provides a means for tracing its use. Any live report that uses such a dataset would then automatically carry the data’s origin through the key and enable automatic tracking of its use in experiments.
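A minimal sketch of such keyed, use-traced data access follows; the access keys, dataset, and log format are all invented for illustration, and a production system would of course use proper credential management and persistent storage.

```python
# Keyed data access with automatic use-tracing: unlocking the dataset
# also records who used it, when, and for what. Keys, data, and the
# log format are invented for illustration.
import datetime

ISSUED_KEYS = {"KEY-1234": "Jane Doe (Univ A)"}  # key -> licensed holder
USE_LOG = []                                     # the provenance trail

DATASET = {
    "id": "demo_landcover_2010",                 # hypothetical dataset
    "records": [("forest", 0.42), ("urban", 0.31), ("water", 0.27)],
}

def fetch_dataset(key, purpose):
    """Unlock the data and append an entry to the provenance trail."""
    if key not in ISSUED_KEYS:
        raise PermissionError("unknown or revoked access key")
    USE_LOG.append({
        "key": key,
        "holder": ISSUED_KEYS[key],
        "purpose": purpose,
        "time": datetime.datetime.utcnow().isoformat(),
    })
    return DATASET

data = fetch_dataset("KEY-1234", "live paper experiment")
print(data["id"], "->", USE_LOG[-1])
```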

Concluding discussion

Live papers ultimately point to a fundamental shift in academic publishing. We argue that enlivening the research paper—through solutions such as the live paper concept outlined in this editorial—can bring profound changes to the processes of science. We have argued that readers of live papers can actively participate in the conceptualization of a problem as well as in data analysis, rather than being limited to receiving one perspective on the problem from a traditional publication. It would allow research to be communicated with greater depth, detail, and complexity, thus enhancing the transparency of experiments and their design, as well as promoting a more dynamic exchange of knowledge and scientific inquiry among scholars, practitioners, and the public.

So, when will we be ready to start [writing, reading, versioning] live research papers? In this editorial we have addressed this question from at least three perspectives. The first relates to developing the enabling technologies; the second to when we, as both creators and consumers of research papers, are willing to change our own practices; and the third to when existing institutions, such as promotion structures, publishers, and libraries, are ready to change. As we have argued, we believe that the necessary technology for The Live Paper already exists and can be implemented at any time. We are certainly not the only ones thinking in these directions (c.f. Lorimer 2010), but we believe that geographic information science researchers have a special opportunity to lead these developments because of the nature of our discipline, its data availability, and its analytical tool kits. As for the second perspective, we think that, just as the internet and electronic dissemination of information saw relatively fast uptake among practitioners, The Live Paper could quickly become an accepted form of research publishing because of its apparent benefits to the research venture. Unfortunately, as we have also discussed, several issues, including copyright and institutional barriers such as promotion and tenure processes, hinder a rapid transition to Live Paper-centered research reporting.