A recurring theme in our editorials has been the limited success of data sharing in practice: too few neuroscientists share their data or models.Footnote 1 A growing number of stakeholders have become concerned with promoting data sharing, as evidenced by a recent survey by a major publisher.Footnote 2 To improve data sharing, efforts have been made to encourage more scientists to contribute their data, either by rewarding them, for example with the citable data paper,Footnote 3 or by contacting them personally on the basis of publications that report suitable data,Footnote 4 and by publicly listing which data are available and which are not, including non-responders.Footnote 5 Databases containing shared neuroscience data have also been made easier to find and use.Footnote 6 But surprisingly little has been done to help motivated neuroscientists organize their data so that it is ready to be shared,Footnote 7 despite the huge opportunities for improvement.

It is a sad fact that, although most scientific data is acquired digitally, very few research laboratories in academic environments have adopted modern data management practices. Though it may seem an improvement to the scientist involved, there is really not much difference between having data in a bunch of poorly named files on a student’s hard drive and having boxes of tapes or microscopy slides on a shelf. In both cases the data is not annotated, not searchable and poorly linked to the description of the experiment, which, even now, is usually still handwritten in a lab notebook. In some respects the current situation is even worse, because new problems have arisen, such as poor backup and security policies and the obsolescence of data formats.

The solutions to poor data management seem fairly simple and can be summarized as:

  • annotate data at the time it is generated, with as much automation as possible; this also applies to processed data

  • digitize everything, especially the lab notebook

  • store all data in a centralized database with proper backup, archiving, access control, security, etc.
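As a minimal sketch of the first and third points, assume a hypothetical acquisition script written in Python. Metadata can then be captured automatically at the moment a file is written, stored as a JSON "sidecar" next to the data, and registered in a small SQLite catalog that can be searched by field. The function names (`open_catalog`, `save_annotated`, `find`), the metadata fields and the table layout are illustrative assumptions, not part of any particular ELN or database product.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def open_catalog(db_path=":memory:"):
    """Open or create a small catalog (a hypothetical stand-in for a lab database)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS datasets (
               path TEXT PRIMARY KEY,
               sha256 TEXT NOT NULL,
               experiment TEXT,
               subject TEXT,
               acquired_at TEXT,
               metadata_json TEXT)"""
    )
    # Index a commonly searched field so queries do not require scanning every file.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_experiment ON datasets(experiment)")
    return conn


def save_annotated(conn, data, out_dir, name, **metadata):
    """Write raw bytes plus a JSON sidecar annotation, then register both in the catalog."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / name
    data_path.write_bytes(data)
    record = {
        **metadata,  # experiment-specific fields passed in by the acquisition script
        # Fields recorded automatically; they take precedence over user-supplied keys.
        "file": name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "acquired_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    (out_dir / (name + ".json")).write_text(json.dumps(record, indent=2))
    conn.execute(
        "INSERT OR REPLACE INTO datasets VALUES (?, ?, ?, ?, ?, ?)",
        (str(data_path), record["sha256"], metadata.get("experiment"),
         metadata.get("subject"), record["acquired_at"], json.dumps(record)),
    )
    conn.commit()
    return record


def find(conn, experiment):
    """Return the paths of all datasets annotated with the given experiment, sorted."""
    rows = conn.execute(
        "SELECT path FROM datasets WHERE experiment = ? ORDER BY path", (experiment,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    import tempfile

    conn = open_catalog()
    out = tempfile.mkdtemp()
    save_annotated(conn, b"\x00\x01", out, "trial1.bin",
                   experiment="patch-clamp", subject="mouse-07")
    print(find(conn, "patch-clamp"))
```

SQLite is used here only because it ships with Python; in a real lab the catalog would live on a shared, backed-up server, and the annotation step would ideally be built into the acquisition software itself rather than added by hand.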

Note that the first two goals strongly affect the workflow during actual experiments at the bench, while the last goal is really a central facility that can be deployed at different levels of granularity (the lab, department, institute, …). Few scientists worry about data annotation; when it happens at all, it is either a feature of the acquisition software (the most logical solution) or the annotation is added at a later date, when requested.Footnote 7 Which data are not yet digitized is very domain-specific, but the boom in cheap sensors driven by the smartphone revolution makes it likely that everything will soon be recorded digitally. Setting up a quality database that can handle a large variety of data sets is a daunting prospect for the average scientist, and such a database should not be built from scratch. Instead, the future should bring products that are easy to install and use, comparable to what iTunes offers for managing music and video collections. Note that putting data into the cloud is usually not a complete database solution. Although cloud storage typically provides backup and access control, it is often just a file repository (e.g., Dropbox), lacking the essential indexing and search facilities that a real database offers.

The astute reader may have noticed that I did not mention lab notebooks in the preceding paragraph. This is because, although the topic has received relatively little attention in the neuroinformatics community, much progress has recently been made in electronic lab notebooks (ELNs). Several viable commercial solutions suited to labs in academic environments are now available,Footnote 8 and many labs are trying out generic products such as Evernote.Footnote 9, Footnote 10 Many ELNs are built on the software infrastructure that gave us social media and therefore emphasize the collaborative and interactive aspects of research in a lab. For example, LabGuru Footnote 11 is a web-based environment that allows lab members with different roles to interact on scientific projects and, because its functionality can be accessed from mobile devices, it is easy to use at the bench. But while it has many interesting features, including ways to record where samples and reagents are physically stored in the lab, it does not provide real data management. Until recently, database management was available only in expensive ELNs aimed at industry, such as Accelrys ELN,Footnote 12 but it is now offered by RSpace ELN,Footnote 13 which, interestingly, builds on a project that originated in neuroinformatics.Footnote 14

ELNs coupled to databases for local research projects may be the perfect tool to introduce naive scientists to modern data management, because they extend a familiar concept that is central to their daily research work. Nevertheless, despite rapid progress in ELN development, the impact of ELNs is still limited.Footnote 10, Footnote 15 This reflects scientists’ rather conservative attitude towards anything that affects lab culture, poor training in basic computer science and a failure to perceive that there is a problem at all. It is notable that in the survey mentioned earlier,Footnote 2 only 20 % of participants responded that they lacked the resources needed to share their data, whereas the general absence of data management procedures suggests that, in practice, this problem is much more widespread.

Returning to neuroinformatics, it is quite obvious that current ELN development is mainly geared towards the molecular and pharmaceutical sciences. There are no neuroscience examples on the slick websites of the companies developing these products. Conversely, very few neuroinformatics projects develop solutions intended to be deployed within a research lab;Footnote 16 the focus remains on top-down approaches that promote centralized databases and on developing the tools such databases need, like ontologies and data standards. The ELN community, by contrast, seems to offer a more practical bottom-up solution that also makes the data actually available for effective sharing. For example, records can be ‘tagged’.Footnote 11 This is really a form of data annotation, albeit an intuitive, non-standardized one. However, if good templates are provided for many experimental protocols, these tags become standardized by default. It seems time for the neuroinformatics field to investigate whether current ELN systems can fully serve our community. If so, we should strongly promote the use of ELNs in neuroscience labs. If not, we should encourage projects that extend ELNs to neuroscience, because their potential to enhance data sharing is obvious.Footnote 17
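The argument that templates turn free-form tags into de facto standards can be illustrated with a small sketch. It assumes a hypothetical ELN in which each experimental protocol has a template listing its required tag keys and, optionally, their allowed values; every record created from that template is checked against the same vocabulary. The class `ProtocolTemplate`, the `search` helper and the example tag names and values are invented for illustration and do not describe any existing ELN.

```python
class ProtocolTemplate:
    """Hypothetical ELN protocol template: required tag keys and their allowed values."""

    def __init__(self, name, required):
        self.name = name
        # Maps each required tag key to a set of allowed values, or None for any value.
        self.required = required

    def make_record(self, **tags):
        """Create a record, rejecting it if required tags are missing or not allowed."""
        missing = [key for key in self.required if key not in tags]
        if missing:
            raise ValueError(f"missing required tags: {missing}")
        for key, allowed in self.required.items():
            if allowed is not None and tags[key] not in allowed:
                raise ValueError(f"{key}={tags[key]!r} not in {sorted(allowed)}")
        # Extra free-form tags are kept; only the required ones are standardized.
        return {"protocol": self.name, "tags": dict(tags)}


def search(records, **query):
    """Return the records whose tags match every key/value pair in the query."""
    return [r for r in records
            if all(r["tags"].get(key) == value for key, value in query.items())]


if __name__ == "__main__":
    whole_cell = ProtocolTemplate(
        "whole-cell patch clamp",
        {"species": {"mouse", "rat"}, "brain_region": None, "temperature_C": None},
    )
    records = [
        whole_cell.make_record(species="mouse", brain_region="CA1", temperature_C=34),
        whole_cell.make_record(species="rat", brain_region="CA3", temperature_C=32,
                               note="unstable seal"),
    ]
    print(search(records, species="mouse", brain_region="CA1"))
```

Because every record built from the same template uses the same keys and vocabulary, a query such as `species="mouse"` reliably finds all matching records, which unconstrained free-text tags cannot guarantee; scientists still keep the freedom to add their own extra tags.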