Statistical research functions much like a supply chain. It begins with the identification of demands, such as open problems in science and technology, and the gathering or generation of data (e.g., through surveys, clinical trials, or scientific experiments). These demands and data fuel the creation of innovative analytical solutions, encompassing new theories and methodologies for study design and data analysis and, increasingly, software tools that disseminate these solutions to end users. To ensure that new methods and tools are used properly and effectively, it is crucial to evaluate them systematically, enabling users to understand their properties, capabilities, and limitations. Ultimately, their value must be tested and demonstrated by solving real-world problems. Thus, demands, data, theory, methods, and software, as well as their evaluation, validation, and application, all serve as vital components of this supply chain. Notably, the supply chain cannot function well without well-trained and creative statisticians.

Publication plays a significant role in disseminating new research ideas and results. However, within the current publication ecosystem in the discipline of statistics, not all components of the supply chain receive equal support. Most statistical journals primarily focus on publishing theories, methods, and their innovative applications to new problems. While some statistical journals have recently begun featuring software tools, manuscripts that describe new demands, data, and resources for method evaluation often receive much less attention.

When it comes to data, most statistical journals currently do not consider publishing a new dataset on its own, as editors and reviewers tend to perceive limited novelty in terms of theory and methods in such a manuscript. However, data is the new oil of the twenty-first century. For example, the ImageNet dataset, with tens of millions of human-annotated images, has been an important driving force behind the recent advances in computer vision and deep learning [1, 2]. The massive amounts of text data on the Internet enable the training of large language models (LLMs) such as ChatGPT [3, 4, 5], revolutionizing artificial intelligence (AI). Besides large datasets, high-quality small datasets also have tremendous value. For instance, finding optimal data analysis solutions for emerging new technologies (e.g., single-cell or spatial omics technologies) requires benchmark datasets with ground-truth information, but many such benchmark datasets are small because of the difficulty of conducting experiments to collect ground truth. Compiling these foundational datasets is non-trivial, requiring vast amounts of time and effort. Timely dissemination of these data, along with instructions on how to use them, will benefit the whole research community and accelerate innovation.

While journals for publishing data have emerged in several other disciplines, data publication remains an uncommon practice in statistics. This has at least two ramifications for our discipline. First, it slows the dissemination of data for innovation. Although public repositories exist for certain data types (e.g., the Gene Expression Omnibus [6, 7] and Sequence Read Archive [8] databases for microarray and high-throughput sequencing data), unified data repositories are not yet available for many others. Without a pathway to publish a valuable dataset (e.g., a benchmark dataset that can be used to compare different statistical methods) unaccompanied by new theory or methods, one has limited options to publicize the data and allow the community to benefit from it. Second, it weakens our workforce. It disincentivizes statisticians from leading the efforts to build critical data resources, which can take tremendous amounts of time but too often go under-recognized by, for example, hiring or promotion committees who evaluate candidates based on their publication lists. In the long run, this will result in a loss of related talent, creating a bottleneck in the supply chain of our discipline. Without timely access to high-quality data, our innovation will lag behind.

One likely solution is to enhance our publication ecosystem to support the publication of valuable datasets along with their usage instructions. For instance, the Journal of Statistics and Data Science Education considers “Datasets and Stories” articles, which describe “the pedagogical uses of multivariate dataset(s)” [9]. Statistics in Biosciences now welcomes submissions of articles that present broadly useful new datasets and resources for statistical research and education (e.g., [10]); these resource articles are published as part of the journal’s “Case Studies and Practice Articles.” Embracing this new publication type will open new opportunities for innovation. For example, as more datasets with well-annotated ground truth are published, they may be used to build a crowd-sourced benchmark data compendium for comparing different statistical methods developed for a common task, allowing a more robust and less biased assessment of those methods. Improved benchmarks, in turn, can better guide method developers toward effective models, techniques, and promising directions for improving their methods. The publication of data will also allow us to better recognize the vital contributions of those who work hard on generating, collecting, cleaning, compiling, and annotating the data, thus helping to cultivate the expertise indispensable for the prosperity of the discipline.
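To make the compendium idea concrete, the following is a minimal sketch in Python of how such a crowd-sourced benchmark might operate: each published benchmark dataset contributes inputs together with ground-truth labels, and every method for the common task is scored against every dataset. The datasets, methods, and accuracy metric below are hypothetical stand-ins for illustration only, not part of any existing compendium.

```python
# A minimal sketch of a crowd-sourced benchmark compendium: published
# benchmark datasets supply (inputs, ground truth), and competing methods
# for a common task are scored on all of them. Everything here is a
# hypothetical illustration.
import numpy as np

rng = np.random.default_rng(seed=0)

def make_toy_dataset(n=200, shift=1.0):
    # Stand-in for a published benchmark dataset: two groups of
    # observations with known (ground-truth) group labels.
    x = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(shift, 1.0, n)])
    y = np.concatenate([np.zeros(n), np.ones(n)])  # ground truth
    return x, y

# Contributed benchmark datasets accumulate in the compendium.
compendium = {
    "benchmark_A": make_toy_dataset(shift=1.0),
    "benchmark_B": make_toy_dataset(shift=2.0),
}

# Two competing "methods" for the same task, standing in for
# published statistical methods.
def method_median_split(x):
    return (x > np.median(x)).astype(float)

def method_mean_split(x):
    return (x > x.mean()).astype(float)

methods = {"median_split": method_median_split,
           "mean_split": method_mean_split}

# Score every method on every dataset; accuracy is used purely for
# illustration, and a real benchmark would use task-appropriate metrics.
for data_name, (x, y) in compendium.items():
    for method_name, method in methods.items():
        accuracy = (method(x) == y).mean()
        print(f"{data_name:12s} {method_name:12s} accuracy={accuracy:.3f}")
```

Because each contributed dataset carries its own ground truth, adding a new dataset (or a new method) immediately extends the comparison grid, which is what makes the crowd-sourced assessment more robust and less dependent on any single developer’s choice of test data.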

Data is just one example of how we can strengthen our supply chain. Improved support for the other traditionally under-supported supply chain components is also crucial. For instance, timely introduction of new problems and challenges emerging from science and technology to the statistical community could help statisticians participate in and contribute to important scientific innovations from the outset. This could be facilitated by publishing commentaries or perspectives by those who work at the interface between statistics and the domain sciences. Besides benchmark datasets, many other types of resources are critical for statistical research, such as computational pipelines for method evaluation and educational materials on emerging topics; these, too, could naturally be published as resource articles. By encouraging these new types of publications, we can create a more balanced publication ecosystem that supports statistical innovation. There are various avenues to facilitate this process. One option is for the statistical community to establish new specialized journals tailored to these emerging content types. Alternatively, recognizing that creating new journals may require substantial time and resources, existing journals could incorporate new article types, allowing the community to capitalize on established platforms for a swift response to emerging needs and opportunities. Regardless of the chosen approach, the success of these new publication types in statistical journals hinges on editors, reviewers, and authors embracing a more inclusive mindset. It is crucial to acknowledge the vital role and significant contributions of each component of our supply chain. Ultimately, the productivity of a supply chain is determined by its bottlenecks; to enhance the success of our discipline, we must ensure that our supply chain has none.