1 Background

The use of national personal identification numbers (PIN) for record linkage across health facilities and population-based studies is limited in Sub-Saharan Africa (SSA) countries. Healthcare research is fragmented, with different participant identifiers used across studies, which creates methodological challenges in linking data from multiple sources to answer a diverse range of policy-relevant, clinical, administrative, and research questions. When medical records are linked across health facilities, for example through national registers, researchers can design and implement diverse studies, potentially linking with cohort or population-based data to answer research questions that often require large sample sizes and are seldom answered in standalone investigations [1]. Importantly, from a financial point of view, prospective observational studies (or rather, prospectively reported exposure-outcome data) are then not as expensive [2]. Large-scale epidemiological studies on morbidity and mortality have consistently used a unique PIN to link individuals’ health facility records and registers with population-based observational and experimental studies, and have been highly successful in high-income countries [2,3,4,5]. The linked data have been a source of state-of-the-art investigations influencing global, regional, and national policies on the prevention and treatment of diseases.

The situation in SSA is different. Existing health management information systems mostly rely on paper-based data collection, collect aggregated data on specific health indicators, and use separate unique identifiers. With such systems, it is impossible to establish a clear linkage between client information and population-based survey or cohort data, among others. Issues regarding the quality and accessibility of such data have been raised in the literature [6, 7]. It has been established that without a common identifier in at least two datasets, linkage is practically impossible [1,2,3,6,7,8]. Indeed, it is challenging to link census, demographic and health surveys, sentinel surveillance surveys, insurance, and other observational and administrative data across health facilities in SSA. Recognizing the inherent methodological and legal-ethical record-linkage challenges [1], this commentary aims to provide recommendations as a foundation for improved linkage of electronic health records in SSA to enhance healthcare research.

2 Main text

Linkage of health services data is a complex process that requires health systems thinking, multisectoral collaboration between the public and private sectors, and long-term investments. Of note, the disparity in the quality of healthcare between government and private facilities is large in SSA, where patients are willing to pay out-of-pocket in private clinics and hospitals just to avoid the long waiting times in government-owned facilities. For this reason, large volumes of data are held by the private sector, and its willingness to share data is necessary. The private sector may also have the technical capacity and infrastructure needed to support data collection and linkage. Box 1 below contains a few examples of countries in SSA championing the linkage of healthcare data.

Box 1: Examples of countries in SSA championing linkage of healthcare data

A few countries in SSA have paved the way and have proven successful given government commitments and collaborations with other stakeholders. Such examples include South Africa, where the Provincial Health Data Centre of the Western Cape Province has recorded over eight million unique individuals and 15 million attendances in a decade [9], whose linkage has improved over time [8]. In Tanzania, the Ministry of Health has invested in developing, evolving, implementing, and rolling out the centralized electronic Government of Tanzania Health Operations Management Information System (GOTHOMIS) for primary healthcare facilities in collaboration with different stakeholders, including development and implementing partners, targeting all facilities in the country [10]. Still, even in these countries, challenges exist in the national coverage of such systems, data quality, and linkage with cross-sectional and prospectively collected population-level healthcare data apart from the national registers and other Health Information Systems patient-level data.

Our recommendations below seek to stimulate critical thinking among healthcare stakeholders and governments in SSA about the future benefits of collecting linkable, high-quality patient and population data, safeguarding participant information, and sharing data to promote open science and advance public health. Our recommendations are as follows:

  1. a.

    Despite limited resources, transition from paper-based data collection and storage systems to electronic health records across all health facilities and, where possible, at all levels of care. Most SSA countries rely on specially designed registers and forms that mostly collect aggregated data and seldom patient-level data, which has been shown to hamper data quality. Even where patient-level data are collected, challenges exist in linkage to other data sources. Electronic data collection systems are a prerequisite to improving data quality [6]. SSA countries implementing this transition will benefit from additional technical and financial support. At the health-facility level, strengthen the collection of individual-level data in addition to the aggregated data often used to monitor key indicators. Efforts should also be made to improve the quality of electronic medical records for future data linkage.

  2. b.

    Capture the PIN across all healthcare data sources in the country to enable future linkage of multiple data sources. Medical record and health insurance numbers are useful but are often specific to each healthcare facility, which makes it impossible to link individual medical records, while population-based surveys and clinical trials use their own identification systems. With proper legal and ethical guidelines, the national PIN is a potential form of identification, which should be scaled up for all individuals and used in all healthcare data sources countrywide as a common identifier [3, 9]. All other identifiers, such as health insurance, social security, and medical record numbers, should complement the PIN and not vice versa. Although conventional personal identifiers can be used [11], without a common identifier, the use of deterministic and probabilistic matching methods will pose a methodological difficulty or prove impossible [1, 3, 11]. Hence, independent researchers and organizations should be encouraged to capture the unique national PIN, if available, across all studies and to follow transparent data-sharing protocols and guidelines. Other important non-health-related data sources, such as census, migration, and birth and death registrations, should also be electronic and must capture the PIN.

  3. c.

    The protection of privacy and of client and participant data should be given the highest possible consideration. The legal and ethical regulations in SSA should be enhanced following local and international standards to provide clear guidelines on collecting the PIN and other identifiable information, procedures for data anonymization, data access, data sharing, and protection of the intellectual property rights regarding data ownership [1, 9]. In addition, the local ethical review boards and data-sharing committees must be strengthened to deal with these sensitive ethical and legal issues.

  4. d.

    Centralized pooling, merging, and sharing of pseudo-anonymized data from multiple sources is highly recommended and should be carried out by a dedicated government entity such as the Ministry of Health, National Bureau of Statistics, or Commission of Science and Technology [3, 9]. Researchers should not be allowed to access certain identifying patient information [3]. Procedures must be in place on when to update data linkage. To achieve this, SSA governments should work with the private sector and international partners to secure the necessary resources, such as dedicated servers, software, and the technical capabilities needed for data linkage, anonymization, sharing, and storage.

  5. e.

    The legal and ethical regulations must state when linked data must be destroyed or made unavailable for research, e.g., once the research is completed by the individual or group of investigators, and should follow national and international regulations [3, 9].

  6. f.

    Recognizing the substantial resources invested in collecting, linking, and storing healthcare and related administrative data, research institutions and governments should set a transparent and reasonable fee for access to the linked and pseudo-anonymized data, and state to which entity it is paid, apart from the ethical clearance fees. Such information is essential to help researchers plan their studies carefully and include data access fees in their budgets.

  7. g.

    Public–private partnerships in the healthcare sector are encouraged in creating and using data systems and removing fragmentation. As noted elsewhere [3], “Without these partnerships, timely and comprehensive health information would not be available from private organizations and their patient populations to answer pressing health services and policy research questions”.

  8. h.

    In Western countries, where the national PIN was implemented early, its importance was immediately recognized and has been appreciated even more over time. In contrast, the majority of SSA countries have functioned without a PIN. For stakeholders to appreciate the importance of healthcare record linkage and of using the PIN, training frontline providers in collecting patient-level data is paramount [9]. The educational curriculum in medicine, nursing, and pharmacy, amongst others, should put a strong emphasis on the collection and utility of quality healthcare data beyond patient care. This would encourage these crucial stakeholders to appreciate its importance and motivate them to do their part.
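As a purely illustrative sketch of how recommendations (b) and (d) could fit together, the Python example below shows a data custodian replacing the PIN with a keyed hash (a pseudonym) before two datasets are linked deterministically on that pseudonym. It is not a method described in the cited literature: the PIN format, the field names (`pin`, `pid`, `diagnosis`, `household_size`), the key `CUSTODIAN_KEY`, and all records are hypothetical assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (assumption for this sketch).
CUSTODIAN_KEY = b"replace-with-a-securely-stored-secret"


def pseudonymize(pin: str, key: bytes = CUSTODIAN_KEY) -> str:
    """Return a keyed hash of a PIN so the raw PIN is not shared with researchers."""
    normalized = pin.strip().upper()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare(records: list[dict]) -> list[dict]:
    """Replace the 'pin' field with a 'pid' pseudonym; skip records lacking a PIN."""
    prepared = []
    for record in records:
        pin = record.get("pin")
        if not pin:
            continue  # no common identifier, so exact-match linkage is not possible
        cleaned = {k: v for k, v in record.items() if k != "pin"}
        cleaned["pid"] = pseudonymize(pin)
        prepared.append(cleaned)
    return prepared


def link(left: list[dict], right: list[dict]) -> list[dict]:
    """Deterministic (exact-match) linkage of two pseudonymized datasets on 'pid'."""
    index: dict[str, list[dict]] = {}
    for row in right:
        index.setdefault(row["pid"], []).append(row)
    linked = []
    for row in left:
        for match in index.get(row["pid"], []):
            linked.append({**match, **row})
    return linked


# Made-up records for illustration only.
facility_visits = [
    {"pin": "19900101-0001", "diagnosis": "malaria"},
    {"pin": " 19851212-0002 ", "diagnosis": "hypertension"},
    {"pin": None, "diagnosis": "tuberculosis"},
]
survey_responses = [
    {"pin": "19900101-0001", "household_size": 5},
    {"pin": "19700303-0003", "household_size": 2},
]

linked_rows = link(prepare(facility_visits), prepare(survey_responses))
print(len(linked_rows))
print(sorted(linked_rows[0].keys()))
```

In this sketch the record without a PIN drops out before linkage, which mirrors the point that a missing common identifier prevents exact matching. A keyed hash is used rather than a plain hash because short, structured identifiers could otherwise be recovered by hashing every possible value; key management, legal safeguards, and any fallback to probabilistic matching are outside what the example covers.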

3 Conclusion

High-quality linked data in SSA are scarce, though they constitute a rich source of information for answering policy-related, administrative, clinical, and research questions. SSA countries should prioritize establishing a robust foundation for high-quality data collection and future linkage.