Key Points

Artificial intelligence (AI) algorithms can process and analyze pharmacovigilance-related data, but they must first be trained on sufficient quantities of high-quality data; obtaining such data is the fundamental issue to be addressed.

The main challenges for AI-based pharmacovigilance in resource-limited settings are a lack of high-quality databases, insufficient human resources, weak AI technology, and insufficient government support.

Implementing AI-based pharmacovigilance detection, improving training and education, and informing governments of the benefits of AI-based pharmacovigilance can help address these challenges in resource-limited settings.

A collaborative research network, pharmacogenomic research and practice, and advanced machine-learning algorithms could improve AI-based pharmacovigilance in resource-limited settings in the future, but the particular local contexts must be considered.

1 Introduction

Pharmacovigilance (PV) aims to reduce the incidence and severity of adverse effects by collecting, monitoring, researching, assessing, and evaluating relevant information [1]. It plays a significant role in improving clinical care, drug regulation, and public health, and in preventing potential harms from approved medicinal products [2].

In low- and middle-income countries (LMICs), PV is constrained by limited resources in several ways. Human resources are insufficient, and healthcare professionals (HCPs) carry heavy workloads. In China, for example, HCPs in the outpatient departments of large hospitals reportedly see around 100 patients per day [3], leaving little time for filling out individual case safety reports (ICSRs). Furthermore, the electronic health record (EHR) system or PV system may offer HCPs little assistance: one study reported that it took a well-trained HCP an average of 53 seconds to report an adverse drug event (ADE) within the EHR system [4]. Owing to a lack of training opportunities and investment in education, a large number of adverse drug reactions (ADRs) go unreported; one study estimated that up to 90% of ADEs in clinical practice could be unreported [5]. Underreporting and selective reporting cause sampling variance and reporting bias [6]. Real-world data sources, such as EHRs and insurance claims databases, are also lacking, which makes it difficult to estimate the true risks of medicine use [7]. Underlying all of these issues is a lack of funding.

Nevertheless, PV in LMICs is growing, although much progress remains to be made in resource-limited settings. Over the past decades, many LMICs have created national PV systems and joined the WHO's global PV network. Cohort event monitoring has increased substantially and continues to be used for post-marketing surveillance [8, 9]. With the rapid development of artificial intelligence (AI) technologies, automated processes have been widely adopted in various fields of medicine [10, 11]. Moreover, the PV community is commendably open to technology, and a variety of software, tutorials, and recent technological advances are available from open sources. This offers LMICs a unique chance to use AI to improve many aspects of health care. For example, in Africa, AI technology has been applied to improve the diagnosis of birth asphyxia in low-resource settings and to assist in the diagnosis of diabetic retinopathy, tuberculosis, and other conditions [12]. Another recent example comes from the COVID-19 pandemic: in South America, mobile applications and web platforms used AI algorithms (e.g., decision trees) to analyze users' symptoms and provide specific advice related to COVID-19 [13]. Machine learning, deep learning, natural language processing (NLP) [14], and other AI technologies have been adopted to improve PV systems [15, 16]. These technologies have been used for automated, high-throughput processing and analysis of PV-related information [17], such as detecting and extracting adverse events from unstructured text using NLP [18,19,20] and detecting potential PV signals in large databases using unsupervised Bayesian methods [21].
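As a concrete illustration of signal detection in a large spontaneous-report database, the sketch below computes two disproportionality measures from a 2×2 table of report counts: the proportional reporting ratio (PRR) and a simplified shrinkage information component (IC), log2((observed + 0.5)/(expected + 0.5)), a Bayesian-motivated observed-to-expected measure. It is a minimal sketch on assumed, made-up counts and is not the specific method of [21]; operational systems also compute credibility intervals and apply signalling thresholds.

```python
import math
from dataclasses import dataclass


@dataclass
class DrugEventTable:
    """2x2 report counts for one drug-event pair (illustrative numbers)."""
    a: int  # reports mentioning the drug and the event
    b: int  # reports mentioning the drug but not the event
    c: int  # reports mentioning the event with other drugs
    d: int  # all remaining reports

    @property
    def total(self) -> int:
        return self.a + self.b + self.c + self.d


def prr(t: DrugEventTable) -> float:
    """Proportional reporting ratio: share of the event among the drug's
    reports divided by its share among all other reports."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def shrunk_ic(t: DrugEventTable) -> float:
    """Simplified information component: log2 of observed-to-expected
    counts, with +0.5 added to both to shrink small counts toward zero."""
    expected = (t.a + t.b) * (t.a + t.c) / t.total
    return math.log2((t.a + 0.5) / (expected + 0.5))


table = DrugEventTable(a=20, b=980, c=200, d=98_800)
print(round(prr(table), 2), round(shrunk_ic(table), 2))
```

For these made-up counts, the event makes up about 9.9 times as large a share of the drug's reports as of other reports (PRR), and the shrunk IC is roughly 2.9. The +0.5 terms pull the IC toward zero when counts are small, which damps spurious signals from rare drug-event pairs.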

In this paper, we summarize the challenges for AI-based PV in resource-limited settings from the perspectives of systems, human resources, technology, and government support, and propose possible solutions. We also discuss future prospects for AI-based PV. The key points of AI-based PV in resource-limited settings are summarized in Fig. 1.

Fig. 1

The key points of artificial intelligence (AI)-based pharmacovigilance in resource-limited settings. EHR electronic health records

2 Challenges of Artificial Intelligence (AI)-Based Pharmacovigilance in Resource-Limited Settings

2.1 AI-Assisted Reporting for Pharmacovigilance (PV) Database Establishment

Data is the key to AI technology, so establishing a comprehensive PV database is essential. Every country faces a unique situation when establishing its PV database. PV is usually initiated by HCPs, starting with spontaneous reporting of ICSRs. Several large-scale PV databases and systems have been built in both developed and developing countries, such as the Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) and the Vaccine Adverse Event Reporting System (VAERS) in the United States, the pharmacovigilance database in France, China's pharmacovigilance system, and VigiBase. VigiBase is maintained by the Uppsala Monitoring Centre and contains data contributed by more than 150 countries. For LMICs, however, the main issue when establishing a PV database is underreporting. This results from a combination of factors, including poor reporting infrastructure, low financial support, and a lack of human resources and relevant policies. The following figures illustrate the situation: Western countries (the United States, countries in the European Union, etc.) have contributed approximately 70% of the data in VigiBase, whereas Africa contributed only 0.9% of VigiBase ICSRs in 2019. The numbers of ICSRs received by the National Medicines Regulatory Authorities in Kenya, Ethiopia, and Tanzania (mainland) were 35.0, 6.7, and 4.1 per million inhabitants, respectively [22]. Alongside underreporting by HCPs, self-medication accounts for a large proportion of medicine use in LMICs, particularly in rural and remote areas, and such use may not appear in any medical record. Hence, using AI to help increase ADE reporting rates, for example by extracting unreported ADEs recorded in the EHR [23], should be a main priority in resource-limited settings.
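One simple way to picture AI-assisted extraction of unreported ADEs from the EHR is a rule-based screen over clinical notes: flag notes that mention a drug and a non-negated adverse-event term but have no corresponding ICSR. The lexicons, negation cues, three-token window, and note identifiers below are hypothetical assumptions; a real system would rely on curated vocabularies and trained NLP models rather than a keyword list.

```python
import re

# Hypothetical miniature lexicons; a real system would use curated vocabularies.
DRUG_TERMS = {"amoxicillin", "carbamazepine", "metformin"}
ADE_TERMS = {"rash", "diarrhea", "nausea", "hepatitis"}
NEGATION_CUES = {"no", "denies", "without", "negative"}


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def ade_mentions(note: str, window: int = 3) -> set[str]:
    """ADE terms not preceded by a negation cue within `window` tokens."""
    tokens = tokenize(note)
    found = set()
    for i, tok in enumerate(tokens):
        if tok in ADE_TERMS and not NEGATION_CUES.intersection(tokens[max(0, i - window):i]):
            found.add(tok)
    return found


def candidate_unreported(notes: dict[str, str], reported: set[str]) -> dict[str, set[str]]:
    """Notes that mention a drug and a non-negated ADE but have no ICSR yet."""
    flagged = {}
    for note_id, text in notes.items():
        if note_id in reported or not DRUG_TERMS.intersection(tokenize(text)):
            continue
        ades = ade_mentions(text)
        if ades:
            flagged[note_id] = ades
    return flagged


notes = {
    "n1": "Started amoxicillin 3 days ago, now has a generalized rash.",
    "n2": "On metformin, denies nausea or diarrhea.",
    "n3": "Carbamazepine started; rash noted.",
}
print(candidate_unreported(notes, reported={"n3"}))
```

In this toy run only n1 is flagged: both ADE terms in n2 fall within three tokens of "denies", and n3 already has a report. Flagged notes would go to a pharmacovigilance officer for review rather than being submitted automatically.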

2.2 Human Resources Challenges: Training and Education

Inadequate training opportunities and education lead to a shortage of HCPs for PV in LMICs. A traditional PV system employs professionals in four areas: operations, surveillance, systems, and the Qualified Person for Pharmacovigilance (QPPV). In addition to the drug safety supervisors, drug safety physicians, data/system security administrators, and QPPVs needed for traditional PV, AI-based PV also requires experts in various fields of AI, such as engineers who develop NLP and machine-learning algorithms. This is another major challenge for LMICs. Training for AI-based PV is cross-specialty, even cross-language, and is offered in only a small number of countries, few of which are LMICs. Although AI technologies have been put into use for PV [1], there are currently few dedicated courses on AI-based PV, even globally, and HCPs from LMICs may lack the financial support to attend them. General training in AI technologies such as deep learning, NLP, and data mining can be obtained by HCPs in developed countries, but such opportunities are scarce in LMICs. This type of training benefits AI-based PV because these advanced techniques support both data processing and data analysis.

A shortage of professionals for AI-based PV is common in LMICs. Education is one way to increase the number of experts in both PV and AI. The most important educational step would be to introduce AI-based PV courses for undergraduates in related majors. Education in AI-based PV could not only cultivate more interdisciplinary professionals but also raise awareness of PV data. China offers an example of insufficient education: although it has established an online spontaneous ADR self-reporting and monitoring system in recent years [24], a study of PV in Western China found an extremely low ADR reporting rate because of unclear feedback pathways and a poor understanding of the seriousness of ADRs [25]. For AI education in LMICs, increasing higher-education enrollment is the main challenge. The interdisciplinary nature of AI-based PV also complicates AI education, because AI engineers need to be familiar with medical knowledge.

2.3 Technological Challenges

Key technological challenges for AI-based PV in resource-limited settings include data integration, data annotation, and term variation.

Each medical institution uses its own data model, which may differ from those of others, and each data model may be unique, with differing terminologies and value representations [26]. Data from different sources must therefore be integrated and converted into a format compatible with a common data model (CDM) shared among institutions. This requires both HCPs and engineers to spend time and effort transforming the data. In LMICs, language can be an additional barrier to data transformation, because data may be recorded in several local languages and with differing local expressions.
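The conversion into a CDM-compatible format can be sketched as one mapping function per institution, each emitting the same shared record type. The record fields, local codes, and date formats below are hypothetical and far simpler than a real CDM such as OMOP; the Chinese drug name illustrates how local-language entries need their own mapping table.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DrugExposure:
    """Hypothetical minimal shared (CDM-style) record."""
    person_id: str
    drug: str          # shared drug concept
    start_date: date
    source: str


# Hypothetical per-institution mappings from local entries to shared concepts.
HOSPITAL_A_CODES = {"AMX500": "amoxicillin"}
HOSPITAL_B_NAMES = {"阿莫西林": "amoxicillin"}  # local-language drug name


def from_hospital_a(row: dict) -> DrugExposure:
    # Hospital A: local drug codes, ISO dates (YYYY-MM-DD)
    return DrugExposure(
        person_id=f"A:{row['pid']}",  # site prefix avoids ID collisions across sites
        drug=HOSPITAL_A_CODES[row["drug_code"]],
        start_date=datetime.strptime(row["start"], "%Y-%m-%d").date(),
        source="hospital_a",
    )


def from_hospital_b(row: dict) -> DrugExposure:
    # Hospital B: local-language drug names, DD/MM/YYYY dates
    return DrugExposure(
        person_id=f"B:{row['patient']}",
        drug=HOSPITAL_B_NAMES[row["drug_name"]],
        start_date=datetime.strptime(row["start"], "%d/%m/%Y").date(),
        source="hospital_b",
    )


a = from_hospital_a({"pid": "001", "drug_code": "AMX500", "start": "2021-03-05"})
b = from_hospital_b({"patient": "17", "drug_name": "阿莫西林", "start": "05/03/2021"})
print(a)
print(b)
```

Both source rows end up as the same drug concept and the same date. An unmapped local entry raises `KeyError`, which surfaces gaps in the mapping tables instead of silently dropping data; this is where the time and effort of HCPs and engineers is typically spent.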

Data annotation is important for AI-based PV because algorithms learn from annotated data to make predictions on unannotated data. The lack of high-quality training data and the high cost of annotation restrict research on AI-based PV. In structured databases, potential PV signals are not annotated, even though these databases contain large numbers of ICSRs. Additionally, unstructured data, such as clinical notes, medical records, biomedical literature, and social media posts, contain richer PV information than structured data. One study found that only 28.6% of adverse reactions to statins were recorded in a structured format in the hospital information system (HIS), while the rest were recorded in unstructured clinical narratives [27].
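Annotations for training named entity recognition models are often stored as character-offset spans and then converted to token-level BIO tags (B = beginning, I = inside, O = outside an entity). The sketch below assumes a simple regex tokenizer and hypothetical labels ADE and DRUG; it is one common preprocessing step, not a prescribed annotation scheme.

```python
import re


def spans_to_bio(text: str, spans: list[tuple[int, int, str]]) -> list[tuple[str, str]]:
    """Turn character-offset annotations (start, end, label) into token-level
    BIO tags. Tokens only partly covered by a span are tagged "O" here."""
    tagged = []
    for match in re.finditer(r"\w+|[^\w\s]", text):  # words and single punctuation marks
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end, label in spans:
            if start <= tok_start and tok_end <= end:
                tag = ("B-" if tok_start == start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


text = "Severe skin rash after carbamazepine."
spans = [(7, 16, "ADE"), (23, 36, "DRUG")]
for token, tag in spans_to_bio(text, spans):
    print(token, tag)
```

Because annotation is costly, a careful pipeline would report tokens that straddle a span boundary rather than silently tagging them "O" as this sketch does.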

The biggest technological challenge for AI-based PV, however, is term variation. Terms (e.g., the names of drugs and diseases) are often informal and highly variable, descriptions of side effects may be unclear, and poor grammar and spelling mistakes are common [28, 29]. In addition, the diversity of local languages complicates data processing, integration, and normalization. Concept normalization helps to solve this issue: it improves data quality, reduces dimensionality, increases retrieval recall, and enables the integration of various data sources. However, concept normalization is an intensive and tedious task. Sources such as social media [30], medical records [31], and biomedical literature [32] contain large numbers of aliases, abbreviations, and informal terms that require concept normalization to achieve entity alignment. Linguistic diversity in LMICs and the lack of mappings between local languages and standard vocabularies remain major gaps, which hinder the transfer of mature technology platforms from developed countries.
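A minimal sketch of concept normalization: look up a case-folded, whitespace-collapsed term in a synonym table and, failing that, fall back to fuzzy string matching with Python's standard `difflib` to tolerate misspellings. The synonym table, preferred terms, and 0.8 similarity cutoff are illustrative assumptions; a real system would map to codes in a standard vocabulary such as MedDRA and would need local-language synonyms.

```python
import difflib
from typing import Optional

# Hypothetical synonym table: informal variants -> preferred term.
SYNONYMS = {
    "headache": "Headache",
    "head ache": "Headache",
    "sore head": "Headache",
    "nausea": "Nausea",
    "feeling sick": "Nausea",
    "rash": "Rash",
    "skin rash": "Rash",
}


def normalize(term: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a free-text term to a preferred term, or None if nothing is close."""
    key = " ".join(term.lower().split())  # case-fold, collapse whitespace
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Fuzzy fallback for misspellings, e.g. "nausia" -> "nausea"
    close = difflib.get_close_matches(key, list(SYNONYMS), n=1, cutoff=cutoff)
    return SYNONYMS[close[0]] if close else None


for raw in ["Sore  Head", "nausia", "dizziness"]:
    print(raw, "->", normalize(raw))
```

Returning `None` for unmatched terms such as "dizziness" makes gaps in the synonym table visible, so they can be added to the mapping rather than guessed.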

2.4 Regulations and Funding for Pharmacovigilance

Government plays a critical role in decision making for AI-based PV. In most LMICs, PV systems are young and not underpinned by strong legal or regulatory provisions, and thus require stronger regulations to engage local industry and HCPs [33]. Although many of these countries have developed PV guidelines, the regulatory basis for the PV system is sometimes too weak for the authority to enforce regulatory actions, even in resource-limited countries with relatively well-functioning PV systems [33]. Other regulations relevant to AI-based PV concern data security and privacy protection. These are vital because AI technologies need access to large amounts of personal information, and leaks of this information could have severe consequences.

Lack of financial support is another challenge. Funding is needed to maintain existing systems, improve information infrastructure, remunerate HCPs and AI engineers, and support education. Many countries have set up PV systems, but without a sustainable budget, a system will not work well in the long term. In this regard, China and India are good examples of countries where financial support has led to sustainable PV systems [33]. In addition, the development of AI requires financial support for both hardware and software. Hardware provides the platform on which software runs, while AI software delivers the system's functionality. This can be a major expense, as equipment is often lacking in LMICs. Governments therefore need to issue relevant policies that guarantee the budget and support the education of HCPs and AI engineers.

3 Recommended Solutions

3.1 Electronic Health Record (EHR) Improvement and Establishing a Database for an AI-Based Pharmacovigilance System

Improving EHRs helps to store and manage clinical data. Integration of EHRs into the health care systems of LMICs is still at a nascent stage [34]. In addition to issuing relevant policies to ensure the implementation of EHRs in the health care system, improvements should focus on workforce training, system quality, and integrity. Critical factors include patient identification, standards for data exchange, education and training, storage of EHRs, and quality assurance. Important information in EHRs, such as laboratory results, medical records, and medication orders, can be integrated and used for AI-based PV modeling. Liu et al. reported that medication orders and inpatient laboratory test results helped to validate and identify ADRs [35]. In the MADE1.0 challenge [36], researchers developed algorithms for drug safety surveillance in unstructured clinical notes and achieved micro-averaged F1 scores ranging from 0.826 to 0.868 [37, 38]. The performance of AI-based PV detection can be further improved by integration with external databases [39]. AI algorithms embedded in the EHR system can actively monitor PV signals, send reminders, and fill in forms automatically.
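Micro-averaged F1, the metric quoted for MADE1.0, pools true positives, false positives, and false negatives across all entity types before computing precision and recall, so frequent entity types dominate the score. The sketch below contrasts it with macro-averaged F1 on hypothetical counts; these are illustrative numbers, not MADE1.0 results.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when precision or recall is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def micro_f1(counts: dict[str, tuple[int, int, int]]) -> float:
    """Pool (tp, fp, fn) over all entity types, then compute one F1."""
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    return f1(tp, fp, fn)


def macro_f1(counts: dict[str, tuple[int, int, int]]) -> float:
    """Average the per-type F1 scores, weighting each type equally."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


# Hypothetical evaluation counts: (true positives, false positives, false negatives)
counts = {"Drug": (90, 10, 10), "ADE": (40, 20, 20)}
print(round(micro_f1(counts), 4), round(macro_f1(counts), 4))
```

Here micro F1 is 0.8125 while macro F1 is about 0.78, because the weaker but less frequent ADE class counts fully in the macro average; reported micro F1 scores should be read with this weighting in mind.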

In addition to EHRs, biomedical literature and social networks are other sources of PV information. As complementary data, PV signals from these sources have proven valuable in public health [28], and enriching the database in this way benefits AI-based PV by reducing biases caused by incomplete data. Compared with countries that have complete PV systems, such multi-source information may be even more important in LMICs, because it can compensate for incomplete PV systems and provide useful information to local HCPs for prescribing. Because the literature is written in natural language and published in databases for retrieval, integrating it requires NLP-related technologies such as named entity recognition, relation extraction, and concept normalization [40]. Similarly, PV information shared on social media platforms such as Facebook and Twitter, and on other websites, is also unstructured text and needs to be processed as a complementary source to formal ADE reporting in LMICs. With NLP, this information could also be collected into the database and used for AI-based PV.
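After named entity recognition, relation extraction typically starts from candidate drug-ADE pairs. The sketch below generates candidates by sentence-level co-occurrence, a common baseline; the entity tuples are hypothetical NER output, and in practice a trained classifier would decide which candidates actually express an adverse-reaction relation.

```python
from collections import defaultdict
from itertools import product


def candidate_pairs(entities: list[tuple[int, str, str]]) -> list[tuple[int, str, str]]:
    """Pair every DRUG with every ADE mentioned in the same sentence.

    `entities` holds (sentence_id, entity_type, text) tuples from an NER step.
    """
    by_sentence: dict[int, dict[str, list[str]]] = defaultdict(lambda: {"DRUG": [], "ADE": []})
    for sent_id, etype, text in entities:
        if etype in ("DRUG", "ADE"):
            by_sentence[sent_id][etype].append(text)
    pairs = []
    for sent_id in sorted(by_sentence):
        groups = by_sentence[sent_id]
        pairs.extend((sent_id, drug, ade) for drug, ade in product(groups["DRUG"], groups["ADE"]))
    return pairs


ner_output = [
    (0, "DRUG", "warfarin"), (0, "ADE", "gastrointestinal bleeding"),
    (1, "DRUG", "aspirin"),
    (2, "ADE", "headache"),
]
print(candidate_pairs(ner_output))
```

Restricting candidates to the same sentence keeps the number of pairs manageable but misses relations stated across sentences, a known limitation of this baseline.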

3.2 Offering Training Opportunities and Courses in Education Institutions

The early development of national PV projects largely depends on a small number of trained professionals and their ability to inspire and train others. It is therefore important to integrate PV into the curricula of health training institutions; India, for example, has mandated that all medical schools include PV training in undergraduate courses [33]. We suggest setting up AI-based PV classes at graduate schools of computational medicine or biomedical informatics. All LMICs should consider this, as it would strengthen the capability of informaticians and programmers to apply AI in the PV domain and improve data-reporting practice, an ongoing process in many countries and regions. Furthermore, training in PV practices based on machine learning, information retrieval, knowledge organization, and related fields can benefit resource-limited countries. Because AI-based PV is interdisciplinary, it is also necessary to train multidisciplinary professionals, although such training is still at an exploratory stage.

3.3 Implementation of AI Technologies

As computers and network technologies rapidly become available in healthcare systems, the volume of medical data is growing. Although developing AI technologies for LMICs requires greater investment and infrastructure, it offers a great opportunity, now and in the future, to catch up with the big data era and leverage the latest technology. For data processing, the development of medical vocabularies that are interoperable among countries, such as SNOMED CT, LOINC, RxNorm, and MedDRA, makes concept standardization feasible in pharmacovigilance, which in turn helps improve the accuracy and recall of side-effect signal detection systems. In resource-limited settings, mapping local nonstandard terms to standard vocabularies will greatly improve data quality and promote international PV research and practice.

3.4 Government Support

Governments can guide and regulate the development of AI-based PV through funding, talent policies, laws, regulations, and other measures. In LMICs, establishing AI-based PV systems faces many obstacles, such as a lack of financial, material, and human resources. It is therefore important to make governments aware of the benefits and cost effectiveness of such systems. For instance, AI can reduce the human workload in data reporting and mining and help analyze vast amounts of medical data, enabling predictive and preventive healthcare. Once these benefits are recognized, regulations should be issued to (i) increase efforts in AI-based PV development, including funding, training and education, infrastructure, and technology; (ii) ensure the operation of the system; (iii) supervise the enforcement of the regulations; and (iv) protect data security and personal information.

4 Future Perspectives

In general, AI-based technologies can automate or facilitate many aspects of PV, including case processing, reporting, and risk tracking, thereby reducing total processing time [41, 42]. We propose four potential future directions for AI-based PV in resource-limited settings.

4.1 Multinational Collaboration for Data Sharing and Validation

Multinational collaboration is a feasible mechanism for collecting data to build an AI-based PV platform for LMICs. Under this mechanism, LMICs share and verify data with each other. The pooled data become richer and more diverse, covering multiple regions and improving representativeness across geographies and subpopulations. On this basis, the AI-based PV platform can analyze, discover, and validate PV signals across countries.

In some LMICs, such as those in the Economic Community of West African States, PV is part of regional or subregional initiatives involving economic communities: PV centers are first established in one or two countries and are then used by the others [33]. In this situation, a CDM is an important part of multinational collaboration. Based on a CDM, countries can integrate data from different sources, share standardized data, and analyze data with the same tools. Countries should choose or design a CDM according to their own circumstances. Rivera et al. [43] proposed five key areas and 12 consensus recommendations for designing sustainable linked data resources to generate actionable evidence in healthcare research. VigiFlow allows LMICs to maintain a database at low cost and helps them report ICSRs in compliance with international standards so that new safety signals can be identified early [33]. Various tools and systems exist for data exchange, such as OHDSI [44], the FDA's Sentinel System [45], CNDOES, and ADVANCE [46]. Of these, OHDSI provides vocabularies and software for the full process of collaborative research, from data conversion to data analysis, and maintains an active community of HCPs, researchers, engineers, and enterprises. These experiences can benefit multinational collaboration and are worth learning from.
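A CDM lets every participating site run the same analysis code locally and share only aggregate results. The sketch below assumes a hypothetical minimal report format with one standardized drug and event concept per report: each site returns a 2×2 count table for a drug-event pair, and a coordinating centre sums the tables, which could then feed a disproportionality analysis. It illustrates the pattern only and is not the interface of OHDSI, Sentinel, or VigiFlow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical CDM-conformant report with standardized concepts."""
    drug: str
    event: str


def local_counts(reports: list[Report], drug: str, event: str) -> tuple[int, int, int, int]:
    """Run at each site; only these aggregate counts leave the site."""
    a = sum(r.drug == drug and r.event == event for r in reports)
    b = sum(r.drug == drug and r.event != event for r in reports)
    c = sum(r.drug != drug and r.event == event for r in reports)
    d = len(reports) - a - b - c
    return a, b, c, d


def pool(tables: list[tuple[int, int, int, int]]) -> tuple[int, ...]:
    """Run by the coordinating centre: cell-wise sum of the site tables."""
    return tuple(sum(cells) for cells in zip(*tables))


site_x = [Report("amoxicillin", "rash"), Report("amoxicillin", "nausea"), Report("metformin", "rash")]
site_y = [Report("amoxicillin", "rash"), Report("metformin", "nausea")]
tables = [local_counts(s, "amoxicillin", "rash") for s in (site_x, site_y)]
print(pool(tables))
```

Summing tables is only meaningful because both sites use the same drug and event concepts; without a shared CDM and vocabulary, "amoxicillin" at one site and a local code at another would be counted as different drugs.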

Additionally, databases from developed countries, such as FAERS and VAERS, may be leveraged to inform PV in resource-limited settings. On the one hand, LMICs are still accumulating post-marketing evaluation data for drugs, biologics, vaccines, etc., so being informed of PV signals from these databases can help improve drug safety and protect public health. On the other hand, integrating these databases with population data allows analysis of geographic correlations between race and ADR reporting rates in developed countries. This may uncover potential racial differences in ADRs and provide useful information for LMICs whose populations share the same racial backgrounds [47].

4.2 Integration of Multiple Data Sources to Discover and Verify PV Signals

Spontaneous reports from HCPs and consumers have limitations, such as bias and underreporting. Studies have already used AI algorithms to detect PV signals from other data sources, such as biomedical literature [48, 49], knowledge graphs [50], and genetic data [51]. These data sources are complementary, observing drug safety from different perspectives.

Mower et al. [52] pointed out that using biomedical literature and spontaneous reports together performed better than using either source alone for drug side-effect prediction. Zheng and Xu [53] constructed a disease comorbidity network from FAERS and found that it correlated with the human genetic network and the disease treatment network.

Some ADRs are related to inherited genomic characteristics. By integrating pharmacogenomic tests with decision support systems based on these data, PV signals could more accurately reflect an individual's risk of potential drug side effects [54]. In resource-limited settings, if certain biomarkers are known to be especially prevalent, the associated ADRs can be anticipated and prevented in advance.

Two typical examples are HLA-B*1502 and HLA-B*5801. Carriers of HLA-B*1502 and HLA-B*5801 have an increased risk of two severe cutaneous adverse reactions, Stevens-Johnson syndrome and toxic epidermal necrolysis, when treated with carbamazepine and allopurinol, respectively, and both biomarkers are relatively common in many Asian populations [55, 56].
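Pharmacogenomic decision support of this kind can be sketched as a lookup from drug to high-risk allele, checked against a patient's genotype at prescribing time. The two drug-allele pairs come from the text; the function name, messages, and input format are hypothetical, and this is an illustration, not a clinical rule set.

```python
from typing import Optional

# Drug -> allele pairs named in the text; illustrative only, not a clinical rule set.
HIGH_RISK_ALLELES = {
    "carbamazepine": "HLA-B*1502",
    "allopurinol": "HLA-B*5801",
}


def pgx_check(drug: str, genotype: Optional[set[str]]) -> str:
    """Return a hypothetical decision-support message for one prescription.

    `genotype` is the set of alleles found by testing, or None if untested.
    """
    allele = HIGH_RISK_ALLELES.get(drug.lower())
    if allele is None:
        return "no rule for this drug"
    if genotype is None:
        return f"consider {allele} testing before prescribing {drug}"
    if allele in genotype:
        return f"ALERT: {allele} carrier; elevated SJS/TEN risk with {drug}"
    return "no alert"


print(pgx_check("Carbamazepine", {"HLA-B*1502"}))
print(pgx_check("allopurinol", None))
```

The untested branch matters most in resource-limited settings: where an allele is known to be prevalent locally, the rule can prompt targeted testing instead of relying on genotypes that are rarely available.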

Moreover, Internet of Medical Things (IoMT) devices can measure a patient's heart rate, blood pressure, temperature, blood glucose, and other parameters [13]. This allows researchers to analyze changes in health parameters during ADRs and helps to determine their cause [57].

4.3 Expansion to Special Populations and the Dynamically Evolving Landscape

An important issue in resource-limited settings is that special populations, such as women, children, and indigenous populations, are underrepresented in previous data sources and evidence systems. Furthermore, with the development of new drugs and shifts in population characteristics, resource-limited settings and their populations may change over time.

Both issues can be framed as the need to continuously increase population coverage. Reactive learning algorithms can handle such a dynamically evolving landscape: they select the most valuable samples from large volumes of new data using similarity measures and sampling strategies. Such algorithms can cover more diverse populations with fewer samples, which also means lower costs, and can keep adapting as the landscape evolves [58]. For example, during the COVID-19 pandemic, investigators in Greece used a reinforcement-learning system for targeted data collection to maximize the detection of infected asymptomatic travelers. Compared with random sampling, the system improved testing efficiency by up to 2–4 times during peak travel [59].
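The text does not specify which sampling strategy [58] uses. As one illustration of selecting diverse samples with a similarity (distance) measure, the sketch below uses greedy farthest-point (k-center) selection: each step picks the record farthest from everything already selected. The feature vectors are hypothetical (e.g., standardized age and location), and this is neither the algorithm of [58] nor the reinforcement-learning system of [59].

```python
import math


def farthest_point_sample(points: list[tuple[float, ...]], k: int, seed: int = 0) -> list[int]:
    """Greedy k-center selection: repeatedly pick the point whose distance to
    its nearest already-selected point is largest. Returns point indices."""
    selected = [seed]
    # distance from each point to its nearest selected point
    nearest = [math.dist(p, points[seed]) for p in points]
    while len(selected) < min(k, len(points)):
        nxt = max(range(len(points)), key=nearest.__getitem__)
        if nearest[nxt] == 0.0:
            break  # every remaining point duplicates a selected one
        selected.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return selected


# Hypothetical 2-D feature vectors forming three clusters.
pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.0, 0.1), (0.0, 10.0)]
print(farthest_point_sample(pts, 3))
```

With three picks, the procedure takes one point from each of the three clusters instead of two near-duplicates, which is the sense in which diversity-aware sampling can cover more of a population with fewer samples.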

4.4 Routine, Affordable and Sustainable Regulatory Monitoring

In resource-limited settings, where multinational companies, mainly based in developed countries, make insufficient self-driven effort, a routine regulatory requirement enforced by the government is essential to promote effective monitoring of drug side effects. Cost-effectiveness analyses of pilot projects will help evaluate the affordability of such technologies [60]. Multi-stakeholder involvement is a feasible way to maintain a sustainable AI-based PV system at the national level. Government, pharmaceutical companies, high-tech IT companies, hospitals, and patient advocacy groups are all important stakeholders that can contribute different perspectives.

It is important to consider the particular contexts of resource-limited settings when developing and implementing AI-based technologies for pharmacovigilance. For instance, (i) government-driven regulatory monitoring is essential because multinational companies make insufficient self-driven effort; (ii) the required computational resources need adequate financial support, which might be supplemented through strategic collaboration; (iii) data collection needs to be expanded to special populations (women, children, indigenous populations, etc.), who have been underrepresented in previous data sources and evidence systems; (iv) the AI-based PV system should cover the medicines commonly used locally, including both Western drugs and herbal medicines; and (v) manpower is key to the sustainable development of AI-based PV, so education and training in resource-limited settings will play a crucial role.

5 Conclusion

The advantages of an AI-based PV system include automatic detection, an evidence generation network, multi-source data integration, AI algorithm optimization, and concept standardization, which could substantially reduce the human workload and facilitate the development of PV. The challenges for LMICs fall into four categories: establishing a database, lack of human resources, weak AI technology, and less-than-optimal government support. Implementing AI technology can reduce the time-consuming burden of manual case processing. These advanced technologies present an opportunity for LMICs to greatly improve existing PV systems and make PV more comprehensive and affordable.