Surveillance of COVID-19 using a keyword search for symptoms in reports from emergency medical communication centers in Gironde, France: a 15 year retrospective cross-sectional study

A Correction to this article was published on 25 August 2021

This article has been updated

Abstract

During crises such as the COVID-19 pandemic, public health surveillance needs responsive indicators related to the epidemic. Our objective was to determine the performance of a keyword-search algorithm applied to reports of calls to emergency medical communication centers (EMCC) for describing trends in symptoms during the COVID-19 crisis. We retrospectively retrieved all free-text call reports from the EMCC of the Gironde department (SAMU 33), France, between 2005 and 2020 and classified them with a simple keyword-based algorithm to identify symptoms relevant to COVID-19. The algorithm was validated against a sample of manually coded call reports. The six selected symptoms were fever, cough, muscle soreness, dyspnea, ageusia and anosmia. We retrieved 3,808,243 call reports from January 2005 to October 2020. A total of 8539 reports were manually coded for validation, and Cohen's kappa ranged from 59% (dyspnea) to 75% (anosmia). During the COVID-19 epidemic, the number of daily calls mentioning fever, cough, muscle soreness, anosmia, ageusia and dyspnea reached a peak unprecedented in the past 15 years. Calls mentioning cough, fever and muscle soreness began to increase from February 21, 2020. The number of daily calls reporting cough reached 208 on March 3, 2020, a level higher than any in the previous 15 years, and peaked on March 15, 2020, 2 days before lockdown. Calls referring to dyspnea, anosmia and ageusia peaked 12 days later, concomitantly with the daily number of emergency room admissions. Trends in symptoms cited in calls to EMCC during the COVID-19 crisis provide insights into the natural history of COVID-19. The content of calls to EMCC is an efficient epidemiological surveillance data source and should be integrated into the national surveillance system.

Introduction

All available sources of health-related surveillance data should be explored to better understand and predict epidemic outbreaks such as those we are experiencing during the COVID-19 pandemic [1].

The number of people who test positive by polymerase chain reaction (PCR), or whose chest computed tomography (CT) images show characteristic lung damage, is the most reliable indicator of the number of people infected with the virus. It is, however, heavily dependent on the screening strategy, which varies greatly from one country to another. In France, the number of people presenting to emergency rooms (ER) with symptoms suggestive of SARS-CoV-2 infection was monitored nationally from March 16, 2020, through specific coding of the main diagnosis in ER summary reports centralized by Santé Publique France in the OSCOUR® Emergency Department Surveillance Network [2], set up in 2004. The French health authorities also monitor the number of COVID-19 patients hospitalized and admitted to intensive care units, as well as the number of deaths in hospitals and in nursing homes (EHPAD).

Although they are not currently used as monitoring tools, reports of the content of calls to emergency medical communication centers (EMCC) are a source of information worth considering for health surveillance during such a pandemic. A scoping review of the utility of call-based syndromic data for the surveillance of infectious diseases, published in 2019, found few reported examples, although such systems are perceived to provide time gains in outbreak detection [3]. By dialing 15, French citizens can get medical advice and, if necessary, have a mobile medical care unit sent to the scene. Content analysis of these calls may be useful in describing patterns of symptoms in the population during an epidemic. We hypothesized that a keyword-search algorithm for symptoms in free-text call reports is a valid way to perform this task when compared with human classification. If so, this approach could be integrated into the main surveillance system for possible future epidemic waves.

Thus, we assessed keyword-based classification performance compared to human-based classification of clinical reports created from calls to an EMCC in France to monitor trends in symptoms potentially related to COVID-19 during the year 2020. Then, we compared these trends with those of the previous 15 years.

Methods

Setting

The Gironde department (1.6 million inhabitants) is served by a medicalized EMCC known as SAMU 33 (Service d’Aide Médicale Urgente de la Gironde) which answers calls to the French toll-free number dedicated to medical emergencies (the “15”). A call is first received by a medical assistant, and then an emergency physician or a general practitioner (depending on the severity of the case) decides on the appropriate response, from medical advice to the dispatch of an ambulance or a mobile intensive care unit [4].

Clinical reports

For all cases handled, a written clinical report was created and updated by the medical assistant and the physicians, using the various telephone interactions with the patient, family, witnesses, and then with the paramedics if applicable.

Extraction of call reports from the EMCC of the Gironde department, France

Data recorded as a result of a call are stored in the digital medical record system of the EMCC of the University Hospital of Bordeaux. All call clinical reports from January 2005 to October 2020 were retrieved under the supervision of Dr. Cedric Gil-Jardiné and Dr. Catherine Pradeau.

Design and classification procedure

We conducted a retrospective ecological study of these call clinical reports based on the symptoms mentioned during the call. Symptoms cited in call reports were identified using a keyword search. To account for misspellings and syntactic variation, we searched for these symptoms using fuzzy matching based on the Levenshtein distance, defined as the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other [5]. Symptoms associated with a negation were excluded from the search (for example, the keyword search for “fever” excluded the term “no fever”). The symptoms were fever, cough, muscle soreness, dyspnea, ageusia and anosmia. The maximum Levenshtein distance was set at 1 for all symptoms except ageusia (agueusie in French), for which we noticed that the u after the g was very frequently omitted.
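
A minimal sketch of this kind of matching is given below in Python (the authors worked in R). It assumes that reports are split into word tokens, that accents are stripped before comparison, and that a negation is detected when one of a small, hypothetical list of French negation cues (`NEGATIONS`) appears within the two preceding tokens. The actual keyword list, negation rules and tokenization applied to SAMU 33 reports are not described here, and all helper names are illustrative.

```python
import re
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b` (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Lowercase and strip accents so that 'fièvre' and 'fievre' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical negation cues (accent-stripped); not the authors' actual list.
NEGATIONS = {"pas", "sans", "ni", "aucun", "aucune", "absence", "non"}


def mentions(report: str, keyword: str, max_dist: int = 1, window: int = 2) -> bool:
    """True if a token within `max_dist` edits of `keyword` appears in the
    report without a negation cue among the `window` preceding tokens."""
    tokens = re.findall(r"[a-z]+", normalize(report))
    target = normalize(keyword)
    for i, token in enumerate(tokens):
        if levenshtein(token, target) <= max_dist:
            preceding = tokens[max(0, i - window):i]
            if not NEGATIONS.intersection(preceding):
                return True
    return False
```

With `max_dist=1`, a misspelling such as "fievr" still matches "fièvre", whereas "pas de fièvre" is rejected because "pas" precedes the match. Multi-word symptom expressions would need a phrase-level comparison, which this sketch does not attempt.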

Classification using a keyword-based search was performed with R version 4.0.2 using the adist function (utils v3.6.2).
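
R's `adist` computes generalized Levenshtein distances and documents a `partial` argument that compares a pattern with substrings of the target rather than with the whole string; the text does not state which options were used. For readers working outside R, the same idea of approximate substring matching can be sketched in Python with Sellers' variant of the Levenshtein recurrence, in which a match may start and end anywhere in the report at no cost. This is an illustrative re-implementation under that assumption, not the authors' code.

```python
def best_substring_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between `pattern` and any substring
    of `text` (Sellers' algorithm)."""
    # prev[j]: best cost of matching the pattern prefix processed so far
    # against a substring of `text` that ends at position j. The first row
    # is all zeros because a match may start anywhere in the text for free.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        curr = [i]  # pattern[:i] against an empty substring: i deletions
        for j, tc in enumerate(text, start=1):
            curr.append(min(prev[j] + 1,                # pattern char left unmatched
                            curr[j - 1] + 1,            # extra text char inside the match
                            prev[j - 1] + (pc != tc)))  # match or substitution
        prev = curr
    # The match may also end anywhere, so take the minimum over the last row.
    return min(prev)
```

Substring matching catches keywords embedded in longer strings or glued to punctuation, but it is also more permissive: with a threshold of 1, "toux" (cough) matches "tous" ("all"), so short keywords may need a stricter threshold or word-boundary checks.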

Performance assessment

A validation sample was built from a selection of call reports that were manually coded by the authors of the study. To maximize the number of reports mentioning symptoms, we selected calls made from three days before to three days after March 25, 2020, the day with the highest number of calls mentioning ageusia, anosmia and dyspnea. Agreement between keyword-based and manual classifications was assessed using accuracy and Cohen's kappa coefficient.
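
For binary labels (symptom mentioned or not), accuracy is the observed proportion of agreement p_o, and Cohen's kappa corrects it for the agreement p_e expected by chance from each rater's marginal rates: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below computes both for one symptom, treating the manual coding and the keyword search as two raters; the function name and input format are illustrative assumptions.

```python
from typing import Sequence, Tuple


def accuracy_and_kappa(manual: Sequence[bool],
                       keyword: Sequence[bool]) -> Tuple[float, float]:
    """Agreement between manual coding and keyword search for one symptom."""
    if len(manual) != len(keyword) or len(manual) == 0:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(manual)
    # Observed agreement: both positive or both negative (this is the accuracy).
    p_o = sum(m == k for m, k in zip(manual, keyword)) / n
    # Marginal proportion of positive labels for each rater.
    p_m = sum(manual) / n
    p_k = sum(keyword) / n
    # Agreement expected by chance if the two raters labelled independently.
    p_e = p_m * p_k + (1 - p_m) * (1 - p_k)
    if p_e == 1:
        return p_o, float("nan")  # kappa is undefined when both raters are constant
    return p_o, (p_o - p_e) / (1 - p_e)
```

Read this way, a kappa of 59% means that the keyword search closed 59% of the gap between chance-level and perfect agreement with manual coding.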

Emergency room admissions for suspected COVID-19 in the Gironde department, France

We retrieved daily aggregated data on ER admissions for suspected COVID-19 from the “Santé Publique France” Geodes website (https://geodes.santepubliquefrance.fr/). All admission records from the ERs participating in the French OSCOUR® network are routinely transmitted to “Santé Publique France”.

Results

We conducted a retrospective cross-sectional study in the EMCC of Bordeaux University Hospital. We retrieved 3,808,243 call reports from January 2005 to October 2020. The validation sample (Table 1) included 8539 manually coded reports. Cohen's kappa ranged from 59% (dyspnea) to 75% (anosmia), showing that the keyword search yielded results close to those of clinicians reading the reports. The lowest performance was observed for dyspnea, for which the keyword search identified 1941 reports compared with 1516 when manually coded.

Table 1 Comparison between manually coded symptoms and keyword-search results

Plotting the daily number of calls citing the six selected symptoms over the 2005–2020 period (Fig. 1) revealed a distinct signal; the weakest but also most specific signals were those for “anosmia” and “ageusia”. Figure 2 plots the same data for 2020 alone. From January to October 2020, the median daily number of calls to the EMCC with a report was 883, for a total of 280,418, with a peak of 1930 on March 14, 2020. In 2020, 24% of call reports cited at least one of the six selected symptoms. During the peak on March 14, 2020, this proportion was 54%; cough and fever were by far the most prevalent, but a similar peak was found for muscle soreness. On March 3, 2020, the number of calls reporting cough reached 208 per day, higher than any level in the past 15 years, and it peaked on March 15, 2020, 2 days before lockdown. Calls referring to dyspnea, anosmia and ageusia peaked 12 days later, concomitantly with the daily number of emergency room admissions.
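
The daily indicators reported here (median daily number of reports, share of reports citing at least one symptom, peak day for each symptom) can be derived from per-report keyword results with a simple aggregation. The Python sketch below assumes that each report has already been reduced to its call date and the set of symptoms found by the keyword search; the input format and function name are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import median
from typing import Dict, Iterable, Set, Tuple


def daily_summary(reports: Iterable[Tuple[date, Set[str]]]) -> dict:
    """Aggregate per-report keyword hits into daily indicators."""
    calls_per_day: Counter = Counter()
    with_symptom_per_day: Counter = Counter()
    per_symptom: Dict[str, Counter] = defaultdict(Counter)  # symptom -> {day: count}
    for day, symptoms in reports:
        calls_per_day[day] += 1
        if symptoms:
            with_symptom_per_day[day] += 1
        for symptom in symptoms:
            per_symptom[symptom][day] += 1
    total = sum(calls_per_day.values())
    return {
        "total_reports": total,
        "median_daily_reports": median(calls_per_day.values()),
        "share_with_any_symptom": sum(with_symptom_per_day.values()) / total,
        "peak_day": {s: max(days, key=days.get) for s, days in per_symptom.items()},
    }
```

Days with no reports at all do not appear in `calls_per_day`, so a median over a calendar range should first add zero-count days; ties for the peak are resolved in favour of the first date encountered in the input.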

Fig. 1 Trends in symptoms from selected keyword searches from 2005 to 2020

Fig. 2 Trends in symptoms from selected keyword searches in 2020

Discussion

Keyword-based classification of symptoms in EMCC medical reports showed good overall performance. All six symptoms were cited in an unprecedented number of calls during the COVID-19 epidemic, and were present in up to 44% of call reports during the peak of the epidemic on March 14, 2020. The breakdown of calls by symptom during the COVID-19 crisis paralleled the natural history of the disease [6]: cough, fever and muscle soreness came first, followed by dyspnea, ageusia and anosmia. A delay was observed between the rise in calls for flu-like symptoms and the rise in ER visits for suspected COVID-19.

Calls for flu-like symptoms began to rise 20 days before the increase in ER visits. One could hypothesize that the peak of calls recorded around March 14 was due to the concern, if not anxiety, caused by the televised announcement that day, by the French Prime Minister, of the closure of public places. However, in a more affected part of the country, the Ile-de-France region, the peak was reached 10 days earlier [7]. Had the national announcement been the main driver, the peaks would have coincided across regions; this suggests that most of the calls we recorded were motivated more by symptoms than by concerns raised by the authorities' communication. EMCC call content is therefore probably one of the most predictive early indicators of the start of an epidemic, as recently shown by Riou and colleagues, who found in the Ile-de-France region a strong correlation between calls regarding suspected COVID-19 and the number of patients in intensive care, with a delay of 23 days [7]. However, while the number of calls for flu-like symptoms proved to be an early and relevant signal, its intensity was probably increased by the authorities' request that citizens contact the EMCC instead of going directly to the ER.
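
One way to quantify a delay between two daily series, such as calls for flu-like symptoms and ER visits, is to shift one series against the other and keep the lag that maximizes the Pearson correlation. This is offered only as an illustration, not as the method used in this study or by Riou and colleagues. The Python sketch below assumes two gap-free daily series aligned on the same start date; the function names and the `min_overlap` parameter are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation of two equally long series, or None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return None  # correlation is undefined for a constant series
    return cov / (sx * sy)


def best_lag(leading: Sequence[float], lagging: Sequence[float],
             max_lag: int, min_overlap: int = 3) -> Optional[Tuple[int, float]]:
    """Lag in days (0..max_lag) at which leading[t] best correlates with
    lagging[t + lag], together with that correlation."""
    best = None
    for lag in range(max_lag + 1):
        x = leading[: len(leading) - lag]
        y = lagging[lag:]
        n = min(len(x), len(y))
        if n < min_overlap:
            break
        r = pearson(x[:n], y[:n])
        if r is not None and (best is None or r > best[1]):
            best = (lag, r)
    return best
```

Correlation-based lags are sensitive to trends shared by both series, so the series are often smoothed or differenced first; the sketch deliberately omits that step.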

In the context of the COVID-19 epidemic, several research teams have taken a related approach, using internet searches or social media data to build early indicators of the epidemic [8, 9]. However, no such early signal could be found in Google keyword searches [1, 10], as searches for cough, fever, coronavirus or COVID-19 did not peak until the week of 15–21 March.

Generalizability and limitations

Not all calls are handled by the EMCC: a proportion of them remain unanswered, and this proportion increases during peak periods. It is therefore likely that around March 14 the number of attempted calls was higher than the number handled. The study was conducted in Gironde, a department with a reportedly low rate of SARS-CoV-2 infection compared with the Ile-de-France and north-eastern regions of France. However, lockdown and fear of the epidemic affected all French people, and the Gironde EMCC is the third largest in France in terms of the number of calls received, which made it possible to build a sufficiently large database.

Our study spans nearly 16 years, and we cannot rule out that the way reports were written changed over this period. The role of assistants changed in 2008 with a division of tasks: one assistant handles the reception of the call (geographical coordinates, reasons for calling) and another handles the clinical evaluation transmitted by the field emergency services. It is unlikely, however, that this change modified the likelihood of symptom-related keywords occurring. During the COVID-19 crisis, assistants may have asked more systematically about otherwise rare symptoms such as ageusia and anosmia. The clear time lag between the peak of fever and cough and the peak of anosmia/ageusia suggests, however, that their occurrence is not merely the result of systematic questioning of patients with influenza-like symptoms. Ideally, the classification of call reasons would be validated against a final diagnosis made by a practitioner, for example during the ER visit. This proved infeasible because the individual identification number is not recorded in call reports. In addition, ER visits correspond to only a portion of calls.

Conclusion

Given the fast spread and severity of COVID-19, reliable surveillance platforms are crucial for timely monitoring and for responding with adequate control measures to the COVID-19 epidemic, to future epidemics, and to other major events with public health consequences. This work illustrates how content analysis of calls to EMCC can be used to describe the symptoms and signs of a new disease whose natural history was initially little known.

Availability of data and material

For security reasons, the database used in the research cannot be made publicly available. An extraction of the database with a number of selected variables (keyword occurrence) can, however, be made available on reasonable request.

Code availability

Code used for the keyword-search procedure is available as supplementary material.

References

  1. WHO (2020) Coronavirus (COVID-19) events as they happen. https://www.who.int/emergencies/diseases/novel-coronavirus-2019. Accessed 7 July 2021

  2. Josseran L, Fouillet A, Caillère N, Brun-Ney D, Ilef D, Medeiros H (2010) Assessment of a syndromic surveillance system based on morbidity data: results from the Oscour network during a heat wave. PLoS ONE. https://doi.org/10.1371/journal.pone.0011984

  3. Duijster JW, Doreleijers SDA, Pilot E, van der Hoek W, Kommer GJ, van der Sande MAB, Krafft T, van Asten LCHI (2019) Utility of emergency call centre, dispatch and ambulance data for syndromic surveillance of infectious diseases: a scoping review. Eur J Public Health 30(4):639–647. https://doi.org/10.1093/eurpub/ckz177

  4. Adnet F, Lapostolle F (2004) International EMS systems: France. Resuscitation 63(1):7–9. https://doi.org/10.1016/j.resuscitation.2004.04.001

  5. Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys Doklady 10(8):707–710

  6. Richardson S, Hirsch JS, Narasimhan M et al (2020) Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York city area. JAMA 323(20):2052–2059. https://doi.org/10.1001/jama.2020.6775

  7. The COVID-19 APHP-Universities-INRIA-INSERM Group, Riou B (2020) Emergency calls are early indicators of ICU bed requirement during the COVID 19 epidemic. medRxiv. https://doi.org/10.1101/2020.06.02.20117499

  8. Li C, Chen LJ, Chen X, Zhang M, Pang CP, Chen H (2020) Retrospective analysis of the possibility of predicting the COVID-19 outbreak from Internet searches and social media data, China, 2020. Euro Surveill. https://doi.org/10.2807/1560-7917.ES.2020.25.10.2000199

  9. Ayyoubzadeh SM, Ayyoubzadeh SM, Zahedi H, Ahmadi M, Niakan Kalhori RS (2020) Predicting COVID-19 incidence through analysis of google trends data in Iran: data mining and deep learning pilot study. JMIR Public Health Surveillance. https://doi.org/10.2196/18828

  10. Google Trends (2020). https://trends.google.fr/trends/?geo=FR. Accessed 3 June 2020

Acknowledgements

We thank the University Hospital of Bordeaux for providing the logistical support that allowed us to access and analyze the data needed for the manuscript in such a short period. We are also grateful to Julien Anjoubault, Clarisse Marguinaud, Virginie Cocuelle, Delphine Vauthier, Alexandra Barbe, François Garreau, Quentin Bana, Claire Riou, Pauline Soubelet and Elisabeth Verbitskaya for their expertise, which allowed proper manual coding for validation, and to Benjamin Contrand, Loïck Bourdois and Marie-Odile Coste for data management and administrative assistance. We also thank Sylviane Lafont for her help at the beginning of the project. BPH IETO Team activities are supported by the Institut National de la Santé et de la Recherche Médicale (INSERM), University of Bordeaux, Ministère de l’Intérieur (Délégation à la Sécurité Routière).

Funding

The study was funded by the French Agence Nationale de la Recherche (ANR-20-COV1-0004-01).

Author information

Contributions

GJC, PC and LE conceived and planned the study. GJC carried out database management. GJC and LE conducted the data analysis. All authors were involved in manual coding for validation. All authors contributed to the interpretation of the results. GJC took the lead in writing the manuscript. All authors provided critical feedback and helped shape the research, analysis and manuscript.

Corresponding author

Correspondence to Emmanuel Lagarde.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Ethical approval

No personal data were necessary for this work. All reports were automatically de-identified with a search procedure made possible by the standardized format in which personal information is entered in the reports. In terms of the protection of personal health data and of privacy, this work complies with the application framework provided by Article 65–2 of the amended French Data Protection Act and with the General Data Protection Regulation (GDPR). It was approved by the Bordeaux Teaching Hospital Committee for Ethics and Data Protection.
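
Because personal information appears in a standardized format, de-identification can rely on simple patterns. A minimal sketch of such a pattern-based procedure in Python is shown below; the field labels (`Nom`, `Prénom`, `Adresse`) and the restriction to French 10-digit telephone numbers are hypothetical assumptions for illustration, since the actual layout of the reports is not described.

```python
import re

# Hypothetical report layout: personal data on lines labelled "Nom:", "Prénom:"
# or "Adresse:", plus French 10-digit telephone numbers anywhere in the text.
FIELD_LINE = re.compile(r"(?im)^(nom|pr[ée]nom|adresse)\s*:.*$")
PHONE = re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b")


def deidentify(report: str) -> str:
    """Blank out labelled personal fields and telephone numbers."""
    report = FIELD_LINE.sub(r"\1: [REDACTED]", report)
    return PHONE.sub("[PHONE]", report)
```

Pattern-based redaction is only as reliable as the format is respected: a name typed elsewhere in the free text would not be caught by this sketch.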

Consent to participate

This work involves retrospective anonymized data that, for privacy and technical reasons, does not allow consent to participate to be obtained from individuals.

Consent for publication

This work involves retrospective anonymized data that, for privacy and technical reasons, does not allow consent for publication to be obtained from individuals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gil-Jardiné, C., Chenais, G., Pradeau, C. et al. Surveillance of COVID-19 using a keyword search for symptoms in reports from emergency medical communication centers in Gironde, France: a 15 year retrospective cross-sectional study. Intern Emerg Med (2021). https://doi.org/10.1007/s11739-021-02818-5

  • DOI: https://doi.org/10.1007/s11739-021-02818-5

Keywords

  • Emergency medical communication centers
  • COVID-19
  • Symptoms
  • Keywords