Patient healthcare trajectory. An essential monitoring tool: a systematic review
Patient healthcare trajectory is a recently emerging topic in the literature, encompassing broad concepts. However, the rationale for studying patients’ trajectories and the way the trajectory concept is defined remain a public health challenge. Our research focused on patients’ trajectories based on disease management and care, while also considering the medico-economic aspects of the associated management. We illustrated this concept with an example: a myocardial infarction (MI) occurring in a patient’s hospital trajectory of care. The patient follow-up was traced via the prospective payment system. We applied a semi-automatic text mining process to conduct a comprehensive review of patient healthcare trajectory studies. This review investigated how the concept of trajectory is defined and studied, and what it achieves.
We performed a PubMed search to identify reports that had been published in peer-reviewed journals between January 1, 2000 and October 31, 2015. Fourteen search questions were formulated to guide our review. A semi-automatic text mining process based on a semantic approach was performed to conduct a comprehensive review of patient healthcare trajectory studies. Text mining techniques were used to explore the corpus in a semantic perspective in order to answer non-a priori questions. Complementary review methods on a selected subset were used to answer a priori questions.
Among the 33,514 publications initially selected for analysis, only 70 relevant articles were semi-automatically extracted and thoroughly analysed. Oncology is particularly prevalent due to its already well-established processes of care. For the trajectory thema, 80% of articles were distributed in 11 clusters. These clusters contain distinct semantic information, for example health outcomes (29%), care process (26%) and administrative and financial aspects (16%).
This literature review highlights the recent interest in the trajectory concept. The approach is also gradually being used to monitor trajectories of care for chronic diseases such as diabetes, organ failure or coronary artery and MI trajectory of care, to improve care and reduce costs. Patient trajectory is undoubtedly an essential approach to be further explored in order to improve healthcare monitoring.
Keywords: Systematic reviews, Text mining, Healthcare trajectory, PPS, Semi-automated, Word cloud
AMI: Acute myocardial infarction
DRG: Diagnosis related group
DHC: Divisive hierarchical clustering
ICD: International classification of diseases
LDA: Latent Dirichlet allocation
PMSI: Programme de médicalisation du système d’information
PPS: Prospective payment system
Patient healthcare trajectory is a recently emerging topic in the literature, encompassing broad concepts. Our research focused on the patient trajectory based on disease management and care, while also considering the medico-economic aspects of the associated management. We approached patient care trajectories through an example: the occurrence of a myocardial infarction (MI). As MI treatment is performed in a health facility, we were able to trace the patient trajectories through the national hospital financing system, using comprehensive hospital databases or registers that are regularly collected for billing purposes.
The first prospective payment system (PPS), based on diagnosis-related groups (DRG), was established in the United States in 1983. The objective of this system was to control the expenditures of health care institutions and streamline costs. Thereafter, similar medical information systems were adopted in many other industrialised countries. Some, like France, also adopted an anonymised database with unique patient identifiers (for instance, through cryptographic hash functions) to facilitate the chaining of hospital stays [2, 3, 4]. In addition, the gradual increase in fee-for-service payment improved coding quality. The introduction of these systems enabled new epidemiological and/or economic studies [6, 7, 8, 9] using these databases, with temporal follow-up of patients making it possible to trace their trajectory of care. This review investigated how the trajectory concept is defined and studied, and what it achieves.
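As an illustration of how hashed identifiers can support the chaining of hospital stays, the following minimal sketch groups stays by a pseudonym and orders them by admission date. It is not the procedure used by any national system: the keyed hash (HMAC-SHA256), the secret key, the record layout and the DRG labels are all assumptions made for the example.

```python
import hashlib
import hmac
from collections import defaultdict

# Assumption: a secret key held by a trusted party (hypothetical value).
SECRET_KEY = b"hypothetical-secret-key"


def pseudonymise(patient_id: str) -> str:
    """Return a stable pseudonym that does not reveal the original identifier."""
    return hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def chain_stays(stays):
    """Group stays by pseudonym and sort each patient's stays by admission date."""
    chained = defaultdict(list)
    for patient_id, admission_date, drg in stays:
        chained[pseudonymise(patient_id)].append((admission_date, drg))
    return {pid: sorted(rows) for pid, rows in chained.items()}


# Hypothetical stays: (patient identifier, ISO admission date, DRG label).
stays = [
    ("patient-A", "2014-03-02", "DRG-2"),
    ("patient-B", "2014-01-15", "DRG-9"),
    ("patient-A", "2013-11-20", "DRG-1"),
]
trajectories = chain_stays(stays)
```

Because the same identifier always yields the same pseudonym, the two stays of the first patient end up in one chronologically ordered sequence, while the identifier itself never appears in the output.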
We carried out a literature search on PubMed using keywords related to trajectory, PPS and MI concepts. We then proceeded in two steps: (1) a non-a priori search with text mining techniques; and (2) a more standard analysis of a sub-selection of documents.
Similar systematic reviews have been performed before, but without using automatic procedures. However, an automatic search is of considerable interest when processing a large number of documents. Text mining allows better targeting for information retrieval and reduces the search time, while also enabling users to prioritise searches.
Our reviewing strategy is presented in the “Methods” section, together with the search questions that guided our review and the various methods used to address them. The results are reported in the “Results” section. We end with the “Discussion” section, where we present answers to the search questions and comment on the results. To conclude that section, we discuss the different existing text mining techniques used in systematic reviews.
Table 1 Search questions
Non-a priori questions
Q1. Do studies on patients’ trajectories exist?
Q2. What are the topics in these studies? (support, treatment, costs, etc.)
Q3. Which diseases are studied by trajectories?
Q4. Is PPS explored in the search?
Q5. Is PPS used in studying trajectories?
Q6. Are there any studies on the trajectories of patients with MI?
Q7. What is studied in MI?
A priori questions
Q8. What are the various concepts of the trajectory? (How is this concept defined?)
Q9. What is the interest in the subject: have many studies focused on patients’ trajectories?
Q10. Which countries conduct studies on trajectories?
Q11. What are the objectives of patient trajectory studies?
Q12. What methods are used in patient trajectory studies?
Q13. What are the characteristics of the studies: number of patients involved, duration of follow-up?
Q14. What data are used in these studies: hospital or other?
Step 1: document retrieval
Table 2 Keywords used in document retrieval
Topics
T1 (trajectory): “trajectories”, “trajectory”, “path”, “pathway(s)”
T2 (PPS): “prospective payment system”, “PMSI”, “DRG”, “ICD”, “regional information system”, “fee for service system”, “registry”, “Activity-based Payment”
T3 (MI): “myocardial infarction” in the title
Constraints
C1: Medical context
Publication period: January 1, 2000 to October 31, 2015
Languages: English, French, Spanish and Italian
Step 2: first text mining approach
From the selection of articles gathered in step 1, we created a corpus of texts, divided into three parts, T1 to T3 (corresponding to Table 2 topics), consisting of the title and abstract, in which we removed the keywords (see Table 2) in order to only keep the other terms;
The three parts of the corpus were analysed separately with IRaMuTeQ software. This is an R interface for multidimensional analysis of texts and questionnaires, allowing statistical analysis of the text corpus;
We applied the following pre-processing techniques: (a) lemmatization of texts; (b) dictionary enrichment: we lemmatized unrecognised terms by TreeTagger and added specific medical terms and well-known acronyms such as acute myocardial infarction (AMI). Subsequently, the analyses were conducted with the full forms (nouns, adjectives, adverbs and verbs);
We carried out conventional textual analysis, then similarity analysis and finally clustering. The various tools used were as follows:
Word cloud: This is a synthetic representation of the term distribution: the most recurrent words are in the centre, with text size proportional to the number of occurrences. This kind of representation thus conveys, by order of importance, the concepts covered in all of the articles. This method will provide an answer to Q1.
Similarity analysis: This graph theory-based technique is conventionally used to describe social representations based on survey questionnaires. Similarity analysis is applied to study the proximity and relationships between elements in a set, in the form of maximum trees. The objective is to reduce the number of links between items so as to obtain an acyclic connected graph. The maximum tree is therefore the tree formed by the strongest edges of the graph, where the strength is measured by the co-occurrence of the linked terms. For each corpus, we selected the tree representation described in  and in the algorithm in , to describe communities via the shortest path, thus highlighting the words most frequently associated in the same sentence or text. By linking important terms, the graph gives a more precise idea of the concepts and themes raised in the articles. This method will provide answers to Q3, Q5 and Q7.
Text clustering: Reinert clustering is a form of divisive hierarchical clustering (DHC) that is carried out in several stages, offering a global approach to the corpus. It identifies statistically independent word classes after partitioning the corpus. These classes may be interpreted by their profiles, which are characterised by specific correlated words. DHC summarises this through a dendrogram. This analysis complements similarity analysis by clustering articles according to concepts, partly identified by similarity analysis, that are characterised by word groups. This method will supplement the answer to Q7, and address Q2, Q4 and Q6.
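The first two tools can be sketched in a few lines. The example below is a simplified illustration, not the IRaMuTeQ implementation: it counts term occurrences (the quantity that sets word size in a word cloud), counts how many documents contain each pair of terms, and then keeps the strongest links that do not close a cycle (Kruskal's algorithm applied to descending weights), which yields a maximum tree. The toy documents and the choice of document-level co-occurrence as edge strength are assumptions.

```python
from collections import Counter
from itertools import combinations


def term_counts(docs):
    """Count occurrences of each term over all documents (word cloud sizes)."""
    return Counter(term for doc in docs for term in doc)


def cooccurrence(docs):
    """Count, for each pair of terms, the number of documents containing both."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


def maximum_tree(pairs):
    """Keep the strongest edges that do not create a cycle (maximum spanning tree)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Strongest edges first; ties are broken alphabetically for reproducibility.
    for (a, b), weight in sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0])):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Hypothetical lemmatized documents (titles and abstracts reduced to a few terms).
docs = [
    ["care", "patient", "cancer"],
    ["care", "patient", "cost"],
    ["care", "cancer", "registry"],
    ["patient", "care"],
]
counts = term_counts(docs)
tree = maximum_tree(cooccurrence(docs))
```

With five distinct terms, the resulting tree has four edges, and the strongest one links “care” and “patient”, the pair that co-occurs in the most documents.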
Step 3: thorough analysis of the selected articles
We used the sub-selection technique derived from Moher’s method described in , and crossed the sets of themes: T1 and T2, denoted T1∩T2, then T1 and T3, denoted T1∩T3. This selection was performed in the same manner as described in Table 2. We added a further constraint to better target our study: we counted the number of occurrences K of the trajectory concept in each document, i.e. each appearance of the words “trajectories”, “trajectory” or “pathway” in the title and abstract, and selected the documents for which K ≥ 2.
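The sub-selection rule can be sketched as follows, assuming that each theme is available as a set of article identifiers and that titles and abstracts are stored as plain strings. The identifiers and texts below are hypothetical, and the regular expression matches only the three word forms listed above.

```python
import re

# The three word forms counted in titles and abstracts.
TRAJECTORY_TERMS = re.compile(r"\b(trajectories|trajectory|pathway)\b", re.IGNORECASE)


def trajectory_count(title, abstract):
    """Return K, the number of occurrences of the trajectory terms."""
    return len(TRAJECTORY_TERMS.findall(f"{title} {abstract}"))


def select(articles, theme_a, theme_b, k_min=2):
    """Keep articles present in both themes whose K is at least k_min."""
    shared = theme_a & theme_b
    return sorted(i for i in shared if trajectory_count(*articles[i]) >= k_min)


# Hypothetical articles: identifier -> (title, abstract).
articles = {
    1: ("Care trajectory after MI", "We describe the trajectory of patients."),
    2: ("Costs under DRG", "One pathway is described."),
    3: ("Trajectories in cancer", "Registry data on trajectories and one pathway."),
}
t1, t2, t3 = {1, 2, 3}, {2, 3}, {1}

selected_t1_t2 = select(articles, t1, t2)
selected_t1_t3 = select(articles, t1, t3)
```

Article 2 belongs to both T1 and T2 but mentions the concept only once, so it is dropped from T1∩T2.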
Our reading grid was based on that described in the PRISMA guidelines. We selected items that could be used to address the a priori search questions (see Table 1), Q8 to Q14: publication year, country of study, number of patients, observation period, methods and objectives. Other items that were irrelevant to our study were not kept. We added the following items: pathologies studied, databases used and definition of the trajectory concept.
Some results, not listed in this paper, can be viewed at the following address: http://www.lirmm.fr/~pinaire/.
Step 1: document retrieval
The document retrieval resulted in a total of 33,514 articles.
Step 2: first text mining approach
We present the results obtained by our method which combined different approaches of lexicographic analysis (see below) following the flow diagram (Fig. 1).
Text clustering: Following this clustering, 80% of the T1 articles were distributed in 11 disjoint clusters, 86% of the T2 articles in five clusters, and 98% of the T3 articles in five clusters. We then performed a second clustering on the sub-corpus of each theme, consisting of the articles that were not clustered during the first analysis.
In the second clustering of the 3160 non-clustered articles, we identified five clusters consisting of 99% of the articles. From right to left, cluster 1 pools the concept of studies from a methodological standpoint. Cluster 5 concerns end-of-life issues. Cluster 4 pools the macroscopic aspect of care with public support. Cluster 3 groups studies involving animal experiments. Finally, cluster 2 concerns genetic mutation and anomalies. Three articles could not be clustered due to a lack of information.
Step 3: thorough analysis of the selected articles
Description of the observed items and categories
Better comprehension of a disease and subsequent adaptation of care
Medical instructions to improve the health status and/or avoid its impairment
Comparison of treatments, care process, new drugs
Implementation of care process to improve support
Data processing tool: Creation of a synthetic tool for data visualisation or formulation of algorithms to gather and classify data derived from several sources
Hospital & PPS: Hospital or prospective payment system databases
Any type of interview
Questionnaires and multiple choice questionnaire
Pharmacy databases, social security databases, data from GPs, blood bank databases, and patient diaries.
Stroke, MI, heart failure
Colorectal, prostate, breast, lung, bladder, cervical, endometrial
Chronic obstructive pulmonary disease, pulmonary embolism
Multiple sclerosis, schizophrenia, depression
Gout, osteoarthritis, scoliosis, craniotomy, colon penetration injury, pelvic fractures, pain
Cox, Kaplan–Meier models
Parametric and non-parametric tests: Chi-squared, Fisher, Student, Kruskal–Wallis, Mann–Whitney tests, etc.
Linear or logistic models: Linear or logistic regression, GLM
Latent variable models, Kappa coefficient, meta-analysis, etc.
Tracking costs in the case of a treatment or care process
Care trajectory, with the history of consultations, reasons for hospitalisation, care provided
Series of steps through which the patient passes into an integrated care process, or a series of operational steps of a care team
Symptoms, clinical, cognitive developments
Different types of measures
Disease progression risk
Measurement of time physical activity, patient decision making
Review reference sources by item reviewed (items: Data processing tool; Hospital & PPS; Linear or logistic models; Parametric and non-parametric tests)
We grouped the countries according to continent. For T1∩T2, we noted strong representation from Europe (55% of articles) and the Americas (29%). There were some studies from Oceania (9%) and Asian countries (7%). For T1∩T3, the article distribution was essentially between three continents: Europe (36%), the Americas (28%) and Asia (24%). Australasia was marginal, with 4% of articles. There were some atypical studies with data from multiple continents (8%).
We next considered publication year. For T1∩T2, the results highlighted activity that began developing in 2013. For T1∩T3, by contrast, we noted a peak of activity in 2004 and increasing activity in 2012.
The number of patients involved was then analysed. It ranged from 14 to 6.2 million in T1∩T2 (vs 20 to 30.20 million in T1∩T3), with a median of 859 and an interquartile interval (IQ) of 3250 (vs 604.5 and IQ = 933.25 in T1∩T3), and with missing data for three articles (vs five in T1∩T3).
We also focused on the observation duration, measured in months, which was available in more than 85% of the articles. Observation duration ranged from 5 to 180 months in T1∩T2 (vs 3 to 240 months in T1∩T3), with a median of 36 months and an IQ of 54 months (vs 12 and 99 months in T1∩T3).
Our method is based on a semi-automatic text mining approach. We used terms and concepts which emerged from classification techniques rather than the simple presence of words. This approach was structured into two main steps, document retrieval and text mining, which preceded a thorough text analysis of the selected articles.
Step 1: document retrieval
For document retrieval, we chose to focus our study on PubMed. The search results are entirely dependent on the choice of keywords, making this a particularly delicate task when definitions may vary between authors and countries. Indeed, we encountered this difficulty for T2. As presented in Table 2, the keywords used were “Prospective Payment System”, “PMSI”, “DRG”, “ICD”, “regional information system”, “fee for service system”, “registry”, “Activity-based Payment”. However some documents used words not in our final selection, such as in , which contains the term “national case-mix system”. Our objective was not to be exhaustive with regard to covering all the publications, but rather to define a general method of analysis. A way to improve our approach would be to implement an adaptive algorithm for keywords enrichment.
Step 2: first text mining approach
The lexicographic analysis was based on three combined tools.
For the word cloud approach, the occurrence of the terms “study” and “care”, for all of the studied fields, means that these articles cover care concept and studies on topics such as diseases or drugs. For T1, the terms “treatment” and “increase” reflect a focus on patient healthcare trajectories. Thus, there are many studies on patient trajectories. Here we have answered Q1.
For the similarity analysis, the results showed that for T1, studies were closely related to care, disease and more specifically to cancer. In response to Q3, the studied diseases were those causing severe and chronic organ dysfunction: heart, kidneys, or lungs. We noted that the cancer concept was also closely related to that of genetics. We found here that the use of the keyword “pathway” highlighted all articles pertaining to cell signalling or gene pathways [20, 21, 22, 23, 24, 25, 26, 27, 28, 29].
For T2, cancer was closely related to the registry data. This highlights the descriptive aspect of the data information, i.e. registry data describing the patient’s cancer history from its diagnosis. We noted that the study concept was related to the disease concept, i.e. cardiac or renal, but also to the various treatments and therapies. In response to Q5, T2 was thus related to research: in disease studies [30, 31, 32, 33, 34, 35, 36, 37], to compare care and coding [38, 39, 40], but also in monitoring of patients over time and the survival rate [31, 41, 42, 43, 44]. Survival rate forms part of a trajectory concept. This trajectory concept may also encompass the registry, i.e. a longitudinal concept containing many concepts related to longitudinality.
The T3 graph highlights two standpoints regarding MI studies: firstly that of clinicians who study MI, its risks and aggravating factors to gain insight into preventing and, if necessary, managing these patients. Secondly, that of patients with coronary symptoms, which could progress to incidents, which could then progress to AMI requiring hospitalisation and with high risk of mortality depending on the patient’s age. This partly answers Q7.
The text clustering enabled summarisation of the results in order to list the topics studied, asked in Q2, in these articles concerning patient trajectories. The first topic that we covered is disease with, for example, metabolic disorders such as diabetes and cardiovascular complications. Certain articles addressed patients’ feelings, anxieties and disease experience. In the patient trajectory, there was support from the patient’s immediate relations and family, but also health services, such as home nurses. Other articles focused on end-of-life situations, palliative care and processes set up to manage this last stage of the disease. Another topic was clinical research, involving developing cohorts, data collection, and methods used in different studies. Other studies concentrated on hospital organisation, various services, patient care staff, and associated costs. Other articles were focused on the health regulations and recommendations from guides of good practices.
As a response to Q4, our conclusion regarding the two T2 clusterings was that PPS is used in research primarily in the study of diseases, sometimes on disease onset, especially on disease management, associated costs, treatment and possible complications, but also in its coding. The studied diseases included neurological disorders, cancer, irregular heartbeat and cardiovascular diseases, the implantable medical devices to regulate these anomalies, traumas and wounds, infectious diseases, organ transplants, genetic and autoimmune diseases, and finally renal failure. Pregnancy and birth are also studied.
The T3 results, in reply to Q6 and to complete the response to Q7, showed that MI is studied from several angles. The first concerns risk factors (socioeconomic, age, hypertension, diabetes). Then there are the biochemical and cardiocirculatory functioning aspects, the various mechanisms which lead to MI, and genetic predisposition. In addition, there is the psychological aspect of ill patients and the consequences. There is also emergency management before hospital admission, including transport and first aid. Then there is care at admission, medication management and associated costs; here the trajectory concept emerges. There is also an aspect regarding the effectiveness of the measures implemented [46, 47, 48, 49, 50, 51, 52] and the different treatments [53, 54, 55]. Another investigated aspect concerns lifestyle, with regard to dietary habits, healthy lifestyle, comorbidities [43, 57] (smoking and/or alcohol), but also environmental factors like atmospheric pollution.
Step 3: thorough analysis of the selected articles
A thorough analysis of the selected articles was performed. Trajectory studies require, first and foremost, a definition of this concept, which is the focus of question Q8. The results showed that in most cases trajectory is characterised by care processes established for a specific disease to improve patient care, facilitate health planning within institutions, ensure prevention, predict the course of the disease and prevent the onset of symptoms.
In response to Q9, we found that interest in patient trajectory studies has increased in the last 5 years. The resurgence of studies in 2013 could be explained by the improvement in the quality of databases as of 2009 (ref), particularly in France, and by the possibility of chaining hospital stays and reconstructing patient care trajectories throughout the country.
This interest in trajectories mainly stems from Europe and the Americas, with 47 and 29% of studies, respectively. The PPS concept necessarily led to only including countries with a similar health system database organisation. This is a weakness of our study, since countries whose health information system differs from the American model were not selected through this filter. Thus we have answered Q10.
Then we sought to determine the rationale for why these studies were conducted and provide a response to Q11. The six-category article distribution we defined showed that the aim of most of the studies was to compare treatments, techniques or care procedures. In each case, the aim was to reduce costs while improving the quality of care. Patient healthcare trajectory studies appeared beneficial in two ways: (1) First, the trajectory provides insight into the course of the disease following medical and surgical care. (2) Secondly, the trajectory may be highly informative regarding the medico-economic aspects so as to be able to streamline the patient’s care management to avoid treatment dispersion.
In addition, the methods used underpinned the rationale of comparative studies of care techniques, treatments or care processes. These methods (ANOVA, comparison tests, survival models, linear or logistic regression, etc.) are listed in the second part of Fig. 6, which answers question Q12.
We pursued this investigation by assessing the study characteristics, and answered Q13. In the studies, the number of patients to recruit was estimated a priori so that the statistical analyses could be performed with sufficient power. However, we identified a few studies that were conducted on the entire population, without sampling.
Overall, the study time was short, not more than a few years, which could be explained by economic considerations or a lack of data. For retrospective studies, for example, it was sometimes hard to trace back several years because the information is deleted after a certain period of time.
Next, we investigated the origin of the data used. For T1∩T3, registry data were mostly used. For T1∩T2, hospital databases and hospitalisation billing databases were used, so the studies were mostly hospital-based. Moreover, apart from hospital databases, some studies took patients’ feelings into account via interviews and questionnaires. Some studies required additional information on, for instance, medication [31, 58] through pharmacy databases or non-hospital care [59, 60] with social security databases for complete patient care monitoring.
To supplement previous findings concerning the list of diseases studied, for T1∩T2, the patient trajectories were closely focused on different cancers. Note that this brings us back to the results that were highlighted in step 2. We thus resolved question Q14.
Many different text mining techniques are constantly being developed for literature searches and systematic reviews. In systematic reviews, text mining techniques are used for four purposes: (1) automatic term recognition, to identify and extract terms automatically from texts; (2) document classification, by generating subsets of documents focused on a specific topic [63, 64, 65, 66]; (3) document clustering, to group documents into topics that are shared by all the documents in a group and by no other document in the collection [17, 67, 68]; (4) drafting abstracts, by selecting sentences from each document based on the significance of its terms, which are combined via classification techniques. Some authors used text mining for other purposes. For example, one group created correspondence databases linking authors with their name abbreviations and performed a co-authorship analysis. Another annotated abstracts in two ways, first by the gene or protein of interest, then by the protein interactions and/or gene functions, and ultimately categorised documents according to these annotations. Thus, combining text mining methods for systematic review is a hot topic [72, 73, 74, 75].
While there is no consensus for a method in conducting a review with a huge number of documents, there are several techniques in text mining already used in various fields to explore text data [76, 77, 78]. Here, we wished to gain an overview of the document content in a recent developing field of inquiry, in order to provide general information and to respond to research questions. Our aim is to maximise the recall to ensure comprehensive study. We also aimed to better select publications, then reviewed them in a classical manner, by creating filters. With our method, searches are conducted based on the meaning of the words and concepts emerging from classification techniques rather than simply the presence of this term and concept. Thus, we conducted an in-depth study to explore the texts, starting by highlighting keywords, which were often used in the abstracts. Word cloud representation was most suited for this step, as it enabled a quick visual reading of the results. However, beyond the visual data display, word clouds do not provide much information.
One way to gain further insight is to highlight a lexical universe attached to those keywords. Thus, the same word may be interpreted differently depending on the terms associated with it. Similarity analysis best addresses this issue. Its tree construction approach connects highly co-occurring networks of terms and allows a better understanding of the most frequently discussed themes through the various items making up each corpus.
The last step in the exploration process is to determine whether it is possible to classify these articles in the topics highlighted by similarity analysis. We compared these results by using Reinert clustering because it has the advantage of respecting the text construction. It also offers more flexibility than latent Dirichlet allocation (LDA), for example, where the researcher has to pre-determine the number of clusters. Although some authors have proposed solutions for the “optimal” number of topics in topic modelling [79, 80], these cannot be verified, which makes this method even harder to apply.
The text mining methods that we selected have proven to be effective in exploring the corpora without a priori and with open-ended questions, allowing us to quickly identify documents associated with subjects beyond genetics. This facilitated the filtering of articles to apply methods with a priori to answer specific questions. Although existing methods for exploring text data to conduct rapid reviews are good, we hoped to validate a non-traditional methodology to conduct more extensive systematic reviews for future research.
In this article, a semi-automatic text mining methodology was applied to investigate patient healthcare trajectory. Patterns were extracted and identified semi-automatically from the published articles in PubMed. With text mining techniques we could analyse large amounts of text data, which would have not been possible otherwise. The originality of our approach lies in assisting a research review on the basis of a semantic approach, from research questions to targeted documents which will be then thoroughly analysed. This method is well-adapted for complex review questions or hard to define topics such as those addressed in public health and more particularly in the context of patient healthcare trajectory literature. Finally, our strategy enabled us to explore the concept of trajectory in the care domain.
We illustrated our search using a frequent cause of hospital stay, the occurrence of an MI. We chose to trace the follow-up of these MI patients through the PPS. We addressed open-ended questions by determining the topics covered in each area, to explore areas transversely, while highlighting studies dealing with patient trajectories with regard to MI, based on PPS data. This semantic approach proved to be well-tailored for addressing our issues.
Document retrieval on the patient trajectories was combined with two major themes, i.e. PPS databases and MI. The findings showed that this type of study is of interest in the biomedical community; for comparative trajectories of drug prescriptions and costs Sundberg et al. concluded  that: “Drug prescriptions and costs of analgesics increased following conventional care and decreased following integrative care, indicating potentially fewer adverse drug events and beneficial societal cost savings with integrative care”. Similarly, with regards to access to the appropriate treatment in time for cancer patients, Defossez et al. affirmed  that “There is in particular a need to describe and analyse cancer care trajectories and to produce waiting time indicators…The evaluation shows the ability of an integrated regional information system to formalise care trajectories and automatically produce indicators for time-lapse to care instatement, of interest in the planning of care in cancer.” Our study revealed that the trajectory concept, regardless of its form, is being explored, analysed and exploited, especially in oncology through the oncology communicative medical file and multidisciplinary meetings.
To complete this research, it would now be interesting to include studies on patient trajectories in electronic health records. Some recent studies have focused on the use of these new technologies in order to offer patients with mobility difficulties integrated care by pooling electronic records from patients, caregivers or healthcare teams as well as doctors’ follow-ups. However, the implementation of such processes requires considerable organisation and adequate resources and can lead to technical interoperability problems.
We are also studying patient trajectories in a health environment with MI. We obtained DRG sequences by chaining hospital stays. These sequences represent the chronological pattern of hospital healthcare of patients. We have characterised patient trajectories by such DRG sequences. We have applied sequential pattern mining techniques to our trajectories in order to highlight frequent hospital trajectory patterns. To our knowledge, this is the first time that this type of approach, applying sequential patterns to hospital data or registry data, has been used. Our ultimate goal is to build a predictive model of MI trajectories to simulate disease progress in the coming years so as to help anticipate health needs.
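To make the idea of frequent hospital trajectory patterns concrete, the sketch below counts, for each ordered pattern of stays (gaps allowed), the number of patients whose sequence contains it, and keeps the patterns reaching a minimum support. It is a brute-force illustration rather than the algorithm used in our ongoing work; the stay labels stand in for real DRG codes and, like the support threshold, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(sequences, min_support, max_len=3):
    """Return ordered patterns (gaps allowed) found in at least min_support sequences."""
    support = Counter()
    for seq in sequences:
        seen = set()
        for n in range(1, max_len + 1):
            # Index combinations are increasing, so the order of stays is preserved.
            for idx in combinations(range(len(seq)), n):
                seen.add(tuple(seq[i] for i in idx))
        support.update(seen)  # each patient counts once per pattern
    return {p: s for p, s in support.items() if s >= min_support}


# Hypothetical chained stays, one list per patient, in chronological order.
sequences = [
    ["MI", "STENT", "REHAB"],
    ["MI", "REHAB"],
    ["ANGINA", "MI", "STENT"],
]
patterns = frequent_patterns(sequences, min_support=2)
```

Here the pattern MI followed by STENT is supported by two of the three patients, whereas STENT followed by REHAB appears in only one sequence and is discarded. Enumerating all subsequences grows quickly with sequence length, so dedicated sequential pattern mining algorithms would be preferable on real data.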
Jessica Pinaire (JP) performed data acquisition and conducted the analysis. Interpretation was led by Paul Landais (PL). Jérôme Azé (JA) and Sandra Bringay (SB) contributed to the conception and design of the study. JP drafted the manuscript; PL, JA and SB critically revised it. We confirm that the final version of the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all authors.
We warmly thank Sarah Kabani for her expert editing of the manuscript.
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Availability of data and materials
The datasets supporting the conclusions of this article are included within the article.
This research was partly supported by Nîmes University Hospital, and LIRMM. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 3. Kendrick S, Clarke J. The Scottish record linkage system. Health Bull. 1993;51:72–9.
- 5. Le Bihan-Benjamin C, Landais P, Chatellier G. Linking hospital stays in the national PMSI MCO database improved between 2006 and 2009: analysis and consequences. J Écon Méd. 2012;30:17–30.
- 12. Ratinaud P, Déjean F. IRaMuTeQ : implémentation de la méthode ALCESTE d’analyse de texte dans un logiciel libre. MASHS2009, Toulouse; 2009.
- 13. Ratinaud P, Marchand P. Application de la méthode ALCESTE à de « gros » corpus et stabilité des « mondes lexicaux » : analyse du « CableGate » avec IRaMuTeQ. Actes des 11eme Journées internationales d’Analyse statistique des Données Textuelles; 2012. p. 835–844.
- 14. Flament C. Similarity analysis: a technique for researches in social representations. Cah Psychol Cogn. 1981;1:375–95.
- 17. Reinert A. Une méthode de classification descendante hiérarchique : application à l’analyse lexicale par contexte. Cah. L’analyse Données. 1983;8(2):187–98.
- 21. Cresci S, Wu J, Province MA, Spertus JA, Steffes M, McGill JB, et al. Peroxisome proliferator-activated receptor pathway gene polymorphism associated with extent of coronary artery disease in patients with type 2 diabetes in the bypass angioplasty revascularization investigation 2 diabetes trial. Circulation. 2011;124:1426–34.
- 30. Guldbrandt LM, Fenger-Grøn M, Rasmussen TR, Jensen H, Vedsted P. The role of general practice in routes to diagnosis of lung cancer in Denmark: a population-based study of general practice involvement, diagnostic activity and diagnostic intervals. BMC Health Serv Res. 2015;15:21.
- 47. Kristoffersen DT, Helgeland J, Waage HP, Thalamus J, Clemens D, Lindman AS, et al. Survival curves to support quality improvement in hospitals with excess 30-day mortality after acute myocardial infarction, cerebral stroke and hip fracture: a before-after study. BMJ Open. 2015;5:e006741.
- 53. Hagiwara MA, Bremer A, Claesson A, Axelsson C, Norberg G, Herlitz J. The impact of direct admission to a catheterisation lab/CCU in patients with ST-elevation myocardial infarction on the delay to reperfusion and early risk of death: results of a systematic review including meta-analysis. Scand J Trauma Resusc Emerg Med. 2014;22:67.
- 55. Lewis EF, Li Y, Pfeffer MA, Solomon SD, Weinfurt KP, Velazquez EJ, et al. Impact of cardiovascular events on change in quality of life and utilities in patients after myocardial infarction: a VALIANT study (Valsartan in acute myocardial infarction). JACC Heart Fail. 2014;2:159–65.
- 60. Bossuyt N, Van Casteren V, Goderis G, Wens J, Moreels S, Vanthomme K, et al. Public Health Triangulation to inform decision-making in Belgium. Stud Health Technol Inf. 2015;210:855–9.
- 64. Joachims T. Text categorization with support vector machines: learning with many relevant features. In: ECML-98; 1998. p. 137–142.
- 67. Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. J Mach Learn Res. 2003;3:993–1022.
- 73. Paynter R, Bañez LL, Berliner E, Erinoff E, Lege-Matsuura J, Potter S, et al. EPC methods: an exploration of the use of text-mining software in systematic reviews. Rockville: Agency for Healthcare Research and Quality (US); 2016.
- 76. Teich E, Fankhauser P. Exploring a corpus of scientific texts using data mining. Lang Comput. 2009;71:233–47.
- 78. Van Eck NJ, Waltman L. Text mining and visualization using VOSviewer. ISSI Newsletter. 2011;4:51–4.
- 79. Greene D, O’Callaghan D, Cunningham P. How many topics? Stability analysis for topic models. In: Machine learning and knowledge discovery in databases. Springer, New York; 2014. p. 498–513.
- 85. Rabatel J, Bringay S, Poncelet P. Mining sequential patterns: a context-aware approach. Advanced knowledge discovery management. New York: Springer; 2013. p. 23–41.
- 94. Song L, Yan H, Hu D, Yang J, Sun Y. Pre-hospital care-seeking in patients with acute myocardial infarction and subsequent quality of care in Beijing. Chin Med J (Engl). 2010;123:664–9.
- 100. Goderis G, Van Casteren V, Declercq E, Bossuyt N, Van Den Broeke C, Vanthomme K, et al. Care trajectories are associated with quality improvement in the treatment of patients with uncontrolled type 2 diabetes: a registry based cohort study. Prim Care Diabetes. 2015;9:354–61.
- 102. Krummenauer F, Guenther K-P, Kirschner S. Cost effectiveness of total knee arthroplasty from a health care providers’ perspective before and after introduction of an interdisciplinary clinical pathway: is investment always improvement? BMC Health Serv Res. 2011;11:338.
- 111. Dely C, Sellier P, Dozol A, Segouin C, Moret L, Lombrail P. Preventable readmissions of “community-acquired pneumonia”: usefulness and reliability of an indicator of the quality of care of patients’ care pathways. Presse Médicale. 2012;41:e1–9.
- 112. Klinkhammer-Schalke M, Lindberg P, Koller M, Wyatt JC, Hofstädter F, Lorenz W, et al. Direct improvement of quality of life in colorectal cancer patients using a tailored pathway with quality of life diagnosis and therapy (DIQOL): study protocol for a randomised controlled trial. Trials. 2015;16:460.
- 117. Noble SI, Nelson A, Fitzmaurice D, Bekkers M-J, Baillie J, Sivell S, et al. A feasibility study to inform the design of a randomised controlled trial to identify the most clinically effective and cost-effective length of anticoagulation with low-molecular-weight heparin in the treatment of Cancer-Associated Thrombosis (ALICAT). Health Technol Assess Winch Engl. 2015;19:1–94.