Probabilistic Modelling and Decision Support in Personalized Medicine

The concept of personalized medicine, often called the biggest revolution in medicine, is becoming an emerging practice. The article presents personalized medicine in a broader context as an interdisciplinary issue covering the current trends of information and communication technology in medicine, legal aspects, and probabilistic network modelling. Employing the concept of probabilistic network reasoning means extracting the meaningful knowledge, mathematizing it, incorporating the particular patient information and then using inference mechanisms of the created mathematical model for personalized decision support. Bayesian networks can serve as a multidimensional decision support framework representing the real-world medical domain. Their power, together with the possibilities of global sharing of necessary medical knowledge, represents a promising approach of extracting new, often hidden, knowledge about the given medical domain and thus opens up new ways of achieving the delivery of personalized medicine. Establishing patient diagnosis and treatment prognoses are the critical issues in personalized decision support. Mathematical modelling is beginning to play an irreplaceable role here.


Introduction
The concept of personalized medicine, often called the biggest revolution in medicine, is becoming an emerging practice. Turning it into an individualized approach means using specific information about a patient's health condition to make a diagnosis, plan a treatment, or establish a prognosis.
This article presents personalized medicine in a broader context as an interdisciplinary issue covering the current trends of information and communication technology in medical imaging, legal aspects, and probabilistic network modelling.
Probabilistic network modelling [1] represents one of the most promising approaches. Nowadays, it can build on a broad spectrum of currently available knowledge reflecting the examined medical domain. It can be the specific knowledge of medical experts, known general principles of the domain/disease, detailed patient data, knowledge hidden in the vast digital medical datasets, etc. As medical data are mostly collected during daily routines and not as a result of coordinated research activities, the process of collecting and extracting the datasets necessary to build and evaluate the model becomes more difficult.
In short, employing the concept of probabilistic network reasoning means extracting the meaningful knowledge, mathematizing it, incorporating the particular patient information and then using inference mechanisms of the created mathematical model for personalized decision support. In practice, the following initial steps must be performed:
• development of a framework enabling the efficient collection of relevant experts' opinions and knowledge from clinical practice about the cases close to the disease under study;
• construction of a probabilistic graphical model via a probabilistic machine learning approach or manually based on explicit medical expert knowledge if there is a lack of data;
• the expression of the initial health state of a particular patient via a probabilistic distribution.
Probabilistic graphical models are used to answer complex questions that are difficult to solve using traditional probabilistic approaches. One of these models' critical features is their explanatory capability, which is essential for discussions with domain experts during model construction and when communicating computed results. Successful implementation of a decision support system in healthcare that focuses on individual patient care (diagnostic and prognostic reasoning, modelling of treatment alternatives, etc.) rests on an accurate understanding of the relevant human biological mechanisms, of interoperability principles in the healthcare domain, and of mathematical modelling with its inference capabilities.
Personalized decision support systems based on probabilistic graphical models can be integrated with computer-assisted decision support. The critical question is how to incorporate its suggestions into the decision-making process. The role of experienced radiologists will remain irreplaceable, especially in the development of relevant knowledge datasets, data labelling, and in providing additional structural knowledge necessary for model building. Traditionally, experienced radiologists label medical case studies comprising learning material for students in medical faculties or young radiologists in hospitals. The same content can, in principle, serve as reference case studies when categorizing the imaging study of the diagnosed patient.
Medical imaging plays a critical role in diagnostics. Increasing its effectiveness in rare diseases' diagnostics via the employment of appropriate deep learning methods is one of the biggest challenges these days. The regional collaboration of healthcare facilities, especially those with smaller patient populations, is necessary to exploit its potential.
A diagnostic process can be more or less risky, more or less invasive, more or less costly, etc. Appropriate use of suitable artificial intelligence methods enables a highly personalized combination of diagnostic procedures, taking into account a patient's specific health state, and achieving greater diagnostic accuracy while avoiding more invasive or other riskier methods.
Accelerating research in this area also means exploring appropriate organizational models, a framework enabling the collection of relevant information, making it available for clinical practice, and protecting patient privacy at the same time. However, a broad spectrum of related legal questions also arises. For example, who is at fault when the recommendation of the decision support system is wrong?

Related Trends in eHealth
A multidisciplinary approach is one way of coping with the ever-increasing complexity, dynamism, and variability of today's healthcare. Emerging scientific disciplines are often methodologically linked to the sciences from which they arose, and their achievements in turn inspire those sciences. One example is medical informatics, an applied science that designs new progressive procedures for many medical problems and contributes to developing healthcare knowledge. Interdisciplinary research in healthcare focuses on the development of knowledge systems, the intelligent use of the experience stored in health databases, the development of new telemedicine technologies, and especially the improvement of diagnostic and therapeutic processes. Therefore, it is necessary to integrate IT, medical, biomedical, legal, mathematical, and economic knowledge and find opportunities for their use in the environment of medical practice, teaching, and research. The daily routine of healthcare information systems and the management of health documentation in electronic form are addressed by many national and international legal standards. From the very beginning, in addition to the limits of current information and communication technologies, it is also necessary to understand and carefully consider the legislative restrictions so that the initial considerations respect the relevant legal norms arising from the legal system. Today, the most progressive healthcare institutions use advanced software systems to support decision-making, sophisticated image processing techniques that make it possible to search for similar cases (evidence-based medicine) in available knowledge databases, and a whole range of methods of artificial intelligence. However, applications are often run in a mode that does not support standard medical communication protocols or does not allow the transfer and subsequent use of the information obtained outside a hospital.
The direction of further development of information systems in healthcare can be characterized by patient orientation and the possibility of global access to medical data for shared care. The goal is the facilitation and acceleration of the correct diagnosis formulation, the elimination of repeated examinations, saving the time of the patient and the doctor, etc.
Hospitals and medical research centers form a unified computerized environment these days. Besides traditional healthcare activities, they also cooperate in the area of research. For research purposes, it is necessary to have access to an extensive knowledge database of case studies [2]. Much more important than networked technologies are growing networks of medical specialists. They change their traditional thinking, cooperate on the regional level [3], share specific domain knowledge, information about their patients, etc.

Interoperability
The knowledge sharing assumes effective communication of clinical information systems of cooperating healthcare institutions. Picture Archiving and Communication Systems (PACS) with associated add-on applications providing secure sharing, communication, or other specific functionalities play a crucial role.
One of the necessary conditions for successful communication is the ability to understand the transmitted information correctly. A prerequisite for correct understanding is the structured form of the transmitted medical records related to international standards and classifications. The principle of classification is creating classes consisting of concepts that coincide in a given classification attribute. The aim of the classification is then to classify the object into a particular category. There are currently many classification systems, many terminologies, and healthcare standards, but none cover the full range of needs. The following classification systems are among the most relevant.
International Classification of Diseases (ICD) [4] was created under the auspices of the World Health Organization (WHO). According to this system, diseases are classified into 21 primary groups. Each item consists of two codes. The first code identifies the underlying disease, and the second code identifies the location or complication.
Systematized Nomenclature of Medicine (SNOMED) (IHTSDO) is a worldwide standard of clinical terminology used in IT applications in healthcare [5].
Systematized nomenclature contains more than 350,000 terms from the field of healthcare. Due to its scope, it is primarily intended for use in machine processing of medical information, electronic medical documentation, decision support, etc. The system also includes mapping to other classification schemes such as the ICD. This comprehensive classification system describes individual situations in medicine using six levels. The different levels represent a description of the case regarding topology, morphology, etiology, function, procedure, and syndrome. By combining the SNOMED and Clinical Terms classifications, the SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms), currently the most complete classification in medicine, was created.
Digital Imaging and Communications in Medicine (DICOM) [6] communication standard is a comprehensive international standard for exchanging digital image data. In connection with the development of new technologies, the standard is continuously expanding. The DICOM standard includes a communication protocol and a description of the medical image data format. It defines both the structure of image data and the method of exchanging this data. The DICOM standard defines rules for coding, transmission, and storage of diagnostic descriptions of image studies.
The DICOM standard's structured report [7] consists of a hierarchically arranged structure of information objects containing text and links to pictorial and other relevant data. Each information object has a name and a unique code that allow for an accurate search. The structured report consists of a header and its content, i.e., a diagnostic description of the imaging study. The most important feature of the DICOM standard's structured report is that it is an autonomous object, independent of the information systems used by the specific healthcare institution where it is processed. The broader application of the structured report is conditioned by a change in traditional image information processing methods. Additionally, in comparison with unstructured information, it allows one to easily find and further process the required information using linked software tools. It allows for the placement of a link to relevant image data, including the reason for the reference, and to include the Presentation State object, which defines the display parameters (contrast, zoom, orientation, etc.) of a specific image. It also allows references to other information sources (previous image examinations, previous descriptions, previous measurements, etc.) being taken into account when creating the diagnostic report.
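The hierarchical arrangement of named, coded information objects described above can be pictured with a small data structure. The following is only an illustrative sketch in plain Python, not the DICOM encoding itself: the class `ContentItem`, the helper `find_by_code`, and the codes `EX-0001` to `EX-0003` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class ContentItem:
    """One node of a simplified structured report (hypothetical model)."""
    name: str                      # human-readable concept name
    code: str                      # unique code used for exact search
    value: Optional[str] = None    # text content, if any
    children: List["ContentItem"] = field(default_factory=list)

    def walk(self) -> Iterator["ContentItem"]:
        """Yield this item and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def find_by_code(root: ContentItem, code: str) -> List[ContentItem]:
    """Return every item in the report tree carrying the given code."""
    return [item for item in root.walk() if item.code == code]


# Illustrative report; the codes are placeholders, not real terminology codes.
report = ContentItem("Imaging Report", "EX-0001", children=[
    ContentItem("Finding", "EX-0002", value="Solitary nodule, right upper lobe"),
    ContentItem("Impression", "EX-0003", children=[
        ContentItem("Finding", "EX-0002", value="No pleural effusion"),
    ]),
])

findings = find_by_code(report, "EX-0002")
```

Because every item carries a code, a search such as `find_by_code(report, "EX-0002")` retrieves all findings regardless of where they sit in the hierarchy, which is the practical advantage over free-text reports noted above.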

Legal Challenges
The primary legal concern is obviously the topic of personal data. Data protection is one of the fundamental human rights. First and foremost, it is guaranteed by the International Covenant on Civil and Political Rights [8]. This multilateral treaty, which has been ratified by most countries in the world, deals with privacy in its Article 17. According to it, no one shall be subjected to arbitrary or unlawful interference with his privacy, family, home or correspondence, nor to unlawful attacks on his honor and reputation, and everyone has the right to the protection of the law against such interference or attacks. In the European context, currently the most crucial treaty to protect human rights and fundamental freedoms is the European Convention on Human Rights [9]. The importance of this treaty is based especially on its strong enforcement mechanism through the European Court of Human Rights (which, for example, the Covenant on Civil and Political Rights is lacking). This convention establishes the right to respect for private and family life in its Article 8. According to it, everyone has the right to respect for his private and family life, his home and his correspondence. The European Court of Human Rights considers in its case law the protection of personal data, in particular respecting the confidentiality of health data, to be a vital principle in the legal systems of all the Contracting Parties to the Convention [10]. The Council of Europe is also responsible for the creation of the Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (also known as "Convention 108"). This convention deals in greater detail with personal data undergoing automatic processing [11].
The privacy of individuals needs to be protected especially in connection with challenges posed by new technologies. Each system dealing with personal data must comply with the personal data protection legislation. In the EU, this means that the General Data Protection Regulation (GDPR) rules for such processing must be followed (European Union 2016). The GDPR establishes a single legal framework for the protection of personal data across the Union [12].
The GDPR defines personal data as "any information relating to an identified or identifiable natural person ('data subject')". A natural person is considered identifiable if they can be identified, in particular by reference to a certain identifier (such as a name, identification number, location data, online identifier, etc.), or by one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.
In its decisions, the European Court of Justice is interpreting this definition rather broadly. According to them, information should be considered personal data even if the identity of the natural person is not clear from the data itself, but it is possible to identify such person by combining that data with additional data that the data controller or data processor is able to obtain [13].
In the context of personalized medicine, it will first need to be determined which parts of the system involve dealing with personal data. The mathematical model itself will not be problematic on its own, though there might be personal data involved in its development. On the other hand, the usage of the model in practice will likely involve dealing with personal data, more specifically the data of the patients.
Because of the GDPR rules, processing of personal data is lawful only if one of the legitimate reasons prescribed in the GDPR is present. Determining the legitimate basis for processing is thus crucial. The controller of the system should assess and evaluate the nature, purpose, scope and context of the processing of the data, and determine the condition which best ensures its legitimacy. As we are discussing data concerning health, article 9 of the GDPR must apply.
Article 9 allows processing if the data subject has given explicit consent (letter a). However, legitimizing the processing of personal data on the basis of the consent of the persons concerned is problematic: it might be difficult to obtain such consent, which must be freely given, specific, informed and unambiguous (article 4, paragraph 11), and any consent may be revoked by the data subject at any time. It will likely be better to consider the legitimate purpose under letter h, which allows processing if it is "necessary for the purposes of preventive or occupational medicine, for the assessment of the working capacity of the employee, medical diagnosis, the provision of health or social care or treatment or the management of health or social care systems and services on the basis of Union or Member State law or pursuant to contract with a health professional"; in this case data must be "processed by or under the responsibility of a professional subject to the obligation of professional secrecy under Union or Member State law or rules established by national competent bodies or by another person also subject to an obligation of secrecy under Union or Member State law or rules established by national competent bodies." We could also consider whether it would be appropriate to use the legitimate purpose of scientific research (letter j).
In our context it will thus be mainly necessary to determine whether providing personalized medicine can be considered to be part of providing healthcare. If so, it will be much easier to conclude that we have the lawful basis required by the GDPR to process personal data. The specific details regarding what can be considered part of providing healthcare might differ according to national legislation in specific countries. In general, we believe that providing personalized medicine should be within the scope of what is covered by the legitimate purpose under Article 9, letter h, as it is directly connected to the provision of healthcare to the patients for whom the medicine will be personalized.
Obviously, even if we can safely conclude that the discussed processing is lawful in general, other appropriate rules set in the GDPR must be respected, such as that it is allowed only to the extent strictly necessary and proportionate, or that the processed data must be protected.
The obligation to protect the data is written in Article 25 of the GDPR. According to the Article, the controller shall, taking into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing, both at the time of the determination of the means for processing and at the time of the processing itself, implement appropriate technical and organisational measures designed to implement data-protection principles, such as data minimisation, in an effective manner and to integrate the necessary safeguards into the processing in order to meet the requirements of this regulation and protect the rights of data subjects.
One such measure is pseudonymisation. Pseudonymisation is defined in the GDPR as "processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person". The database we are using for personalised medicine is managed in such a pseudonymised manner wherever possible.
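One common way to realise the "additional information kept separately" from this definition is a keyed hash: the identifier is replaced by a pseudonym that cannot be recomputed without a secret key stored outside the research database. The sketch below is an illustration under that assumption, not a description of the system actually used; the function `pseudonymise`, the key, and the sample record are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier by a stable pseudonym using HMAC-SHA256."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key: in practice it would be generated randomly and stored
# separately from the data, under its own access controls.
SECRET_KEY = b"example-key-kept-outside-the-research-database"

# Hypothetical patient record.
record = {"patient_id": "CZ-000123", "age": 57, "finding": "nodule"}

# The clinical content is kept; only the direct identifier is replaced.
pseudonymised_record = {
    **record,
    "patient_id": pseudonymise(record["patient_id"], SECRET_KEY),
}
```

The same patient always maps to the same pseudonym, so records from different examinations can still be linked for model building. Note that such data remain personal data under the GDPR, since re-identification is possible for whoever holds the key.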
Appropriate measures to protect the system from unauthorised access must be implemented both to protect personal data, and to ensure the functionality of the system. Running IT services and systems is accompanied by serious legal responsibilities. In the event of disruption of the services provided, a contractual liability might be an issue. And of course, in the event of a data breach, the legal situation is even worse. The amount and potential sensitivity of the data transferred and stored is enormous. Negligence can possibly be a valid claim. The steps the organization takes to prevent the success of the attack will impact its liability. Reasonable care by the IT professionals must be taken to protect against possible attacks [14].
Unquestionably, defending against "all" possible attacks puts enormous strain on resources. That is why risk management and evaluation are relevant. It might be impossible or unrealistic to eliminate all risks, but at least reasonably foreseeable risks should be addressed. How much effort should be spent on their prevention depends on the evaluation of their likelihood, and of the costs associated with the risk of their success compared to the costs to mitigate such an attack.
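The comparison described above can be written as simple arithmetic: the expected loss of a risk (its likelihood multiplied by the cost of a successful attack) is weighed against the cost of mitigating it. The following sketch only illustrates that reasoning; the function name and all figures are hypothetical, and real risk assessments involve many factors that a single number does not capture.

```python
def assess_risk(likelihood: float, loss_if_successful: float, mitigation_cost: float):
    """Return (expected_loss, mitigate?) for one foreseeable risk."""
    expected_loss = likelihood * loss_if_successful
    return expected_loss, expected_loss > mitigation_cost


# Hypothetical yearly figures for two foreseeable risks.
likely_loss, likely_mitigate = assess_risk(0.05, 200_000, 5_000)
rare_loss, rare_mitigate = assess_risk(0.001, 200_000, 5_000)
```

With these illustrative numbers, the first risk has an expected loss of about 10,000 against a mitigation cost of 5,000 and would be addressed, whereas the second has an expected loss of about 200 and might reasonably be accepted.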

Probabilistic Modelling and Decision Support
The emerging trend of personalized medicine is closely related to rapid developments in the area of machine learning. Some of its tools are becoming highly topical because they require computational power that was virtually unattainable until recently but is now realistically available.
Decision making, or reasoning in general, under uncertainty is a big challenge of artificial intelligence and probabilistic modelling. There is a family of probabilistic graphical models enabling reasoning under uncertainty based on strong mathematical foundations. Currently, there is a plethora of successful applications in many problem domains, including medicine. One of the most promising approaches in our area of interest is Bayesian networks [15].
Mathematically speaking, the Bayesian network compactly represents the joint probability distribution of random variables encoding the domain. The Bayesian network consists of a directed acyclic graph expressing all the necessary dependencies among the random variables, the so-called qualitative part of the model, and the model's local probability distributions parameters, specifying the probabilistic relationships among the random variables, the so-called quantitative part of the model. Each random variable must be conditionally independent of its non-descendants given its parents in the oriented graph.
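In the simple case of the "and" relationship among all the parent nodes, the individual conditional probability entries for a binary variable $X$ with binary parents taking the values $x_1, \ldots, x_k \in \{0, 1\}$ can be calculated as

$$P(X = 1 \mid x_1, \ldots, x_k) = \prod_{j=1}^{k} x_j .$$

In the simple case of the "or" relationship among all the parent nodes, the individual conditional probability entries can be calculated as

$$P(X = 1 \mid x_1, \ldots, x_k) = 1 - \prod_{j=1}^{k} (1 - x_j) .$$

The joint probability distribution in factorized form can be represented as follows:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr),$$

where $\mathrm{pa}(X_i)$ denotes the parents of $X_i$ in the graph. A deeper foundation in probability theory and mathematical statistics is needed to fully understand and apply this modelling approach.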
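As a minimal sketch of how the graph and the local distributions together define the joint distribution, consider a toy network in which two binary parents A and B point to a binary child C. The example assumes deterministic "and"/"or" relationships and hypothetical prior probabilities; the names `and_cpt`, `or_cpt`, and `joint` are introduced here for illustration only.

```python
from itertools import product


def and_cpt(parent_states):
    """P(child = 1 | parents) when the child is the 'and' of binary parents."""
    return float(all(parent_states))


def or_cpt(parent_states):
    """P(child = 1 | parents) when the child is the 'or' of binary parents."""
    return float(any(parent_states))


# Hypothetical prior probabilities of the two parent nodes.
P_A = 0.3
P_B = 0.1


def joint(a, b, c, cpt=or_cpt):
    """Factorized joint: P(a, b, c) = P(a) * P(b) * P(c | a, b)."""
    p_c1 = cpt((a, b))
    return ((P_A if a else 1 - P_A)
            * (P_B if b else 1 - P_B)
            * (p_c1 if c else 1 - p_c1))


# The factorized joint sums to one over all assignments.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))

# Marginal probability of the child under each relationship.
p_c_or = sum(joint(a, b, 1, cpt=or_cpt) for a, b in product((0, 1), repeat=2))
p_c_and = sum(joint(a, b, 1, cpt=and_cpt) for a, b in product((0, 1), repeat=2))
```

With these numbers the child is active with probability 1 - 0.7 × 0.9 = 0.37 under "or" and 0.3 × 0.1 = 0.03 under "and". In realistic models the local distributions are rarely deterministic, but the factorization is used in exactly the same way.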
Bayesian networks, as a probabilistic framework, naturally handle uncertainty in the description of individual health-related factors, in the description of probabilistic relationships between factors, and even in the network structure itself at the input. In particular, they provide correctly calculated probabilistic outputs.
The model's graphical nature brings intuitiveness and good understanding by the domain experts, i.e., medical doctors. Nodes of the graph, random variables of interest, represent the health-related factors, for instance, the factors influencing the possible decision. Directed links in the graph can represent statistical as well as causal dependencies among the random variables (Fig. 11.1).
Its probabilistic inference capabilities are critical features for clinical decision support making. The probabilistic inference results can be, for instance, in the form of the so-called posterior distribution of the random variables representing the desired patient's health factor conditioned on the other health-related factors observed in the patient.
There are two basic alternatives for how to construct the Bayesian network. The model can be constructed manually, based on precise knowledge of the medical domain of interest. An irreplaceable role is then played by medical experts in the medical domain and mathematical experts in the field of probabilistic network models.
Bayesian networks can also be learned from the available datasets. This way, we can learn the structure and the parameters of the network without explicit medical expert knowledge. In this case, a number of prerequisites imposed on the data must be met. It is assumed that the individual cases comprising the learning dataset are independent, do not influence each other in any way, etc. The critical issue in the learning process is the identification of local conditional dependencies and independencies among the random variables representing the desired factors in the model. Many approaches to learning Bayesian networks are currently implemented. Available software packages make it possible to significantly streamline this demanding activity.
The learning process of a Bayesian network consists of two phases: learning the graphical structure of the network and learning the parameters of the network, i.e., its local probability distributions.
Due to massive digitization in healthcare, there is an enormous potential of databases comprising the prior domain knowledge, i.e., millions of medical image examinations, historical medical records, known outcomes of prescribed treatment, etc. Past patient records, often from different healthcare institutions where the patient has been treated, provide essential input data for model creation and evaluation. The selection of appropriate cases from the available datasets must be in line with the purpose for which the model is to serve. All the key factors that are supposed to form the final model, including the desired values, must be identifiable in the selection.
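For the parameter-learning phase, the simplest estimate of a local conditional distribution, once the structure is fixed, is a relative frequency computed from the selected cases. The sketch below assumes a hypothetical two-node structure (disease → symptom) and a tiny invented dataset; `estimate_cpt` is an illustrative helper, not a function of any particular software package.

```python
from collections import Counter

# Hypothetical, already selected and cleaned cases (1 = present, 0 = absent).
records = [
    {"disease": 1, "symptom": 1},
    {"disease": 1, "symptom": 1},
    {"disease": 1, "symptom": 0},
    {"disease": 0, "symptom": 0},
    {"disease": 0, "symptom": 0},
    {"disease": 0, "symptom": 1},
    {"disease": 0, "symptom": 0},
    {"disease": 0, "symptom": 0},
]


def estimate_cpt(cases, child, parent):
    """Maximum-likelihood estimate of P(child | parent) from complete cases."""
    pair_counts = Counter((case[parent], case[child]) for case in cases)
    parent_counts = Counter(case[parent] for case in cases)
    return {
        (p, c): pair_counts[(p, c)] / parent_counts[p]
        for p in parent_counts
        for c in (0, 1)
    }


# Keys are (parent value, child value); values are conditional probabilities.
cpt = estimate_cpt(records, child="symptom", parent="disease")
```

Here `cpt[(1, 1)]` is the estimated probability of the symptom given the disease (2 of 3 cases), and `cpt[(0, 1)]` the probability of the symptom without it (1 of 5 cases). The estimate is only as good as the case selection, which is why the selection must match the purpose of the model.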
There are, and will remain, many open questions concerning the optimal complexity of the Bayesian network, i.e., the complexity of relationships among the variables representing the factors of the medical domain under study. In general, the construction of the Bayesian network is an iterative process that refines the model step by step as additional knowledge or data become available. It should be noted that in specific situations more straightforward modelling techniques also need to be considered.
Bayesian networks belong to the family of so-called nonparametric models. As such, the learning process identifying the probabilistic relationships among the random variables representing the desired health-related factors needs a sufficient amount of data. Employing the Bayesian network modelling approach in cases of rare diseases, i.e., with a dataset of limited size, is a challenge. In this situation, human expert knowledge plays a critical role in the Bayesian network learning process. It can be supplemented by information from relevant complementary sources, such as the literature, and, where possible, combined with learning from the available data.
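One standard way to combine an expert estimate with a small dataset is to encode the estimate as pseudo-counts of a Beta prior and update it with the observed cases. This is offered as an illustrative assumption about how such a combination could be done, not as the procedure used by the authors; `posterior_mean`, the expert estimate, and the case counts are hypothetical.

```python
def posterior_mean(successes, trials, prior_prob, prior_strength):
    """Posterior mean of a probability under a Beta prior.

    prior_prob     -- the expert's estimate of the probability
    prior_strength -- how many observed cases the estimate is worth
    """
    alpha = prior_prob * prior_strength
    beta = (1.0 - prior_prob) * prior_strength
    return (successes + alpha) / (trials + alpha + beta)


# Hypothetical rare-disease setting: the symptom was seen in 1 of only 3 cases,
# while the expert believes P(symptom | disease) is about 0.8.
data_only = posterior_mean(1, 3, prior_prob=0.8, prior_strength=0)
combined = posterior_mean(1, 3, prior_prob=0.8, prior_strength=10)
```

With three cases alone the estimate is 1/3. Treating the expert opinion as worth ten cases pulls it to 9/13, about 0.69, and further data would gradually outweigh the prior.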
Having the Bayesian network of a given medical domain properly established, we can make inferences. The significant advantage of Bayesian networks in the medical domain is their ability to make inferences even with incomplete or missing data, which can be of great importance when modelling the problem areas of rare diseases.
The basic principle of the network is its ability to recalculate the posterior distribution of unobserved desired variables using the prior distribution and having the distribution of observed variables. The patient diagnosis in the Bayesian networks terminology can be informally described as such a value assignment to a subset of variables representing the disease that maximizes their posterior conditional probability distribution given the evidenced value assignment of variables representing the symptoms and other available manifestation of the disease observed in the patient. In other words, a diagnosis of the patient, including its probability, can be interpreted as the most probable assignment of the concerned variables within the model as defined above.
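The diagnostic reasoning described above can be sketched for a toy network with one disease variable and two symptoms that depend only on the disease. All probabilities are hypothetical, and the brute-force enumeration used here is chosen for clarity; it is feasible only for very small models.

```python
from itertools import product

# Hypothetical model: P(D) and P(symptom = 1 | D = d) for two symptoms.
P_D = {1: 0.1, 0: 0.9}
P_S1_GIVEN_D = {1: 0.9, 0: 0.2}
P_S2_GIVEN_D = {1: 0.7, 0: 0.1}


def joint(d, s1, s2):
    """Factorized joint P(d, s1, s2) = P(d) * P(s1 | d) * P(s2 | d)."""
    f1 = P_S1_GIVEN_D[d] if s1 else 1 - P_S1_GIVEN_D[d]
    f2 = P_S2_GIVEN_D[d] if s2 else 1 - P_S2_GIVEN_D[d]
    return P_D[d] * f1 * f2


def posterior_disease(evidence):
    """P(D | evidence); symptoms missing from the evidence are summed out."""
    scores = {0: 0.0, 1: 0.0}
    for d, s1, s2 in product((0, 1), repeat=3):
        assignment = {"s1": s1, "s2": s2}
        if all(assignment[name] == value for name, value in evidence.items()):
            scores[d] += joint(d, s1, s2)
    normaliser = sum(scores.values())
    return {d: score / normaliser for d, score in scores.items()}


# Only the first symptom is known: the second is treated as missing.
post_partial = posterior_disease({"s1": 1})
# Both symptoms observed.
post_full = posterior_disease({"s1": 1, "s2": 1})
# Diagnosis = most probable value of the disease variable given the evidence.
diagnosis = max(post_full, key=post_full.get)
```

With one symptom observed the posterior probability of the disease rises from the prior 0.1 to 1/3; with both observed it reaches 7/9, and the most probable assignment becomes "disease present". The first query also shows how missing observations are handled: the unobserved symptom is simply marginalized.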
Similarly, the patient treatment prognoses in the Bayesian networks terminology can be informally described as the posterior conditional probability distribution of a set of variables representing the desired aspects of the patient health condition after treatment given the value assignment of variables representing the selected treatment and value assignment of variables representing the symptoms and other available manifestation of the disease observed in the patient.
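The two informal descriptions of diagnosis and prognosis can be stated compactly. The notation below is an assumption introduced for illustration and does not appear in the original text: $D$ denotes the disease variables, $S$ the observed symptoms and other manifestations with values $s$, $T$ the selected treatment with value $t$, and $O$ the variables describing the patient's condition after treatment.

```latex
% Assumed notation (not from the original text): D, S, T, O as described above.
\begin{align}
  d^{*} &= \arg\max_{d} \; P(D = d \mid S = s)
        && \text{(diagnosis)} \\
  P(O \mid T = t,\, S = s) &= \frac{P(O,\, T = t,\, S = s)}{P(T = t,\, S = s)}
        && \text{(prognosis)}
\end{align}
```

The probability of the diagnosis is then $P(D = d^{*} \mid S = s)$, while the prognosis is a full distribution over outcomes rather than a single value.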
The above tasks can be extremely computationally intensive, so until recently it was difficult to apply them in practice.

Conclusion
To summarize, the power of the Bayesian networks, together with the possibilities of global sharing of necessary medical knowledge, represents a promising approach of extracting new, often hidden, knowledge about the given medical domain and thus opens up new ways of achieving the delivery of personalized medicine.
Firstly, the Bayesian networks' ability to capture the existing empirical evidence of a given domain or the disease under study in its complexity, together with their unique inference mechanisms, overcomes the limitations of the still prevailing traditional approaches of mathematical statistics. Bayesian networks allow us to calculate the distribution of the target variable representing the desired health information based on the evidenced data of the patient. The target variable can provide personalized predictions, supportive information for personalized decision making, etc. In order to quantify and then compare different treatment strategies, the Bayesian network formalism must be extended to include the necessary mechanisms from decision theory.
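The decision-theoretic extension mentioned above amounts, in its simplest form, to attaching utilities to outcomes and comparing treatments by expected utility. The sketch below assumes that the outcome distributions have already been obtained from a Bayesian network for a particular patient; the treatments, outcomes, probabilities, and utilities are all hypothetical.

```python
# Hypothetical P(outcome | treatment, patient evidence), e.g. posterior
# distributions computed by a Bayesian network for one patient.
P_OUTCOME = {
    "surgery":    {"recovered": 0.80, "complication": 0.15, "no_change": 0.05},
    "medication": {"recovered": 0.60, "complication": 0.05, "no_change": 0.35},
}

# Hypothetical utilities expressing how desirable each outcome is.
UTILITY = {"recovered": 1.0, "complication": -1.0, "no_change": 0.0}


def expected_utility(treatment):
    """Sum of outcome utilities weighted by their probabilities."""
    return sum(prob * UTILITY[outcome]
               for outcome, prob in P_OUTCOME[treatment].items())


eu = {treatment: expected_utility(treatment) for treatment in P_OUTCOME}
best_treatment = max(eu, key=eu.get)
```

With these illustrative numbers surgery scores 0.65 and medication 0.55. The ranking depends as much on the utilities as on the probabilities, so eliciting utilities from clinicians and patients is part of the modelling task.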
Secondly, the global cooperation and collection of relevant data for model building eliminate possible misleading results caused by inherent bias when the data comes from only one healthcare institution. The resulting model can be, for instance, validated against the data randomly selected from the databases of different institutions. Incorporating regional and international data into the learning process can further improve the prior distribution and, consequently, the quality of personalized decision support.
This approach benefits from the use of huge international databases and, in turn, enables the international transmission of complex knowledge about particular medical domains or the diseases under study, including the delivery of meaningful personalized predictions and decision-making support.
One of the main challenges of formal mathematical representation of specific medical domain knowledge is capturing the causality among the factors relevant to diagnostic and prognostic processes. Bayesian networks based on causal relations between the health-related factors enable so-called causal reasoning, i.e., answering causal questions addressing clinical issues. The Bayesian network equipped with a suitable user interface can, for instance, serve as an expert system interactively queried by the diagnosing physician.
The real challenge is discovering something quite new, that is, shifting the domain knowledge. In Bayesian network terminology, this is called the discovery of latent variables. The Bayesian network is an effective modelling tool for this kind of reasoning. It naturally eliminates the shortcomings of traditional methods of mathematical statistics. Establishing patient diagnosis and treatment prognoses are the critical issues in personalized decision support. Mathematical modelling is beginning to play an irreplaceable role here.