Hyperthyroidism therapy: What can decision support systems already achieve?

Currently, only a few theoretical support systems exist for the treatment of hyperthyroidism. They are typically not practically applicable and focus solely on Graves’ disease. The recently developed DigiThy software framework can assist physicians with methimazole dose titration during the treatment of Graves’ disease. In this study, a pool of 60 virtual patients was created to compare the individual treatment approaches of 8 physicians and students (including three colleagues inexperienced in the care of Graves’ disease) with the decision support system DigiThy in terms of predefined performance indices. These indices assess the deviation of FT4 from the reference range throughout the treatment. The computer-aided treatment algorithms outperformed the usual care approach according to different prespecified criteria for treatment success. Two of the three inexperienced colleagues improved their treatment success over time, i.e. with more patients treated. In conclusion, our findings suggest that the DigiThy software may be a useful decision support system in routine care of patients with Graves’ disease, while also serving as an effective training tool for the education of physicians. Randomized controlled studies are required before implementation of DigiThy in daily clinical practice.


Relevance
About 20 to 30 persons out of 100,000 of the population suffer annually from Graves' disease, an autoimmune disease affecting the thyroid gland [1, p. 430]. Women are affected more frequently and have a lifetime risk of 3%, while men have a lifetime risk of 0.5%. Graves' disease is also the main cause of hyperthyroidism [2].
This study will therefore focus on support systems made for the treatment of Graves' disease.

Introduction to Graves' Disease
In Graves' disease, thyroid-stimulating antibodies cause an overactive thyroid. The antibodies bind to the thyrotropin (TSH) receptor and activate it [3, p. 1236]. Due to this stimulation by TSH receptor antibodies (TRAb), hyperthyroidism often develops as the main consequence [4, p. 373]. The majority of studies on Graves' disease only determine TRAb in general, without further distinguishing between stimulating and blocking antibodies. However, "antibodies against the thyroid-stimulating hormone receptor (TSHR) can activate or block the function of the receptor directly causing hyper- or hypothyroidism, respectively" [1, p. 58]. In patients with Graves' disease, the stimulating antibodies (TSAb) are detected with a probability of over 95% [1, p. 438]. Given that Graves' disease is the primary cause of hyperthyroidism, it can generally be assumed that the stimulating antibodies are significantly more prevalent, and thus the blocking antibodies (TSBAb) can usually be neglected.

Difficulties due to antibody measurement and lack of data
In everyday clinical practice, no distinction is typically made between the two types of antibodies, and their quantities are measured collectively. This introduces an element of uncertainty in the development and validation of mathematical models [5]. Such mathematical models typically serve as the foundation for the development of support systems, as will be demonstrated later in this article. Studies that explicitly deal with the distinction between antibodies are scarce, but would be very helpful for many approaches. One such study [6] confirmed that most patients have stimulating antibodies and only a few have blocking ones. However, it also showed that two out of 98 patients underwent a complete transition from TSAb-positive hyperthyroidism to TSBAb-positive hypothyroidism. The quality of recommendation systems for hyperthyroidism therapy could likely be improved if it became standard practice to measure antibodies at each visit and to distinguish between their characteristics, since support systems rely heavily on measured blood values and estimate their influence on the further development of the disease. In general, data quality often complicates the use and development of support systems. Based on experience with data from the Graz Endocrinology Registry Study, as well as data published by others [7,8], it can be stated that the available evidence on this issue is limited, which makes the verification and development of models challenging. In one registry study, for example, antibodies were often not measured, but TSH, FT3 and FT4 were measured each time, while in another investigation [7], units were mixed up, extreme time intervals between treatments occurred, and the appointments were not recorded precisely by date. Additionally, it was sometimes only noted in which month a patient visited the clinic. Furthermore, in another study [8], only the FT4 values were published.

Fig. 1 Initial dose recommendations according to the American Thyroid Association. Please note that the upper limit of the reference interval must be known for the presentation of the recommendation. Reference ranges depend on the population that was the basis for their determination and on the laboratory methods [23]. Accordingly, the exact values for the upper and lower limits of free T4 in a healthy person may vary slightly from clinic to clinic. For this work, a reference interval for free T4 from 9.5 pmol L⁻¹ to 24.0 pmol L⁻¹ was assumed.

Difficulties in treatment due to high individuality of patients
One of the reasons why the treatment of thyroid diseases is challenging is that the HPT-axis has a highly individual "healthy setpoint" that varies significantly from patient to patient [9]. In one study [9], sixteen healthy subjects were investigated to demonstrate how much the normal concentrations of FT3, T4, FT4 and TSH differ between individuals. Moreover, these values change over time, even within the same individual, albeit to a lesser extent [9]. Due to the strong inter-individual variability, the ultimate goal proposed in [9] is to determine the individual setpoint of a person on the basis of genetic analyses and thus aim for personalized medicine. Support systems based on control engineering methods, as in [5] and [10], can also allow individualization of the treatment by setting the targeted hormone concentration setpoint via the control variable. In summary, the individual variability of the disease process represents the greatest challenge not only for standard care treatment but also for decision support systems.

Guidelines
Recommendations from professional medical organizations can be interpreted as a simple form of support system. However, few are available, and doctors mostly have to rely on their personal experience regarding dose titration. Among the limited specific recommendations for the treatment of Graves' disease, one is provided by the American Thyroid Association, suggesting "5-10 mg if free T4 is 1-1.5 times the upper limit of normal; 10-20 mg for free T4 1.5-2 times the upper limit of normal; and 30-40 mg for free T4 2-3 times the upper limit of normal" [11, p. 1355].
This recommendation assigns suggested dosages to different FT4 intervals and is shown graphically in Fig. 1. The plot highlights that the suggested dosing around 48.0 pmol L⁻¹, i.e. twice the upper limit of the reference interval assumed here, may be impracticable. If one were to adhere strictly to the recommendation, a patient with 47.9 pmol L⁻¹ would be prescribed 20 mg MMI, while a patient with 48.1 pmol L⁻¹ would receive a prescription for 30 mg. The measurement of FT4 is strongly dependent on the measurement method. For instance, in one study [12], the FT4 values of the same patients, measured using two different methods, correlated only with a Spearman correlation coefficient of r = 0.75. If the underlying measurement of FT4 is subject to such uncertainty, the discontinuity in dosage at 48.0 pmol L⁻¹ seems unjustified and inconsistent.
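The step at twice the upper limit can be made concrete with a short sketch. The function name and the handling of values that fall exactly on an interval boundary are assumptions made for illustration; the quoted recommendation does not say which interval a boundary value belongs to. The upper limit of 24.0 pmol/L is the one assumed in this article.

```python
from typing import Optional, Tuple

FT4_UPPER_LIMIT = 24.0  # pmol/L, upper reference limit assumed in this article


def ata_initial_mmi_range(ft4: float,
                          upper_limit: float = FT4_UPPER_LIMIT) -> Optional[Tuple[int, int]]:
    """Return the suggested initial MMI dose range in mg, or None if the
    quoted recommendation does not cover the given FT4 value."""
    ratio = ft4 / upper_limit
    # Assumption: a value exactly on a boundary is assigned to the lower interval.
    if 1.0 <= ratio <= 1.5:
        return (5, 10)
    if 1.5 < ratio <= 2.0:
        return (10, 20)
    if 2.0 < ratio <= 3.0:
        return (30, 40)
    return None  # below the upper limit or above three times the upper limit


# Two almost identical measurements end up in different dose ranges.
for ft4 in (47.9, 48.1):
    print(ft4, ata_initial_mmi_range(ft4))
```

The two printed ranges do not even touch: the recommendation contains no dose between 20 mg and 30 mg, although the two FT4 values differ by far less than the measurement uncertainty discussed above.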

Computer aided support systems
For more advanced computer-aided support systems, a mathematical model is usually required. In order to assess what support systems for hyperthyroidism therapy can already achieve, it is important to assess how well the disease can be described mathematically.
Dietrich et al. provide a detailed overview of methods for the mathematical description of the HPT-axis [14]. However, they note that only a few mathematical models are actually used in clinics [14]. Many of the models do not consider the antibodies that lead to Graves' disease [15,16]. Both models shown in [15,16] are highly sophisticated, using a large number of differential equations and patient-specific parameters to model thyroid behavior. Due to the high number of parameters, these models are primarily evaluated qualitatively rather than quantitatively, since tuning numerous parameters presents challenges in validation with real patient data. If the number of observations, i.e., blood measurements at a single appointment, is smaller than the number of model parameters, one generally encounters many possible solutions for the patient-specific model parameters [17]. This, in turn, renders support systems unable to accurately predict future hormone development. Given the significant inter-individual variability among patients and the sporadic measurement of blood parameters, usually performed every few weeks, it is impractical to employ models with dozens of internal patient parameters as clinical support systems.
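The ambiguity that arises when there are fewer observations than parameters can be illustrated with a deliberately tiny toy example. The straight-line model and all numbers below are hypothetical and unrelated to the thyroid models in [15,16]; the sketch only shows that several parameter sets can reproduce one measurement exactly and still disagree about the future.

```python
def predict(p0: float, p1: float, t: float) -> float:
    """Toy two-parameter model (hypothetical, not a thyroid model)."""
    return p0 + p1 * t


# One hypothetical measurement: 30 pmol/L on day 14.
t_obs, y_obs = 14.0, 30.0

# For every slope p1 there is an offset p0 that reproduces the measurement exactly.
candidates = [(y_obs - p1 * t_obs, p1) for p1 in (-0.5, 0.0, 0.5)]

for p0, p1 in candidates:
    fit = predict(p0, p1, t_obs)        # identical for all candidates
    forecast = predict(p0, p1, 42.0)    # differs between candidates
    print(f"p0={p0:5.1f}  p1={p1:4.1f}  fit={fit:.1f}  forecast(day 42)={forecast:.1f}")
```

All three candidates fit the single observation perfectly, so the data cannot distinguish between them, yet their forecasts for day 42 differ widely. Models with dozens of parameters face the same problem on a larger scale.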
Nevertheless, it must be noted that these complex models provide important insight into the disease process and can reflect various disease patterns. For instance, one study [10] demonstrates a possible treatment for Graves' disease using the drug methimazole, administered by a model predictive controller (MPC). The disease process could be modelled using the model shown in the supplementary material of [18], which builds upon the preliminary work of [15,19,20]. Although the antibodies that cause Graves' disease are not directly present in the model, the disease process can still be simulated by adjusting parameters such as secretion capacity and introducing a relationship between methimazole and the activity of TPO [10]. This makes that study [10] one of the few sources proposing a support system for treating Graves' disease. Unfortunately, it is stated in [10] that this support system can only be used if the hormone concentrations can be measured daily, if the results of the measurements are available immediately and if all states of the model can be measured. Moreover, for this type of MPC, the individual patient parameters would also need to be known. Consequently, this support system is currently not feasible for use in clinical routine.
Another model describing the progression of Graves' disease is provided by Pandiyan et al. [21]. This model has 13 patient-specific parameters and has been validated with patient data. Although this model in principle allows for its use as a predictor and thus as a support system, the paper [21] does not explicitly carry out this application. In [22], it was demonstrated that certain inconsistencies in the input function used render the model unfit for use as a support system.
In 2019, Meng published an approach for determining the dose for patients with Graves' disease [8]. This approach does not take the antibody concentration into account. Moreover, the model's structure can lead to drastic changes in FT4 levels in very short periods of time. Such rapid changes would not be possible in real life, considering that the known half-life of FT4 is approximately seven days [4]. Meng mentions that the half-life can shorten to 3-4 days in the hyperthyroid state, but even then the decreases obtained from the model are too rapid.
In conclusion, according to the literature search, only the publications [5, 8, 10, 21] provide a foundation for support systems in treating Graves' disease. Among these, [5] is the only approach that is explicitly designed for practical use, as it is able to make a decision based solely on FT4 measurements despite long time intervals between follow-up measurements.
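The half-life argument can be turned into a rough plausibility bound. The sketch below assumes pure first-order elimination and a complete stop of hormone production, which is a simplification made here and not a model taken from [4] or [8]; the function name and the starting value are hypothetical.

```python
def ft4_floor(ft4_start: float, days: float, half_life_days: float = 7.0) -> float:
    """Lowest FT4 value reachable after `days` if production stopped completely.

    Assumption: pure first-order elimination with the given half-life.
    """
    return ft4_start * 0.5 ** (days / half_life_days)


start = 48.0  # pmol/L, hypothetical starting concentration

# Compare the half-life of about 7 days with a shortened half-life of 3.5 days.
for half_life in (7.0, 3.5):
    floors = [round(ft4_floor(start, d, half_life), 1) for d in (1, 3, 7)]
    print(f"half-life {half_life} d -> floor after 1, 3, 7 days: {floors}")
```

Under these assumptions, any simulated FT4 trajectory that falls below such a floor decreases faster than elimination alone would allow, which is the kind of behavior criticized above.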
In summary, there are currently very few support systems available and none of them is feasible for clinical use.

Test framework
The DigiThy framework, introduced in [5], can be used in several ways as a support system in the treatment of Graves' disease and is available, after registration, as a free web application at https://thyroid.tugraz.at/. In total, it can serve the following purposes:

Study design
The aim of the present study is to compare the treatment quality of an existing support system, DigiThy, with the treatment of physicians at different experience and education levels.

Patients, study participants and support systems
Within the DigiThy framework, 60 virtual patients were created and assigned to 8 doctors, each with a different level of experience in treating Graves' disease, see Table 1. The 60 patients were randomly generated by DigiThy, resulting in a wide range of patient-specific parameters. The progression of antibodies also varied among these virtual patients. For some patients, the disease disappeared over time. In these cases, only a few antibodies remained present, leading to such low thyroid stimulation that the FT4 concentration returned to the normal range, even without medication. In such cases, a "healthy" TSH concentration is also restored. In other patients, the antibodies remained permanently too high, which means that they needed a permanent inhibitor, i.e. MMI, to keep the FT4 concentration within the reference interval.
In addition, there are two computer-aided support systems that can treat the virtual patients. The characteristics of these algorithms are summarized in Table 2.

Table 2 (excerpt)
CATT-V2: Uses successive estimation of patient parameters and subsequently predicts the future development of FT4. On the basis of this prediction, an optimal dose is calculated for the next treatment period.
Note: Unless otherwise stated, the algorithms use a control interval of 14 days in the first step and 28 days in the second step and for the following treatment periods.

Table 3 (rows J2 and J3)
J2: All deviations from the center of the reference range, i.e. FT4 = 16.75 pmol L⁻¹, are penalized and the dose level is also included. The more dose needed, the higher the value. Therefore, it is considered "better" if a smaller dose was used.
J3: Only FT4 concentrations higher than 19 and lower than 14.5 pmol L⁻¹ are penalized.
Note: The smaller the value, the "better" the treatment.
a Further explanation and formulas are accessible in [5].
The algorithms operate on the same level of information that would be available in a real-world treatment setting, specifically:
- The real patient-specific model parameters are not transferred to the algorithms for the treatment process and must be estimated.
- The algorithms only receive the blood values taken during control appointments and not the in-between hormonal fluctuations.
By contrast, doctors utilizing the DigiThy framework with virtual patients can view the blood concentration from the simulation between treatments, providing them with more information than the computer-aided support systems have. It should also be noted that the implemented algorithms can treat patients based solely on FT4 values, rendering them independent of TSH and TRAb measurements. Therefore, these additional measurements could potentially be omitted when utilizing support systems in the treatment of real patients.
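Because the algorithms only see blood values at control appointments, the appointment schedule determines how much information they receive. The following sketch lists appointment days for a control interval of 14 days in the first step and a longer interval afterwards. The function name, the convention that day 0 is the first appointment and the treatment duration of 120 days are assumptions made for illustration, not part of the DigiThy implementation.

```python
from typing import List


def appointment_days(total_days: int,
                     first_interval: int = 14,
                     later_interval: int = 28) -> List[int]:
    """Days on which blood values are taken.

    Assumption: day 0 is the first appointment, the first control interval
    is `first_interval` days and all following ones are `later_interval` days.
    """
    days = [0]
    interval = first_interval
    while days[-1] + interval <= total_days:
        days.append(days[-1] + interval)
        interval = later_interval  # every step after the first uses the longer interval
    return days


print(appointment_days(120))                     # intervals of 14, then 28 days
print(appointment_days(120, later_interval=35))  # longer interval of 35 days
```

With the longer interval, the same treatment duration yields measurements that are spaced further apart, so each dosing decision has to remain valid for a longer period.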

Procedure
Each of the 8 study participants treated all 60 virtual patients. As mentioned earlier, some patients can recover and others cannot. The study participants therefore have the opportunity to decide when to finalize the treatments. This can happen under two circumstances:
1. The patient's thyroid hormone levels have stabilized within the healthy reference interval and the patient no longer requires treatment. The physician can then mark the treatment as completed.
2. The patient still requires MMI after several months or years of treatment. The doctor can mark the treatment as completed and refer the patient for e.g. thyroidectomy.
Each doctor, therefore, will have a different treatment duration for each patient, depending on their assessment and the patient's response to treatment.The algorithms then treat the 60 patients using the treatment duration of the physician with the best performance, according to the performance indices.
The participants without experience were also surveyed to rate the usefulness of the simulation framework as a learning platform. They were asked to choose from the following options:
- very helpful
- helpful
- little helpful
- not helpful
In the simulation framework, participants could theoretically set any number of follow-ups, thereby conducting virtual blood tests for patients at any desired frequency. Accordingly, a participant could also choose an interval of 1 day between each treatment. However, participants were instructed to select intervals in DigiThy that they would realistically employ in clinical practice. Therefore, while the intervals were not strictly determined by the study design, they were implicitly constrained by the intention to reflect real-world clinical practices. This was done to ensure that the study provides a practical and realistic representation of medical practice while still giving doctors some freedom. Participants without experience could find out from guidelines which intervals are common.

Comparison
In principle, there are various possibilities to compare the quality of treatment over several weeks and months. In order to obtain a suitable comparison, three different performance indices were introduced in [5]. All these indices assess in different ways how much the FT4 concentration deviates from the center of the reference interval. In general, these indices are designed such that "smaller" values reflect "better" treatments. The dosage used can also be considered, as in performance index 2. This index not only assesses the FT4 deviation, but also takes into account the use of the drug and favours treatment approaches that require a lower dosage. The meanings of the individual indices are summarized in Table 3.
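To make the idea behind the three indices tangible, the following sketch implements one possible form of each. The exact formulas are given in [5] and are not reproduced in this article, so the functional forms (mean absolute deviations), the dose weight and the sample values below are illustrative assumptions only. The reference interval of 9.5 to 24.0 pmol/L and the band of 14.5 to 19 pmol/L are taken from this article.

```python
FT4_LOW, FT4_HIGH = 9.5, 24.0           # pmol/L, reference interval assumed in this article
FT4_CENTER = (FT4_LOW + FT4_HIGH) / 2   # 16.75 pmol/L, center of the reference interval
BAND_LOW, BAND_HIGH = 14.5, 19.0        # pmol/L, values inside are not penalized by J3


def j1(ft4_series):
    """Assumed form of J1: mean absolute deviation of FT4 from the center."""
    return sum(abs(x - FT4_CENTER) for x in ft4_series) / len(ft4_series)


def j2(ft4_series, dose_series, dose_weight=0.1):
    """Assumed form of J2: J1 plus a penalty on the mean dose.

    `dose_weight` is a hypothetical trade-off factor, not a value from [5].
    """
    return j1(ft4_series) + dose_weight * sum(dose_series) / len(dose_series)


def j3(ft4_series):
    """Assumed form of J3: mean distance of FT4 to the band 14.5 to 19 pmol/L."""
    def excess(x):
        return max(BAND_LOW - x, 0.0, x - BAND_HIGH)
    return sum(excess(x) for x in ft4_series) / len(ft4_series)


# Hypothetical FT4 values (pmol/L) and MMI doses (mg) at three appointments.
ft4 = [30.0, 20.0, 16.75]
dose = [20.0, 10.0, 5.0]
print(j1(ft4), j2(ft4, dose), j3(ft4))
```

In all three cases a smaller value corresponds to a "better" treatment. With these assumed forms, two treatments with identical FT4 curves receive the same J1 and J3, while J2 favours the one that needed less medication.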

Results
Table 4 shows the comparison between the computer-aided support systems and the study participants, using the performance indices previously defined. In this context, the mean values and standard deviations for all 60 patients are presented. In addition, the mean and median values of the selected treatment intervals are given.
The interpretation of the results is discussed in Sect. 4.

Results of inexperienced physicians
The results of the three study participants without any treatment experience are illustrated in Fig. 2 using the performance index J1. The participants approached the task in a very structured way by first reading through the medical guidelines. As previously discussed in Sect. 1.5.1, however, these guidelines are often vague. As shown in the figure, two of the three inexperienced users (Study Users 1 and 5) visibly gained experience, as the performance index decreased over time (indicating "better" treatments), reflecting their adoption of more advanced dosing strategies compared with the somewhat "simplistic" medical recommendations. Especially after the particularly difficult treatment of patient 21, both appeared to have gained enough experience to manage such complex disease trajectories appropriately. For Study User 5 in particular, the quality of treatment stabilizes considerably after this case. Overall, the results demonstrate that the DigiThy framework can serve as a training platform for inexperienced doctors.

Qualitative feedback from the participants
The endocrinology experts stated that they found the tool valuable due to its high potential to assist inexperienced doctors in their learning process. It was noted that it is often very difficult to familiarize young doctors with the treatment of Graves' disease in everyday clinical practice because the feedback from patient follow-up visits is often received only after weeks or months. According to the participating thyroid experts, support systems that are able to simulate diseases can have great potential in teaching. Participants without prior practical experience were asked to evaluate the platform as a learning tool. The results of this evaluation are listed in Table 5. The ratings range from "little helpful" to "very helpful". One of the participants noted that an exclusive focus on blood values does not eliminate the necessity of clinic visits, since this approach seems too simple: it fails to consider the impact of concurrent medications as well as patient compliance. The simulation framework presently assumes perfect adherence, with patients invariably taking their prescribed medications, which is not always the case in real-life scenarios.
As mentioned earlier, it is crucial for the development of support systems that the underlying mathematical model approximates the course of the disease sufficiently well. This has already been demonstrated in [5] using clinical data. Nevertheless, some of the qualitative feedback from the participants noted that some of the digital cases had characteristics that would not or only rarely occur in everyday clinical practice. The participants themselves often emphasized that these observations are very subjective. Nevertheless, some of the statements are mentioned below, as they can open up further research questions, which in turn can lead to improved simulations and subsequent support systems.
The following effects were mentioned:
- More frequently than in everyday clinical practice, patients are already within the reference range for FT4, even though the antibody concentration is still elevated.
- [...] patients.
It is worth noting that the expert who did not report any unrealistic effects scored best in all 3 indices among all study participants. Since this participant never had the subjective feeling that the simulation showed unexpected effects, fewer "mistakes" happened. Study participant 8 was the participant who chose the longest intervals. The participant pointed out that intervals longer than 8 weeks can lead to difficulties both in simulation and in reality. Accordingly, the participant considered the DigiThy framework an excellent learning opportunity to ascertain the most effective intervals.

Conclusion
The DigiThy system can be operated using two different automatic dosing algorithms. Based on 60 patients, both algorithms perform better, in terms of all 3 performance indices, than any of the study participants. For indices J1 and J3, this is achieved even by the versions that use treatment intervals of 5 weeks, see CATT-V1-35D and CATT-V2-35D in Table 4. For J2, only one expert performs better than CATT-V1-35D and only two experts perform better than CATT-V2-35D.
Table 6 illustrates how doctors with no previous experience and the experts perform on average according to the performance indices. The experts perform better and have slightly more variation in their treatments.
Nevertheless, all algorithms, even those that apply a 5-week interval, perform better than the best participant.Based on the feedback from the doctors, the simulation framework, while not flawless, is perceived to be realistic and of high quality.Since the algorithms performed noticeably better than the experts, this can be taken as a strong indication that they would also perform very well in everyday clinical practice.
The experts' comments about surprising effects within the simulation mainly concern effects in edge cases.As stated in [5], the patients are determined by 8 patient-specific parameters, which must lie within previously defined limits.By refining these boundaries, a closer alignment between the computer simulation and reality can be achieved, subsequently optimizing the foundation for support systems.

Discussion
A thorough review of the literature has revealed only a small number of available treatment support systems for hyperthyroidism. Since Graves' disease is the main cause of hyperthyroidism, the focus was placed on support systems for this disease. Among the few theoretical approaches, there are almost no support systems that are practically applicable.
Using the same 60 virtual patients for all participants and algorithms, this study showed that the support system DigiThy is able to treat the patients "better", based on predefined performance criteria, than the study participants, who included endocrinology experts with years of experience. The outcome suggests that support systems could offer significant advantages in routine clinical settings.
However, such support systems are usually based on mathematical descriptions and some study participants have sometimes perceived the digital descriptions of treatments as insufficiently reflecting real-life experiences in the clinics.While support systems show immense promise, future work must focus on rendering digital simulations even more realistic.This primarily necessitates the availability of real patient data.
In conclusion, the present support systems, as demonstrated in this study, appear to be very suitable for use in daily clinical routine.Validation through a clinical study is planned as a next step within this project.

Thomas Benninger
Institute of Automation and Control, Graz University of Technology
Inffeldgasse 21/B, 8010 Graz, Austria
benninger@student.tugraz.at

Acknowledgments. We'd like to thank all study participants for investing their time in treating the virtual patients. Feedback from medical professionals as well as inexperienced doctors is extremely important for the improvement of the project.

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 2 Development of the J1 performance index during the treatments by the three inexperienced doctors. a J1 values for StudyUser1, b J1 values for StudyUser5, c J1 values for StudyUser14

Table 1
Participating Doctors

Table 3
Explanation of the performance indices
Performance index a: Explanation
J1: All deviations from the middle of the reference range, i.e. FT4 = 16.75 pmol L⁻¹, are penalized
J2

Table 4
Comparison of the performance of all study users and algorithms
a Same as base algorithm, but uses a longer treatment interval of 35 days after the first 2 appointments
b Study User with the best result for each performance index

Table 5
Feedback from participants with no real-life experience, presuming that the patients are simulated realistically a
a The DigiThy framework was considered to deliver realistic behavior when assessed by thyroid experts [5]

Table 6
Comparison based on the mean values between Beginners (doctors with no experience) and Experts