Introduction

Intrinsic capacity (IC), a unified concept referring to all the physical and mental capacities that a person can draw on during her/his life, was introduced by the World Health Organization (WHO) in 2015 in the World report on ageing and health [22]. WHO defines IC as the combination of the individual's physical and mental abilities to do and to be what is valuable to them. According to this definition, functional ability is therefore determined by the interactions a person can establish with the environment. Although there are numerous studies focusing on assessing different domains of functioning at different stages of life [9], less effort has been devoted to aggregating all the domains and reporting an overall score of an older person's health status [6, 7]. An essential element of the IC concept is the holistic and regular monitoring of a patient's capacities. This enables early-stage diagnosis of declining personal health abilities, so that suitable intervention plans can be devised to slow the decline and help restore the vanishing capacity.

Today's literature, however, does not clarify how the IC score should be operationalised [10]. Different authors validated the approach by comparing retrospective data and IC scores using different factors and procedures [2]. This fragmentation of the existing experimental results does not allow a standard, unified index to be identified. Part of the problem is rooted in the lack of specification of data acquisition procedures. Comprehensive and regular monitoring of the biological state of a patient implies specific technical requirements to be satisfied. Moreover, the adoption of shared standards can significantly improve the reproducibility of the experimental results developed in the field. For these reasons, we focused on specifying a methodology in terms of the engineering requirements needed to operationalise IC score evaluation.

The main guidance available for implementing IC evaluation is the Integrated Care for Older People (ICOPE) handbook published by the WHO [20, 32]. According to the ICOPE handbook, the IC score is a composition of six generic domains: Vitality, Locomotor capacity, Visual capacity, Hearing capacity, Psychological capacity, and Cognitive capacity. It is noteworthy that different groupings result from confirmatory factor analysis in the existing literature [2], where Hearing capacity and Visual capacity can be merged into one domain named Sensory capacity. Hereafter, we will use the following names for the domains:

  • Vitality mainly indicates general physical well-being factors whose values, if they reach risky levels, may lead to subsequent damage.

  • Locomotion refers to the physical potential of individuals.

  • Mood gives an evaluation of psychological capacity and environmental interactions.

  • Cognition mainly concerns impairments, examining the proper recognition and comprehension of facts.

  • Sensory evaluates the status of visual and hearing capacity.

The ICOPE handbook specifies for each domain a set of reliable clinical and non-clinical measurements suggested and validated by clinicians. Nevertheless, these tests are conceived to be performed in clinics, imposing inherent limits on the frequency of data collection. To overcome these limits, appropriate technological solutions must be implemented. Over recent years, there has been increasing interest in wearable monitoring systems. Smart devices such as smartphones, fitness bands, Bluetooth-enabled blood pressure cuffs, smart scales, and pill bottles enable continuous monitoring of patients' activities and state and can significantly improve the quality of the data collection process in terms of timeliness and coverage [4].

The SMART BEAR project [18] provides a comprehensive infrastructure for long-term continuous examination and testing of the well-being status of older people using wearable devices, mobile apps, and follow-up assessments by trained personnel and physicians. Furthermore, the power of a big data analytics engine is exploited for prediction and personalised intervention purposes. Whilst in the majority of IC studies a large volume of the required data is collected by performing clinical measurements and questionnaires, in the SMART BEAR project most of the data required to evaluate IC can be collected through smart devices and mobile applications, remotely and continuously.

We then designed a methodology based on the standardized data acquisition procedures specified in Fast Healthcare Interoperability Resources (FHIR) [21]. Unlike other studies, our work focuses on engineering the data acquisition procedure following standards that can foster the interoperability of the collected experimental observations [2]. Adopting the composing domains and relevant sub-domains of the IC score that emerged from current studies [2], Table 1 describes the data formats and collection frequencies of the data acquired by devices and questionnaires. Each single measurement carries a specific weight in the IC score, which is not clearly presented in the ICOPE handbook and is still a matter of debate.

The complication in specifying a unique solution for calculating the IC emerges for two main reasons: first, the factors influencing intrinsic capacity are not universal across different geographical/cultural populations; second, the composing factors affect the IC score of individuals to different degrees. In other words, a unified, applicable prescription for calculating IC from the carried-out measurements does not exist. This issue has drawn our attention toward Machine Learning (ML) algorithms for addressing these complications in studying the IC score: (1) leveraging ML and statistical models to identify the most important constituting factors of IC, and (2) training models for predicting the variations of the IC score and preventing possible decreases through proper interventions. However, continuous monitoring of the IC for preventive purposes not only demands analytical capabilities for finding the most effective parameters and proposing an applicable solution, but also needs to be considered from an engineering point of view to support the monitoring task.

In this study, focusing on this viewpoint, we next introduce the components of an architecture designed for monitoring and calculating the IC score in the SMART BEAR project. The subsequent section elaborates the analysis method we propose for the continuous study of observations and the evaluation of IC trajectories, after which the mapping of the observations and questionnaires onto FHIR models is discussed. To implement the model for calculating the IC score and to test the architecture, we have produced synthetic health data covering the life cycles of a number of patients. Then, leveraging the produced data, we demonstrate the trajectories of intrinsic capacity obtained by our proposed methodology as applicable to the data retrieved in the SMART BEAR project. Finally, we conclude this work and suggest future related work.

Related work

A rich research strand is growing on the validation and applicability of IC. The results confirm the validity of a unified concept of IC along with the five constituting domains according to the WHO implementation guide [10]. Confirmatory Factor Analysis allowed verification of the concordance of IC with the tests chosen to measure each of the five domains. Different studies leveraged longitudinal data of elderly participants from different cohorts such as England [2], China [3, 35], and Mexico [12].

Even though these studies convey substantial prior knowledge from a medical perspective, we believe they lack explicit engineering solutions for monitoring the IC. Given the constant changes in living status, monitoring the process of healthy ageing could play a crucial role in the interpretable prediction of the future state of elderly people. One of the prerequisites of an engineering solution for the continuous monitoring and analysis of the IC is adopting a unified standard for health data exchange. The HL7 Fast Healthcare Interoperability Resources (FHIR) models have recently been used by different research areas, from normalizing clinical data pipelines for standardizing unstructured electronic health records (EHR) [13] to medical free-text analysis [26] and semantic mapping from raw genomic data [25]. The interoperability of FHIR data models makes them promising for leveraging electronic health records (EHR) in health score assessments. In the FHIR workflow, risk assessments are provided as a specialized type of observation. Risk assessments may be based on: (1) basic demographic information from the Patient or Group resources and various Observations including vital signs, lab information, assessments, genetic information, etc.; (2) Family Member History; (3) current, past and proposed therapies (Immunization, Procedure, CarePlan, etc.). Although patient prognosis, cardiac, genetic, and breast cancer risk assessments using the FHIR data model are well established [14], the model is still not widely adopted by clinicians and smart health device programmers.

Considering the capabilities of the SMART BEAR infrastructure, alongside the possibility it provides of exchanging clinical data via FHIR, in this study we focus on designing a workflow that leverages the SMART BEAR architecture components to acquire a representative health score of intrinsic capacity over time.

Architecture

The primary goal of the SMART BEAR project is to develop an integrated platform gathering numerous health-related data flows, to be further analysed in the day-by-day study of participants' activities.

Subsequently, the continuous data collection and processing aims to provide evidence-based personalised interventions for their healthy and independent living. Serving this goal, the system is tailored to comply with project-specific requirements, and in particular must be capable of digesting data flows acquired through interactions with external devices and systems (e.g., a significant number of smartphones, hospital medical systems, smart IoT devices) and of processing them in the context of Big Data analytics used for evidence-based decision making. Its architecture must also take into account current technological trends, which impose not only the well-known secure machine-to-machine (m2m) interoperability, data quality and reusability/dissemination, the protection of individuals' information and the provision of means to exercise legally binding GDPR rights, but also other aspects such as extensibility to cover possible supplementary specifications in further implementations, traceability of records of all exchanges, and the principle of designing the logic separately from security while relying on the provision of a security context [36].

Fig. 1

A high-level architecture schema of the REMOVED-FOR-DOUBLE-BLIND-REVIEW project leveraged for the monitoring process of Intrinsic Capacity. The data collected from smart devices and questionnaires are directed to the FHIR/Non-FHIR repositories after passing through the security component

In particular, the data received from SMART BEAR devices and questionnaires are saved in HAPI FHIR or Non-FHIR repositories. In compliance with the European GDPR, all transactions are performed by a trusted component handling data anonymization without losing identifiability, thanks to multiple tokens associated with data entities. A local repository cache allows toggling between views of requests and data transfer, significantly improving the performance of data processing tasks.

To address these multidimensional and, on some occasions, conflicting design requirements, the SMART BEAR architecture is based on the most reasonable current industrial-strength solutions, while secure interactions and data privacy were considered of high importance during the design stage (adhering to the Privacy by Design principle), supporting m2m interoperability and at the same time ensuring full compliance with GDPR even for medical data transmitted by external systems of synergetic H2020 projects. Figure 1 illustrates a simplified schematic view of the architecture designed for the SMART BEAR project. In parallel with having in place a well-established interoperability specification (FHIR) to exchange m2m medical data and metadata between different healthcare applications (not limited to those within the project's technical scope [23] but also open to other synergies), the architecture tackles privacy, which is considered a critical issue, along with the capability of performing different types of analysis (e.g., descriptive statistics, statistical testing and inference, data mining, ML).

Fig. 2

The functional workflow of calculating and presenting the Intrinsic Capacity trajectories

The SMART BEAR architecture (based on [1, 34]) comprises three main components: the smartphone application and HomeHub components that reside with the participants, and the main cloud component, the backend system coordinating and serving the other components. The cloud supports several key functions such as data management, analytics enabling the generation of (verifiable and explainable) ML models that support decision making for different types of interventions, and, horizontally, preserving security, supporting m2m interoperability, and maintaining a pseudo-anonymized repository of study participants' data. In this context, the backend provides clinicians, data analysts, and other end-user groups with a dashboard offering data visualisation capabilities and the outcomes of analytics, while the cloud components (Repository, Big Data Engine, Decision Support System, Security Component) communicate via REST services for fast, reliable performance and can grow by reusing components that can be managed and updated without affecting the operation of the system, even while they are running. Data protection is considered a critical issue, especially when dealing with special categories of personal data (Art. 9, GDPR). A specific component (SecurityComponent) provides mechanisms that handle data minimisation, authentication, and other security and privacy aspects by performing pseudonymisation and ID re-associations; it supports RBAC authentication and authorisation of all RESTful API endpoints to protect the transmission of any (sensitive or not) data, and it also introduces services to cope with the management of privacy-related requests to demonstrate compliance with the GDPR. At run time, this component is also responsible for monitoring, testing, and assessing all runtime operations of the SMART BEAR platform. It audits critical components and processes of the infrastructure while leveraging monitoring mechanisms developed in the context of the project to provide an evidence-based, certifiable view of the security posture of the SMART BEAR platform, along with accountability provisions for changes that occur in said posture and analysis of their cascading effects. Several built-in security assessments addressing the Confidentiality–Integrity–Availability (CIA) principles, together with custom metrics tailored to the platform's components, will be utilized, leveraging an evidence-based approach, to provide security and privacy assurance assessments with certifiable results. Reference [11] provides a more detailed presentation of the SMART BEAR architecture.

Methodology

The proposed methodology is organized according to the workflow in Fig. 2. Data processing is performed by a Big Data Analytics (BDA) engine that periodically updates the IC score. This way, the recorded values are organized into time series describing the IC trajectory of each patient. The trend of these trajectories is further analysed to study the best practices for maintaining high levels of IC.

The present paper focuses on the technical aspects and engineering principles for monitoring IC using the measurements available in the SMART BEAR project. Due to the working requirements of the project, we may come across some data deficiencies with respect to the ICOPE handbook tests. However, we kept the number of measurements across domains as balanced as possible. On the other hand, this gives us the possibility of validating the relevance of the measurements available in SMART BEAR in relation to the IC score.

Table 1 presents the measurements considered for each domain, their data types, the frequency of transmission to the SMART BEAR Cloud, the frequency of data reception from the SMART BEAR Big Data Analytics engine (BDA), and the domain weights. Data storage frequencies on the SMART BEAR repositories follow clinicians' advice for the different measurements. Vital measurements with possibly high variation, such as heart rate, are monitored more frequently than stable measurements such as bone density. More specifically, the workflow presented in Fig. 2 consists of the following steps.

Data acquisition

The initial assessment contains the demographic and medical examination of all participants of the SMART BEAR project. Table 3 lists the full set of examinations performed at the initial assessment. The follow-up data collection is performed using the SMART BEAR smart devices. Table 1 specifies, for each measurement, the frequencies of sending/receiving data to/from the SMART BEAR cloud. A uniform time scale for all measurements is necessary; therefore, in case a parameter is measured more frequently than the uniform time unit, an averaging process should be applied.
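As an illustration of this averaging step, the sketch below down-samples timestamped records to a uniform daily time unit with pandas; the column names and values are illustrative placeholders, not the actual SMART BEAR schema.

```python
import pandas as pd

# Illustrative records: heart-rate samples arriving more often than the
# uniform time unit chosen for IC monitoring (here: one day).
samples = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2022-03-01 08:00", "2022-03-01 20:00", "2022-03-02 08:30"]
        ),
        "heart_rate": [72, 78, 69],
    }
).set_index("timestamp")

# Down-sample to the uniform time unit by averaging all samples in each bin.
daily_mean = samples["heart_rate"].resample("1D").mean()
print(daily_mean)
```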

Test timeliness

The data collection process can fail for several reasons. Disconnection from the network, a patient forgetting to recharge smart devices, and many other situations can lead to lost or outdated data. Participants neglecting to take the required tests at the specified time, and the different expiration times of each test in measuring IC, are further reasons that may lead to acquiring low-quality data. The impact of losing a measure is, however, strongly dependent on the parameter to be measured, because different biological states (physical or mental abilities) have different temporal dynamics. In our method, we assign a Boolean value indicating whether a recorded measurement is still valid or whether the IC calculation used a value outside its range of temporal validity. In the case of any untimely data point, the IC data point is tagged as "invalid" in the output results. The timeliness test is evaluated and stored with a timestamp and the patient's id each time a measure is calculated.
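A minimal sketch of this timeliness check follows; the validity windows are placeholder values chosen for illustration, not clinical recommendations.

```python
from datetime import datetime, timedelta

# Hypothetical temporal-validity windows per measurement (placeholder values):
# a measurement older than its window makes the derived IC point "invalid".
VALIDITY_WINDOW = {
    "heart_rate": timedelta(days=1),
    "body_mass_index": timedelta(days=30),
}

def is_timely(measurement: str, recorded_at: datetime, now: datetime) -> bool:
    """Return True if the last recorded value is still within its temporal validity."""
    return now - recorded_at <= VALIDITY_WINDOW[measurement]

# Example: a week-old heart-rate value is flagged as untimely.
now = datetime(2022, 3, 8, 12, 0)
print(is_timely("heart_rate", datetime(2022, 3, 1, 9, 0), now))  # False
```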

Table 1 Table of available measurements in the SMART BEAR project and acquired data for studying the Intrinsic Capacity case
Table 2 Evaluation of the stationarity of two of the synthetic data sets produced by Synthea

Normalization

Considering the sub-domains of measurement and their different measurement units, the creation of an aggregated score requires the data to be normalized first. Using the z-score [24], the normalization procedure can also be exploited to personalize the evaluation of the IC score. Indeed, the expected value and the standard deviation (STD) parameters of a z-score function can be estimated from the distribution of the individual's retrospective data. Alternatively, these values can be taken from a reference population or from a protocol selected by the clinicians. The z-score is computed as follows and stored with a timestamp (t) for each measurement \(i \in N\) with value \(x_{i}\), where N is the total number of measurements:

$$\begin{aligned} z_{i}(t)&= z\_{\text {score}} (x_{i}(t)) \\ &= \frac{x_{i}(t)-{\text {expected}}\_{\text {value}}(\{x_{i}(0),x_{i}(1), \ldots ,x_{i}(t-1) \})}{{\text {STD}}(\{x_{i}(0),x_{i}(1), \ldots ,x_{i}(t-1) \})}. \end{aligned}$$
(1)
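A direct numerical sketch of Eq. (1), assuming the individual's retrospective values are held in an array; the heart-rate numbers are illustrative only.

```python
import numpy as np

def personalized_z_score(history: np.ndarray, x_t: float) -> float:
    """z-score of a new value x_t against the individual's retrospective data,
    following Eq. (1): (x_t - expected_value(history)) / STD(history)."""
    expected_value = history.mean()  # or a reference-population / protocol value
    std = history.std(ddof=0)
    return (x_t - expected_value) / std

history = np.array([70.0, 72.0, 75.0, 71.0, 69.0])  # illustrative past heart-rate values
print(personalized_z_score(history, 80.0))
```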

Performance score

In our proposal, the performance score is an asymmetric mapping of the z-score onto the IC score. In assessing the performance score, two key parameters are exploited: the risk value and the expected value. By risk value, we refer to a global and constant risk level identified by clinicians for each domain of measurement (see, for example, [15]). If a measured value is greater or less than this value, depending on the domain, the patient's health is at risk and proper interventions must take place. The expected value, which we also use for calculating the personalized z-score, can be the average of all records or the most probable value of that specific measurement for the patient, derived from their retrospective data. A high performance score is achieved if the z-score is far better than the expected value, while the minimum performance score is obtained when the z-score is close to the risk value. The performance score is computed using the function S and the z-score (\(z_{i}\)) of measurement i, and is stored with a timestamp and patient id each time a measurement is received.

$$\begin{aligned} {\text {Performance}}\, {\text {score}} (z_{i})&=S(z_{i}, {\text {Risk}}\, {\text {value}}(i)) \end{aligned}$$
(2)

Parameters in sub-domain aggregation score

One approach to aggregating the sub-domain elements into domains and, thereafter, mapping domains onto the IC is weighted aggregation. This way, the effectiveness of each sub-domain's measurements in the domain performance score is indicated by its weight, \(w_{i}\), and the aggregation uses weighted arithmetic functions, where D is the number of domains.

$$\begin{aligned} G_{j}&= g_{j}({\text {Performance}}\, {\text {score}} (z_{i}), w_{i})&i \in N_{j}, j \in D \end{aligned}$$
(3)

Aggregating domains

The IC value could simply be an average of all domains' performance scores, or a geometric pooling of them with different weights, \(W_{j}\), applied to the different domain scores, as illustrated in Eq. (4). Weights can account for the contribution the different dimensions have on the final IC score. The analytical strategies to be used for measuring a change in health status are debated [31], and the geriatric scholarly community has not identified recommendations for differentiating the contribution of each dimension. Equally weighting the contribution of the dimensions therefore seems, in this context, the least biased choice.

$$\begin{aligned} {\text {IC}}&= f(G_{j}, W_{j})&j \in D. \end{aligned}$$
(4)
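The sketch below illustrates Eqs. (3) and (4) with the simplest choice for \(g_j\) and \(f\), a weighted arithmetic mean; the domain composition, sub-domain scores, and equal weights are illustrative assumptions.

```python
import numpy as np

def weighted_mean(scores, weights):
    """Weighted arithmetic aggregation, used here for both Eq. (3) and Eq. (4)."""
    scores, weights = np.asarray(scores, float), np.asarray(weights, float)
    return float((scores * weights).sum() / weights.sum())

# Illustrative performance scores per sub-domain measurement (values in [1, 6])
# with equal weights w_i; the domain composition is an example only.
domains = {
    "vitality":   ([4.2, 3.8, 5.0], [1, 1, 1]),  # Eq. (3): sub-domains -> G_j
    "locomotion": ([3.5, 4.1],      [1, 1]),
    "sensory":    ([5.2],           [1]),
}
G = {name: weighted_mean(s, w) for name, (s, w) in domains.items()}

# Eq. (4): domain scores -> IC, here with equal domain weights W_j.
IC = weighted_mean(list(G.values()), [1.0] * len(G))
print(G, IC)
```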

The IC scores assessed from patients' retrospective data could provide useful input for Machine Learning algorithms to estimate the population-specific measurement weights.

Some of the required analytical parameters such as Risk value and domain weights are population-dependent. On the other hand, the expected value and tolerance value are extracted from personal records; therefore the z-score and performance score are personalized values assigned to each studied individual.

Table 3 Initial assessment: the table contains the parameters measured at the start of the study

Studying trajectories

Studying the trajectories of a patient's IC score is our final goal. Predicting incidents that may lead to irreversible damage is among the high-priority goals of continuous monitoring. Care management can substantiate decision making and planning with the trends that patient trajectories reveal. The correct interpretation of trends requires appropriate analytical methods.

Among the various methods used for studying nonlinear dynamics, entropy is one of the most prominent and is applicable to broad types of time series, even of limited and short length; it has long been used to measure the complexity of time series. Entropy, in information-theoretic terms, was defined to quantify the expectability of an event vector, while in time series analysis it assesses the uncertainty and unpredictability of the evolution of dynamical systems. Evaluating a time series in terms of entropy not only measures complexity and predictability but also helps in the detection of dynamic changes and incidents [37]. A stable and predictable series increases our confidence in the ability of a patient to maintain good IC standards.

Mapping on Fast Healthcare Interoperability Resources (FHIR)

Bringing medical/clinical records into service requires significant effort in unifying the concepts and terms adopted, to make the data understandable and usable by other clinicians and scientists. A proposed solution is leveraging unique LOINC [16] and SNOMED-CT [17] codes to define observations, encounters, and biological considerations. The data measured and collected with SMART BEAR devices, mobile applications, and questionnaires will be stored in HAPI FHIR repositories using LOINC and SNOMED-CT codes. Regarding the integration of questionnaires into the HAPI FHIR repository, a generic model is defined in [17]. According to this model, a questionnaire template is represented by a FHIR Questionnaire resource, where

  • URL shall have a value;

  • name shall have a value;

  • title shall have a value;

  • version might have a value;

  • Recursively for each entry in item:

    • linkID shall have a value;

    • type shall have a code;

and a filled-in questionnaire is represented by a FHIR QuestionnaireResponse resource, where

  • questionnaire has a value;

  • subject shall have a value;

  • Recursively for each entry in item:

    • linkID shall have a value;

    • answer optionally has a value.

Using this model, the questionnaires are also mapped on the FHIR data model.
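A sketch of this mapping follows: a minimal Questionnaire template and a corresponding response are built as plain FHIR R4 resources and posted to a HAPI FHIR server. The server URL, questionnaire content, and patient reference are hypothetical placeholders, not the actual SMART BEAR resources.

```python
import requests

FHIR_BASE = "http://localhost:8080/fhir"  # hypothetical HAPI FHIR endpoint

# Minimal Questionnaire template: URL, name and title have values,
# and each item has a linkId and a type, as required by the model above.
questionnaire = {
    "resourceType": "Questionnaire",
    "url": "http://example.org/fhir/Questionnaire/beck-depression-inventory",
    "name": "BeckDepressionInventory",
    "title": "Beck Depression Inventory",
    "status": "active",
    "item": [{"linkId": "bdi-total", "text": "Total score", "type": "integer"}],
}

# Corresponding filled-in questionnaire for one (hypothetical) participant.
response = {
    "resourceType": "QuestionnaireResponse",
    "questionnaire": questionnaire["url"],
    "status": "completed",
    "subject": {"reference": "Patient/example-123"},
    "item": [{"linkId": "bdi-total", "answer": [{"valueInteger": 12}]}],
}

for resource in (questionnaire, response):
    requests.post(f"{FHIR_BASE}/{resource['resourceType']}", json=resource).raise_for_status()
```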

Generating synthetic data using Synthea

Leveraging synthetic data to study and simulate the medical history of a population, without facing the anonymization challenges that scientists constantly come across, is highly advisable when studying health care data. In this work, such synthetic data are generated to test the main functionalities of the system under study.

In the SMART BEAR project we adopt Synthea, a synthetic patient generator that can model the medical history of patients [33]. In Synthea, clinical care maps and statistics are used to construct models of disease progression and treatment in a Generic Module Framework, which encodes these models in Synthea modules as state transition machines. In other terms, modules describe a progression of states and the transitions between them. At each Synthea generation timestamp, the generic framework processes states one at a time to trigger conditions, encounters, medications, and other clinical events.

It is possible to activate different modules simultaneously; they compute state transitions (if any) for every person at every timestamp in the synthetic world. Each state transition in a module can trigger other events such as condition onsets, encounters with physicians, observations, prescriptions, and so on.

The synthetic patient population is generated using a set of probabilities to get a mixture of conditions corresponding to the relevant scenarios. The probabilities are adjustable, so that, if needed, specific sets of synthetic data can be generated to test each scenario. Ranges for each measure are specific for each patient’s condition; values within the ranges are randomly generated by Synthea at each observation time, according to its standard behaviour.

One of Synthea's strong points is that it can export patients' data in several formats for different needs, namely:

  1. FHIR: versions 4.0.1 (R4), 3.0.1 (STU3) and 1.0.2 (DSTU2).

  2. C-CDA: uses the MDHT CDA Tools library along with templates from the health-data-standards Ruby gem to export patients in Consolidated Clinical Document Architecture (C-CDA) format. C-CDA is an XML-based standard defined by HL7 that uses templates from a standard library to represent clinical concepts.

  3. Text: this format does not adhere to any standard but is clear and easy for a person to read and understand.

  4. CSV: unlike the other formats, which export a single record per patient, this format generates 9 files in total and adds lines to each based on the clinical events for each patient. These files are intended to be analogous to database tables, with the patient UUID being a foreign key. Files include: patients.csv, encounters.csv, allergies.csv, medications.csv, conditions.csv, careplans.csv, observations.csv, procedures.csv, and immunizations.csv.

In our simulation, the parameters in Table 1 are considered as observations or encounters. These observations take place according to the time scale defined in SMART BEAR for each parameter. The observation results, the timestamps of the carried-out measurements, and all demographic characteristics of each patient are saved in FHIR format. Since Synthea does not include the questionnaires in its generic modules, we adopt artificial SNOMED-CT/LOINC codes and treat them as observations (see Table 4 in the appendix). We have assigned to each patient random questionnaire scores within the valid specific ranges. Then, following the workflow in Fig. 2, the IC trajectories are calculated and displayed on the end-user interface, which could be a clinician's or a caregiver's monitor.

The basic Synthea process generates values randomly within the predefined ranges. In some cases, this might yield unrealistic data for two reasons. First, frequently repeated measures of the same variable for the same individual cannot randomly oscillate over the whole range considered, even when the patient's condition is taken into account: for instance, each weekly measure of body muscle mass cannot differ from the previous one by more than a certain percentage. Moreover, in a real dataset some values should be correlated, for instance the walked distance and the number of steps. Figure 3 shows an example of the problem: the lines represent the daily measured walked distance in metres and the number of steps for an individual, computed with the simplified module; it can be seen that no correlation exists between the two.

Fig. 3

An example of inconsistency in the data produced by Synthea: while at time = 4 the number of steps increases, the walked-distance curve decreases

The module that will be used to generate synthetic data for SMART BEAR takes into account the above points:

  • For values that cannot freely oscillate over the whole range, once the first value (randomly taken from the admissible range) is recorded, the subsequent values are obtained by applying random increments, drawn from a smaller range, to the first value.

  • Correlated measures are derived using simple functions from a measure chosen as fundamental, when the correlation does not require specific assumptions or dedicated models, as in the case of walked distance and number of steps (see the sketch after this list). In the other cases, measures are treated as uncorrelated.
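One possible implementation of these two points is sketched below; the ranges, increment size, and constant stride length are illustrative assumptions, not the module actually used in the project.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, stride_m = 365, 0.7  # illustrative stride length in metres

# First value drawn from the admissible range, then small random increments,
# so consecutive daily step counts cannot jump across the whole range.
steps = np.empty(n_days)
steps[0] = rng.uniform(3000, 8000)
for t in range(1, n_days):
    steps[t] = np.clip(steps[t - 1] + rng.uniform(-500, 500), 1000, 12000)

# Correlated measure derived from the fundamental one: walked distance in metres.
distance = steps * stride_m
```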

Figure 6 in the appendix shows an example of the module executed for the locomotion domain. Although using synthetic data can be very productive and informative in scientific studies, the validity of the produced data, as the discussion above suggests, is a matter of debate. Next, we focus on the evaluation of the data sets from two important points of view: the entropy and the stationarity of the time series.

Fig. 4

Schematic views of the performance-score function in terms of the z-score. The first diagram refers to the cases with a risk value smaller than the expected value (red curve) and a risk value larger than the expected value (green curve). The risk values are indicated by dashed lines. In the second diagram, the performance score for an observation with two risk levels is plotted (purple curve)

Fig. 5

The calculated Intrinsic Capacity score for four sample patients using generated synthetic data. Each panel shows a patient's IC trajectory, its monthly average, and the linear trend over 1 year of monitoring

Fig. 6

An example of intrinsic capacity locomotion domain module for generating synthetic data with Synthea

Entropy of time series

Entropy is one of the most illustrative indicators of a time series, quantifying the uncertainty of events in a dynamical system. Ponce-Flores et al. [28] showed that there is a direct relationship between the complexity of a time series and its unpredictability. For a time series of length \(\mathcal {N}\), entropy quantifies how well the state space is reconstructed by \(\mathbf {m}\)-dimensional embedding vectors. In 1991, Pincus proposed the Approximate entropy [27] as a modified version of the Kolmogorov–Sinai entropy, which showed robustness and stability in studying real-world noisy, medium-length data series such as physiological, mechanical, and physical data. Despite the valuable results of approximate entropy, its strong dependence on input parameters and poor performance on short data sets led to the proposal of a refined metric named Sample entropy [29].

Spectral and Permutation entropy algorithms are other well-established criteria in data series analysis. Whilst Spectral entropy quantifies irregularity from the normalized spectral power of the signal, Permutation entropy assesses the diversity of ordinal patterns obtained by indexing the elements of each embedding vector in ascending order. Because permutation entropy makes use only of the order of the values, it is robust under nonlinear distortion of the signal and is also computationally efficient [37].
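For concreteness, the sketch below implements permutation entropy directly from its definition (embedding dimension m, delay tau, Shannon entropy of the ordinal-pattern distribution, normalized by log(m!)); the parameters and test series are illustrative, and dedicated libraries can be used instead.

```python
import math
from itertools import permutations
import numpy as np

def permutation_entropy(x, m=3, tau=1, normalize=True):
    """Shannon entropy of the distribution of ordinal patterns of length m,
    optionally normalized by log(m!) so the result lies in [0, 1]."""
    x = np.asarray(x, float)
    counts = {p: 0 for p in permutations(range(m))}
    for i in range(len(x) - (m - 1) * tau):
        window = x[i : i + m * tau : tau]
        counts[tuple(np.argsort(window))] += 1
    probs = np.array([c for c in counts.values() if c > 0], float)
    probs /= probs.sum()
    h = -(probs * np.log(probs)).sum()
    return h / math.log(math.factorial(m)) if normalize else h

# A noisy series scores close to 1, a monotonic (fully predictable) one close to 0.
rng = np.random.default_rng(0)
print(permutation_entropy(rng.normal(size=500)), permutation_entropy(np.arange(500)))
```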

Stationary/non-stationary time series

A stationary time series has a probability density function (PDF) whose mean, variance, and covariance do not change as time passes. Examining whether a time series is stationary (for some systems, being near stationary may suffice) or non-stationary provides insight into the regularities of the dynamic evolution of the system under study.

Most common implementations leverage unit-root tests for evaluating the stationarity of a given time series, namely the Augmented Dickey–Fuller (ADF) test [8] and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test [19]. The most commonly used is the ADF test, where the null hypothesis is that the time series possesses a unit root and is non-stationary. So, if the p value in the ADF test is less than the significance level (0.05), the null hypothesis is rejected. The KPSS test, on the other hand, is used to test for trend stationarity. Its null hypothesis and the interpretation of its p value are just the opposite of the ADF test.
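Both tests are available, for example, in statsmodels; the short sketch below applies them to two illustrative series (white noise and a random walk) to show the opposite reading of the two p values. Note that statsmodels bounds the reported KPSS p value to the interval [0.01, 0.1].

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(1)
series = {
    "white noise (stationary)": rng.normal(size=365),
    "random walk (non-stationary)": np.cumsum(rng.normal(size=365)),
}

for name, x in series.items():
    adf_p = adfuller(x)[1]                             # H0: unit root (non-stationary)
    kpss_p = kpss(x, regression="c", nlags="auto")[1]  # H0: series is stationary
    print(f"{name}: ADF p={adf_p:.3f}, KPSS p={kpss_p:.3f}")
```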

Experimental analysis

In this section, we apply our proposed monitoring method to four synthetic patients over a period of 1 year; the corresponding IC scores are calculated and shown in Fig. 5.

Due to the different data collection frequencies of the different observations, discussed in the Sect. Data acquisition and reported in Table 1, unifying the observation timestamps is a prerequisite for the continuous monitoring of the IC score. Considering that the maximum number of observations in the data set corresponds to the Heart Rate, recorded twice a day, we repeat the last value of observation i for (\(\frac{\text {Maximum length of observations}}{\text { length of observation } i \text { list }}\)) subsequent time steps. This way, we obtain data sets of the same length for all observations. For each observation i, the z-scores are calculated at each time step using the relevant Expected and Tolerance values, defined based on references in the literature. In a first approach, we take the median of the advised normal range as a generic constant Expected value. The adopted Tolerance value is \(\pm 10\%\) of the Expected value assigned to each observation. According to our method, the performance score is a nonlinear mapping of the z-score onto the interval [1, 6]. For this transformation, we suggest a generalized logistic function of the z-score, Risk, and Expected values. Risk values, similarly to Expected values, are extracted from the literature. The rationale behind this choice is the changing behaviour of natural dynamics: while very small and very large values yield little marginal gain and negligible variation, a fast phase transition happens at a value between the allowed maximum and minimum values (Fig. 4a).

Regarding the observations in Table 1, some parameters have one risk level and others have two; for instance, a Total Cholesterol value higher than 240 mg/dl is risky, while Body Mass Index values are risky if lower than 18 \(\frac{\text {kg}}{\text {m}^2}\) or higher than 25 \(\frac{\text {kg}}{\text {m}^2}\). To address this, we suggest two different functions, illustrated in Fig. 4. Furthermore, there is another complication in the definition of the performance score for observations with either a low or a high borderline. According to our model, for observations with a low borderline, such as the Rapid Geriatric Assessment score, for which a normal participant scores higher than 27, the Expected value should be larger than the low borderline. On the contrary, for measurements such as the Beck Depression Inventory score, the Expected value should be smaller than the high borderline. Therefore, we propose increasing (decreasing) S-shaped trends for the cases in which the risk value is smaller (larger) than the expected value. This way, the performance score reaches its minimum at the risk value. For observations with two risk levels, lower and upper, we consider a Gaussian function, which reaches its maximum when the z-score equals the z-score of the Expected value, to calculate the performance score according to Fig. 4b. Following the evaluation of the performance scores of all parameters obtained for each patient, we aggregate them to get an IC score at each time step. In its simplest form, this aggregation can be performed by averaging all measurement performance scores with equal weights. In a comprehensive study in future work, this function will be expanded and studied considering the relevant effective weights of each in-domain measurement and the weight of the domains in the final IC score.
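A minimal sketch of the two mappings described above follows, assuming the mapping is applied directly on the z-score axis; the LOW/HIGH bounds follow the [1, 6] interval used in the text, while the steepness and width parameters are illustrative assumptions.

```python
import numpy as np

LOW, HIGH = 1.0, 6.0  # performance-score range used in the text

def logistic_score(z, z_risk, z_expected, steepness=2.0):
    """S-shaped mapping onto [LOW, HIGH]: increasing when the risk value lies
    below the expected value, decreasing otherwise, so the score approaches
    its minimum near the risk value (cf. Fig. 4a)."""
    sign = 1.0 if z_risk < z_expected else -1.0
    midpoint = (z_risk + z_expected) / 2.0
    return LOW + (HIGH - LOW) / (1.0 + np.exp(-sign * steepness * (z - midpoint)))

def gaussian_score(z, z_risk_low, z_risk_high, z_expected):
    """Bell-shaped mapping for measurements with two risk levels: maximal at the
    expected value and close to the minimum near either risk threshold (Fig. 4b)."""
    width = (z_risk_high - z_risk_low) / 4.0
    return LOW + (HIGH - LOW) * np.exp(-((z - z_expected) ** 2) / (2.0 * width ** 2))

# Examples: a cholesterol-like parameter (one upper risk level) and a
# BMI-like parameter (two risk levels); all z-values are illustrative.
print(logistic_score(z=0.0, z_risk=3.0, z_expected=0.0))
print(gaussian_score(z=0.5, z_risk_low=-2.0, z_risk_high=2.0, z_expected=0.0))
```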

Table 4 The table contains the adopted SNOMED-CT/LOINC codes for the observations used in generating the synthetic health data

The calculated IC scores of four different patients using synthetic medical records are shown in Fig. 5. Table 2 compares the statistics of two generated synthetic data sets used in the calculation of IC: the Number of Steps that an individual walks during the monitoring period, and the Heart Rate recorded within the same period of time. The aim is to verify whether the series are stationary. A stationary series is an indicator of lower risk, while a non-stationary state implies lower predictability. We have evaluated the ADF and KPSS test statistics for the whole Number of Steps and Heart Rate data sets. As mentioned in the Sect. Stationary/non-stationary time series, ADF and KPSS test stationarity using different null hypotheses: unlike the KPSS test, the null hypothesis in the ADF test is that the series is non-stationary. In the first category of evaluations, illustrated in Table 2, the ADF and KPSS tests, with a p value respectively greater and smaller than 0.05, indicate that the steps data set is non-stationary, while for the Heart Rate data, with a p value respectively smaller and greater than 0.05, they indicate stationarity. Since the monitoring period lasts 1 year, to evaluate the quality of the synthetic data we deseasonalized the data and stored them in four separate data frames, steps1 to steps4 and HR1 to HR4. On these smaller blocks of data, all selected algorithms yield smaller entropy values, for both the seasonal and the total data sets, for Heart Rate than for the Number of Steps over the same monitoring period. This shows more regularity in the produced heart rate data than in the number of steps time series. This can be verified by applying the ADF and KPSS stationarity tests, which reveal the stationary character of the seasonal blocks of the steps data set, contrary to the annual data set; p values < 0.05 for the ADF test and > 0.05 for the KPSS test prove the stationarity. We consider this measure a relevant indicator of the behaviour a patient exhibits, because we remove the fluctuations related to seasonal effects.

Moreover, considering the obtained trajectories (Fig. 5), we have calculated their entropy using common methods, namely Permutation, Spectral, Approximate, and Sample entropy, all of which report the highest entropy for patient_2. We propose to use the entropy of the trajectories as an additional element to be considered when evaluating risk using IC scores.

Conclusion

In this paper, we have focused on the engineering aspects of implementing continuous monitoring of IC using wearable devices and the measurements available in the SMART BEAR project, as the backbone of further explorations and improvements in the prediction and prevention of decreasing intrinsic capacity. We have followed the process of calculating IC from data acquisition to the representation of IC trajectories by introducing the architectural components of the SMART BEAR project. We have illustrated our methodology for calculating IC scores from a number of various measurements produced by a synthetic data generator. Furthermore, we have evaluated the ADF and KPSS test statistics for two synthetic observation data sets, number of steps and heart rate, in Table 2. Both ADF and KPSS tests indicate that the synthetic steps data set is a non-stationary time series, while the heart rate data form a stationary time series.

Concerning the different stationarity states of the two sets of generated data, it should be noted that leveraging synthetic data sets demands precautionary validity tests before further statistical analysis; otherwise, the outcome will not converge to the result of a real-case scenario. We have also proposed entropy as an indicator of unpredictability and sudden changes in the studied dynamics. Calculating the Permutation, Spectral, Approximate, and Sample entropy consistently results in higher values for the IC scores of patient_2. From this observation we can conclude that, even though the average linear trend of the IC score of patient_2 within a year of monitoring stays more or less stable, this patient has experienced many variations in his/her intrinsic capacity, which indicates that the risk assessment should be revised and a proper intervention is needed. In other words, the higher the entropy of the IC trajectories, the less trustworthy the resulting predictions.

Furthermore, considering the IC trends, much more information could be retrieved, such as the number of months in which the patient suffers from low IC, seasonal dependency, and the holistic trend of the well-being state. Better validation of the trend can be obtained by calculating the p values of the seasonal time series and observing that, contrary to the total data set, we get stationary data sets once we break them into seasonal pieces. This may be a promising solution for getting a reliable approximation of IC within specific time periods, for instance every 3 months, while proving this solution and finding the proper time period demand further effort.