Development and Evaluation of Methodology for Personal Recommendations Applicable in Connected Health

In this paper, a personal recommendation system for outdoor physical activities that uses solely the user's history data, without applying collaborative filtering algorithms, is proposed and evaluated. The proposed methodology consists of four phases: data fuzzification, activity usefulness calculation, estimation of the most useful activities, and activity classification. For the classification phase, several data mining techniques were compared: decision tree algorithms, a decision rule algorithm, a Bayes algorithm and support vector machines. The proposed algorithm has been validated experimentally on a real dataset collected over a period of time from a community of 1000 active users. The recommendations generated by the system were related to weight loss. The results show that the generated recommendations have high accuracy, up to 95%.


Introduction
Globally, the burden of non-communicable diseases (NCDs) is growing. They are the leading cause of morbidity and mortality and place a great financial strain on the economy [1]. This 'invisible' epidemic, attributable to common, modifiable risk factors including physical inactivity, tobacco use, the harmful use of alcohol and unhealthy diet, imposes a great strain on health systems, resulting in a healthcare workforce crisis in many nations [1]. An important factor in the prevention and treatment of chronic diseases, as well as in supporting healthy aging, is maintaining a healthy lifestyle in terms of daily physical activity. Physical inactivity is considered one of the leading causes of global mortality, and the World Health Organization (WHO), the United Nations and numerous national governments now view the promotion of physical activity as a public health priority [2,3]. According to one study, incorporating walking or cycling into longer journeys provides over half of the recommended weekly activity, which makes it an efficient way of meeting physical activity guidelines and improving population health [4,5].
Recent research has focused on integrating physical activity into the prevention, treatment and rehabilitation of NCDs [6,7]. Research has shown that the quality of life of patients with chronic diseases such as cardiovascular disease, diabetes, chronic obstructive pulmonary disease and some types of cancer can be significantly improved by giving them personalized recommendations for physical activity. This can be done with a recommender system (RS) that collects data from various sources and recommends the content the user needs at the moment [8][9][10].
Building recommender systems requires a multidisciplinary approach that draws on computer science fields such as machine learning, data mining and information retrieval, and even human-computer interaction [11,12]. Recommender systems use collaborative filtering, content-based or hybrid approaches to generate recommendations. Collaborative filtering is one of the most widely used and successfully applied methods for personalized RS, and a large and continuously active literature exists on it [13][14][15][16].
Collaborative filtering matches people with similar interests in order to make recommendations [17]. Since patients' records contain highly sensitive data, some authors argue that collaborative filtering is not an appropriate approach for systems that require a high degree of confidentiality [18,19], and they choose content-based techniques for building prediction and recommendation models in healthcare.
In this paper, a personal recommendation system for outdoor physical activities is presented. The system does not use collaborative filtering, so recommendations are generated only from the user's own activity history. To find the classification technique that gives personalized recommendations with the highest accuracy, we reviewed the techniques used in other research studies and adopted those that had proven most accurate in the healthcare field. The system was tested on the same dataset used in previous work on the COHESY recommender algorithm [20]. The results show that our system can generate recommendations with high accuracy (up to 95%) without collaborative filtering.

Related Work
In the last decade, many prediction and recommendation models based on various approaches and techniques have been presented in the field of healthcare. Most of them are used for diagnosing or identifying patients at high risk of particular diseases; only a few focus on personalized recommendations for improving patients' health. Many healthcare studies have compared and analyzed different classification techniques in order to find the one with the highest accuracy. A study of 12 years of patient data from Kuwait used logistic regression, k-nearest neighbors (k-NN), multifactor dimensionality reduction and support vector machines to identify patients at high risk of type 2 diabetes, hypertension and comorbidity; the support vector machine classifier gave slightly better results, with 81% accuracy [21]. Another study on heart disease prediction compared the accuracy of Naïve Bayes, k-NN and decision list classifiers on patient data, taking into account parameters such as sex, smoking, weight, alcohol intake, high-salt diet, exercise, blood sugar, heart rate, bad cholesterol and blood pressure. The decision tree classifier outperformed the other classifiers with an accuracy of 99.2% [22]. Similar work has been presented for a cerebrovascular disease prediction model on 493 patients from Taiwan, where the C4.5 decision tree classifier gave the best results compared with a Bayesian classifier and a back-propagation neural network classifier [23]. Multiple decision tree algorithms have been applied to data on diabetes patients from hospitals in Oman to build a high-accuracy disease risk prediction model. The prediction performance of J48, Decision Stump, REP Tree and Random Forest (RF) was evaluated, and the RF model had a lower MAE (mean absolute error) and higher precision and recall than the other models [24].
An algorithm that generates recommendations and suggestions for preventive intervention has been presented in the COHESY system [20]. The algorithm analyzes the user's activities and then recommends the most useful activity to the user; users with similar characteristics are grouped with classification and filtering algorithms. To compare our results with the algorithm presented in [20], we used the same dataset of 1000 active users from the mobile sport activity service SportyPal. The SportyPal system can read parameters for a particular activity, such as path length, speed, time interval and consumed calories. We applied classification models to the users' activities to investigate the accuracy of the model, using the classification methods that have proven to give accurate predictions in other research in the field of connected health.

Description of the Recommendation Algorithm
The main purpose of the activity recommender system is to discover and recommend the most useful activities to the user. The impact of each activity on the user's health state must be determined first, and only activities with a positive influence should be recommended. Since users do not provide feedback after performing activities, the system relies on variations in the recorded measurements.

Data Representation
The system collects and stores information about activities and measurements, defined as vectors with several attributes:
(1) Activity (user, type, time, duration, calories, distance)
(2) Measurement (user, parameter, time, value)
Each activity performed by a user is described by several parameters used for generating recommendations, such as activity type, activity duration, distance covered and calories burned. The users performed twenty different types of activities: Cycling, Running, Driving, Walking, Hiking, Road-cycling, Blading, Sailing, Skiing, Horse riding, Paragliding, Rowing, Free style, Cross-skiing, Swimming, Snowboarding, Flying, Surfing and Golfing.
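The two record types can be sketched as Python dataclasses. Field names follow the Activity and Measurement vectors; the units (minutes, kilometres, kilograms) and the example values are illustrative assumptions, not SportyPal data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    """One performed activity: (user, type, time, duration, calories, distance)."""
    user: str
    type: str          # e.g. "Walking", "Running", "Cycling"
    time: datetime     # start time of the activity
    duration: float    # minutes (unit assumed for illustration)
    calories: float    # calories burned
    distance: float    # kilometres (unit assumed for illustration)


@dataclass(frozen=True)
class Measurement:
    """One measurement: (user, parameter, time, value); here the parameter is body weight."""
    user: str
    parameter: str     # e.g. "weight"
    time: datetime
    value: float       # kilograms for weight (unit assumed)


# Illustrative records only (hypothetical values).
walk = Activity("u1", "Walking", datetime(2012, 5, 3, 18, 0), 50.0, 220.0, 4.1)
before = Measurement("u1", "weight", datetime(2012, 5, 1, 8, 0), 82.4)
```

Frozen dataclasses keep records immutable, which suits history data that is only read during recommendation.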
Body weight was recorded as the measurement parameter. It was chosen because it is strongly correlated with physical health: maintaining a healthy weight lowers the risk of heart disease, stroke, diabetes and high blood pressure, and it can also lower the risk of many different cancers [25,26].

Data Processing
The raw weight data obtained from the users must be filtered and transformed into an appropriate format for further processing. Only measurements with significant changes in value have been taken into consideration. Three different thresholds were tested: a measurement is assumed to be valid if the change in value is greater than or equal to 0.5 kg, 1 kg or 1.5 kg, and the other values are treated as noise. For a threshold $\theta$, the validity condition is

$$\left|\mathrm{Value}_{a+t_y} - \mathrm{Value}_{a-t_x}\right| \ge \theta, \qquad \theta \in \{0.5,\ 1,\ 1.5\}\ \text{kg},$$

where $\mathrm{Value}_{a-t_x}$ is the value measured before activity $a$ starts and $\mathrm{Value}_{a+t_y}$ is the value measured after activity $a$ has finished.
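The noise filter amounts to a single absolute-difference test. This sketch treats the before and after values as plain floats; the function name is hypothetical.

```python
def is_significant_change(value_before: float, value_after: float, threshold_kg: float) -> bool:
    """Keep a measurement pair only if the absolute weight change reaches the threshold.
    Smaller changes are treated as noise."""
    return abs(value_after - value_before) >= threshold_kg


THRESHOLDS_KG = (0.5, 1.0, 1.5)  # the three thresholds tested in the text

# A 0.8 kg drop is a valid change at 0.5 kg but is noise at 1 kg and 1.5 kg.
print([is_significant_change(82.4, 81.6, t) for t in THRESHOLDS_KG])  # [True, False, False]
```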

Recommendation Algorithm
The recommendation algorithm is composed of four phases (Fig. 1).
[Fig. 1. Phases of the recommendation algorithm: Phase 1, data fuzzification (transform data to class data representation); Phase 2, activity usefulness calculation; Phase 3, find the N most useful activities and filter the test data with only the N most useful activities; Phase 4, apply classification method.]
Phase 1: Data Fuzzification. After the dataset is split into training and test data (60% of the data for training and 40% for testing), data fuzzification is applied to both subsets. For better semantic meaning, several different classes are calculated for each kind of activity (Running, Cycling, Walking, etc.) for each user. This is done because duration, calories and distance differ across kinds of physical activity (for example, a cycling activity covers a greater distance than walking or running over the same time period).
In this process, all of the user's activities are taken into consideration, and each activity parameter (total time, distance and calories) is assigned a class. Equal-width ranges are used, and the class is calculated as

$$\mathrm{class}(v) = \left\lceil \frac{v - v_{\min}}{v_{\max} - v_{\min}} \cdot N \right\rceil$$

where:
- $N$ is the number of classes;
- $v_{\max}$ and $v_{\min}$ are the maximum and minimum values of the parameter for that kind of activity (running, cycling, walking, etc.);
- $v$ is the raw value to be transformed into a class value;
- $\lceil x \rceil$ (ceiling) is the smallest integer not less than $x$.
Example: for walking activities with a minimum time of 10 min and a maximum time of 180 min, we want the class of an activity $x$ lasting 50 min in a 5-class representation:

$$\left\lceil \frac{50 - 10}{180 - 10} \cdot 5 \right\rceil = \lceil 1.176\ldots \rceil = 2.$$

We tested several numbers of classes and obtained the best results with a three-class representation. The dataset is not very large, with an average of 35 valid activities per valid user, and using more classes yields many distinct activity types and fewer recommendations (valid activities are activities with non-zero usefulness, and a valid user is a user with at least one recommended activity).
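The class formula and the worked example can be checked with a short sketch. Two edge cases are handled by assumption, not taken from the text. A value equal to $v_{\min}$ would give class 0, so it is clamped into class 1. A degenerate range ($v_{\max} = v_{\min}$) also maps to class 1.

```python
import math


def to_class(v: float, v_min: float, v_max: float, n_classes: int) -> int:
    """Equal-width class: ceiling((v - v_min) / (v_max - v_min) * N)."""
    if v_max == v_min:
        return 1  # assumption: a degenerate range falls into the first class
    c = math.ceil((v - v_min) / (v_max - v_min) * n_classes)
    return max(1, c)  # assumption: v == v_min (which yields 0) goes into class 1


def fuzzify(values, n_classes=3):
    """Class-encode all values of one parameter for one user and one kind of activity."""
    lo, hi = min(values), max(values)
    return [to_class(v, lo, hi, n_classes) for v in values]


# Worked example from the text: walking, time range 10-180 min, v = 50 min, N = 5
print(to_class(50, 10, 180, 5))      # 2
print(fuzzify([10, 50, 120, 180]))   # [1, 1, 2, 3] with the three classes used in the experiments
```

Computing `min`/`max` inside `fuzzify` mirrors the per-user, per-activity-kind ranges described above. The caller is assumed to pass values already grouped that way.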
Phase 2: Activity Usefulness Calculation. For each activity we calculate its usefulness, and for each kind of activity we calculate the factor of importance. Finding the usefulness for every activity is the most important step in this recommendation model. An activity is said to be useful (positively useful in our case) if it contributed towards weight loss.
Between any two consecutive measurements, the user may have performed zero or more activities. We assume that each activity between two measurements had some influence on the parameter change (weight).
Each activity can contribute to more than one change in the measured parameter. For each activity we look for two measurements: the measurement taken before the activity started that has the highest validity according to the model in Fig. 2, and the measurement taken after the activity finished that has the highest validity according to the model in Fig. 3. We used the same models as in the COHESY recommender algorithm [20]. Measurements taken right before the activity was performed have the highest validity; this is modelled with the cumulative normal distribution (Fig. 2).
The validity of measurements taken after the activity has finished should increase slowly, reach a maximum and then slowly decrease. We used the same Gamma distribution model as in [20] (Fig. 3), with the maximum set at 7 days.
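The two validity curves can be sketched as follows, assuming time is measured in days. The parameters of the normal CDF (`mu`, `sigma`) are hypothetical; the text does not give them. The Gamma shape and scale are also hypothetical, chosen only so that the mode $(k-1)\theta$ falls at the stated 7 days. The Gamma curve is divided by its peak so that the maximum validity is 1.

```python
import math


def validity_before(days_before: float, mu: float = 3.0, sigma: float = 1.5) -> float:
    """Validity of a measurement taken `days_before` days before the activity started.
    Complementary cumulative normal: close to 1 right before the activity, decaying for
    older measurements. mu and sigma are hypothetical, not taken from the paper."""
    z = (days_before - mu) / sigma
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def validity_after(days_after: float, shape: float = 2.0, scale: float = 7.0) -> float:
    """Validity of a measurement taken `days_after` days after the activity finished.
    Gamma-shaped: rises, peaks at the mode (shape - 1) * scale = 7 days, then decays.
    shape and scale are hypothetical; the curve is normalised so its peak equals 1."""
    if days_after <= 0:
        return 0.0

    def pdf(t: float) -> float:
        return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

    mode = (shape - 1) * scale
    return pdf(days_after) / pdf(mode)


print(round(validity_before(0), 3), round(validity_after(7), 3))  # 0.977 1.0
```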
The usefulness value of an activity depends on the change in the measured value: the larger the weight difference between two measurements within the valid range (according to Fig. 3), the greater the usefulness of the activities performed between those two measurements. The usefulness of each activity is calculated as in [20], where:
- $U_A$ is the usefulness of activity $A$;
- $\mathrm{value}_{M_p}$ is the parameter value (weight) of the measurement with the highest validity taken before the activity started, according to the model presented in Fig. 2;
- $\mathrm{value}_{M_n}$ is the parameter value (weight) of the measurement with the highest validity taken after the activity finished, according to the model presented in Fig. 3;
- $\mathrm{validity}_{M_p}$ is the validity of the measurement taken before the activity started, according to the cumulative normal distribution model;
- $\mathrm{validity}_{M_n}$ is the validity of the measurement taken after the activity finished, according to the Gamma distribution model;
- $F_A$ is the factor of importance of the activity, an indicator of how much the activity contributed to the measurement change.
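The following sketch shows how the listed quantities could combine into one score. The combination itself is an assumption: the exact formula is the one in [20] and is not reproduced here, so this sketch simply multiplies the weight drop by both validities and the factor of importance. The sign follows the text: a weight loss ($\mathrm{value}_{M_p} > \mathrm{value}_{M_n}$) gives positive usefulness, and a weight gain gives negative usefulness.

```python
def usefulness(value_mp: float, value_mn: float,
               validity_mp: float, validity_mn: float, factor: float) -> float:
    """Hypothetical usefulness score U_A for one activity.

    ASSUMPTION: a simple product of the weight drop, both validities and the factor
    of importance; the authoritative formula is the one given in [20]."""
    weight_drop = value_mp - value_mn  # positive when the user lost weight
    return weight_drop * validity_mp * validity_mn * factor


print(usefulness(82.0, 80.5, 0.9, 1.0, 0.5))  # positive: weight was lost
print(usefulness(80.0, 81.0, 1.0, 1.0, 1.0))  # -1.0: weight was gained
```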
The factor of importance for every activity is calculated as

$$F_A = \frac{X_A}{N}$$

where $X_A$ is the number of occurrences of activity $A$ between two measurements and $N$ is the total number of activities performed between the two measurements.
The factor value is in the range [0, 1]. If there was only one activity that influences a parameter change in the measurement data, then its factor of importance will be one.
For example, if we have n activities between two measurements, of which x of them are 'walking', y are 'running' and z are 'cycling', then the factor for the walking type of activities will be x/n, for running y/n and for cycling z/n.
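The factor of importance is a per-interval frequency count. The sketch below assumes the activities between two measurements are given as a list of activity-kind labels. It reproduces the walking/running/cycling example with x = 2, y = 1, z = 1 and n = 4.

```python
from collections import Counter


def importance_factors(activity_types):
    """F_A = X_A / N for every kind of activity performed between two measurements:
    X_A = occurrences of kind A, N = total number of activities in the interval."""
    n = len(activity_types)
    if n == 0:
        return {}
    return {kind: count / n for kind, count in Counter(activity_types).items()}


print(importance_factors(["walking", "walking", "running", "cycling"]))
# {'walking': 0.5, 'running': 0.25, 'cycling': 0.25}
```

A single activity in the interval gets factor 1, matching the statement that the factor lies in [0, 1].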
Having defined the factor of importance, the data class transformation can be performed. The raw data and the transformed data, to which a classification method can be applied, are presented in Tables 1 and 2.
Phase 3: Find N Most Useful Activities and Filter Test Data with Only N Most Useful Activities. In this step, the top N most useful activities for a given user are calculated and recommended. In our approach we recommend only activities that help the user lose weight. Tests were run with N = 10 (we recommend at most the ten most useful activities) and N = 1 (we recommend only the most useful activity). The test data are filtered against these recommendations, so that they consist only of activities the user performed as recommended. Table 3 shows the results for the different test parameters.
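Phase 3 can be sketched as group, rank, cut, filter. Each record is assumed to be a `(key, usefulness)` pair, where key = (kind, time class, distance class, calories class). Ranking groups by their mean usefulness is an assumption, since the text only says the groups are sorted by usefulness. Only groups with positive usefulness (weight loss) are recommended.

```python
from collections import defaultdict


def top_n_useful(train, n):
    """Return the keys of the n groups with the highest positive mean usefulness.
    ASSUMPTION: groups are ranked by mean usefulness."""
    groups = defaultdict(list)
    for key, u in train:
        groups[key].append(u)
    means = {key: sum(us) / len(us) for key, us in groups.items()}
    positive = [key for key, m in means.items() if m > 0]  # weight-loss groups only
    positive.sort(key=lambda k: means[k], reverse=True)
    return positive[:n]


def filter_test(test, recommended):
    """Keep only test activities that belong to a recommended group."""
    wanted = set(recommended)
    return [(key, u) for key, u in test if key in wanted]


train = [(("Walking", 2, 2, 1), 0.6), (("Walking", 2, 2, 1), 0.2),
         (("Running", 3, 3, 3), 0.9), (("Cycling", 1, 1, 1), -0.3)]
test = [(("Running", 3, 3, 3), 0.5), (("Cycling", 1, 1, 1), 0.1), (("Walking", 2, 2, 1), -0.2)]
print(top_n_useful(train, 10))                   # Running first, then Walking; Cycling is excluded
print(filter_test(test, top_n_useful(train, 1)))  # only the Running activity remains
```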

Methodological Approach
For evaluation of the proposed algorithm, a specific methodology consisting of six steps has been defined:
(1) A user $u$ and a moment $m$ are chosen (using a percentage split).
(2) All activities and measurements of $u$ before $m$ are treated as a local training set for generating recommendations; the activities after $m$ are used for testing the recommendations.
(3) The recommendation algorithm is used to generate recommendations for weight loss: for each activity $a_u$ we calculate its usefulness for weight loss (if the user gained weight, the usefulness is negative).
(4) The activities are grouped by their type (kind of activity, total time class, distance class and calories class) and sorted by usefulness.
(5) The activities most useful for weight loss are considered for recommendation and compared with the activities in the test data (after moment $m$). If the matching test activity also has positive usefulness, the recommendation is counted as positive; if that type of activity is not found in the test set, we assume that the user did not follow the recommendation.
(6) Classification methods are applied to the test data (which consist only of the recommended activities) to analyze the accuracy of the algorithm.
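Steps (1) and (2) can be sketched as a chronological split of one user's records at moment $m$. This assumes the 60/40 percentage split is applied per user in time order. It also assumes records are `(timestamp, payload)` pairs, with $m$ taken as the timestamp of the first test record. Integer arithmetic avoids floating-point surprises in the cut index.

```python
def split_at_moment(records, train_percent=60):
    """Chronological split of one user's records at moment m.
    ASSUMPTION: the first train_percent % of time-ordered records form the local training
    set; m is the timestamp of the first test record (None if the test set is empty)."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = len(ordered) * train_percent // 100
    moment = ordered[cut][0] if cut < len(ordered) else None
    return moment, ordered[:cut], ordered[cut:]


records = [(day, f"activity-{day}") for day in [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]]
m, train, test = split_at_moment(records)
print(m, [t for t, _ in train], [t for t, _ in test])  # 7 [1, 2, 3, 4, 5, 6] [7, 8, 9, 10]
```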
A few more constraints are added to filter the observations generated by the above method in order to obtain more relevant results. For the experiment, the local training set of every observation must contain at least 2 measurements, and the period between the consecutive measurements prev($a_u$) and next($a_u$) must be at least 5 days and at most 20 days. Additionally, $a_u$ must not be performed in the last 2 days of the interval, to increase the chance that the activity influenced next($a_u$).
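The three observation constraints translate into one predicate. Times are assumed to be expressed in days. The inclusive boundaries, including treating an activity exactly 2 days before next($a_u$) as allowed, are assumptions the text leaves open.

```python
def is_valid_observation(n_train_measurements: int, prev_time: float,
                         next_time: float, activity_time: float) -> bool:
    """Filtering constraints, with times in days:
    - the local training set holds at least 2 measurements;
    - 5 <= next(a_u) - prev(a_u) <= 20 days;
    - a_u is not performed in the last 2 days before next(a_u).
    ASSUMPTION: boundaries are inclusive as written here."""
    gap = next_time - prev_time
    return (n_train_measurements >= 2
            and 5 <= gap <= 20
            and prev_time <= activity_time <= next_time - 2)


print(is_valid_observation(3, 0, 10, 5))  # True
print(is_valid_observation(3, 0, 10, 9))  # False: activity in the last 2 days
```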

Experimental Evaluation
The dataset used for this experiment is an outdoor activity dataset collected from the SportyPal service and generated by 1000 users. Six attributes are used for classification: activity type, total time, calories burned, distance covered, the calculated factor of importance, and usefulness (positive or negative) (Table 2).
Three types of activities, representing 90% of all activities performed, are analyzed: Walking, Running and Cycling (Figs. 4, 5 and 6). Of these analyzed activities, cycling accounted for 10%, walking for 30% and running for 60%.
• Recommendations of activities with longer distances had the highest accuracy.
• Walking recommendations had the highest accuracy when the duration was between 90 min and 120 min and the calories spent were between 500 and 1000.
• Running recommendations had the highest accuracy when the total time was between 60 min and 90 min, for distances from 15 to 20 km.
• Cycling recommendations had the highest accuracy for short durations and calorie expenditure of up to 1000.
As we use a class data representation for easier manipulation of the data and for classification, we analyzed the accuracy of the recommendations for different numbers of classes.
Figs. 7, 8 and 9 show that as the number of classes increases, fewer recommendations are generated. Accuracy peaks with a three-class representation, which is why we used this number of classes. The accuracy of the recommendations also increases with the weight-difference threshold used to define valid measurements. Because weight fluctuates during the day, with a 0.5 kg threshold we cannot be sure whether a change was caused by other factors, such as food or liquid intake or the time of the measurement (in the morning or later in the day), or by the performed activities. That is why the recommendations perform better with thresholds of 1 kg and 1.5 kg.

Metrics Used
Although accuracy is a very important factor in analyzing the performance of recommendation algorithms, it is not a sufficient metric on its own [27].
Therefore, in addition to accuracy, we use the mean absolute error (MAE), precision and recall to evaluate and compare the recommendation performance of the algorithms. A low MAE corresponds to high accuracy of the prediction system, and a well-performing prediction system should maximize both precision and recall. Precision can be thought of as a measure of a classifier's exactness, and recall as a measure of its completeness [28]. The mean absolute error measures how close forecasts or predictions are to the eventual outcomes, without considering the direction of the error; it measures accuracy for continuous variables.
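For a binary positive/negative usefulness label, the four metrics can be computed as below, treating label 1 as the positive class. Computing MAE between the predicted probability of the positive class and the 0/1 true label is an assumption about how MAE is applied to a classifier here.

```python
def evaluate(y_true, y_pred, p_positive):
    """Accuracy, precision, recall (positive class = 1) and MAE.
    ASSUMPTION: MAE compares the predicted probability of the positive class
    with the 0/1 true label."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = tp / (tp + fp) if tp + fp else 0.0  # exactness
    recall = tp / (tp + fn) if tp + fn else 0.0     # completeness
    mae = sum(abs(p - t) for t, p in zip(y_true, p_positive)) / n
    return accuracy, precision, recall, mae


acc, prec, rec, mae = evaluate([1, 0, 1, 1], [1, 0, 0, 1], [0.9, 0.2, 0.4, 0.8])
print(acc, prec, round(rec, 3), round(mae, 3))  # 0.75 1.0 0.667 0.275
```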

Experimental Results
This activity recommender system model incorporates several classification algorithms:
• A 10-fold cross-validation was used with J48, Decision Stump, Decision Table, LibSVM and Naïve Bayes, because the dataset (1000 users) is not very large.
• RF was built with 10 trees.
• The classifiers achieved accuracies from 85% to 95%, depending on the parameter values.
• The classifiers performed with higher accuracy when only measurement differences of more than 1 kg were taken into account.
• Decision Stump (DS), Decision Table (DT) and Random Forest (RF) showed better overall performance than LibSVM, Naïve Bayes (NB) and J48.
• The best performance of the recommender algorithm, 95% accuracy (Decision Stump), was achieved when recommending only the single best activity to the user with a measurement difference of more than 1 kg; unfortunately, this was obtained on a small subset of the data (due to the parameter restrictions) consisting of only 162 valid activities.
• The analysis showed that the most important parameter when recommending physical activities is the distance of the activity: with longer-distance activity recommendations, the accuracy of the model increased.
• Even though the users performed 20 different activities overall, only Walking, Running and Cycling were considered in the analysis, since they made up 90% of all activities.
• Of the analyzed activities (running, walking and cycling), cycling represented around 10%, walking around 30% and running almost 60%.
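The 10-fold cross-validation procedure can be illustrated in plain Python with a stump-like classifier. This is not Weka's DecisionStump or any classifier used in the experiments. It is a simplified one-attribute rule: it picks the class-encoded attribute whose value-to-majority-label mapping fits the training folds best. The folds are plain, unstratified k-fold; the toy rows and labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_stump(rows, labels):
    """Pick the single attribute whose value -> majority-label rule fits the training data best."""
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for j in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[j]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(rule[row[j]] == y for row, y in zip(rows, labels))
        if best is None or correct > best[0]:
            best = (correct, j, rule)
    _, j, rule = best
    return lambda row: rule.get(row[j], default)  # unseen values fall back to the majority label


def k_fold_accuracy(rows, labels, k=10, seed=0):
    """Unstratified k-fold cross-validation accuracy of the stump-like classifier."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_rows = [rows[i] for i in idx if i not in held_out]
        train_labels = [labels[i] for i in idx if i not in held_out]
        predict = fit_stump(train_rows, train_labels)
        correct += sum(predict(rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


# Hypothetical toy data: (activity kind, distance class); the label depends only on distance.
rows = [(["walking", "running", "cycling"][i % 3], "long" if i % 2 == 0 else "short")
        for i in range(40)]
labels = ["positive" if i % 2 == 0 else "negative" for i in range(40)]
print(k_fold_accuracy(rows, labels))  # 1.0
```

On this toy data the rule learns to split on the distance class, echoing in miniature the finding that distance was the most informative parameter. That correspondence is illustrative only.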
Overall, the classifiers generated recommendations without collaborative filtering very effectively, with accuracy of 85% to 95% for measurement differences greater than 0.5 kg and 1 kg, and even higher accuracy for measurement differences greater than 1.5 kg.

Conclusion and Future Work
In this paper, we presented and compared the results of several data mining techniques for an activity recommender system. Only the user's historical activity and measurement data were used to generate the recommendations. Even without any collaborative filtering, the recommendations achieved accuracy of 85% to 95%. We used the same dataset from the sport activity service SportyPal [29] as the COHESY algorithm implemented in previous work. The results also showed that accuracy rises as the change in the measured parameter (weight) gets larger. Future work could test other algorithms and recommendation methodologies to achieve better accuracy. A new real medical dataset of patients with chronic diseases could also be used to test the proposed recommender algorithm and continue its improvement.