To run the experiments, software based on the developed architecture was implemented; it makes it possible to conduct a set of tests and generate detailed results for a chosen configuration. The software can operate in two modes: determining optimal classifier parameters (Hill Climbing) and performing a set of tests. Both modes use the same architecture, but the first mode modifies classifier parameters on the fly, while the second uses predetermined parameter values.
All tests consisted of multiple series, and each series comprised a set of rounds. During each round, a set of tasks was performed (Face Recognition or OCR). After the result for an individual task had been obtained, the time \(r_t\) and power consumption \(r_p\) were measured (i.e. the costs of performing the task). After each round, a classifier was built on the basis of the knowledge acquired from the previous and current rounds of the series in question; the classifier parameters were those established in the Hill Climbing process. In the first round (the reference round), the task execution location (cloud or mobile device) was selected randomly. The costs calculated from the results of this round were not used to build a classifier, but to calculate the penalty applied when a task ended in an error. Such a situation could occur, e.g., when the network connection established to execute the cloud service was disrupted or terminated. In that case, instead of the calculated costs, the corresponding cost values from the reference round were used, multiplied by a constant penalty factor of 1.5. After all rounds in a series had been conducted, all gathered knowledge was deleted. Multiple series were conducted in order to obtain average values for individual rounds.
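For illustration, the penalty rule can be sketched as follows (a minimal sketch in Java; all names are illustrative and not taken from the actual implementation):

```java
/** Sketch of the penalty rule described above; names are illustrative. */
final class PenaltyRule {
    static final double PENALTY_FACTOR = 1.5;

    /** Costs (time, power) recorded for a task and used to train the classifier. */
    static double[] costForTraining(boolean taskFailed,
                                    double measuredTime, double measuredPower,
                                    double referenceTime, double referencePower) {
        if (taskFailed) {
            // On error (e.g. a dropped connection to the cloud service), fall
            // back to the reference-round costs times the constant penalty factor.
            return new double[] { referenceTime * PENALTY_FACTOR,
                                  referencePower * PENALTY_FACTOR };
        }
        return new double[] { measuredTime, measuredPower };
    }
}
```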
Two services were used during tests: Face Recognition and OCR. For the Face Recognition (FR) service, five types of tests were developed (shown in Table 2) for different input data (video stream parameters).
Table 2 Types of tests for the Face Recognition service

For the OCR service, five types of tests were developed as well (shown in Table 3) for different input data (image parameters).
Table 3 Types of tests for the OCR service

Three mobile devices were used during initial tests: the Lenovo Tab 2 A7-30D (1.3 GHz CPU, 1 GB RAM) tablet as well as the Samsung Galaxy Trend Plus (1.2 GHz CPU, 768 MB RAM) and HTC Desire 610 (1.2 GHz CPU, 1 GB RAM) mobile phones. All devices ran the Android 4.4.2 operating system. The main experiment was conducted using only the Lenovo Tab 2 A7-30D device. To run remote tasks (Face Recognition and OCR) in the cloud, the AWS Lambda service was used.
All experiments were executed using real-world Internet connections (Wi-Fi and HSDPA/HSUPA). Therefore, we were not able to control connection quality, and the results obtained exhibit relatively high variation. However, such conditions are similar to real-world applications, in which connection quality may change.
The following attributes are used in the experiments: the set O consists of nine attributes, presented in Table 4. Eight of them describe a task t; one (connectionType) describes the context c. The execution results R consist of two numeric attributes: batteryUsage and timeUsage. The set of decisions D has two values: cloud and local.
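As an illustration, such a schema can be declared with the Weka library used in this work as follows (a sketch; the taskAttr1..taskAttr8 names are placeholders standing in for the task attributes of Table 4):

```java
import java.util.ArrayList;
import java.util.Arrays;
import weka.core.Attribute;
import weka.core.Instances;

final class OffloadingSchema {
    /** Builds the empty training set; taskAttr1..8 stand in for Table 4. */
    static Instances create() {
        ArrayList<Attribute> attrs = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            attrs.add(new Attribute("taskAttr" + i));      // task description t
        }
        attrs.add(new Attribute("connectionType",
                Arrays.asList("wifi", "hsdpa")));          // context c
        attrs.add(new Attribute("decision",
                Arrays.asList("cloud", "local")));         // decision set D
        Instances data = new Instances("offloading", attrs, 0);
        data.setClassIndex(data.numAttributes() - 1);      // classifier predicts D
        return data;
    }
}
```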
Table 4 Attributes describing tasks and the context

Power Measurement
The power consumption measurement (estimation) module was an essential element of the learning process: it supplied the power consumption measurements used during learning and allowed the classifier predicting energy demand to learn. The choice of a method for measuring or estimating power was preceded by an analysis and tests of existing solutions. The tests were performed for Android 4.4.2, which was used by a large number of mobile devices at the time (in 2016). The selected method should work in real time (online), allow measurements for individual device components (CPU, wireless communication modules), and offer adequate measurement resolution (below 1% of battery capacity). With such resolution, it is possible to measure the power consumption of particular applications/services on mobile devices quite accurately.
The first study concerned power measurement methods on Android devices. The most commonly used method is the public API (BatteryManager), which makes it possible to retrieve information about the current power status of the device. It relies on a subscription mechanism, which prevents battery status from being polled at regular intervals and rules out continuous real-time measurements. Moreover, the maximum resolution of measurements is 1%, which may not be sufficient to measure the difference in power consumption between application/service launches in different contexts. It is possible to use an advanced API (via the android.os.BatteryManager class), which allows measurements with a resolution better than 1%, but only on the few devices equipped with the Summit SMB347 or MAX17050 battery charger integrated circuits, present in Nexus series devices (such as the Google Nexus 6 and 9); hence, it is not a solution that could be widely used across Android devices. There is also the non-public BatteryInfo Android API, which makes it possible to obtain low-level information about power consumption. However, it requires the android.permission.BATTERY_STATS permission, which is reserved for applications built into the system and cannot easily be used by user applications.

Another option for obtaining power consumption data is the dumpsys batterystats command, whose results can be visualized in the Battery Stats and Battery Historian tools. This yields data from system logs, including power consumption of the entire device and of its individual components. The downside of this solution is that it does not work online and only shows results on a PC. Another solution, Carat [27], allows for monitoring and analyzing (on an external server) power consumption data from multiple mobile devices simultaneously and for detecting anomalies in the operation of individual applications; however, it does not allow local power measurement in real time (online). There are also closed-source applications, such as Battery Doctor, Battery Saver 2017 and GSam Battery Monitor, which can be used for monitoring and managing battery consumption on mobile devices. Due to the lack of source code or libraries, they could not be used in the solution developed.

An alternative to software solutions is the physical measurement of battery power consumption. This is the most accurate method, but it usually requires access to mobile device internals and additional measuring equipment. Moreover, such solutions only measure the total power consumption of the device, without providing results for individual components such as the CPU or wireless communication modules. Because of the invasive nature of this measurement, it is unlikely ever to be widely used. Examples in this category are BattOr [28] (open-source) and the commercial Monsoon Mobile Device Power Monitor.
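For reference, reading the battery level through the public API mentioned above looks roughly as follows; the snippet also shows its main limitations (a sticky broadcast instead of on-demand polling, and 1% resolution):

```java
import android.content.Context;
import android.content.Intent;
import android.content.IntentFilter;
import android.os.BatteryManager;

final class BatteryLevelReader {
    /** Battery level in percent, read from the sticky ACTION_BATTERY_CHANGED
     *  broadcast. Resolution is limited to 1%, and updates arrive only when
     *  the system decides to broadcast; they cannot be polled on demand. */
    static float batteryPercent(Context ctx) {
        Intent batt = ctx.registerReceiver(null,
                new IntentFilter(Intent.ACTION_BATTERY_CHANGED));
        int level = batt.getIntExtra(BatteryManager.EXTRA_LEVEL, -1);
        int scale = batt.getIntExtra(BatteryManager.EXTRA_SCALE, -1);
        return 100f * level / scale;
    }
}
```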
Our further research focused on the estimation of power consumption by mobile devices. Solutions using this method make it possible to obtain real-time results with a high measurement resolution. The most popular and widely used estimation-based solution is the PowerTutor program, which uses three basic energy characteristics of mobile device components. For a thorough analysis of the capabilities of this solution and its potential further use, we used its source code to develop our own library for estimating power consumption. The library developed allows for estimating the power consumption of individual mobile device components (such as the CPU and communication modules) with an error in the range of 1–5% [20], but it was designed for older devices and does not support newer LTE wireless communication modules. Finally, the latest method for estimating mobile device power consumption uses the power profiles provided by device manufacturers. However, not all mobile devices have these profiles defined correctly, which leads to problems with using this method; moreover, no libraries using this approach have been developed so far. To compare this method with PowerTutor, we conducted tests using the Lenovo Tab 2 A7-30D device. Preliminary results showed that, for this device, there are no major differences between the two solutions when it comes to estimating the power consumption of the CPU and wireless communication modules.
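The idea behind such model-based estimation can be sketched as follows: a per-component power coefficient (e.g. for the CPU at a given frequency step) is scaled by the component's utilization and integrated over time. The coefficients below are placeholders for illustration, not values taken from PowerTutor:

```java
/** Sketch of a PowerTutor-style CPU power estimate; coefficients are
 *  placeholders, not values from any real device energy model. */
final class CpuPowerModel {
    // Power draw (mW) at full load for each supported frequency step.
    static final double[] POWER_AT_FREQ_MW = { 120.0, 180.0, 260.0 };

    /** Estimated CPU power draw (mW) for a frequency step and a
     *  utilization value in [0, 1]. */
    static double cpuPowerMw(int freqIndex, double utilization) {
        return POWER_AT_FREQ_MW[freqIndex] * utilization;
    }

    /** Energy (mJ) consumed at a given power over an interval of dtMs ms. */
    static double energyMj(double powerMw, long dtMs) {
        return powerMw * dtMs / 1000.0;
    }
}
```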
The analysis of available solutions demonstrates that only estimation methods work on most devices and meet the basic requirements, i.e. they work online, allow for measuring individual components of a mobile device, and have the appropriate measurement resolution. Despite being aware of its limitations, we decided to use our own library developed from the PowerTutor source code; with the device used in the tests, this solution allowed for a fairly accurate estimation of the power consumed by the CPU and wireless communication modules. In future research on newer Android mobile devices, we are going to use a solution based on power profiles, which we have pre-tested. This will require developing a library for analysing the power_profile.xml system file and calculating the power consumption of individual components of a mobile device.
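As a sketch of that planned approach: on many devices the power profile can be read through the hidden com.android.internal.os.PowerProfile framework class via reflection. This relies on implementation details and assumes the device ships a correctly filled-in power_profile.xml:

```java
import android.content.Context;

final class PowerProfileReader {
    /** Average current draw (mA) for a component key such as "cpu.active",
     *  "wifi.active" or "radio.active", read from power_profile.xml.
     *  PowerProfile is a hidden framework class, so this depends on
     *  implementation details and on the profile being defined correctly. */
    static double averagePowerMa(Context ctx, String component) throws Exception {
        Class<?> cls = Class.forName("com.android.internal.os.PowerProfile");
        Object profile = cls.getConstructor(Context.class).newInstance(ctx);
        return (double) cls.getMethod("getAveragePower", String.class)
                           .invoke(profile, component);
    }
}
```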
Power Consumption During Learning Process
In the first stage of the proper research, the energy cost of building classifiers was measured (including the operation of the Weka library [29, 30]), which made it possible to assess whether this cost was excessive in relation to the potential savings resulting from the use of the solution developed. Tests were performed for the three classifiers used (C4.5Footnote 2, Random Forest, Naïve Bayes) using the Lenovo Tab 2 A7-30D tablet. For these tests, artificial training data were generated, containing 1,000 random examples, which corresponds to executing 1,000 tasks using the system developed. Every classifier was tested with different percentages (25%, 50%, 75% and 100%)Footnote 3 of the training data set. Tests for each percentage value were repeated 100 times and the results were averaged. Figure 2 shows battery consumption (as a percentage) during the process of building the individual classifiers. The results demonstrate that power consumption is low for all classifiers (up to 0.4% of battery charge) and does not significantly affect the ability to carry out the tests of the services developed. In addition, the data set used for these tests was very large; in practice, in the Face Recognition and OCR service tests, the amount of data that had to be processed by the classifier was much smaller (usually from 100 to 200 examples). The lowest power consumption is associated with the Naïve Bayes classifier, which is due to the fact that Naïve Bayes exhibits linear time complexity; this results in less stress on the device while the classifier is being built (compared to the remaining classifiers) and thus lower energy expenditure.
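The measured operation corresponds to a single buildClassifier() call on the accumulated training data. A minimal sketch using the Weka classes for the three classifiers (J48 is Weka's implementation of C4.5):

```java
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;

final class ClassifierBuilder {
    /** Rebuilds one of the three classifiers on the knowledge gathered so
     *  far; buildClassifier() is the step whose energy cost Fig. 2 reports. */
    static Classifier build(String name, Instances data) throws Exception {
        Classifier c;
        switch (name) {
            case "C4.5":         c = new J48(); break;   // Weka's C4.5
            case "RandomForest": c = new RandomForest(); break;
            default:             c = new NaiveBayes(); break;
        }
        c.buildClassifier(data);
        return c;
    }
}
```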
Tuning Classifier Parameters
In the next research stage, the Lenovo Tab 2 A7-30D mobile device was used, on which the Hill Climbing algorithm was run in order to determine the optimal parameters for the individual classifiers. During the tests, the weights of the adaptive algorithm were set as follows: \(w_p = 50\) and \(w_t = 50\); for each classifier, 20 series of the algorithm were run, each series consisting of five rounds, and all series for a single set of parameters were repeated twice. Initial parameter values were selected manually, based on the results of a number of optimization runs during software development and testing; they turned out to be close to the optimal values. However, one may start the tuning process from another starting point, taking into account that it may then take longer. Generally, the \(\varDelta \) parameter should be as small as possible given the time complexity of the process. For the number of bins, \(\varDelta \) equals two because we wanted odd values of this parameter to have a neutral value in the middle; this is not necessary, though. The \(\varDelta \) parameter for \(\epsilon \) is set to five to limit the amount of computation. For each set of parameters, tasks were executed in various contexts; a single test round consisted of the tasks listed below (a generic sketch of the climbing loop follows the list):
- Five tasks executed with the Wi-Fi connection available (9 Mb/s), including three Face Recognition tasks (fr3, fr4, fr5) and two OCR tasks (ocr4, ocr5);
- Five tasks executed with the HSDPA/HSUPA connection available, including three Face Recognition tasks (fr3, fr4, fr5) and two OCR tasks (ocr4, ocr5).
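A minimal sketch of such a climbing loop for a single integer parameter (illustrative only; in the real tuner, evaluating a candidate parameter value means running the series of rounds described above and comparing the resulting weighted costs):

```java
/** Cost of a candidate parameter value, e.g. the weighted power/time cost
 *  obtained by running the test series with that value. */
interface Cost {
    double eval(int param);
}

final class HillClimbing {
    /** Climbs one integer parameter (e.g. epsilon with step 5, or the
     *  number of bins with step 2) until neither neighbour improves the cost. */
    static int climb(int start, int step, Cost cost) {
        int current = start;
        double currentCost = cost.eval(current);
        while (true) {
            int best = current;
            double bestCost = currentCost;
            for (int candidate : new int[] { current - step, current + step }) {
                double c = cost.eval(candidate);   // runs the test series
                if (c < bestCost) { best = candidate; bestCost = c; }
            }
            if (best == current) return current;   // local optimum reached
            current = best;
            currentCost = bestCost;
        }
    }
}
```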
Table 5 Optimization of parameters for the C4.5 classifier

The result of this research stage was the optimization of parameters for two classifiers: C4.5 (Table 5), where the \(\epsilon \) parameter was changed (from 10 to 15), and Naïve Bayes (Table 6), where the same parameter changed from 10 to 20. Such a large change in the case of the Naïve Bayes classifier probably resulted from the fact that the decisions made by this classifier were sometimes suboptimal, and a higher value of this factor contributed to better optimization.
Table 6 Optimization of parameters for the Naïve Bayes classifier

For the Random Forest classifier (Table 7), the initial parameters proved to be optimal and did not require improvement.
Table 7 Optimization of parameters for the Random Forest classifier

Optimization
The final stage consisted of tests examining the possibility of optimizing power consumption (and, additionally, execution time) using the different classifiers. During the tests related to optimizing power consumption, the weights of the adaptive algorithm were set to \(w_p\) = 90 and \(w_t\) = 10. For the additional test related to optimizing execution time (Fig. 10), the weights were set to \(w_p\) = 10 and \(w_t\) = 90. The Lenovo Tab 2 A7-30D mobile device was used during the tests. Each test consisted of 20 series, each series of nine rounds, and each round comprised:
- Seven tasks executed with the Wi-Fi connection available (9 Mb/s), including four tasks of the Face Recognition type (fr1, fr2, fr4, fr5) and three tasks of the OCR type (ocr1, ocr2, ocr4);
- Seven tasks executed with the HSDPA/HSUPA connection available, including four tasks of the Face Recognition type (fr1, fr2, fr4, fr5) and three tasks of the OCR type (ocr1, ocr2, ocr4).
Detailed test results for the individual classifiers are presented in two graphs: one for power consumption optimization and one for the task execution location (mobile device or cloud computing). In the power consumption optimization graph, the result of a single round is the aggregate power consumption of all tasks executed in that round; the graph shows the results of individual rounds averaged over all series, with the standard deviation marked for each result. Student's t-test was also performed for each classifier on the averaged results from all series of the first and the last round. In the graph showing the task execution location, the result of a single round is the number of tasks executed in a given location (locally on the mobile device or remotely using cloud computing); each round is marked separately and contains the average number (over all series) of tasks executed in a particular location (locally/remotely).
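The significance check can be reproduced, for example, with Apache Commons Math (an assumption for illustration; the paper does not name the statistics library actually used):

```java
import org.apache.commons.math3.stat.inference.TTest;

final class SignificanceCheck {
    /** Two-sided two-sample t-test on the per-series aggregate costs of the
     *  first and last rounds; p < 0.05 is read as a statistically
     *  significant difference between the round averages. */
    static double pValue(double[] firstRound, double[] lastRound) {
        return new TTest().tTest(firstRound, lastRound);
    }
}
```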
Figure 3 shows the power consumption optimization graph for the C4.5 classifier. It can be seen that power consumption decreases in subsequent rounds until it begins to oscillate around a value of about 17,500 mJ. The p-value of Student's t-test for this classifier equals 0.000018, which means that the average values for the first and last rounds differ statistically significantly.
Figure 4 shows the graph presenting the number of tasks executed locally on the mobile device and remotely in the cloud for the C4.5 classifier. It can be noticed that the algorithm using this classifier sends more and more tasks to the cloud over time (in successive rounds), reducing power consumption on the mobile device.
Figure 5 shows the power consumption optimization graph for the Random Forest classifier. It can be seen that power consumption decreases in subsequent rounds until it reaches a value of about 18,000 mJ. The p-value of Student's t-test for this classifier equals 0.00000017, which means that the average values for the first and last rounds differ statistically significantly.
Figure 6 shows the graph presenting the number of tasks executed locally on the mobile device and remotely in the cloud for the Random Forest classifier. It can be noted that the algorithm using this classifier, similarly to C4.5, sends more and more tasks to the cloud, reducing power consumption on the mobile device.
Figure 7 shows the power consumption optimization graph for the Naïve Bayes classifier. It can be seen that power consumption does decrease between the first and last rounds, but the reduction is neither steady nor large. The p-value of Student's t-test for this classifier equals 0.1, which means that the average values for the first and last rounds are not statistically significantly different.
Figure 8 shows a graph presenting the number of tasks executed locally on the mobile device and remotely in the cloud for the Naïve Bayes classifier. It can be noticed that the algorithm using this classifier, similarly to the previous tests, sends more tasks to the cloud; however, it does not achieve a significant power consumption optimization. This might be related to the high value of the random factor (the \(\epsilon \) parameter) that resulted from running the Hill Climbing algorithm.
Figure 9 shows a comparison of power consumption optimization results for all three classifiers and for services performed without machine learning methods (exclusively locally on the mobile device and exclusively in the cloud). It can be noticed that the classifiers based on decision trees (C4.5 and Random Forest) perform much better than the Naïve Bayes classifier. They achieve almost the same optimization levels; the p-value of Student's t-test comparing them equals 0.3741, which means that the average values for the last round of the C4.5 and Random Forest classifiers are not statistically significantly different. However, the Random Forest classifier reaches the final power consumption level faster. The worst result was achieved by the Naïve Bayes classifier: the p-value of Student's t-test for the Naïve Bayes and C4.5 classifiers equals 0.0021, which means that their average values for the last round are statistically significantly different. In cases where the service was executed in a single location (in the cloud or locally), the results were worse than when classifiers and machine learning were used. However, the result for running the service exclusively in the cloud was only slightly worse than that for the Naïve Bayes classifier.
To check whether the algorithm developed allows for the optimization of other parameters, task execution time optimization tests were carried out (Fig. 10) for the C4.5, Random Forest and Naïve Bayes classifiers and for services performed without machine learning methods (exclusively locally on the mobile device and exclusively in the cloud). For all the classifiers tested, task execution time decreased between the first and last rounds. As in the power consumption optimization tests, the classifiers using decision trees performed much better than the Naïve Bayes classifier at optimizing task execution time. The p-value of Student's t-test for the average values of the first and last rounds of the Naïve Bayes classifier tests amounted to 0.3, which means that there is no statistically significant improvement in task execution time for that classifier. For the C4.5 (p-value of 0.00019) and Random Forest (p-value of 0.0026) classifiers, the decrease in task execution time between the first and last rounds was statistically significant. All classifiers achieved almost the same level of optimization: the p-values of Student's t-tests were 0.3471 (C4.5/Random Forest), 0.3880 (C4.5/Naïve Bayes) and 0.1939 (Random Forest/Naïve Bayes), which means that the average values for the last round do not differ statistically significantly between any pair of classifiers. In cases where the service was executed in a single location (in the cloud or locally), the results were significantly worse than when classifiers and machine learning were used.
Numbers of tasks executed in both locations for various contexts (HSDPA/HSUPA and Wi-Fi connections) are presented in Figs. 11, 12, 13, 14, 15 and 16. For all learning algorithms and contexts, the number of local executions exhibits a downward trend, while the number of cloud executions shows an upward one. The difference between local and cloud executions is larger for the Wi-Fi connection than for HSDPA/HSUPA because transfer speed is higher, the connection is more stable, and data transfer becomes cost-effective for a larger number of tasks. This is particularly noticeable for the C4.5 and Random Forest algorithms, which are more accurate than Naïve Bayes.
In order to compare our software with existing solutions, we analyzed various Mobile Cloud Computing solutions. Many of them (such as MALMOS [31], COMET [32] and COSMOS [33]) do not take energy aspects into account at all. Only a few (such as AIOLOS [34], CACTSE [35], Cuckoo [36], EMCO [37], IC-Cloud [38], MAUI [39] and ThinkAir [40]) account for energy aspects, and only the IC-Cloud solution uses machine learning algorithms to optimize the operation of applications/services. However, almost all of the solutions analyzed (including IC-Cloud) are no longer being developed or provide no access to their source code. Source code was available only for two solutions: AIOLOSFootnote 4 and CuckooFootnote 5. Unfortunately, both use outdated software development kits (Eclipse with the ADT plugin instead of Android Studio). In the case of AIOLOS, we were able to configure and build a sample project, but when the sample was run (using the Androsgi plugin), the application closed and reported an error; we analyzed the code but could not determine the cause. For the second solution, Cuckoo, it was possible to run the sample application. However, comparing it with our system proved difficult because Cuckoo lacks machine learning mechanisms and uses a completely different cloud model: the server ran on an EC2 instance, and it was not possible to use the AWS Lambda service.