Machine learning as a service for enabling Internet of Things and People

Abstract

The future Internet is expected to connect billions of people, things and services, with the potential to deliver a new set of applications by deriving new insights from the data these diverse sources generate. This highly interconnected global network brings new challenges in analysing and making sense of data. Machine learning is therefore expected to be a crucial technology for making sense of data and for improving business and decision making, with the potential to solve a wide range of problems in health care, telecommunications, urban computing, and other domains. Machine learning algorithms learn to perform a task by generalizing from examples. This is a different paradigm from traditional programming, in which a program is written explicitly to process data and produce an output. However, choosing a suitable machine learning algorithm for a particular application requires a substantial amount of time and effort, which is hard to undertake even with excellent research papers and textbooks. To reduce this time and effort, this paper introduces the TCDC (train, compare, decide, and change) approach, which can be thought of as a ‘Machine Learning as a Service’ approach, to help machine learning researchers and practitioners choose the model that achieves the best trade-off between accuracy and interpretability, computational complexity, and ease of implementation. The paper includes the results of testing and evaluating recommenders based on the TCDC approach (in comparison with the traditional default approach) on 12 open-source datasets drawn from diverse domains, including health care, agriculture, aerodynamics and others. Our results indicate that the proposed approach selects the best model in terms of predictive accuracy in 62.5 % of the regression tests performed and 75 % of the classification tests.
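
The abstract names the train, compare and decide steps but does not give the procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes two toy candidate regressors, k-fold cross-validated RMSE as the comparison metric, and synthetic data. All function names here (fit_mean, fit_linear, cv_rmse, recommend) are hypothetical.

```python
import math
import random
from statistics import mean

# Two toy candidate models (hypothetical stand-ins, not the paper's model set).
def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares with one feature: y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def cv_rmse(fit, xs, ys, k=5):
    """Train on k-1 folds, score on the held-out fold, average the RMSE."""
    n = len(xs)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_idx = [i for i in range(n) if i not in test_idx]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        sq_err = [(model(xs[i]) - ys[i]) ** 2 for i in test_idx]
        scores.append(math.sqrt(mean(sq_err)))
    return mean(scores)

def recommend(candidates, xs, ys, k=5):
    """Compare all candidates and decide on the lowest cross-validated RMSE."""
    scores = {name: cv_rmse(fit, xs, ys, k) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic data (an assumption for illustration): a noisy straight line.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

candidates = {"mean_baseline": fit_mean, "linear": fit_linear}
best, scores = recommend(candidates, xs, ys)
print(best, scores)
```

In this sketch the "change" step would amount to replacing whichever model was used by default with the recommended one. A real recommender would presumably also weigh interpretability, computational cost and ease of implementation, which this example ignores.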

Acknowledgments

This work was partially supported by the EC project CogNet, 671625 (H2020-ICT-2014-2, Research and Innovation action) and in part supported by the Science Foundation Ireland ADAPT centre (Grant 13/RC/2106).

Author information

Corresponding author

Correspondence to Haytham Assem.

About this article

Cite this article

Assem, H., Xu, L., Buda, T.S. et al. Machine learning as a service for enabling Internet of Things and People. Pers Ubiquit Comput 20, 899–914 (2016). https://doi.org/10.1007/s00779-016-0963-3

  • DOI: https://doi.org/10.1007/s00779-016-0963-3

Keywords

  • Machine learning
  • Predictive modelling
  • Supervised learning
  • Regression models
  • Classification models