Regression from Distributed Data Sources Using Discrete Neighborhood Representations and Modified Stacked Generalization Models

  • Héctor Allende-Cid
  • Claudio Moraga
  • Héctor Allende
  • Raúl Monge
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 570)


In this work we present a distributed regression approach for problems in which the distributed data sources may have different contexts. Two sources have different contexts when the underlying probability law that generates their data differs. The approach uses a discrete representation of the probability density functions (pdfs) of the sources. We compare these pdfs to create neighborhoods of similar datasets, and use this information both to build an ensemble-based approach and to improve the second-level model of the proposal, which is based on stacked generalization. We compare the proposal with other state-of-the-art models on five real data sets and obtain favorable results on the majority of them.
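The neighborhood step can be illustrated with a minimal sketch. The abstract does not say which distance between pdfs or which similarity threshold is used, so the L1 distance, the threshold of 0.5, the ten-bin histogram, the one-dimensional synthetic sources, and all function names below are assumptions made for illustration only, not the authors' method.

```python
import random


def discrete_pdf(values, lo, hi, bins=10):
    """Histogram of `values` over [lo, hi], normalized so the bins sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the index so that v == hi falls into the last bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def l1_distance(p, q):
    """L1 distance between two discrete pdfs (an assumed choice of measure)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def neighborhoods(pdfs, threshold):
    """For each source, the indices of all sources (itself included)
    whose discrete pdf lies within `threshold` of its own."""
    return [
        [j for j, q in enumerate(pdfs) if l1_distance(p, q) <= threshold]
        for p in pdfs
    ]


rng = random.Random(0)
# Three hypothetical sources: the first two share a context,
# the third is drawn from a shifted distribution.
sources = [
    [rng.gauss(0.0, 1.0) for _ in range(2000)],
    [rng.gauss(0.0, 1.0) for _ in range(2000)],
    [rng.gauss(3.0, 1.0) for _ in range(2000)],
]

# Use a common range so that the bins of all sources are comparable.
lo = min(min(s) for s in sources)
hi = max(max(s) for s in sources)
pdfs = [discrete_pdf(s, lo, hi) for s in sources]
groups = neighborhoods(pdfs, threshold=0.5)
print(groups)
```

In a setting like the one the abstract describes, each neighborhood would then determine which local models are combined in the ensemble or passed to the second-level model; that part is not sketched here because the abstract gives no details on it.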


Distributed Machine Learning, Context-aware Regression, Similarity representation





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Héctor Allende-Cid (1)
  • Claudio Moraga (2, 3)
  • Héctor Allende (1)
  • Raúl Monge (1)

  1. Departamento de Informática, Universidad Técnica Federico Santa María, Valparaíso, Chile
  2. European Centre for Soft Computing, Mieres, Asturias, Spain
  3. TU Dortmund University, Dortmund, Germany
