An Iterative Learning Algorithm for Within-Network Regression in the Transductive Setting

  • Annalisa Appice
  • Michelangelo Ceci
  • Donato Malerba
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5808)

Abstract

Within-network regression addresses the task of regression in partially labeled networked data where labels are sparse and continuous. The data for inference consist of entities associated with nodes whose labels are known, interlinked with nodes whose labels must be estimated. The premise of this work is that many networked datasets exhibit a form of autocorrelation in which the value of the response variable at a node depends on the values of the predictor variables at interlinked nodes. This autocorrelation violates the assumption that observations are independent. To overcome this problem, lagged predictor variables are added to the regression model. We investigate a computational solution for this problem in the transductive setting, which asks for predictions of the response values only for the unlabeled nodes of the network. The neighborhood relation is computed on the basis of the node links. We propose a regression inference procedure based on a co-training approach in which separate model trees are learned from the attribute values of labeled nodes and from the attribute values aggregated in the neighborhood of labeled nodes, respectively. Each model tree is used to label the unlabeled nodes for the other during an iterative learning process. The set of labeled data is expanded by including predicted labels that are estimated to be confident. The confidence estimate is based on the influence of the predicted labels on the known labels of interlinked nodes. Experiments with sparsely labeled networked data show that the proposed method improves on traditional model tree induction.
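The abstract describes the procedure only at a high level, so the following Python sketch illustrates the co-training idea under several loudly stated assumptions; it is not the authors' implementation. Ridge-regularized linear regression stands in for model trees. The "lagged" view aggregates each node's neighborhood by taking the mean of the neighbors' predictor values. The confidence score is a hypothetical proxy: it measures how much adding a predicted label to the training set reduces squared error on the known labels of the candidate's neighbors. Final transductive predictions average the two views. All function names (`fit`, `predict`, `fit_view`, `lagged`, `confidence`, `co_train`) are hypothetical.

```python
def predict(w, row):
    """Linear prediction: intercept w[0] plus the dot product with the features."""
    return w[0] + sum(wi * ri for wi, ri in zip(w[1:], row))


def fit(rows, targets, lam=1e-3):
    """Ridge least squares with intercept, solved by Gauss-Jordan elimination.
    Assumption: a simple stand-in for the model trees used in the paper."""
    X = [[1.0] + list(r) for r in rows]
    d = len(X[0])
    # Augmented normal equations [X^T X + lam*I | X^T y].
    M = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0) for j in range(d)]
         + [sum(x[i] * t for x, t in zip(X, targets))] for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(d):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [M[i][d] / M[i][i] for i in range(d)]


def fit_view(view, train):
    """Fit one view's regressor on its current labeled set {node: label}."""
    nodes = sorted(train)
    return fit([view[v] for v in nodes], [train[v] for v in nodes])


def lagged(x, adj, v):
    """Lagged predictors: mean of the neighbors' feature vectors.
    (Mean aggregation and zeros for isolated nodes are assumptions.)"""
    nbrs = adj.get(v, [])
    if not nbrs:
        return [0.0] * len(x[v])
    return [sum(x[u][j] for u in nbrs) / len(nbrs) for j in range(len(x[v]))]


def confidence(view, train, cand, yhat, adj, y_known):
    """Hypothetical confidence proxy: reduction of squared error on the known
    labels of cand's neighbors when (cand, yhat) joins the training set."""
    nbrs = [u for u in adj.get(cand, []) if u in y_known]
    if not nbrs:
        return 0.0

    def err(w):
        return sum((predict(w, view[u]) - y_known[u]) ** 2 for u in nbrs)

    return err(fit_view(view, train)) - err(fit_view(view, {**train, cand: yhat}))


def co_train(x, adj, y, unlabeled, iters=5):
    """Co-training over two views; returns predictions for unlabeled nodes only."""
    views = [{v: x[v] for v in x}, {v: lagged(x, adj, v) for v in x}]
    train = [dict(y), dict(y)]  # one labeled set per view
    pool = set(unlabeled)
    for _ in range(iters):
        for i in (0, 1):
            if not pool:
                break
            w = fit_view(views[i], train[i])
            best, best_c, best_y = None, 0.0, 0.0
            for v in sorted(pool):
                yhat = predict(w, views[i][v])
                c = confidence(views[i], train[i], v, yhat, adj, y)
                if c > best_c:
                    best, best_c, best_y = v, c, yhat
            if best is not None:
                train[1 - i][best] = best_y  # view i labels a node for the other view
                pool.discard(best)
    ws = [fit_view(views[i], train[i]) for i in (0, 1)]
    # Averaging the two views' predictions is an assumption.
    return {v: (predict(ws[0], views[0][v]) + predict(ws[1], views[1][v])) / 2
            for v in unlabeled}


# Toy ring of 8 nodes whose response depends on the neighbors' predictors.
x = {v: [float((3 * v) % 8)] for v in range(8)}
adj = {v: [(v - 1) % 8, (v + 1) % 8] for v in range(8)}
truth = {v: 1.0 + 2.0 * lagged(x, adj, v)[0] for v in range(8)}
labeled = {v: truth[v] for v in (0, 1, 2, 4, 6)}
preds = co_train(x, adj, labeled, unlabeled=[3, 5, 7])
print({v: round(p, 2) for v, p in preds.items()})
```

In each round, each view promotes at most one unlabeled node into the other view's training set: the node whose predicted label most reduces error on its labeled neighbors. Nodes with no labeled neighbor are never promoted and receive only the final averaged prediction.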

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Annalisa Appice (1)
  • Michelangelo Ceci (1)
  • Donato Malerba (1)
  1. Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy
