Improving Automatic Edge Selection for Relational Classification

  • Cristina Pérez-Solà
  • Jordi Herrera-Joancomartí
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8234)

Abstract

In this paper, we address the problem of edge selection for networked data: given a set of interlinked entities for which many different kinds of links can be defined, how do we select the links that lead to a better classification of the dataset? We evaluate the current approaches to the edge selection problem for relational classification. These approaches define a metric over the graph that quantifies the goodness of a specific link type. We propose a new metric for the same purpose. Experimental results show that our proposed metric outperforms the existing ones.
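To make the idea of a per-link-type metric concrete, the following is a minimal sketch of how candidate link types might be scored and ranked. It assumes a simple stand-in metric, the fraction of edges that join two nodes with the same known label; this is not the metric proposed in the paper, nor necessarily one of the existing metrics it evaluates. The function names, the node labels, and the link types `coauthor` and `same_city` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def same_label_fraction(edges, labels):
    """Fraction of edges whose two endpoints carry the same known label.

    Stand-in metric (an assumption, not the paper's metric).
    Edges with an unlabeled endpoint are ignored.
    """
    known = [(u, v) for u, v in edges if u in labels and v in labels]
    if not known:
        return 0.0
    return sum(labels[u] == labels[v] for u, v in known) / len(known)


def rank_edge_types(typed_edges, labels):
    """Group (u, v, edge_type) triples by type and rank types by score."""
    by_type = defaultdict(list)
    for u, v, edge_type in typed_edges:
        by_type[edge_type].append((u, v))
    scores = {t: same_label_fraction(e, labels) for t, e in by_type.items()}
    # Highest score first: the best-ranked type is the one to select.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four labeled nodes and two candidate link types.
labels = {"a": "x", "b": "x", "c": "y", "d": "y"}
typed_edges = [
    ("a", "b", "coauthor"),
    ("c", "d", "coauthor"),
    ("a", "c", "same_city"),
    ("b", "d", "same_city"),
    ("a", "b", "same_city"),
]

ranking = rank_edge_types(typed_edges, labels)
for edge_type, score in ranking:
    print(f"{edge_type}: {score:.3f}")
```

In this toy data every `coauthor` edge links same-label nodes, while only one of the three `same_city` edges does, so `coauthor` ranks first. A relational classifier would then be run on the graph built from the top-ranked link type.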

Keywords

Aggregation Operator; Versus Test; Manhattan Distance; Induction Algorithm; Feature Subset Selection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Cristina Pérez-Solà (1)
  • Jordi Herrera-Joancomartí (1, 2)
  1. Dept. d’Enginyeria de la Informació i les Comunicacions, Universitat Autònoma de Barcelona, Bellaterra, Spain
  2. Internet Interdisciplinary Institute (IN3), UOC, Spain