Machine Learning, Volume 62, Issue 1–2, pp 65–105

Distribution-based aggregation for relational learning with identifier attributes

Abstract

Identifier attributes (very high-dimensional categorical attributes such as particular product ids or people's names) are rarely incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities, e.g., MEAN, MODE, SUM, or COUNT. This paper's main contribution is the introduction of aggregation operators that capture more information about the value distributions by storing meta-data about those distributions and referencing this meta-data when aggregating, for example by computing class-conditional distributional distances. Such aggregations are particularly important for aggregating values from high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., that they proxy for unobserved attributes and for information deeper in the relationship network. In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators do facilitate modeling with high-dimensional categorical attributes, as well as evidence in support of the aforementioned conjectures.
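
To make the idea of a class-conditional distributional distance concrete, the following is a minimal Python sketch under stated assumptions, not ACORA's actual implementation. It assumes each target entity comes with a bag of identifier values, that the stored meta-data is one pooled reference distribution per class estimated from training bags, and that cosine distance is the distance measure. The function names (`reference_distribution`, `bag_distribution`, `cosine_distance`, `distance_features`) and the toy product ids are hypothetical.

```python
from collections import Counter
import math


def reference_distribution(bags):
    """Pool the values of all given bags and normalise the counts (the stored meta-data)."""
    counts = Counter(value for bag in bags for value in bag)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def bag_distribution(bag):
    """Normalised value counts of a single bag (empty dict for an empty bag)."""
    counts = Counter(bag)
    return {value: count / len(bag) for value, count in counts.items()}


def cosine_distance(p, q):
    """Cosine distance between two sparse distributions stored as dicts."""
    dot = sum(weight * q.get(value, 0.0) for value, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 1.0  # assumption: treat an empty bag as maximally distant
    return 1.0 - dot / (norm_p * norm_q)


def distance_features(bag, ref_pos, ref_neg):
    """Two numeric features: distance of the bag to each class-conditional reference."""
    dist = bag_distribution(bag)
    return cosine_distance(dist, ref_pos), cosine_distance(dist, ref_neg)


# Toy training data: (bag of purchased product ids, class label). Purely illustrative.
train = [
    (["p1", "p2", "p2"], 1),
    (["p2", "p3"], 1),
    (["p7", "p8"], 0),
    (["p8", "p9", "p9"], 0),
]
ref_pos = reference_distribution([bag for bag, label in train if label == 1])
ref_neg = reference_distribution([bag for bag, label in train if label == 0])

# Aggregate a new bag into two features that a propositional learner could use.
features = distance_features(["p2", "p3", "p3"], ref_pos, ref_neg)
```

In this sketch a COUNT aggregate would report only that the new bag has three items, whereas the two distance features record that its contents resemble the positive-class purchases and share nothing with the negative-class ones.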

Keywords

identifiers, relational learning, aggregation, networks

Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  1. IBM T.J. Watson Research Center, Yorktown Heights
  2. New York University, New York
