# Distribution-based aggregation for relational learning with identifier attributes

## Abstract

Identifier attributes (very high-dimensional categorical attributes such as particular product ids or people's names) are rarely incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities, e.g., MEAN, MODE, SUM, or COUNT. This paper's main contribution is the introduction of aggregation operators that capture more information about the value distributions by storing meta-data about those distributions and referencing this meta-data when aggregating, for example by computing class-conditional distributional distances. Such aggregations are particularly important for values from high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., that they proxy for unobserved attributes and for information deeper in the relationship network. In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators do facilitate modeling with high-dimensional categorical attributes, and evidence in support of the aforementioned conjectures.
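To make the idea of a class-conditional distributional distance concrete, the following is a minimal sketch and not the ACORA implementation. It assumes that each target entity has a bag of identifier values (here, hypothetical product ids), that the stored meta-data is the pooled value distribution of each class in the training data, and that cosine distance is an acceptable distance measure. The function names, the toy data, and the choice of cosine distance are all illustrative assumptions.

```python
from collections import Counter
import math


def class_conditional_distributions(bags, labels):
    """Pool the bags of each class into one relative-frequency distribution.

    Returns {label: {value: relative frequency}}. This plays the role of the
    stored meta-data (an assumption of this sketch).
    """
    counts = {}
    for bag, label in zip(bags, labels):
        counts.setdefault(label, Counter()).update(bag)
    distributions = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        distributions[label] = {v: c / total for v, c in counter.items()}
    return distributions


def cosine_similarity(bag, reference):
    """Cosine similarity between a bag's count vector and a reference distribution."""
    bag_counts = Counter(bag)
    dot = sum(n * reference.get(v, 0.0) for v, n in bag_counts.items())
    norm_bag = math.sqrt(sum(n * n for n in bag_counts.values()))
    norm_ref = math.sqrt(sum(p * p for p in reference.values()))
    if norm_bag == 0.0 or norm_ref == 0.0:
        return 0.0  # an empty bag or empty reference shares nothing
    return dot / (norm_bag * norm_ref)


def distance_features(bag, references):
    """One numeric feature per class: cosine distance to that class's distribution."""
    return {
        label: 1.0 - cosine_similarity(bag, ref)
        for label, ref in references.items()
    }


# Toy training data (hypothetical): bags of purchased product ids per customer.
train_bags = [["p1", "p2"], ["p1", "p3"], ["p4"], ["p4", "p5"]]
train_labels = [1, 1, 0, 0]

references = class_conditional_distributions(train_bags, train_labels)
features = distance_features(["p1", "p2"], references)
print(features)
```

In this toy run the bag `["p1", "p2"]` lies closer to the class-1 reference distribution than to the class-0 one, so the two distances could serve as numeric features for a conventional classifier. By contrast, a simple aggregate such as COUNT would return 2 for every two-item bag, whichever products it contains.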

### Keywords

identifiers, relational learning, aggregation, networks