Computationally efficient scoring of activity using demographics and connectivity of entities

Dubrawski, Artur W.; Ostlund, John K.; Chen, Lujie; Moore, Andrew W.

doi:10.1007/s10799-010-0069-y

Published: 30 April 2010

Volume 11, pages 77–89 (2010)

Information Technology and Management

Abstract

Consider a collection of entities, each of which may have some demographic properties and which may be linked in some kind of network structure, perhaps a social one. Some of these entities are “of interest”; we call them active. What is the relative likelihood of each of the other entities being active? AFDL (Activity from Demographics and Links) is an algorithm designed to answer this question in a computationally efficient manner. AFDL can work with demographic data, link data (including noisy links), or both, and it can process very large datasets quickly. This paper describes AFDL’s feature extraction and classification algorithms, gives timing and accuracy results obtained for several datasets, and offers suggestions for its use in real-world situations.
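
To make the scoring task concrete, here is a minimal Python sketch of one way to rank non-active entities by combining demographic attributes with link-derived features and a probabilistic classifier. It is not AFDL's actual feature extraction or classifier. The two link features (number and fraction of active neighbors), the plain stochastic-gradient logistic regression, the treatment of entities not marked active as negatives, and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: List[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def features(entity: str, demo: Dict[str, List[float]],
             g: Graph, active: Set[str]) -> List[float]:
    """Bias term, demographic attributes, then two hypothetical link features."""
    nbrs = g.get(entity, set())
    n_active = sum(1 for v in nbrs if v in active)
    frac_active = n_active / len(nbrs) if nbrs else 0.0
    return [1.0] + demo[entity] + [float(n_active), frac_active]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X: List[List[float]], y: List[float],
                 lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Unregularized logistic regression fitted by stochastic gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w


def score_entities(demo: Dict[str, List[float]], g: Graph,
                   active: Set[str]) -> Dict[str, float]:
    """Estimated P(active) for every entity not already known to be active."""
    entities = sorted(demo)
    X = [features(e, demo, g, active) for e in entities]
    # Simplifying assumption: entities not marked active are used as negatives.
    y = [1.0 if e in active else 0.0 for e in entities]
    w = train_logreg(X, y)
    return {e: sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for e, x in zip(entities, X) if e not in active}


# Toy data: one binary demographic attribute per entity.
demo = {"a": [1.0], "b": [1.0], "c": [1.0], "d": [1.0], "e": [0.0], "f": [0.0]}
g = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("d", "a"), ("d", "b"), ("e", "f")])
scores = score_entities(demo, g, active={"a", "b", "c"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this toy graph, d, which shares the demographic value and the active neighbors of the active entity c, is ranked above e and f, which have no links to active entities.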

Notes

AFDL and NetKit were run on an AMD Opteron 242 dual-CPU machine (1,600 MHz, 8 GB RAM) under CentOS 4 x86_64, except for the NetKit IMDB runs, which were executed on a faster machine with more memory: an AMD Opteron 844 quad-CPU machine (1,800 MHz, 32 GB RAM). We obtained NetKit from http://www.research.rutgers.edu/~sofmac/NetKit.html and ran it without modification, using the default parameter settings for this setup: local classifier = null, relational classifier = wvRN [9], collective inference = relaxation labeling [14].
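
The NetKit baseline combines the weighted-vote relational neighbor classifier (wvRN) [9] with relaxation labeling [14]. wvRN estimates an unlabeled node's probability of being active as the link-weighted average of its neighbors' current estimates; relaxation labeling recomputes that estimate for all unlabeled nodes simultaneously, damping each update with a decaying weight. The Python sketch below illustrates that combination and is not NetKit's code. The prior-based initialization (a stand-in for the null local classifier), the annealing constants (beta0 = 1.0, alpha = 0.99, 99 rounds) and all function names are assumptions chosen for illustration, not verified NetKit defaults.

```python
from typing import Dict, List, Tuple

WGraph = Dict[str, Dict[str, float]]


def build_wgraph(edges: List[Tuple[str, str, float]]) -> WGraph:
    """Undirected weighted adjacency map from (u, v, weight) triples."""
    g: WGraph = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g


def wvrn_relaxation(g: WGraph, known: Dict[str, float], iters: int = 99,
                    beta0: float = 1.0, alpha: float = 0.99) -> Dict[str, float]:
    """P(active) for every node via wvRN votes smoothed by relaxation labeling.

    `known` maps labeled nodes to a fixed probability (1.0 active, 0.0 not).
    """
    # Assumed stand-in for a "null" local classifier: start at the labeled prior.
    prior = sum(known.values()) / len(known) if known else 0.5
    p = {v: known.get(v, prior) for v in g}
    beta = beta0
    for _ in range(iters):
        new_p = dict(p)  # synchronous update: all nodes read the previous round
        for v, nbrs in g.items():
            z = sum(nbrs.values())
            if v in known or z <= 0:
                continue  # labels stay fixed; nodes without weighted links keep their estimate
            vote = sum(w * p[u] for u, w in nbrs.items()) / z  # wvRN estimate
            new_p[v] = beta * vote + (1.0 - beta) * p[v]
        p = new_p
        beta *= alpha  # annealing: later rounds move the estimates less
    return p


g = build_wgraph([("a", "x", 1.0), ("x", "y", 1.0), ("y", "b", 1.0), ("a", "z", 1.0)])
p = wvrn_relaxation(g, known={"a": 1.0, "b": 0.0})
```

In the toy chain a–x–y–b, where a is active and b is not, x ends above 0.5 and y below it, while z, linked only to a, approaches 1.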

References

[1] Getoor L, Diehl CP (2005) Link mining: a survey. SIGKDD Explorations 7(2):3–12
[2] Domingos P (2003) Prospects and challenges for multi-relational data mining. SIGKDD Explorations 5(1):80–83
[3] Fawcett T, Provost F (2003) Adaptive fraud detection. Data Min Knowl Disc 3:291–316
[4] Cortes C, Pregibon D, Volinsky C (2004) Communities of interest. In: Proceedings of intelligent data analysis (IDA)
[5] Neville J, Simsek O, Jensen D, Komoroske J, Palmer K, Goldberg H (2005) Using relational knowledge discovery to prevent securities fraud. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD-05)
[6] Kubica J, Moore A, Cohn D, Schneider J (2003) A fast graph-based method for link analysis and queries. In: Proceedings of the 2003 IJCAI text-mining & link-analysis workshop
[7] Kubica J, Moore A, Schneider J, Yang Y (2002) Stochastic link and group detection. In: Proceedings of the eighteenth national conference on artificial intelligence
[8] Macskassy SA, Provost F (2006) A brief survey of machine learning methods for classification in networked data and an application to suspicion scoring. In: Workshop on statistical network learning at the 23rd international conference on machine learning (ICML 2006), Pittsburgh, PA, USA, June 2006
[9] Macskassy SA, Provost F (2003) A simple relational classifier. In: Proceedings of the multi-relational data mining workshop (MRDM) at the ninth ACM SIGKDD international conference on knowledge discovery and data mining
[10] Macskassy SA, Provost F (2005) Suspicion scoring based on guilt-by-association, collective inference, and focused data access. In: International conference on intelligence analysis
[11] Macskassy SA, Provost F (2006) Classification in networked data: a toolkit and a univariate case study. J Mach Learn Res (forthcoming)
[12] Komarek P (2004) Logistic regression for data mining and high-dimensional classification. PhD thesis, Carnegie Mellon University
[13] Dubrawski A (1997) Stochastic validation for automated tuning of neural network’s hyper-parameters. Robot Auton Syst 21(1):89–93
[14] Chakrabarti S, Dom B, Indyk P (1998) Enhanced hypertext categorization using hyperlinks. In: ACM SIGMOD international conference on management of data
[15] Box GEP, Draper NR (1987) Empirical model building and response surfaces. Wiley
[16] Moore A, Schneider J (1995) Memory based stochastic optimization. In: Advances in neural information processing systems (NIPS 8)

Author information

Authors and Affiliations

The Auton Lab, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Artur W. Dubrawski, John K. Ostlund, Lujie Chen & Andrew W. Moore

Corresponding author

Correspondence to Artur W. Dubrawski.

About this article

Cite this article

Dubrawski, A.W., Ostlund, J.K., Chen, L. et al. Computationally efficient scoring of activity using demographics and connectivity of entities. Inf Technol Manag 11, 77–89 (2010). https://doi.org/10.1007/s10799-010-0069-y

Issue Date: June 2010
DOI: https://doi.org/10.1007/s10799-010-0069-y
