Abstract
Web pages today commonly carry advertising. Because advertisers often want to target their message to people with particular demographic attributes, the anonymity of Internet users poses a special problem for them. The purpose of the present research is to find an effective way to infer demographic information (e.g. gender, age or income) about people who use the Internet but for whom such information is not otherwise available. Our hope is to build a high-quality database of demographic profiles covering a large segment of the Internet population without having to survey each individual user. Though Internet users are largely anonymous, they nonetheless provide a certain amount of usage information. Usage information includes, but is not limited to, (a) the search terms a user enters and (b) the web pages a user accesses. In this paper, we describe an application of the Latent Semantic Analysis (LSA) [1] information retrieval technique to construct a vector space in which we can represent the usage data associated with each Internet user of interest. We then show how the LSA vector space enables us to produce demographic inferences: each user's LSA vector supplies the input to a three-layer neural model trained using the scaled conjugate gradient (SCG) method.
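The LSA step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy usage data, the log weighting, the choice of k = 2, and the function names `build_matrix`, `lsa_user_vectors`, `fold_in` and `cosine` are all assumptions made for illustration. The sketch builds a term-by-user count matrix from search terms and page identifiers, takes a truncated singular value decomposition, and represents each user as a low-dimensional vector. The neural model and its SCG training are omitted.

```python
import numpy as np

# Hypothetical usage data: each user is a bag of search terms / page identifiers.
USAGE = {
    "user_a": ["golf", "stocks", "golf", "mortgage"],
    "user_b": ["stocks", "mortgage", "retirement"],
    "user_c": ["games", "music", "games", "chat"],
    "user_d": ["music", "chat", "games"],
}


def build_matrix(usage):
    """Return (users, terms, counts) where counts is a term-by-user matrix."""
    users = sorted(usage)
    terms = sorted({t for items in usage.values() for t in items})
    row = {t: i for i, t in enumerate(terms)}
    counts = np.zeros((len(terms), len(users)))
    for col, user in enumerate(users):
        for item in usage[user]:
            counts[row[item], col] += 1
    return users, terms, counts


def lsa_user_vectors(counts, k):
    """Project users into a k-dimensional LSA space via truncated SVD."""
    weighted = np.log1p(counts)  # assumed weighting; other schemes are possible
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    user_vectors = vt[:k].T * s[:k]  # one row per user: V_k S_k
    return user_vectors, u[:, :k]


def fold_in(items, terms, u_k):
    """Map a user's usage onto the existing LSA axes without recomputing the SVD."""
    row = {t: i for i, t in enumerate(terms)}
    vec = np.zeros(len(terms))
    for item in items:
        if item in row:  # terms unseen at SVD time are ignored
            vec[row[item]] += 1
    return np.log1p(vec) @ u_k


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


users, terms, counts = build_matrix(USAGE)
user_vectors, u_k = lsa_user_vectors(counts, k=2)

for name, vec in zip(users, user_vectors):
    print(name, np.round(vec, 3))
```

Each row of `user_vectors` is the kind of fixed-length input a classifier could consume. In this toy data, users with overlapping usage end up close together in the reduced space, and `fold_in` places a user on the same axes without recomputing the decomposition.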
References
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. Indexing by latent semantic analysis. Journal of the American Society For Information Science, 41(6), 1990.
Landauer, T. K., & Dumais, S. T., How come you know so much? From practical problem to theory. In D. Hermann, C. McEvoy, M. Johnson, & P. Hertel (Eds.), Basic and applied memory: Memory in context. Mahwah, NJ: Erlbaum, 105–126, 1996.
Landauer, T. K., & Dumais, S. T., A solution to Plato’s problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240, 1997.
M. W. Berry et al., SVDPACKC: Version 1.0 User’s Guide, Tech. Rep. CS-93-194, University of Tennessee, Knoxville, TN, October 1993.
N. Belkin and W. Croft. Retrieval techniques. In M. Williams, editor, Annual Review of Information Science and Technology (ARIST), volume 22, chapter 4, pages 109–145. Elsevier Science Publishers B.V., 1987.
G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins, Baltimore, Maryland, second edition, 1989.
Salton, G. (ed), The SMART Retrieval System — Experiments in Automatic Document Processing, Englewood Cliffs, New Jersey: Prentice-Hall, 1971.
William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 2nd edition, 1992.
A. Zell et al., Stuttgart Neural Network Simulator: User Manual Version 4.1, University of Stuttgart, 1995.
Dumais, S. T. (1995), Using LSI for information filtering: TREC-3 experiments. In: D. Harman (Ed.), The Third Text REtrieval Conference (TREC3) National Institute of Standards and Technology Special Publication, in press 1995.
Dumais, S.T., Improving the retrieval of information from external sources. Behavior Research Methods, Instruments and Computers, 23(2), 229–236, 1991.
Dumais, S. T., Furnas, G. W., Landauer, T. K. and Deerwester, S., Using latent semantic analysis to improve information retrieval. In Proceedings of CHI’88: Conference on Human Factors in Computing, New York: ACM, 281–285, 1988.
Dumais, S.T., “Latent Semantic Indexing (LSI) and TREC-2.” In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215, pp. 105–116, 1994.
Dumais, S. T., “LSI meets TREC: A status report.” In: D. Harman (Ed.), The First Text REtrieval Conference (TREC1), National Institute of Standards and Technology Special Publication 500-207, pp. 137–152, 1993.
Kaski, S., Dimensionality reduction by random mapping: Fast similarity computation for clustering. In Proceedings of IJCNN’98, International Joint Conference on Neural Networks, volume 1, pages 413–418. IEEE Service Center, Piscataway, NJ., 1998.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Murray, D., Durrell, K. (2000). Inferring Demographic Attributes of Anonymous Internet Users. In: Masand, B., Spiliopoulou, M. (eds) Web Usage Analysis and User Profiling. WebKDD 1999. Lecture Notes in Computer Science, vol 1836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44934-5_1
DOI: https://doi.org/10.1007/3-540-44934-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67818-2
Online ISBN: 978-3-540-44934-8
eBook Packages: Springer Book Archive