Inferring Demographic Attributes of Anonymous Internet Users

  • Dan Murray
  • Kevan Durrell
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1836)


Web pages today commonly include advertisements. Because advertisers often want to target their message at people with particular demographic attributes, the anonymity of Internet users poses a special problem for them. The purpose of the present research is to find an effective way to infer demographic information (e.g. gender, age or income) about people who use the Internet but for whom such information is not otherwise available. Our hope is to build a high-quality database of demographic profiles covering a large segment of the Internet population without having to survey each individual user. Though Internet users are largely anonymous, they nonetheless provide a certain amount of usage information. Usage information includes, but is not limited to, (a) the search terms a user enters and (b) the web pages a user accesses. In this paper, we describe an application of the Latent Semantic Analysis (LSA) [1] information retrieval technique to construct a vector space in which we can represent the usage data associated with each Internet user of interest. We then show how the LSA vector space enables us to produce demographic inferences by supplying the input to a three-layer neural model trained using the scaled conjugate gradient (SCG) method.
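The LSA step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy term-by-user count matrix, the term labels, the choice of k = 2 and the helper names `lsa_user_vectors` and `fold_in` are all assumptions made for illustration. The sketch uses a truncated singular value decomposition to place each user in a k-dimensional space, and the standard LSA "folding-in" projection to place a previously unseen user in the same space. Vectors of this kind are what would be supplied as input to the neural model; the network and its SCG training are not shown.

```python
import numpy as np

# Hypothetical toy usage data: rows are terms (search terms or page IDs),
# columns are users, entries are how often each user produced each term.
terms = ["golf", "mortgage", "lipstick", "skateboard"]
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 4, 1],
    [0, 1, 1, 3],
], dtype=float)


def lsa_user_vectors(term_user, k):
    """Rank-k truncated SVD of a term-by-user matrix.

    Returns the k left singular vectors (term space), the k largest
    singular values, and one k-dimensional vector per user.
    """
    U, s, Vt = np.linalg.svd(term_user, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    user_vectors = Vt[:k, :].T  # shape: (number of users, k)
    return U_k, s_k, user_vectors


def fold_in(new_counts, U_k, s_k):
    """Project a new user's term counts into the existing LSA space."""
    return (new_counts @ U_k) / s_k


U_k, s_k, users = lsa_user_vectors(counts, k=2)

# A user who was not part of the original matrix (hypothetical counts).
new_user = fold_in(np.array([1.0, 2.0, 0.0, 0.0]), U_k, s_k)
```

Folding in a column that was already in the matrix reproduces that user's LSA vector, which is a quick consistency check on the projection.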


Internet User; Latent Semantic Analysis; Latent Semantic Indexing; Demographic Attribute; Document Vector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. Indexing by latent semantic analysis. Journal of the American Society For Information Science, 41(6), 1990.
  2. Landauer, T. K., & Dumais, S. T., How come you know so much? From practical problem to theory. In D. Hermann, C. McEvoy, M. Johnson, & P. Hertel (Eds.), Basic and applied memory: Memory in context. Mahwah, NJ: Erlbaum, 105–126, 1996.
  3. Landauer, T. K., & Dumais, S. T., A solution to Plato’s problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240, 1997.
  4. M. W. Berry et al., SVDPACKC: Version 1.0 User’s Guide, Tech. Rep. CS-93-194, University of Tennessee, Knoxville, TN, October 1993.
  5. N. Belkin and W. Croft. Retrieval techniques. In M. Williams, editor, Annual Review of Information Science and Technology (ARIST), volume 22, chapter 4, pages 109–145. Elsevier Science Publishers B.V., 1987.
  6. G. Golub and C. Van Loan. Matrix Computations. Johns-Hopkins, Baltimore, Maryland, second edition, 1989.
  7. Salton, G. (ed), The SMART Retrieval System: Experiments in Automatic Document Processing, Englewood Cliffs, New Jersey: Prentice-Hall, 1971.
  8. William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 2nd edition, 1992.
  9. A. Zell et al., Stuttgart Neural Network Simulator: User Manual Version 4.1, University of Stuttgart, 1995.
  10. Dumais, S. T. (1995), Using LSI for information filtering: TREC-3 experiments. In: D. Harman (Ed.), The Third Text REtrieval Conference (TREC3), National Institute of Standards and Technology Special Publication, in press 1995.
  11. Dumais, S. T., Improving the retrieval of information from external sources. Behavior Research Methods, Instruments and Computers, 23(2), 229–236, 1991.
  12. Dumais, S. T., Furnas, G. W., Landauer, T. K. and Deerwester, S., Using latent semantic analysis to improve information retrieval. In Proceedings of CHI’88: Conference on Human Factors in Computing, New York: ACM, 281–285, 1988.
  13. Dumais, S. T., “Latent Semantic Indexing (LSI) and TREC-2.” In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215, pp. 105–116, 1994.
  14. Dumais, S. T., “LSI meets TREC: A status report.” In: D. Harman (Ed.), The First Text REtrieval Conference (TREC1), National Institute of Standards and Technology Special Publication 500-207, pp. 137–152, 1993.
  15. Kaski, S., Dimensionality reduction by random mapping: Fast similarity computation for clustering. In Proceedings of IJCNN’98, International Joint Conference on Neural Networks, volume 1, pages 413–418. IEEE Service Center, Piscataway, NJ, 1998.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Dan Murray (1)
  • Kevan Durrell (1)
  1. SourceWorks Consulting, Hull, Quebec, Canada
