Privacy Preserving Link Analysis on Dynamic Weighted Graph

Article

Abstract

Link analysis algorithms have been used successfully on hyperlinked data to identify authoritative documents and retrieve other information. They have also shown great potential in many new areas such as counterterrorism and surveillance. The emergence of new applications and changes in existing ones have created new opportunities, as well as difficulties, for these algorithms: (1) In many situations where link analysis is applicable, there may not be an explicit hyperlinked structure. (2) The system can be highly dynamic, resulting in constant updates to the graph. It is often too expensive to rerun the algorithm for each update. (3) The application often relies heavily on client-side logging, and the information encoded in the graph can be very personal and sensitive. In this case privacy becomes a major concern. Existing link analysis algorithms, and their traditional implementations, are not adequate in the face of these new challenges. In this paper we propose the use of a weighted graph to define and/or augment a link structure. We present a generalized HITS algorithm that is suitable for running in a dynamic environment. The algorithm uses the idea of “lazy update” to amortize cost across multiple updates while still providing accurate rankings to users in the meantime. We prove the convergence of the new algorithm and evaluate its benefit using the Enron email dataset. Finally, we devise a distributed implementation of the algorithm that preserves user privacy, thus making it socially acceptable in real-world applications.
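As a rough illustration of what running HITS on a weighted graph can look like, the sketch below applies the standard HITS power iteration to a weighted edge list. It is not the generalized algorithm of the paper, and it does not include lazy update or any privacy protection. The function name `weighted_hits`, the edge-dictionary input format, and the sample weights are all assumptions made for this example.

```python
import math


def _normalize(scores):
    """Scale a score dictionary to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {node: v / norm for node, v in scores.items()}


def weighted_hits(weights, iterations=100, tol=1e-10):
    """Standard HITS power iteration on a weighted directed graph.

    weights: dict mapping (source, target) -> non-negative edge weight
             (hypothetical input format chosen for this sketch).
    Returns two dicts: authority scores and hub scores.
    """
    nodes = sorted({node for edge in weights for node in edge})
    auth = {node: 1.0 for node in nodes}
    hub = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        # Authority update: weighted sum of hub scores over incoming edges.
        new_auth = {node: 0.0 for node in nodes}
        for (src, dst), w in weights.items():
            new_auth[dst] += w * hub[src]

        # Hub update: weighted sum of the new authority scores over outgoing edges.
        new_hub = {node: 0.0 for node in nodes}
        for (src, dst), w in weights.items():
            new_hub[src] += w * new_auth[dst]

        new_auth = _normalize(new_auth)
        new_hub = _normalize(new_hub)

        # Stop once the authority vector has stopped changing.
        change = max(abs(new_auth[n] - auth[n]) for n in nodes)
        auth, hub = new_auth, new_hub
        if change < tol:
            break

    return auth, hub


# Hypothetical example: edge weights could stand for message counts between users.
emails = {
    ("alice", "carol"): 5.0,
    ("bob", "carol"): 3.0,
    ("alice", "dave"): 1.0,
}
authority, hubs = weighted_hits(emails)
```

In this toy graph, `carol` receives the heaviest incoming edges and ends up with the largest authority score, while `alice` ends up with the largest hub score. In a dynamic setting, each change to `emails` would require calling `weighted_hits` again, which is the recomputation cost that the lazy update idea in the abstract is meant to amortize.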

Keywords

link analysis, data mining, text analysis, privacy, HITS, graph algorithms, lazy update



Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Yitao Duan (1)
  • Jingtao Wang (1)
  • Matthew Kam (1)
  • John Canny (1)

  1. Computer Science Division, University of California at Berkeley, Berkeley
