Information Retrieval, Volume 13, Issue 6, pp. 618–635

Discriminative graphical models for faculty homepage discovery



Faculty homepage discovery is an important step toward building an academic portal. Although general homepage finding tasks have been well studied (e.g., the TREC-2001 Web Track), faculty homepage discovery has its own special characteristics, and little focused research has been conducted on this task. In this paper, we use the Yahoo BOSS API to generate a small list of high-quality candidate homepages and view faculty homepage discovery as a text categorization problem over these candidates. Because the labels of these pages are not independent, standard text categorization methods such as logistic regression, which classify each page separately, are not well suited to this task. By defining a homepage dependence graph, we propose a conditional undirected graphical model that makes joint predictions by capturing the dependence of the decisions on all the candidate pages. Three cases of dependencies among candidate faculty homepages are considered for constructing the graphical model. Our model uses a discriminative approach, so any informative features can be incorporated conveniently. Learning and inference in the joint prediction model are relatively efficient because the homepage dependence graphs resulting from the three cases of dependencies are not densely connected. An extensive set of experiments has been conducted on two testbeds to demonstrate the effectiveness of the proposed discriminative graphical model.
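To illustrate the contrast between classifying each candidate page separately and predicting all labels jointly, here is a minimal Python sketch of a pairwise conditional model over a small candidate graph. It is not the authors' model: the feature values, the weights, and the single shared edge potential are hypothetical, the paper's three dependency cases are not reproduced, and learning is omitted (weights are fixed by hand). Inference uses sum-product belief propagation (the technique studied by Murphy, Weiss, & Jordan, 1999, in the reference list), checked against brute-force enumeration.

```python
import itertools
import math

# Hypothetical setup (not from the paper): each candidate page i has a feature
# vector x_i and a binary label y_i (1 = faculty homepage, 0 = not).
# Edges connect candidates whose labels are assumed to be dependent.


def node_potentials(features, weights):
    """Log-linear node potentials: phi_i(0) = 1, phi_i(1) = exp(w . x_i)."""
    pots = []
    for x in features:
        score = sum(w * v for w, v in zip(weights, x))
        pots.append([1.0, math.exp(score)])
    return pots


def independent_marginals(node_pot):
    """P(y_i = 1 | x_i) when each page is scored alone; equals sigmoid(w . x_i)."""
    return [p[1] / (p[0] + p[1]) for p in node_pot]


def brute_force_marginals(node_pot, edges, edge_pot):
    """Exact P(y_i = 1 | x) by summing over all 2^n joint labelings."""
    n = len(node_pot)
    z = 0.0
    ones = [0.0] * n
    for y in itertools.product((0, 1), repeat=n):
        weight = 1.0
        for i in range(n):
            weight *= node_pot[i][y[i]]
        for i, j in edges:
            weight *= edge_pot[y[i]][y[j]]
        z += weight
        for i in range(n):
            if y[i] == 1:
                ones[i] += weight
    return [o / z for o in ones]


def bp_marginals(node_pot, edges, edge_pot, iters=50):
    """Sum-product belief propagation: exact on trees, approximate on loopy graphs.

    Assumes edge_pot is symmetric, so edge orientation does not matter.
    """
    n = len(node_pot)
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # msgs[(i, j)][y_j] is the (normalized) message from node i to node j.
    msgs = {(i, j): [1.0, 1.0] for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for i, j in msgs:
            out = []
            for yj in (0, 1):
                total = 0.0
                for yi in (0, 1):
                    prod = node_pot[i][yi] * edge_pot[yi][yj]
                    for k in nbrs[i]:
                        if k != j:
                            prod *= msgs[(k, i)][yi]
                    total += prod
                out.append(total)
            s = out[0] + out[1]
            new[(i, j)] = [out[0] / s, out[1] / s]
        msgs = new
    marginals = []
    for i in range(n):
        belief = list(node_pot[i])
        for k in nbrs[i]:
            belief = [belief[y] * msgs[(k, i)][y] for y in (0, 1)]
        marginals.append(belief[1] / (belief[0] + belief[1]))
    return marginals


# Hypothetical example: three candidates in a chain (a sparse, tree-shaped graph).
features = [[1.0, 0.5], [0.2, 1.0], [-0.5, 0.3]]
weights = [1.0, -0.5]
edges = [(0, 1), (1, 2)]
# Symmetric pairwise potential that discourages two linked candidates
# from both being labeled as homepages (an illustrative choice only).
edge_pot = [[1.0, 1.0], [1.0, 0.2]]

phi = node_potentials(features, weights)
indep = independent_marginals(phi)
exact = brute_force_marginals(phi, edges, edge_pot)
approx = bp_marginals(phi, edges, edge_pot)
print("independent:", [round(p, 3) for p in indep])
print("joint (exact):", [round(p, 3) for p in exact])
print("joint (BP):", [round(p, 3) for p in approx])
```

On a tree-shaped graph such as this chain, the BP marginals coincide with exhaustive enumeration. Enumeration costs 2^n labelings, whereas each BP sweep grows with the number of edges, which is one way to read the abstract's point that sparsely connected dependence graphs keep inference tractable. On graphs with cycles the same update loop becomes loopy BP, which is only approximate.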


Keywords: Discriminative graphical models; Homepage finding; Information retrieval


References

  1. Craswell, N., Hawking, D., & Robertson, S. (2001). Effective site finding using link anchor information.
  2. Craswell, N., Hawking, D., Wilkinson, R., & Wu, M. (2002). TREC 10 Web and interactive tracks at CSIRO. NIST Special Publication, pp. 151–158.
  3. Culotta, A., Bekkerman, R., & McCallum, A. (2004). Extracting social networks and contact information from email and the web. In First Conference on Email and Anti-Spam (CEAS).
  4. Davison, B. (2000). Topical locality in the Web. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 272–279). New York, NY: ACM.
  5. Doan, A., Ramakrishnan, R., Chen, F., DeRose, P., Lee, Y., McCann, R., et al. (2006). Community information management. IEEE Data Engineering Bulletin, 29(1), 64–72.
  6. Hawking, D., & Craswell, N. (2001). Overview of the TREC-2001 Web track. NIST Special Publication, pp. 61–67.
  7. Heckerman, D., Meek, C., & Koller, D. (2007). Probabilistic entity-relationship models, PRMs, and plate models. In Introduction to Statistical Relational Learning.
  8. Jordan, M. (1998). Learning in graphical models. Norwell, MA: Kluwer.
  9. Kraaij, W., Westerveld, T., & Hiemstra, D. (2002). The importance of prior probabilities for entry page search. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 27–34). New York, NY: ACM Press.
  10. Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning (pp. 282–289).
  11. McCallum, A. (2003). Efficiently inducing features of conditional random fields. In Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI03).
  12. Murphy, K., Weiss, Y., & Jordan, M. (1999). Loopy belief propagation for approximate inference: An empirical study. In Proceedings of Uncertainty in AI (pp. 467–475).
  13. Neville, J., & Jensen, D. (2003). Collective classification with relational dependency networks. In Proceedings of the Second International Workshop on Multi-Relational Data Mining (pp. 77–91).
  14. Ogilvie, P., & Callan, J. (2003). Combining document representations for known-item search. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 143–150). New York, NY: ACM.
  15. Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann Publishers.
  16. Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1–47.
  17. Shakes, J., Langheinrich, M., & Etzioni, O. (1997). Dynamic reference sifting: A case study in the homepage domain. Computer Networks and ISDN Systems, 29(8–13), 1193–1204.
  18. Tang, J., Zhang, D., & Yao, L. (2007). Social network extraction of academic researchers. In Seventh IEEE International Conference on Data Mining (pp. 292–301).
  19. Taskar, B., Abbeel, P., & Koller, D. (2002). Discriminative probabilistic models for relational data. In Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI02) (pp. 895–902).
  20. Upstill, T., Craswell, N., & Hawking, D. (2003). Query-independent evidence in home page finding. ACM Transactions on Information Systems (TOIS), 21(3), 286–313.
  21. Voorhees, E., & Harman, D. (2001). Overview of TREC 2001. NIST Special Publication 500-250.
  22. Westerveld, T., Hiemstra, D., & Kraaij, W. (2002). Retrieving web pages using content, links, URLs and anchors. NIST Special Publication, pp. 663–672.
  23. Xi, W., Fox, E., Tan, R., & Shu, J. (2002). Machine learning approach for homepage finding task. Lecture Notes in Computer Science (pp. 145–159).
  24. Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 42–49). New York, NY: ACM.
  25. Yang, Y., & Pedersen, J. (1997). A comparative study on feature selection in text categorization. In International Conference on Machine Learning (pp. 412–420). San Francisco: Morgan Kaufmann Publishers.
  26. Yedidia, J., Freeman, W., & Weiss, Y. (2001). Generalized belief propagation. Advances in Neural Information Processing Systems (pp. 689–695).

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Computer Science, Purdue University, West Lafayette, USA
