Discriminative graphical models for faculty homepage discovery
Abstract
Faculty homepage discovery is an important step toward building an academic portal. Although general homepage finding has been well studied (e.g., in the TREC-2001 Web Track), faculty homepage discovery has its own characteristics, and little focused research has been conducted on it. In this paper, we treat faculty homepage discovery as a text categorization problem, using the Yahoo BOSS API to generate a small list of high-quality candidate homepages. Because the labels of these pages are not independent, standard text categorization methods such as logistic regression, which classify each page separately, are not well suited to this task. By defining a homepage dependence graph, we propose a conditional undirected graphical model that makes joint predictions by capturing the dependencies among the decisions on all candidate pages. Three types of dependencies among candidate faculty homepages are considered when constructing the graphical model. Because the model is discriminative, arbitrary informative features can be incorporated conveniently. Learning and inference in the joint prediction model are relatively efficient because the homepage dependence graphs that result from the three types of dependencies are sparsely connected. An extensive set of experiments on two testbeds demonstrates the effectiveness of the proposed discriminative graphical model.
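To make the contrast between per-page classification and joint prediction concrete, here is a minimal Python sketch of a pairwise conditional model over a tiny candidate set. It assumes each candidate page already has a log-odds score from a per-page classifier and that each dependency is a pairwise weight on y_i * y_j. The scores, edge weights, and function names are hypothetical illustrations, not the paper's three dependency types, features, or learning procedure. Inference here is exact brute-force enumeration, which is only feasible for a handful of candidates; the abstract instead attributes efficiency to the sparsity of the dependence graphs, which this sketch does not exploit.

```python
import itertools
import math

# Hypothetical representation: edges map a page pair (i, j) to a coupling
# weight applied to y_i * y_j. Negative weights discourage both pages being
# labeled "homepage"; positive weights encourage it.


def independent_predictions(node_scores):
    """Per-page baseline: label 1 when the log-odds is >= 0 (sigmoid >= 0.5)."""
    return [1 if s >= 0.0 else 0 for s in node_scores]


def log_potential(labels, node_scores, edges):
    """Unnormalized log-score of a complete labeling y in {0, 1}^n."""
    score = sum(s * y for s, y in zip(node_scores, labels))
    score += sum(w * labels[i] * labels[j] for (i, j), w in edges.items())
    return score


def all_labelings(n):
    """Every joint labeling of n candidate pages (2^n of them)."""
    return [list(y) for y in itertools.product((0, 1), repeat=n)]


def joint_map(node_scores, edges):
    """Exact MAP labeling by enumeration (only practical for small n)."""
    return max(all_labelings(len(node_scores)),
               key=lambda y: log_potential(y, node_scores, edges))


def joint_marginals(node_scores, edges):
    """Exact P(y_i = 1 | x): sum exp(log-potential) over all labelings."""
    n = len(node_scores)
    weighted = [(y, math.exp(log_potential(y, node_scores, edges)))
                for y in all_labelings(n)]
    z = sum(w for _, w in weighted)  # partition function
    return [sum(w for y, w in weighted if y[i] == 1) / z for i in range(n)]


if __name__ == "__main__":
    # Hypothetical log-odds for three candidate pages of one faculty member.
    scores = [2.0, 1.5, -0.5]
    # Hypothetical dependencies: pages 0 and 1 compete; pages 0 and 2 agree.
    edges = {(0, 1): -4.0, (0, 2): 1.0}
    print(independent_predictions(scores))  # [1, 1, 0]
    print(joint_map(scores, edges))         # [1, 0, 1]
    print([round(p, 3) for p in joint_marginals(scores, edges)])
```

Under these assumed weights, the joint model keeps only one of the two competing pages, and the positive coupling lifts page 2 even though its own score is negative; a classifier that labels each page separately cannot express either effect.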
Keywords
Discriminative graphical models, Homepage finding, Information retrieval