Discriminative graphical models for faculty homepage discovery
Faculty homepage discovery is an important step toward building an academic portal. Although general homepage finding has been well studied (e.g., the TREC-2001 Web Track), faculty homepage discovery has its own characteristics and has received little focused research. In this paper, we treat faculty homepage discovery as a text categorization problem, using the Yahoo BOSS API to generate a small list of high-quality candidate homepages. Because the labels of these pages are not independent, standard text categorization methods such as logistic regression, which classify each page separately, are not well suited to this task. By defining a homepage dependence graph, we propose a conditional undirected graphical model that makes joint predictions by capturing the dependence among the decisions on all candidate pages. Three cases of dependencies among faculty candidate homepages are considered in constructing the graphical model. Our model takes a discriminative approach, so any informative features can be used conveniently. Learning and inference can be done relatively efficiently for the joint prediction model because the homepage dependence graphs resulting from the three cases of dependencies are not densely connected. An extensive set of experiments has been conducted on two testbeds to show the effectiveness of the proposed discriminative graphical model.
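To make the contrast between per-page classification and joint prediction concrete, here is a minimal sketch of a pairwise conditional model over a handful of candidate pages. It is an illustration under stated assumptions, not the model from the paper: the abstract does not give the features, the three dependency cases, or the learning and inference procedures, so the node scores, the single "linked pages tend to share a label" edge potential, and the names `independent_predict` and `joint_predict` are all hypothetical. Exact enumeration is used only because the candidate list is assumed to be tiny.

```python
import itertools
import math


def independent_predict(node_scores):
    """Label each page separately: 1 if its log-odds score is positive."""
    return tuple(1 if s > 0 else 0 for s in node_scores)


def joint_predict(node_scores, edges, edge_weight):
    """Exact joint prediction for a tiny pairwise undirected model.

    node_scores[i]: log-odds that page i is a homepage (hypothetical,
                    e.g. the output of a per-page classifier).
    edges:          (i, j) pairs of the assumed dependence graph.
    edge_weight:    added to the score when two linked pages share a label
                    (an assumed potential, not the paper's).
    Returns the highest-scoring labeling and P(label_i = 1) for each page.
    """
    n = len(node_scores)
    best_labels, best_score = None, -math.inf
    partition = 0.0
    positive_mass = [0.0] * n
    # Enumerate all 2^n labelings; feasible only for small candidate lists.
    for labels in itertools.product((0, 1), repeat=n):
        score = sum(s for s, y in zip(node_scores, labels) if y == 1)
        score += sum(edge_weight for i, j in edges if labels[i] == labels[j])
        weight = math.exp(score)
        partition += weight
        for i, y in enumerate(labels):
            if y == 1:
                positive_mass[i] += weight
        if score > best_score:
            best_labels, best_score = labels, score
    marginals = [m / partition for m in positive_mass]
    return best_labels, marginals


if __name__ == "__main__":
    scores = [2.0, -0.5, -1.0]   # made-up per-page log-odds
    graph = [(0, 1)]             # pages 0 and 1 are assumed to be dependent
    print(independent_predict(scores))
    print(joint_predict(scores, graph, edge_weight=1.5)[0])
```

In this toy setting the independent classifier rejects page 1 because its own score is slightly negative, while the joint model accepts it because it is linked to a confidently positive page. For larger or loopier graphs, enumeration would be replaced by an approximate method such as loopy belief propagation.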
- Craswell, N., Hawking, D., & Robertson, S. (2001). Effective site finding using link anchor information.
- Craswell, N., Hawking, D., Wilkinson, R., & Wu, M. (2002). TREC 10 Web and interactive tracks at CSIRO. NIST special publication, pp. 151–158.
- Culotta, A., Bekkerman, R., & McCallum, A. (2004). Extracting social networks and contact information from email and the web. In First Conference on Email and Anti-Spam (CEAS).
- Davison, B. (2000). Topical locality in the Web. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM: New York, NY, pp. 272–279.
- Doan, A., Ramakrishnan, R., Chen, F., DeRose, P., Lee, Y., McCann, R., et al. (2006). Community information management. IEEE Data Engineering Bulletin, 29(1):64–72.
- Hawking, D., & Craswell, N. (2001). Overview of the TREC-2001 web track. NIST special publication, pp. 61–67.
- Heckerman, D., Meek, C., & Koller, D. (2007). Probabilistic entity-relationship models, PRMs, and plate models. Introduction to Statistical Relational Learning.
- Jordan, M. (1998). Learning in graphical models. Norwell, MA: Kluwer.
- Kraaij, W., Westerveld, T., & Hiemstra, D. (2002). The importance of prior probabilities for entry page search. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 27–34). New York, NY: ACM Press.
- Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning, pp. 282–289.
- McCallum, A. (2003). Efficiently inducing features of conditional random fields. In Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI03).
- Murphy, K., Weiss, Y., & Jordan, M. (1999). Loopy belief propagation for approximate inference: An empirical study. In Proceedings of Uncertainty in AI, Citeseer, pp. 467–475.
- Neville, J., & Jensen, D. (2003). Collective classification with relational dependency networks. In Proceedings of the Second International Workshop on Multi-Relational Data Mining, pp. 77–91.
- Ogilvie, P., & Callan, J. (2003). Combining document representations for known-item search. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 143–150). New York, NY: ACM.
- Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann Publishers.
- Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1–47.
- Shakes, J., Langheinrich, M., & Etzioni, O. (1997). Dynamic reference sifting: A case study in the homepage domain. Computer Networks and ISDN Systems, 29(8–13):1193–1204.
- Tang, J., Zhang, D., & Yao, L. (2007). Social network extraction of academic researchers. In Seventh IEEE International Conference on Data Mining, pp. 292–301.
- Taskar, B., Abbeel, P., & Koller, D. (2002). Discriminative probabilistic models for relational data. In Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI02), pp. 895–902.
- Upstill, T., Craswell, N., & Hawking, D. (2003). Query-independent evidence in home page finding. ACM Transactions on Information Systems (TOIS), 21(3):286–313.
- Voorhees, E., & Harman, D. (2001). Overview of TREC 2001. NIST Special Publication 500-250.
- Westerveld, T., Hiemstra, D., & Kraaij, W. (2002). Retrieving web pages using content, links, URLs and anchors. NIST special publication, pp. 663–672.
- Xi, W., Fox, E., Tan, R., & Shu, J. (2002). Machine learning approach for homepage finding task. Lecture Notes in Computer Science, pp. 145–159.
- Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY: ACM, pp. 42–49.
- Yang, Y., & Pedersen, J. (1997). A comparative study on feature selection in text categorization. In International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers, pp. 412–420.
- Yedidia, J., Freeman, W., & Weiss, Y. (2001). Generalized belief propagation. Advances in Neural Information Processing Systems, pp. 689–695.
Volume 13, Issue 6, pp 618–635
- Springer Netherlands
- Discriminative graphical models
- Homepage finding
- Information retrieval