Abstract
Probabilistic Graphical Models (PGMs) are a well-established approach for modelling uncertain knowledge and reasoning. Since we focus on inference, this paper explores Probabilistic Inference Networks (PINs), which are a special case of PGMs. PINs, commonly referred to as Bayesian Networks, are used in Information Retrieval (IR) to model tasks such as classification and ad-hoc retrieval.
Intuitively, a probabilistic logical framework such as Probabilistic Datalog (PDatalog) should provide the expressiveness required to model PINs. However, this modelling turned out to be more challenging than expected and required extending the expressiveness of PDatalog. Moreover, for IR and for modelling more general tasks, 1st generation PDatalog turned out to have expressiveness and scalability bottlenecks. Therefore, this paper makes a case for 2nd generation PDatalog, which supports the modelling of PINs. In addition, the paper reports the implementation of a particular PIN application, Bayesian classifiers, to investigate and demonstrate the feasibility of the proposed approach.
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Martinez-Alvarez, M., Roelleke, T. (2010). Modelling Probabilistic Inference Networks and Classification in Probabilistic Datalog. In: Deshpande, A., Hunter, A. (eds) Scalable Uncertainty Management. SUM 2010. Lecture Notes in Computer Science, vol 6379. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15951-0_27
DOI: https://doi.org/10.1007/978-3-642-15951-0_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15950-3
Online ISBN: 978-3-642-15951-0
eBook Packages: Computer Science (R0)