Using Reformulation Trees to Optimize Queries over Distributed Heterogeneous Sources

  • Yingjie Li
  • Jeff Heflin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6496)


To answer queries effectively and quickly in environments with distributed RDF/OWL data, we present a query optimization algorithm that identifies the potentially relevant Semantic Web data sources using structural query features and a term index. This algorithm is based on the observation that the join selectivity of a pair of query triple patterns is often higher than the overall selectivity of these two patterns treated independently. Given a rule goal tree that expresses the reformulation of a conjunctive query, our algorithm uses a bottom-up approach to estimate the selectivity of each node. It then prioritizes loading of selective nodes and uses the information from these sources to further constrain other nodes. Finally, we use an OWL reasoner to answer queries over the selected sources and their corresponding ontologies. We have evaluated our system using both a synthetic data set and a subset of the real-world Billion Triple Challenge data.
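The bottom-up estimation and prioritization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Node` class, the `TERM_INDEX` contents, the source names, and the use of "number of candidate sources" as a stand-in for selectivity are all assumptions made for illustration. The sketch also omits the step in which loaded sources further constrain the remaining nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical term index: maps a query term to the sources that mention it.
TERM_INDEX: Dict[str, Set[str]] = {
    "ex:Professor": {"s1", "s2", "s3", "s4"},
    "ex:advises": {"s2", "s5"},
    "ex:Faculty": {"s6"},
}


@dataclass
class Node:
    """A node of a simplified rule goal tree (illustrative only)."""
    kind: str                      # "leaf", "and" (rule body), or "or" (alternative rewritings)
    term: str = ""                 # only used by leaves
    children: List["Node"] = field(default_factory=list)
    sources: Set[str] = field(default_factory=set)


def estimate(node: Node) -> Set[str]:
    """Bottom-up pass: collect candidate sources for every node.

    Assumption: fewer candidate sources means a more selective node.
    """
    if node.kind == "leaf":
        node.sources = set(TERM_INDEX.get(node.term, set()))
    else:
        for child in node.children:
            estimate(child)
        # An inner node may need any source that one of its children needs.
        node.sources = set().union(*(c.sources for c in node.children))
        if node.kind == "and":
            # Prioritize the most selective conjunct (fewest sources) first.
            node.children.sort(key=lambda c: len(c.sources))
    return node.sources


def load_order(node: Node) -> List[str]:
    """Return leaf terms in the order their sources would be loaded."""
    if node.kind == "leaf":
        return [node.term]
    order: List[str] = []
    for child in node.children:
        order.extend(load_order(child))
    return order


# Query: (Professor OR Faculty) AND advises, written as a tiny tree.
tree = Node("and", children=[
    Node("or", children=[Node("leaf", "ex:Professor"), Node("leaf", "ex:Faculty")]),
    Node("leaf", "ex:advises"),
])
estimate(tree)
print(load_order(tree))
```

In this toy tree the `ex:advises` leaf has fewer candidate sources than the disjunction, so it is scheduled first; a fuller version would then use the bindings from those sources to prune the sources considered for the other branch.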


Keywords: information integration, query optimization, query reformulation, source selectivity



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Yingjie Li (1)
  • Jeff Heflin (1)
  1. Department of Computer Science and Engineering, Lehigh University, Bethlehem, U.S.A.
