World Wide Web, Volume 18, Issue 4, pp 889–912

Structured content-based query answers for improving information quality

  • Loan T. H. Vo
  • Jinli Cao
  • Wenny Rahayu


Extensible Markup Language (XML) has been widely adopted as a standard for exchanging and integrating data across multiple sources. This allows users to explore large datasets through declarative query languages such as XQuery and XPath. However, the results of queries posed to such heterogeneous data sources are often inconsistent because of anomalies arising from structural and semantic inconsistencies, which significantly limits the system's ability to return accurate query answers. Most prior work on finding consistent query answers (CQAs) cannot be fully extended to data constraints that hold conditionally on XML data with inconsistent structures. This paper proposes an approach, called SC2QA, which uses XML conditional functional dependencies (XCSDs) to compute consistent answers for queries posed to arbitrary XML data, thereby improving information quality. An XCSD is a structured and content-based functional dependency that holds conditionally on certain objects with diverse structures. The query answer is computed by qualifying queries with appropriate information derived from the interaction between the query and the XCSDs. Experiments on synthetic datasets demonstrate the effectiveness of SC2QA.
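The abstract describes an XCSD as a functional dependency that holds only on objects meeting a condition, even when those objects are structured differently. The following is a minimal Python sketch of that idea, not of SC2QA itself. It assumes a hypothetical single-rule format (`Rule`: a target path, one equality condition, left-hand-side fields and one right-hand-side field) and treats a field as present whether it is stored as an attribute or as a child element. It reports left-hand-side keys whose conforming objects disagree, and returns an answer for a key only when every conforming object agrees, which is what deletion-based repair semantics gives for a single dependency of this kind. The names `Rule`, `field_value`, `groups`, `violations` and `consistent_answer` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical conditional dependency (illustrative, not the paper's XCSD syntax).

    For every object reached by `path` whose `cond_field` equals `cond_value`,
    objects that agree on all `lhs` fields must agree on `rhs`.
    """
    path: str
    cond_field: str
    cond_value: str
    lhs: Tuple[str, ...]
    rhs: str


def field_value(obj: ET.Element, name: str) -> Optional[str]:
    """Read a field stored either as an attribute or as a child element.

    Accepting both shapes is a simple stand-in for structurally diverse objects.
    """
    if name in obj.attrib:
        return obj.attrib[name].strip()
    child = obj.find(name)
    if child is not None and child.text is not None:
        return child.text.strip()
    return None


def groups(root: ET.Element, rule: Rule) -> Dict[Tuple[str, ...], Set[str]]:
    """Map each LHS key to the RHS values seen on objects the rule applies to."""
    out: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for obj in root.iterfind(rule.path):
        if field_value(obj, rule.cond_field) != rule.cond_value:
            continue  # condition fails, so the rule does not constrain this object
        key = tuple(field_value(obj, f) for f in rule.lhs)
        rhs = field_value(obj, rule.rhs)
        if None in key or rhs is None:
            continue  # objects missing a field are simply skipped in this sketch
        out[key].add(rhs)
    return out


def violations(root: ET.Element, rule: Rule) -> Dict[Tuple[str, ...], Set[str]]:
    """LHS keys whose conforming objects disagree on the RHS value."""
    return {k: v for k, v in groups(root, rule).items() if len(v) > 1}


def consistent_answer(root: ET.Element, rule: Rule, key: Tuple[str, ...]) -> Optional[str]:
    """RHS value for `key` if all conforming objects agree, else None.

    Under deletion-based repairs of this one rule, each repair keeps only one
    RHS value per conflicting key, so no value of such a key survives in every repair.
    """
    values = groups(root, rule).get(key, set())
    return next(iter(values)) if len(values) == 1 else None


if __name__ == "__main__":
    doc = """<library>
      <book publisher="Springer"><isbn>111</isbn><title>A</title></book>
      <book><publisher>Springer</publisher><isbn>111</isbn><title>B</title></book>
      <book publisher="Springer" isbn="222"><title>C</title></book>
      <book publisher="Other"><isbn>111</isbn><title>Z</title></book>
    </library>"""
    root = ET.fromstring(doc)
    rule = Rule(".//book", "publisher", "Springer", ("isbn",), "title")
    print({k: sorted(v) for k, v in violations(root, rule).items()})  # {('111',): ['A', 'B']}
    print(consistent_answer(root, rule, ("222",)))  # C
    print(consistent_answer(root, rule, ("111",)))  # None
```

Titles A and B share ISBN 111 under the Springer condition even though the publisher is an attribute in one object and a child element in the other, so that key has no certain answer; the book from another publisher lies outside the rule's scope. Handling several interacting dependencies, as a full consistent-query-answering system must, is beyond this single-rule sketch.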


Keywords: Data quality · Inconsistent data · Consistent query answers · Data repairs · Constraints



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Computer Science Engineering, La Trobe University, Melbourne, Australia
