
Enabling Real Time Analytics over Raw XML Data

  • Manoj K. Agarwal
  • Krithi Ramamritham
  • Prashant Agarwal
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 337)

Abstract

The data generated by many applications is in a semi-structured format, such as XML. This data can be used for analytics only after it has been shredded and stored in a structured format, a process known as Extract-Transform-Load (ETL). However, the ETL process is often time-consuming, so crucial time-sensitive insights can be lost or become unactionable. Hence, this paper poses the following question: how do we expose analytical insights in raw XML data? We address this novel problem by discovering additional information, called complementary information (CI), from the raw semi-structured data repository for a given user query. Experiments with real as well as synthetic data show that the discovered CI is relevant in the context of the given user query, nontrivial, and has high precision. Recall is also found to be high for most queries. Crowd-sourced feedback on the discovered CI corroborates these findings, showing that our system is able to discover highly relevant and potentially useful CI in real-world XML data repositories. The concepts behind our technique are generic and can be applied to other semi-structured data formats as well.

Keywords

XML Real time Analytics Information retrieval 


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Manoj K. Agarwal (1)
  • Krithi Ramamritham (2)
  • Prashant Agarwal (3)
  1. Microsoft Bing (Search Technology Center - India), Hyderabad, India
  2. Department of Computer Science and Engineering, IIT-Bombay, Powai, Mumbai, India
  3. Flipkart, Bangalore, India
