A Large-Scale Repository of Deterministic Regular Expression Patterns and Its Applications

  • Haiming Chen (email author)
  • Yeting Li
  • Chunmei Dong
  • Xinyu Chu
  • Xiaoying Mou
  • Weidong Min
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11441)

Abstract

Deterministic regular expressions (DREs) are used in many areas of data management. However, to the best of our knowledge, no large-scale repository of DREs has been reported in the literature. In this paper, based on a large corpus of data harvested from the Web, we build such a repository in two steps: we first collect a repository by analyzing the determinism of the real data, and then process it further, using normalized DREs, to construct a compact repository called the DRE pattern set. Finally, we use our DRE patterns as benchmark datasets for several algorithms that previously lacked experiments on real DRE data. Experimental results demonstrate the usefulness of the repository.
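As an illustration of what "analyzing determinism" can mean in practice, the following minimal sketch tests an expression using the standard position-automaton (Glushkov) characterization: an expression is deterministic if no two distinct positions carrying the same symbol compete, either at the start or after any given position. This is not the authors' implementation; the tuple-based syntax tree and the helper names (`sym`, `cat`, `alt`, `star`, `opt`, `is_deterministic`) are assumptions introduced only for this example, and counting and interleaving operators are not covered.

```python
from itertools import count

# Hypothetical constructors for a tiny regex syntax tree (illustration only).
def sym(a): return ("sym", a)
def cat(left, right): return ("cat", left, right)
def alt(left, right): return ("alt", left, right)
def star(e): return ("star", e)
def opt(e): return ("opt", e)

def is_deterministic(expr):
    """Return True if the Glushkov automaton of expr is deterministic."""
    ids = count()
    label = {}   # position -> symbol at that position
    follow = {}  # position -> set of positions that may come next

    def walk(node):
        # Returns (nullable, first positions, last positions) for node
        # and fills in the follow sets as a side effect.
        kind = node[0]
        if kind == "sym":
            p = next(ids)
            label[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind in ("cat", "alt"):
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            if kind == "alt":
                return n1 or n2, f1 | f2, l1 | l2
            for p in l1:                 # concatenation: last(left) -> first(right)
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind in ("star", "opt"):
            _, f, l = walk(node[1])
            if kind == "star":           # iteration: last(e) -> first(e)
                for p in l:
                    follow[p] |= f
            return True, f, l
        raise ValueError(f"unknown node kind: {kind!r}")

    def symbols_unique(positions):
        symbols = [label[p] for p in positions]
        return len(symbols) == len(set(symbols))

    _, first, _ = walk(expr)
    return symbols_unique(first) and all(symbols_unique(s) for s in follow.values())

# (a|b)*a is the textbook non-deterministic expression ...
non_det = cat(star(alt(sym("a"), sym("b"))), sym("a"))
# ... while b*a(b*a)* denotes the same language deterministically.
det = cat(cat(star(sym("b")), sym("a")),
          star(cat(star(sym("b")), sym("a"))))

print(is_deterministic(non_det))  # False
print(is_deterministic(det))      # True
```

The check is quadratic in the worst case because follow sets are materialized explicitly; it is meant to clarify the definition, not to scale to a large corpus.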

Keywords

Deterministic regular expressions, Repository, Evaluation

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Haiming Chen (1), email author
  • Yeting Li (1, 2)
  • Chunmei Dong (1, 2)
  • Xinyu Chu (1, 2)
  • Xiaoying Mou (1, 2)
  • Weidong Min (3)

  1. State Key Laboratory of Computer Science, ISCAS, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. School of Software, Nanchang University, Nanchang, China
