Privacy-Preserving Schema Reuse

  • Nguyen Quoc Viet Hung
  • Do Son Thanh
  • Nguyen Thanh Tam
  • Karl Aberer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8422)


As the number of schema repositories grows rapidly and several web-based platforms support publishing schemas, schema reuse is becoming a trend. Schema reuse is a methodology that allows users to create new schemas by copying and adapting existing ones. It reduces not only the effort of designing new schemas but also the heterogeneity between them. One of the biggest barriers to schema reuse is privacy: privacy concerns discourage schema owners from contributing their schemas. To address this problem, we develop a framework that enables privacy-preserving schema reuse. The framework lets contributors define their own protection policies in the form of privacy constraints. Instead of showing original schemas, it returns an anonymized schema with maximal utility that satisfies these privacy constraints. To validate our approach, we empirically show the efficiency of different heuristics, the correctness of the proposed utility function, the computation time, and the trade-off between utility and privacy.


Utility Function, Schema Group, Original Schema, Abstract Attribute, Utility Loss
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Nguyen Quoc Viet Hung (1)
  • Do Son Thanh (1)
  • Nguyen Thanh Tam (1)
  • Karl Aberer (1)
  1. École Polytechnique Fédérale de Lausanne, Switzerland
