Software & Systems Modeling, Volume 12, Issue 1, pp 15–34

Harvesting models from web 2.0 databases

  • Oscar Díaz
  • Gorka Puente
  • Javier Luis Cánovas Izquierdo
  • Jesús García Molina
Theme Section


Data, rather than functionality, are the source of competitive advantage for Web2.0 applications such as wikis, blogs and social networking websites. This valuable information might need to be capitalized on by third-party applications, or be subject to migration or data analysis. Model-Driven Engineering (MDE) can be used for these purposes. However, MDE first requires obtaining models from the wiki/blog/website database (a.k.a. model harvesting). This can be achieved through SQL scripts embedded in a program, but that approach leads to laborious code that exposes the iterations and table joins that serve to build the model. By contrast, a Domain-Specific Language (DSL) can hide these “how” concerns, leaving the designer to focus on the “what”, i.e. the mapping of database schemas to model classes. This paper introduces Schemol, a DSL tailored for extracting models out of databases which considers Web2.0 specifics. Web2.0 applications are often built on top of general frameworks (a.k.a. engines) that set the database schema (e.g., MediaWiki, Blojsom). Hence, table names offer little help in automating the extraction process. In addition, Web2.0 data tend to be annotated: user-provided data (e.g., wiki articles, blog entries) might contain semantic markup that provides helpful hints for model extraction. Unfortunately, these data end up being stored as opaque strings. Therefore, there exists a considerable conceptual gap between the source database and the target metamodel. Schemol offers extractive functions and view-like mechanisms to confront these issues. Examples using Blojsom as the blog engine are available for download.
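
To make the "SQL scripts embedded in a program" baseline concrete, the following is a minimal Python sketch of hand-written model harvesting. It is not Schemol and does not reproduce the Blojsom schema: the `blog` and `entry` tables, the `Blog` and `Entry` classes, and the `harvest` and `demo` functions are all hypothetical names chosen for illustration. The only markup assumed is an hCard-style `<span class="fn">` wrapper around a person's name inside the entry text.

```python
import re
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical target metamodel (illustrative only, not taken from the paper).
@dataclass
class Entry:
    title: str
    mentioned_people: List[str] = field(default_factory=list)


@dataclass
class Blog:
    name: str
    entries: List[Entry] = field(default_factory=list)


# Assumption: names are annotated hCard-style inside the opaque body string.
FN_PATTERN = re.compile(r'<span class="fn">(.*?)</span>')


def harvest(conn: sqlite3.Connection) -> List[Blog]:
    """Build Blog/Entry objects by hand: the join and the iteration are explicit."""
    blogs: Dict[int, Blog] = {}
    rows = conn.execute(
        "SELECT b.id, b.name, e.title, e.body "
        "FROM blog b LEFT JOIN entry e ON e.blog_id = b.id "
        "ORDER BY b.id, e.id"
    )
    for blog_id, name, title, body in rows:
        blog = blogs.setdefault(blog_id, Blog(name))
        if title is not None:  # a blog without entries yields NULL columns
            people = FN_PATTERN.findall(body or "")
            blog.entries.append(Entry(title, people))
    return list(blogs.values())


def demo() -> List[Blog]:
    """Populate an in-memory database with a hypothetical schema and harvest it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE entry (
            id INTEGER PRIMARY KEY, blog_id INTEGER, title TEXT, body TEXT
        );
        INSERT INTO blog VALUES (1, 'Product news');
        INSERT INTO entry VALUES
            (1, 1, 'Launch', 'Written by <span class="fn">Ada Lovelace</span>.');
        INSERT INTO entry VALUES (2, 1, 'Update', 'No markup here.');
        """
    )
    return harvest(conn)
```

Even in this small case the mapping itself (a blog row becomes a `Blog`, an entry row becomes an `Entry`, annotated names become `mentioned_people`) is interleaved with the join, the loop and the string parsing. These are the "how" concerns that the abstract argues a DSL can hide.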


Keywords: Model-driven engineering, Web2.0, Harvesting, Data re-engineering, Databases





Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Oscar Díaz (1)
  • Gorka Puente (1, corresponding author)
  • Javier Luis Cánovas Izquierdo (2)
  • Jesús García Molina (2)
  1. ONEKIN Research Group, University of the Basque Country, San Sebastián, Spain
  2. Model UM Research Group, University of Murcia, Murcia, Spain
