Interpretation of Construction Patterns for Biodiversity Spreadsheets
Spreadsheets are widely adopted as “popular databases”, where authors shape their solutions interactively. Although spreadsheets are easily adapted by their authors, their informal schemas cannot be automatically interpreted by machines to integrate data across independent spreadsheets. In biology, we observed a significant amount of biodiversity data stored in spreadsheets that are treated as isolated entities with different tabular organizations, yet have high potential for data articulation. To interpret these spreadsheets automatically, we exploit construction patterns followed by users in the biodiversity domain. This paper details evidence of such patterns and shows how they can be used to characterize the nature of a spreadsheet, as well as its fields, within a domain. It combines an automatic analysis of thousands of spreadsheets collected on the Web with the results of a survey conducted with biologists. We propose a representation model that captures these patterns, to be used in automatic interpretation systems.
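The abstract does not spell out the individual patterns, so the following is only a hypothetical sketch of how construction patterns could drive field characterization. It assumes two illustrative patterns: optional title rows (usually a single filled cell) sit above a header row, and header labels reuse domain vocabulary. The keyword lists, the "what/where/when/who" categories, and the function names `find_header_row`, `classify_field` and `interpret` are assumptions for illustration, not the authors' representation model.

```python
from typing import Dict, List, Optional

# Hypothetical vocabulary (illustrative only, not from the paper): keywords
# suggesting which aspect of a biodiversity record a column describes.
# Categories are checked in insertion order; the first match wins.
VOCABULARY: Dict[str, List[str]] = {
    "what": ["species", "genus", "family", "taxon", "scientific name"],
    "where": ["locality", "latitude", "longitude", "country", "state"],
    "when": ["date", "year", "month", "day"],
    "who": ["collector", "determiner", "recorded by"],
}


def find_header_row(rows: List[List[str]], min_filled: int = 2) -> int:
    """Index of the first row with at least `min_filled` non-empty cells.

    Assumed pattern: title rows above the header usually fill one cell,
    while the header row labels several fields.
    """
    for index, row in enumerate(rows):
        filled = sum(1 for cell in row if cell.strip())
        if filled >= min_filled:
            return index
    raise ValueError("no candidate header row found")


def classify_field(label: str) -> Optional[str]:
    """Return the first category whose keyword occurs in the label, else None."""
    normalized = label.strip().lower()
    for category, keywords in VOCABULARY.items():
        if any(keyword in normalized for keyword in keywords):
            return category
    return None


def interpret(rows: List[List[str]]) -> Dict[str, Optional[str]]:
    """Map each header label to a category (None when unrecognized)."""
    header = rows[find_header_row(rows)]
    return {label: classify_field(label) for label in header}
```

For a sheet whose first row is a one-cell title and whose second row reads Species, Locality, Collection date, Notes, this sketch selects row 1 as the header and maps the labels to "what", "where", "when" and no category, respectively. Plain substring matching is chosen only to keep the example short; an actual interpretation system would need richer evidence, such as cell value types and survey-derived conventions.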
Keywords: Pattern recognition, Spreadsheet interpretation, Semantic mapping, Biodiversity data integration
Work partially financed by FAPESP (2012/16159-6), the Microsoft Research FAPESP Virtual Institute (NavScales project), the Center for Computational Engineering and Sciences (FAPESP/CEPID 2013/08293-7), CNPq (grant 143483/2011-0, MuZOO Project and PRONEX-FAPESP), INCT in Web Science (CNPq 557.128/2009-9), CAPES, as well as individual grants from CNPq.