Exploring Solutions for Linking Big Data in Official Statistics

  • Tiziana Tuoto
  • Daniela Fusco
  • Loredana Di Consiglio
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 227)


Official statistics has acknowledged the value of big data and has started exploring the use of diverse sources in several domains. Sometimes, big data objects can be easily connected to statistical units. If a unit identifier is available, linking big data to existing statistical microdata can extend the content, coverage, accuracy and timeliness of official statistics; Internet-scraped data, for example, could be used for this purpose. In this setting, data integration poses new challenges compared with linking administrative data. In this work, we describe a real case of integrating web-scraped data with a statistical register of agritourisms, highlighting the novelties and challenges of the procedure.


Keywords: Big data, Internet-scraped data, Data integration, Data linkage, Farm register


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Tiziana Tuoto (1)
  • Daniela Fusco (1)
  • Loredana Di Consiglio (1)
  1. Istat, Rome, Italy
