Self Training Wrapper Induction with Linked Data

  • Anna Lisa Gentile
  • Ziqi Zhang
  • Fabio Ciravegna
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8655)


This work explores the use of Linked Data for Web-scale Information Extraction, with a focus on the task of Wrapper Induction. We show how to use Linked Data effectively to generate training material automatically and build a self-trained Wrapper Induction method. Experiments on a publicly available dataset demonstrate that, for covered domains, our method can achieve an F-measure of 0.85, which is competitive with a supervised solution.
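The general idea of generating training material from known values can be sketched as follows. This is only a minimal illustration and not the authors' algorithm: the function names (`text_nodes`, `induce_wrapper`, `apply_wrapper`), the toy pages, and the `gazetteer` set (standing in for values that would be retrieved from a Linked Data source) are all assumptions made for this example. Pages are annotated wherever their text matches a known value, the most frequently matched path is kept as the wrapper, and that path is then applied to a page whose value is not in the gazetteer.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def text_nodes(root):
    """Yield (path, text) for every element that has non-empty text."""
    def walk(node, path):
        text = (node.text or "").strip()
        if text:
            yield path, text
        counts = Counter()  # per-tag position among this node's children
        for child in node:
            counts[child.tag] += 1
            yield from walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
    yield from walk(root, "/" + root.tag)


def induce_wrapper(pages, gazetteer):
    """Return the path whose text most often matches a known value."""
    votes = Counter()
    for page in pages:
        for path, text in text_nodes(ET.fromstring(page)):
            if text in gazetteer:  # automatic annotation, no manual labels
                votes[path] += 1
    return votes.most_common(1)[0][0] if votes else None


def apply_wrapper(page, path):
    """Extract the text found at the learned path, or None if absent."""
    for candidate, text in text_nodes(ET.fromstring(page)):
        if candidate == path:
            return text
    return None


# Hypothetical pages from one website sharing the same template.
pages = [
    "<html><body><h1>Dune</h1><p>Frank Herbert</p></body></html>",
    "<html><body><h1>Emma</h1><p>Jane Austen</p></body></html>",
    "<html><body><h1>Ulysses</h1><p>James Joyce</p></body></html>",
]
# Assumed stand-in for book titles obtained from a Linked Data source.
gazetteer = {"Dune", "Emma"}

wrapper = induce_wrapper(pages, gazetteer)
extracted = apply_wrapper(pages[2], wrapper)
```

In this toy setting the gazetteer covers only two of the three titles, yet the learned path still extracts the title from the third page, which is the sense in which such a wrapper is trained without manually labelled examples.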





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Anna Lisa Gentile (1)
  • Ziqi Zhang (1)
  • Fabio Ciravegna (1)
  1. Department of Computer Science, University of Sheffield, UK
