Mining Travel Resources on the Web Using L-Wrappers

  • Elvira Popescu
  • Amelia Bădică
  • Costin Bădică
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4029)


The work described here is part of ongoing research on the application of general-purpose inductive logic programming, logic representation of wrappers (L-wrappers), and XML technologies (including the XSLT transformation language) to information extraction from the Web. The L-wrappers methodology is based on a sound theoretical approach and has already proved its efficacy on a smaller scale, in the area of collecting product information. This paper proposes the use of L-wrappers for tuple extraction from HTML in the domain of e-tourism. It also outlines a method for translating L-wrappers into XSLT and illustrates it with the example of a real-world travel agency Web site.


Information Extraction; Pattern Graph; Inductive Logic Programming; Inductive Logic Programming System; Address Description





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Elvira Popescu (1)
  • Amelia Bădică (2)
  • Costin Bădică (1)
  1. Software Engineering Department, University of Craiova, Craiova, Romania
  2. Business Information Systems Department, University of Craiova, Craiova, Romania
