Improving Web Data Annotations with Spreading Activation

  • Fatih Gelgi
  • Srinivas Vadrevu
  • Hasan Davulcu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3806)


The Web has established itself as the largest public data repository ever available. Even though the vast majority of information on the Web is formatted to be easily readable by the human eye, “meaningful information” is still largely inaccessible to computer applications. In this paper, we present automated algorithms that gather meta-data and instance information by utilizing global regularities on the Web and incorporating contextual information. Our system is distinguished by the fact that it does not require domain-specific engineering. Experimental evaluations were successfully performed on the TAP knowledge base and on the faculty-course home pages of computer science departments containing 16,861 Web pages.


Semi-structured data, spreading activation, semantic partitioning





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Fatih Gelgi (1)
  • Srinivas Vadrevu (1)
  • Hasan Davulcu (1)
  1. Department of Computer Science and Engineering, Arizona State University, Tempe, USA
