Extracting Topic Maps from Web Pages

  • Motohiro Mase
  • Seiji Yamada
  • Katsumi Nitta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5433)

Abstract

We propose a framework for extracting topic maps from a set of Web pages. The framework clusters the Web pages and extracts topic map prototypes from the result. We extend an existing clustering method in two ways. First, only linked Web pages are merged, which exposes the underlying relationships between topics. Second, merges are weighted by the content similarity of the Web pages and by the relevance between the topics of the pages. The relevance is based on the types of links with respect to the directory structure of the Web site and on the distance between the directories in which the pages are located. We generate the topic map prototypes from the clustering results. Finally, users complete a prototype by labeling the topics and associations and removing unnecessary items. As a first step, in this paper we implemented the proposed clustering method and used it to extract a prototype.
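The two extensions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weight formula (cosine similarity of term counts divided by one plus the directory distance), the average-link cluster score, and the threshold are all assumptions chosen for illustration, and link types are not modeled.

```python
import math
from collections import Counter
from posixpath import dirname
from urllib.parse import urlparse


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def directory_distance(url_a: str, url_b: str) -> int:
    """Directory steps between the directories that hold two pages."""
    da = [p for p in dirname(urlparse(url_a).path).split("/") if p]
    db = [p for p in dirname(urlparse(url_b).path).split("/") if p]
    common = 0
    for x, y in zip(da, db):
        if x != y:
            break
        common += 1
    return (len(da) - common) + (len(db) - common)


def link_weight(url_a, url_b, terms):
    # Assumed combination: content similarity damped by directory distance.
    similarity = cosine(terms[url_a], terms[url_b])
    return similarity / (1 + directory_distance(url_a, url_b))


def cluster_linked_pages(terms, links, threshold=0.1):
    """Greedy agglomerative clustering that merges only linked clusters.

    terms: dict mapping URL -> Counter of terms on that page
    links: iterable of (source URL, target URL) pairs
    Returns the final clusters and the list of merges performed.
    """
    clusters = [{url} for url in sorted(terms)]
    merges = []
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                weights = [
                    link_weight(a, b, terms)
                    for a, b in links
                    if (a in clusters[i] and b in clusters[j])
                    or (a in clusters[j] and b in clusters[i])
                ]
                if not weights:
                    continue  # clusters without a link are never merged
                score = sum(weights) / len(weights)
                if score >= threshold and (best is None or score > best[0]):
                    best = (score, i, j)
        if best is None:
            return clusters, merges
        score, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), score))
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]  # j > i, so index i stays valid
```

In this sketch each recorded merge is a candidate association between two groups of pages, which a user could later label or discard when completing the prototype. Pages with similar content but no connecting link stay in separate clusters.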

Keywords

Web information extraction, Topic Maps, clustering

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Motohiro Mase (1)
  • Seiji Yamada (2)
  • Katsumi Nitta (1)
  1. Tokyo Institute of Technology, Japan
  2. National Institute of Informatics, Japan
