Abstract

An electronic library is a collection of digital information related to an individual domain and, by extension, to all domains. A focused crawler traverses the Web looking for the pages most relevant to a domain while discarding irrelevant pages, and is therefore helpful for generating e-contents for a digital library devoted to a particular domain. In this paper, a focused crawling technique based on WordNet semantics is proposed to generate online contents for an e-library. The applicability of the proposed approach is shown by retrieving documents that are highly related to a single domain. The quality of the pages included in the library is derived from a relevancy measure between each page and the content of domain-related pages.
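The abstract describes the crawler only at a high level: a page is kept when it is sufficiently related to domain pages and discarded otherwise. Below is a minimal Python sketch of that idea under stated assumptions; it is not the authors' method. It assumes relevance is the cosine similarity between a page's Tf-Idf vector and the summed Tf-Idf vector of a few domain pages, and it uses a tiny hand-written synonym table (`SYNONYMS`, hypothetical) as a stand-in for WordNet synsets. In practice a WordNet interface such as NLTK's `wordnet.synsets` could supply the synonyms. The in-memory `WEB` link graph, the `threshold` and `limit` values, and all function names are hypothetical.

```python
import heapq
import math
import re
from collections import Counter

# Hypothetical stand-in for WordNet: each head word lists synonyms that
# should be counted as the same concept. A real system could derive these
# from WordNet synsets instead of a hand-written table.
SYNONYMS = {
    "library": {"archive", "repository"},
    "digital": {"electronic"},
}
# Reverse map: synonym -> head word.
CANONICAL = {syn: head for head, syns in SYNONYMS.items() for syn in syns}


def terms(text):
    """Lower-case, tokenize, and replace synonyms with their head word."""
    return [CANONICAL.get(t, t) for t in re.findall(r"[a-z]+", text.lower())]


def idf_table(docs):
    """Smoothed inverse document frequency over the domain pages."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf, default):
    """Raw term frequency times idf; unseen terms get the default idf."""
    return {t: c * idf.get(t, default) for t, c in Counter(doc).items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def focused_crawl(web, seeds, domain_pages, threshold=0.2, limit=10):
    """Best-first crawl that keeps pages similar to the domain profile.

    `web` maps url -> (text, outlinks); it replaces real HTTP fetching.
    """
    docs = [terms(p) for p in domain_pages]
    idf = idf_table(docs)
    default = math.log(1 + len(docs)) + 1  # idf of a term with df = 0
    profile = Counter()
    for d in docs:
        profile.update(tfidf(d, idf, default))  # summed domain vector

    frontier = [(-1.0, url) for url in seeds]  # min-heap on negated score
    heapq.heapify(frontier)
    seen, library = set(seeds), []
    while frontier and len(library) < limit:
        _, url = heapq.heappop(frontier)
        text, links = web.get(url, ("", []))
        score = cosine(tfidf(terms(text), idf, default), profile)
        if score < threshold:
            continue  # irrelevant: discard the page and do not expand it
        library.append((url, round(score, 3)))
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))  # inherit parent score
    return library


DOMAIN = [
    "a digital library stores electronic documents",
    "library repository of digital collections",
]
WEB = {
    "a": ("digital library of electronic books and archives", ["b", "c"]),
    "b": ("repository for digital library collections", ["d"]),
    "c": ("football scores and match results", ["e"]),
    "d": ("electronic archive of library records", []),
    "e": ("digital library catalogue", []),
}

result = focused_crawl(WEB, ["a"], DOMAIN)
print(result)
```

With these toy inputs, page `c` shares no terms with the domain profile, so it is discarded and `e`, reachable only through `c`, is never fetched. This pruning is what keeps a focused crawler on topic. Ordering the frontier by the parent page's score is one simple design choice for a best-first crawler; the synonym normalization is what lets "electronic archive" count toward "digital library".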

Keywords

Focused Web crawler, information retrieval, Tf-Idf, semantics, search engine, indexing



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. University Institute of Engineering and Technology, Panjab University, Chandigarh, India
