The Adaptability of English Based Web Search Algorithms to Chinese Search Engines

  • Conference paper
Frontiers of WWW Research and Development - APWeb 2006 (APWeb 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)


Abstract

Much research in recent years has been devoted to meta-search and multilingual search to improve performance and widen the scope of search. Since most existing web search algorithms were originally developed for English web documents, one may question their efficiency and performance when they are applied to documents in other languages. In this work, we have chosen Chinese web search and Chinese web documents for our study. We identify and discuss potential issues and problems in applying well-known English-language-based algorithms to Chinese web documents. Our qualitative and exploratory quantitative analysis leads us to conclude that these algorithms and techniques cannot be used directly to develop an efficient Chinese search engine.
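The abstract does not spell out the specific issues, so the following is only a hedged illustration of one kind of mismatch it alludes to, not the authors' method. English-oriented indexers commonly assume that words are delimited by spaces and punctuation, whereas written Chinese places no spaces between words. The sketch below assumes a simple regex tokenizer as a stand-in for an "English-style" indexer; the function names `whitespace_tokenize` and `char_bigrams` are hypothetical, and overlapping character bigrams are swapped in as one well-known segmentation-free alternative for comparison.

```python
import re

def whitespace_tokenize(text: str) -> list[str]:
    """Hypothetical English-style tokenizer: split on runs of non-word characters."""
    # In Python 3, \w matches CJK characters, so an unspaced Chinese
    # phrase survives as one long token instead of separate words.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def char_bigrams(text: str) -> list[str]:
    """Hypothetical segmentation-free indexer: overlapping character bigrams."""
    # Whitespace is skipped; punctuation is NOT filtered in this simplified sketch.
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]

if __name__ == "__main__":
    doc = "中文搜索引擎"  # "Chinese search engine", written without spaces
    print(whitespace_tokenize("Chinese search engine"))  # ['chinese', 'search', 'engine']
    print(whitespace_tokenize(doc))                      # ['中文搜索引擎']
    print(char_bigrams(doc))                             # ['中文', '文搜', '搜索', '索引', '引擎']
    # A query for 搜索 ("search") misses under the English-style tokenizer
    # but matches a bigram in the segmentation-free index.
    print("搜索" in whitespace_tokenize(doc))            # False
    print("搜索" in char_bigrams(doc))                   # True
```

Bigrams recover the query term but also generate spurious units: 文搜 is not a word, and 索引 ("index") would make this document match an unrelated query. This trade-off between missed matches and false matches is one reason an indexer built for English cannot simply be reused for Chinese.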

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, L., Li, K.F., Manning, E.G. (2006). The Adaptability of English Based Web Search Algorithms to Chinese Search Engines. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_36

  • DOI: https://doi.org/10.1007/11610113_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31142-3

  • Online ISBN: 978-3-540-32437-9

  • eBook Packages: Computer Science, Computer Science (R0)
