Abstract
Elastic ChatNoir (search: www.chatnoir.eu, code: www.github.com/chatnoir-eu) is an Elasticsearch-based search engine offering a freely accessible search interface for the two ClueWeb corpora and the Common Crawl, together about 3 billion web pages. Running across 130 nodes, Elastic ChatNoir features subsecond response times comparable to those of commercial search engines. Unlike most commercial search engines, it also offers a powerful API that is available free of charge to IR researchers. Elastic ChatNoir’s main purpose is to serve as a baseline for reproducible IR experiments and user studies for the coming years, empowering research at a scale previously out of reach for many labs, and to provide a platform for experimenting with new approaches to web search.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Bevendorff, J., Stein, B., Hagen, M., Potthast, M. (2018). Elastic ChatNoir: Search Engine for the ClueWeb and the Common Crawl. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_83
DOI: https://doi.org/10.1007/978-3-319-76941-7_83
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76940-0
Online ISBN: 978-3-319-76941-7
eBook Packages: Computer Science, Computer Science (R0)