Improvements in Recall and Precision in Wolters Kluwer Spain Legal Search Engine

  • Angel Sancho Ferrer
  • Jose Manuel Mateo Rivero
  • Alejandro Mesas García
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4884)

Abstract

In this paper we describe the search technology in production at Wolters Kluwer Spain for the legal research market. This technology improves on the “Google-like” experience by increasing both the number of relevant documents retrieved (recall) and the quality of the top-ranked ones (precision), while keeping the ease of entering a natural language query. We propose a hybrid approach, both in the working methodology and in the codification of legal knowledge, which combines the expertise of subject matter experts and librarians, through new layers of semantic analysis and algorithms. We improve the traditional tf-idf vector space model by creating a mixed document indexing schema of terms and concepts, as well as a proprietary ranking algorithm trained by a hybrid genetic algorithm. These calculations also improve the quality of the keyword-in-context display.
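
As a rough illustration of what a mixed indexing schema of terms and concepts can look like, the following minimal sketch adds concept identifiers as extra features next to the raw terms in a plain tf-idf vector space and ranks by cosine similarity. It is not the production system or the proprietary ranking algorithm described above: the `CONCEPTS` lexicon, the function names, the whitespace tokenizer, and the smoothed idf formula are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical concept lexicon (an assumption for this sketch):
# maps a surface term to a concept identifier.
CONCEPTS = {
    "dismissal": "C:EMPLOYMENT_TERMINATION",
    "firing": "C:EMPLOYMENT_TERMINATION",
    "layoff": "C:EMPLOYMENT_TERMINATION",
    "lease": "C:TENANCY",
    "rent": "C:TENANCY",
}


def features(text):
    """Return a mixed bag of features: raw terms plus any mapped concepts."""
    terms = text.lower().split()
    concepts = [CONCEPTS[t] for t in terms if t in CONCEPTS]
    return Counter(terms + concepts)


def build_index(docs):
    """Compute idf weights and one tf-idf vector per document."""
    counts = [features(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each feature.
    df = Counter(f for c in counts for f in c)
    idf = {f: math.log(n / df[f]) + 1.0 for f in df}
    vectors = [{f: tf * idf[f] for f, tf in c.items()} for c in counts]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, idf, vectors):
    """Return (score, doc_index) pairs with a non-zero score, best first."""
    # Features unseen at indexing time get weight 0.
    q = {f: tf * idf.get(f, 0.0) for f, tf in features(query).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(((s, i) for s, i in scored if s > 0), reverse=True)


docs = [
    "unfair dismissal of an employee",
    "rent increase under a lease",
    "firing without notice",
]
idf, vectors = build_index(docs)
hits = search("layoff", idf, vectors)
```

In this toy corpus the query "layoff" shares no literal term with any document, so a terms-only index would return nothing. The shared concept feature lets the two employment documents match, which is the recall effect the abstract refers to.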

Keywords

Legal Knowledge Representation; Information Retrieval; Semantic Indexation; Hybrid Methodologies; Genetic Algorithms; Machine Learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Angel Sancho Ferrer (1)
  • Jose Manuel Mateo Rivero (1)
  • Alejandro Mesas García (1)
  1. Research & Development Department, Wolters Kluwer Spain, Madrid, Spain
