Improving Access to Large Patent Corpora

  • Richard Bache
  • Leif Azzopardi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6380)


Retrievability is a measure of access that quantifies how easily documents can be found using a given retrieval system. Such a measure is of particular interest in the patent domain: if a retrieval system makes some patents hard to find, patent searchers will have difficulty retrieving them and may miss important, relevant patents because of the retrieval system itself. In this paper, we describe measures of retrievability and show how they can be applied to measure the overall access to a collection afforded by a retrieval system. We then identify three features of best-match retrieval models that we hypothesize improve access to all documents in the collection: sensitivity to term frequency, length normalization, and convexity. Since patent searchers tend to favor Boolean models over best-match models, we propose hybrid retrieval models that incorporate these features while preserving the desirable aspects of the traditional Boolean model. An empirical study conducted on four large patent corpora demonstrates that these hybrid models provide better access to the corpus of patents than the traditional Boolean model.
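The retrievability measures are not defined on this page. A common formulation in the retrievability literature, assumed here, is the cumulative score r(d) = Σ_q o_q · f(k_dq, c), where o_q weights query q, k_dq is the rank of document d for q, and f(k_dq, c) is 1 when k_dq ≤ c and 0 otherwise. Overall access to the collection can then be summarised by the Gini coefficient of the r(d) values: 0 when every document is equally retrievable, approaching 1 when a few documents absorb all of the retrievability. The Python sketch below computes both. The function names, the toy corpus, and the Boolean AND search that returns matches in document-id order are hypothetical illustrations, not the authors' implementation.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def retrievability(
    doc_ids: Iterable[str],
    queries: Sequence[str],
    search: Callable[[str], List[str]],
    cutoff: int = 100,
    query_weight: Callable[[str], float] = lambda q: 1.0,
) -> Dict[str, float]:
    """Cumulative retrievability: r(d) = sum over q of o_q * [rank of d for q <= cutoff]."""
    scores = {d: 0.0 for d in doc_ids}
    for q in queries:
        # f(k_dq, c) = 1 only for documents ranked within the first `cutoff` results.
        for d in search(q)[:cutoff]:
            if d in scores:
                scores[d] += query_weight(q)  # o_q: uniform weight 1.0 by default
    return scores


def gini(values: Iterable[float]) -> float:
    """Gini coefficient: 0 = equal access, (n-1)/n = one document takes all."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical toy corpus and a Boolean AND search with no ranking of its own;
# matches are returned in document-id order, an arbitrary assumption.
corpus = {
    "d1": "optical lens coating",
    "d2": "lens assembly for camera",
    "d3": "battery electrode coating",
    "d4": "electrode",
}


def boolean_and(query: str) -> List[str]:
    terms = query.split()
    return sorted(d for d, text in corpus.items() if all(t in text.split() for t in terms))


queries = ["lens", "coating", "electrode", "lens coating"]
r = retrievability(corpus, queries, boolean_and, cutoff=1)
print(r)                 # {'d1': 3.0, 'd2': 0.0, 'd3': 1.0, 'd4': 0.0}
print(gini(r.values()))  # 0.625
```

With a cutoff of one, the arbitrary ordering of Boolean matches leaves d2 and d4 unreachable even though every document matches at least one query. A ranking that takes term frequency or document length into account could change which document fills each slot. This toy run only illustrates the measure; it does not reproduce the paper's experiments.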







Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Richard Bache (1)
  • Leif Azzopardi (1)
  1. Department of Computing Science, University of Glasgow, Glasgow, UK
