Inducing Controlled Error over Variable Length Ranked Lists

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8444)

Abstract

When examining the robustness of systems that take ranked lists as input, we can induce noise, measured in terms of Kendall’s tau rank correlation, by applying a set number of random adjacent transpositions. Fixing the number of random transpositions ensures that any ranked list perturbed with this noise has a specific expected Kendall’s tau. However, if we have ranked lists of varying length, it is not clear how many random transpositions we must apply to each list to obtain a consistent expected Kendall’s tau across the collection. In this article we investigate how to compute the number of random adjacent transpositions required to obtain a given expected Kendall’s tau for a given list length, and find that the computation is infeasible for lists of length greater than 9. We also investigate an alternative and more efficient method of inducing noise in ranked lists, called Gaussian Perturbation. We show that, using this method, we can compute the parameters required to induce a consistent level of noise for lists of length 10⁷ in just over six minutes. We also provide an approximate solution that produces results in less than 10⁻⁵ seconds.
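
The problem the abstract describes can be illustrated with a small simulation. The sketch below is not the authors' code: the function names are hypothetical, Kendall's tau is computed by brute-force pair counting (suitable only for short lists), and the Gaussian perturbation shown is an assumed form (add zero-mean Gaussian noise to each rank position, then re-sort), since the abstract does not spell out the exact procedure. It shows that the same number of adjacent transpositions yields a very different average tau on short and long lists.

```python
import random
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau between two rankings of the same distinct items (no ties)."""
    pos_b = {item: i for i, item in enumerate(b)}
    seq = [pos_b[item] for item in a]
    n = len(seq)
    # Count pairs that appear in opposite order in the two rankings.
    discordant = sum(1 for i, j in combinations(range(n), 2) if seq[i] > seq[j])
    pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * discordant / pairs


def adjacent_transpositions(ranking, k, rng):
    """Return a copy of the ranking after k random adjacent swaps."""
    out = list(ranking)
    for _ in range(k):
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def gaussian_perturbation(ranking, sigma, rng):
    """Assumed form: add N(0, sigma^2) noise to each position, then re-sort."""
    noisy = [(pos + rng.gauss(0.0, sigma), item) for pos, item in enumerate(ranking)]
    noisy.sort(key=lambda t: t[0])
    return [item for _, item in noisy]


def mean_tau(n, k, trials, rng):
    """Monte Carlo estimate of the expected tau after k swaps on a length-n list."""
    base = list(range(n))
    total = sum(
        kendall_tau(base, adjacent_transpositions(base, k, rng))
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # The same k gives much less noise (tau closer to 1) on the longer list.
    for n in (5, 50):
        print(n, round(mean_tau(n, k=5, trials=200, rng=rng), 3))
    perturbed = gaussian_perturbation(list(range(50)), sigma=3.0, rng=rng)
    print(round(kendall_tau(list(range(50)), perturbed), 3))
```

The simulation only estimates the expected tau for a chosen number of swaps or a chosen sigma; choosing those parameters so that the expected tau matches a target for each list length is the computation the paper addresses.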

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Park, L.A.F., Stone, G. (2014). Inducing Controlled Error over Variable Length Ranked Lists. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_22

  • DOI: https://doi.org/10.1007/978-3-319-06605-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06604-2

  • Online ISBN: 978-3-319-06605-9

  • eBook Packages: Computer Science, Computer Science (R0)
