Inducing Controlled Error over Variable Length Ranked Lists

Park, Laurence A. F.; Stone, Glenn

doi:10.1007/978-3-319-06605-9_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8444)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

When examining the robustness of systems that take ranked lists as input, we can induce noise, measured in terms of Kendall’s tau rank correlation, by applying a set number of random adjacent transpositions. Fixing the number of random transpositions ensures that any ranked list induced with this noise has a specific expected Kendall’s tau. However, if we have ranked lists of varying length, it is not clear how many random transpositions we must apply to each list to obtain a consistent expected Kendall’s tau across the collection. In this article we investigate how to compute the number of random adjacent transpositions required to obtain a given expected Kendall’s tau for a given list length, and find that it is infeasible to compute for lists of length greater than 9. We also investigate an alternative and more efficient method of inducing noise in ranked lists, called Gaussian Perturbation. We show that, using this method, we can compute the parameters required to induce a consistent level of noise for lists of length 10⁷ in just over six minutes. We also provide an approximate solution that produces results in less than 10⁻⁵ seconds.
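
The two noise models named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and `gaussian_perturbation` assumes that Gaussian Perturbation means adding independent zero-mean Gaussian noise to each rank position and re-sorting, which the abstract does not spell out. The sketch shows why a fixed number of adjacent transpositions does not give a consistent Kendall’s tau across list lengths: each transposition changes the number of discordant pairs by exactly one, while the number of pairs grows quadratically with the list length.

```python
import random
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau between two rankings of the same distinct items (no ties)."""
    pos = {item: i for i, item in enumerate(b)}
    seq = [pos[item] for item in a]
    n = len(seq)
    # A pair is discordant when the two rankings order it differently.
    discordant = sum(1 for i, j in combinations(range(n), 2) if seq[i] > seq[j])
    pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * discordant / pairs


def random_adjacent_transpositions(ranking, k, rng):
    """Apply k uniformly chosen adjacent swaps to a copy of the ranking."""
    out = list(ranking)
    for _ in range(k):
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def gaussian_perturbation(ranking, sigma, rng):
    """ASSUMED form: add N(0, sigma^2) noise to each rank position, then re-sort."""
    noisy = [(i + rng.gauss(0.0, sigma), item) for i, item in enumerate(ranking)]
    return [item for _, item in sorted(noisy, key=lambda pair: pair[0])]


def mean_tau(n, perturb, trials, rng):
    """Monte Carlo estimate of the expected tau after perturbing a list of length n."""
    base = list(range(n))
    return sum(kendall_tau(base, perturb(base, rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (5, 20, 50):
        swaps = mean_tau(n, lambda r, g: random_adjacent_transpositions(r, 10, g), 200, rng)
        gauss = mean_tau(n, lambda r, g: gaussian_perturbation(r, 2.0, g), 200, rng)
        print(f"n={n:3d}  10 swaps: tau={swaps:.3f}  sigma=2.0: tau={gauss:.3f}")
```

Running the script prints Monte Carlo estimates of the expected tau for three list lengths under a fixed swap count and a fixed sigma. Choosing the swap count or sigma so that the expected tau matches a target for every length is the problem the paper addresses; the sketch only estimates tau by simulation and does not solve for those parameters.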

Author information

Authors and Affiliations

School of Computing, Engineering and Mathematics, University of Western Sydney, Australia
Laurence A. F. Park & Glenn Stone

Editor information

Editors and Affiliations

National Cheng Kung University, Tainan, Taiwan, R.O.C.
Vincent S. Tseng & Hung-Yu Kao
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Tu Bao Ho
Nanjing University, China
Zhi-Hua Zhou
National Chengchi University, Taipei, Taiwan, R.O.C.
Arbee L. P. Chen

About this paper

Cite this paper

Park, L.A.F., Stone, G. (2014). Inducing Controlled Error over Variable Length Ranked Lists. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_22

DOI: https://doi.org/10.1007/978-3-319-06605-9_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06604-2
Online ISBN: 978-3-319-06605-9
eBook Packages: Computer Science, Computer Science (R0)