Data Fusion Performance Prophecy: A Random Forest Revelation

  • Conference paper
Information Integration and Web Intelligence (iiWAS 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14416))

Abstract

Data fusion synthesizes results from diverse sources, but its impact on performance remains mysterious. This research reveals the inner workings of fusion through machine prophecy. By constructing a random forest model on TREC benchmark datasets, we accurately predicted the performance of two fusion algorithms. The model achieved near-perfect R² scores above 0.9 by exploiting meaningful statistical features. Compared to linear regression, the tree-based ensemble provides superior insight. The importance of newly identified drivers, such as P@1000 metrics, is quantified. With this prescient view, researchers can refine fusion techniques to offer better search. By uncovering the secrets of fusion success, machine learning guides the path to retrieval excellence.
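The comparison described in the abstract (a random forest regressor against linear regression, scored by R², with feature importances read from the forest) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data is synthetic rather than drawn from TREC, the feature names are hypothetical, and the target function is deliberately nonlinear so that the difference between the two models is visible. It is not the authors' feature set or experimental setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical feature names; the paper's actual features are not listed here.
FEATURE_NAMES = ["mean_component_score", "component_overlap", "score_spread"]


def make_synthetic_data(n_samples=600, seed=0):
    """Build a synthetic regression problem standing in for fusion performance."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_samples, len(FEATURE_NAMES)))
    # Assumed target: an interaction term plus a V-shaped term, plus small noise.
    # Neither term is well captured by a purely linear model.
    y = X[:, 0] * (1.0 - X[:, 1]) + np.abs(X[:, 2] - 0.5)
    y = y + rng.normal(0.0, 0.02, size=n_samples)
    return X, y


def compare_models(seed=0):
    """Fit both models on a train split and report held-out R^2 scores."""
    X, y = make_synthetic_data(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)
    linear = LinearRegression().fit(X_train, y_train)
    return {
        "forest_r2": r2_score(y_test, forest.predict(X_test)),
        "linear_r2": r2_score(y_test, linear.predict(X_test)),
        # Impurity-based importances; they sum to 1 across features.
        "importances": dict(zip(FEATURE_NAMES, forest.feature_importances_)),
    }


if __name__ == "__main__":
    result = compare_models()
    print(f"random forest R^2:     {result['forest_r2']:.3f}")
    print(f"linear regression R^2: {result['linear_r2']:.3f}")
    for name, weight in result["importances"].items():
        print(f"importance of {name}: {weight:.3f}")
```

On real data, the feature matrix would hold statistics computed from the component retrieval runs and the target would be the measured performance of the fused result; the scores printed by this sketch say nothing about the values reported in the paper.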

Notes

  1. https://trec.nist.gov/

  2. The random forest model is sklearn.ensemble.RandomForestRegressor.

Author information

Corresponding author

Correspondence to Shengli Wu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Z., Wu, S. (2023). Data Fusion Performance Prophecy: A Random Forest Revelation. In: Delir Haghighi, P., et al. Information Integration and Web Intelligence. iiWAS 2023. Lecture Notes in Computer Science, vol 14416. Springer, Cham. https://doi.org/10.1007/978-3-031-48316-5_20

  • DOI: https://doi.org/10.1007/978-3-031-48316-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48315-8

  • Online ISBN: 978-3-031-48316-5

  • eBook Packages: Computer Science, Computer Science (R0)
