CBR Meets Big Data: A Case Study of Large-Scale Adaptation Rule Generation

Jalali, Vahid; Leake, David

doi:10.1007/978-3-319-24586-7_13


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9343)

Included in the following conference series:

International Conference on Case-Based Reasoning


Abstract

Adaptation knowledge generation is a difficult problem for CBR. In previous work we developed ensembles of adaptation for regression (EAR), a family of methods for generating and applying ensembles of adaptation rules for case-based regression. EAR has been shown to provide good performance, but at the cost of high computational complexity. When efficiency problems result from case base growth, a common CBR approach is to focus on case base maintenance, to compress the case base. This paper presents a case study of an alternative approach, harnessing big data methods, specifically MapReduce and locality sensitive hashing (LSH), to make the EAR approach feasible for large case bases without compression. Experimental results show that the new method, BEAR, substantially increases accuracy compared to a baseline big data k-NN method using LSH. BEAR’s accuracy is comparable to that of traditional k-NN without using LSH, while its processing time remains reasonable for a case base of millions of cases. We suggest that increased use of big data methods in CBR has the potential for a departure from compression-based case-base maintenance methods, with their concomitant solution quality penalty, to enable the benefits of full case bases at much larger scales.
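
The abstract does not give implementation details, so the following is only a minimal single-machine sketch of the general idea, not the authors' BEAR implementation. It assumes a p-stable (Gaussian projection) LSH scheme, treats bucket grouping as a stand-in for a MapReduce map/shuffle step, and derives adaptation rules from pairs of cases in the same bucket using a simple case difference heuristic. All function names and parameters (`make_hash`, `build_buckets`, `predict`, `n_proj`, `width`, `n_rules`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, n_proj=4, width=4.0, seed=0):
    """Build an LSH function from n_proj random Gaussian projections (assumed scheme)."""
    rng = random.Random(seed)
    projections = [
        ([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, width))
        for _ in range(n_proj)
    ]

    def h(x):
        # Each projection contributes one integer; nearby points tend to share keys.
        return tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, x)) + b) / width)
            for a, b in projections
        )

    return h


def build_buckets(cases, h):
    """Group (features, value) cases by hash key, analogous to a map/shuffle step."""
    buckets = defaultdict(list)
    for features, value in cases:
        buckets[h(features)].append((features, value))
    return buckets


def dist(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def diff(x, y):
    """Component-wise difference x - y."""
    return [a - b for a, b in zip(x, y)]


def predict(query, buckets, h, k=3, n_rules=3):
    """Estimate the query's value from its bucket, adapting each neighbor's value.

    Returns None when the query hashes to an empty bucket.
    """
    candidates = buckets.get(h(query), [])
    if not candidates:
        return None
    neighbors = sorted(candidates, key=lambda c: dist(query, c[0]))[:k]
    # Case difference heuristic: each ordered pair of distinct cases yields a rule
    # (feature difference, value difference).
    rules = [
        (diff(f1, f2), v1 - v2)
        for f1, v1 in candidates
        for f2, v2 in candidates
        if f1 is not f2
    ]
    estimates = []
    for features, value in neighbors:
        gap = diff(query, features)
        # Apply a small ensemble: the rules whose feature difference best matches the gap.
        best = sorted(rules, key=lambda r: dist(gap, r[0]))[:n_rules]
        adjustment = sum(r[1] for r in best) / len(best) if best else 0.0
        estimates.append(value + adjustment)
    return sum(estimates) / len(estimates)
```

In this sketch only the cases that share the query's bucket are examined, which is where the speed-up over exhaustive k-NN would come from, and also why plain LSH-based k-NN can lose accuracy. Adjusting each retrieved value with an ensemble of rules is one way to recover some of that accuracy. A call might look like `predict(query, build_buckets(cases, h), h)` with `h = make_hash(dim=len(query))`.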

Notes

1. Big data ensembles of adaptations for regression.
2. http://aws.amazon.com/elasticmapreduce/.

Author information

Authors and Affiliations

School of Informatics and Computing, Indiana University, Bloomington, IN, 47408, USA
Vahid Jalali & David Leake

Corresponding author

Correspondence to Vahid Jalali.

Editor information

Editors and Affiliations

Universität Paderborn, Paderborn, Germany
Eyke Hüllermeier
Goethe-Universität Frankfurt, Frankfurt/Main, Germany
Mirjam Minor

About this paper

Cite this paper

Jalali, V., Leake, D. (2015). CBR Meets Big Data: A Case Study of Large-Scale Adaptation Rule Generation. In: Hüllermeier, E., Minor, M. (eds) Case-Based Reasoning Research and Development. ICCBR 2015. Lecture Notes in Computer Science, vol 9343. Springer, Cham. https://doi.org/10.1007/978-3-319-24586-7_13

DOI: https://doi.org/10.1007/978-3-319-24586-7_13
Published: 26 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24585-0
Online ISBN: 978-3-319-24586-7
eBook Packages: Computer Science, Computer Science (R0)
