Abstract
Research in psychology, psycholinguistics, and cognitive science has discovered and examined numerous psychological constraints on human information processing. Short-term memory limitations, a focus-of-attention bias, and a preference for the use of temporally recent information are three examples. This paper shows that psychological constraints such as these can be used effectively as domain-independent sources of bias to guide feature set selection and weighting for case-based learning algorithms.
We first show that cognitive biases can be automatically and explicitly encoded into the baseline instance representation: each bias modifies the representation by changing features, deleting features, or modifying feature weights. Next, we investigate the related problems of cognitive bias selection and cognitive bias interaction for the feature-weighting approach. In particular, we compare two cross-validation algorithms for bias selection that make different assumptions about the independence of individual component biases. In evaluations on four natural language learning tasks, we show that the bias selection algorithms can determine which cognitive bias or biases are relevant for each learning task and that the accuracy of the case-based learning algorithm improves significantly when the selected bias or biases are incorporated into the baseline instance representation.
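To make the feature-weighting encoding and the two selection strategies concrete, the sketch below treats each cognitive bias as a transformation of a uniform weight vector for a weighted k-nearest-neighbor (case-based) learner, and selects biases by cross-validation either independently (keep every bias that improves accuracy on its own) or jointly (score every combination of biases). The particular weight transformations, function names, and parameter choices here are illustrative assumptions, not the encodings used in the paper.

```python
from itertools import combinations
from collections import Counter
import numpy as np

def knn_predict(X_tr, y_tr, x, w, k=5):
    """Classify x by majority vote among the k nearest training cases
    under a weighted Euclidean distance."""
    d = np.sqrt((w * (X_tr - x) ** 2).sum(axis=1))
    nn = np.argsort(d)[:k]
    return Counter(y_tr[i] for i in nn).most_common(1)[0][0]

def loocv_accuracy(X, y, w, k=5):
    """Leave-one-out cross-validation accuracy of the weighted k-NN learner."""
    idx = np.arange(len(X))
    hits = sum(knn_predict(X[idx != i], y[idx != i], X[i], w, k) == y[i]
               for i in idx)
    return hits / len(X)

# Each cognitive bias is encoded as a transformation of the weight vector.
# These particular transformations are hypothetical illustrations.
def short_term_memory(w):
    """Retain roughly seven features (Miller's 7 +/- 2); zero the rest."""
    out = np.zeros_like(w)
    out[:7] = w[:7]
    return out

def recency(w):
    """Prefer temporally recent information: weights grow toward features
    extracted from the most recent context (assumes temporal ordering)."""
    return w * np.linspace(0.5, 1.5, num=len(w))

def focus_of_attention(w):
    """Boost the features assumed to be in the current focus of attention
    (here, arbitrarily, the first two)."""
    out = np.asarray(w, dtype=float).copy()
    out[:2] *= 2.0
    return out

BIASES = {"stm": short_term_memory, "recency": recency,
          "focus": focus_of_attention}

def apply_biases(names, n_features):
    """Compose the selected bias transformations on a uniform weight vector."""
    w = np.ones(n_features)
    for name in names:
        w = BIASES[name](w)
    return w

def select_independent(X, y, k=5):
    """Assume component biases are independent: keep each bias that beats
    the unweighted baseline on its own, then compose the survivors."""
    base = loocv_accuracy(X, y, np.ones(X.shape[1]), k)
    return [n for n in BIASES
            if loocv_accuracy(X, y, apply_biases([n], X.shape[1]), k) > base]

def select_joint(X, y, k=5):
    """Drop the independence assumption: cross-validate every combination
    of biases and return the best-scoring one (empty if none helps)."""
    best, best_acc = [], loocv_accuracy(X, y, np.ones(X.shape[1]), k)
    for r in range(1, len(BIASES) + 1):
        for combo in combinations(BIASES, r):
            acc = loocv_accuracy(X, y, apply_biases(combo, X.shape[1]), k)
            if acc > best_acc:
                best, best_acc = list(combo), acc
    return best
```

The independence assumption makes selection cheap, costing one cross-validation run per candidate bias, while the joint search costs a run per subset of biases; the exhaustive subset search is feasible only because the pool of candidate cognitive biases is small.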
Cite this article
Cardie, C. A Cognitive Bias Approach to Feature Selection and Weighting for Case-Based Learners. Machine Learning 41, 85–116 (2000). https://doi.org/10.1023/A:1007665204628