Abstract
Research in psychology, psycholinguistics, and cognitive science has discovered and examined numerous psychological constraints on human information processing. Short-term memory limitations, a focus-of-attention bias, and a preference for the use of temporally recent information are three examples. This paper shows that psychological constraints such as these can be used effectively as domain-independent sources of bias to guide feature set selection and weighting for case-based learning algorithms.
We first show that cognitive biases can be automatically and explicitly encoded into the baseline instance representation: each bias modifies the representation by changing features, deleting features, or modifying feature weights. Next, we investigate the related problems of cognitive bias selection and cognitive bias interaction for the feature-weighting approach. In particular, we compare two cross-validation algorithms for bias selection that make different assumptions about the independence of individual component biases. In evaluations on four natural language learning tasks, we show that the bias selection algorithms can determine which cognitive bias or biases are relevant for each learning task and that the accuracy of the case-based learning algorithm improves significantly when the selected bias or biases are incorporated into the baseline instance representation.
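To make the feature-weighting encoding and the two selection strategies concrete, the sketch below treats each cognitive bias as a transformation of a uniform weight vector for a weighted k-nearest-neighbor (case-based) learner, and selects biases by cross-validation either independently (keep every bias that improves accuracy on its own) or jointly (score every combination of biases). The particular weight transformations, function names, and parameter choices here are illustrative assumptions, not the encodings used in the paper.

```python
from itertools import combinations
from collections import Counter
import numpy as np

def knn_predict(X_tr, y_tr, x, w, k=5):
    """Classify x by majority vote among the k nearest training cases
    under a weighted Euclidean distance."""
    d = np.sqrt((w * (X_tr - x) ** 2).sum(axis=1))
    nn = np.argsort(d)[:k]
    return Counter(y_tr[i] for i in nn).most_common(1)[0][0]

def loocv_accuracy(X, y, w, k=5):
    """Leave-one-out cross-validation accuracy of the weighted k-NN learner."""
    idx = np.arange(len(X))
    hits = sum(knn_predict(X[idx != i], y[idx != i], X[i], w, k) == y[i]
               for i in idx)
    return hits / len(X)

# Each cognitive bias is encoded as a transformation of the weight vector.
# These particular transformations are hypothetical illustrations.
def short_term_memory(w):
    """Retain roughly seven features (Miller's 7 +/- 2); zero the rest."""
    out = np.zeros_like(w)
    out[:7] = w[:7]
    return out

def recency(w):
    """Prefer temporally recent information: weights grow toward features
    extracted from the most recent context (assumes temporal ordering)."""
    return w * np.linspace(0.5, 1.5, num=len(w))

def focus_of_attention(w):
    """Boost the features assumed to be in the current focus of attention
    (here, arbitrarily, the first two)."""
    out = np.asarray(w, dtype=float).copy()
    out[:2] *= 2.0
    return out

BIASES = {"stm": short_term_memory, "recency": recency,
          "focus": focus_of_attention}

def apply_biases(names, n_features):
    """Compose the selected bias transformations on a uniform weight vector."""
    w = np.ones(n_features)
    for name in names:
        w = BIASES[name](w)
    return w

def select_independent(X, y, k=5):
    """Assume component biases are independent: keep each bias that beats
    the unweighted baseline on its own, then compose the survivors."""
    base = loocv_accuracy(X, y, np.ones(X.shape[1]), k)
    return [n for n in BIASES
            if loocv_accuracy(X, y, apply_biases([n], X.shape[1]), k) > base]

def select_joint(X, y, k=5):
    """Drop the independence assumption: cross-validate every combination
    of biases and return the best-scoring one (empty if none helps)."""
    best, best_acc = [], loocv_accuracy(X, y, np.ones(X.shape[1]), k)
    for r in range(1, len(BIASES) + 1):
        for combo in combinations(BIASES, r):
            acc = loocv_accuracy(X, y, apply_biases(combo, X.shape[1]), k)
            if acc > best_acc:
                best, best_acc = list(combo), acc
    return best
```

The independence assumption makes selection cheap, costing one cross-validation run per candidate bias, while the joint search costs a run per subset of biases; the exhaustive subset search is feasible only because the pool of candidate cognitive biases is small.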
Cite this article
Cardie, C. A Cognitive Bias Approach to Feature Selection and Weighting for Case-Based Learners. Machine Learning 41, 85–116 (2000). https://doi.org/10.1023/A:1007665204628