The Virtue of Reward: Performance, Reinforcement and Discovery in Case-Based Reasoning
Agents commonly reason and act over extended periods of time. In some environments, solving even a single problem requires an agent to make many decisions and take many actions. Consider a robot or animat situated in a real or virtual world, acting to achieve some distant goal; or an agent that controls a sequential process such as a factory production line; or a conversational diagnostic system or recommender system. Equally, over its lifetime, a long-lived agent will make many decisions and take many actions, even if each problem-solving episode requires just one decision and one action. In spam detection, for example, each incoming email requires a single classification decision before it is moved to its designated folder; but continuous operation requires numerous decisions and actions.