
Observer effect from stateful resources in agent sensing

Published in: Autonomous Agents and Multi-Agent Systems

Abstract

In many real-world applications of multi-agent systems, agent reasoning suffers from bounded rationality caused by both limited resources and limited knowledge. When the sensing an agent performs to overcome its knowledge limitations also requires resource use, knowledge refinement suffers because the agent cannot always sense when, or as accurately as, needed, which in turn leads to poor decision making. In this paper, we consider what happens when sensing actions require the use of stateful resources, which we define as resources whose state-dependent behavior changes over time based on usage. Current literature addressing agent sensing with limited resources primarily investigates stateless resources, such as avoiding the use of too much time or energy during sensing. However, sensing itself can change the state of a resource, and thus its behavior, which affects both the information gathered and the resulting knowledge refinement. This produces a phenomenon where the sensing action can and will distort its own outcome (and potentially future outcomes), termed the Observer Effect (OE) after the similar phenomenon in the physical sciences. Under this effect, when deliberating about when and how to perform sensing that requires the use of stateful resources, an agent faces a strategic tradeoff between (1) the need for knowledge refinement to support its reasoning, and (2) the need to avoid knowledge corruption due to distorted sensing outcomes. To address this tradeoff, we model sensing action selection as a partially observable Markov decision process in which an agent optimizes knowledge refinement while considering the (possibly hidden) state of the resources used during sensing. In this model, the agent uses reinforcement learning to learn a controller for action selection, as well as to predict expected knowledge refinement based on resource use during sensing. Our approach differs from other bounded rationality and sensing research in that we consider how to make decisions about sensing with stateful resources that produce side effects such as the OE, rather than with stateless resources that have no such side effects. We evaluate our approach in a fully and partially observable agent mining simulation. The results demonstrate that considering resource state and the OE during sensing action selection through our approach (1) yielded better knowledge refinement, (2) appropriately balanced current and future refinement to avoid knowledge corruption, and (3) exploited the relationship (i.e., high, positive correlation) between sensing and task performance to boost task performance through improved sensing. Further, our methodology achieved good knowledge refinement even when the OE was not present, indicating that it can improve sensing performance in a wide variety of environments. Finally, our results provide insights into the types and configurations of learning algorithms useful for learning within our methodology.
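To make the tradeoff described above concrete, the sketch below illustrates the core idea in a deliberately simplified, fully observable setting: an agent repeatedly chooses between sensing (which refines knowledge but degrades a stateful resource, so later observations become noisier, i.e., the Observer Effect) and resting (which lets the resource recover), and learns when to sense with tabular Q-learning. This is a minimal, hypothetical sketch, not the paper's model: the resource dynamics, accuracy curve, reward shaping, action names, and learning parameters are all illustrative assumptions, whereas the paper formulates the problem as a POMDP with possibly hidden resource state and learns both a controller and a predictor of expected knowledge refinement.

```python
import random
from collections import defaultdict

# Illustrative sketch of the Observer Effect tradeoff (not the paper's model).
# Sensing refines knowledge but consumes a stateful resource; a degraded
# resource yields noisier observations, which can corrupt knowledge.

N_RESOURCE_LEVELS = 5          # discretized resource state (0 = exhausted, 4 = fresh)
ACTIONS = ["sense", "rest"]

def sensing_accuracy(resource_level: int) -> float:
    """Assumed accuracy curve: a more degraded resource gives noisier observations."""
    return 0.5 + 0.5 * resource_level / (N_RESOURCE_LEVELS - 1)

def step(resource_level: int, action: str) -> tuple[int, float]:
    """Toy environment transition. Reward stands in for knowledge refinement."""
    if action == "sense":
        # Accurate observations refine knowledge (+1); inaccurate ones corrupt it (-1).
        acc = sensing_accuracy(resource_level)
        reward = 1.0 if random.random() < acc else -1.0
        next_level = max(0, resource_level - 1)                       # sensing consumes the resource
    else:
        reward = 0.0
        next_level = min(N_RESOURCE_LEVELS - 1, resource_level + 1)   # resting lets it recover
    return next_level, reward

def train(episodes: int = 2000, horizon: int = 20,
          alpha: float = 0.1, gamma: float = 0.95, epsilon: float = 0.1):
    """Tabular Q-learning over the (here fully observable) resource state."""
    q = defaultdict(float)  # (resource_level, action) -> estimated value
    for _ in range(episodes):
        level = N_RESOURCE_LEVELS - 1
        for _ in range(horizon):
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(level, a)])
            next_level, reward = step(level, action)
            best_next = max(q[(next_level, a)] for a in ACTIONS)
            q[(level, action)] += alpha * (reward + gamma * best_next - q[(level, action)])
            level = next_level
    return q

if __name__ == "__main__":
    q = train()
    for level in range(N_RESOURCE_LEVELS):
        best = max(ACTIONS, key=lambda a: q[(level, a)])
        print(f"resource level {level}: prefer '{best}'")
```

Under these assumed dynamics, the learned policy tends to sense while the resource is fresh and rest when it is depleted, mirroring the balance between current knowledge refinement and avoiding corruption of future sensing outcomes.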



Author information

Correspondence to Adam Eck.


About this article

Cite this article

Eck, A., Soh, LK. Observer effect from stateful resources in agent sensing. Auton Agent Multi-Agent Syst 26, 202–244 (2013). https://doi.org/10.1007/s10458-011-9189-y

