Optimal Control to Support High-Level User Goals in Human-Computer Interaction

Gebhardt, Christoph; Hilliges, Otmar

doi:10.1007/978-3-030-82681-9_2

Part of the book series: Human–Computer Interaction Series (HCIS)

Abstract

With emerging technologies like robots, mixed-reality systems, and mobile devices, machine-provided capabilities are increasing, and so is the complexity of their control and display mechanisms. To address this tension, we propose optimal control as a framework to support users in achieving their high-level goals in human-computer tasks. We reason that it will improve user support over common approaches to adaptive interfaces because its formalism implicitly captures the iterative nature of human-computer interaction. We conduct two case studies to test this hypothesis. First, we propose a model-predictive-control-based optimization scheme that supports end-users in planning and executing robotic aerial videos. Second, we introduce a reinforcement-learning-based method that adapts mixed-reality augmentations to users’ preferences or tasks, which it learns from their gaze interactions with a UI. Our results show that optimal control can support users’ high-level goals in human-computer tasks better than common approaches. Optimal control models human-computer interaction as a sequential decision problem, which reflects the nature of the interaction and hence predicts user behavior better than other methods. In addition, our work highlights that optimization-based and learning-based optimal control have complementary strengths with respect to interface adaptation.

Notes

1. We refer the reader to [30] for details on the results and experimental design of both studies.
2. This also prevents solutions with infinitely long trajectories in time, where adding steps with \(\mathbf{u}_i \approx 0\) is free w.r.t. Eq. (10).
3. These points can be seen in Fig. 3 and are the intersections of the blue dotted lines.
4. For more details on experimental design and results, see [33].
5. For a more detailed version of this section, we refer the interested reader to [28].
6. Myopic policies only consider the attainable reward in the next state and neglect other future states when selecting an action.

References

Abbeel P, Dolgov D, Ng AY, Thrun S (2008) Apprenticeship learning for motion planning with application to parking lot navigation. In: IEEE international conference on intelligent robots and systems 2008. IROS ’08. IEEE, pp 1083–1090
Abbeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. p 1
Athukorala K, Medlar A, Oulasvirta A, Jacucci G, Glowacka D (2016) Beyond relevance: adapting exploration/exploitation in information retrieval. Association for Computing Machinery, New York, NY, USA
Audronis T (2014) How to get cinematic drone shots
Aytar Y, Pfaff T, Budden D, Le Paine T, Wang Z, de Freitas N (2018) Playing hard exploration games by watching youtube. In: Advances in neural information processing systems
Bailly G, Oulasvirta A, Kötzing T, Hoppe S (2013) Menuoptimizer: interactive optimization of menu systems. pp 331–342
Banovic N, Buzali T, Chevalier F, Mankoff J, Dey AK (2016) Modeling and understanding human routine behavior. In: Proceedings of the 2016 CHI conference on human factors in computing systems, CHI ’16. ACM, pp 248–260
Bemporad A, Morari M, Dua V, Pistikopoulos EN (2002) The explicit linear quadratic regulator for constrained systems. Automatica 38(1):3–20
Bertsekas DP, Tsitsiklis JN (1995) Neuro-dynamic programming: an overview, vol 1. IEEE, pp 560–564
Bronner S, Shippen J (2015) Biomechanical metrics of aesthetic perception in dance. Exp Brain Res 233(12):3565–3581
Chapanis A (1976) Engineering psychology. Rand McNally, Chicago
Chen M, Beutel A, Covington P, Jain S, Belletti F, Chi EH (2019) Top-k off-policy correction for a reinforce recommender system. In: Proceedings of the twelfth ACM international conference on web search and data mining, WSDM ’19. ACM, pp 456–464
Chen X, Bailly G, Brumby DP, Oulasvirta A, Howes A (2015) The emergence of interactive behavior: a model of rational menu search. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems, CHI ’15. Association for Computing Machinery, New York, NY, USA, pp 4217–4226
Chen X, Starke SD, Baber C, Howes A (2017) A cognitive model of how people make decisions through interaction with visual displays. Association for Computing Machinery, New York, NY, USA
Cheng E (2016) Aerial photography and videography using drones, vol 1. Peachpit Press
Chipalkatty R, Droge G, Egerstedt MB (2013) Less is more: mixed-initiative model-predictive control with human inputs. IEEE Trans Rob 29(3):695–703
Chipalkatty R, Egerstedt M (2010) Human-in-the-loop: Terminal constraint receding horizon control with human inputs. pp 2712–2717
Christiano PF, Leike J, Brown T, Martic M, Legg S, Amodei D (2017) Deep reinforcement learning from human preferences. In: Advances in neural information processing systems, pp 4299–4307
Clarke DW, Mohtadi C, Tuffs PS (1987) Generalized predictive control-part i. the basic algorithm. Automatica 23(2):137–148
Coates A, Abbeel P, Ng AY (2009) Apprenticeship learning for helicopter control. Commun ACM 52(7):97–105
Cutler CR, Ramaker BL (1980) Dynamic matrix control - a computer control algorithm. In: Joint automatic control conference, vol 17, p 72
Dulac-Arnold G, Evans R, van Hasselt H, Sunehag P, Lillicrap T, Hunt J, Mann T, Weber T, Degris T, Coppin B (2015). Deep reinforcement learning in large discrete action spaces. arXiv:1512.07679
Engbert R, Kliegl R (2003) Microsaccades uncover the orientation of covert attention. Vis Res 43(9):1035–1045
Findlater L, Gajos KZ (2009) Design space and evaluation challenges of adaptive graphical user interfaces. AI Mag 30(4):68–68
Frans K, Ho J, Chen X, Abbeel P, Schulman J (2017) Meta learning shared hierarchies. arXiv:1710.09767
Fritsch FN, Carlson RE (1980) Monotone piecewise cubic interpolation. SIAM J Numer Anal 17(2):238–246
Gašić M, Young S (2014) Gaussian processes for POMDP-based dialogue manager optimization. IEEE Trans Audio Speech Lang Process 22(1):28–40
Gebhardt C, Hecox B, van Opheusden B, Wigdor D, Hillis J, Hilliges O, Benko H (2019) Learning cooperative personalized policies from gaze data. In: Proceedings of the 32nd annual ACM symposium on user interface software and technology, UIST ’19, New York, NY, US. ACM
Gebhardt C, Hepp B, Naegeli T, Stevsic S, Hilliges O (2016) Airways: optimization-based planning of quadrotor trajectories according to high-level user goals. In: ACM SIGCHI conference on human factors in computing systems, CHI ’16, New York, NY, USA. ACM
Gebhardt C, Hilliges O (2018) WYFIWYG: investigating effective user support in aerial videography. arXiv:1801.05972
Gebhardt C, Hilliges O (2020) Optimizing for cinematographic quadrotor camera target framing. In: Submission to ACM SIGCHI
Gebhardt C, Oulasvirta A, Hilliges O (2020) Hierarchical Reinforcement Learning as a Model of Human Task Interleaving. arXiv:2001.02122
Gebhardt C, Stevsic S, Hilliges O (2018) Optimizing for aesthetically pleasing quadrotor camera motion. ACM Trans Graph (Proc ACM SIGGRAPH) 37(4):90:1–90:11
Ghadirzadeh A, Bütepage J, Maki A, Kragic D, Björkman M (2016) A sensorimotor reinforcement learning framework for physical human-robot interaction. pp 2682–2688
Glowacka D, Ruotsalo T, Konyushkova K, Athukorala K, Kaski S, Jacucci G (2013) Directing exploratory search: reinforcement learning from user interactions with keywords. pp 117–128
Görges D (2017) Relations between model predictive control and reinforcement learning. IFAC-PapersOnLine 50(1):4920–4928
Grieder P, Borrelli F, Torrisi F, Morari M (2004) Computation of the constrained infinite time linear quadratic regulator. Automatica 40(4):701–708
Hadfield-Menell D, Russell SJ, Abbeel P, Dragan A (2016) Cooperative inverse reinforcement learning. In: Advances in neural information processing systems, pp 3909–3917
Hennessy J (2015) 13 powerful tips to improve your aerial cinematography
Ho B-J, Balaji B, Koseoglu M, Sandha S, Pei S, Srivastava M (2020) Quick question: Interrupting users for microtasks with reinforcement learning. arXiv:2007.09515
Hogan N (1984) Adaptive control of mechanical impedance by coactivation of antagonist muscles. IEEE Trans Autom Control 29(8):681–690
Horvitz EJ, Breese JS, Heckerman D, Hovel D, Rommelse K (2013) The lumiere project: Bayesian user modeling for inferring the goals and needs of software users. arXiv:1301.7385
Howes A, Chen X, Acharya A, Lewis RL (2018) Interaction as an emergent property of a partially observable markov decision process. Computational interaction design. pp 287–310
Hu Z, Liang Y, Zhang J, Li Z, Liu Y (2018) Inference aided reinforcement learning for incentive mechanism design in crowdsourcing. In: Advances in neural information processing systems, NIPS ’18, pp 5508–5518
Hwangbo J, Lee J, Dosovitskiy A, Bellicoso D, Tsounis V, Koltun V, Hutter M (2019) Learning agile and dynamic motor skills for legged robots. Sci Robot 4(26)
Jameson A, Gajos KZ (2012) Systems that adapt to their users. The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications. CRC Press, Boca Raton, FL
Johansen TA (2004) Approximate explicit receding horizon control of constrained nonlinear systems. Automatica 40(2):293–300
Jorgensen SJ, Campbell O, Llado T, Kim D, Ahn J, Sentis L (2017) Exploring model predictive control to generate optimal control policies for hri dynamical systems. arXiv:1701.03839
Joubert N, Roberts M, Truong A, Berthouzoz F, Hanrahan P (2015) An interactive tool for designing quadrotor camera shots. vol 34. ACM, New York, NY, USA, pp 238:1–238:11
Julier S, Lanzagorta M, Baillot Y, Rosenblum L, Feiner S, Hollerer T, Sestito S (2000) Information filtering for mobile augmented reality. In: Proceedings IEEE and ACM international symposium on augmented reality (ISAR 2000). IEEE, pp 3–11
Kartoun U, Stern H, Edan Y (2010) A human-robot collaborative reinforcement learning algorithm. J Intell Robot Syst 60(2):217–239
Kirches C (2011) Fast numerical methods for mixed-integer nonlinear model-predictive control. Springer
Krishnan S, Garg A, Liaw R, Miller L, Pokorny FT, Goldberg K (2016) Hirl: hierarchical inverse reinforcement learning for long-horizon tasks with delayed rewards. arXiv:1604.06508
Kushlev K, Proulx J, Dunn EW (2016) “Silence your phones”: smartphone notifications increase inattention and hyperactivity symptoms. pp 1011–1020
Lam D, Manzie C, Good MC (2013) Multi-axis model predictive contouring control. Int J Control 86(8):1410–1424
Langerak T, Zarate J, Vechev V, Lindlbauer D, Panozzo D, Hilliges O (2020) Optimal control for electromagnetic haptic guidance systems
Lee SJ, Popović Z (2010) Learning behavior styles with inverse reinforcement learning. In: ACM transactions on graphics (TOG), vol 29. ACM, p 122
Lee Y, Wampler K, Bernstein G, Popović J, Popović Z (2010) Motion fields for interactive character locomotion. In: ACM transactions on graphics (TOG), vol 29. ACM, p 138
Liebman E, Saar-Tsechansky M, Stone P (2015) Dj-mc: a reinforcement-learning agent for music playlist recommendation. In: Proceedings of the 2015 international conference on autonomous agents and multiagent systems, AAMAS ’15, pp 591–599
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv:1509.02971
Liniger A, Domahidi A, Morari M (2015) Optimization-based autonomous racing of 1:43 scale rc cars. Opt Control Appl Methods 36(5):628–647
Liu F, Tang R, Li X, Zhang W, Ye Y, Chen H, Guo H, Zhang Y (2018) Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv:1810.12027
Lo W-Y, Zwicker M (2008) Real-time planning for parameterized human motion. In: Proceedings of the 2008 ACM SIGGRAPH/eurographics symposium on computer animation, SCA ’08, pp 29–38
Matejka J, Li W, Grossman T, Fitzmaurice G (2009) Communitycommands: command recommendations for software applications. pp 193–202
McCann J, Pollard N (2007) Responsive characters from motion fragments. In: ACM transactions on graphics (TOG), vol 26. ACM, p 6
McRuer DT, Jex HR (1967) A review of quasi-linear pilot models
Michalska H, Mayne DQ (1993) Robust receding horizon control of constrained nonlinear systems. IEEE Trans Autom Control 38(11):1623–1633
Migge B, Kunz A (2010) User model for predictive calibration control on interactive screens. pp 32–37
Mitsunaga N, Smith C, Kanda T, Ishiguro H, Hagita N (2006) Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning. J Robot Soc Jpn 24(7):820–829
Modares H, Ranatunga I, Lewis FL, Popa DO (2015) Optimized assistive human-robot interaction using reinforcement learning. IEEE Trans Cybernet 46(3):655–667
Müller J, Oulasvirta A, Murray-Smith R (2017) Control theoretic models of pointing. ACM Trans Comput-Hum Interact (TOCHI) 24(4):1–36
Murray-Smith R (2018) Control theory, dynamics and continuous interaction
Nägeli T, Alonso-Mora J, Domahidi A, Rus D, Hilliges O (2017) Real-time motion planning for aerial videography with dynamic obstacle avoidance and viewpoint optimization. IEEE Robot Autom Lett PP(99):1–1
Nägeli T, Meier L, Domahidi A, Alonso-Mora J, Hilliges O (2017) Real-time planning for automated multi-view drone cinematography. ACM Trans Graph 36(4):132:1–132:10
Nescher T, Huang Y-Y, Kunz A (2014) Planning redirection techniques for optimal free walking experience using model predictive control. pp 111–118
Ng AY, Russell SJ (2000) Algorithms for inverse reinforcement learning. In: Proceedings of the seventeenth international conference on machine learning, ICML ’00, pp 663–670
Oliff H, Liu Y, Kumar M, Williams M, Ryan M (2020) Reinforcement learning for facilitating human-robot-interaction in manufacturing. J Manuf Syst 56:326–340
Park S, Gebhardt C, Rädle R, Feit A, Vrzakova H, Dayama N, Yeo H-S, Klokmose C, Quigley A, Oulasvirta A, Hilliges O (2018) AdaM: adapting multi-user interfaces for collaborative environments in real-time. In: ACM SIGCHI conference on human factors in computing systems, CHI ’18, New York, NY, USA. ACM
Peng XB, Abbeel P, Levine S, van de Panne M (2018) Deepmimic: example-guided deep reinforcement learning of physics-based character skills. ACM Trans Graph 37(4):8
Peng XB, Kanazawa A, Malik J, Abbeel P, Levine S (2018) Sfv: reinforcement learning of physical skills from videos. ACM Trans Graph 37
Purves D, Fitzpatrick D, Katz LC, Lamantia AS, McNamara JO, Williams SM, Augustine GJ (2000) Neuroscience. Sinauer Associates
Richalet J, Rault A, Testud JL, Papon J (1978) Model predictive heuristic control: applications to industrial processes. Automatica 14(5):413–428
Rahman SMM, Sadrfaridpour B, Wang Y (2015) Trust-based optimal subtask allocation and model predictive control for human-robot collaborative assembly in manufacturing, vol 57250. American Society of Mechanical Engineers, p V002T32A004
Rajeswaran A, Lowrey K, Todorov EV, Kakade SM (2017) Towards generalization and simplicity in continuous control. In: Advances in neural information processing systems, NIPS ’17, pp 6550–6561
Roberts M, Hanrahan P (2016) Generating dynamically feasible trajectories for quadrotor cameras. ACM Trans Graph 35(4):61:1–61:11
Safavi A, Zadeh MH (2017) Teaching the user by learning from the user: personalizing movement control in physical human-robot interaction. IEEE/CAA J Autom Sinica 4(4):704–713
Sheridan TB, Ferrell WR (1974) Man-machine systems; Information, control, and decision models of human performance. The MIT press
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354–359
Su P-H, Budzianowski P, Ultes S, Gasic M, Young S (2017) Sample-efficient actor-critic reinforcement learning with supervised data for dialogue management. arXiv:1707.00130
Sutton RS, Barto AG, Williams RJ (1992) Reinforcement learning is direct adaptive optimal control. IEEE Control Syst Mag 12(2):19–22
Sutton R, Fraser K, Conlan O (2019) A reinforcement learning and synthetic data approach to mobile notification management. pp 155–164
Teramae T, Noda T, Morimoto J (2018) Emg-based model predictive control for physical human-robot interaction: application for assist-as-needed control. IEEE Robot Autom Lett 3(1):210–217
Tjomsland J, Shafti A, Aldo Faisal A (2019) Human-robot collaboration via deep reinforcement learning of real-world interactions. arXiv:1912.01715
Treuille A, Lee Y, Popović Z (2007) Near-optimal character animation with continuous control. ACM Trans Graph 26(3):7
Watkins CJCH (1989) Learning from delayed rewards
Wiener N (2019) Cybernetics or Control and Communication in the Animal and the Machine. MIT press

Author information

Authors and Affiliations

ETH Zürich, Department of Computer Science, Stampfenbachstrasse 48, 8092, Zürich, Switzerland
Christoph Gebhardt & Otmar Hilliges

Corresponding author

Correspondence to Christoph Gebhardt.

Editor information

Editors and Affiliations

Google Research (United States), Mountain View, CA, USA
Yang Li
Advanced Interactive Technologies Lab, ETH Zurich, Zurich, Switzerland
Otmar Hilliges

About this chapter

Cite this chapter

Gebhardt, C., Hilliges, O. (2021). Optimal Control to Support High-Level User Goals in Human-Computer Interaction. In: Li, Y., Hilliges, O. (eds) Artificial Intelligence for Human Computer Interaction: A Modern Approach. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-030-82681-9_2

DOI: https://doi.org/10.1007/978-3-030-82681-9_2
Published: 05 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82680-2
Online ISBN: 978-3-030-82681-9
eBook Packages: Computer Science, Computer Science (R0)
