Abstract
With emerging technologies such as robots, mixed-reality systems, and mobile devices, machine-provided capabilities are increasing, and so is the complexity of their control and display mechanisms. To address this dichotomy, we propose optimal control as a framework to support users in achieving their high-level goals in human-computer tasks. We reason that it improves user support over common approaches to adaptive interfaces because its formalism implicitly captures the iterative nature of human-computer interaction. We conduct two case studies to test this hypothesis. First, we propose a model-predictive-control-based optimization scheme that supports end users in planning and executing robotic aerial videos. Second, we introduce a reinforcement-learning-based method that adapts mixed-reality augmentations to users’ preferences or tasks, which it learns from their gaze interactions with a UI. Our results show that optimal control can better support users’ high-level goals in human-computer tasks than common approaches. Optimal control models human-computer interaction as a sequential decision problem, which reflects the nature of the interaction and hence predicts user behavior better than other methods. In addition, our work highlights that optimization- and learning-based optimal control have complementary strengths with respect to interface adaptation.
Notes
1. We refer the reader to [30] for details on the results and experimental design of both studies.
2. This also prevents solutions with infinitely long trajectories in time, where adding steps with \(\mathbf {u}_i\approx 0\) is free w.r.t. Eq. (10).
3. These points can be seen in Fig. 3 and are the intersections of the blue dotted lines.
4. For more details on experimental design and results, see [33].
5. For a more detailed version of this section, we refer the interested reader to [28].
6. Myopic policies only consider the attainable reward in the next state and neglect other future states when selecting an action.
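The distinction drawn in the last note can be illustrated with a minimal sketch. The five-state chain, its rewards, the discount factor, and all function names below are hypothetical and chosen only for illustration; they are not taken from the chapter. A myopic policy picks the action whose next state pays the most, whereas a far-sighted policy also accounts for the discounted value of later states, computed here with a few sweeps of value iteration.

```python
# Hypothetical deterministic toy problem (not from the chapter).
# NEXT[state][action] gives the successor state; REWARD[state] is paid on entering it.
NEXT = {
    "start": {"left": "snack", "right": "hall"},
    "snack": {"stay": "end"},
    "hall": {"go": "feast"},
    "feast": {"stay": "end"},
    "end": {"stay": "end"},
}
REWARD = {"start": 0.0, "snack": 1.0, "hall": 0.0, "feast": 10.0, "end": 0.0}


def myopic_action(state):
    """Pick the action with the highest reward in the immediate next state."""
    return max(NEXT[state], key=lambda a: REWARD[NEXT[state][a]])


def state_values(gamma=0.9, sweeps=50):
    """Synchronous value iteration: V(s) = max_a [R(s') + gamma * V(s')]."""
    values = {s: 0.0 for s in NEXT}
    for _ in range(sweeps):
        values = {
            s: max(REWARD[n] + gamma * values[n] for n in NEXT[s].values())
            for s in NEXT
        }
    return values


def farsighted_action(state, gamma=0.9):
    """Pick the action that maximizes next-state reward plus discounted future value."""
    values = state_values(gamma)
    return max(
        NEXT[state],
        key=lambda a: REWARD[NEXT[state][a]] + gamma * values[NEXT[state][a]],
    )


if __name__ == "__main__":
    print("myopic:", myopic_action("start"))
    print("far-sighted:", farsighted_action("start"))
```

In this toy problem the myopic policy takes the small immediate reward, while the far-sighted policy accepts a zero-reward step to reach the larger reward one step later.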
References
Abbeel P, Dolgov D, Ng AY, Thrun S (2008) Apprenticeship learning for motion planning with application to parking lot navigation. In: IEEE international conference on intelligent robots and systems 2008. IROS ’08. IEEE, pp 1083–1090
Abbeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. p 1
Athukorala K, Medlar A, Oulasvirta A, Jacucci G, Glowacka D (2016) Beyond relevance: adapting exploration/exploitation in information retrieval. Association for Computing Machinery, New York, NY, USA
Audronis T (2014) How to get cinematic drone shots
Aytar Y, Pfaff T, Budden D, Le Paine T, Wang Z, de Freitas N (2018) Playing hard exploration games by watching youtube. In: Advances in neural information processing systems
Bailly G, Oulasvirta A, Kötzing T, Hoppe S (2013) Menuoptimizer: interactive optimization of menu systems. pp 331–342
Banovic N, Buzali T, Chevalier F, Mankoff J, Dey AK (2016) Modeling and understanding human routine behavior. In: Proceedings of the 2016 CHI conference on human factors in computing systems, CHI ’16. ACM, pp 248–260
Bemporad A, Morari M, Dua V, Pistikopoulos EN (2002) The explicit linear quadratic regulator for constrained systems. Automatica 38(1):3–20
Bertsekas DP, Tsitsiklis JN (1995) Neuro-dynamic programming: an overview, vol 1. IEEE, pp 560–564
Bronner S, Shippen J (2015) Biomechanical metrics of aesthetic perception in dance. Exp Brain Res 233(12):3565–3581
Chapanis A (1976) Engineering psychology. Rand McNally, Chicago
Chen M, Beutel A, Covington P, Jain S, Belletti F, Chi H (2019) Top-k off-policy correction for a reinforce recommender system. In: Proceedings of the twelfth ACM international conference on web search and data mining, WSDM ’19. ACM, pp 456–464
Chen X, Bailly G, Brumby DP, Oulasvirta A, Howes A (2015) The emergence of interactive behavior: a model of rational menu search. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems, CHI ’15. Association for Computing Machinery, New York, NY, USA, pp 4217–4226
Chen X, Starke SD, Baber C, Howes A (2017) A cognitive model of how people make decisions through interaction with visual displays. Association for Computing Machinery, New York, NY, USA
Cheng E (2016) Aerial photography and videography using drones, vol 1. Peachpit Press
Chipalkatty R, Droge G, Egerstedt MB (2013) Less is more: mixed-initiative model-predictive control with human inputs. IEEE Trans Rob 29(3):695–703
Chipalkatty R, Egerstedt M (2010) Human-in-the-loop: Terminal constraint receding horizon control with human inputs. pp 2712–2717
Christiano PF, Leike J, Brown T, Martic M, Legg S, Amodei D (2017) Deep reinforcement learning from human preferences. In: Advances in neural information processing systems, pp 4299–4307
Clarke DW, Mohtadi C, Tuffs PS (1987) Generalized predictive control-part I. The basic algorithm. Automatica 23(2):137–148
Coates A, Abbeel P, Ng AY (2009) Apprenticeship learning for helicopter control. Commun ACM 52(7):97–105
Cutler CR, Ramaker BL (1980) Dynamic matrix control - a computer control algorithm. In: Joint automatic control conference, vol 17, p 72
Dulac-Arnold G, Evans R, van Hasselt H, Sunehag P, Lillicrap T, Hunt J, Mann T, Weber T, Degris T, Coppin B (2015) Deep reinforcement learning in large discrete action spaces. arXiv:1512.07679
Engbert R, Kliegl R (2003) Microsaccades uncover the orientation of covert attention. Vis Res 43(9):1035–1045
Findlater L, Gajos KZ (2009) Design space and evaluation challenges of adaptive graphical user interfaces. AI Mag 30(4):68–68
Frans K, Ho J, Chen X, Abbeel P, Schulman J (2017) Meta learning shared hierarchies. arXiv:1710.09767
Fritsch FN, Carlson RE (1980) Monotone piecewise cubic interpolation. SIAM J Numer Anal 17(2):238–246
Gašić M, Young S (2014) Gaussian processes for POMDP-based dialogue manager optimization. IEEE Trans Audio Speech Lang Process 22(1):28–40
Gebhardt C, Hecox B, van Opheusden B, Wigdor D, Hillis J, Hilliges O, Benko H (2019) Learning cooperative personalized policies from gaze data. In: Proceedings of the 32nd annual ACM symposium on user interface software and technology, UIST ’19, New York, NY, USA. ACM
Gebhardt C, Hepp B, Naegeli T, Stevsic S, Hilliges O (2016) Airways: optimization-based planning of quadrotor trajectories according to high-level user goals. In: ACM SIGCHI conference on human factors in computing systems, CHI ’16, New York, NY, USA. ACM
Gebhardt C, Hilliges O (2018) WYFIWYG: investigating effective user support in aerial videography. arXiv:1801.05972
Gebhardt C, Hilliges O (2020) Optimizing for cinematographic quadrotor camera target framing. In: Submission to ACM SIGCHI
Gebhardt C, Oulasvirta A, Hilliges O (2020) Hierarchical Reinforcement Learning as a Model of Human Task Interleaving. arXiv:2001.02122
Gebhardt C, Stevsic S, Hilliges O (2018) Optimizing for aesthetically pleasing quadrotor camera motion. ACM Trans Graph (Proc ACM SIGGRAPH) 37(4):90:1–90:11
Ghadirzadeh A, Bütepage J, Maki A, Kragic D, Björkman M (2016) A sensorimotor reinforcement learning framework for physical human-robot interaction. pp 2682–2688
Glowacka D, Ruotsalo T, Konyushkova K, Athukorala K, Kaski S, Jacucci G (2013) Directing exploratory search: reinforcement learning from user interactions with keywords. pp 117–128
Görges D (2017) Relations between model predictive control and reinforcement learning. IFAC-PapersOnLine 50(1):4920–4928
Grieder P, Borrelli F, Torrisi F, Morari M (2004) Computation of the constrained infinite time linear quadratic regulator. Automatica 40(4):701–708
Hadfield-Menell D, Russell SJ, Abbeel P, Dragan A (2016) Cooperative inverse reinforcement learning. In: Advances in neural information processing systems, pp 3909–3917
Hennessy J (2015) 13 powerful tips to improve your aerial cinematography
Ho B-J, Balaji B, Koseoglu M, Sandha S, Pei S, Srivastava M (2020) Quick question: Interrupting users for microtasks with reinforcement learning. arXiv:2007.09515
Hogan N (1984) Adaptive control of mechanical impedance by coactivation of antagonist muscles. IEEE Trans Autom Control 29(8):681–690
Horvitz EJ, Breese JS, Heckerman D, Hovel D, Rommelse K (2013) The lumiere project: Bayesian user modeling for inferring the goals and needs of software users. arXiv:1301.7385
Howes A, Chen X, Acharya A, Lewis RL (2018) Interaction as an emergent property of a partially observable markov decision process. Computational interaction design. pp 287–310
Hu Z, Liang Y, Zhang J, Li Z, Liu Y (2018) Inference aided reinforcement learning for incentive mechanism design in crowdsourcing. In: Advances in neural information processing systems, NIPS ’18, pp 5508–5518
Hwangbo J, Lee J, Dosovitskiy A, Bellicoso D, Tsounis V, Koltun V, Hutter M (2019) Learning agile and dynamic motor skills for legged robots. Sci Robot 4(26)
Jameson A, Gajos KZ (2012) Systems that adapt to their users. The Human-Computer interaction handbook: fundamentals, evolving technologies and emerging applications. CRC Press, Boca Raton, FL
Johansen TA (2004) Approximate explicit receding horizon control of constrained nonlinear systems. Automatica 40(2):293–300
Jorgensen SJ, Campbell O, Llado T, Kim D, Ahn J, Sentis L (2017) Exploring model predictive control to generate optimal control policies for hri dynamical systems. arXiv:1701.03839
Joubert N, Roberts M, Truong A, Berthouzoz F, Hanrahan P (2015) An interactive tool for designing quadrotor camera shots. vol 34. ACM, New York, NY, USA, pp 238:1–238:11
Julier S, Lanzagorta M, Baillot Y, Rosenblum L, Feiner S, Hollerer T, Sestito S (2000) Information filtering for mobile augmented reality. In: Proceedings IEEE and ACM international symposium on augmented reality (ISAR 2000). IEEE, pp 3–11
Kartoun U, Stern H, Edan Y (2010) A human-robot collaborative reinforcement learning algorithm. J Intell Robot Syst 60(2):217–239
Kirches C (2011) Fast numerical methods for mixed-integer nonlinear model-predictive control. Springer
Krishnan S, Garg A, Liaw R, Miller L, Pokorny FT, Goldberg K (2016) Hirl: hierarchical inverse reinforcement learning for long-horizon tasks with delayed rewards. arXiv:1604.06508
Kushlev K, Proulx J, Dunn EW (2016) “Silence your phones”: smartphone notifications increase inattention and hyperactivity symptoms. pp 1011–1020
Lam D, Manzie C, Good MC (2013) Multi-axis model predictive contouring control. Int J Control 86(8):1410–1424
Langerak T, Zarate J, Vechev V, Lindlbauer D, Panozzo D, Hilliges O (2020) Optimal control for electromagnetic haptic guidance systems
Lee SJ, Popović Z (2010) Learning behavior styles with inverse reinforcement learning. In: ACM transactions on graphics (TOG), vol 29. ACM, p 122
Lee Y, Wampler K, Bernstein G, Popović J, Popović Z (2010) Motion fields for interactive character locomotion. In: ACM transactions on graphics (TOG), vol 29. ACM, p 138
Liebman E, Saar-Tsechansky M, Stone P (2015) Dj-mc: a reinforcement-learning agent for music playlist recommendation. In: Proceedings of the 2015 international conference on autonomous agents and multiagent systems, AAMAS ’15, pp 591–599
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv:1509.02971
Liniger A, Domahidi A, Morari M (2015) Optimization-based autonomous racing of 1:43 scale RC cars. Opt Control Appl Methods 36(5):628–647
Liu F, Tang R, Li X, Zhang W, Ye Y, Chen H, Guo H, Zhang Y (2018) Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv:1810.12027
Lo W-Y, Zwicker M (2008) Real-time planning for parameterized human motion. In: Proceedings of the 2008 ACM SIGGRAPH/eurographics symposium on computer animation, SCA ’08, pp 29–38
Matejka J, Li W, Grossman T, Fitzmaurice G (2009) Communitycommands: command recommendations for software applications. pp 193–202
McCann J, Pollard N (2007) Responsive characters from motion fragments. In: ACM transactions on graphics (TOG), vol 26. ACM, p 6
McRuer DT, Jex HR (1967) A review of quasi-linear pilot models
Michalska H, Mayne DQ (1993) Robust receding horizon control of constrained nonlinear systems. IEEE Trans Autom Control 38(11):1623–1633
Migge B, Kunz A (2010) User model for predictive calibration control on interactive screens. pp 32–37
Mitsunaga N, Smith C, Kanda T, Ishiguro H, Hagita N (2006) Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning. J Robot Soc Jpn 24(7):820–829
Modares H, Ranatunga I, Lewis FL, Popa DO (2015) Optimized assistive human-robot interaction using reinforcement learning. IEEE Trans Cybernet 46(3):655–667
Müller J, Oulasvirta A, Murray-Smith R (2017) Control theoretic models of pointing. ACM Trans Comput-Hum Interact (TOCHI) 24(4):1–36
Murray-Smith R (2018) Control theory, dynamics and continuous interaction
Nägeli T, Alonso-Mora J, Domahidi A, Rus D, Hilliges O (2017) Real-time motion planning for aerial videography with dynamic obstacle avoidance and viewpoint optimization. IEEE Robot Autom Lett PP(99):1–1
Nägeli T, Meier L, Domahidi A, Alonso-Mora J, Hilliges O (2017) Real-time planning for automated multi-view drone cinematography. ACM Trans Graph 36(4):132:1–132:10
Nescher T, Huang Y-Y, Kunz A (2014) Planning redirection techniques for optimal free walking experience using model predictive control. pp 111–118
Ng AY, Russell SJ (2000) Algorithms for inverse reinforcement learning. In: Proceedings of the seventeenth international conference on machine learning, ICML ’00, pp 663–670
Oliff H, Liu Y, Kumar M, Williams M, Ryan M (2020) Reinforcement learning for facilitating human-robot-interaction in manufacturing. J Manuf Syst 56:326–340
Park S, Gebhardt C, Rädle R, Feit A, Vrzakova H, Dayama N, Yeo H-S, Klokmose C, Quigley A, Oulasvirta A, Hilliges O (2018) AdaM: adapting multi-user interfaces for collaborative environments in real-time. In: ACM SIGCHI conference on human factors in computing systems, CHI ’18, New York, NY, USA. ACM
Peng XB, Abbeel P, Levine S, van de Panne M (2018) Deepmimic: example-guided deep reinforcement learning of physics-based character skills. ACM Trans Graph 37(4):8
Peng XB, Kanazawa A, Malik J, Abbeel P, Levine S (2018) Sfv: reinforcement learning of physical skills from videos. ACM Trans Graph 37
Purves D, Fitzpatrick D, Katz LC, Lamantia AS, McNamara JO, Williams SM, Augustine GJ (2000) Neuroscience. Sinauer Associates
Richalet J, Rault A, Testud JL, Papon J (1978) Model predictive heuristic control: application to an industrial process. Automatica 14(5):413–428
Rahman SMM, Sadrfaridpour B, Wang Y (2015) Trust-based optimal subtask allocation and model predictive control for human-robot collaborative assembly in manufacturing, vol 57250. American Society of Mechanical Engineers, p V002T32A004
Rajeswaran A, Lowrey K, Todorov EV, Kakade SM (2017) Towards generalization and simplicity in continuous control. In: Advances in neural information processing systems, NIPS ’17, pp 6550–6561
Roberts M, Hanrahan P (2016) Generating dynamically feasible trajectories for quadrotor cameras. ACM Trans Graph 35(4):61:1–61:11
Safavi A, Zadeh MH (2017) Teaching the user by learning from the user: personalizing movement control in physical human-robot interaction. IEEE/CAA J Autom Sinica 4(4):704–713
Sheridan TB, Ferrell WR (1974) Man-machine systems; Information, control, and decision models of human performance. The MIT press
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354–359
Su P-H, Budzianowski P, Ultes S, Gasic M, Young S (2017) Sample-efficient actor-critic reinforcement learning with supervised data for dialogue management. arXiv:1707.00130
Sutton RS, Barto AG, Williams RJ (1992) Reinforcement learning is direct adaptive optimal control. IEEE Control Syst Mag 12(2):19–22
Sutton R, Fraser K, Conlan O (2019) A reinforcement learning and synthetic data approach to mobile notification management. pp 155–164
Teramae T, Noda T, Morimoto J (2018) EMG-based model predictive control for physical human-robot interaction: application for assist-as-needed control. IEEE Robot Autom Lett 3(1):210–217
Tjomsland J, Shafti A, Aldo Faisal A (2019) Human-robot collaboration via deep reinforcement learning of real-world interactions. arXiv:1912.01715
Treuille A, Lee Y, Popović Z (2007) Near-optimal character animation with continuous control. ACM Trans Graph 26(3):7
Watkins CJCH (1989) Learning from delayed rewards
Wiener N (2019) Cybernetics or Control and Communication in the Animal and the Machine. MIT press
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Gebhardt, C., Hilliges, O. (2021). Optimal Control to Support High-Level User Goals in Human-Computer Interaction. In: Li, Y., Hilliges, O. (eds) Artificial Intelligence for Human Computer Interaction: A Modern Approach. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-030-82681-9_2
DOI: https://doi.org/10.1007/978-3-030-82681-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82680-2
Online ISBN: 978-3-030-82681-9
eBook Packages: Computer Science, Computer Science (R0)