Abstract
In an emerging paradigm, design is viewed as a sequential decision process (SDP) in which mathematical models of increasing fidelity are applied in sequence to systematically contract the set of design alternatives. The key idea behind the SDP is that sequencing models of increasing fidelity yields sequentially tighter bounds on the decision criteria, removing inefficient designs from the tradespace with the guarantee that an antecedent model eliminates only those design solutions that would be dominated when analyzed with the more detailed, higher-fidelity model. In general, the SDP achieves efficiency by applying inexpensive (low-fidelity) models early in the design process and reserving high-fidelity models for later. However, the set of multi-fidelity models and discrete decision states gives rise to a combinatorial number of possible model sequences, some of which require significantly fewer model evaluations than others. Unfortunately, the optimal modeling policy cannot be determined at the outset of the SDP because the computational cost of executing each model on each design and the discriminatory power of the resulting bounds are unknown. In this paper, the model selection problem is formulated as a finite Markov decision process (MDP), and an online reinforcement learning (RL) algorithm, namely Q-learning, is used to obtain and follow an approximately optimal modeling policy, thereby overcoming this limitation of the current SDP. The outcome is a Reinforcement Learning based Design (RL-D) methodology that learns efficient model sequences from sample estimates of the computational cost and discriminatory power of the different models while analyzing design alternatives in the tradespace throughout the design process. Through application to two design examples, RL-D is shown to (1) effectively identify an approximately optimal modeling policy and (2) efficiently converge upon a choice set.
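The abstract above describes learning a model-sequencing policy with tabular Q-learning over a finite MDP. The following is a minimal illustrative sketch, not the authors' implementation: the states, actions, reward signal, and the `step` interface are hypothetical stand-ins, where states abstract the progress of the design process, actions select which fidelity model to evaluate next, and the reward penalizes computational cost.

```python
import random

def q_learning(states, actions, step, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Learn Q(s, a) from simulated episodes of the design process.

    step(s, a) runs (hypothetical) model a in state s and returns
    (next_state, reward); next_state of None marks a terminal state.
    """
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = states[0]  # each episode starts at the beginning of the process
        while s is not None:
            # epsilon-greedy selection over the candidate models
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s_next, r = step(s, a)  # observe cost/benefit of running model a
            # standard Q-learning update toward the bootstrapped target
            target = r if s_next is None else \
                r + gamma * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q
```

In a toy two-stage process where a low-fidelity model is cheap and sufficiently discriminating early but only the high-fidelity model is effective late, the greedy policy recovered from `Q` orders the models low-then-high, mirroring the efficient sequencing the paper seeks to learn online.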
Funding
This study received support from the National Science Foundation (NSF) under Grant CMMI-1455444 and from the Graduate Excellence Fellowship provided by the College of Engineering at the Pennsylvania State University.
Ethics declarations
Disclaimer
Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Pennsylvania State University.
Additional information
Responsible Editor: Somanath Nagendra
Cite this article
Chhabra, J.P.S., Warn, G.P. A method for model selection using reinforcement learning when viewing design as a sequential decision process. Struct Multidisc Optim 59, 1521–1542 (2019). https://doi.org/10.1007/s00158-018-2145-6