A method for model selection using reinforcement learning when viewing design as a sequential decision process

  • Research Paper
  • Published in Structural and Multidisciplinary Optimization

Abstract

In an emerging paradigm, design is viewed as a sequential decision process (SDP) in which mathematical models of increasing fidelity are applied in sequence to systematically contract the set of design alternatives. The key idea behind the SDP is to sequence models of increasing fidelity so that they provide sequentially tighter bounds on the decision criteria, removing inefficient designs from the tradespace with the guarantee that each earlier, lower-fidelity model removes only design solutions that would be dominated when analyzed with the more detailed, high-fidelity model. In general, efficiency in the SDP is achieved by using inexpensive (low-fidelity) models early in the design process and reserving high-fidelity models for later stages. However, the set of multi-fidelity models and the discrete decision states give rise to a combinatorially large number of possible model sequences, some of which require significantly fewer model evaluations than others. Unfortunately, the optimal modeling policy cannot be determined at the outset of the SDP because the computational cost of executing every model on every design and the discriminatory power of the resulting bounds are unknown. In this paper, the model selection problem is formulated as a finite Markov decision process (MDP), and an online reinforcement learning (RL) algorithm, namely Q-learning, is used to obtain and follow an approximately optimal modeling policy, thereby overcoming this limitation of the current SDP. The outcome is a Reinforcement Learning based Design (RL-D) methodology that learns efficient model sequences from sample estimates of the computational cost and discriminatory power of different models while analyzing design alternatives in the tradespace throughout the design process. Through application to two design examples, RL-D is shown to (1) effectively identify the approximately optimal modeling policy and (2) efficiently converge upon a choice set.
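
To make the abstract's formulation concrete, the following is a minimal sketch of tabular Q-learning applied to a toy model-selection problem, not the authors' implementation. The state is taken to be the number of designs remaining in the tradespace, an action selects which fidelity model to run next, and the reward is the negative computational cost of that evaluation. The MODELS table, CHOICE_SIZE, and all numeric values are hypothetical stand-ins, not data from the paper.

```python
import random
from collections import defaultdict

# Hypothetical per-design evaluation cost and elimination power for three
# models of increasing fidelity; illustrative numbers only, not from the paper.
MODELS = {
    0: {"cost": 1.0,  "p_eliminate": 0.3},   # low fidelity
    1: {"cost": 5.0,  "p_eliminate": 0.6},   # medium fidelity
    2: {"cost": 25.0, "p_eliminate": 0.9},   # high fidelity
}
CHOICE_SIZE = 3  # episode ends once the tradespace shrinks to this many designs

def step(n_remaining, action):
    """Evaluate one model on all remaining designs; reward is negative cost."""
    m = MODELS[action]
    cost = m["cost"] * n_remaining
    # Each non-choice design is eliminated with the model's discriminatory power.
    removed = sum(random.random() < m["p_eliminate"]
                  for _ in range(n_remaining - CHOICE_SIZE))
    n_next = max(CHOICE_SIZE, n_remaining - removed)
    return n_next, -cost, n_next == CHOICE_SIZE

def q_learning(n_designs=50, episodes=2000, alpha=0.1, gamma=1.0, eps=0.1):
    """Tabular Q-learning over states = number of designs still in the tradespace."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = n_designs, False
        while not done:
            # Epsilon-greedy choice among the available fidelity levels.
            if random.random() < eps:
                a = random.choice(list(MODELS))
            else:
                a = max(MODELS, key=lambda a_: Q[(s, a_)])
            s2, r, done = step(s, a)
            # Standard one-step Q-learning update toward the bootstrapped target.
            target = r if done else r + gamma * max(Q[(s2, a_)] for a_ in MODELS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

After training, the learned modeling policy at any state s is read off greedily as max(MODELS, key=lambda a: Q[(s, a)]); in this toy setting it tends to favor cheap models while many designs remain and higher-fidelity models as the set approaches the choice-set size, mirroring the sequencing behavior the paper describes.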

Funding

This study received support from the National Science Foundation (NSF) under Grant CMMI-1455444 and from the Graduate Excellence Fellowship provided by the College of Engineering at Pennsylvania State University.

Author information

Corresponding author

Correspondence to Jaskanwal P. S. Chhabra.

Ethics declarations

Disclaimer

Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Pennsylvania State University.

Additional information

Responsible Editor: Somanath Nagendra

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chhabra, J.P.S., Warn, G.P. A method for model selection using reinforcement learning when viewing design as a sequential decision process. Struct Multidisc Optim 59, 1521–1542 (2019). https://doi.org/10.1007/s00158-018-2145-6
