A method for model selection using reinforcement learning when viewing design as a sequential decision process

  • Research Paper
  • Published in Structural and Multidisciplinary Optimization

Abstract

In an emerging paradigm, design is viewed as a sequential decision process (SDP) in which mathematical models of increasing fidelity are applied in sequence to systematically contract the set of design alternatives. The key idea behind the SDP is to sequence models of increasing fidelity so that they provide sequentially tighter bounds on the decision criteria, removing inefficient designs from the tradespace with the guarantee that each earlier, lower-fidelity model removes only design solutions that would be dominated when analyzed with the more detailed, high-fidelity model. In general, efficiency in the SDP is achieved by using inexpensive (low-fidelity) models early in the design process and reserving high-fidelity models for later stages. However, the set of multi-fidelity models and the discrete decision states give rise to a combinatorially large number of possible model sequences, some of which require significantly fewer model evaluations than others. Unfortunately, the optimal modeling policy cannot be determined at the outset of the SDP because the computational cost of executing every model on every design and the discriminatory power of the resulting bounds are unknown. In this paper, the model selection problem is formulated as a finite Markov decision process (MDP), and an online reinforcement learning (RL) algorithm, namely Q-learning, is used to obtain and follow an approximately optimal modeling policy, thereby overcoming this limitation of the current SDP. The outcome is a Reinforcement Learning based Design (RL-D) methodology that learns efficient model sequences from sample estimates of the computational cost and discriminatory power of different models while analyzing design alternatives in the tradespace throughout the design process. Through application to two design examples, RL-D is shown to (1) effectively identify the approximately optimal modeling policy and (2) efficiently converge upon a choice set.
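
To make the abstract's formulation concrete, the following is a minimal sketch of tabular Q-learning applied to a toy model-selection problem, not the authors' implementation. The state is taken to be the number of designs remaining in the tradespace, an action selects which fidelity model to run next, and the reward is the negative computational cost of that evaluation. The MODELS table, CHOICE_SIZE, and all numeric values are hypothetical stand-ins, not data from the paper.

```python
import random
from collections import defaultdict

# Hypothetical per-design evaluation cost and elimination power for three
# models of increasing fidelity; illustrative numbers only, not from the paper.
MODELS = {
    0: {"cost": 1.0,  "p_eliminate": 0.3},   # low fidelity
    1: {"cost": 5.0,  "p_eliminate": 0.6},   # medium fidelity
    2: {"cost": 25.0, "p_eliminate": 0.9},   # high fidelity
}
CHOICE_SIZE = 3  # episode ends once the tradespace shrinks to this many designs

def step(n_remaining, action):
    """Evaluate one model on all remaining designs; reward is negative cost."""
    m = MODELS[action]
    cost = m["cost"] * n_remaining
    # Each non-choice design is eliminated with the model's discriminatory power.
    removed = sum(random.random() < m["p_eliminate"]
                  for _ in range(n_remaining - CHOICE_SIZE))
    n_next = max(CHOICE_SIZE, n_remaining - removed)
    return n_next, -cost, n_next == CHOICE_SIZE

def q_learning(n_designs=50, episodes=2000, alpha=0.1, gamma=1.0, eps=0.1):
    """Tabular Q-learning over states = number of designs still in the tradespace."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = n_designs, False
        while not done:
            # Epsilon-greedy choice among the available fidelity levels.
            if random.random() < eps:
                a = random.choice(list(MODELS))
            else:
                a = max(MODELS, key=lambda a_: Q[(s, a_)])
            s2, r, done = step(s, a)
            # Standard one-step Q-learning update toward the bootstrapped target.
            target = r if done else r + gamma * max(Q[(s2, a_)] for a_ in MODELS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

After training, the learned modeling policy at any state s is read off greedily as max(MODELS, key=lambda a: Q[(s, a)]); in this toy setting it tends to favor cheap models while many designs remain and higher-fidelity models as the set approaches the choice-set size, mirroring the sequencing behavior the paper describes.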

Funding

This study received support from the National Science Foundation (NSF) under Grant CMMI-1455444 and from the Graduate Excellence Fellowship provided by the College of Engineering at Pennsylvania State University.

Author information

Corresponding author

Correspondence to Jaskanwal P. S. Chhabra.

Ethics declarations

Disclaimer

Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Pennsylvania State University.

Additional information

Responsible Editor: Somanath Nagendra

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chhabra, J.P.S., Warn, G.P. A method for model selection using reinforcement learning when viewing design as a sequential decision process. Struct Multidisc Optim 59, 1521–1542 (2019). https://doi.org/10.1007/s00158-018-2145-6
