Feature-based classifiers for design optimization

Abstract

We present a design optimization method for systems with high-dimensional parameter spaces using inductive decision trees. The essential idea is to map designs into a relatively low-dimensional feature space, and to derive a classifier to search for high-performing design alternatives within this space. Unlike learning classifier systems that were pioneered by Holland and Goldberg, classifiers defined by inductive decision trees were not originally developed for design optimization. In this paper, we explore modifications to such classifiers to make them more effective in the optimization problem. We expand the notions of feature space, generalize the tree construction heuristic beyond the original information-theoretic definitions, increase the reliance on domain expertise, and facilitate the transfer of design knowledge between related systems. There is a relatively small but rapidly growing body of work in the use of inductive trees for engineering design; the method presented herein is complementary to this research effort.

Notes

  1. All columns have the same length and cross-section properties; the beams are 50% longer than the columns with a second moment of cross-section area that is twice that of the columns; the same elastic material is used for all beams and columns.

  2. \(C^n_j\) is the binomial coefficient of n and j; the reduction by a factor of 2 is due to symmetry.

  3. Standard finite-element analysis is used to compute g.

  4. We can set \(\gamma_0 = \infty\) and \(\gamma_M = -\infty\) so that classes \(C_1\) and \(C_M\) correspond to the highest- and lowest-performing designs.

  5. Here \(\overline g\) and \(s_g\) are the sample mean and standard deviation of the performance values \(g(x_j)\) in the training data set, and the multiplier of 0.8 was chosen so that the three classes contain approximately equal numbers of supervised data (see the sketch following these notes).

  6. This is also known as the naive Bayes classifier and is based on the maximum likelihood principle.

  7. For relatively small m, the knowledge abstraction level and the associated evaluation effort of the feature functions are usually high; the corresponding features are termed high-level features (as opposed to low-level features). It is preferable, however, to separate any quantities that require substantial evaluation effort from the feature coordinates and to treat them separately in the knowledge modeling process. For instance, low-fidelity approximate models \(g_{\rm approx}(x)\), which are simpler than the original performance function \(g(x)\), may nevertheless be too complex to satisfy the simplicity attribute for the feature vector. It is shown in Sect. 4.2 how \(g_{\rm approx}(x)\) can be used in the design problem.

  8. This is simply the ratio of column height to width. Other measures of slenderness will depend on the type of column being examined. For thin-walled steel members, such measures would be expressed in terms of the first and second moments of the cross-section area and other aggregated dimensional quantities (Liu et al. 2004).

  9. The domain expert that we use is a novice structural engineer with knowledge of frame behavior gained primarily from a graduate-level course in structural mechanics. This level of expertise is sufficient for meaningful interaction with the design process as illustrated in Fig. 1.

  10. For instance, reduced-error pruning repeatedly analyzes the classifier resolution, quantified by an average information entropy, at each non-leaf node. If the resolution of the subtree with the non-leaf node as its root is not much higher than that of the trivial subtree in which the same node is replaced by a leaf, then the tree is pruned at this node (this test is sketched after these notes).

  11. It may be necessary to adjust the performance thresholds used in (2) so that the expanded training data would be more evenly distributed among the classes \(C_j\).

  12. Although the average information entropy in (10) could be viewed as a special case of the expected utility in (12) by setting \(u_{ij} = -\log_2 P_{ij}\) (the substitution is written out after these notes), the utilities are not usually defined in terms of probabilities.

  13. Statistically derived models (Rudnyi 1996; Buzas 1997; Haq and Kibria 1997; De La Cruz-Mesia and Marshall 2003) are not of interest herein because they are typically in terms of basis function expansions with non-informative coefficients.

  14. In this method, it is assumed that the inflection points are at the midpoint of every beam and column, except for the lower columns.

  15. The thresholds that define the classes \(C_j\) used in the second system would have to be adjusted so that class 1 would still represent high-performing designs.

  16. Our classifier approach has also been successfully applied to a much less intuitive design problem involving thin-walled steel columns (Liu et al. 2004). It is noted that the emphasis of that paper was on an exploration of a new nonlinear model for cold-formed steel columns; only a brief outline of a simpler form of the classifier approach was given.
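
The following minimal sketch illustrates the mechanics referred to in notes 5, 6 and 10: the \(\overline g \pm 0.8\,s_g\) thresholds that define the three performance classes, the maximum likelihood (naive Bayes) labeling of a leaf, and the entropy comparison behind reduced-error pruning. It is written in Python; the data, variable names and pruning tolerance are assumptions made for the example, not details taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # --- Note 5: three classes from the thresholds g_bar +/- 0.8 * s_g -------------
    g = rng.normal(size=200)                       # illustrative performance values g(x_j)
    g_bar, s_g = g.mean(), g.std(ddof=1)
    gamma1, gamma2 = g_bar + 0.8 * s_g, g_bar - 0.8 * s_g
    # Class 1 is treated here as the high-performing class (cf. note 4); this
    # labeling convention is an assumption for the example.
    labels = np.where(g > gamma1, 1, np.where(g > gamma2, 2, 3))

    # --- Note 6: maximum likelihood (naive Bayes) class for each leaf --------------
    # P[i, j] = observed proportion of class j+1 training designs in leaf i.
    P = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.5, 0.3]])
    ml_class = P.argmax(axis=1) + 1                # c_max assigns each leaf its most probable class

    # --- Note 10: entropy-based pruning test at a non-leaf node --------------------
    def entropy(counts):
        p = np.asarray(counts, dtype=float)
        p = p[p > 0] / p.sum()
        return float(-(p * np.log2(p)).sum())

    def prune_here(node_counts, leaf_counts, tol=0.05):
        """True if collapsing the subtree into a single leaf loses at most tol bits of
        resolution (count-weighted average leaf entropy vs. entropy of the collapsed node)."""
        n = [sum(c) for c in leaf_counts]
        avg = sum(ni * entropy(c) for ni, c in zip(n, leaf_counts)) / sum(n)
        return entropy(node_counts) - avg <= tol

    print(np.bincount(labels)[1:])                     # class sizes obtained with the 0.8 multiplier
    print(ml_class)                                    # -> [1 2]
    print(prune_here([30, 30], [[16, 14], [14, 16]]))  # -> True: this split barely sharpens the resolution

Note 12 can also be written out explicitly. Assuming the expected utility in (12) is the probability-weighted sum of the \(u_{ij}\) over the classes, the substitution \(u_{ij} = -\log_2 P_{ij}\) gives

$$\sum\limits_{j = 1}^M P_{ij}\,u_{ij} = - \sum\limits_{j = 1}^M P_{ij}\log_2 P_{ij} = I_i ,$$

so the leaf entropy, and hence the average information entropy in (10), is recovered for this particular choice of utilities.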

Abbreviations

\(C_j\) : class j, defined in terms of a performance interval

\(C_{e,{\rm neg}}\) : class of large negative errors

\(C_{e,{\rm pos}}\) : class of large positive errors

\(C_{e,{\rm small}}\) : class of small errors

\(c(x)\) : classifier, defined for each parameter vector x

\(c_{\rm exact}(x)\) : exact classifier

\(c_{\max}(x)\) : maximum likelihood (naive Bayes) classifier

\(c_*(x)\) : optimum classifier

\(D \subset \Omega\) : subset of designs in parameter space

\(D_K \subset \Omega\) : designs corresponding to leaves K of a decision tree

\(D_*\) : set of high-performing design alternatives

\(e(x) = g(x) - g_{\rm approx}(x)\) : error in the approximate performance function

\(F\) : space of all possible feature vectors

\(F^*_e, D^*_e\) : designs in feature and parameter spaces corresponding to a high likelihood of large positive errors

\(\{F_i\}\) : partition of feature space

\(f_i(x)\) : ith feature function

\(f = \{f_1(x), \ldots, f_m(x)\}\) : feature vector of all m feature functions

fG_k : binary expression in terms of the feature vector

\(g(x)\) : performance measure

\(g_{\rm approx}(x)\) : approximation to the performance function

\(I\) : average information entropy

\(I_i\) : information entropy of leaf i

\(j(i)\) : class assigned to leaf i of a decision tree

\(K \subset \{1, \ldots, L\}\) : decision tree leaf indices associated with class 1 designs

\(l_j(D)\) : class likelihood function for set D and class \(C_j\)

\(M\) : number of performance classes

\(m\) : dimension of the feature vector

\(n\) : dimension of the original design parameter vector

\(P_{ij}\) : conditional probability of class \(C_j\) given that the feature vector lies in \(F_i\)

\(Q = \{(f(x_j), g(x_j))\}\) : training data set defined in feature space

\(T\) : decision tree

\(U_i[\{P_{ij}\}, c]\) : expected utility at leaf i given probabilities \(\{P_{ij}\}\) and classifier c(x)

\(u(x)\) : utility of design x

\(u_c(x)\) : utility function associated with classifier c(x)

\(u_{ij}\) : utility of a design in leaf i with performance class j

\(u_0 = u_* - u_{\rm missed}\) : difference in utilities between locating and not locating a class 1 design

\(u_{\rm eval}\) : negative utility associated with the cost of evaluating the performance function

\(u_{\rm missed}\) : negative utility reflecting the opportunity cost of not including a class 1 design

\(u_*\) : utility of locating a class 1 design

\(x\) : n-vector of design parameters

\(\gamma_j\) : thresholds used to define the class performance intervals

\(\pi_i\) : proportion of unsupervised data in decision tree leaf i

\(\Omega\) : space of feasible design parameters

References

  • Bailey R, Bras B, Allen JK (1999) Using robust concept exploration and system dynamics models in the design of complex industrial ecosystems. Eng Optim 32(1):33–58

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth International, Belmont

  • Buntine W (1990) A theory of learning classification rules. PhD thesis, University of Technology, Sydney

  • Buntine W (1992) Learning classification trees. Stat Comput 2:63–73. DOI:10.1007/BF01889584

  • Buzas JS (1997) Instrumental variable estimation in nonlinear measurement error models. Commun Stat Theory Methods 26(12):2861–2877

  • Chen W, Allen JK, Mavris D, Mistree F (1996) A concept exploration method for determining robust top-level specifications. Eng Optim 26:137–158

  • DeGroot MH (1970) Optimal statistical decisions. McGraw-Hill, New York

  • De La Cruz-Mesia R, Marshall G (2003) A Bayesian approach for nonlinear regression models with continuous errors. Commun Stat Theory Methods 32:1631–1646. DOI:10.1081/STA-120022248

  • Devroye L, Gyorfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, Berlin Heidelberg New York

  • Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York

  • Forouraghi B (1999) On utility of inductive learning in multi-objective robust design. Artif Intell Eng Des Anal Manuf 13:27–36. DOI:10.1017/S0890060499131032

  • Gero JS, Kazakov VA (1995) Evolving building blocks for design using genetic engineering. In: IEEE international conference on evolutionary computing, pp 340–345

  • Goldberg DE (1987) Simple genetic algorithms and the minimal, deceptive problem. In: Research notes in artificial intelligence, chapter 6. Morgan Kaufmann, San Francisco, pp 74–88

  • Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading

  • Haq M, Kibria B (1997) Predictive inference for linear and multivariate linear models with MA(1) error processes. Commun Stat Theory Methods 26(2):331–353

  • Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, Berlin Heidelberg New York

  • Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor

  • Igusa T, Liu H, Schafer BW, Naiman DQ (2003) Bayesian classification trees and clustering for rapid generation and selection of design alternatives. In: Reddy RG (ed) NSF design, manufacturing, and industrial innovation research conference, January 6–9, Birmingham

  • Kovacs T (2004) Bibliography of real-world classifier systems applications. In: Bull L (ed) Applications of learning classifier systems. Springer, Berlin Heidelberg New York, pp 300–305

  • Lee J, Hajela P (2001) Application of classifier systems in improving response surface based approximations for design optimization. Comput Struct 79:333–344. DOI:10.1016/S0045-7949(00)00132-2

  • Liu H (2003) Bayesian classifiers for uncertainty modeling with applications to global optimization and solid mechanics problems. PhD thesis, Department of Civil Engineering, Johns Hopkins University, Baltimore

  • Liu H, Motoda H (1998) Feature selection for knowledge discovery and data mining. Kluwer Academic Publishers, Norwell

  • Liu H, Igusa T, Schafer BW (2004) Knowledge-based global optimization of cold-formed steel columns. Thin-Walled Struct 42:785–801. DOI:10.1016/j.tws.2004.01.001

  • Matheus C (1991) The need for constructive induction. In: Machine learning: proceedings of the eighth international workshop, pp 173–177

  • Mili F, Shen W, Martinez I et al (2001) Knowledge modeling for design decisions. Artif Intell Eng 15:153–164. DOI:10.1016/S0954-1810(01)00013-9

  • Myers RH, Khuri AI, Carter WH (1989) Response surface methodology: 1966–1988. Technometrics 31:137–157. DOI:10.2307/1268813

  • Perremans P (1996) Feature-based description of modular fixturing elements: the key to an expert system for the automatic design of the physical fixture. Adv Eng Software 25:19–27. DOI:10.1016/0965-9978(95)00082-8

  • Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo

  • Reckhow KH (1999) Water quality prediction, mechanism, and probability network models. Can J Fish Aquat Sci 56:1150–1158. DOI:10.1139/cjfas-56-7-1150

  • Rosenman MA (1997) The generation of form using an evolutionary approach. In: Dasgupta D, Michalewicz Z (eds) Evolutionary algorithms in engineering applications. Springer, Berlin Heidelberg New York, pp 69–86

  • Rudnyi EB (1996) Statistical model of systematic errors: linear error model. Chemom Intell Lab Syst 34:41–54. DOI:10.1016/0169-7439(96)00004-4

  • Salustri FA, Venter RD (1992) An axiomatic theory of engineering design information. Eng Comput 8:197–211. DOI:10.1007/BF01194322

  • Schwabacher M, Ellman T, Hirsh H (1998) Learning to set up numerical optimizations of engineering designs. Artif Intell Eng Des Anal Manuf 12:173–192. DOI:10.1017/S0890060498122084

  • Stahovich TF, Bal H (2002) An inductive approach to learning and reusing design strategies. Res Eng Des 13:109–121. DOI:10.1007/s00163-001-0010-9

  • Varadarajan S, Chen W, Pelka CJ (2000) Robust concept exploration of propulsion systems with enhanced model approximation capabilities. Eng Optim 32:309–334

  • Witten IH, Frank E (2000) Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco

  • Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82. DOI:10.1109/4235.585893

  • Wyse N, Dubes R, Jain A (1980) A critical evaluation of intrinsic dimensionality algorithms. In: Pattern recognition in practice. Morgan Kaufmann, San Francisco, pp 415–425

Acknowledgments

This material is based upon work supported by the National Science Foundation at the Johns Hopkins University under Grant Number DMI-0087032. This research support is gratefully acknowledged. The authors would also like to thank the reviewers for their insightful comments.

Author information

Correspondence to T. Igusa.

Appendix

1.1 Derivation details for the generalized heuristic

The difference between the minimum entropy and expected utility heuristics for tree construction can be reconciled by examining the incremental value of information. The basic idea is to use Bayesian analysis to compare the value of the information obtained in the cases of finite and infinite supervised data. We begin by defining \(n_{ij}\) and \(P_{ij} = n_{ij}/N_i\) as the number and observed proportion of class j designs in leaf i, where \(N_i = \sum_{j=1}^{M} n_{ij}\) is the total number of data in leaf i. If standard Bayesian conjugate distributions are used, then the posterior distribution of the actual proportions \(\pi_{ij}\) of class j designs in leaf i would be Dirichlet with joint density (DeGroot 1970)

$$\phi_{i} {\left({{\left\{{\pi_{{ij}}}\right\}}}\right)} = \Gamma {\left({N_{i}}\right)}{\prod\limits_{j = 1}^M}\frac{{\pi^{{n_{{ij}} - 1}}_{{ij}}}}{{\Gamma {\left({n_{{ij}}}\right)}}}$$
(A1)

The mean value of \(\pi_{ij}\) is approximately equal to the observed proportion \(P_{ij}\), and the standard deviation of \(\pi_{ij}\), which measures the uncertainty in the actual proportion, is inversely proportional to the square root of the number of data \(N_i\). The expected utility \(U_i\) for leaf i given the \(N_i\) data is

$$E{\left[ {u_{i}} | c,N_{i}\;\hbox{data} \right]} = U_{i} {\left[ {{\left\{{P_{{ij}}}\right\}},c} \right]}$$
(A2)

where the functional form of \(U_i\) is given in (14) in terms of the classifier c.
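
For intuition, the posterior in (A1) is easy to probe numerically. The short Python sketch below (with purely illustrative counts) draws \(\{\pi_{ij}\}\) from the Dirichlet distribution implied by the counts \(n_{ij}\) and confirms the statement above: the posterior mean tracks \(P_{ij}\) and the spread shrinks like \(1/\sqrt{N_i}\).

    import numpy as np

    n_ij = np.array([12, 5, 3])              # illustrative class counts n_ij for one leaf i
    N_i = n_ij.sum()
    P_ij = n_ij / N_i                        # observed proportions

    rng = np.random.default_rng(1)
    pi = rng.dirichlet(n_ij, size=100_000)   # posterior draws of {pi_ij}, per (A1)

    print(P_ij)                              # observed proportions
    print(pi.mean(axis=0))                   # posterior means, close to P_ij
    print(pi.std(axis=0))                    # posterior spread, of order 1 / sqrt(N_i)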

Next, we consider the effect of additional data on classifier accuracy. What is needed here is an explicit decision rule for redefining the classifier c as data are collected. Equations 15 and 16 are two examples of such explicit rules, in which the observed proportions \(P_{ij}\) are recomputed and the classifiers \(c_{\max}\) and \(c_*\) are redefined using the new values of \(P_{ij}\). The topology and the binary relations defining the tree T and its leaves are fixed at this stage; improvements to the tree are considered later. Given a decision rule for redefining c, it can be shown that, in the limit of infinite data, the expected utility for leaf i would be

$$E{\left[ {u_{i}} | c,\infty\;\hbox{data} \right]} = {\int\limits_0^1} \cdots {\int\limits_0^1}U_{i} {\left[ {{\left\{{\pi_{{ij}}}\right\}},c} \right]}\phi_{i} {\left({{\left\{{\pi_{{ij}}}\right\}}}\right)}{\rm d}\pi_{{i1}} \cdots {\rm d}\pi_{{iM}} $$
(A3)

It follows from the definitions that, as the number of data \(N_i\) in leaf i becomes large, the preceding expected utilities must agree:

$$E{\left[u_{i} | c,\infty\;\hbox{data} \right]} - E{\left[ {u_{i}} | c,N_{i}\;\hbox{data}\right]} \to 0 \quad \hbox{as}\;N_{i} \to \infty $$
(A4)
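
The limit in (A4) can be checked by Monte Carlo. The sketch below approximates the integral in (A3) by averaging over Dirichlet draws; as a stand-in for the utility in (14), which is not reproduced in this excerpt, it uses a simple 0-1 utility (credit 1 when the class assigned by the re-optimized \(c_{\max}\) matches the true class), so the numbers are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    P = np.array([0.5, 0.3, 0.2])                   # observed proportions P_ij in one leaf i

    def utility_gap(N_i, n_draws=200_000):
        """E[u_i | c_max, infinite data] - E[u_i | c_max, N_i data] under a 0-1 utility:
        with N_i data the classifier is fixed at argmax_j P_ij, whereas in the
        infinite-data integral (A3) it is re-optimized for each draw of {pi_ij}."""
        counts = np.maximum(np.rint(N_i * P), 1.0)  # pseudo-counts n_ij for the Dirichlet posterior
        pi = rng.dirichlet(counts, size=n_draws)
        return pi.max(axis=1).mean() - P.max()

    for N in (10, 40, 160, 640):
        print(N, round(float(utility_gap(N)), 4))   # the gap shrinks as N_i grows, cf. (A4)

The decay of this gap with \(N_i\) is the behavior that (A5) quantifies for the maximum likelihood classifier.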

In the special case where the maximum likelihood classifier \(c_{\max}\) in (15) is used, it can be shown, by a slight generalization of the arguments presented in Buntine (1990), that the difference in the preceding expected utilities has the following asymptotic result:

$$E{\left[ u_{i} | c_{{\max}} ,\infty\;\hbox{data} \right]} - E{\left[ {u_{i}} | c_{{\max}}, N_{i}\;\hbox{data} \right]} \sim \exp {\left({- \lambda_{i} N_{i}}\right)}$$
(A5)

This shows that the difference in expected utilities decays exponentially with the number of data \(N_i\). The exponential rate \(\lambda_i\) is

$$\frac{{\lambda_{i}}}{{\log 2}} = - I_{i} {\left({{\left\{{P_{{ij}}}\right\}}}\right)} + 1$$
(A6)

If the product over all leaves of the differences in expected utilities is examined, then the following asymptotic result is obtained:

$${\prod\limits_{i = 1}^L}{\left({E{\left[ {u_{i}} |c_{{\max}}, \infty\;\hbox{data} \right]} - E{\left[ {u_{i}} | c_{{\max}}, N_{i}\;\hbox{data} \right]}}\right)} \sim \exp {\left({- \lambda N}\right)}$$
(A7)

where \(N = \sum_{i=1}^{L} N_i\) is the total number of supervised data over all L leaves and the exponential rate is

$$\frac{\lambda}{{\log 2}} = - I + 1$$
(A8)
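
The step from (A6) and (A7) to (A8) is immediate once the product is written as a single exponential. With the average information entropy I taken as the \(N_i\)-weighted average of the leaf entropies (assumed here to match the definition used in (10)),

$$\prod\limits_{i = 1}^L \exp\left(- \lambda_i N_i\right) = \exp\left(- \sum\limits_{i = 1}^L \lambda_i N_i\right) = \exp\left(- \lambda N\right), \qquad \frac{\lambda}{\log 2} = \sum\limits_{i = 1}^L \frac{N_i}{N}\,\frac{\lambda_i}{\log 2} = - \sum\limits_{i = 1}^L \frac{N_i}{N}\,I_i\left(\left\{P_{ij}\right\}\right) + 1 = - I + 1 .$$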

The preceding equations provide useful relationships between the expected utility of the maximum likelihood classifier and the information entropy.

In the next stage of the analysis, changes in the tree T are considered. Following the information-theoretic viewpoint, the accuracy of the tree is measured by how fully it captures the information in the supervised data. With this viewpoint, the difference of utilities in (A7) is a measure of the information in the N data that is not incorporated into the tree. To increase the accuracy of the tree, it is necessary to make the rate λ as large as possible. As shown in (A8), this is equivalent to minimizing the average information entropy I. Hence, the information entropy heuristic can be tied to a Bayesian analysis of the utility of information.

With these results, we are now able to obtain a tree construction heuristic based on the utility function in Sect. 4.1 for the generation of design alternatives. Starting from the Bayesian analysis results in Eqs. A1–A8, the main task is to determine whether an exponential rate \(\lambda_i\) exists when the maximum likelihood classifier \(c_{\max}\) in (A5) is replaced by the classifier \(c_*\) for generating design alternatives. With some approximation, it can be shown that the exponential form remains valid for \(c_*\); the result is

$$\frac{{\lambda_{i}}}{{\log 2}} \approx - I_{i} {\left({{\left\{{{P}^{\prime}_{{ij}}}\right\}}}\right)} + \hbox{constant}$$
(A9)

where the \(P^{\prime}_{ij}\) are given by

$$P^{\prime}_{ij} = \left\{ \begin{array}{ll} u_{0}\, P_{i1} \left[ 1 + \left( u_{0} - 1 \right) P_{i1} \right]^{-1} & \hbox{if}\;j = 1 \\ P_{ij} \left[ 1 + \left( u_{0} - 1 \right) P_{i1} \right]^{-1} & \hbox{otherwise} \end{array} \right.$$
(A10)

with \(u_0\) defined after (16). The revised proportion \(P^{\prime}_{i1}\) can be interpreted as the proportion of class 1 designs in leaf i when the number of class 1 designs is weighted by the utility-related value \(u_0\).
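
As a quick sanity check of (A10) (with illustrative values of \(P_{ij}\) and \(u_0\)), the reweighted proportions remain normalized and the class 1 share grows with \(u_0\):

    import numpy as np

    def reweight(P_i, u0):
        """Apply (A10): weight the class 1 proportion by u0 and renormalize."""
        denom = 1.0 + (u0 - 1.0) * P_i[0]          # common denominator 1 + (u0 - 1) P_i1
        P_prime = P_i / denom
        P_prime[0] *= u0
        return P_prime

    P_i = np.array([0.2, 0.5, 0.3])                # observed proportions in one leaf i
    for u0 in (1.0, 2.0, 5.0):
        Pp = reweight(P_i, u0)
        print(u0, Pp.round(3), round(float(Pp.sum()), 6))   # each row sums to 1; class 1 share grows with u0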

Cite this article

Liu, H., Igusa, T. Feature-based classifiers for design optimization. Res Eng Design 17, 189–206 (2007). https://doi.org/10.1007/s00163-006-0024-4