Feature-based classifiers for design optimization

Abstract

We present a design optimization method for systems with high-dimensional parameter spaces using inductive decision trees. The essential idea is to map designs into a relatively low-dimensional feature space, and to derive a classifier to search for high-performing design alternatives within this space. Unlike learning classifier systems that were pioneered by Holland and Goldberg, classifiers defined by inductive decision trees were not originally developed for design optimization. In this paper, we explore modifications to such classifiers to make them more effective in the optimization problem. We expand the notions of feature space, generalize the tree construction heuristic beyond the original information-theoretic definitions, increase the reliance on domain expertise, and facilitate the transfer of design knowledge between related systems. There is a relatively small but rapidly growing body of work in the use of inductive trees for engineering design; the method presented herein is complementary to this research effort.

Notes

  1. All columns have the same length and cross-section properties; the beams are 50% longer than the columns with a second moment of cross-section area that is twice that of the columns; the same elastic material is used for all beams and columns.

  2. \(C^n_j\) is the binomial coefficient of n and j; the reduction by a factor of 2 is due to symmetry.

  3. Standard finite-element analysis is used to compute g.

  4. We can set \(\gamma_0 = \infty\) and \(\gamma_M = -\infty\) so that classes \(C_1\) and \(C_M\) correspond to the highest- and lowest-performing designs.

  5. Here \(\overline g\) and \(s_g\) are the sample mean and standard deviation of the performance values \(g(x_j)\) in the training data set, and the multiplier of 0.8 was chosen so that the three classes contain approximately equal numbers of supervised data (see the sketch following these notes).

  6. This is also known as the naive Bayes classifier and is based on the maximum likelihood principle.

  7. For relatively small m, the knowledge abstraction level and the associated evaluation effort of the feature functions are usually high; the corresponding features are termed high-level features (as opposed to low-level features). It is preferable, however, to separate any quantities that require substantial evaluation effort from the feature coordinates and to treat them separately in the knowledge modeling process. For instance, low-fidelity approximate models \(g_{\rm approx}(x)\), which are simpler than the original performance function \(g(x)\), may nevertheless be too complex to satisfy the simplicity attribute for the feature vector. It is shown in Sect. 4.2 how \(g_{\rm approx}(x)\) can be used in the design problem.

  8. This is simply the ratio of column height to width. Other measures of slenderness will depend on the type of column being examined. For thin-walled steel members, such measures would be expressed in terms of the first and second moments of the cross-section area and other aggregated dimensional quantities (Liu et al. 2004).

  9. The domain expert that we use is a novice structural engineer with knowledge of frame behavior gained primarily from a graduate-level course in structural mechanics. This level of expertise is sufficient for meaningful interaction with the design process as illustrated in Fig. 1.

  10. For instance, reduced-error pruning repeatedly analyzes the classifier resolution, quantified by an average information entropy, at each non-leaf node. If the resolution of the subtree with the non-leaf node as its root is not much higher than that of the trivial subtree in which the same node is replaced by a leaf, then the tree is pruned at this node (this test is sketched after these notes).

  11. It may be necessary to adjust the performance thresholds used in (2) so that the expanded training data would be more evenly distributed among the classes \(C_j\).

  12. Although the average information entropy in (10) could be viewed as a special case of the expected utility in (12) by setting \(u_{ij} = -\log_2 P_{ij}\) (the substitution is written out after these notes), the utilities are not usually defined in terms of probabilities.

  13. Statistically derived models (Rudnyi 1996; Buzas 1997; Haq and Kibria 1997; De La Cruz-Mesia and Marshall 2003) are not of interest herein because they are typically in terms of basis function expansions with non-informative coefficients.

  14. In this method, it is assumed that the inflection points are at the midpoint of every beam and column, except for the lower columns.

  15. The thresholds that define the classes \(C_j\) used in the second system would have to be adjusted so that class 1 would still represent high-performing designs.

  16. Our classifier approach has also been successfully applied to a much less intuitive design problem involving thin-walled steel columns (Liu et al. 2004). It is noted that the emphasis of that paper was on an exploration of a new nonlinear model for cold-formed steel columns; only a brief outline of a simpler form of the classifier approach was given.
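
The following minimal sketch illustrates the mechanics referred to in notes 5, 6 and 10: the \(\overline g \pm 0.8\,s_g\) thresholds that define the three performance classes, the maximum likelihood (naive Bayes) labeling of a leaf, and the entropy comparison behind reduced-error pruning. It is written in Python; the data, variable names and pruning tolerance are assumptions made for the example, not details taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # --- Note 5: three classes from the thresholds g_bar +/- 0.8 * s_g -------------
    g = rng.normal(size=200)                       # illustrative performance values g(x_j)
    g_bar, s_g = g.mean(), g.std(ddof=1)
    gamma1, gamma2 = g_bar + 0.8 * s_g, g_bar - 0.8 * s_g
    # Class 1 is treated here as the high-performing class (cf. note 4); this
    # labeling convention is an assumption for the example.
    labels = np.where(g > gamma1, 1, np.where(g > gamma2, 2, 3))

    # --- Note 6: maximum likelihood (naive Bayes) class for each leaf --------------
    # P[i, j] = observed proportion of class j+1 training designs in leaf i.
    P = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.5, 0.3]])
    ml_class = P.argmax(axis=1) + 1                # c_max assigns each leaf its most probable class

    # --- Note 10: entropy-based pruning test at a non-leaf node --------------------
    def entropy(counts):
        p = np.asarray(counts, dtype=float)
        p = p[p > 0] / p.sum()
        return float(-(p * np.log2(p)).sum())

    def prune_here(node_counts, leaf_counts, tol=0.05):
        """True if collapsing the subtree into a single leaf loses at most tol bits of
        resolution (count-weighted average leaf entropy vs. entropy of the collapsed node)."""
        n = [sum(c) for c in leaf_counts]
        avg = sum(ni * entropy(c) for ni, c in zip(n, leaf_counts)) / sum(n)
        return entropy(node_counts) - avg <= tol

    print(np.bincount(labels)[1:])                     # class sizes obtained with the 0.8 multiplier
    print(ml_class)                                    # -> [1 2]
    print(prune_here([30, 30], [[16, 14], [14, 16]]))  # -> True: this split barely sharpens the resolution

Note 12 can also be written out explicitly. Assuming the expected utility in (12) is the probability-weighted sum of the \(u_{ij}\) over the classes, the substitution \(u_{ij} = -\log_2 P_{ij}\) gives

$$\sum\limits_{j = 1}^M P_{ij}\,u_{ij} = - \sum\limits_{j = 1}^M P_{ij}\log_2 P_{ij} = I_i ,$$

so the leaf entropy, and hence the average information entropy in (10), is recovered for this particular choice of utilities.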

Abbreviations

\(C_j\) : class j, defined in terms of a performance interval

\(C_{e,{\rm neg}}\) : class of large negative errors

\(C_{e,{\rm pos}}\) : class of large positive errors

\(C_{e,{\rm small}}\) : class of small errors

\(c(x)\) : classifier, defined for each parameter vector x

\(c_{\rm exact}(x)\) : exact classifier

\(c_{\max}(x)\) : maximum likelihood (naive Bayes) classifier

\(c_*(x)\) : optimum classifier

\(D \subset \Omega\) : subset of designs in parameter space

\(D_K \subset \Omega\) : designs corresponding to leaves K of a decision tree

\(D_*\) : set of high-performing design alternatives

\(e(x) = g(x) - g_{\rm approx}(x)\) : error in the approximate performance function

\(F\) : space of all possible feature vectors

\(F^*_e, D^*_e\) : designs in feature and parameter spaces corresponding to a high likelihood of large positive errors

\(\{F_i\}\) : partition of feature space

\(f_i(x)\) : ith feature function

\(f = \{f_1(x), \ldots, f_m(x)\}\) : feature vector of all m feature functions

fG_k : binary expression in terms of the feature vector

\(g(x)\) : performance measure

\(g_{\rm approx}(x)\) : approximation to the performance function

\(I\) : average information entropy

\(I_i\) : information entropy of leaf i

\(j(i)\) : class assigned to leaf i of a decision tree

\(K \subset \{1, \ldots, L\}\) : decision tree leaf indices associated with class 1 designs

\(l_j(D)\) : class likelihood function for set D and class \(C_j\)

\(M\) : number of performance classes

\(m\) : dimension of the feature vector

\(n\) : dimension of the original design parameter vector

\(P_{ij}\) : conditional probability of class \(C_j\) given that the feature vector lies in \(F_i\)

\(Q = \{(f(x_j), g(x_j))\}\) : training data set defined in feature space

\(T\) : decision tree

\(U_i[\{P_{ij}\}, c]\) : expected utility at leaf i given probabilities \(\{P_{ij}\}\) and classifier c(x)

\(u(x)\) : utility of design x

\(u_c(x)\) : utility function associated with classifier c(x)

\(u_{ij}\) : utility of a design in leaf i with performance class j

\(u_0 = u_* - u_{\rm missed}\) : difference in utilities between locating and not locating a class 1 design

\(u_{\rm eval}\) : negative utility associated with the cost of evaluating the performance function

\(u_{\rm missed}\) : negative utility reflecting the opportunity cost of not including a class 1 design

\(u_*\) : utility of locating a class 1 design

\(x\) : n-vector of design parameters

\(\gamma_j\) : thresholds used to define the class performance intervals

\(\pi_i\) : proportion of unsupervised data in decision tree leaf i

\(\Omega\) : space of feasible design parameters

References

  • Bailey R, Bras B, Allen JK (1999) Using robust concept exploration and system dynamics models in the design of complex industrial ecosystems. Eng Optim 32(1):33–58

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth International, Belmont

  • Buntine W (1990) A theory of learning classification rules. PhD thesis, University of Technology, Sydney

  • Buntine W (1992) Learning classification trees. Stat Comput 2:63–73. DOI:10.1007/BF01889584

  • Buzas JS (1997) Instrumental variable estimation in nonlinear measurement error models. Commun Stat Theory Methods 26(12):2861–2877

  • Chen W, Allen JK, Mavris D, Mistree F (1996) A concept exploration method for determining robust top-level specifications. Eng Optim 26:137–158

  • DeGroot MH (1970) Optimal statistical decisions. McGraw-Hill, New York

  • De La Cruz-Mesia R, Marshall G (2003) A Bayesian approach for nonlinear regression models with continuous errors. Commun Stat Theory Methods 32:1631–1646. DOI:10.1081/STA-120022248

  • Devroye L, Gyorfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, Berlin Heidelberg New York

  • Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York

  • Forouraghi B (1999) On utility of inductive learning in multi-objective robust design. Artif Intell Eng Des Anal Manuf 13:27–36. DOI:10.1017/S0890060499131032

  • Gero JS, Kazakov VA (1995) Evolving building blocks for design using genetic engineering. In: IEEE international conference on evolutionary computing, pp 340–345

  • Goldberg DE (1987) Simple genetic algorithms and the minimal, deceptive problem. In: Research notes in artificial intelligence, chapter 6. Morgan Kaufmann, San Francisco, pp 74–88

  • Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading

  • Haq M, Kibria B (1997) Predictive inference for linear and multivariate linear models with MA(1) error processes. Commun Stat Theory Methods 26(2):331–353

  • Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, Berlin Heidelberg New York

  • Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor

  • Igusa T, Liu H, Schafer BW, Naiman DQ (2003) Bayesian classification trees and clustering for rapid generation and selection of design alternatives. In: Reddy RG (ed) NSF design, manufacturing, and industrial innovation research conference, January 6–9, Birmingham

  • Kovacs T (2004) Bibliography of real-world classifier systems applications. In: Bull L (ed) Applications of learning classifier systems. Springer, Berlin Heidelberg New York, pp 300–305

  • Lee J, Hajela P (2001) Application of classifier systems in improving response surface based approximations for design optimization. Comput Struct 79:333–344. DOI:10.1016/S0045-7949(00)00132-2

  • Liu H (2003) Bayesian classifiers for uncertainty modeling with applications to global optimization and solid mechanics problems. PhD thesis, Department of Civil Engineering, Johns Hopkins University, Baltimore

  • Liu H, Motoda H (1998) Feature selection for knowledge discovery and data mining. Kluwer Academic Publishers, Norwell

  • Liu H, Igusa T, Schafer BW (2004) Knowledge-based global optimization of cold-formed steel columns. Thin-Walled Struct 42:785–801. DOI:10.1016/j.tws.2004.01.001

  • Matheus C (1991) The need for constructive induction. In: Machine learning: proceedings of the eighth international workshop, pp 173–177

  • Mili F, Shen W, Martinez I et al (2001) Knowledge modeling for design decisions. Artif Intell Eng 15:153–164. DOI:10.1016/S0954-1810(01)00013-9

  • Myers RH, Khuri AI, Carter WH (1989) Response surface methodology: 1966–1988. Technometrics 31:137–157. DOI:10.2307/1268813

  • Perremans P (1996) Feature-based description of modular fixturing elements: the key to an expert system for the automatic design of the physical fixture. Adv Eng Software 25:19–27. DOI:10.1016/0965-9978(95)00082-8

  • Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo

  • Reckhow KH (1999) Water quality prediction, mechanism, and probability network models. Can J Fish Aquat Sci 56:1150–1158. DOI:10.1139/cjfas-56-7-1150

  • Rosenman MA (1997) The generation of form using an evolutionary approach. In: Dasgupta D, Michalewicz Z (eds) Evolutionary algorithms in engineering applications. Springer, Berlin Heidelberg New York, pp 69–86

  • Rudnyi EB (1996) Statistical model of systematic errors: linear error model. Chemom Intell Lab Syst 34:41–54. DOI:10.1016/0169-7439(96)00004-4

  • Salustri FA, Venter RD (1992) An axiomatic theory of engineering design information. Eng Comput 8:197–211. DOI:10.1007/BF01194322

  • Schwabacher M, Ellman T, Hirsh H (1998) Learning to set up numerical optimizations of engineering designs. Artif Intell Eng Des Anal Manuf 12:173–192. DOI:10.1017/S0890060498122084

  • Stahovich TF, Bal H (2002) An inductive approach to learning and reusing design strategies. Res Eng Des 13:109–121. DOI:10.1007/s00163-001-0010-9

  • Varadarajan S, Chen W, Pelka CJ (2000) Robust concept exploration of propulsion systems with enhanced model approximation capabilities. Eng Optim 32:309–334

  • Witten IH, Frank E (2000) Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco

  • Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82. DOI:10.1109/4235.585893

  • Wyse N, Dubes R, Jain A (1980) A critical evaluation of intrinsic dimensionality algorithms. In: Pattern recognition in practice. Morgan Kaufmann, San Francisco, pp 415–425

Acknowledgments

This material is based upon work supported by the National Science Foundation at the Johns Hopkins University under Grant Number DMI-0087032. This research support is gratefully acknowledged. The authors would also like to thank the reviewers for their insightful comments.

Author information

Correspondence to T. Igusa.

Appendix

1.1 Derivation details for the generalized heuristic

The difference between the minimum entropy and expected utility heuristics for tree construction can be reconciled by examining the incremental value of information. The basic idea is to use Bayesian analysis to compare the value of the information obtained in the cases of finite and infinite supervised data. We begin by defining \(n_{ij}\) and \(P_{ij} = n_{ij}/N_i\) as the number and observed proportion of class j designs in leaf i, where \(N_i = \sum_{j=1}^{M} n_{ij}\) is the total number of data in leaf i. If standard Bayesian conjugate distributions are used, then the posterior distribution of the actual proportions \(\pi_{ij}\) of class j designs in leaf i would be Dirichlet with joint density (DeGroot 1970)

$$\phi_{i} {\left({{\left\{{\pi_{{ij}}}\right\}}}\right)} = \Gamma {\left({N_{i}}\right)}{\prod\limits_{j = 1}^M}\frac{{\pi^{{n_{{ij}} - 1}}_{{ij}}}}{{\Gamma {\left({n_{{ij}}}\right)}}}$$
(A1)

The mean value of \(\pi_{ij}\) is approximately equal to the observed proportion \(P_{ij}\), and the standard deviation of \(\pi_{ij}\), which measures the uncertainty in the actual proportion, is inversely proportional to the square root of the number of data \(N_i\). The expected utility \(U_i\) for leaf i given the \(N_i\) data is

$$E{\left[ {u_{i}} | c,N_{i}\;\hbox{data} \right]} = U_{i} {\left[ {{\left\{{P_{{ij}}}\right\}},c} \right]}$$
(A2)

where the functional form of \(U_i\) is given in (14) in terms of the classifier c.
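
For intuition, the posterior in (A1) is easy to probe numerically. The short Python sketch below (with purely illustrative counts) draws \(\{\pi_{ij}\}\) from the Dirichlet distribution implied by the counts \(n_{ij}\) and confirms the statement above: the posterior mean tracks \(P_{ij}\) and the spread shrinks like \(1/\sqrt{N_i}\).

    import numpy as np

    n_ij = np.array([12, 5, 3])              # illustrative class counts n_ij for one leaf i
    N_i = n_ij.sum()
    P_ij = n_ij / N_i                        # observed proportions

    rng = np.random.default_rng(1)
    pi = rng.dirichlet(n_ij, size=100_000)   # posterior draws of {pi_ij}, per (A1)

    print(P_ij)                              # observed proportions
    print(pi.mean(axis=0))                   # posterior means, close to P_ij
    print(pi.std(axis=0))                    # posterior spread, of order 1 / sqrt(N_i)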

Next, we consider the effect of additional data on classifier accuracy. What is needed here is an explicit decision rule for redefining the classifier c as data are collected. Equations 15 and 16 are two examples of such explicit rules, in which the observed proportions \(P_{ij}\) are recomputed and the classifiers \(c_{\max}\) and \(c_*\) are redefined using the new values of \(P_{ij}\). The topology and the binary relations defining the tree T and its leaves are fixed at this stage; improvements to the tree are considered later. Given a decision rule for redefining c, it can be shown that, in the limit of infinite data, the expected utility for leaf i would be

$$E{\left[ {u_{i}} | c,\infty\;\hbox{data} \right]} = {\int\limits_0^1} \cdots {\int\limits_0^1}U_{i} {\left[ {{\left\{{\pi_{{ij}}}\right\}},c} \right]}\phi_{i} {\left({{\left\{{\pi_{{ij}}}\right\}}}\right)}{\rm d}\pi_{{i1}} \cdots {\rm d}\pi_{{iM}} $$
(A3)

It follows from the definitions that, as the number of data \(N_i\) in leaf i becomes large, the preceding expected utilities must agree:

$$E{\left[u_{i} | c,\infty\;\hbox{data} \right]} - E{\left[ {u_{i}} | c,N_{i}\;\hbox{data}\right]} \to 0 \quad \hbox{as}\;N_{i} \to \infty $$
(A4)
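
The limit in (A4) can be checked by Monte Carlo. The sketch below approximates the integral in (A3) by averaging over Dirichlet draws; as a stand-in for the utility in (14), which is not reproduced in this excerpt, it uses a simple 0-1 utility (credit 1 when the class assigned by the re-optimized \(c_{\max}\) matches the true class), so the numbers are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    P = np.array([0.5, 0.3, 0.2])                   # observed proportions P_ij in one leaf i

    def utility_gap(N_i, n_draws=200_000):
        """E[u_i | c_max, infinite data] - E[u_i | c_max, N_i data] under a 0-1 utility:
        with N_i data the classifier is fixed at argmax_j P_ij, whereas in the
        infinite-data integral (A3) it is re-optimized for each draw of {pi_ij}."""
        counts = np.maximum(np.rint(N_i * P), 1.0)  # pseudo-counts n_ij for the Dirichlet posterior
        pi = rng.dirichlet(counts, size=n_draws)
        return pi.max(axis=1).mean() - P.max()

    for N in (10, 40, 160, 640):
        print(N, round(float(utility_gap(N)), 4))   # the gap shrinks as N_i grows, cf. (A4)

The decay of this gap with \(N_i\) is the behavior that (A5) quantifies for the maximum likelihood classifier.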

In the special case where the maximum likelihood classifier \(c_{\max}\) in (15) is used, it can be shown, by a slight generalization of the arguments presented in Buntine (1990), that the difference in the preceding expected utilities has the following asymptotic result:

$$E{\left[ u_{i} | c_{{\max}} ,\infty\;\hbox{data} \right]} - E{\left[ {u_{i}} | c_{{\max}}, N_{i}\;\hbox{data} \right]} \sim \exp {\left({- \lambda_{i} N_{i}}\right)}$$
(A5)

This shows that the difference in expected utilities decays exponentially with the number of data \(N_i\). The exponential rate \(\lambda_i\) is

$$\frac{{\lambda_{i}}}{{\log 2}} = - I_{i} {\left({{\left\{{P_{{ij}}}\right\}}}\right)} + 1$$
(A6)

If the product over all leaves of the differences in expected utilities is examined, then the following asymptotic result is obtained:

$${\prod\limits_{i = 1}^L}{\left({E{\left[ {u_{i}} |c_{{\max}}, \infty\;\hbox{data} \right]} - E{\left[ {u_{i}} | c_{{\max}}, N_{i}\;\hbox{data} \right]}}\right)} \sim \exp {\left({- \lambda N}\right)}$$
(A7)

where \(N = \sum_{i=1}^{L} N_i\) is the total number of supervised data over all L leaves and the exponential rate is

$$\frac{\lambda}{{\log 2}} = - I + 1$$
(A8)
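
The step from (A6) and (A7) to (A8) is immediate once the product is written as a single exponential. With the average information entropy I taken as the \(N_i\)-weighted average of the leaf entropies (assumed here to match the definition used in (10)),

$$\prod\limits_{i = 1}^L \exp\left(- \lambda_i N_i\right) = \exp\left(- \sum\limits_{i = 1}^L \lambda_i N_i\right) = \exp\left(- \lambda N\right), \qquad \frac{\lambda}{\log 2} = \sum\limits_{i = 1}^L \frac{N_i}{N}\,\frac{\lambda_i}{\log 2} = - \sum\limits_{i = 1}^L \frac{N_i}{N}\,I_i\left(\left\{P_{ij}\right\}\right) + 1 = - I + 1 .$$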

The preceding equations provide useful relationships between the expected utility of the maximum likelihood classifier and the information entropy.

In the next stage of the analysis, changes in the tree T are considered. Following the information-theoretic viewpoint, the accuracy of the tree is measured by how fully it captures the information in the supervised data. With this viewpoint, the difference of utilities in (A7) is a measure of the information in the N data that is not incorporated into the tree. To increase the accuracy of the tree, it is necessary to make the rate λ as large as possible. As shown in (A8), this is equivalent to minimizing the average information entropy I. Hence, the information entropy heuristic can be tied to a Bayesian analysis of the utility of information.

With these results, we are now able to obtain a tree construction heuristic based on the utility function in Sect. 4.1 for the generation of design alternatives. Starting from the Bayesian analysis results in Eqs. A1–A8, the main task is to determine whether an exponential rate \(\lambda_i\) exists when the maximum likelihood classifier \(c_{\max}\) in (A5) is replaced by the classifier \(c_*\) for generating design alternatives. With some approximation, it can be shown that the exponential form remains valid for \(c_*\); the result is

$$\frac{{\lambda_{i}}}{{\log 2}} \approx - I_{i} {\left({{\left\{{{P}^{\prime}_{{ij}}}\right\}}}\right)} + \hbox{constant}$$
(A9)

where the \(P^{\prime}_{ij}\) are given by

$$P^{\prime}_{ij} = \left\{ \begin{array}{ll} u_{0}\, P_{i1} \left[ 1 + \left( u_{0} - 1 \right) P_{i1} \right]^{-1} & \hbox{if}\;j = 1 \\ P_{ij} \left[ 1 + \left( u_{0} - 1 \right) P_{i1} \right]^{-1} & \hbox{otherwise} \end{array} \right.$$
(A10)

with \(u_0\) defined after (16). The revised proportion \(P^{\prime}_{i1}\) can be interpreted as the proportion of class 1 designs in leaf i when the number of class 1 designs is weighted by the utility-related value \(u_0\).
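
As a quick sanity check of (A10) (with illustrative values of \(P_{ij}\) and \(u_0\)), the reweighted proportions remain normalized and the class 1 share grows with \(u_0\):

    import numpy as np

    def reweight(P_i, u0):
        """Apply (A10): weight the class 1 proportion by u0 and renormalize."""
        denom = 1.0 + (u0 - 1.0) * P_i[0]          # common denominator 1 + (u0 - 1) P_i1
        P_prime = P_i / denom
        P_prime[0] *= u0
        return P_prime

    P_i = np.array([0.2, 0.5, 0.3])                # observed proportions in one leaf i
    for u0 in (1.0, 2.0, 5.0):
        Pp = reweight(P_i, u0)
        print(u0, Pp.round(3), round(float(Pp.sum()), 6))   # each row sums to 1; class 1 share grows with u0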

Cite this article

Liu, H., Igusa, T. Feature-based classifiers for design optimization. Res Eng Design 17, 189–206 (2007). https://doi.org/10.1007/s00163-006-0024-4