Abstract
We present a design optimization method for systems with high-dimensional parameter spaces using inductive decision trees. The essential idea is to map designs into a relatively low-dimensional feature space, and to derive a classifier to search for high-performing design alternatives within this space. Unlike learning classifier systems that were pioneered by Holland and Goldberg, classifiers defined by inductive decision trees were not originally developed for design optimization. In this paper, we explore modifications to such classifiers to make them more effective in the optimization problem. We expand the notions of feature space, generalize the tree construction heuristic beyond the original information-theoretic definitions, increase the reliance on domain expertise, and facilitate the transfer of design knowledge between related systems. There is a relatively small but rapidly growing body of work in the use of inductive trees for engineering design; the method presented herein is complementary to this research effort.
Notes
All columns have the same length and cross-section properties; the beams are 50% longer than the columns with a second moment of cross-section area that is twice that of the columns; the same elastic material is used for all beams and columns.
C_j^n is the binomial coefficient of n and j, and the reduction by the factor of 2 is due to symmetry.
Standard finite-element analysis is used to compute g.
We can set γ_0 = ∞ and γ_M = −∞ so that classes C_1 and C_M would correspond to the highest and lowest performing designs.
Here \({\overline g }\) and s_g are the sample average and standard deviation of the performance values g(x_j) of the training data set, and the multiplier of 0.8 was chosen so that the three classes contained approximately equal numbers of supervised data.
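A minimal sketch of this thresholding (the function name and data are illustrative; only the \({\overline g } \pm 0.8 s_g\) thresholds and the three-class split come from the note above):

```python
import statistics

def assign_classes(g_values, k=0.8):
    """Split performance values into three classes using thresholds
    g_bar +/- k * s_g, chosen so the classes are roughly equally
    populated. Class 1 (C_1) holds the highest-performing designs,
    consistent with gamma_0 = +inf and gamma_M = -inf."""
    g_bar = statistics.mean(g_values)
    s_g = statistics.stdev(g_values)
    hi, lo = g_bar + k * s_g, g_bar - k * s_g
    classes = []
    for g in g_values:
        if g > hi:
            classes.append(1)   # C_1: high performance
        elif g > lo:
            classes.append(2)   # C_2: intermediate performance
        else:
            classes.append(3)   # C_3: low performance
    return classes
```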
This is also known as the naive Bayes classifier and is based on the maximum likelihood principle.
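A sketch of this leaf-level maximum likelihood rule, assuming the class counts n_ij at a leaf are available (function name and data layout are illustrative):

```python
def c_max(leaf_counts):
    """Maximum likelihood (naive Bayes) leaf classifier: assign the
    class j with the largest observed proportion P_ij = n_ij / N_i.
    leaf_counts: list where entry j-1 is the count n_ij for class j."""
    total = sum(leaf_counts)
    if total == 0:
        raise ValueError("leaf has no supervised data")
    # argmax over P_ij; ties resolve to the lower class index
    best_j = max(range(len(leaf_counts)), key=lambda j: leaf_counts[j] / total)
    return best_j + 1  # classes are 1-indexed
```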
For relatively small m, the knowledge abstraction level and the associated evaluation effort of the feature functions are usually high; the corresponding features are termed high-level features (as opposed to low-level features). It is preferable, however, to separate any quantities that require substantial evaluation effort from the feature coordinates and to treat them separately in the knowledge modeling process. For instance, low-fidelity approximate models g_approx(x), which are simpler than the original performance function g(x), may still be too complex to satisfy the simplicity attribute for the feature vector. It is shown in Sect. 4.2 how g_approx(x) can be used in the design problem.
This is simply the ratio of the column height over width. Other measures of slenderness will depend on the type of column being examined. For thin-walled steel members, such measures would be in terms of the first and second moment of cross-section area and other aggregated dimensional quantities (Liu et al. 2004).
The domain expert that we use is a novice structural engineer with knowledge of frame behavior gained primarily from a graduate-level course in structural mechanics. This level of expertise is sufficient for meaningful interaction with the design process as illustrated in Fig. 1.
For instance, reduced-error pruning repeatedly analyzes the classifier resolution, quantified by an average information entropy, at each non-leaf node. If the resolution of the subtree with the non-leaf node as the root is not much higher than that of the trivial subtree where the same node is replaced by a leaf, then the tree is pruned at this node.
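The comparison described above can be sketched as follows (a simplified illustration, not the authors' implementation; the tolerance and the data layout are assumptions):

```python
from math import log2

def entropy(counts):
    """Information entropy I_i = -sum_j P_ij log2 P_ij of one leaf."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def should_prune(leaf_counts_list, tol=0.01):
    """Entropy-based pruning test at a non-leaf node: compare the
    count-weighted average entropy of the subtree's leaves against
    the entropy of the single merged leaf that would replace the
    subtree, and prune when the resolution gain is below tol.
    leaf_counts_list: per-leaf class counts [n_i1, ..., n_iM]."""
    total = sum(sum(c) for c in leaf_counts_list)
    avg_subtree = sum(sum(c) / total * entropy(c) for c in leaf_counts_list)
    merged = [sum(col) for col in zip(*leaf_counts_list)]
    return entropy(merged) - avg_subtree < tol
```

A subtree that cleanly separates the classes is kept, while one whose leaves mirror the parent's class mixture is collapsed.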
It may be necessary to adjust the performance thresholds used in (2) so that the expanded training data would be more evenly distributed among the classes C_j.
In this method, it is assumed that the inflection points are at the midpoint of every beam and column, except for the lower columns.
The thresholds that define the classes C′_j used in the second system would have to be adjusted so that class 1 would still represent high-performing designs.
Our classifier approach has also been successfully applied to a much less intuitive design problem involving thin-walled steel columns (Liu et al. 2004). It is noted that the emphasis of that paper was on an exploration of a new nonlinear model for cold-formed steel columns; only a brief outline of a simpler form of the classifier approach was given.
Abbreviations
- C_j : class j defined in terms of a performance interval
- C_e,neg : class of large negative errors
- C_e,pos : class of large positive errors
- C_e,small : class of small errors
- c(x) : classifier, defined for each parameter vector x
- c_exact(x) : exact classifier
- c_max(x) : maximum likelihood (naive Bayes) classifier
- c*(x) : optimum classifier
- D ⊂ Ω : subset of designs in parameter space
- D_K ⊂ Ω : designs corresponding to leaves K of a decision tree
- D* : set of high-performing design alternatives
- e(x) = g(x) − g_approx(x) : error in the approximate performance function
- F : space of all possible feature vectors
- F*_e, D*_e : designs in feature and parameter spaces corresponding to a high likelihood of large positive errors
- {F_i} : partition of feature space
- f_i(x) : ith feature function
- f = {f_1(x) ... f_m(x)} : feature vector of all m feature functions
- f ∈ G_k : binary expression in terms of the feature vector
- g(x) : performance measure
- g_approx(x) : approximation for the performance function
- I : average information entropy
- I_i : information entropy of leaf i
- j(i) : class assigned to leaf i of a decision tree
- K ⊂ {1, ..., L} : decision tree leaf indices associated with class 1 designs
- l_j(D) : class likelihood function for set D and class C_j
- M : number of performance classes
- m : dimension of the feature vector
- n : dimension of the original design parameter vector
- P_ij : conditional probability of class C_j given that the feature vector lies in F_i
- Q = {(f(x_j), g(x_j))} : training data set defined in feature space
- T : decision tree
- U_i[{P_ij}, c] : expected utility at leaf i given probabilities {P_ij} and classifier c(x)
- u(x) : utility of design x
- u_c(x) : utility function associated with classifier c(x)
- u_ij : utility of a design in leaf i with performance class j
- u_0 = u* + u_missed : difference in utilities between locating and not locating a class 1 design
- −u_eval : negative utility associated with the cost of evaluating the performance function
- −u_missed : negative utility reflecting the opportunity cost of not including a class 1 design
- u* : utility of locating a class 1 design
- x : n-vector of design parameters
- γ_j : thresholds used to define class performance intervals
- π_i : proportion of unsupervised data in decision tree leaf i
- Ω : space of feasible design parameters
References
Bailey R, Bras B, Allen JK (1999) Using robust concept exploration and system dynamics models in the design of complex industrial ecosystems. Eng Optim 32(1):33–58
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth International, Belmont
Buntine W (1990) A theory of learning classification rules, PhD thesis, University of Technology, Sydney
Buntine W (1992) Learning classification trees. Stat Comput 2:63–73, DOI:10.1007/BF01889584
Buzas JS (1997) Instrumental variable estimation in nonlinear measurement error models. Commun Stat Theory Methods 26(12):2861–2877
Chen W, Allen JK, Mavris D, Mistree F (1996) A concept exploration method for determining robust top-level specifications. Eng Optim 26:137–158
DeGroot MH (1970) Optimal statistical decisions. McGraw-Hill, New York
De La Cruz-Mesia R, Marshall G (2003) A Bayesian approach for nonlinear regression models with continuous errors. Commun Stat Theory Methods 32:1631–1646, DOI:10.1081/STA-120022248
Devroye L, Gyorfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, Berlin Heidelberg New York
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York
Forouraghi B (1999) On utility of inductive learning in multi-objective robust design. Artif Intell Eng Des Anal Manuf 13:27–36, DOI:10.1017/S0890060499131032
Gero JS, Kazakov VA (1995) Evolving building blocks for design using genetic engineering. In: IEEE international conference on evolutionary computing, pp 340–345
Goldberg DE (1987) Simple genetic algorithm and the minimal deceptive problem. In: Research notes in artificial intelligence, chapter 6. Morgan Kaufmann Publishers, Inc, San Francisco, pp 74–88
Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning, Addison-Wesley, Reading
Haq M, Kibria B (1997) Predictive inference for linear and multivariate linear models with MA (1) error processes. Commun Stat Theory Methods 26(2):331–353
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, Berlin Heidelberg New York
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
Igusa T, Liu H, Schafer BW, Naiman DQ (2003) Bayesian classification trees and clustering for rapid generation and selection of design alternatives. In: Reddy RG (ed) NSF design, manufacturing, and industrial innovation research conference, January 6–9, Birmingham
Kovacs T (2004) Bibliography of real-world classifier systems applications. In: Bull L (ed) Applications of learning classifier systems. Springer, Berlin Heidelberg New York, pp 300–305
Lee J, Hajela P (2001) Application of classifier systems in improving response surface based approximations for design optimization. Comput Struct 79:333–344, DOI:10.1016/S0045-7949(00)00132-2
Liu H (2003) Bayesian classifiers for uncertainty modeling with applications to global optimization and solid mechanics problems, PhD thesis, Department of Civil Engineering, Johns Hopkins University, Baltimore
Liu H, Motoda H (1998) Feature selection for knowledge discovery and data mining. Kluwer Academic Publishers, Norwell
Liu H, Igusa T, Schafer BW (2004) Knowledge-based global optimization of cold-formed steel columns. Thin-Walled Struct 42:785–801, DOI:10.1016/j.tws.2004.01.001
Matheus C (1991) The need for constructive induction. In: Machine learning: proceedings of the eighth international workshop, pp 173–177
Mili F, Shen W, Martinez I, et al. (2001) Knowledge modeling for design decisions. Artif Intell Eng 15:153–164, DOI:10.1016/S0954-1810(01)00013-9
Myers RH, Khuri AI, Carter WH (1989) Response surface methodology: 1966–1988. Technometrics 31:137–157, DOI:10.2307/1268813
Perremans P (1996) Feature-based description of modular fixturing elements: The key to an expert system for the automatic design of the physical fixture. Adv Eng Software 25:19–27, DOI:10.1016/0965-9978(95)00082-8
Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo
Reckhow KH (1999) Water quality prediction, mechanism, and probability network models. Canad J Fish Aquat Sci 56:1150–1158, DOI:10.1139/cjfas-56-7-1150
Rosenman MA (1997) The generation of form using an evolutionary approach. In: Dasgupta D, Michalewicz Z (eds) Evolutionary algorithms in engineering applications. Springer, Berlin Heidelberg New York, pp 69–86
Rudnyi EB (1996) Statistical model of systematic errors: linear error model. Chemometrics and intelligent laboratory systems 34:41–54, DOI:10.1016/0169-7439(96)00004-4
Salustri FA, Venter RD (1992) An axiomatic theory of engineering design information. Eng Comput 8:197–211, DOI:10.1007/BF01194322
Schwabacher M, Ellman T, Hirsh H (1998) Learning to set up numerical optimizations of engineering designs. Artif Intell Eng Des Anal Manuf 12:173–192, DOI:10.1017/S0890060498122084
Stahovich TF, Bal H (2002) An inductive approach to learning and reusing design strategies. Res Eng Des 13:109–121, DOI:10.1007/s00163-001-0010-9
Varadarajan S, Chen W, Pelka CJ (2000) Robust concept exploration of propulsion systems with enhanced model approximation capabilities. Eng Optim 32:309–334
Witten IH, Frank E (2000) Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann Publishers, San Francisco
Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82, DOI:10.1109/4235.585893
Wyse N, Dubes R, Jain A (1980) A critical evaluation of intrinsic dimensionality algorithms. In: Pattern recognition in practice. Morgan Kaufmann Publishers, San Francisco, pp 415–425
Acknowledgments
This material is based upon work supported by the National Science Foundation at the Johns Hopkins University under Grant Number DMI-0087032. This research support is gratefully acknowledged. The authors would also like to thank the reviewers for their insightful comments.
Appendix
1.1 Derivation details for the generalized heuristic
The difference between the minimum entropy and expected utility heuristics for tree construction can be reconciled by examining the incremental value of information. The basic idea is to use Bayesian analysis to compare the value of information obtained in the cases of finite and infinite supervised data. We begin by defining n_ij and P_ij = n_ij/N_i as the number and observed proportion of class j designs in leaf i, where N_i = ∑_{j=1}^{M} n_ij is the total number of data in leaf i. If standard Bayesian conjugate distributions are used, then the posterior distribution of the actual proportions π_ij of class j designs in leaf i would be Dirichlet with joint density (DeGroot 1970)
The mean value of π_ij is approximately equal to the observed proportion P_ij, and the standard deviation of π_ij, which is a measure of the uncertainty in the actual proportion, is inversely proportional to the square root of the number of data N_i. The expected utility U_i for leaf i given the N_i data is
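As a sketch, assuming a uniform prior over the proportions (so the posterior Dirichlet parameters are the counts n_ij plus one), the joint density takes the standard conjugate form

```latex
p(\pi_{i1}, \ldots, \pi_{iM} \mid n_{i1}, \ldots, n_{iM})
  \propto \prod_{j=1}^{M} \pi_{ij}^{\,n_{ij}},
\qquad \pi_{ij} \ge 0, \quad \sum_{j=1}^{M} \pi_{ij} = 1,
```

with mean \(E[\pi_{ij}] = (n_{ij}+1)/(N_i+M) \approx P_{ij}\) and standard deviation of order \(1/\sqrt{N_i}\), consistent with the statements that follow.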
where the functional form of U i is given in (14) in terms of the classifier c.
Next, we consider the effect of additional data on classifier accuracy. What is needed here is an explicit decision rule for redefining the classifier c as data are collected. Equations 15 and 16 are two examples of such explicit rules, in which the observed proportions P_ij are recomputed and the classifiers c_max and c* are redefined using the new values of P_ij. The topology and the binary relations defining the tree T and its leaves are fixed at this stage; improvements to the tree are considered later. Given a decision rule for redefining c, it can be shown that, in the limit of infinite data, the expected utility for leaf i would be
It follows from the definitions that, as the number of data N_i in leaf i becomes large, the preceding expected utilities must agree:
In the special case where the maximum likelihood classifier c_max in (15) is used, it can be shown, by a slight generalization of the arguments presented by Buntine (1990), that the difference in the preceding expected utilities has the following asymptotic result:
This shows that the difference in expected utilities decays exponentially with the number of data N_i. The exponential rate λ_i is
If the product of the differences in expected utilities is examined, then the following asymptotic result is obtained
where N = ∑_{i=1}^{L} N_i is the total number of supervised data over all L leaves and the exponential rate is
The preceding equations provide useful relationships between the expected utility of the maximum likelihood classifier and the information entropy.
In the next stage of the analysis, changes in the tree T are considered. From the information-theoretic viewpoint, the accuracy of the tree is measured by how fully it captures the information in the supervised data. With this viewpoint, the difference of utilities in (A7) is a measure of the information in the N data that is not incorporated into the tree. To increase the accuracy of the tree, it is necessary to make the rate λ as large as possible. As shown in (A8), this is equivalent to minimizing the information entropy. Hence, the information entropy heuristic can be tied to a Bayesian analysis of the utility of information.
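The minimum-entropy quantity discussed above can be sketched numerically (an illustration, assuming per-leaf class counts are available; I and I_i follow the abbreviations list):

```python
from math import log2

def average_entropy(leaf_counts_list):
    """Average information entropy I = sum_i (N_i / N) * I_i over the
    leaves of a candidate tree, where I_i = -sum_j P_ij log2 P_ij.
    The tree-construction heuristic selects the split that minimizes
    I, i.e., maximizes the rate at which the residual utility gap
    between finite- and infinite-data classifiers decays."""
    N = sum(sum(c) for c in leaf_counts_list)
    I = 0.0
    for counts in leaf_counts_list:
        N_i = sum(counts)
        I_i = -sum((n / N_i) * log2(n / N_i) for n in counts if n > 0)
        I += (N_i / N) * I_i
    return I
```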
With these results, we are now able to obtain a tree construction heuristic based on the utility function in Sect. 4.1 for the generation of design alternatives. Starting from the Bayesian analysis results in Eqs. A1–A8, the main task is to determine if an exponential rate λ_i exists when the maximum likelihood classifier c_max in (A5) is replaced by the classifier c* for generating design alternatives. With some approximation, it can be shown that the exponential rate remains valid for c*; the result is
where P′_ij are given by
with u_0 defined after (16). The revised proportion P′_i1 can be interpreted as the proportion of class 1 designs at leaf i if the number of class 1 designs is weighted by the utility-related value u_0.
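A reconstruction consistent with this interpretation (a sketch only; the published expression may differ in detail):

```latex
P'_{i1} = \frac{u_0\, n_{i1}}{\,u_0\, n_{i1} + \sum_{j=2}^{M} n_{ij}\,}
```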
Cite this article
Liu, H., Igusa, T. Feature-based classifiers for design optimization. Res Eng Design 17, 189–206 (2007). https://doi.org/10.1007/s00163-006-0024-4