Abstract
Unlocking the mystery of natural phenomena is a universal objective in scientific research. The rules governing a phenomenon can most often be learned by observing it under a sufficiently large number of conditions at sufficiently high resolution. The general knowledge discovery process is not always easy or efficient, and even if knowledge is produced it may be hard to understand, interpret, validate, remember, and use. Monotonicity is a pervasive property in nature: it applies when each predictor variable has a nonnegative effect on the phenomenon under study. Because of the monotonicity property, an observer who can choose the conditions under which the phenomenon is observed may increase the accuracy and completeness of the knowledge faster than a passive observer, who may not receive the relevant pieces of the puzzle soon enough. This scenario can be thought of as learning by successively submitting queries to an oracle that responds with a Boolean value (the phenomenon is present or absent). In practice, the oracle may take the shape of a human expert, or it may be the outcome of performing tasks such as running experiments or searching large databases. Our main goal is to pinpoint the queries that minimize the total number of queries used to completely reconstruct all of the underlying rules defined on a given finite set of observable conditions V = {0,1}^n. We summarize the optimal query selections in the simple form of selection criteria, which are near optimal and take only polynomial time (in the number of conditions) to compute. Extensive unbiased empirical results show that the proposed selection criterion approach is far superior to any of the existing methods. In fact, the average number of queries is reduced exponentially in the number of variables n and more than exponentially in the oracle’s error rate.
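The query-based setting described above can be illustrated with a minimal Python sketch. It assumes an error-free oracle and uses a simple balancing heuristic chosen only for illustration: query the unclassified vector whose answer is guaranteed to classify the most vectors in the worst case. This heuristic is an assumption and is not claimed to be the selection criterion proposed in the chapter; the names `leq`, `infer_monotone`, and `oracle` are hypothetical.

```python
from itertools import product


def leq(a, b):
    """Componentwise partial order on {0,1}^n: True if a <= b."""
    return all(x <= y for x, y in zip(a, b))


def infer_monotone(n, oracle):
    """Reconstruct a monotone Boolean function on {0,1}^n by querying `oracle`.

    Assumes an error-free oracle (an illustrative simplification).
    Returns (labels, queries): the inferred value of every vector and the
    number of oracle calls that were needed.
    """
    labels = {}
    unknown = set(product((0, 1), repeat=n))
    queries = 0

    def score(v):
        # Illustrative heuristic: worst-case number of vectors classified.
        up = sum(1 for w in unknown if leq(v, w))    # settled by a 1 answer
        down = sum(1 for w in unknown if leq(w, v))  # settled by a 0 answer
        return min(up, down)

    while unknown:
        # Sorting makes tie-breaking deterministic.
        v = max(sorted(unknown), key=score)
        value = bool(oracle(v))
        queries += 1
        if value:
            # Monotonicity: every vector above a 1-vector is also 1.
            implied = {w for w in unknown if leq(v, w)}
        else:
            # Monotonicity: every vector below a 0-vector is also 0.
            implied = {w for w in unknown if leq(w, v)}
        for w in implied:
            labels[w] = value
        unknown -= implied
    return labels, queries
```

For example, calling `infer_monotone(3, lambda v: v[0] or (v[1] and v[2]))` recovers the function on all eight vectors while querying only some of them, because each answer also settles every vector that monotonicity forces to the same value.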
Triantaphyllou, E. and G. Felici (Eds.), Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, Massive Computing Series, Springer, Heidelberg, Germany, pp. 149–192, 2006.
Copyright information
© 2006 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Torvik, V.I., Triantaphyllou, E. (2006). Discovering Rules That Govern Monotone Phenomena. In: Triantaphyllou, E., Felici, G. (eds) Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/0-387-34296-6_4
DOI: https://doi.org/10.1007/0-387-34296-6_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34294-8
Online ISBN: 978-0-387-34296-2
eBook Packages: Computer Science, Computer Science (R0)