INTERACTIVE LEARNING OF MONOTONE BOOLEAN FUNCTIONS

Last revision: October 21, 1995

BORIS KOVALERCHUK*, EVANGELOS TRIANTAPHYLLOU*,1, ANIRUDDHA A. DESHPANDE*, AND EUGENE VITYAEV**

* Department of Industrial and Manufacturing Systems Engineering, 3128 CEBA Building, Louisiana State University, Baton Rouge, LA 70803-6409, U.S.A. E-mail: borisk@unix1.sncc.lsu.edu, ietrian@lsuvm.sncc.lsu.edu, or adeshpa@unix1.sncc.lsu.edu

** Institute of Mathematics, Russian Academy of Science, Novosibirsk, 630090, Russia. E-mail: vityaev@math.nsk.su

1 The first two authors gratefully acknowledge the support from the Office of Naval Research (ONR) grant N00014-95-1-0639.

Abstract: This paper presents some optimal interactive algorithms for learning a monotone Boolean function. These algorithms are based on the fundamental Hansel lemma (Hansel, 1966). The advantage of the algorithms is that they are not heuristics, as is often the case with many known algorithms for general Boolean functions, but they are optimal in the sense of the Shannon function. This paper also formulates a new problem for the joint restoration of two nested monotone Boolean functions f1 and f2. This formulation allows further decreasing the dialogue with an expert and restoring non-monotone functions of the form f1 & ¬f2. The effectiveness of the proposed approach is demonstrated by some illustrative computational experiments related to engineering and medical applications.

Key Words: Learning from Examples, Boolean Functions, Monotone Boolean Functions, Shannon Function.

INTRODUCTION

An interesting problem in machine learning is the one which involves the inference of a Boolean function from collections of positive and negative examples. This is also called the logical analysis problem (Hammer and Boros, 1994) and is a special case of inductive inference. This kind of knowledge extraction is desirable when one is interested in deriving a set of rules which, in turn, can be easily comprehended by a field expert. In many application domains the end users do not have sophisticated computer and modeling expertise. As a result, systems which are based on techniques such as neural networks, statistics, or linear programming are not appealing to them, nor do these methods provide a plausible explanation of the underlying decision-making process. On the other hand, a logical analysis approach, when it is applicable, can result in rules which are already known to the end user (thus increasing his/her confidence in the method) or lead to new discoveries. In other words, a logical analysis approach has the distinct advantage of high comprehensibility when it is compared with other methods, and can lead to the discovery of new explicit knowledge.

The most recent advances in distinguishing between elements of two pattern sets can be classified into six distinct categories. These are: a clause satisfiability approach to inductive inference by Kamath et al. (1992, 1994); some modified branch-and-bound approaches of generating a small set of logical rules by Triantaphyllou et al. (1994), and Triantaphyllou (1994); some improved polynomial time and NP-complete cases of Boolean function decomposition by Boros et al. (1994); linear programming approaches by Wolberg and Mangasarian (1990), and Mangasarian (1995); some knowledge based learning approaches combining symbolic and connectionist (neural networks) machine based learning, as proposed by Shavlik (1994), Fu (1993), Goldman et al. (1994) and Cohn et al. (1994); and, finally, some nearest neighbor classification approaches by Hattori and Torii (1993),
Kurita (1991), and Kamgar-Parsi and Kanal (1985). From the above six categories, the first three can be considered as logical analysis approaches, since they deal with the inference of Boolean functions.

The general approach of pure machine learning and inductive inference includes the following two steps: (i) obtaining in advance a sufficient number of examples (vectors) for different classes of observations, and (ii) formulating assumptions about the required mathematical structure of the example population (see, for instance, Bongard (1967), Zagoruiko (1979), Dietterich and Michalski (1983), and Vityaev and Moskvitin (1993)). Human interaction is used just when one obtains new examples and formulates the assumptions.

The general problem of learning a Boolean function has many applications. Such applications can be found in the areas of medical diagnosis, hardware diagnosis, astrophysics and finance, among others, as is best demonstrated by the plethora of databases in the Machine Learning Repository at the University of California, Irvine (Murphy and Aha, 1994). However, traditional machine learning approaches have some difficulties. In particular, the size of the hypothesis space is influential in determining the sample complexity of a learning algorithm, that is, the expected number of examples needed to accurately approximate a target concept. The presence of bias in the selection of a hypothesis from the hypothesis space can be beneficial in reducing the sample complexity of a learning algorithm (Mitchell, 1980), (Natarajan, 1989) and (Goldman and Sloan, 1992). Usually the amount of bias in the hypothesis space H is measured in terms of the Vapnik-Chervonenkis dimension, denoted as VCdim(H) (Vapnik, 1982) and (Haussler, 1988). Theoretical results regarding VCdim(H) are well known (Vapnik, 1982). The results in (Vapnik, 1982) are still better than some other bounds given in (Blumer et al., 1989). However, all these bounds are still overestimates (Haussler and Warmuth, 1993).

The learning problem examined in this paper is how one can infer a Boolean function. We assume that initially some Boolean vectors (input examples) are available. These vectors are defined in the space {0,1}^n, where n is the number of binary attributes or atoms. Each such vector represents either a positive or a negative example, depending on whether it must be accepted or rejected by the target Boolean function. In an interactive environment we assume that the user starts with an initial set (which may be empty) of positive and negative examples, and then he/she asks an oracle for the membership classification of new examples, which are selected according to some interactive guided learning strategy. In (Triantaphyllou and Soyster, 1995) there is a discussion of this issue, and a guided learning strategy for general Boolean functions is presented and analyzed.

The main challenge in inferring a target Boolean function from positive and negative examples is that the user can never be absolutely certain about the correctness of the inferred function, unless he/she has used the entire set of all possible examples, which is of size 2^n. Apparently, even for a small value of n, this task may be practically impossible to realize. Fortunately, many real life applications are governed by the behavior of a monotone system or can be described by a combination of a small number of monotone systems. Roughly speaking, monotonicity means that the value of the output increases or decreases when the value of the input increases. A formal definition of this concept is provided in the next section.
This is common, for example, in many medical applications, where, for instance, the severity of a condition directly depends on the magnitude of the blood pressure and body temperature within certain intervals. In machine learning, monotonicity offers some unique computational advantages. By knowing the value of certain examples, one can easily infer the values of more examples. This, in turn, can significantly expedite the learning process.

This paper is organized as follows. The next section provides some basic definitions and results about monotone Boolean functions. The third section presents some key research problems with highlights of some possible solution approaches. The fourth section describes the procedure for inferring a monotone Boolean function. The fifth and sixth sections illustrate the previous procedures in terms of two applications. Finally, the paper ends with some concluding remarks and suggestions for possible extensions.

SOME BASIC DEFINITIONS AND RESULTS ABOUT MONOTONE BOOLEAN FUNCTIONS

Let En denote the set of all binary vectors of length n. Let α and β be two such vectors. Then, the vector α = (α1,α2,α3,...,αn) precedes the vector β = (β1,β2,β3,...,βn) (denoted as α ≤ β) if and only if αi ≤ βi for all 1 ≤ i ≤ n. If, at the same time, α ≠ β, then it is said that α strictly precedes β (denoted as α < β). The two binary vectors α and β are said to be comparable if one of the relationships α ≤ β or β ≤ α holds. A Boolean function f(x) is monotone if, for any vectors α, β ∈ En, the relation f(α) ≥ f(β) follows from the fact that α ≥ β. Let Mn be the set of all monotone Boolean functions defined on n variables.

A binary vector α of length n is said to be an upper zero of a function f ∈ Mn if f(α) = 0 and, for any vector β such that β ≤ α, we have f(β) = 0. Also, we shall call the number of unities (i.e., the number of the "1" elements) in a vector α its level, and denote this by U(α). An upper zero α of a function f is said to be a maximal upper zero if U(β) ≤ U(α) for any upper zero β of the function f (Kovalerchuk and Lavkov, 1984). Analogously, we can define the concepts of lower unit and minimal lower unit. A binary vector α of length n is said to be a lower unit of a function f ∈ Mn if f(α) = 1 and, for any vector β from En such that β ≥ α, we get f(β) = 1. A lower unit α of a function f is said to be a minimal lower unit if U(α) ≤ U(β) for any lower unit β of the function f.

Examples of monotone Boolean functions are: the constants 0 and 1, the identity function f(x) = x, the disjunction (x1 ∨ x2), the conjunction (x1 ∧ x2), etc. Any function obtained by a composition of monotone Boolean functions is itself monotone. In other words, the class of all monotone Boolean functions is closed. Moreover, the class of all monotone Boolean functions is one of the five maximal (pre-complete) classes in the set of all Boolean functions. That is, there is no closed class of Boolean functions containing all monotone Boolean functions and distinct from the class of monotone functions and the class of all Boolean functions. The reduced disjunctive normal form (DNF) of any monotone Boolean function, distinct from 0 and 1, does not contain negations of variables. The set of functions {0, 1, (x1 ∨ x2), (x1 ∧ x2)} is a complete system (and, moreover, a basis) in the class of all monotone Boolean functions (Alekseev, 1988). For the number ψ(n) of monotone Boolean functions depending on n variables, it is known that:

    ψ(n) = 2^{C(n, ⌊n/2⌋)(1 + ε(n))},

where 0 < ε(n) < c(log n)/n and c is a constant (see, for instance, (Kleitman, 1969) and (Alekseev, 1988)).
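To make these definitions concrete, the following Python sketch checks monotonicity and lists maximal upper zeros and minimal lower units by brute force over En. This is an illustrative reconstruction; the paper itself specifies no code, and the example function is our own.

    from itertools import product

    def precedes(a, b):
        """a <= b componentwise, i.e., vector a precedes vector b."""
        return all(ai <= bi for ai, bi in zip(a, b))

    def is_monotone(f, n):
        """Check the definition: a >= b must imply f(a) >= f(b) over all of E^n."""
        E = list(product((0, 1), repeat=n))
        return all(f(b) <= f(a) for a in E for b in E if precedes(b, a))

    def level(a):
        """U(a): the number of unities ('1' elements) in vector a."""
        return sum(a)

    def maximal_upper_zeros(f, n):
        """Upper zeros of maximal level U(a); for a monotone f every zero
        already satisfies the downward condition of the definition."""
        zeros = [a for a in product((0, 1), repeat=n) if f(a) == 0]
        top = max(level(a) for a in zeros)
        return [a for a in zeros if level(a) == top]

    def minimal_lower_units(f, n):
        """Lower units of minimal level U(a)."""
        units = [a for a in product((0, 1), repeat=n) if f(a) == 1]
        bottom = min(level(a) for a in units)
        return [a for a in units if level(a) == bottom]

    # Example: f(x) = x1 ∨ (x2 ∧ x3) is a composition of monotone functions.
    f = lambda x: x[0] | (x[1] & x[2])
    print(is_monotone(f, 3))            # True
    print(maximal_upper_zeros(f, 3))    # [(0, 0, 1), (0, 1, 0)]
    print(minimal_lower_units(f, 3))    # [(1, 0, 0)]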
Let a monotone Boolean function f be defined with the help of a certain operator Af (also called an oracle), which, when fed with a vector α = (α1,α2,α3,...,αn), returns the value of f(α). Let F = {F} be the set of all algorithms which can solve the above problem, and let ϕ(F, f) be the number of accesses to the operator Af required to solve a given problem about inferring a monotone function f ∈ Mn. Next, we introduce the Shannon function ϕ(n) as follows (Korobkov, 1965):

    ϕ(n) = max_{F∈F} max_{f∈Mn} ϕ(F, f).     (2-1)

The problem examined next is that of finding all maximal upper zeros (lower units) of an arbitrary function f ∈ Mn with the help of a certain number of accesses to the operator Af. It is shown in (Hansel, 1966) that in the case of this problem the following relation is true (known as Hansel's theorem or Hansel's lemma):

    ϕ(n) = C(n, ⌊n/2⌋) + C(n, ⌊n/2⌋ + 1).     (2-2)

Here ⌊n/2⌋ is the floor of n/2 (the closest integer which is no greater than n/2). In terms of machine learning, the set of all maximal upper zeros represents the border elements of the negative pattern. In an analogous manner, the set of all minimal lower units represents the border of the positive pattern. In this way a monotone Boolean function represents two "compact patterns."

It has been shown (Hansel, 1966) that restoration algorithms for monotone Boolean functions which use Hansel's lemma are optimal in terms of the Shannon function. That is, they minimize the maximum time requirements of any possible restoration algorithm. It is interesting to note at this point that, to the best of our knowledge, Hansel's lemma has not been translated into English, although there are numerous references to it in the non-English literature (Hansel wrote his paper in French). This lemma is one of the final results of the long term efforts on monotone Boolean functions, beginning with Dedekind (1897, in German).

One approach to using monotonicity in combination with existing pattern recognition/classification approaches is to consider each one of the available points and apply the concept of monotonicity to generate many more data. However, such an approach would have to explicitly consider a potentially humongous set of input observations. On the other hand, these classification approaches are mostly NP-complete and very CPU time consuming. Thus, the previous approach would be inefficient.

In (Gorbunov and Kovalerchuk, 1982) an approach for restoring a single monotone Boolean function is presented. That approach implicitly considers all derivable data from a single observation by utilizing Hansel chains of data (Hansel, 1966). That algorithm is optimal in the sense of the Shannon function. However, although this result has long been known in the non-English literature (it was originally published in Russian), to the best of our knowledge it has not yet been described in the English literature.
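As a quick numeric check of relation (2-2) against the exhaustive 2^n questions, one can tabulate Hansel's bound for the dimensions used later in this paper (a sketch, not part of the original text):

    from math import comb

    def hansel_bound(n):
        """Relation (2-2): phi(n) = C(n, floor(n/2)) + C(n, floor(n/2) + 1)."""
        return comb(n, n // 2) + comb(n, n // 2 + 1)

    for n in (3, 5, 11):
        print(n, 2 ** n, hansel_bound(n))
    # n = 3:  8 vs 6;  n = 5: 32 vs 20;  n = 11: 2,048 vs 924,
    # matching the limits quoted in the experiments of sections 5 and 6.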
We will call a monotone Boolean function an increasing (isotone) monotone Boolean function, in contrast with a decreasing monotone Boolean function. A Boolean function is decreasing (antitone) monotone if, for any vectors α, β ∈ En, the relation f(α) ≥ f(β) follows from the fact that α ≤ β (Rudeanu, 1974, p. 149). Each general discrimination function (i.e., a general Boolean function) can be described in terms of several increasing gj(x1,...,xn) and decreasing hj(x1,...,xn) monotone Boolean functions (Kovalerchuk et al., 1995). That is, the following is always true:

    q(x) = ∨_{j=1..m} (gj(x) ∧ hj(x)).     (2-3)

Next, let us consider the case in which q(x) = g(x) ∧ h(x). Here q+ = g+ ∩ h+, where q+ = {x: q(x) = 1}, g+ = {x: g(x) = 1}, and h+ = {x: h(x) = 1}. Therefore, one can obtain the set of all positive examples for q as the intersection of the sets of all positive examples for the monotone functions g and h. For a general function q(x), represented as in (2-3), the union of all these intersections gives the full set of positive examples: q+ = ∪ q+j = ∪ (g+j ∩ h+j). Often, we do not need so many separate monotone functions. The union of all conjunctions which do not include negations forms a single increasing monotone Boolean function (see, for instance, (Yablonskii, 1986) and (Alekseev, 1988)).

SOME KEY PROBLEMS AND ALGORITHMS

In this section we present some key problems and the main steps of algorithms for solving them.

PROBLEM 1: (Inference of a monotone function with no initial data)

Conditions: There are no initial examples to initiate learning. All examples should be obtained as a result of the interaction of the designer with an oracle (i.e., an operator Af). It is also required that the discriminant function should be a monotone Boolean function. This problem is equivalent to the requirement that we consider only two compact monotone patterns. These conditions are natural in many applications. Such applications include the estimation of reliability (see also the illustrative examples for problem 2 next in this section).

Algorithm A1:
Step 1: The user is asked to confirm that the function is monotone.
Step 2: Apply an iterative algorithm for generating examples and constructing the DNF representation. Do so by using Hansel's lemma (to be described in section 4; see also (Hansel, 1966) and (Gorbunov and Kovalerchuk, 1982)). This algorithm is optimal according to relations (2-1) and (2-2) in section 2.

PROBLEM 2: (The connected classification problem)

Conditions: We should learn to classify an arbitrary vector as a result of the interaction of the designer with an oracle (i.e., an operator Af). The discriminant functions should be a combination of some monotone Boolean functions. There are some initial examples for learning the two connected classification problems simultaneously, with the monotonicity supposition for both of them and the supposition that the positive patterns are nested. Formally, the above consideration means that for all α ∈ En the following relation is always true: f2(α) ≥ f1(α), where f1 and f2 are the discriminant monotone Boolean functions for the first and the second problems, respectively. The last situation is more complex than the previous one. However, the use of additional information from both problems allows for the potential to accelerate the rate of learning.

Some Examples of Nested Problems

First illustrative example: The engineering reliability problem. For illustrative purposes consider the problem of classifying the states of some system by a reliability related expert. This expert is assumed to have worked with this particular system for a long time and thus can serve as an "oracle" (i.e., an operator denoted as Af). States of the system are represented by binary vectors from En (a binary space defined on n 0-1 attributes or characteristics). The "oracle" is assumed to be able to answer questions such as: "Is reliability of a given state guaranteed?" (Yes/No) or: "Is an accident for a given state guaranteed?" (Yes/No).
In accordance with these questions, we pose two interrelated nested classification tasks. The first one is for answering the first question, while the second task is for answering the second question. Next, we define the four patterns which are possible in this situation.

Task 1:
Pattern 1.1: "Guaranteed reliable states of the system" (denoted as E+1).
Pattern 1.2: "Reliability of the states of the system is not guaranteed" (denoted as E−1).

Task 2:
Pattern 2.1: "States of the system with some possibility for normal operation" (denoted as E+2).
Pattern 2.2: "States of the system which guarantee an accident" (denoted as E−2).

Our goal is to extract the way the system functions in the form of two discriminant Boolean functions f1 and f2. The first function is related to task 1, while the second function is related to task 2 (as defined above). Also observe that the following relations must be true: E+2 ⊃ E+1 and f2(α) ≥ f1(α) for all α ∈ En describing the system state, where f1(α) and f2(α) are the discriminant monotone Boolean functions for the first and second tasks, respectively.

Second illustrative example: Breast cancer diagnosis. For the second illustrative example consider the following nested classification problem related to breast cancer diagnosis. The first sub-problem is related to the clinical question of whether a biopsy or short term follow-up is necessary or not. The second sub-problem is related to the question of whether the radiologist believes that the current case is highly suspicious for malignancy or not. It is assumed that if the radiologist believes that the case is malignant, then he/she will also definitely recommend a biopsy. More formally, these two sub-problems are defined as follows:

The Clinical Management Sub-Problem: One and only one of the following two disjoint outcomes is possible: 1) "Biopsy/short term follow-up is necessary", or: 2) "Biopsy/short term follow-up is not necessary".

The Diagnosis Sub-Problem: Similarly as above, one and only one of the following two disjoint outcomes is possible. That is, a given case is: 1) "Highly suspicious for malignancy", or: 2) "Not highly suspicious for malignancy".

It can be easily seen that the corresponding states satisfy the nesting conditions of the first illustrative example.

Third illustrative example: Radioactivity contamination detection. The third illustrative example is also diagnosis related. The issue now is how to perform radioactivity contamination tests. Usually, the more time demanding a test is, the more accurate the result will be. Therefore, an efficient strategy would be to perform the less time demanding tests first and, if the need arises, to perform the more time demanding (and also more accurate) tests later (this is analogous to the previous breast cancer problem, in which a biopsy can yield more accurate results but is also more expensive). The corresponding two nested problems can be defined as follows:

Sub-problem 1: Diagnosis of radioactivity contamination, low risk case:
Pattern 1: "Surely contaminated".
Pattern 2: "Not necessarily contaminated".

Sub-problem 2: Very high risk for contamination:
Pattern 1: "Extra detection of contamination is necessary".
Pattern 2: "Extra detection of contamination is not necessary".

Again, it can be easily seen that the corresponding states satisfy the nesting conditions of the first illustrative example (a brute-force check of the nesting property is sketched below).
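As a small illustration of this nesting condition, the sketch below verifies f2(α) ≥ f1(α) over all of En for two hypothetical toy discriminant functions. They stand in for the reliability tasks above and are not the functions elicited in the experiments of this paper.

    from itertools import product

    def is_nested(f1, f2, n):
        """Check that f2(a) >= f1(a) for every a in E^n, i.e., E1+ is a subset of E2+."""
        return all(f1(a) <= f2(a) for a in product((0, 1), repeat=n))

    # Toy stand-ins: a guaranteed-reliable state needs both components up,
    # while normal operation is still possible with either one up.
    f1 = lambda x: x[0] & x[1]     # task 1: "guaranteed reliable"
    f2 = lambda x: x[0] | x[1]     # task 2: "normal operation possible"
    print(is_nested(f1, f2, 2))    # True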
Next, the outline of an algorithm for solving this kind of problem is presented.

Algorithm A2:
Step 1: The user is asked to confirm the monotonicity for both tasks (i.e., the monotonicity of the functions underlying the patterns related to the previous two tasks).
Step 2: Test the monotonicity of the initial examples for both tasks.
Step 3: Reject the examples which violate monotonicity.
Step 4: Restore the f(α) values for the elements of Hansel's chains, using the monotonicity property and the known examples (Hansel, 1966; Gorbunov and Kovalerchuk, 1982); see also section 4.
Step 5: Apply a dual iterative algorithm for the additional examples which are generated based on Hansel's lemma.

Next, we discuss step 5 in more detail:
Step 5.1: Generate the next vector αi1 for the interaction with Af1.
Step 5.2: Generate the next vector αi2 for the interaction with Af2, in the same way as for Af1.
Step 5.3: Estimate the number of vectors (N1,1j) for which we can compute f1(α) and f2(α) without asking the oracles Af1 and Af2, by temporarily assuming that fj(αi1) = 1 (for j = 1,2), if f2(αi1) = e, where e stands for the empty value.
Step 5.4: Estimate the number of vectors (N1,0j) for which we can compute f1(α) and f2(α) without asking the oracles, by temporarily assuming that fj(αi1) = 0 (for j = 1,2), if f2(αi1) = e.
Step 5.5: Estimate the number of vectors (N2,1j) for which we can compute f1(α) and f2(α) without asking the oracles, by temporarily assuming that fj(αi2) = 1 (for j = 1,2), if f1(αi2) = e.
Step 5.6: Estimate the number of vectors (N2,0j) for which we can compute f1(α) and f2(α) without asking the oracles, by temporarily assuming that fj(αi2) = 0 (for j = 1,2), if f1(αi2) = e.
Step 5.7: Choose the variant with the maximal number of vectors from steps 5.3-5.6, in order to decide which example, αi1 or αi2, should be sent for correct classification by the appropriate oracle.

Comment: In step 5.7 we realize a local algorithm. This means that we consider just the suppositions about fj(αi1) and fj(αi2). We do not simultaneously consider further suppositions about fj(αi+1,1), fj(αi+1,2), fj(αi+2,1), fj(αi+2,2), etc., for the next possible results of the interaction. This possible extension of algorithm A2 may decrease the number of interactions, but it also leads to more computations.

AN ALGORITHM FOR RESTORING A MONOTONE BOOLEAN FUNCTION

4.1 General scheme of the algorithm

Next we present the algorithm RESTORE for the interactive restoration of a monotone Boolean function, and two procedures, GENERATE and EXPAND, in a pseudo programming language.
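The listings of RESTORE, GENERATE and EXPAND are not preserved in this copy of the paper. As a rough sketch of what GENERATE must produce, the following reconstruction (ours, not the authors' pseudo code) builds the Hansel chains of En by the standard recursive doubling construction:

    def hansel_chains(n):
        """Partition E^n into Hansel chains; within a chain each vector
        gains exactly one unity over its predecessor."""
        if n == 1:
            return [[(0,), (1,)]]
        chains = []
        for c in hansel_chains(n - 1):
            # "grow": the old chain with 0 appended, then jump to the top with 1
            chains.append([v + (0,) for v in c] + [c[-1] + (1,)])
            # "cut": all but the last old vector, with 1 appended
            if len(c) > 1:
                chains.append([v + (1,) for v in c[:-1]])
        return chains

    chains = hansel_chains(5)
    print(len(chains))                      # 10 chains in E^5, as in table 1
    print(sorted(len(c) for c in chains))   # [2, 2, 2, 2, 2, 4, 4, 4, 4, 6]

Since the number of chains equals C(n, ⌊n/2⌋) and an optimal dialogue needs at most two questions per chain, the total number of questions stays within the bound (2-2).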
COMPUTATIONAL INTERACTIVE HIERARCHICAL EXPERIMENT I

Table 1. Dialogue Sequences for Experiment I

[The body of this table is garbled in this copy. It lists the ten Hansel chains of E5 -- chain 1: (01100), (11100); chain 2: (01010), (11010); chain 3: (11000), (11001); chain 4: (10010), (10110); chain 5: (10100), (10101); chain 6: (00010), (00110), (01110), (11110); chain 7: (00100), (00101), (01101), (11101); chain 8: (01000), (01001), (01011), (11011); chain 9: (10000), (10001), (10011), (10111); chain 10: (00000), (00001), (00011), (00111), (01111), (11111) -- with columns 3-5 giving the values of f1, f2 and ψ under independent search, columns 6-7 the vectors to which a 1 or a 0 extends by monotonicity, columns 8-9 (f1~, f2~) the sequential search, and columns 10-11 (f1^, f2^) the joint search; asterisks mark values given directly by the expert.]

Next, x1 was restored as a function ϕ(w1,w2,w3). In this case the exhaustive search would require 2^3 = 8 calls. On the other hand, Hansel's lemma provides an upper limit of C(3,1) + C(3,2) = 6 calls. This value is 1.33 times less than what the exhaustive search would have required. Although the previous numbers of calls are not excessively high, they still provide a sense of the potential for significant savings which can be achieved if monotonicity is established and utilized. The numbers in this example are supposed only to be for illustrative purposes in clarifying the proposed methodology. Next, we present the optimal dialogue in restoring the ϕ(w1,w2,w3) function:

Chain 1:  (010) → 1*,  (110) → 1
Chain 2:  (100) → 0*,  (101) → 1*
Chain 3:  (000) → 0,  (001) → 0*,  (011) → 1,  (111) → 1

In the above illustration "1*" and "0*" indicate answers given directly by the expert. For instance, during the dialogue the expert decided that: ϕ(010) = 1, ϕ(101) = 1, and ϕ(100) = ϕ(001) = 0. The values without an asterisk (i.e., when we only have "1" or "0") were obtained by utilizing the monotonicity property. Recall that these three Hansel chains for E3 were constructed in section 4 (where the chains for E3 were determined). In this experiment the four calls about the function values for the examples (010), (100), (101) and (001) were sufficient to lead to full restoration. Monotonicity allowed us to extend the previous four values to other examples. For instance, from the expert's testimony that ϕ(010) = 1 it is derived that ϕ(110) = 1, and from ϕ(001) = 0 it is derived that ϕ(000) = 0. Observe that these 4 calls to the expert are 2 times less than the maximum number of calls (i.e., 8) and 1.5 times less than the guaranteed number of calls provided by Hansel's lemma (which is equal to 6).

The above approach leads to the full definition of the functions ϕ and ψ for the second level as follows:

    x1 = ϕ(w1,w2,w3) = w2 ∨ w1w3 ∨ w2w3 = w2 ∨ w1w3,     (5-1)
and
    x2 = ψ(y1,y2,y3,y4,y5) = y1y2 ∨ y2y3 ∨ y2y4 ∨ y1y3 ∨ y1y4 ∨ y2y3y4 ∨ y2y3y5 ∨ y2 ∨ y1 ∨ y3y4y5 = y2 ∨ y1 ∨ y3y4y5.     (5-2)

The above Boolean expressions were determined from the information depicted in table 1. This is explained in detail in section 5.3. Similarly, table 1 suggests that for the first level we have the target functions of x1, x2, x3, x4, x5 for the biopsy sub-problem defined as:

    f1(x) = x2x3 ∨ x2x4 ∨ x1x2 ∨ x1x4 ∨ x1x3 ∨ x3x4 ∨ x3 ∨ x2x5 ∨ x1x5 ∨ x5 = x2x4 ∨ x1x2 ∨ x1x4 ∨ x3 ∨ x5,     (5-3)

and for the cancer sub-problem defined as:

    f2(x) = x2x3 ∨ x1x2x4 ∨ x1x2 ∨ x1x3x4 ∨ x1x3 ∨ x3x4 ∨ x3 ∨ x2x5 ∨ x1x5 ∨ x4x5 = x1x2 ∨ x3 ∨ x2x5 ∨ x1x5 ∨ x4x5 = x1x2 ∨ x3 ∨ (x2 ∨ x1 ∨ x4)x5.     (5-4)

Next we compare the two functions f1 and f2. Observe that f1(x) = A ∨ (x2 ∨ x1)x4 ∨ x5 and f2(x) = A ∨ (x2 ∨ x1 ∨ x4)x5, where A = x1x2 ∨ x3. Hence, these two functions differ only in the parts (x2 ∨ x1)x4 ∨ x5 and (x2 ∨ x1 ∨ x4)x5. The simplification of the disjunctive normal form (DNF) expressions in (5-1) to (5-4) allowed us to exclude some conjunctions which are not minimal lower units. For instance, in (5-2) the term y1y4 is not a minimal lower unit, because y1 covers it. Thus, the right hand side parts in expressions (5-1) to (5-4) form the minimal DNFs.
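The exclusion of such covered conjunctions is plain DNF absorption; a minimal sketch (the set representation is our own, with the term list taken from expression (5-2)):

    def simplify_dnf(terms):
        """Drop every conjunction that strictly contains another term,
        since the shorter term covers it (absorption)."""
        sets = [frozenset(t) for t in terms]
        return [t for t in sets if not any(o < t for o in sets)]

    # ψ from (5-2) before simplification:
    psi = [{'y1','y2'}, {'y2','y3'}, {'y2','y4'}, {'y1','y3'}, {'y1','y4'},
           {'y2','y3','y4'}, {'y2','y3','y5'}, {'y2'}, {'y1'}, {'y3','y4','y5'}]
    print(simplify_dnf(psi))
    # [frozenset({'y2'}), frozenset({'y1'}), frozenset({'y3','y4','y5'})],
    # i.e., the minimal DNF y2 ∨ y1 ∨ y3y4y5.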
Next we present a superposition of expressions (5-1) to (5-4) for f1 and f2 as the proposed final formulas. For performing biopsy:

    f1(x) = x2x4 ∨ x1x2 ∨ x1x4 ∨ x3 ∨ x5 = (y2 ∨ y1 ∨ y3y4y5)x4 ∨ (w2 ∨ w1w3)(y2 ∨ y1 ∨ y3y4y5) ∨ (w2 ∨ w1w3)x4 ∨ x3 ∨ x5,     (5-5)

and for cancer:

    f2(x) = x1x2 ∨ x3 ∨ x2x5 ∨ x1x5 ∨ x4x5 = (w2 ∨ w1w3)(y2 ∨ y1 ∨ y3y4y5) ∨ x3 ∨ (y2 ∨ y1 ∨ y3y4y5)x5 ∨ (w2 ∨ w1w3)x5 ∨ x4x5.     (5-6)

In total we needed 13 + 13 + 4 = 30 calls to restore f1 as a superposition f1(x1,x2,x3,x4,x5) = f1(ϕ(w1,w2,w3), ψ(y1,y2,y3,y4,y5), x3, x4, x5). The same 30 calls were also needed to restore f2 independently. In reality, however, we used 30 + 13 = 43 calls instead of 60 calls to restore both functions, due to the fact that the component functions ϕ and ψ are the same for the target functions of both problems. For the record, we spent no more than one hour on the dialogue with the medical expert.

At this point it is interesting to observe that if one wishes to restore the non-monotone function which corresponds to the concept "biopsy and not cancer", then this can be achieved easily as follows. First note that the above concept represents cases in which surgery can potentially be avoided. This concept can be represented by the composite formula f1 & ¬f2, and thus it can be computed by using expressions (5-5) and (5-6). Therefore, a total number of 43 calls is just sufficient to restore this function (which is both non-monotone and very complex mathematically).

5.3 Independent Search of Functions

Table 1 represents the dialogue which was executed based on the algorithm for problem 1 (section 3). The information in table 1 resulted in the formation of formulas (5-2) to (5-6). The first column of table 1 designates the binary vectors (examples): the first number indicates a Hansel chain and the second number indicates the location of the vector within that chain. The second column presents the binary vectors. The third, fourth and fifth columns present the values of the functions f1, f2 and ψ, respectively. Asterisks "*" show values obtained directly from the expert. Values without asterisks were inferred by using the previous values and the property of monotonicity. The sixth column shows the indexes of the vectors for which we can use monotonicity if a given function evaluates to 1 (i.e., the true value). For instance, for vector #7.1 we have f1(00100) = 1 (as given by the expert); hence we can also set (by utilizing the monotonicity property) f1(00101) = 1, because vector (00101) is vector #7.2. We can also do the same with vector #10.4. Similarly, the seventh column represents the indexes of the vectors for which we can use monotonicity if a given function evaluates to 0 (i.e., the false value). For instance, for vector #6.1 we have f1(00010) = 0 (as given by the expert), and thus we can infer (by utilizing monotonicity) that f1(00000) = 0, where (00000) is vector #10.1.

The values in column 3 are derived by "up-down sliding" in table 1 according to the following steps:

Step 1: Begin from the first vector #1.1 (i.e., (01100)). Ask the expert about the value of f1(01100).
Action: In this case the expert reported that f1(01100) = 1.

Step 2: Write "1*" next to vector #1.1 (and under column 3). Recall that an asterisk denotes an answer directly provided by the expert. Apparently, if the reply were false (0), then we would have to write "0*" in column 3. The case of a false value corresponds to column 7 of table 1.

Step 3: Go to column 6 to find the vectors to which to extend the true (1) value.
Action: In this case vectors #1.2, #6.3, and #7.3 should also yield (because of monotonicity) the true value (1). Therefore, in column 3 and next to examples #1.2, #6.3, and #7.3, the values of the function must be 1 (note that now no asterisk is used).

Step 4: Go to vector #1.2 (the next vector while sliding down). Check whether the value of f1(11100) has already been fixed or not. If the value of f1(11100) is not fixed (i.e., it is empty), then repeat steps 1-3, above, for this new vector (i.e., vector #1.2). If f1(11100) is not empty (i.e., it has already been fixed), then go to the next vector while sliding down (i.e., move to vector #2.1). (Note that if a value has not been fixed yet, then we denote it by fi(x) = e, for empty.)
Action: f1(11100) ≠ e. Therefore, extend the values of the function for the vectors #6.4 and #7.4 and go to the next vector in the table (i.e., move to vector #2.1).

Step 5: The next vector is #2.1. Continue as above with step 4, but now for vector #2.1 (instead of vector #1.2).

The above procedure is repeated until all vectors and functions have been covered. The interpretation of what happens in columns 8 to 11 is provided in the next sections. In order to construct formula (5-3) (which was shown in section 5.2), one needs to concentrate on the information depicted in columns 2 and 3 of table 1. One needs to take the first vector marked with "1*" in each of the chains and construct for each of these vectors a conjunction of its non-zero components. For instance, for the vector (01010) in chain 2 the corresponding conjunction is x2x4. Similarly, from chain 6 we have taken the "1" components in the vector (00110) and formed the conjunction x3x4. The formulas (5-1), (5-2) and (5-4) were obtained in a similar manner. A sketch of this up-down sliding, restated as code, is given below.
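The following Python sketch mimics the sliding on a small scale, using ϕ from (5-1) as a stand-in oracle. It is a simplified reconstruction, not the paper's RESTORE listing: it extends every expert answer by monotonicity over all of En, but it does not reproduce the exact questioning order of the recorded dialogue, so here it spends 6 calls on E3 (within Hansel's bound of 6), whereas the dialogue above needed only 4.

    from itertools import product

    def hansel_chains(n):
        # as sketched in section 4 above
        if n == 1:
            return [[(0,), (1,)]]
        chains = []
        for c in hansel_chains(n - 1):
            chains.append([v + (0,) for v in c] + [c[-1] + (1,)])
            if len(c) > 1:
                chains.append([v + (1,) for v in c[:-1]])
        return chains

    def restore_monotone(oracle, n):
        """Up-down sliding: ask the oracle only when monotone extension
        has not already fixed the value (e = empty means 'not in values')."""
        values, calls = {}, 0
        E = list(product((0, 1), repeat=n))
        for chain in hansel_chains(n):
            for v in chain:
                if v in values:
                    continue                     # value already fixed, no call
                values[v] = oracle(v)
                calls += 1
                for w in E:                      # extend the answer by monotonicity
                    if values[v] == 1 and all(a >= b for a, b in zip(w, v)):
                        values[w] = 1
                    elif values[v] == 0 and all(a <= b for a, b in zip(w, v)):
                        values[w] = 0
        return values, calls

    phi = lambda v: v[1] | (v[0] & v[2])         # ϕ = w2 ∨ w1w3, from (5-1)
    values, calls = restore_monotone(phi, 3)
    print(calls, values[(1, 1, 1)])              # 6 calls; ϕ(111) = 1 fixed by extension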
5.4 Sequential Search for Nested Functions

Next we show how it is possible to further decrease the number of calls by using a modification of the algorithm described for solving problem 2. Recall that the functions f1 and f2 are nested, because E2+ ⊂ E1+. That is, for any input example x the following is true: f2(x) ≤ f1(x). The last statement means that if we recognize cancer, then we recognize the necessity of biopsy too. The property f2(x) ≤ f1(x) means that:

    if f2(x) = 1, then f1(x) = 1,     (5-7)
and
    if f1(x) = 0, then f2(x) = 0.     (5-8)

The above realization permits us to avoid calls to the expert for determining the right hand sides of (5-7) and (5-8) if one knows the values of the left hand sides. The above idea can be used to make the dialogue with the expert more efficient. This was done in this experiment and is described in table 1. Column 8 of table 1 shows the results of using the values of f2 (column 4) and property (5-7) to restore the function f1. For instance, for vector #6.2, according to the expert we have f2(00110) = 1. Hence, by using (5-7), we should also have f1(00110) = 1 without an additional call to the expert. Column 9 represents the same situation for f2, if one knows the expert's answers for f1: in this case we use the pair <f1, (5-8)> instead of the pair <f2, (5-7)>, as was the case before. Note that the asterisks "*" in columns 8 and 9 show the necessarily needed calls.

In order to restore the function f1 by using information regarding the values of f2, we asked the expert 7 times, instead of the 13 calls we had to use for the independent restoration of f1. That is, now we were able to use about 2 times less calls. In particular, the value f1(10001) = 1 for vector #9.2 was inferred from the fact that f2(10001) = 1. However, to restore the function f2 by using f1 we asked the expert 13 times. That is, we asked him as many times as during the independent restoration of f2 (i.e., the nested approach was not beneficial in this case). This should not come as a surprise, because the limits described in the previous sections are upper limits. That is, on the average the sequential search is expected to outperform a non-sequential approach, but cases like the last one can still be expected. The previous analysis shows that, in this particular application, the most effective way was at first to restore f2 with 13 calls and next to use these function values to restore f1, which required only 7 additional calls. Restoration of f2 by using information on the values of the function f1 required 13 calls, and restoring both functions in E5 in this order would have required 13 + 13 = 26 calls.
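A minimal sketch of this propagation step, using the inferred value for vector #9.2 from table 1 (the dictionary representation is our own illustration):

    def propagate_nested(f1_vals, f2_vals):
        """Apply (5-7) and (5-8) to the answers known so far:
        f2(x) = 1 forces f1(x) = 1, and f1(x) = 0 forces f2(x) = 0."""
        for x, v in f2_vals.items():
            if v == 1:
                f1_vals.setdefault(x, 1)    # (5-7): cancer implies biopsy
        for x, v in f1_vals.items():
            if v == 0:
                f2_vals.setdefault(x, 0)    # (5-8): no biopsy implies no cancer
        return f1_vals, f2_vals

    # The expert gave f2(10001) = 1 for vector #9.2, so f1(10001) = 1
    # follows without a call to the expert:
    f1_vals, f2_vals = propagate_nested({}, {(1, 0, 0, 0, 1): 1})
    print(f1_vals)    # {(1, 0, 0, 0, 1): 1}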
In the sequential order of restoration, the total number of calls to restore both functions is 13 + 7 = 20 calls, in comparison with 13 + 13 = 26 calls for independent restoration. We should also add to both cases the 13 + 4 calls which were needed to restore the functions ϕ and ψ. Therefore, in total we needed 20 + 17 = 37 and 26 + 17 = 43 calls, respectively. Note that we began from 2×2,048 and 2×924 calls for both functions. Our totals of 37 and 43 calls are about 100 times less than the number of non-optimized calls (i.e., 2×2,048) and about 50 times less than the upper limit guaranteed according to Hansel's lemma (i.e., the 2×924 calls).

5.5 A Joint Search Approach for Nested Functions

Next we study the possibility to decrease the number of calls once more, with a joint search approach for f1 and f2 which uses the following switching strategy:

Step 1: Ask the expert for the value of f2(x1.1) for vector #1.1.
Step 2: If f2(x1.1) = 1, then ask for the first vector of the next chain; that is, ask for the value of f2(x2.1) for #2.1. Otherwise, ask for the value of f1(x1.1).
Step 3: If f1(x1.1) = 0, then ask for the value of f1(x1.2) for vector #1.2. Otherwise, switch to asking for a value of f2.

The generalization of the previous steps for an arbitrary vector xi.k is done with steps A and B. Step A is for f1 and step B for f2. These steps are best described as follows:

Step A: If f1(xi.k) = 0 and vector #(i.k) is not the last one in the current chain and f1(xi.k+1) = e (i.e., empty), then ask the expert for the value of f1(xi.k+1). Otherwise, ask for the value of the first vector xi+1.j from the next chain such that f1(xi+1.j) = e.

Step B: If f1(xi.k) = 1 and f2(xi.k) = e, then ask for the value of f2(xi.k). If f1(xi.k) = 1 and f2(xi.k) = 0, then ask for the value of f2(y), where y is the first vector from the same or the next chain such that f2(y) = e. If f2(xi.k) = 1, then ask for the first vector y of the next chain such that f2(y) = e. If f2(xi.k) = 0 and f1(xi.k) = e, then ask the expert for the first vector y such that f1(y) = e.

The results of applying this strategy for restoring the two functions f1 and f2 are presented in table 1, columns 10 and 11, respectively. The number of calls to restore both f1 and f2 is equal to 22 (see table 1, columns 10 and 11), by asking about the values of f1 8 times and about the values of f2 14 times (see also the values marked with "*" in columns 10 and 11). Note that the previous (sequential) algorithm required 20 calls.

Next, in tables 2 and 3, we summarize the various results of interviewing the expert under the different strategies. Table 2 represents the numbers of calls for the different kinds of searches for the functions f1, f2, ψ in E5 and for ϕ in E3. Table 3 represents the results for E11. These results summarize the numbers of calls needed for the full restoration of f1 and f2 as functions of 11 binary variables. In particular, note that the independent search of f1 required 13 calls in E5, 13 calls for ψ and 4 calls for ϕ, i.e., the 30 calls shown in table 3. The sequential search required the same number of calls for f2, but only 7 calls for f1, because we used the same ϕ and ψ functions found earlier (see also tables 2 and 3). The total amount shown in the column "f1, f2" (i.e., column 5) represents the number of calls to restore both functions. For the last four searches in this column we excluded the unnecessary calls for the second restoration of ψ and ϕ, i.e., the 17 calls. For instance, the independent search required 30 calls to restore each of f1 and f2, i.e., a total of 43 calls in E11, as shown in this column.
Next, let us describe the meaning of indexes 1 and 2 in tables 2 and 3. We denote the upper limit of calls for the non-optimized search as N1, and the analogous amounts for the other ways of the search for both functions (column 5) as Ni (i = 2,3,4,5,6). In these terms, index1 = N1/Ni (i = 2,3,4,5,6) and index2 = N2/Ni (i = 3,4,5,6). In particular, index1 shows that we used 3.2 times less calls than N1, and index2 that we used 2.0 times less calls than N2, in E5; and also 110.7 times less calls than N1 and 49.9 times less calls than N2 in E11. These interactive experiments demonstrate the potential for achieving significantly high efficacy with the proposed approach for the interactive restoration of monotone Boolean functions.

Table 2. Comparison of results in E5 and E3

#   Ways of search                      ϕ    ψ    f1   f2   f1,f2   index1   index2
1   Non-optimized search (upper limit)  8    32   32   32   64      1        -
2   Optimal search (upper limit)        6    20   20   20   40      1.6      1
3   Independent search                  4    13   13   13   26      2.5      1.5
4   Sequential search (f1, then f2)     -    -    13   13   26      2.5      1.5
5   Sequential search (f2, then f1)     -    -    7    13   20      3.2      2.0
6   Joint search                        -    -    -    -    22      2.9      1.8

Table 3. Comparison of results in E11

#   Ways of search                      f1     f2     f1,f2   index1   index2
1   Non-optimized search (upper limit)  2,048  2,048  4,096   1        -
2   Optimal search (upper limit)        924    924    1,848   2.22     1
3   Independent search                  30     30     43      95.2     42.9
4   Sequential search (f1, then f2)     30     13     43      95.2     42.9
5   Sequential search (f2, then f1)     7      30     37      110.7    49.9
6   Joint search                        -      -      39      105.0    47.4

COMPUTATIONAL INTERACTIVE HIERARCHICAL EXPERIMENT II

We suppose the same hierarchy of binary features and the same functions ϕ and ψ as in the previous experiment. However, now we consider the same problem with different f1 and f2 functions. We restore fi as a superposition of the fi, ϕ and ψ functions: fi(x1,x2,x3,x4,x5) = fi(ϕ(w1,w2,w3), ψ(y1,y2,y3,y4,y5), x3, x4, x5), where the hidden functions are:

    f1(x) = x1x2 ∨ x1x3 ∨ x4 ∨ x5,     (6-1)
and
    f2(x) = x1x2 ∨ x1x3 ∨ x1x4 ∨ x2x4 ∨ x3x4 ∨ x5.     (6-2)

The offered optimized dialogue to restore f1(x1,x2,x3,x4,x5) really required 12 calls (see table 4, column 3). So we used 1.66 times less calls than Hansel's border (i.e., the 20 calls) and 2.66 times less than in the non-optimized dialogue (i.e., the 32 calls). The other hidden functions are x1 = ϕ(w1,w2,w3) = (w2 ∨ w1w3) and x2 = ψ(y1,y2,y3,y4,y5) = (y1 ∨ y2 ∨ y3 ∨ y4). The dialogue for f2(x1,x2,x3,x4,x5) required 14 calls (see also table 4, column 4). This is 1.4 times less than Hansel's limit (of 20 calls) and 2.2 times less in comparison with the non-optimized dialogue (of 32 calls). Next, we obtained f1 and f2 from table 4 as the following DNFs:

    f1(x) = x2x4 ∨ x1x2 ∨ x1x4 ∨ x1x3 ∨ x4 ∨ x3x5 ∨ x2x5 ∨ x1x5 ∨ x5,     (6-3)
and
    f2(x) = x2x4 ∨ x1x2 ∨ x1x4 ∨ x1x3 ∨ x3x4 ∨ x2x3x5 ∨ x2x5 ∨ x1x5 ∨ x5.     (6-4)

Simplification of these DNF expressions allowed us to exclude some conjunctions which are not minimal lower units, and obtain (6-1) and (6-2), respectively. For instance, in (6-3) the term x1x5 is not a minimal lower unit, since x5 covers it.

Table 4. Dialogue Sequences for Experiment II

[The body of this table is garbled in this copy. It has the same layout as table 1: the same ten Hansel chains of E5, with the values of f1, f2 and ψ under independent search (columns 3-5), the monotone extensions (columns 6-7), and the sequential and joint searches (columns 8-11); asterisks mark values given directly by the expert. The recorded totals of calls are 12 for f1, 14 for f2 and 12 for ψ.]
1* 1* 3.2 7.4;8.4 1* 1* 1* 1* 1* 1* 1 f1~ f2~ f1^ f2^ 10 11 0* 0* 0* 1 1* 1* 1 1* 8.1;9.1 8.2;9.2 1 1* 1 1* 4.2;9.3 6.4;9.4 6.1;9.1 6.2;5.1 1 1* 1 1* 1* 5.2 7.4;9.4 7.1;9.1 7.2;9.2 1 1* 1 1* 0* 1* 1 1* 1 6.2;10.3 6.3;10.4 6.4;10.5 10.6 10.1 7.1 1* 1 0* 1 1* 1 0* 1* 1 1* 1 0* 1* 1* 1 7.2;10.4 7.3;10.4 7.4;10.5 5.6 10.1 10.2 8.2;10.2 1* 1 0* 1* 1* 1 0* 1* 1* 1 1* 1 1* 1 8.2 8.3 8.4 10.6 10.1 10.2 10.5;10.3 9.3 1 1* 1 1 1* 1 0* 1* 1 0* 1* 1 1* 1 9.2 9.3 9.4 10.6 10.1 10.2 10.3 10.4 0* 1 1* 1 0* 1 0* 1* 1 1* 1 1 12 1* 1 1 14 0* 0* 1* 1 12 10.2 10.3 10.4 10.5 10.6 1* 1 1 0 1* 1 10 1* 1 1 20 0 1* 1 28 Below we present the superposition of formulas (6-1)-(6-4) for f1 and f2 as the final formulas with : f1(x) = x1x2∨x1x3∨x4∨x5 = (w2∨w1w3)(y1∨y2∨y3∨y4)∨(w2∨w1w3)x3∨x4∨x5, and f2(x)=(w2∨w1w3)(y1∨y2∨y3∨y4)∨(w2∨w1w3)x3∨(w2∨w1w3∨y1∨y2∨y3∨y4∨x3)x4∨x5 In total, we needed 12 + 12 + = 28 calls to restore f1 and 14 + 12 + = 30 calls to restore f2 independently We actually used 12 + 14 + 12 + = 42 calls instead of the 58 calls needed to restore both functions We used independent, sequential, and joint search to restore these functions Some final results are presented in tables to Table represents the numbers of calls for different kinds of searches in E5 for the functions f1, f2, ψ and in E3 for ϕ Table represents the results for E11 These tables summarize the numbers of calls needed for the full restoration of f1 and f2 as functions of 11 binary variables at level In particular, independent search of f1 required 12 calls for f1 in E5, 12 calls for ψ and calls for ψ That is, the 28 calls shown in table The sequential search required the same number of calls for f1, but only 10 calls for f2, because we used the same ϕ and ψ component functions which were found for f1 The total amount shown in column labeled "f1, f2" in tables and represents the number of calls required to restore both functions For the last four hierarchical searches in this column we excluded non necessary second restoration of ψ and ϕ, i.e., 16 calls For instance, the independent search required 26 calls to restore both f1 and f2 in E5, i.e., the total 42 calls in E11 shown in this column Similarly as in the previous section, index1 shows that we used 3.2 times less calls than N1 and 2.0 times less calls than N2 in E5, and also 113,8 times less calls than N1 and 51.3 times less calls than N2 in E11 Table Comparison of results in E5 and E3 ϕ ψ - 32 40 1.6 20 14 26 2.5 1.5 12 12 10 22 2.7 1.8 Sequential search 14 20 3.2 2.0 Joint search - - 19 3.2 # f1 F2 f 1, f Non-optimized search (upper limit) 32 32 64 Optimal search (upper limit) 20 20 Independent search f1 and f2 12 Sequential search ways of search Table Comparison of Results in E11 29 index1 index2 2.0 # ways of search Non-optimized search (upper limit) f1 f2 f1, f2 index1 index2 4,096 - 2,048 2,048 Optimal search (upper limit) 924 924 1,848 2.22 Independent search f1 and f2 28 30 42 97.5 44.0 Sequential search 28 10 38 107.8 48.6 Sequential search 30 36 113.8 51.3 Joint search - - 36 113.8 51.3 CONCLUDING REMARKS Some computational experiments (see, for instance, (Gorbunov and Kovalerchuk, 1982) and (Triantaphyllou and Soyster, 1995)) have shown that it is possible to significantly decrease the number of questions to an oracle in comparison with the full number of questions (which is equal to n) and also in comparison with a guaranteed pessimistic estimation (formula (2-2) in section 2) for many functions Some close form results were also obtained for a connected problem of retrieval of 
The results in this paper demonstrate that an interactive approach based on monotone Boolean functions has the potential to be very beneficial to interactive machine learning.

Acknowledgements

The authors are very grateful to Dr. James F. Ruiz, from the Woman's Hospital of Baton Rouge, LA, for his expert assistance in formulating the medical example described in this paper.

REFERENCES

1. Alekseev, V.B. (1988), "Monotone Boolean Functions," Encyclopedia of Mathematics, v. 6, Kluwer Academic Publishers, 306-307.
2. Blumer, A., A. Ehrenfeucht, D. Haussler, and M.K. Warmuth (1989), "Learnability and the Vapnik-Chervonenkis dimension," Journal of the Association for Computing Machinery, 36(4), 929-965.
3. Bongard, M. (1967), Pattern Recognition, Moscow, "Nauka" Publ. (in Russian; English translation, 1970, by Spartakos Press, NY).
4. Boros, E., P.L. Hammer, and T. Ibaraki (1994), "Predicting Cause-Effect Relationships from Incomplete Discrete Observations," SIAM Journal on Discrete Mathematics, Vol. 7, No. 4, 531-543.
5. Cohn, D., L. Atlas, and R. Ladner (1994), "Improving Generalization with Active Learning," Machine Learning, Vol. 15, 201-221.
6. Dedekind, R. (1897), "Ueber Zerlegungen von Zahlen durch ihre grossten gemeinsamen Teiler," Festschrift Hoch. Braunschweig u. ges. Werke, II, 103-148.
7. Dietterich, T.C., and R.S. Michalski (1983), "A Comparative Review of Selected Methods for Learning from Examples," in: R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga Publishing Company, Palo Alto, CA, 41-81.
8. Fu, L.M. (1993), "Knowledge-Based Connectionism for Revising Domain Theories," IEEE Transactions on Systems, Man, and Cybernetics, v. 23, n. 1, 173-182.
9. Goldman, S.A., and R.H. Sloan (1992), "The Power of Self-Directed Learning," Machine Learning, Vol. 14, 271-294.
10. Gorbunov, Yu., and B. Kovalerchuk (1982), "An Interactive Method of Monotone Boolean Function Restoration," Journal of Academy of Science of UzSSR, Engineering, v. 2, 3-6 (in Russian).
11. Hammer, P.L., and E. Boros (1994), "Logical Analysis: An Overview," RUTCOR Research Report, Rutgers University, NJ.
12. Hansel, G. (1966), "Sur le nombre des fonctions booléennes monotones de n variables," C.R. Acad. Sci. Paris, v. 262, n. 20, 1088-1090.
13. Haussler, D. (1988), "Quantifying inductive bias: AI learning algorithms and Valiant's learning framework," Artificial Intelligence, 36, 177-221.
14. Haussler, D., and M. Warmuth (1993), "The Probably Approximately Correct (PAC) and Other Learning Models," chapter in: Foundations of Knowledge Acquisition: Machine Learning, A.L. Meyrowitz and S. Chipman (eds.), Kluwer Academic Publishers, Norwell, MA, 291-312.
15. Hattori, K., and Y. Torii (1993), "Effective Algorithms for the Nearest Neighbor Method in the Clustering Problem," Pattern Recognition, Vol. 26, No. 5, 741-746.
16. Kamath, A.P., N.K. Karmakar, K.G. Ramakrishnan, and M.G.C. Resende (1992), "A Continuous Approach to Inductive Inference," Mathematical Programming, Vol. 57, 215-238.
17. Kamath, A.P., N.K. Karmakar, K.G. Ramakrishnan, and M.G.C. Resende (1994), "An Interior Point Approach to Boolean Vector Synthesis," Proceedings of the 36th MSCAS, 1-5.
18. Kamgar-Parsi, B., and L.N. Kanal (1985), "An Improved Branch-and-Bound Algorithm for Computing k-Nearest Neighbors," Pattern Recognition Letters, Vol. 3, 7-12.
19. Kurita, T. (1991), "An Efficient Agglomerative Clustering Algorithm Using a Heap," Pattern Recognition, Vol. 24, No. 3, 205-209.
20. Kleitman, D. (1969), "On Dedekind's problem: the number of monotone Boolean functions," Proc. Amer. Math. Soc., 21, 677-682.
21. Korobkov, V.K. (1965), "On monotone Boolean functions of the algebra of logic," in: Problemy Cybernetiki, v. 13, "Nauka" Publ., Moscow, 5-28 (in Russian).
22. Kovalerchuk, B., and V. Lavkov (1984), "Retrieval of the maximum upper zero for minimizing the number of attributes in regression analysis," USSR Computational Mathematics and Mathematical Physics, v. 24, n. 4, 170-175.
23. Kovalerchuk, B., E. Triantaphyllou, and E. Vityaev (1995), "Monotone Boolean Functions Learning Techniques Integrated with User Interaction," in: Proc. of the Workshop "Learning from Examples vs. Programming by Demonstration," 12th International Conference on Machine Learning, Tahoe City, U.S.A., 41-48.
24. Kovalerchuk, B., E. Triantaphyllou, and J.F. Ruiz (1995), "Monotonicity and Logical Analysis of Data: A Mechanism of Evaluation of Mammographic and Clinical Data," Technical Report, Louisiana State University, Dept. of Industrial Engineering, Baton Rouge, LA 70803-6409.
25. Mangasarian, O.L., W.N. Street, and W.H. Wolberg (1995), "Breast Cancer Diagnosis and Prognosis Via Linear Programming," Operations Research, Vol. 43, No. 4, 570-577.
26. Mitchell, T. (1980), "The need for biases in learning generalizations," Technical Report CBM-TR-117, Rutgers University, New Brunswick, NJ.
27. Murphy, P.M., and D.W. Aha (1994), UCI Repository of Machine Learning Databases, machine-readable data repository, University of California, Department of Information and Computer Science, Irvine, CA.
28. Natarajan, B.K. (1989), "On learning sets and functions," Machine Learning, 4(1), 123-133.
29. Rudeanu, S. (1974), Boolean Functions and Equations, North-Holland, NY.
30. Shavlik, J.W. (1994), "Combining Symbolic and Neural Learning," Machine Learning, Vol. 14, 321-331.
31. Triantaphyllou, E., A.L. Soyster, and S.R.T. Kumara (1994), "Generating Logical Expressions from Positive and Negative Examples via a Branch-and-Bound Approach," Computers and Operations Research, Vol. 21, No. 2, 185-197.
32. Triantaphyllou, E. (1994), "Inference of a minimum size Boolean function from examples by using a new efficient branch-and-bound approach," Journal of Global Optimization, v. 5, n. 1, 69-94.
33. Triantaphyllou, E., and A. Soyster (1995), "An approach to guided learning of Boolean functions," to appear in: Mathematical and Computer Modelling.
34. Vapnik, V.N. (1982), Estimation of Dependences Based on Empirical Data, Springer-Verlag, New York, NY.
35. Vityaev, E., and A. Moskvitin (1993), "Introduction to discovery theory. Program system: DISCOVERY," Logical Methods in Informatics, Computational Systems (Vychislitel'nye sistemy), Institute of Mathematics, Russian Academy of Science, Novosibirsk, n. 148, 117-163 (in Russian).
36. Wolberg, W.H., and O.L. Mangasarian (1990), "Multisurface Method of Pattern Separation for Medical Diagnosis Applied to Breast Cytology," Proceedings of the National Academy of Sciences of the USA, v. 87, n. 23, 9193-9196.
37. Yablonskii, S. (1986), Introduction to Discrete Mathematics, Moscow, "Nauka" Publ. (in Russian).
38. Zagoruiko, N. (1979), Empirical Forecast, "Nauka" Publ., Novosibirsk (in Russian).