This article concerns the semantic quantification of linguistic terms embedded in Complete Linear Hedge Algebras and an application to querying a database that contains linguistic terms. The calculation applies to both linguistic values and real numbers. The query results are reasonable.
JOURNAL OF SCIENCE OF HNUE
Interdisciplinary Science, 2013, Vol. 58, No. 5, pp. 30-38
This paper is available online at http://stdb.hnue.edu.vn

SEMANTIC QUANTITATION OF LINGUISTIC VALUES EMBEDDED IN COMPLETE LINEAR HEDGE ALGEBRAS AND QUERYING IN THE LINGUISTIC DATABASE

Nguyen Tan An¹ and Nguyen Van Quyen²
¹Faculty of Information Technology, Hanoi National University of Education
²Department of Post Graduate, Hai Phong University

Received March 2, 2013. Accepted June 5, 2013.
Contact: Nguyen Tan An, e-mail address: nguyentanan@yahoo.com

Abstract. In fuzzy databases represented by linguistic values, linguistic terms carry qualitative semantics. However, when linguistic terms have to be computed with or compared, their semantics must be quantified. If the quantification method is suitable, the subsequent computation is more efficient. When a database is embedded in a Complete Linear Hedge Algebra, each base linguistic value may be accompanied by emphasis hedges. In such cases, the semantics of linguistic terms are quantified as subintervals of the interval [0, 1]. This article presents the semantic quantification of linguistic terms embedded in Complete Linear Hedge Algebras and an application to querying a database that contains linguistic terms. The calculation applies to both linguistic values and real numbers, and the query results are reasonable.

Keywords: Computation with words, semantics of terms, hedge algebras, linguistic representation model.

1. Introduction

Decision support systems increasingly tend to approach the way humans process information. That is, they process collected data or information that is imprecise, incomplete, uncertain and/or vague and that is described by words in natural language. Vague data represented by linguistic values are usually processed in one of the following ways:

i) Using fuzzy set theory: each linguistic value is represented by a fuzzy set characterized by a membership function, or by a membership function together with a non-membership function.
ii) Based on possibility theory: each linguistic value is given a possibility distribution over the values in the reference domain.
iii) Using the hedge algebra approach: each linguistic database is embedded in a hedge algebra.

Each method raises a series of subsequent problems, such as the integration of fuzzy values, the comparison of fuzzy values, computing the distance between fuzzy values and measuring the similarity between fuzzy values. Therefore, the quality of a fuzzy system differs depending on the method applied.

When the linguistic database is embedded in a hedge algebra, each base linguistic value may be accompanied by emphasis hedges. This gives a convenient way of computing: the semantics of linguistic terms are quantified as subintervals of the interval [0, 1].

This paper presents the embedding of linguistic databases in a Complete Linear Hedge Algebra (ComLin-HA) and the application of semantic quantitation to querying such data. Its main contents are: ComLin-HA, semantic quantitation of linguistic values, an example of queries in fuzzy databases and, finally, some conclusions.

2. Content

2.1. An overview of ComLin-HA

2.1.1. Hedge algebra (HA)

Definition 2.1. An algebra $X = (X, G, H, \le)$, where $X$ is a term domain, $G$ is a set of primary terms, $H$ is a set of linguistic hedges, and $\le$ is a semantic ordering relation (SOR) on $X$, is called a hedge algebra (HA) if it satisfies the following axioms [1]:

(A1) Each hedge is either positive or negative with respect to the others, including itself;
(A2) If terms $u$ and $v$ are independent, then, $\forall x \in H(u)$, we have $x \notin H(v)$. In addition, if $u$ and $v$ are incomparable, i.e., $u \not< v$ and $v \not< u$, then so are $x$ and $y$, for every $x \in H(u)$ and $y \in H(v)$;
(A3) If $x \ne hx$, then $x \notin H(hx)$, and if $h \ne k$ and $hx \le kx$,
then $h'hx \le k'kx$, for all $h, k, h', k' \in H$ and $x \in X$. Moreover, if $hx \ne kx$, then $hx$ and $kx$ are independent;
(A4) If $u \notin H(v)$ and $u \le v$ ($u \ge v$), then $u \le hv$ ($u \ge hv$) for any $h \in H$.

2.1.2. ComLin-HA

A ComLin-HA is represented by the 7-tuple $AX = (X, G, C, H, \Sigma, \Phi, \le)$, where:

- $X \in Dom(X)$, with $Dom(X)$ being the set of all terms of $X$;
- $G = \{0; c^-; W; c^+; 1\}$ is the set of its primary terms, in which $W$ is the neutral concept positioned between the two primary terms, i.e. we have $0 \le c^- \le W \le c^+ \le 1$;
- $C = \{c^-; c^+\}$ is called the set of basic words;
- $H$ is the set of hedges, considered as unary operations on $X$, with $H = H^- \cup H^+$.

The effect of $h$ acting on $x$ is expressed by the fact that either $hx \ge x$ or $hx \le x$. In the case that $kx \ge hx \ge x$ or $y \ge hy \ge ky$, for a certain $x$ or $y$, we say that the effect of $k$ is greater than the effect of $h$, and write $k \ge h$. $H$ is divided into $H^+$ and $H^-$:

$$H^+ = \{h \in H : hc^- \le c^- \text{ or } hc^+ \ge c^+\}, \qquad H^- = \{h \in H : hc^- \ge c^- \text{ or } hc^+ \le c^+\}.$$

In this paper, we write $H^- = \{h_0; h_{-1}; \ldots; h_{-q}\}$ with $h_0 < h_{-1} < \ldots < h_{-q}$, and $H^+ = \{h_0; h_1; \ldots; h_p\}$ with $h_0 < h_1 < \ldots < h_p$, where $p$ and $q$ are the numbers of positive and negative hedges, respectively. Sometimes $h_0$ is replaced by $I$ (identity): $Ix = x$ for all $x$;

- $\Sigma$ is the operator such that $\Sigma x = \operatorname{supremum} H(x)$;
- $\Phi$ is the operator such that $\Phi x = \operatorname{infimum} H(x)$;
- $\le$ is an order relation on $X$.

For example, let the set of basic words $C$ be {true; false}, that is, $c^+$ = true and $c^-$ = false. Let the set of hedges $H$ be {Very; More; Identity; Less; Possible}, in which Identity is the identity operator, which is both positive and negative. For short we write $H = \{V; M; I; L; P\}$. It is clear that:

P false ≥ false and P true ≤ true;
L false ≥ false and L true ≤ true;
M false ≤ false and M true ≥ true;
V false ≤ false and V true ≥ true.

Therefore $H^+ = \{V; M; I\}$ and $H^- = \{I; L; P\}$. Moreover, H(true) = {true; L true; M true; LL true; LM true; MM true; ...} and H(false) = {false; L false; M false; LL false; LM false; MM false; ...}.

2.2. Semantic quantitation of linguistic terms
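The quantitation developed in this section relies on the split of the hedges into $H^+$ and $H^-$ obtained in the example of Section 2.1.2, so it may help to check that split mechanically first. The following Python sketch is an illustration only: the table `EFFECT` transcribes the comparisons listed in that example, while the numeric encoding (+1, -1, 0) and the function name `partition_hedges` are assumptions introduced here and are not part of the paper.

```python
# Effect of each hedge h on the primary terms, transcribed from the example:
#   +1 means h c >= c (strictly above), -1 means h c <= c (strictly below),
#    0 means h c == c (the identity hedge).
# This encoding is an assumption made for the sketch.
EFFECT = {
    "V": {"false": -1, "true": +1},   # V false <= false, V true >= true
    "M": {"false": -1, "true": +1},   # M false <= false, M true >= true
    "I": {"false": 0, "true": 0},     # I x = x
    "L": {"false": +1, "true": -1},   # L false >= false, L true <= true
    "P": {"false": +1, "true": -1},   # P false >= false, P true <= true
}


def partition_hedges(effect):
    """Return (H_plus, H_minus) following the two set definitions:
    H+ = {h : h c- <= c-  or  h c+ >= c+}
    H- = {h : h c- >= c-  or  h c+ <= c+}
    """
    h_plus = {h for h, e in effect.items()
              if e["false"] <= 0 or e["true"] >= 0}
    h_minus = {h for h, e in effect.items()
               if e["false"] >= 0 or e["true"] <= 0}
    return h_plus, h_minus


h_plus, h_minus = partition_hedges(EFFECT)
```

With the comparisons of the example, the identity hedge satisfies both definitions, which is why it appears in both sets.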
2.2.1. Semantic quantitation of ComLin-HA

Definition 2.2. Let $AX$ be a HA. A function $f : X \to [0, 1]$ is called a semantic quantitation function of $AX$ if, $\forall h, k \in H^+$ or $\forall h, k \in H^-$, and $\forall x, y \in X$, we have:

$$\frac{|f(hx) - f(x)|}{|f(kx) - f(x)|} = \frac{|f(hy) - f(y)|}{|f(ky) - f(y)|}.$$

2.2.2. Fuzziness measures of term x

Definition 2.3. Let $AX$ be a HA. A function $fm : X \to [0, 1]$ is called a fuzziness measure of terms if:

(1) $fm(c^-) = \theta$ and $fm(c^+) = 1 - \theta$, where $\theta \in [0, 1]$, and $\sum_{-q \le i \le p,\, i \ne 0} fm(h_i u) = fm(u)$ for $u \in H(x)$;
(2) If $x$ is a crisp concept, i.e. $H(x) = \{x\}$, then $fm(x) = 0$. Hence, $fm(0) = fm(W) = fm(1) = 0$;
(3) $\forall x, y \in X$, $\forall h \in H$, we have
$$\frac{fm(hx)}{fm(x)} = \frac{fm(hy)}{fm(y)}.$$
That is, this ratio does not depend on $x$ and $y$. It is denoted by $\mu(h)$ and is called the fuzziness measure of the hedge $h$.

Based on the structure of the algebras, the fuzziness measure $fm$ of a term $x \in X$, defined by this axiomatization, has the following properties [2-5]:

i) $fm(hx) = \mu(h) fm(x)$, $\forall x \in X$, and $fm(x) = 0$, $\forall x \in LDom(X)$;
ii) $fm(c^-) + fm(c^+) = 1$;
iii) $\sum_{-q \le i \le p,\, i \ne 0} fm(h_i c) = fm(c)$, where $c \in \{c^-, c^+\}$;
iv) $\sum_{-q \le i \le p,\, i \ne 0} fm(h_i x) = fm(x)$, $x \in X$;
v) $\sum_{-q \le i \le -1} \mu(h_i) = \alpha$ and $\sum_{1 \le i \le p} \mu(h_i) = \beta$, where $\alpha, \beta > 0$ and $\alpha + \beta = 1$.

2.2.3. Algebraic signs of linguistic terms

The order-based semantics of terms implies that terms and hedges have so-called semantic tendencies, which can be recognized as follows:

- As presented above, for $x = c \in G$, the comparability of $hc$ and $c$ implies that $H$ is partitioned into two sets: $H^+ = \{h \in H : hc^- \le c^- \text{ or } hc^+ \ge c^+\}$, which consists of the hedges that increase both semantic tendencies of the primary terms; $H^- = \{h \in H : hc^- \ge c^- \text{ or } hc^+ \le c^+\}$, which consists of the hedges that decrease these semantic
tendencies. This leads to the interesting concept that the primary terms and hedges have their own "algebraic" sign: $sign(c^+) = +1$, $sign(c^-) = -1$, $sign(h) = +1$ for $h \in H^+$, and $sign(h) = -1$ for $h \in H^-$.

- The comparability of $khx$ and $hx$ also implies that $k$ either increases or decreases the effect of $h$. For instance, given $x \le hx$, the inequalities $x \le hx \le khx$ state that $k$ increases the effect of $h$, and $x \le khx \le hx$ state that $k$ decreases the effect of $h$. We write $sign(k, h) = +1$ for the former case and $sign(k, h) = -1$ for the latter. This is called the relative sign of $k$ with respect to $h$. For example, it can be verified that $sign(V, L) = +1$ while $sign(L, V) = -1$.

- If $hh'x \ne h'x$, we say that the effect of $h$ in the expression $hh'x$ is proper. A string $h_n \ldots h_1 c$, where $h_i \in H$ and $c \in G$, is said to be a canonical string representation of $x$ if $x = h_n \ldots h_1 c$ and the effect of every $h_i$ is proper in this expression. It was proved that the canonical representation of $x$ is unique for any term $x$ and, hence, we may define the length of $x$, denoted by $|x|$, which is just the length of the string $h_m \ldots h_1 c$ of $x$.

Now, the sign of a linguistic term $x$ can be defined by

$$Sgn(x) = sign(h_m, h_{m-1}) \times \ldots \times sign(h_2, h_1) \times sign(h_1) \times sign(c).$$

The meaning of $Sgn$ is expressed as follows:

$$(Sgn(hx) = +1) \Rightarrow (hx \ge x) \quad \text{and} \quad (Sgn(hx) = -1) \Rightarrow (hx \le x).$$

For example, since $Sgn(VL\,\text{true}) = sign(V, L)\, sign(L)\, sign(\text{true}) = (+1)(-1)(+1) = -1$, we have $VL\,\text{true} \le L\,\text{true}$. On this basis, we have

Definition 2.4. $Sgn : X \to \{-1, 0, 1\}$ is defined as follows:

(1) $Sgn(c^-) = -1$ and
$$Sgn(hc^-) = \begin{cases} +Sgn(c^-) & \text{if } hc^- < c^- \\ -Sgn(c^-) & \text{if } hc^- \ge c^- \end{cases}$$
(2) $Sgn(c^+) = +1$ and
$$Sgn(hc^+) = \begin{cases} +Sgn(c^+) & \text{if } hc^+ > c^+ \\ -Sgn(c^+) & \text{if } hc^+ \le c^+ \end{cases}$$
(3) $Sgn(h'hx) = -Sgn(hx)$ if $h'$ is negative with respect to $h$ and $h'hx \ne hx$;
(4) $Sgn(h'hx) = +Sgn(hx)$ if $h'$ is positive with respect to $h$ and $h'hx \ne hx$;
(5) $Sgn(h'hx) = 0$ if $h'hx = hx$.

2.2.4. Semantic heredity: an essential meaning of linguistic hedges

An essential property of hedges is the
so-called semantic heredity, which states that the terms generated from a given term $x$ by using hedges must inherit or contain the (genetic) core meaning of $x$ itself. This implies that hedges cannot change the essential meaning of terms expressed by the semantic order relation $\le$, which results in the following:

+ If the meaning of $hx$ and $kx$ is expressed by the order relationship $hx \le kx$, $h \ne k$, then no hedges $h'$ and $k'$ can change this semantic relationship, that is, $hx \le kx \Rightarrow h'hx \le k'kx$.
+ Similarly, if the meaning of $x$ and $hx$ is expressed by either $x \le hx$ or $hx \le x$, then $x \le hx \Rightarrow x \le h'hx$ or $hx \le x \Rightarrow h'hx \le x$.

Assuming that $H^- = \{h_0; h_{-1}; \ldots; h_{-q}\}$ and $H^+ = \{h_0; h_1; \ldots; h_p\}$, where $h_0 = I$ is the artificial identity hedge, and $h_0 < h_{-1} < \ldots < h_{-q}$ and $h_0 < h_1 < \ldots < h_p$, the hedge heredity leads to the following results:

+ For $Sgn(h_p x) = -1$: $H(h_p x) \le \ldots \le H(h_1 x) \le x \le H(h_{-1} x) \le \ldots \le H(h_{-q} x)$.
+ For $Sgn(h_p x) = +1$: $H(h_{-q} x) \le \ldots \le H(h_{-1} x) \le x \le H(h_1 x) \le \ldots \le H(h_p x)$,
with the note that $H(h_0 x) = H(Ix) = \{x\}$, by our convention. In particular, we have $0 \le H(c^-) \le W \le H(c^+) \le 1$.
+ The sets $H(h_j x)$, $j \in [-q \wedge p]$, where $[-q \wedge p] = \{j \mid -q \le j \le p\}$, constitute a partition of $H(x)$, i.e. they are disjoint and $H(x) = \bigcup_{j \in [-q \wedge p]} H(h_j x)$.

These properties alone already show that term-domains of linguistic variables with such qualitative semantics possess a rich order-based structure. Hedge algebras are therefore formalized structures of the qualitative semantics of term-domains, and the meaning of a term represented in such a structure generally carries more information than a fuzzy set alone.

2.2.5. Fuzziness measure of terms using fuzziness intervals

* Fuzziness interval of $x \in X$

For every term $x$, the fuzziness interval of $x \in X$ is a subinterval of [0, 1] of length $fm(x)$, denoted by $\Im_{fm}(x)$, which will be constructed by induction on the length
of $x$ as follows:

(i) For $x$ of length 1, i.e. $x \in \{c^+; c^-\}$, $\Im_{fm}(c^-)$ and $\Im_{fm}(c^+)$ are intervals which constitute a partition of [0, 1] and satisfy the conditions that $c^- \le c^+$ implies $\Im_{fm}(c^-) \le \Im_{fm}(c^+)$, $|\Im_{fm}(c^-)| = fm(c^-)$ and $|\Im_{fm}(c^+)| = fm(c^+)$, where $|\Im(x)|$ denotes the length of $\Im(x)$ and the notation $U \le V$ means that, $\forall x \in U$, $\forall y \in V$, we have $x \le y$.

(ii) Suppose that $\Im_{fm}(x)$ has been defined and $|\Im_{fm}(x)| = fm(x)$ for all $x$ of length $k$. Then the fuzziness intervals $\{\Im_{fm}(h_i x) : i \in [-q \wedge p]\}$ are constructed so that they constitute a partition of $\Im_{fm}(x)$ and satisfy the conditions that $|\Im_{fm}(h_i x)| = fm(h_i x)$ and that $\{\Im_{fm}(h_i x) : i \in [-q \wedge p]\}$ is a linearly ordered set, whose order is induced by the order of the set $\{h_{-q} x; h_{-q+1} x; \ldots; h_p x\}$ (Figure 1).

The fuzziness intervals of terms of length $k$ are called intervals of depth $k$, or $k$-intervals for short. The set of all $k$-intervals of the terms of length $k$ is denoted by $J_k$, and we put $J = \bigcup_{1 \le k \le \infty} J_k$, which is the set of all fuzziness intervals. So, by definition, each fuzziness interval in $J$ is associated with exactly one term $x$ and, conversely, each term $x$ is associated with exactly one fuzziness interval $\Im_{fm}(x)$ of depth $|x|$, where $|x|$ denotes the length of $x$. By definition, the set $J$ has its own structure defined by the inclusion and ordering relations between the intervals of the same depth. That is, each fuzziness interval has its own position in this structure and represents a certain meaning of the term associated with it.

Figure 1.

In [2, 4-6], the semantically quantifying mapping (SQM), $\upsilon : X \to [0, 1]$, which is induced by the given fuzziness measure or defined by the structure of $J$, has been examined. For each term $x \in X$ of length $k$, $\upsilon(x)$ is defined to be the value in $\Im_{fm}(x)$ which is the common end-point of the $(k+1)$-intervals $\Im(h_1 x)$ and $\Im(h_{-1} x)$ of $J_{k+1}$; it is called the semantic value of the term $x$. This value divides the interval $\Im(x)$ internally in the proportion $\alpha$
: $\beta$ if $Sgn(h_p x) = +1$, or in the proportion $\beta : \alpha$ if $Sgn(h_p x) = -1$. So, based on the structure of $J$, the SQM $\upsilon$ induced by $fm$ can be computed as follows.

Let $AX = (X, G, C, H, \Sigma, \Phi, \le)$ be a free ComLin-HA and let $fm(c^-)$, $fm(c^+)$ and $\mu(h)$ be the fuzziness measures of the primary terms $c^-$, $c^+$ and of the hedges $h \in H$, respectively, which satisfy Properties ii) and v). Then the mapping $\upsilon$ can be computed recursively as follows:

i) $\upsilon(W) = \kappa = fm(c^-)$, $\upsilon(c^-) = \kappa - \alpha fm(c^-) = \beta fm(c^-)$, $\upsilon(c^+) = \kappa + \alpha fm(c^+)$;

ii) $$\upsilon(h_j x) = \upsilon(x) + Sgn(h_j x)\Big\{\sum_{i = Sgn(j)}^{j} \mu(h_i) fm(x) - \omega(h_j x)\, \mu(h_j) fm(x)\Big\},$$
where $\omega(h_j x) = \frac{1}{2}\big[1 + Sgn(h_j x)\, Sgn(h_p h_j x)(\beta - \alpha)\big] \in \{\alpha; \beta\}$, $\forall j \in [-q, p]$;

iii) $\upsilon(\Phi c^-) = 0$, $\upsilon(\Sigma c^-) = \kappa = \upsilon(\Phi c^+)$, $\upsilon(\Sigma c^+) = 1$, and, $\forall j \in [-q \wedge p]$, we have:
$$\upsilon(\Phi h_j x) = \upsilon(x) + Sgn(h_j x)\Big\{\sum_{i = Sgn(j)}^{j - Sgn(j)} \mu(h_i) fm(x)\Big\} - \frac{1 - Sgn(h_j x)}{2}\, \mu(h_j) fm(x),$$
$$\upsilon(\Sigma h_j x) = \upsilon(x) + Sgn(h_j x)\Big\{\sum_{i = Sgn(j)}^{j - Sgn(j)} \mu(h_i) fm(x)\Big\} + \frac{1 + Sgn(h_j x)}{2}\, \mu(h_j) fm(x).$$

* The uncertainty k-similarity relation

Two values $t[x]$, $s[x]$ in $Dom(X)$ are called $k$-equal, denoted $t[x] =_k s[x]$, if one of the following conditions is satisfied [6]:

(i) $t[x]$ and $s[x]$ denote the same symbol;
(ii) There exists a similarity class $\delta_k(u)$ of the similarity relation $\delta_k$ of level $k$ such that $O_{min,k}(t[x]) \subseteq \delta_k(u)$ and $O_{min,k}(s[x]) \subseteq \delta_k(u)$.

For any two values $t[x]$ and $s[x]$ of $Dom(X)$, the criterion for testing the equation $t[x] =_k s[x]$ is that there exists a similarity class of level $k$, $\delta_k(u)$, of $X$ such that:

- if $t[x]$ and $s[x]$ are both crisp values, then $t[x], s[x] \in \delta_k(u)$;
- if only one of the two values is a linguistic value, say $t[x]$, then $\upsilon(t[x]), s[x] \in \delta_k(u)$;
- if both values are linguistic values, then $\upsilon(t[x]), \upsilon(s[x]) \in \delta_k(u)$.

To compute the similarity classes of level $k$, for an arbitrary level $k$, we use fuzziness intervals of depth $k' > k$ such that each of the sets $X_{j,(k)}$ contains a sufficient number of fuzziness intervals of depth $k'$ of its own. It is proved that the condition on $k'$ is as follows: if $p, q > 1$, then $k' > k$ suffices; otherwise, $k'$ must exceed $k$ by a larger margin.

2.3. An example of queries in fuzzy databases

Consider a relation scheme of pupils R = {#CHILD, NAME, ANSWER 1, ANSWER 2} (Table 1). For simplicity, we assume that the linguistic attributes have the same real domain, which is the interval [0, 100], and the same term-domain, which is modeled by the same linear hedge algebra $AX = (X, G, C, H, \Sigma, \Phi, \le)$, where G = {false; true}, C = {0; W; 1}, H = {M; L}, in which M and L stand for More and Less respectively, and $\le$ is a semantics-based order. To quantify this hedge algebra, we assume that $fm(\text{true}) = fm(c^+) = 0.6$, $fm(\text{false}) = fm(c^-) = 0.4$, $\mu(M) = 0.25$, $\mu(L) = 0.25$. Hence, we have $\alpha = \beta = 0.5$.

Table 1. A relation scheme of pupils

| #CHILD | NAME | ANSWER 1 | ANSWER 2 |
| | John | More True | 80 |
| | Peter | 65 | True |
| | William | 90 | 95 |
| | Johnson | Less True | 80 |
| | Mary | 60 | 70 |
| | Claude | 55 | More True |
| | Martin | 90 | More True |
| | Mitchell | 65 | True |

i) Find the pupils whose answer 1 is More True and whose answer 2 is More True.
ii) Find the pupils whose answer 1 is More True and whose answer 2 is True.
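The parameters assumed in this example can be turned into numbers directly. The sketch below is a minimal illustration in Python, assuming exact rational arithmetic via `fractions.Fraction`; the helper names `to_domain` and `fm_hedged` are hypothetical and introduced only for this sketch. It applies property i) of the fuzziness measure, $fm(hx) = \mu(h) fm(x)$, and the formulas $\upsilon(c^-) = \kappa - \alpha fm(c^-)$ and $\upsilon(c^+) = \kappa + \alpha fm(c^+)$ of Section 2.2.5, rescaled from [0, 1] to the real domain [0, 100].

```python
from fractions import Fraction

# Parameters stated in the example, kept as exact fractions.
FM_FALSE = Fraction("0.4")   # fm(c-) = fm(false)
FM_TRUE = Fraction("0.6")    # fm(c+) = fm(true)
MU = Fraction("0.25")        # fuzziness measure of a single hedge
ALPHA = Fraction("0.5")      # alpha = beta = 0.5, as stated in the text
LO, HI = 0, 100              # real domain of the attributes


def to_domain(t):
    """Map a point t of [0, 1] linearly onto the real domain [LO, HI]."""
    return LO + t * (HI - LO)


def fm_hedged(mu_h, fm_x):
    """Property i): fm(hx) = mu(h) * fm(x)."""
    return mu_h * fm_x


# kappa = fm(c-) is the point separating the intervals of false and true.
kappa = FM_FALSE

# Fuzziness intervals of the primary terms, read as half-open (a, b].
interval_false = (to_domain(0), to_domain(kappa))
interval_true = (to_domain(kappa), to_domain(1))

# Semantic values of the primary terms on the real domain.
v_false = to_domain(kappa - ALPHA * FM_FALSE)
v_true = to_domain(kappa + ALPHA * FM_TRUE)

# Width, on the real domain, of the fuzziness interval of a term "h true".
width_hedged_true = to_domain(fm_hedged(MU, FM_TRUE)) - LO
```

Under these parameters the fuzziness interval of true is (40, 100], its semantic value is 70, and every term of the form h true with $\mu(h) = 0.25$ occupies a subinterval of width 15.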
First, we calculate the semantic values of the following terms, noting that, since $\alpha = \beta = 0.5$, $\upsilon_{A,r}(x)$ is the center of the fuzziness interval $\Im_r(x)$, where the subscript $r$ indicates that the respective notation takes values in the real domain $[a, b]$ of the attribute $A$ in question, instead of [0, 1]:

$$\upsilon_{A,r}(\text{true}) = [fm(\text{false}) + 0.5 \times fm(\text{true})] \times 100 = [0.4 + 0.5 \times 0.6] \times 100 = 70$$
$$\upsilon_{A,r}(L\,\text{true}) = [fm(\text{false}) + 0.5 \times fm(L\,\text{true})] \times 100 = [0.4 + 0.5 \times 0.25 \times 0.6] \times 100 = 47.5$$
$$\upsilon_{A,r}(M\,\text{true}) = [fm(\text{false}) + fm(L\,\text{true}) + 0.5 \times fm(M\,\text{true})] \times 100 = [0.4 + 0.25 \times 0.6 + 0.5 \times 0.25 \times 0.6] \times 100 = 62.5$$

Now, we calculate certain similarity intervals of the domain [0, 100] as follows:

$$\delta_{2,r}(\text{true}) = \Im_r(M\,\text{true}) \cup \Im_r(L\,\text{true}) = (70 - 0.25 \times 0.6 \times 100,\ 70 + 0.25 \times 0.6 \times 100] = (55, 85]$$
$$\delta_{2,r}(M\,\text{true}) = \Im_r(M\,\text{true}) = (100 - 0.25 \times 0.6 \times 100,\ 100] = (85, 100]$$
$$\delta_{2,r}(L\,\text{true}) = \Im_r(L\,\text{true}) = (0.4 \times 100,\ 0.4 \times 100 + 0.25 \times 0.6 \times 100] = (40, 55]$$

Result for query 1: William. Result for query 2: John.

3. Conclusion

By using the hedge algebra approach, we can represent the complete semantic structure of linguistic terms in the term-domains of their respective linguistic variables or linguistic attributes. The comparison operations are defined based on $k$-equality and $k$-similarity, where $k$ is the length of the terms, which indicates the degree of these equalities. The uncertainty or fuzziness of these equalities lies in the fuzziness intervals of the linguistic terms of length $k$ used to define them. Under such semantics of $k$-equality and $k$-similarity of linguistic data, we can make queries in a fuzzy database represented by hedge algebras. Crisp values can then be seen as particular cases of fuzzy data.

REFERENCES

[1] Nguyen, C. H. and Wechler, 1990. Hedge algebras: An algebraic approach to structure of sets of linguistic truth values. Fuzzy Sets and Systems 35, pp. 281-293.
[2] Nguyen, C. H. and Wechler, 1992. Extended hedge algebras and their application to fuzzy logic. Fuzzy Sets and Systems 52, pp. 259-281.
[3] N.C. Ho, 2003. Quantifying Hedge Algebras and Interpolation Methods in Approximate Reasoning. Proc. of the 5th Inter. Conf. on Fuzzy Information Processing, Beijing, March 1-4, pp. 105-112.
[4] N.C. Ho, N.V. Long, 2007. Fuzziness Measure on Complete Hedge Algebras and Quantitative Semantics of Terms in Linear Hedge Algebras. Fuzzy Sets and Systems 158, pp. 452-471.
[5] N.C. Ho, H.V. Nam, T.D. Khang and L.H. Chau, 1999. Hedge Algebras, Linguistic-valued Logic and their Application to Fuzzy Reasoning. Inter. J. of Uncertainty, Fuzziness and Knowledge-Based System 7, pp. 347-361.
[6] Nguyen Cat Ho, Vu Minh Loc, Hoang Tung, Nguyen Tan An, 2013. Dependence of fuzzy function in database in relationship with language data. J. of Technology Science, No. 2, pp. 137-152 (in Vietnamese).