Mathematics report: "The Number of Positions Starting a Square in Binary Words"

The Number of Positions Starting a Square in Binary Words

Tero Harju
Department of Mathematics
University of Turku, Finland
harju@utu.fi

Tomi Kärki
Department of Mathematics
University of Turku, Finland
topeka@utu.fi

Dirk Nowotka
Institute for Formal Methods in Computer Science (FMI)
Universität Stuttgart, Germany
nowotka@fmi.uni-stuttgart.de

Submitted: Sep 3, 2010; Accepted: Dec 14, 2010; Published: Jan 5, 2011
Mathematics Subject Classification: 68R15
the electronic journal of combinatorics 18 (2011), #P6

Abstract

We consider the number σ(w) of positions that do not start a square in binary words w. Letting σ(n) denote the maximum of σ(w) for length |w| = n, we show that lim σ(n)/n = 15/31.

1 Square-free positions and strong words

Every binary word with at least 4 letters contains a square. A. S. Fraenkel and J. Simpson [2, 1] studied the number of distinct squares in binary words; see also Ilie [4], where it was shown that a binary word of length n can contain at most 2n − Θ(log n) distinct squares. It has been conjectured that n is an upper bound in this case. On the other hand, in an impressive paper [5] G. Kucherov, P. Ochem and M. Rao proved that the minimum number of occurrences of squares in binary words is asymptotically equal to 0.55080… times the length of the word. Later Ochem and Rao [7] showed that this constant is exactly 103/187.

In the present paper we count the minimum number of positions in binary words that start a square, and we show that asymptotically this is 16/31 = 0.516… times the length of the word. For our convenience, we state the result in the dual case, i.e., we count the maximum number of positions that are square-free. A related question for borders of cyclic words was considered by T. Harju and D. Nowotka [3].

Several parts of the proofs are computer aided, both for searching the strong words (the main concept in the proofs) and for checking their compatibilities. We have included the Mathematica code for the search of strong words.
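The opening claim of this section, that every binary word with at least 4 letters contains a square, is small enough to check by brute force. The following Python sketch does so; the function name `has_square` is introduced here for illustration only and does not come from the paper.

```python
from itertools import product


def has_square(word):
    """Return True if word contains a factor uu with u nonempty."""
    n = len(word)
    for i in range(n):                       # 0-based start of the candidate square
        for j in range(1, (n - i) // 2 + 1):  # j = |u|, the square must fit in word
            if word[i:i + j] == word[i + j:i + 2 * j]:
                return True
    return False


# All square-free binary words of length 1 to 4, in lexicographic order per length.
square_free = [
    w
    for n in range(1, 5)
    for w in map("".join, product("01", repeat=n))
    if not has_square(w)
]
print(square_free)  # ['0', '1', '01', '10', '010', '101']
```

No word of length 4 survives the filter, and since every longer word has a prefix of length 4, the claim follows for all lengths at least 4.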
We refer to Lothaire [6] for elementary definitions in combinatorics on words. Let A = {a, b, c} be a ternary alphabet, and B = {0, 1} a binary alphabet. For a binary word w = a_1 a_2 ⋯ a_n ∈ B* with a_i ∈ B, we say that a position i ∈ {1, 2, …, n} starts a square if a_i ⋯ a_{i+j−1} = a_{i+j} ⋯ a_{i+2j−1} for some j ≥ 1 such that i + 2j − 1 ≤ n. Otherwise, the position i is square-free in w. For r ≥ 0 and s ≥ 1, let σ_w(r, s) denote the number of square-free positions i with r < i ≤ r + s in the word w. In order to simplify the treatment, we shall write σ_w(u) instead of σ_w(r, s) where w = xuv such that |x| = r and |u| = s. Hence, while talking about σ_w(u), the occurrence of the factor u in w will be implicitly assumed, without risk of confusion. Also, let σ(w) = σ_w(w). For an integer n ≥ 1, let

    σ(n) = max{σ(w) : w ∈ B*, |w| = n}.

A word w is said to be strong if for all nonempty prefixes u of w,

    σ_w(u) ≥ |u|/2.

We notice that if w is a strong word, then so is its complement w̄ obtained from w by interchanging the letters 0 and 1.

Example 1. The short strong words, beginning with 0, are listed in Table 1. As an example consider the word w = 0100110001001 with |w| = 13. We have σ(w) = 8, and the square-free positions are marked by dots in the following copy:

    w = .0.10.01.100.0.10.0.1

The ratio 8/13 is much bigger than the asymptotic bound 15/31 that will be proved in the sequel. One can easily check that w is a strong word.

    0      0110    010001    0100110    01001100    010011000
    01     01000   010011    0100111    01001101    010011010
    010    01001   011001    0110010    01001110    010011100
    011    01100   0100010   0110011    010001100   010011101
    0100   01101   0100011   01000110   010001101   0100011001

Table 1: The first 30 short strong words.
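The definitions above translate directly into a few lines of code. The following Python sketch is an illustration under the definitions of this section, not the authors' implementation; the names `square_free_positions`, `sigma`, `is_strong` and `strong_words` are hypothetical. It relies on the fact that every prefix of a strong word is itself strong, so strong words can be found by extending shorter ones letter by letter.

```python
def square_free_positions(w):
    """1-based positions i of w at which no square a_i...a_{i+2j-1} begins."""
    n = len(w)
    free = []
    for i in range(n):  # i is the 0-based start
        starts_square = any(
            w[i:i + j] == w[i + j:i + 2 * j]
            for j in range(1, (n - i) // 2 + 1)
        )
        if not starts_square:
            free.append(i + 1)
    return free


def sigma(w):
    """sigma(w): number of square-free positions of w."""
    return len(square_free_positions(w))


def is_strong(w):
    """True if sigma_w(u) >= |u|/2 for every nonempty prefix u of w."""
    free = set(square_free_positions(w))
    count = 0
    for k in range(1, len(w) + 1):
        count += k in free       # square-free positions among the first k
        if 2 * count < k:
            return False
    return True


def strong_words(max_len, first="0"):
    """Strong words starting with `first`, up to length max_len, shortest first."""
    words = [first] if is_strong(first) else []
    i = 0
    while i < len(words):
        w = words[i]
        i += 1
        if len(w) < max_len:
            for c in "01":
                if is_strong(w + c):
                    words.append(w + c)
    return words


w = "0100110001001"
print(square_free_positions(w))  # [1, 2, 4, 6, 9, 10, 12, 13]
print(sigma(w), is_strong(w))    # 8 True
print(strong_words(4))           # ['0', '01', '010', '011', '0100', '0110']
```

The last call reproduces the six shortest entries of Table 1; a larger `max_len` would be needed to compare against the rest of the table.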
Using Mathematica (version 7.01.0), one can calculate σ(w) and the ratio σ(w)/|w| using the functions Sigma and SigmaRatio defined as

    (* number of positions of Str that do not start a square *)
    Sigma[Str_] := StringLength[Str] -
      Length[StringPosition[Str, x__ ~~ x__, Overlaps -> True]]

    (* ratio of square-free positions among the first j positions of Str *)
    SigmaRatio[Str_, j_] := (j - Length[Select[
      StringPosition[Str, x__ ~~ x__, Overlaps -> True],
      #[[1]] < j + 1 &]])/j

For checking whether a word is strong, one can use

    (* True if every nonempty prefix has ratio at least 1/2 *)
    Strong[Str_] := Module[{strong, i},
      strong = True; i = 0;
      While[strong && i < StringLength[Str],
        i = i + 1;
        strong = (SigmaRatio[Str, i] >= 1/2)];
      strong]

A list of all strong words can be generated by the command

    (* extend each strong word by one letter and keep the strong extensions *)
    StrongList = {"0", "1"};
    For[i = 1, i <= Length[StrongList], i++,
      If[Strong[StrongList[[i]] <> "0"],
        StrongList = Append[StrongList, StrongList[[i]] <> "0"]];
      If[Strong[StrongList[[i]] <> "1"],
        StrongList = Append[StrongList, StrongList[[i]] <> "1"]]];
    StrongList

After a computer check, we have that there are only finitely many strong words, the longest of which have length 37. More precisely, we have the following lemma.

Lemma 1. (1) There are 382 strong words, the longest of which has length 37.
(2) If w is a strong word with |w| ≥ 8, then w begins with 0100 or its complement 1011.

The long strong words of length at least 27, starting with the letter 0, are in Table 2.

2 Decompositions

A min-factor m(w) of a binary word w is the shortest prefix u of w such that σ_w(u) < |u|/2, if it exists. By the above observation, each binary word w with |w| ≥ 38 does have a (unique) min-factor. The min-decomposition of w is the factorization

    w = w_1 w_2 ⋯ w_r w_{r+1},

where w_i = m(w_i ⋯ w_{r+1}) for i = 1, 2, …, r and the suffix w_{r+1} does not possess a min-factor. In particular, w_{r+1} is strong.

The following lemma will be crucial in the sequel.

Lemma 2. Assume that w = m(w)w′ for a suffix w′ with 010 or 101 a prefix of w′. Then the min-factor m(w) is a strong word.

Proof.
In order to show that m(w) is strong, consider the prefix p of m(w) of length |m(w)| − 1. Then

    σ_w(p) = σ_w(m(w)),    (1)

since w′ begins with 010 or 101, and thus the last letter of m(w) starts a square in w (one of 00, 11, 0101 or 1010). By the definition of m(w), we have σ_w(m(w)) < |m(w)|/2 and σ_w(p) ≥ |p|/2. Hence, combining these with (1), we obtain

    (|m(w)| − 1)/2 ≤ σ_w(m(w)) < |m(w)|/2,

which implies that |m(w)| is odd and σ_w(m(w)) = (|m(w)| − 1)/2. Hence, since the last letter of m(w) does not start a square in m(w), we have

    σ(m(w)) ≥ σ_w(m(w)) + 1 = (|m(w)| + 1)/2.

This completes the proof that m(w) is strong.

    length  strong word
    27      010011000100111011000100110
            010011000100111011001011100
            010011000100111011001011101
            010011000100111011001110010
            010011101100010011010001100
            010011101100010011010001101
    28      0100110001001110110001001100
            0100110001001110110001001101
            0100110001001110110010111001
            0100111011000100110100011001
    29      01001100010011101100010011000
            01001100010011101100010011010
            01001100010011101100101110010
            01001100010011101100101110011
            01001110110001001101000110010
            01001110110001001101000110011
    30      010011000100111011000100110001
            010011000100111011000100110100
            010011000100111011001011100110
    31      0100110001001110110001001100011
            0100110001001110110001001101000
            0100110001001110110001001101001
            0100110001001110110010111001100
            0100110001001110110010111001101
    32      01001100010011101100010011000110
            01001100010011101100010011010001
    33      010011000100111011000100110001101
            010011000100111011000100110100010
            010011000100111011000100110100011
    34      0100110001001110110001001101000110
    35      01001100010011101100010011010001100
            01001100010011101100010011010001101
    36      010011000100111011000100110100011001
    37      0100110001001110110001001101000110010
            0100110001001110110001001101000110011

Table 2: The long strong words.
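The min-factor and the min-decomposition of Section 2 can be computed greedily from the left. The Python sketch below follows the definitions as stated there; the function names `min_factor` and `min_decomposition` are hypothetical and chosen here for illustration. Note that each min-factor is computed relative to the remaining suffix w_i ⋯ w_{r+1}, not relative to the whole word.

```python
def square_free_positions(w):
    """1-based positions of w at which no square begins."""
    n = len(w)
    return [
        i + 1
        for i in range(n)
        if not any(
            w[i:i + j] == w[i + j:i + 2 * j]
            for j in range(1, (n - i) // 2 + 1)
        )
    ]


def min_factor(w):
    """Shortest prefix u of w with sigma_w(u) < |u|/2, or None if w is strong."""
    free = set(square_free_positions(w))
    count = 0
    for k in range(1, len(w) + 1):
        count += k in free
        if 2 * count < k:
            return w[:k]
    return None


def min_decomposition(w):
    """Return ([w_1, ..., w_r], w_{r+1}) where w_{r+1} has no min-factor."""
    factors = []
    rest = w
    while rest:
        m = min_factor(rest)   # min-factor of the remaining suffix
        if m is None:
            break
        factors.append(m)
        rest = rest[len(m):]
    return factors, rest


print(min_decomposition("01010"))  # (['0', '1'], '010')
print(min_decomposition("0111"))   # (['011'], '1')
```

In the first example the min-factor 0 is followed by 1010, which begins with 101, so Lemma 2 applies and the min-factor is indeed a strong word. In the second example the min-factor 011 has odd length 3.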
3 Asymptotic behaviour

In this section we consider the asymptotic behaviour of σ(n)/n, and prove the following result as a consequence of Theorems 7 and 9.

Theorem 3. We have

    lim σ(n)/n = 15/31.

3.1 Upper bound

In the next lemmas, let

    w = w_1 w_2 ⋯ w_r w_{r+1}    (2)

be a min-decomposition of w for r ≥ 2.

Lemma 4. Each min-factor w_i, for i = 1, 2, …, r, is of odd length.

Proof. Assume that w_i is a min-factor of even length n. Let v be the prefix of w_i of length n − 1. Then

    σ_w(v) ≤ σ_w(w_i) ≤ n/2 − 1 = (n − 2)/2 < (n − 1)/2,

which contradicts the definition of a min-factor.

Lemma 5. Let i < r. If |w_{i+1}| ≥ 9, then w_i is strong.

Proof. Since w_{i+1} is a min-factor, by the definitions, its prefix of length |w_{i+1}| − 1 is a strong word. Each strong word of length at least eight begins with 010 or 101, and thus the claim follows from Lemma 2.

The next lemma relies on computations.

Lemma 6. If |w_i| = 27 and |w_{i+1}| ≥ 31 for i < r, then w_i is one of the following two strong words,

    010011000100111011000100110 or 101100111011000100111011001.

Theorem 7. We have

    lim sup σ(n)/n ≤ 15/31.

Proof. Let w = w_1 w_2 ⋯ w_r w_{r+1} be the min-decomposition of w. Recall that, for i ≤ r, we have σ_w(w_i) < |w_i|/2, and that the prefix of length |w_i| − 1 is strong whenever |w_i| > 1. Also, by Lemma 4, |w_i| is odd for each i ≤ r. We consider the factors

    w_{i,i+k} = w_i w_{i+1} ⋯ w_{i+k},

where i + k ≤ r. By symmetry, we can assume that in these considerations w_i begins with the letter 0. The other case is obtained by complementing the words in the following considerations.

Claim. For all i ≤ r − 3, we have σ_w(w_{i,i+k})/|w_{i,i+k}| ≤ 15/31 for some 0 ≤ k ≤ 2.

The claim leaves (some of the) suffixes w_{r−2} w_{r−1} w_r w_{r+1} unconsidered. However, since the lengths of these suffixes are bounded, the claim of the theorem follows.
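To spell out why the Claim suffices, here is a short derivation, offered as a sketch rather than as the authors' own wording. It assumes that w is cut greedily from the left into groups G_1, …, G_t of consecutive min-factors, each with ratio at most 15/31 as provided by the Claim, followed by a leftover suffix s. The symbols G_j, s and C are introduced here for illustration.

```latex
\sigma(w) \;=\; \sum_{j=1}^{t} \sigma_w(G_j) + \sigma_w(s)
\;\le\; \frac{15}{31}\sum_{j=1}^{t} |G_j| + |s|
\;\le\; \frac{15}{31}\,|w| + C ,
\qquad\text{hence}\qquad
\frac{\sigma(n)}{n} \;\le\; \frac{15}{31} + \frac{C}{n} .
```

Here C is any bound on |s|. Under the assumption that s consists of at most three min-factors followed by the strong suffix w_{r+1}, and since each of these has length at most 37 by Lemma 1 and Lemma 4, the value C = 4 · 37 = 148 would do. Letting n grow gives the stated lim sup.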
For the present claim, we obtain the following facts aided by computer checks. For each index j < r, if |w_{j+1}| > 29, then the word

    p = 01001100010011

(or, in the symmetric case, its complement p̄) is a prefix of w_{j+1}. Indeed, if |w_{j+1}| > 29, then |w_{j+1}| ≥ 31 by Lemma 4, and its prefix of length 30 is strong. By Table 2, every strong word of length 30 has the prefix p or p̄. By Lemma 2, w_j is strong, and after a computer check, we find that if |w_j| ≥ 25 then w_j must be one of the words in Table 3, where the lengths of the words are at most 31. Therefore

    if |w_{j+1}| > 29, then |w_j| ≤ 31.    (3)

Hence, by the definition of a min-factor, we have σ_w(w_{j,j})/|w_{j,j}| ≤ 15/31.

We also find by checking through the strong words of length 29, with the condition that w_j is a min-factor, that

    if |w_j| = 29 with j < r and σ_{w_{j,j+1}}(w_j) ≥ 14, then |w_{j+1}| ≤ 29.    (4)

Suppose then that |w_i| > 31 for i ≤ r − 3, and that, for all k = 1, …, r − i,

    σ_w(w_{i,i+k})/|w_{i,i+k}| > 15/31.    (A)

In particular, by (A) and Lemma 5, the factor w_i is strong. Moreover, by (3), we have |w_{i+1}| ≤ 29.

If |w_i| = 33, then σ_w(w_{i,i+1})/|w_{i,i+1}| ≤ (16 + 14)/(33 + 29) = 15/31, which contradicts the assumption (A). Hence, we have |w_i| = 35 or 37.

First, let |w_i| = 35. By the assumption (A), we must have |w_{i+1}| = 29 and σ_w(w_{i+1}) = 14. By (4), since i ≤ r − 2, also |w_{i+2}| ≤ 29. But now,

    σ_w(w_{i,i+2})/|w_{i,i+2}| ≤ (17 + 14 + 14)/(35 + 29 + 29) = 15/31.

Second, let |w_i| = 37. Then, by (A), we have |w_{i+1}| = 27 or 29. Since i ≤ r − 3, the case |w_{i+1}| = 29 leads to a contradiction. Namely, by (A) and (4), we must have |w_{i+2}| ≤ 29. If |w_{i+2}| ≤ 27, then

    σ_w(w_{i,i+2})/|w_{i,i+2}| ≤ (18 + 14 + 13)/(37 + 29 + 27) = 15/31

contradicts (A). On the other hand, if |w_{i+2}| = 29, then as above |w_{i+3}| ≤ 29 and

    σ_w(w_{i,i+3})/|w_{i,i+3}| ≤ (18 + 14 + 14 + 14)/(37 + 29 + 29 + 29) = 15/31.
This is again a contradiction. Hence, it follows that we have the factor w_i w_{i+1} with |w_i| = 37 and |w_{i+1}| = 27. In this case, the computer search finds that there is a unique solution for w_i,

    w_i = 0100110001001110110001001101000110010

starting with 0, and w_{i+1} is one of the following two words of length 27,

    w_{i+1} = 101100010011101100101110011,    (i1)
    w_{i+1} = 101100010011101100101110010.    (i2)

These words differ from those in Lemma 6, which means |w_{i+2}| ≤ 29, and

    σ_w(w_{i,i+2})/|w_{i,i+2}| ≤ (18 + 13 + 14)/(37 + 27 + 29) = 15/31.

Again, this is a contradiction, and the claim follows.

    length  strong word
    25      0100110001001110110010111
    25      1011001110110001001110110
    25      1011001110110001001101000
    25      1011001110110001001100011
    27      101100111011000100111011001
    31      0100110001001110110001001100011
    31      0100110001001110110001001101000
    31      1011001110110001001110110010111

Table 3: The set of strong words of length at least 25 preceding the word p = 01001100010011. Notice that the starting letters 0 and 1 are not symmetric here, because of the chosen p. Also, there are no words of length 29 in this list.

Example 2. In the previous proof for the unique min-factor w_i with |w_i| = 37 where i = r − 2, the computer search states that w_{i+1} is equal to either of the following words:

    10110001001110110010111001101,
    10110001001110110010111001100.

The first one has no continuation, but for the second one, we have two candidates for w_{i+2} to be a min-factor. These are

    01001110110001001101000110010,
    01001110110001001101000110011.

3.2 Lower bound

For the lower bound we construct good words from square-free ternary words using the following morphism. Let h: {α, β, ᾱ, β̄}* → {0, 1}* be the 31-uniform morphism defined by

    h(α) = 0100110001001110110001001101000,
    h(β) = 0100110001001110110001001100011,
    h(ᾱ) = 1011001110110001001110110010111,
    h(β̄) = 1011001110110001001110110011100.
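As a sanity check on the images listed above, the following Python sketch encodes the morphism and verifies that it is 31-uniform and that the barred letters map to complements. The ASCII letters `a`, `b`, `A`, `B` standing for α, β, ᾱ, β̄, and the names `H`, `complement`, `h` and `hat`, are conventions assumed here for illustration; they are not from the paper.

```python
# Images of the morphism; keys a, b, A, B stand for alpha, beta, alpha-bar, beta-bar.
H = {
    "a": "0100110001001110110001001101000",
    "b": "0100110001001110110001001100011",
    "A": "1011001110110001001110110010111",
    "B": "1011001110110001001110110011100",
}


def complement(w):
    """Interchange the letters 0 and 1."""
    return w.translate(str.maketrans("01", "10"))


def h(word):
    """Apply the morphism letter by letter."""
    return "".join(H[x] for x in word)


def hat(word):
    """Replace every occurrence of beta alpha-bar by beta-bar alpha-bar."""
    return word.replace("bA", "BA")


print({x: len(image) for x, image in H.items()})  # every block has length 31
print(complement(H["a"]) == H["A"], complement(H["b"]) == H["B"])  # True True
print(hat("abAab"))  # aBAab
```

Composing `h(hat(w))` for a square-free ternary word `w` over `a`, `b`, `A` would give the binary words used for the lower bound; generating such ternary words is outside the scope of this sketch.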
We have σ_{h(xy)}(h(x)) = 15 = σ(h(x)) − 1 for all different x, y ∈ {α, β, ᾱ} except for xy = βᾱ. Taking the complements, we have σ_{h(xy)}(h(x)) = 15 = σ(h(x)) − 1 for all x, y ∈ {α, β̄, ᾱ} except for xy = β̄α. Take then a square-free ternary word w on the alphabet {α, β, ᾱ} and change every occurrence of βᾱ by β̄ᾱ. Denote the new square-free word on the alphabet {α, β, ᾱ, β̄} by ŵ. We show that the words h(ŵ) satisfy σ(h(ŵ))/|h(ŵ)| > 15/31. Let us first prove the following lemma.

Lemma 8. There are no squares u² in h(ŵ) such that |u| ≥ 31.

Proof. Suppose on the contrary that there is a square u² in h(ŵ) where |u| ≥ 31. Since h(ŵ) consists of blocks h(α), h(β), h(ᾱ), h(β̄) of length 31, we can write

    u = xvy = x′v′y′,    (5)

where x ≠ ε is the prefix of the first u up to the beginning of a new block, v = h(r) consists of full blocks, y is a prefix of the block following v such that |y| < 31, and x′v′y′ is the corresponding block decomposition for the second occurrence of u, denoted by u′ in the sequel. Note that x and x′ may be full blocks, and some or all of v, y, v′, y′ may be empty, and the corresponding elements in the two decompositions can be of different length. Moreover,

    h(z) = yx′    (6)

for some letter z ∈ {α, β, ᾱ, β̄}.

(1) Assume |x| ≥ 5. We notice that the word 01000 (resp. 00011, 10111, 11100) occurs in h(ŵ) only as a suffix of h(α) (resp. h(β), h(ᾱ), h(β̄)). Since x is a prefix of u = u′ and also a suffix of some block, we conclude that x′ = x, v′ = v and y′ = y. Hence, x′ = x determines y and z uniquely, and the word xv(yx′)v is preceded by y. In other words, (yx)v(yx′)v = h(zrzr) must occur in h(ŵ). By the block decomposition (5), this implies that zrzr is a factor of ŵ, which contradicts the square-freeness of ŵ.

(2) Assume |x| < 5. Since |u| ≥ 31, we have |vy| ≥ 27. Hence, v contains a prefix 01001100010 or its complement.
We notice that 01001100010 (resp. 10110011101) occurs in h(ŵ) only as a prefix of the block h(α) or h(β) (resp. h(ᾱ) or h(β̄)). Hence, we conclude that in u′ we must have x′ = x, v′ = v and y′ = y. If |y| ≥ 28, then y = y′ determines x′ and z uniquely and v(yx′)v(y′x′) = h(rzrz) is a factor of h(ŵ). We obtain a contradiction as above. On the other hand, if |y| < 28, then |x′| ≥ 4 by (6). A suffix x′ = x of any block with length at least four determines the block uniquely. Hence, the word (yx)v(yx′)v = h(zrzr) is a factor of h(ŵ). Again, this is a contradiction.

Now we are ready to prove the lower bound.

Theorem 9. We have

    lim inf σ(n)/n ≥ 15/31.

Proof. Let ŵ be as above, obtained from a square-free ternary word w. Each square u² in h(ŵ) satisfies |u| < 31, and thus u² must occur inside h(xyz) for some factor xyz ∈ {α, β, ᾱ, β̄}³ in ŵ. However, we verify by a computer check that

    σ_{h(xyz)}(h(x)) = 15    (7)

for all factors xyz of ŵ. Hence, combining (7) with Lemma 8, we conclude that

    σ_{h(ŵ)}(h(x)) = σ(h(x)) − 1 = 15

for every x ∈ {α, β, ᾱ, β̄}, which proves the claim.

Acknowledgement. Tomi Kärki acknowledges the support of the Magnus Ehrnrooth Foundation.

References

[1] A. S. Fraenkel and J. Simpson. How many squares can a string contain? J. Combin. Theory Ser. A, 82(1):112–120, 1998.
[2] A. S. Fraenkel and R. J. Simpson. How many squares must a binary sequence contain? Electron. J. Combin., 2:R2, 1995.
[3] T. Harju and D. Nowotka. Border correlation of binary words. J. Combin. Theory Ser. A, 108(2):331–341, 2004.
[4] L. Ilie. A note on the number of squares in a word. Theoret. Comput. Sci., 380(3):373–376, 2007.
[5] G. Kucherov, P. Ochem, and M. Rao. How many square occurrences must a binary sequence contain? Electron. J. Combin., 10:R12, 2003.
[6] M. Lothaire. Combinatorics on words. Cambridge Mathematical Library.
Cambridge University Press, Cambridge, 1997.
[7] P. Ochem and M. Rao. Minimum frequencies of occurrences of squares and letters in infinite words. In Mons Days of Theoretical Computer Science, Mons, August 2008.
