An application of the Goulden-Jackson cluster theorem

Let A be an alphabet and let F be a set of words with letters in A. We show that the sum of all words with letters in A with no consecutive subwords in F, as a formal power series in noncommuting variables, is the reciprocal of a series with all coefficients 0, 1 or −1. We also explain how this result is related to a result of Curtis Greene on lattices with Möbius function 0, 1, or −1.


Introduction
Let A be an alphabet, and let A* be the free monoid of words made up of letters in A, with the operation of concatenation. Let R be a commutative ring with identity (in our application we may take it to be the polynomial ring Z[t]), and let R⟨⟨A⟩⟩ be the ring of formal sums of elements of A* with coefficients in R, multiplied in the obvious way. Then R⟨⟨A⟩⟩ may be viewed as an algebra of formal power series in the noncommuting variables in A. We write 1 for the empty word, which is also the identity element of R⟨⟨A⟩⟩, and we write |w| for the length of the word w.
We call a word u a subword of a word v if there exist words p and q such that v = puq. Let F be a set of nonempty words in A*, and let $A_F$ be the set of words in A* in which no word in F occurs as a subword. Since $1 \in A_F$, the sum $\sum_{w \in A_F} w$ has constant term 1 and is therefore invertible in R⟨⟨A⟩⟩. Our main result describes the coefficients of the reciprocal of this sum.
Theorem 1.1. Every coefficient of $\bigl(\sum_{w \in A_F} w\bigr)^{-1}$ is 0, 1, or −1.
We also explain how Theorem 1.1 is closely related to a result of Curtis Greene [10], that lattices of unions of intervals, ordered by inclusion, have Möbius functions that are 0, 1, or −1.

The Goulden-Jackson Cluster Theorem
Let F be a set of words in A*, all of length at least 2. We call the elements of F forbidden words. The cluster theorem allows us to count words in A* by the number of forbidden subwords they contain, in terms of certain "marked words" called clusters. Informally, a cluster is a word that is covered by an overlapping collection of marked forbidden subwords. For example, if A = {a} and F = {aaa}, then marking the occurrences of aaa at letters 1 to 3, 2 to 4, and 4 to 6 of the word $a^6$ gives a cluster, and so does marking the occurrences at letters 1 to 3, 3 to 5, and 4 to 6. However, marking only the occurrences at letters 1 to 3 and 4 to 6 does not give a cluster, since these marked subwords don't overlap.
There are several possible formal definitions for marked words and clusters. The definition that we use is the most convenient for explaining the connection between our main result and Greene's theorem on Möbius functions of unions of intervals; it also allows the simple characterization of clusters given in Lemma 2.1 below.
We use the notation [m, n) for the interval of integers $\{ i \in \mathbb{Z} : m \le i < n \}$. We define a marked word (with respect to the alphabet A and the set of forbidden words F) to be an ordered pair (w, I), where $w = \alpha_1\alpha_2\cdots\alpha_n$ is a word in A* and I is a set of intervals [i, j), with $i < j \le |w|$, for which the subword $\alpha_i\alpha_{i+1}\cdots\alpha_j$ is in F. We call w the underlying word and I the set of intervals of the marked word (w, I). For consistency, we will say that the subword $\alpha_i\alpha_{i+1}\cdots\alpha_j$ of the word $\alpha_1\alpha_2\cdots\alpha_n$ occurs at the interval [i, j). (Thus k ∈ [i, j) exactly when the letters $\alpha_k$ and $\alpha_{k+1}$ both lie in the marked subword.) Note that for each letter α ∈ A, (α, ∅) is a marked word.
The concatenation of two marked words (u, I) and (v, J) is defined by
$$(u, I)\,(v, J) = \bigl(uv,\; I \cup \{\, [i + |u|,\, j + |u|) : [i, j) \in J \,\}\bigr).$$
Concatenation of marked words is easily seen to be associative and compatible with projection onto the underlying word.
A marked word is a cluster if its underlying word has length at least 2 and it cannot be expressed as a concatenation of two nonempty marked words. We call a word w a cluster word if it is the underlying word of a cluster. The following characterization of clusters is clear:
Lemma 2.1. A marked word (w, I) with |w| ≥ 2 is a cluster if and only if the union of the intervals in I is [1, |w|).
If we identify each letter α ∈ A with the marked word (α, ∅), then every marked word can be expressed uniquely as a concatenation of letters and clusters.
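To make Lemma 2.1 and the unique factorization into letters and clusters concrete, here is a small Python sketch. It assumes our own encoding, in which an interval [i, j) is the pair (i, j) marking letters i through j (1-indexed), as in the definition of a marked word. The helper names `is_cluster` and `decompose` are hypothetical, and the code does not check that the marked subwords actually belong to F.

```python
def covered_positions(intervals):
    """Positions k (1-indexed) such that letters k and k+1 lie in one marked subword."""
    covered = set()
    for i, j in intervals:
        covered.update(range(i, j))  # the interval [i, j) as a set of integers
    return covered


def is_cluster(word, intervals):
    """Lemma 2.1: (w, I) is a cluster iff |w| >= 2 and the intervals cover [1, |w|)."""
    return len(word) >= 2 and covered_positions(intervals) == set(range(1, len(word)))


def decompose(word, intervals):
    """Factor a marked word into letters and clusters.

    Returns a list of (subword, shifted intervals); a letter appears as (letter, []).
    """
    covered = covered_positions(intervals)
    pieces, start = [], 1
    for k in range(1, len(word) + 1):
        # Cut after letter k when k is the last letter or position k is uncovered.
        if k == len(word) or k not in covered:
            local = sorted((i - start + 1, j - start + 1)
                           for i, j in intervals if start <= i and j <= k)
            pieces.append((word[start - 1:k], local))
            start = k + 1
    return pieces


# The examples with A = {a} and F = {aaa} on the word a^6:
print(is_cluster("aaaaaa", [(1, 3), (3, 5), (4, 6)]))  # True
print(is_cluster("aaaaaa", [(1, 3), (4, 6)]))          # False
print(decompose("aaaaaa", [(1, 3), (4, 6)]))           # [('aaa', [(1, 3)]), ('aaa', [(1, 3)])]
```

Cutting only at uncovered positions guarantees that no marked interval straddles a cut, which is why the factorization is unique.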
We define the cluster polynomial $P_{F,w}(t)$ of a word w by
$$P_{F,w}(t) = \sum_{I} t^{|I|},$$
where the sum is over all I for which (w, I) is a cluster. For example, if w = aaab and F = {aa, aab}, then the two clusters with underlying word w are (w, I) with I = {[1, 2), [2, 4)} and I = {[1, 2), [2, 3), [2, 4)}, so $P_{F,w}(t) = t^2 + t^3$. If w is not a cluster word then $P_{F,w}(t) = 0$. We define the cluster generating function for F to be
$$C_F(t) = \sum_{w} P_{F,w}(t)\, w,$$
where the sum is over all words w in A*. For a word w ∈ A*, let $s_F(w)$ be the number of subwords of w (counted with multiplicity) that are in F. For example, if F = {aa} then $s_F(a^4) = 3$.
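The definition of $P_{F,w}(t)$ can be checked by brute force. First list the occurrences of forbidden words in w. Then, using Lemma 2.1, keep the subsets of occurrences whose intervals cover [1, |w|). The Python sketch below does this. It is exponential in the number of occurrences and is meant only for small examples. Representing a polynomial as a dictionary from degree to coefficient is our own choice.

```python
from itertools import combinations


def occurrences(word, forbidden):
    """Intervals [i, j) (as pairs, 1-indexed) at which words of `forbidden` occur."""
    occ = []
    for f in forbidden:
        for i in range(1, len(word) - len(f) + 2):
            if word[i - 1:i - 1 + len(f)] == f:
                occ.append((i, i + len(f) - 1))
    return sorted(occ)


def cluster_polynomial(word, forbidden):
    """Coefficients {k: c} of P_{F,w}(t), summing t^|I| over clusters (w, I)."""
    n = len(word)
    coeffs = {}
    if n < 2:
        return coeffs
    occ = occurrences(word, forbidden)
    target = set(range(1, n))  # Lemma 2.1: the intervals must cover [1, n)
    for k in range(1, len(occ) + 1):
        for subset in combinations(occ, k):
            covered = set()
            for i, j in subset:
                covered.update(range(i, j))
            if covered == target:
                coeffs[k] = coeffs.get(k, 0) + 1
    return coeffs


print(cluster_polynomial("aaab", {"aa", "aab"}))  # {2: 1, 3: 1}, i.e. t^2 + t^3
print(cluster_polynomial("aaaaaa", {"aaa"}))      # {3: 2, 4: 1}, i.e. 2t^3 + t^4
```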
We now state and prove our form of the Goulden-Jackson cluster theorem. (The original cluster theorem [8] has more general weights, but is commutative; the proofs are essentially the same.) For some applications of the cluster theorem, see Bassino, Clément, and Nicodème [2], Noonan and Zeilberger [12], Wang [14], and Zhuang [15,16].
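Theorem 2.2. Let F be a set of words in A*, all of length at least 2. Then
$$\sum_{w \in A^*} t^{s_F(w)}\, w = \Bigl(1 - \sum_{\alpha \in A} \alpha - C_F(t-1)\Bigr)^{-1}. \tag{2}$$
In particular, for t = 0 we have
$$\sum_{w \in A_F} w = \Bigl(1 - \sum_{\alpha \in A} \alpha - C_F(-1)\Bigr)^{-1}, \tag{3}$$
where $A_F$ is the set of words in A* with no subwords in F.
Proof. Replacing t by t + 1 in (2) gives the equivalent formula
$$\sum_{w \in A^*} (1+t)^{s_F(w)}\, w = \Bigl(1 - \sum_{\alpha \in A} \alpha - C_F(t)\Bigr)^{-1}, \tag{4}$$
which is easy to prove directly. The coefficient of a word w on the left side of (4) counts marked words with underlying word w, where a marked word (w, I) contributes $t^{|I|}$. The right side of (4) equals $\sum_{k \ge 0} \bigl(\sum_{\alpha \in A} \alpha + C_F(t)\bigr)^k$. Since every marked word is a unique concatenation of clusters and letters, it also counts marked words, with the same weights.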
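As an example of Theorem 2.2, let A = {a, b, c} and F = {abc, bcc}. There are three clusters: (abc, {[1, 3)}), (bcc, {[1, 3)}), and (abcc, {[1, 3), [2, 4)}). Thus $C_F(t) = t\,abc + t\,bcc + t^2\,abcc$. By (3), the sum of all words in A* with no subword abc or bcc is $(1 - a - b - c + abc + bcc - abcc)^{-1}$.
As another example, take A = {a} and F = {$a^3$}. Although it's not hard to compute the cluster generating function directly, an indirect approach is even easier. For n ≥ 2, there are n − 2 occurrences of $a^3$ in $a^n$, so
$$\sum_{w \in A^*} t^{s_F(w)}\, w = 1 + a + \sum_{n \ge 2} t^{n-2} a^n = 1 + a + \frac{a^2}{1 - ta}.$$
Replacing t with 1 + t, applying (4), and simplifying gives
$$C_F(t) = \frac{t a^3}{1 - ta - ta^2}. \tag{5}$$
Setting t = −1 in (5) gives a formula that we will derive in another way in Section 4.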
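For A = {a, b, c} and F = {abc, bcc}, the three clusters give $C_F(-1) = -abc - bcc + abcc$. Equation (3) therefore predicts that $1 - a - b - c + abc + bcc - abcc$ is the reciprocal of the sum of all words avoiding abc and bcc. The Python sketch below checks this coefficient by coefficient for every word of length at most 7. It stores a noncommutative series as a dictionary from words (strings) to coefficients, and computes the reciprocal of 1 − h through the identity f = 1 + hf. Both are our implementation choices, not anything prescribed by the text.

```python
from functools import lru_cache
from itertools import product

# h = a + b + c - abc - bcc + abcc, so that 1 - h = 1 - a - b - c - C_F(-1)
# for F = {abc, bcc}.
H = {"a": 1, "b": 1, "c": 1, "abc": -1, "bcc": -1, "abcc": 1}
F = {"abc", "bcc"}


@lru_cache(maxsize=None)
def inverse_coeff(w):
    """Coefficient of w in (1 - H)^{-1}, computed from f = 1 + H f."""
    if w == "":
        return 1
    # H f contributes H(u) * f(v) for each factorization w = u v with u a key of H.
    return sum(c * inverse_coeff(w[len(u):]) for u, c in H.items() if w.startswith(u))


for n in range(8):
    for letters in product("abc", repeat=n):
        w = "".join(letters)
        expected = 0 if any(f in w for f in F) else 1
        assert inverse_coeff(w) == expected
print("equation (3) agrees for all words of length at most 7")
```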

A recurrence for the cluster polynomial
The cluster polynomial $P_{F,w}(t)$ can be computed by a simple recurrence that will be needed in the proof of the main theorem. In Lemma 3.1 below we require that F be reduced; that is, no word in F is a subword of another word in F. This condition makes the recurrence simpler, and the case of reduced F is sufficient for the proof of Theorem 1.1. We also require that w be a cluster word; if w is not a cluster word then $P_{F,w}(t) = 0$.
Lemma 3.1. Suppose that F ⊆ A* is a reduced set of forbidden words, and that $w = \alpha_1\alpha_2\cdots\alpha_n$ is a cluster word with respect to F. Then there exist a positive integer m, polynomials $p_1, p_2, \dots, p_m$ in t, and positive integers $r_2, r_3, \dots, r_m$ with $1 \le r_k \le k-1$, such that $p_1 = t$, $p_k = t(p_{r_k} + p_{r_k+1} + \cdots + p_{k-1})$ for $2 \le k \le m$, and $p_m = P_{F,w}(t)$.
Proof. Let the intervals of the forbidden subwords of w be $[i_1, j_1), \dots, [i_m, j_m)$, where $i_1 < \cdots < i_m$ and thus (since F is reduced) $j_1 < \cdots < j_m$. Since w is a cluster word, we have $i_1 = 1$ and $j_m = n$. Let $w_k$ be the word $\alpha_1\alpha_2\cdots\alpha_{j_k}$ for $1 \le k \le m$, and let $p_k = P_{F,w_k}(t)$. Then each $w_k$ is a cluster word and $w_m = w$. Since $w_1 \in F$, we have $p_1 = t$. Now suppose that k > 1. Then $j_{k-1} \ge i_k$ since w is a cluster word. So we may define $r_k$ to be the least integer such that $j_{r_k} \ge i_k$, and we have $1 \le r_k \le k-1$. In any cluster on $w_k$, the last interval must be $[i_k, j_k)$. The next-to-last interval $[i_l, j_l)$ must satisfy $j_l \ge i_k$, and thus $r_k \le l \le k-1$. The contribution to $p_k$ from this value of l is $t p_l$. Thus $p_k = t(p_{r_k} + p_{r_k+1} + \cdots + p_{k-1})$.
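The recurrence in Lemma 3.1 is easy to run. The Python sketch below (function names are ours) finds the occurrence intervals, computes each $r_k$ as in the proof above, and builds $p_1, \dots, p_m$ as dictionaries from degree to coefficient. It assumes, as the lemma does, that F is reduced, and it returns an empty dictionary when w is not a cluster word.

```python
def occurrence_intervals(word, forbidden):
    """Sorted intervals [i, j) (as pairs) of the occurrences of forbidden words in word."""
    return sorted((i, i + len(f) - 1) for f in forbidden
                  for i in range(1, len(word) - len(f) + 2)
                  if word[i - 1:i - 1 + len(f)] == f)


def cluster_polynomial_rec(word, forbidden):
    """P_{F,w}(t) as {degree: coefficient}, via the recurrence of Lemma 3.1.

    Assumes F is reduced, so sorting by left endpoints also sorts right endpoints.
    """
    ivs = occurrence_intervals(word, forbidden)
    covered = {k for i, j in ivs for k in range(i, j)}
    if len(word) < 2 or covered != set(range(1, len(word))):
        return {}  # not a cluster word, so P_{F,w}(t) = 0
    p = []  # p[k - 1] holds the polynomial p_k
    for k, (i_k, _) in enumerate(ivs, start=1):
        if k == 1:
            p.append({1: 1})  # p_1 = t
            continue
        # r_k is the least index r with j_r >= i_k
        r_k = min(r for r in range(1, k) if ivs[r - 1][1] >= i_k)
        p_k = {}
        for l in range(r_k, k):
            for deg, c in p[l - 1].items():
                p_k[deg + 1] = p_k.get(deg + 1, 0) + c  # multiply by t
        p.append(p_k)
    return p[-1]


print(cluster_polynomial_rec("aaaaaa", {"aaa"}))       # {3: 2, 4: 1}
print(cluster_polynomial_rec("abcc", {"abc", "bcc"}))  # {2: 1}
```

For $a^6$ and F = {aaa}, the proof gives $r_2 = r_3 = 1$ and $r_4 = 2$. This reproduces $2t^3 + t^4$, the same polynomial that brute-force enumeration of the clusters gives.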

Counting words without forbidden subwords
The sum of all words in A* with no forbidden subwords is given by (3). Thus we can prove Theorem 1.1 by showing that $P_{F,w}(-1)$ is always 0, 1, or −1.
Let us first look at an important special case. Suppose that every forbidden word has length 2. Then the clusters are of the form $(\alpha_1\alpha_2\cdots\alpha_n, I)$, where $\alpha_i\alpha_{i+1} \in F$ for $1 \le i < n$ and $I = \{ [i, i+1) : 1 \le i < n \}$. The cluster polynomial for the word $\alpha_1\alpha_2\cdots\alpha_n$, when nonzero, is $t^{n-1}$. In this case (3) may be written in the following symmetrical form:
Corollary 4.1. Suppose that every word in F has length 2. Then
$$\sum_{w \in A_F} w = \Bigl(\sum_{w} (-1)^{|w|}\, w\Bigr)^{-1},$$
where the sum on the right is over all words w ∈ A* (including 1 and the letters of A) in which every subword of length 2 is in F.
Corollary 4.1 was first proved by Fröberg [6] (in a somewhat weaker form) and by Carlitz, Scoville, and Vaughan [3], and was applied to various problems of permutation enumeration in Gessel [7]. Many related results can be found in Goulden and Jackson's book [9, Chapter 4].
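Corollary 4.1 can be tested numerically in the same way. In the Python sketch below, the alphabet {a, b, c} and the set F = {ab, bc, ca} of length-2 forbidden words are arbitrary illustrative choices. The code computes the coefficients of the reciprocal of $\sum_w (-1)^{|w|} w$, taken over words all of whose length-2 subwords lie in F, and compares them with the indicator function of $A_F$.

```python
from functools import lru_cache
from itertools import product

A = "abc"
F = {"ab", "bc", "ca"}  # illustrative choice: every forbidden word has length 2


def pairs_in(word, pairs):
    """True if every subword of length 2 of word lies in pairs."""
    return all(word[i:i + 2] in pairs for i in range(len(word) - 1))


@lru_cache(maxsize=None)
def reciprocal_coeff(w):
    """Coefficient of w in the reciprocal of G = sum of (-1)^{|u|} u over u with pairs_in(u, F).

    Writing G = 1 - h, the reciprocal f satisfies f = 1 + h f, where h(u) = -(-1)^{|u|}.
    """
    if w == "":
        return 1
    return sum(-(-1) ** k * reciprocal_coeff(w[k:])
               for k in range(1, len(w) + 1) if pairs_in(w[:k], F))


for n in range(7):
    for letters in product(A, repeat=n):
        w = "".join(letters)
        expected = 0 if any(w[i:i + 2] in F for i in range(n - 1)) else 1
        assert reciprocal_coeff(w) == expected
print("Corollary 4.1 agrees for all words of length at most 6")
```

Swapping F for its complement in the set of all two-letter words exchanges the roles of the two sides, which is the symmetry the corollary refers to.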
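Lemma 4.2. Let m be a positive integer, and let $r_2, \dots, r_m$ be integers with $1 \le r_k \le k-1$. Let $u_1, \dots, u_m$ be defined by $u_1 \in \{1, -1\}$ and
$$u_k = -(u_{r_k} + u_{r_k+1} + \cdots + u_{k-1}) \tag{8}$$
for $2 \le k \le m$. Then each $u_k$ is 0, 1, or −1.
Proof. Let $S_k = u_1 + \cdots + u_k$, with $S_0 = 0$. Then (8) says that $S_k = S_{r_k - 1}$ for $2 \le k \le m$. Since $0 \le r_k - 1 \le k - 2$, induction on k shows that every $S_k$ is 0 or $u_1$, so $u_k = S_k - S_{k-1}$ is 0, 1, or −1.
We can now prove our main result.
Theorem 4.3. Let F be a set of words in A*, all of length at least 2. Then
$$\Bigl(\sum_{w \in A_F} w\Bigr)^{-1} = 1 - \sum_{\alpha \in A} \alpha + \sum_{|w| \ge 2} M(w)\, w,$$
where for every word w, M(w) is 0, 1, or −1.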
Proof. By equation (3) in Theorem 2.2, $M(w) = -P_{F,w}(-1)$, so it is sufficient to show that for every word w, $P_{F,w}(-1)$ is 0, 1, or −1. We will derive this from Lemma 3.1. We may assume without loss of generality that F is reduced. To see this, note that if a word u in F is a subword of another word v in F, then forbidding u as a subword automatically forbids v, so we may remove v from F without changing $A_F$. Changing F in this way may change $C_F(t)$, but by (3) it will not change $C_F(-1)$.
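If w is not a cluster word then $P_{F,w}(-1) = 0$. If w is a cluster word, then setting t = −1 in Lemma 3.1 shows that the numbers $u_k = p_k(-1)$ satisfy $u_1 = -1$ and the recurrence of Lemma 4.2. Applying Lemma 4.2 then gives $P_{F,w}(-1) = u_m \in \{0, 1, -1\}$, which proves the theorem.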
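Lemma 4.2 holds for every admissible choice of the $r_k$, which is easy to test. The Python sketch below (our own helper, not code from the paper) runs the recurrence $u_1 = -1$, $u_k = -(u_{r_k} + \cdots + u_{k-1})$ for random admissible $r_k$ and checks that every value is 0, 1, or −1. The choice $r_2 = r_3 = 1$, $r_4 = 2$ is the one the proof of Lemma 3.1 produces for $w = a^6$ and F = {aaa}. It gives $u_4 = P_{F,a^6}(-1) = -1$.

```python
import random


def recurrence_values(r, m, u1=-1):
    """Return [u_1, ..., u_m] with u_1 = u1 and u_k = -(u_{r_k} + ... + u_{k-1}).

    r is a dict mapping k (2 <= k <= m) to r_k, where 1 <= r_k <= k - 1.
    """
    u = [0, u1]  # u[0] is padding so that u[k] is u_k
    for k in range(2, m + 1):
        u.append(-sum(u[r[k]:k]))
    return u[1:]


print(recurrence_values({2: 1, 3: 1, 4: 2}, 4))  # [-1, 1, 0, -1]

random.seed(0)
for _ in range(2000):
    m = random.randint(1, 30)
    r = {k: random.randint(1, k - 1) for k in range(2, m + 1)}
    assert set(recurrence_values(r, m)) <= {-1, 0, 1}
```

The partial sums $S_k = u_1 + \cdots + u_k$ only ever take the values 0 and $u_1$. This is why arbitrary $r_k$ cannot push any $u_k$ outside {0, 1, −1}.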

Dotsenko and Khoroshkin [5, Corollary 22] proved a result closely related to Theorem 4.3. Their interest was in finding exponential generating functions for permutations avoiding consecutive patterns, so they did not consider words with repeated entries. However, their approach, based on earlier work of Anick [1], can be used to prove Theorem 4.3, and gives an explicit, though recursive, description of the values of M(w) in Theorem 4.3. A proof of Theorem 4.3 using algebraic techniques was given by Iyudu and Vlassopoulos [11, Corollary 4.1]. See also Dotsenko, Vincent Gélinas, and Tamaroff [4, Section 1.1], which discusses "Anick chains" and their connection with Tor groups of monomial algebras.
We can obtain a similar (though not obviously equivalent) description of M(w) through a refinement of Lemma 4.2 that takes into account that F is reduced:
Lemma 4.4. With the assumptions of Lemma 4.2, suppose that in addition we have $r_{k-1} \le r_k$ for $2 < k \le m$. Then in the sum on the right side of (8), at most two terms are nonzero. Thus $u_k$ is nonzero if and only if exactly one of $u_{r_k}, \dots, u_{k-1}$ is nonzero. (Two nonzero terms must cancel, since by Lemma 4.2 their sum, $-u_k$, is 0, 1, or −1.)
Proof. We proceed by induction on k. The assertion is clearly true for k = 2, since there is only one term in the sum. Now suppose that k > 2 and that among $u_{r_{k-1}}, \dots, u_{k-2}$, at most two are nonzero. Since $r_k \ge r_{k-1}$, at most two of $u_{r_k}, \dots, u_{k-2}$ are nonzero. If fewer than two are nonzero, then at most two of $u_{r_k}, \dots, u_{k-1}$ are nonzero. If exactly two of $u_{r_k}, \dots, u_{k-2}$ are nonzero, then by (8) (with k − 1 for k), $u_{k-1} = 0$, and the conclusion follows.
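Lemma 4.4 can be spot-checked in the same way by drawing nondecreasing sequences $r_k$ at random. The Python sketch below is a numerical sanity check under that sampling assumption, not a proof. The helper name is ours.

```python
import random


def recurrence_values(r, m):
    """u[1..m] with u_1 = -1 and u_k = -(u_{r_k} + ... + u_{k-1}); u[0] is padding."""
    u = [0, -1]
    for k in range(2, m + 1):
        u.append(-sum(u[r[k]:k]))
    return u


random.seed(1)
for _ in range(2000):
    m = random.randint(2, 30)
    r, prev = {}, 1
    for k in range(2, m + 1):
        prev = random.randint(prev, k - 1)  # keeps r_{k-1} <= r_k <= k - 1
        r[k] = prev
    u = recurrence_values(r, m)
    for k in range(2, m + 1):
        nonzero = sum(1 for x in u[r[k]:k] if x != 0)
        assert nonzero <= 2                    # first claim of Lemma 4.4
        assert (u[k] != 0) == (nonzero == 1)   # second claim of Lemma 4.4
print("Lemma 4.4 holds on all sampled sequences")
```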
Let us call a word w in A* of length greater than 1 salient (with respect to F) if $M(w) \ne 0$. Applying Lemma 4.4 gives an explicit, though recursive, characterization of salient words. First we note that by Theorem 2.2, every salient word must be a cluster word.
Theorem 4.5. Let F be a reduced set of forbidden words, and let $w = \alpha_1\alpha_2\cdots\alpha_n$ be a cluster word. If w ∈ F then w is salient with M(w) = 1. Otherwise, suppose that the last forbidden subword in w has interval [j, n). Then w is salient if and only if there is exactly one salient initial subword $w' = \alpha_1\alpha_2\cdots\alpha_m$ of w with $j \le m < n$, and in this case $M(w) = -M(w')$.
Proof. Since F is reduced, the integers $r_k$ of Lemma 3.1 satisfy $r_{k-1} \le r_k$ for $2 < k \le m$ (because $i_{k-1} < i_k$). The theorem then follows by setting t = −1 in the recurrence of Lemma 3.1 and applying Lemma 4.4.
As an example of Theorem 4.5, take A = {a} and F = {$a^3$}. With the notation of Theorem 4.5, let us call the initial subwords $\alpha_1\alpha_2\cdots\alpha_m$ of w with $j \le m < n$ the candidates for w. In this example, the cluster words are of the form $a^n$ with $n \ge 3$, and the candidates for $a^n$, with $n \ge 3$, are $a^{n-1}$ and $a^{n-2}$.
The word $a^3$ is salient, since it is in F, and $M(a^3) = 1$. The candidates for $a^4$ are $a^3$ and $a^2$. Only $a^3$ is salient, so $a^4$ is salient with $M(a^4) = -1$. The candidates for $a^5$ are $a^4$ and $a^3$. Since both are salient, $a^5$ is not. The candidates for $a^6$ are $a^5$ and $a^4$. Of these only $a^4$ is salient, so $a^6$ is salient with $M(a^6) = 1$. In general, we can easily show by induction that for n > 3, if n ≡ 0 (mod 3) then only the candidate $a^{n-2}$ is salient, so $a^n$ is salient. If n ≡ 1 (mod 3) then only the candidate $a^{n-1}$ is salient, so $a^n$ is salient. However, if n ≡ 2 (mod 3) then $a^{n-1}$ and $a^{n-2}$ are both salient, so $a^n$ is not salient. A similar analysis holds for $F = \{a^k\}$ for any $k \ge 2$.
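The recursion of Theorem 4.5 translates directly into code. The Python sketch below assumes that F is reduced and that every forbidden word has length at least 2. It computes M(w) by checking whether w is a cluster word, then looking at the candidates $\alpha_1 \cdots \alpha_m$ with $j \le m < n$. The function names are ours. For F = {$a^3$}, the output can be compared with the reciprocal of $1 + a + a^2$, which is $(1 - a)/(1 - a^3)$ and whose coefficient of $a^n$ is 1, −1, 0 according as n ≡ 0, 1, 2 (mod 3).

```python
from functools import lru_cache


def make_M(forbidden):
    """Return a function computing M(w) by the recursion of Theorem 4.5.

    Assumes F is reduced and that every forbidden word has length at least 2.
    """
    def intervals(w):
        return sorted((i, i + len(f) - 1) for f in forbidden
                      for i in range(1, len(w) - len(f) + 2)
                      if w[i - 1:i - 1 + len(f)] == f)

    def is_cluster_word(w):
        covered = {k for i, j in intervals(w) for k in range(i, j)}
        return len(w) >= 2 and covered == set(range(1, len(w)))

    @lru_cache(maxsize=None)
    def M(w):
        if not is_cluster_word(w):
            return 0  # salient words are cluster words
        if w in forbidden:
            return 1
        j = intervals(w)[-1][0]  # the last forbidden subword occupies [j, n)
        # Candidates are the initial subwords a_1 ... a_m with j <= m < n.
        salient = [M(w[:m]) for m in range(j, len(w)) if M(w[:m]) != 0]
        return -salient[0] if len(salient) == 1 else 0

    return M


M = make_M({"aaa"})
print([M("a" * n) for n in range(3, 10)])  # [1, -1, 0, 1, -1, 0, 1]
```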

Greene's theorem on Möbius functions of lattices
Theorem 4.3 is closely related to a result of Curtis Greene [10], which we will derive from it. We note that Greene's proof of this result used Lemma 4.2.
Greene's Theorem. Let $I_1, I_2, \dots, I_m$ be nonempty intervals in Z. Let L be the lattice of unions of the $I_j$ (including the empty set), ordered by inclusion. Then for all X ⊆ Y in L we have µ(X, Y) ∈ {−1, 0, 1}, where µ is the Möbius function of L.
To prove Greene's theorem, we first recall a well-known formula for the Möbius function of a lattice, a special case of Rota's cross-cut theorem [13]. We include a short proof for completeness.
Recall that an atom of a lattice is an element that covers the minimal element $\hat 0$.
Lemma 5.1. Let L be a finite lattice, and let A be the set of atoms of L. Then for any x ∈ L, the Möbius function $\mu(\hat 0, x)$ is given by
$$\mu(\hat 0, x) = \sum_{B} (-1)^{|B|}, \tag{9}$$
where the sum is over all subsets B ⊆ A with join x.
Proof. Let f(x) be the sum on the right side of (9). Then $\sum_{y \le x} f(y) = \sum_{B \subseteq A_x} (-1)^{|B|}$, where $A_x$ is the set of atoms of L less than or equal to x. Thus $\sum_{y \le \hat 0} f(y) = f(\hat 0) = 1$. If $x > \hat 0$, then $A_x$ is nonempty, so $\sum_{y \le x} f(y) = 0$. Thus f(x) satisfies the same recurrence that defines $\mu(\hat 0, x)$, so $f(x) = \mu(\hat 0, x)$.
Proof of Greene's Theorem. We prove the case in which $X = \hat 0 = \emptyset$ and $Y = \hat 1 = I_1 \cup I_2 \cup \cdots \cup I_m$; the general case follows easily from this case. Let $\mathcal{I} = \{I_1, I_2, \dots, I_m\}$ be a set of nonempty intervals in Z. Without loss of generality we may assume that $I_1 \cup I_2 \cup \cdots \cup I_m = [1, n)$, where $n \ge 2$. Let L be the lattice of unions of the intervals in $\mathcal{I}$. Let $\mathcal{I}'$ be the set of intervals in $\mathcal{I}$ that do not properly contain other intervals in $\mathcal{I}$, so $\mathcal{I}'$ is the set of atoms of L. By Lemma 5.1, if $\bigcup_{I \in \mathcal{I}'} I \ne [1, n)$ then $\mu(\hat 0, \hat 1) = 0$. So we may assume that $\bigcup_{I \in \mathcal{I}'} I = [1, n)$. Again by Lemma 5.1, $\mu(\hat 0, \hat 1)$ depends only on $\mathcal{I}'$, so we may now assume that $\mathcal{I} = \mathcal{I}'$. Then there are an alphabet A, a reduced set of forbidden words F ⊆ A*, and a word w such that $I_1, I_2, \dots, I_m$ are the intervals of the forbidden subwords of w. (For example, we may take w to be a word of length n with distinct letters and take F to be the set of subwords corresponding to the intervals $I_1, \dots, I_m$.) By Lemma 5.1, $\mu(\hat 0, \hat 1) = \sum_{\mathcal{J}} (-1)^{|\mathcal{J}|}$, where the sum is over all subsets $\mathcal{J}$ of $\mathcal{I}$ with union [1, n). Thus by the definition of the cluster polynomial and Lemma 2.1, we have $\mu(\hat 0, \hat 1) = P_{F,w}(-1)$, which by Theorem 4.3 is 0, 1, or −1.
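Greene's theorem itself can be spot-checked by brute force for small random families of intervals. The Python sketch below computes µ(X, Y) for every pair X ⊆ Y directly from the definition of the Möbius function, rather than through the cluster argument above. The random families, the seed, and the bound n ≤ 8 are arbitrary illustrative choices.

```python
import random


def union_lattice(generators):
    """All unions of subsets of the generators (frozensets), including the empty set."""
    elems = {frozenset()}
    for g in generators:
        elems |= {e | g for e in elems}
    return elems


def mobius_all(elems):
    """mu(X, Y) for all X <= Y: mu(X, X) = 1 and sum_{X <= Z <= Y} mu(X, Z) = 0 for X < Y."""
    order = sorted(elems, key=len)  # every Z with X <= Z < Y comes before Y
    mu = {}
    for X in order:
        for Y in order:
            if X == Y:
                mu[X, Y] = 1
            elif X < Y:
                mu[X, Y] = -sum(mu[X, Z] for Z in order if X <= Z < Y)
    return mu


random.seed(2)
for _ in range(300):
    n = random.randint(2, 8)
    gens = []
    for _ in range(random.randint(1, 5)):
        i = random.randint(1, n - 1)
        j = random.randint(i + 1, n)
        gens.append(frozenset(range(i, j)))  # the nonempty interval [i, j)
    mu = mobius_all(union_lattice(gens))
    assert set(mu.values()) <= {-1, 0, 1}
print("Greene's theorem holds on all sampled families")
```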