Counting subwords and regular languages (Q1622966)

For a pair of words \(x\) and \(y\) the languages \(L_{x=y}\), \(L_{x<y}\), and \(L_{x \leq y}\) consist of all words \(z\) such that the number of occurrences of \(x\) as possibly overlapping subwords of \(z\) is, respectively, the same, less than, or less than or equal to the number of occurrences of \(y\). Such languages are always deterministic context-free and the paper gives neccessary and sufficient conditions for regularity of these languages. The word \(z\) is said to be bordered by a non-empty word \(x\), or \(x\)-bordered, if \(x\) is a prefix and a suffix of \(z\). If \(y\) is a subword of every \(x\)-bordered word then \(x\) is called interlaced by \(y\). The authors prove that the languages \(L_{x=y}\) and \(L_{x<y}\) are regular if and only if either \(x\) is interlaced by \(y\), or \(y\) is interlaced by \(x\). This criterion can be tested by checking the emptiness of two regular languages. The set of all \(y\)-bordered words \(z\) such that \(x\) is not a subword of \(z\) is the language \((\Sigma^*x\Sigma^*)^c \cap y\Sigma^+ \cap \Sigma^+y\), where \(\Sigma\) is the underlying alphabet, and \(L^c\) is the complement of \(L\) with respect to \(\Sigma^*\). Such construction leads to an algorithm that decides regularity of \(L_{x=y}\) and \(L_{x<y}\) in \(O(|\Sigma||x||y|)\) time. The authors provide a more efficient procedure. They prove that \(y\) is interlaced by \(x\) if and only if \(x\) is a subword of \(yty\) for all words \(t\) of constant length \(C\), where \(C=1\) if \(|\Sigma|\geq 3\), and \(C=3\) if \(|\Sigma|=2\). In the last section the authors consider conditions for finiteness of \(L_{x=y}\) and a generalization for more then two words. The language \(L_{x=y}\) is finite if and only if \(|\Sigma|=1\) and \(x\neq y\) (the proof is omitted in this paper). For the generalized language \(L_{x_1=x_2=\ldots=x_n}\) it is proved that this language is infinite if \(|x_1|=|x_2|=\ldots=|x_n|\). General conditions for finiteness of such languages remain open. For the entire collection see [Zbl 1398.68030].

0 references

reviewed by

Sergey Afonin

0 references

zbMATH Keywords

subwords counting

0 references

regular languages

0 references

finiteness

0 references

MaRDI profile type