Counting subwords and regular languages (Q1622966)
From MaRDI portal
scientific article
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Counting subwords and regular languages |
scientific article |
Statements
Counting subwords and regular languages (English)
0 references
22 November 2018
0 references
For a pair of words \(x\) and \(y\) the languages \(L_{x=y}\), \(L_{x<y}\), and \(L_{x \leq y}\) consist of all words \(z\) such that the number of occurrences of \(x\) as possibly overlapping subwords of \(z\) is, respectively, the same, less than, or less than or equal to the number of occurrences of \(y\). Such languages are always deterministic context-free and the paper gives neccessary and sufficient conditions for regularity of these languages. The word \(z\) is said to be bordered by a non-empty word \(x\), or \(x\)-bordered, if \(x\) is a prefix and a suffix of \(z\). If \(y\) is a subword of every \(x\)-bordered word then \(x\) is called interlaced by \(y\). The authors prove that the languages \(L_{x=y}\) and \(L_{x<y}\) are regular if and only if either \(x\) is interlaced by \(y\), or \(y\) is interlaced by \(x\). This criterion can be tested by checking the emptiness of two regular languages. The set of all \(y\)-bordered words \(z\) such that \(x\) is not a subword of \(z\) is the language \((\Sigma^*x\Sigma^*)^c \cap y\Sigma^+ \cap \Sigma^+y\), where \(\Sigma\) is the underlying alphabet, and \(L^c\) is the complement of \(L\) with respect to \(\Sigma^*\). Such construction leads to an algorithm that decides regularity of \(L_{x=y}\) and \(L_{x<y}\) in \(O(|\Sigma||x||y|)\) time. The authors provide a more efficient procedure. They prove that \(y\) is interlaced by \(x\) if and only if \(x\) is a subword of \(yty\) for all words \(t\) of constant length \(C\), where \(C=1\) if \(|\Sigma|\geq 3\), and \(C=3\) if \(|\Sigma|=2\). In the last section the authors consider conditions for finiteness of \(L_{x=y}\) and a generalization for more then two words. The language \(L_{x=y}\) is finite if and only if \(|\Sigma|=1\) and \(x\neq y\) (the proof is omitted in this paper). For the generalized language \(L_{x_1=x_2=\ldots=x_n}\) it is proved that this language is infinite if \(|x_1|=|x_2|=\ldots=|x_n|\). General conditions for finiteness of such languages remain open. For the entire collection see [Zbl 1398.68030].
0 references
subwords counting
0 references
regular languages
0 references
finiteness
0 references