Counting subwords and regular languages (Q1622966)

From MaRDI portal
scientific article
Language Label Description Also known as
English
Counting subwords and regular languages
scientific article

    Statements

    Counting subwords and regular languages (English)
    0 references
    0 references
    0 references
    0 references
    0 references
    22 November 2018
    0 references
    For a pair of words \(x\) and \(y\) the languages \(L_{x=y}\), \(L_{x<y}\), and \(L_{x \leq y}\) consist of all words \(z\) such that the number of occurrences of \(x\) as possibly overlapping subwords of \(z\) is, respectively, the same, less than, or less than or equal to the number of occurrences of \(y\). Such languages are always deterministic context-free and the paper gives neccessary and sufficient conditions for regularity of these languages. The word \(z\) is said to be bordered by a non-empty word \(x\), or \(x\)-bordered, if \(x\) is a prefix and a suffix of \(z\). If \(y\) is a subword of every \(x\)-bordered word then \(x\) is called interlaced by \(y\). The authors prove that the languages \(L_{x=y}\) and \(L_{x<y}\) are regular if and only if either \(x\) is interlaced by \(y\), or \(y\) is interlaced by \(x\). This criterion can be tested by checking the emptiness of two regular languages. The set of all \(y\)-bordered words \(z\) such that \(x\) is not a subword of \(z\) is the language \((\Sigma^*x\Sigma^*)^c \cap y\Sigma^+ \cap \Sigma^+y\), where \(\Sigma\) is the underlying alphabet, and \(L^c\) is the complement of \(L\) with respect to \(\Sigma^*\). Such construction leads to an algorithm that decides regularity of \(L_{x=y}\) and \(L_{x<y}\) in \(O(|\Sigma||x||y|)\) time. The authors provide a more efficient procedure. They prove that \(y\) is interlaced by \(x\) if and only if \(x\) is a subword of \(yty\) for all words \(t\) of constant length \(C\), where \(C=1\) if \(|\Sigma|\geq 3\), and \(C=3\) if \(|\Sigma|=2\). In the last section the authors consider conditions for finiteness of \(L_{x=y}\) and a generalization for more then two words. The language \(L_{x=y}\) is finite if and only if \(|\Sigma|=1\) and \(x\neq y\) (the proof is omitted in this paper). For the generalized language \(L_{x_1=x_2=\ldots=x_n}\) it is proved that this language is infinite if \(|x_1|=|x_2|=\ldots=|x_n|\). General conditions for finiteness of such languages remain open. For the entire collection see [Zbl 1398.68030].
    0 references
    0 references
    subwords counting
    0 references
    regular languages
    0 references
    finiteness
    0 references

    Identifiers