Grammars for language and genes. Theoretical and empirical investigations. Foreword by Aravind K. Joshi (Q640475)

scientific article; zbMATH DE number 5960115

Statements

instance of

scholarly article

title

Grammars for language and genes. Theoretical and empirical investigations. Foreword by Aravind K. Joshi (English)

published in

Theory and Applications of Natural Language Processing

publication date

18 October 2011

review text

The book, which originates from the author's PhD dissertation, develops and demonstrates a framework for comparing grammar formalisms in terms of their usefulness for applications, focusing on three areas: statistical parsing, natural language translation, and biological sequence analysis. The text starts by discussing the so-called strong generative capacity of grammar formalisms. This capacity is measured not only by the complexity of the generated sets of strings but also by the sets of structural descriptions that the grammar assigns to them. (The weak generative capacity of a grammar, by contrast, corresponds simply to the set of generated strings and does not take into account how the strings were generated.) Strong generative capacity is worth studying because a grammar may interface with other, higher-level modules of a "system" through the structural descriptions of the strings it generates. The difficulty, which the studies presented in the book attempt to overcome, is to define what it means for two structural descriptions to be equivalent, especially when they are produced by different formalisms. To this end, the second chapter generalizes the so-called derivational generative capacity of context-free grammars by introducing the concept of local interpretation functions, which allows a wide range of grammar formalisms to be classified according to their power in various interpretation domains. It also introduces the notion of a cover (a situation in which one grammar is parsed using another grammar, the cover, and therefore inherits its computational properties) in order to try to "squeeze" more generative capacity out of a formalism without changing its weak generative capacity. The next four chapters apply the theoretical framework sketched above to three application areas.
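The distinction between weak and strong generative capacity can be illustrated with a toy example that is not taken from the book. The hypothetical grammars G_RIGHT (S → a S | a) and G_LEFT (S → S a | a) generate the same strings a, aa, aaa, … but assign them different derivation trees (right-branching versus left-branching). The Python sketch below enumerates derivation trees by yield length. It assumes grammars with no empty rules and no unary nonterminal-to-nonterminal rules, and it compares the two grammars first on strings and then on trees.

```python
from functools import lru_cache
from itertools import product

# Hypothetical toy grammars: nonterminal -> list of right-hand sides.
# Symbols that are not keys of the dict are terminals.
# Assumption: no empty rules and no unary nonterminal-to-nonterminal rules,
# so every symbol yields at least one terminal and enumeration terminates.
G_RIGHT = {"S": [("a", "S"), ("a",)]}  # right-branching
G_LEFT = {"S": [("S", "a"), ("a",)]}   # left-branching


def compositions(n, k):
    """All tuples of k positive integers summing to n."""
    if k == 1:
        return [(n,)] if n >= 1 else []
    return [(first,) + rest
            for first in range(1, n - k + 2)
            for rest in compositions(n - first, k - 1)]


def make_enumerator(grammar):
    @lru_cache(maxsize=None)
    def trees(symbol, n):
        """All derivation trees rooted at `symbol` whose yield has length n.
        A tree is a tuple (label, child, ...); a terminal leaf is a plain string."""
        if symbol not in grammar:  # terminal: yields itself, length 1
            return (symbol,) if n == 1 else ()
        result = []
        for rhs in grammar[symbol]:
            # distribute the n terminals over the right-hand-side symbols
            for split in compositions(n, len(rhs)):
                options = [trees(s, k) for s, k in zip(rhs, split)]
                for children in product(*options):
                    result.append((symbol,) + children)
        return tuple(result)
    return trees


def tree_yield(tree):
    """Concatenate the terminal leaves of a derivation tree."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(child) for child in tree[1:])


def language(grammar, start, max_len):
    """Weak view: the set of generated strings of length <= max_len."""
    trees = make_enumerator(grammar)
    return {tree_yield(t) for n in range(1, max_len + 1) for t in trees(start, n)}


def tree_set(grammar, start, max_len):
    """Strong view: the set of derivation trees with yield length <= max_len."""
    trees = make_enumerator(grammar)
    return {t for n in range(1, max_len + 1) for t in trees(start, n)}


print(language(G_RIGHT, "S", 3) == language(G_LEFT, "S", 3))  # True: same strings
print(tree_set(G_RIGHT, "S", 3) == tree_set(G_LEFT, "S", 3))  # False: different trees
```

Comparing the trees directly works here only because both grammars use the same node labels. As the review notes, deciding when descriptions produced by different formalisms count as equivalent is the harder problem the book addresses.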
Chapter 3 considers statistical parsing. It introduces a weighted interpretation domain and shows how very general parsing models can be expressed as weighted grammars. Then, reinterpreting probabilistic context-free grammar (PCFG) models as covers of grammars with richer structural descriptions, the author defines a probabilistic tree adjoining grammar model. This model is shown to capture the same kinds of dependencies as PCFG models, but in a conceptually simpler way. Chapter 4 deals with machine translation, where the strong generative capacity of grammar formalisms is measured with respect to the domains of string pairs and tree pairs. Inversion transduction grammars, synchronous context-free grammars, and synchronous regular form tree adjoining grammars all have different strong generative capacities with respect to the domain of string pairs, although their weak generative capacities are the same. With respect to the domain of tree pairs, the formalisms are classified even more finely. The next two chapters discuss biological sequence analysis. Chapter 5 summarizes current research on applying formal grammars in this area and characterizes the ability of different formalisms to model molecular structures in terms of their derivational generative capacity. Chapter 6 explores what the author calls the strategy of intersection. This method obtains more strong generative power from a grammar formalism by combining multiple grammars into a single system. The system accepts the intersection of the languages accepted by the components and assigns to each string a kind of unification of the structural descriptions assigned to it by the component grammars. The strengths and weaknesses of several variants of this strategy are considered. The book closes with a concluding chapter (Chapter 7).
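The weak-capacity side of the strategy of intersection can be illustrated with a standard textbook example, not necessarily one used in the book. The languages { a^n b^n c^m } and { a^m b^n c^n } are each context-free, but their intersection { a^n b^n c^n } is not. The Python sketch below assumes two hypothetical toy grammars, G_NNM and G_MNN, written in Chomsky normal form. It uses the CYK algorithm as the recognizer and accepts a string only when both component grammars accept it. It does not model the other half of the strategy described above, namely the unification of the component grammars' structural descriptions.

```python
# Hypothetical toy grammars in Chomsky normal form: a list of (lhs, rhs) rules,
# where rhs is either a terminal (a one-character string) or a pair of nonterminals.
# G_NNM generates { a^n b^n c^m : n, m >= 1 }.
G_NNM = [
    ("S", ("X", "C")),
    ("X", ("A", "B")), ("X", ("A", "Y")), ("Y", ("X", "B")),  # X => a^n b^n
    ("C", ("C", "C")), ("C", "c"),                            # C => c^m
    ("A", "a"), ("B", "b"),
]
# G_MNN generates { a^m b^n c^n : n, m >= 1 }.
G_MNN = [
    ("S", ("P", "Z")),
    ("P", ("P", "P")), ("P", "a"),                            # P => a^m
    ("Z", ("B", "D")), ("Z", ("B", "W")), ("W", ("Z", "D")),  # Z => b^n c^n
    ("B", "b"), ("D", "c"),
]


def cyk_accepts(grammar, start, s):
    """CYK recognition: does `start` derive the non-empty string s?"""
    n = len(s)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive s[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        for lhs, rhs in grammar:
            if rhs == ch:  # terminal rule lhs -> ch
                table[i][i].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split into s[i..k] and s[k+1..j]
                for lhs, rhs in grammar:
                    if (isinstance(rhs, tuple)
                            and rhs[0] in table[i][k]
                            and rhs[1] in table[k + 1][j]):
                        table[i][j].add(lhs)
    return start in table[0][n - 1]


def intersection_accepts(s):
    """Accept s only if every component grammar accepts it."""
    return all(cyk_accepts(g, "S", s) for g in (G_NNM, G_MNN))


for s in ["abc", "aabbcc", "aabbc", "abbcc"]:
    print(s, intersection_accepts(s))
# prints: abc True, aabbcc True, aabbc False, abbcc False (one per line)
```

Here "aabbc" is accepted by G_NNM alone and "abbcc" by G_MNN alone, so only strings of the form a^n b^n c^n survive the combined system.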

reviewed by

György Vaszil

zbMATH Keywords

strong generative capacity

statistical parsing

machine translation

biological sequence analysis