Treebanks. Building and using parsed corpora (Q1414851)

This compilation contains 21 papers on building and using syntactically parsed natural language corpora (so-called treebanks). The topics covered are: the proper choice of the corpus to be annotated; the kind of annotation to be added (part-of-speech information, phrase or dependency structures, etc.); whether annotation is best done manually or automatically, and with which annotation tools and formats; how annotated corpora can be searched; what kind of knowledge (e.g., stochastic grammars) can be extracted, i.e., automatically learned, from them, and how these results improve on those extracted (or learned) from non-annotated sources; and, finally, how annotated corpora can be used to evaluate current natural language processing tools such as parsers or grammar checkers. The papers deal with a variety of languages, including Chinese, Czech, English, French, German, Italian, Japanese, Polish, Portuguese, Spanish, and Turkish. The articles of this volume will not be indexed individually.

reviewed by

Udo Hahn

zbMATH Keywords

text corpus

corpus annotation

annotation tool

annotation language