Multiple sequence comparison and consistency on multipartite graphs (Q1894790)

From MaRDI portal





scientific article; zbMATH DE number 778906
Language Label Description Also known as
English
Multiple sequence comparison and consistency on multipartite graphs
scientific article; zbMATH DE number 778906

    Statements

    Multiple sequence comparison and consistency on multipartite graphs (English)
    0 references
    0 references
    0 references
    18 March 1996
    0 references
    The human genome project has generated vast amounts of biological sequence data. While pairwise comparison of such sequences is traditionally done using the so-called dot-matrices, multiple comparison of \(n\) sequences poses the challenging task of identifying commonalities or consensuses in these, using the pairwise comparisons. The authors model this as an \(n\)-dimensional image reconstruction problem. The pairwise comparisons may correspond to projections of `real' \(n\)- dimensional points or `noises'; that is which are not projections in 2- dimensional coordinate planes of \(n\)-dimensional points. One way for the reconstruction is to use a notion of consistency of the pairwise comparisons, and the other extreme is the idea of transitivity. The mathematical model used is that of an \(n\)-partite graph \(G(V_1\cup V_2\cup\cdots \cup V_n, E)\), where the \(V_i\) correspond to the biological sequences consisting of `real points' and `noises' and the edges \((i, j)\), \(i\in V_s\), \(j\in V_t\) represent the pairwise alignments (dots). Consistency is defined through the membership of the edges in triangles. The mathematical consistency problem boils down to determining a maximal consistent subgraph of \(G\). Besides showing that an earlier algorithm by the first author and \textit{P. Argos} [J. Mol. Biol. 218, 33-43 (1991)] may involve \(O(L)\) numbers of iterations, each of complexity \(O(n^3 L^3)\), where \(L\) is the length of the sequences, the present authors give an algorithm with overall complexity \(O(n^3 L^3)\). The transitivity problem is the dual to this and can be tackled similarly. But, what may be useful for biological applications, is the introduction of a hierarchy of consistency classes through the idea of \(k\)-consistency defined here by means of a `\(k\)-mask' of a matrix. The utility of this is illustrated by the study of the protein elongation factor \(Tu\) in drosophila, homo sapiens and ten other species. Three open problems are suggested.
    0 references
    multiple sequence comparison
    0 references
    multipartite graphs
    0 references
    biological sequence data
    0 references
    pairwise comparisons
    0 references
    reconstruction
    0 references
    consistency
    0 references
    transitivity
    0 references
    alignments
    0 references
    algorithm
    0 references
    complexity
    0 references
    protein elongation factor \(Tu\)
    0 references

    Identifiers

    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references