Examples of sequence alignment with contiguous, binary and ternary seeds

DOI: 10.5281/zenodo.10645042 | Zenodo: 10645042 | MaRDI QID: Q6710266

Dataset published at Zenodo repository.

Author name not available

Publication date: 10 February 2024

Copyright license: No records found.

Classical sequence alignment algorithms use contiguous chunks of symbols to pre-align short sequences (reads) obtained from a studied organism to a long reference sequence. Spaced seeds, which ignore possible differences between two sequences at some positions, allow researchers to improve the sensitivity of alignment algorithms. In genetics, point mutations have different probabilities, so it may be reasonable to consider transitional (A - G, C - T) and transversional (all other) mutations separately. In perlotSeeds, we consider the alignment of paired-end reads (Han Chinese South, sequence data, ERR016118) to the Human Reference Genome (human genome assembly GRCh38.p14). We consider various contiguous seeds, e.g. C32 for a seed of length 32, and ternary seeds for the given read length (76), e.g. T1V2 is a seed that allows one transitional and two transversional mismatches. For each chosen seed, we generate a library of records, which is then used to find candidate alignments of all reads. We provide statistics (InputStat.zip) for each generated library, i.e. the number of records having the same signature (generated by the seed). There are also output statistics for all reads and chosen seeds, e.g. outputStatT1V3.zip, which report, for each read, how many signatures are generated, how many successful alignments can be made, and the best score. More detailed information on several groups of reads can be found in the ExampleOutput.zip file.
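The distinction between transitional and transversional mismatches can be illustrated with a minimal Python sketch. It is not the perlotSeeds implementation: the function names are hypothetical, and it assumes that a seed name of the form TaVb simply means "at most a transitions and at most b transversions over the whole read", which is an interpretation of the T1V2 example above rather than a documented specification.

```python
import re

# Transitions are purine<->purine (A, G) or pyrimidine<->pyrimidine (C, T)
# substitutions; every other substitution is counted as a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def parse_ternary_seed(name):
    """Parse a name such as 'T1V2' into (max_transitions, max_transversions).

    The naming scheme is assumed from the examples in the description.
    """
    match = re.fullmatch(r"T(\d+)V(\d+)", name)
    if match is None:
        raise ValueError(f"not a ternary seed name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def count_mismatches(read, ref):
    """Return (transitions, transversions) between two equal-length sequences.

    Any mismatch that is not a transition (including one involving 'N')
    is counted as a transversion in this sketch.
    """
    if len(read) != len(ref):
        raise ValueError("sequences must have the same length")
    transitions = transversions = 0
    for a, b in zip(read.upper(), ref.upper()):
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def within_budget(read, ref, seed_name):
    """Check whether the mismatches between read and ref fit the seed's budget."""
    max_ts, max_tv = parse_ternary_seed(seed_name)
    ts, tv = count_mismatches(read, ref)
    return ts <= max_ts and tv <= max_tv
```

For example, `within_budget("ACGT", "GCGA", "T1V2")` compares a toy read with a toy reference window that differ by one transition (A - G) and one transversion (T - A), which fits the assumed T1V2 budget.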
