Data to form periodic lossless ternary seeds of maximum weight (Part 1) (Q6710265)
From MaRDI portal
Dataset published at Zenodo repository.
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Data to form periodic lossless ternary seeds of maximum weight (Part 1) | Dataset published at Zenodo repository. | |
Statements
Data to form periodic lossless ternary seeds of maximum weight. Detailed information can be found in the GitHub project (https://github.com/vtman/perlotSeeds). Codes to generate periodic blocks (binary and ternary) can also be found there.

Binary seeds have only two symbols (0 = do not care = _ and 1 = match = #). The length of a seed is the number of its elements; the weight of a seed is the number of its 1-elements. The goal is to find seeds of maximum weight, so that they can be used to compare two strings with a given number of mismatches. It is observed that in many cases these seeds of maximum weight have a periodic structure: the same block is repeated multiple times, followed by its remainder. Blocks for binary seeds can be found with the help of the PerFSeeB project (https://github.com/vtman/PerFSeeB). These blocks have the maximum possible weight.

In genetics, sequences have four symbols (A, C, G, T). However, the chance of a pointwise mutation is not the same for all pairs. A transition mutation (A↔G or C↔T) is often twice as likely as a transversion mutation (A↔C, A↔T, G↔C, G↔T). Transition-constrained seeds use the ternary alphabet {#, @, _}, where @ stands for a match or a transition mismatch.

To generate ternary seeds, we first need to generate ternary blocks. These ternary blocks can be derived from binary blocks. However, sometimes we need to use binary blocks of less than the maximum weight. BinaryDataLevel.zip contains binary blocks (mostly of maximum weight, but 1/5 are for smaller weights: one less than the maximum, and for a couple of blocks two less). Files T1V1.zip, T1V2.zip, ..., and T8V1.zip contain ternary blocks in binary format. T4V2.zip and T7V2.zip are in the other dataset. File bestTernary.zip contains ternary seeds of maximum weight (calculated as the number of # symbols plus half the number of @ symbols).
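The definitions in the description (periodic structure, ternary weight, and the meaning of the @ symbol) can be illustrated with a minimal Python sketch. It is not taken from the perlotSeeds or PerFSeeB code. The function names `periodic_seed`, `seed_weight` and `seed_hits` are hypothetical, the block `#@_` is an invented toy example rather than a block from the dataset, and the sketch assumes that the "remainder" is a prefix of the block.

```python
# Transition pairs: purine <-> purine (A, G) and pyrimidine <-> pyrimidine (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def periodic_seed(block: str, length: int) -> str:
    """Repeat `block` and append a prefix of it so the seed has `length` symbols.

    Assumption: the remainder is the leading part of the block.
    """
    if not block:
        raise ValueError("block must be non-empty")
    repeats, remainder = divmod(length, len(block))
    return block * repeats + block[:remainder]


def seed_weight(seed: str) -> float:
    """Weight = number of '#' symbols plus half the number of '@' symbols.

    For a binary seed (only '#' and '_') this is just the count of '#'.
    """
    return seed.count("#") + 0.5 * seed.count("@")


def seed_hits(seed: str, a: str, b: str) -> bool:
    """Check whether two equal-length windows satisfy every seed position.

    '#' requires a match, '@' allows a match or a transition mismatch,
    '_' accepts anything.
    """
    if not (len(seed) == len(a) == len(b)):
        raise ValueError("seed and both windows must have the same length")
    for symbol, x, y in zip(seed, a, b):
        if symbol == "#" and x != y:
            return False
        if symbol == "@" and x != y and (x, y) not in TRANSITIONS:
            return False
    return True


seed = periodic_seed("#@_", 8)      # toy block, not from the dataset
print(seed)                         # #@_#@_#@
print(seed_weight(seed))            # 4.5
print(seed_hits("#@_", "ACG", "ATT"))  # True: C/T is a transition
print(seed_hits("#@_", "ACG", "AAG"))  # False: C/A is a transversion
```

Here the seed of length 8 consists of two full copies of the block plus the two-symbol remainder `#@`, giving three # symbols and three @ symbols, hence a weight of 3 + 3/2 = 4.5.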
9 February 2024