Simulated wastewater sequencing data for benchmarking SARS-CoV-2 variant abundance estimation

DOI: 10.5281/zenodo.5307070; Zenodo record: 5307070; MaRDI QID: Q6698994

Dataset published in the Zenodo repository.

Author name not available.

Publication date: 29 August 2021

Copyright license: No records found.

To evaluate the accuracy of variant abundance predictions from wastewater sequencing, we built a collection of benchmarking datasets that resemble real wastewater samples. For each variant (B.1.1.7, B.1.351, B.1.427, B.1.429, P.1), we created a series of 33 benchmarks by simulating sequencing reads from the variant genome together with a collection of background sequences (lineages that are not variants of concern or interest), such that the variant abundance ranges from 0.05% to 100%. Analogously, we created a second series of benchmarks by simulating reads only from the Spike gene of each SARS-CoV-2 genome. We refer to the first set of benchmarks as whole genome (WG) and to the second as S-only. We repeated these simulations at different sequencing depths: 100x and 1000x coverage for the WG benchmarks, and 100x, 1000x, and 10,000x coverage for the S-only benchmarks.
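The description does not state the read simulator, the read length, or the exact 33 abundance levels, so the following is only a minimal sketch of the arithmetic behind such a design, not the authors' pipeline. It assumes single-end 150 bp reads (hypothetical), takes "coverage" to mean mean depth (reads × read length / target length), and borrows target lengths from the Wuhan-Hu-1 reference NC_045512.2 (29,903 bp genome; Spike gene at positions 21563..25384, i.e. 3,822 bp). The helper names `read_budget` and `benchmark_count` are hypothetical. The count of samples further assumes one series of 33 per variant, per benchmark set, and per depth.

```python
import math

# Target lengths taken from the Wuhan-Hu-1 reference (NC_045512.2); the dataset
# may have used a different reference, so treat these as assumptions.
WHOLE_GENOME_BP = 29_903
SPIKE_GENE_BP = 3_822  # S gene, positions 21563..25384

READ_LEN = 150  # hypothetical single-end read length

# Design parameters stated in the dataset description.
VARIANTS = ["B.1.1.7", "B.1.351", "B.1.427", "B.1.429", "P.1"]
LEVELS_PER_SERIES = 33
DEPTHS = {"WG": [100, 1000], "S-only": [100, 1000, 10_000]}


def read_budget(coverage: int, target_bp: int, abundance: float,
                read_len: int = READ_LEN) -> tuple[int, int]:
    """Hypothetical helper: split the reads of one mixture into variant and background.

    The total read count gives the requested mean depth over the target;
    `abundance` is the variant fraction in [0, 1] (0.05% -> 0.0005).
    """
    if not 0.0 <= abundance <= 1.0:
        raise ValueError("abundance must be a fraction in [0, 1]")
    total = math.ceil(coverage * target_bp / read_len)
    variant = round(total * abundance)
    return variant, total - variant  # remaining reads come from background lineages


def benchmark_count() -> int:
    """Hypothetical helper: total samples, assuming 33 per variant, set, and depth."""
    return sum(len(VARIANTS) * LEVELS_PER_SERIES * len(depths)
               for depths in DEPTHS.values())
```

Under these assumptions, `read_budget(100, WHOLE_GENOME_BP, 0.0005)` returns `(10, 19926)`: at the lowest abundance and 100x whole-genome coverage, only about 10 reads would come from the variant, which illustrates why low-abundance, low-depth benchmarks are the hardest. `benchmark_count()` returns 825 (330 WG plus 495 S-only samples) if the series are counted as assumed above.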
