Datasets for testing accuracy of microbial gene tree rooting methods
The following simulated and biological datasets were used in the paper cited below to test the accuracy of various gene tree rooting methods on microbial gene families. The paper contains a detailed description of these datasets.
Simulated datasets: AllSimulatedDatasets.zip
Real biological dataset: RootingEmpiricalDataset.zip
Assessing the Accuracy of Phylogenetic Rooting Methods on Prokaryotic Gene Families
Taylor Wade, L. Thiberio Rangel, Soumya Kundu, Gregory P. Fournier, Mukul S. Bansal.
PLOS One, 15(5): e0232950, 2020.
Simulated datasets for testing DTRL reconciliation algorithms and classification of additive and replacing transfers
The following simulated datasets contain gene trees and species trees where the gene trees were evolved inside the species tree with gene duplications, additive transfers, replacing transfers, and gene losses, using the SaGePhy simulation framework. These datasets were used in the paper cited below to test the accuracy of a heuristic for classifying transfer events inferred through DTL reconciliation as being additive or replacing. The paper contains a detailed description of these datasets.
Simulated datasets: DTRL_simulatedData.zip
On Inferring Additive and Replacing Horizontal Gene Transfers Through Phylogenetic Reconciliation
Misagh Kordi, Soumya Kundu, Mukul S. Bansal.
ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB) 2019; Proceedings, pages 514-523.
COVID-19 dataset for inferring global/international COVID-19 transmission network
This dataset includes the GISAID accession numbers for 2123 downloaded and filtered SARS-CoV-2 sequences (downloaded from GISAID in June 2020) from 59 countries, the specific command used to align those sequences, and the 10 bootstrap phylogenies computed on the aligned sequences using RAxML. This dataset was used to infer the international COVID-19 transmission network using TNet. Following GISAID’s policies, actual sequences have not been included in this dataset. The actual genomic sequences and associated metadata (including country of origin, country of exposure, etc.) can be downloaded from GISAID using the provided accession numbers. The manuscript cited below contains a detailed description of this dataset.
Global COVID-19 dataset: Global_COVID-19_Dataset.zip
Acknowledgement table for the COVID-19 sequence data used in the above dataset: gisaid_acknowledgement_table_world.pdf
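Because the sequences themselves must be retrieved from GISAID, a first step for anyone reusing this dataset is to collect the accession numbers into a clean list. The following is a minimal Python sketch of that step, not part of the dataset. It assumes only that the accession numbers follow the usual GISAID form EPI_ISL_ followed by digits; the layout of the accession file inside the archive is not described here, so the function takes raw text rather than a specific file name, and the sample IDs are placeholders.

```python
import re

# Assumption: accession numbers follow the usual GISAID pattern EPI_ISL_<digits>.
ACCESSION_RE = re.compile(r"\bEPI_ISL_\d+\b")


def extract_accessions(text):
    """Return the unique accession IDs found in text, in order of first appearance."""
    seen = set()
    ordered = []
    for accession in ACCESSION_RE.findall(text):
        if accession not in seen:
            seen.add(accession)
            ordered.append(accession)
    return ordered


if __name__ == "__main__":
    # Placeholder text standing in for the contents of the accession list.
    sample = "EPI_ISL_1\nEPI_ISL_2, EPI_ISL_1\n"
    for accession in extract_accessions(sample):
        print(accession)
```

Working on raw text keeps the sketch independent of whether the IDs are stored one per line, comma-separated, or in a table column.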
TNet: Transmission Network Inference Using Within-Host Strain Diversity and its Application to Geographical Tracking of COVID-19 Spread
Saurav Dhar, Chengchen Zhang, Ion Mandoiu, Mukul S. Bansal.
IEEE/ACM Transactions on Computational Biology and Bioinformatics; 2021 (in press).