RANGER-DTL: Short for Rapid ANalysis of Gene family Evolution using Reconciliation-DTL, this is a software package for inferring gene family evolution by speciation, gene duplication, horizontal gene transfer, and gene loss. The software takes as input a gene tree (rooted or unrooted) and a rooted species tree and reconciles the two by postulating speciation, duplication, transfer, and loss events. RANGER-DTL implements the algorithms presented in the ISMB 2012, RECOMB 2013, IEEE/ACM TCBB and BMC Bioinformatics papers listed on the publications page and makes it possible to perform rigorous evolutionary analyses of even large gene families with thousands of taxa while accounting for confounding factors such as gene tree uncertainty and multiple optima. It can be downloaded from https://compbio.engr.uconn.edu/software/RANGER-DTL/
HoMer: Short for “Horizontal Multi-gene transfer inference”, HoMer is a software package for inferring instances of horizontal multi-gene transfer (HMGT) during the evolutionary history of a collection of microbial species/strains. An HMGT occurs when multiple genes are horizontally transferred in a single horizontal transfer event. HoMer takes as input a rooted species tree, gene ordering information for the species/genomes (leaves) represented in the species tree, and rooted gene trees for all gene families (with at least three genes each) present in the species/genomes under consideration. The software outputs a list of inferred HMGTs for each donor-recipient pair on the species tree, where donors/recipients can be leaves (i.e., given genomes) or internal (i.e., ancestral) edges on the species tree. Further technical details appear in the Molecular Biology and Evolution paper available from the publications page. HoMer can be downloaded from https://compbio.engr.uconn.edu/software/homer/
SEADOG: Short for “Simultaneous Evolutionary Analysis of DOmains and Genes through phylogenetic reconciliation”, this is a software package for simultaneous inference of domain-level and gene-level evolution through a joint phylogenetic reconciliation of domain, gene, and species trees. The software takes as input a rooted or unrooted domain tree, rooted gene trees for the gene families in which the domains of the domain tree occur, and a rooted species tree on the species considered in the analysis, and computes a joint Domain-Gene-Species reconciliation of the domain tree with the gene trees and of the gene trees with the species tree. The software implements the Domain-Gene-Species (DGS) reconciliation model and algorithms described in the IEEE/ACM TCBB and ACM-BCB 2018 papers listed on the publications page. SEADOG can be downloaded from http://compbio.engr.uconn.edu/software/seadog/
SaGePhy: Short for “Simulation framework for Subgene and Gene Phylogenies”, SaGePhy is an easy-to-use, open-source, and platform-independent software package for simulating gene family evolution within species trees as well as subgene or protein-domain evolution within one or more gene trees. SaGePhy can generate species trees using a probabilistic birth-death process, generate gene trees within a given species tree using a probabilistic model of gene evolution that allows for gene duplications, horizontal gene transfers, and gene losses, and generate subgene or domain family phylogenies inside one or more gene trees by allowing for subgene duplications, horizontal subgene transfers within and across gene families (and either within or across species boundaries), and subgene losses. SaGePhy implements a number of important features not found in other phylogenetic simulation software. Further details are available from the software webpage at http://compbio.engr.uconn.edu/software/sagephy/
RF+: This is a program for computing RF(+) distances between phylogenetic trees. RF(+) distance is designed to more meaningfully compute the Robinson-Foulds distance between two trees that only have a partially overlapping leaf set. The traditional approach for computing Robinson-Foulds distance between two trees that only have a partially overlapping leaf set is to first restrict the two trees to their shared leaf set and then compute their Robinson-Foulds distance. We refer to distances computed in this way as RF(-) distances. In contrast, the RF(+) distance between two arbitrary trees is computed by first optimally completing each tree on the union of the leaf sets of both trees so as to minimize the Robinson-Foulds distance between them, and then reporting the Robinson-Foulds distance between the two completed trees. This software implements the algorithms described in the CPM 2021, AMB 2020, and RECOMB-CG 2018 papers listed on the publications page. RF+ can be downloaded from http://compbio.engr.uconn.edu/software/RF_plus/
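The traditional RF(-) computation described above can be sketched in a few lines. This is only an illustration and not the RF+ program itself: it assumes rooted trees encoded as nested Python tuples with string leaf labels, and it measures the Robinson-Foulds distance as the number of clusters (leaf sets below internal nodes) found in exactly one of the two restricted trees. The names leaves, restrict, clusters, and rf_minus are hypothetical. The RF(+) completion step is not shown.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def restrict(tree, keep):
    """Restrict a tree to the leaves in `keep`, suppressing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]  # a node left with one child is replaced by that child
    return tuple(kids)


def clusters(tree):
    """Return the clusters (leaf sets) of all internal nodes except the root."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        found.add(below)
        return below

    root = walk(tree)
    found.discard(root)  # every tree on the same leaves shares the root cluster
    return found


def rf_minus(t1, t2):
    """RF(-): restrict both trees to their shared leaves, then compare clusters."""
    shared = leaves(t1) & leaves(t2)
    if not shared:
        return 0  # assumption: trees with no shared leaves are at distance 0
    r1, r2 = restrict(t1, shared), restrict(t2, shared)
    return len(clusters(r1) ^ clusters(r2))


t1 = ((("a", "b"), "c"), ("d", "x"))
t2 = ((("a", "c"), "b"), ("d", "y"))
print(rf_minus(t1, t2))
```

In this example the leaves x and y are dropped before the comparison, so any information they carry about the placement of the other leaves is ignored; that is the limitation RF(+) is designed to address.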
TNet: TNet is a phylogeny-based method for reconstructing transmission networks for infectious diseases. It takes as input a phylogeny of the strain (pathogen) sequences sampled from infected hosts and analyzes it to estimate the underlying transmission network. TNet relies on the availability of multiple strain sequences from each sampled host to infer transmissions and is simpler and more accurate than existing approaches. The method is parameter-free and highly scalable and can be easily applied within seconds to datasets with hundreds of strain sequences and hosts. TNet can be downloaded from http://compbio.engr.uconn.edu/software/tnet/ and algorithmic details are available in the ISBRA 2020 paper listed on the publications page.
TNet-Geo: TNet-Geo is a customised and extended version of the TNet software described above and is designed for geographical transmission network analysis when multiple strain sequences from different infected hosts are available from the different geographic regions (e.g., countries) under consideration. TNet-Geo can be used to estimate the extent of infection spread from one region to another in different time periods.
TreeSolve: TreeSolve is a program for gene tree error-correction. TreeSolve is designed for the error-correction of microbial gene trees (with horizontal gene transfer) but can be easily applied to non-microbial gene trees as well. TreeSolve takes as input a rooted gene tree topology, a known rooted species tree, and a collection of (unrooted) gene tree samples such as bootstrap replicates or samples from a posterior distribution, and outputs an error-corrected gene tree topology. TreeSolve works by computing branch support values for the given rooted gene tree based on the given replicates/samples, collapsing weakly supported branches in the input gene tree, and then optimally resolving it based on both the input gene tree samples and the species tree while accounting for horizontal gene transfer, gene duplication, and gene loss. TreeSolve serves a similar purpose to the TreeFix-DTL program described below, but is far more scalable and yields multiple candidate error-corrected gene trees. TreeSolve can be downloaded from http://compbio.engr.uconn.edu/software/treesolve/ and algorithmic details are available in the AlCoB 2020 paper listed on the publications page.
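The first two steps described for TreeSolve (computing branch support from samples and collapsing weakly supported branches) can be illustrated with a minimal sketch. This is not TreeSolve's code: it assumes trees are nested Python tuples with string leaf labels, that every sample has the same leaf set as the gene tree, that support is the fraction of samples containing the same bipartition, and that the cutoff is a hypothetical threshold parameter. The function names are also hypothetical, and the final step of optimally resolving the collapsed tree is not shown.

```python
def leaf_set(tree):
    """Return the leaf labels below a node as a frozenset."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(c) for c in tree))


def splits(tree):
    """Non-trivial bipartitions induced by the internal edges of a tree.

    The rooting of the nested tuple does not matter here, so this also
    works for unrooted samples written with an arbitrary root.
    """
    everything = leaf_set(tree)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        other = everything - below
        if len(below) > 1 and len(other) > 1:
            found.add(frozenset([below, other]))
        return below

    walk(tree)
    return found


def collapse_weak(tree, samples, threshold=0.5):
    """Collapse branches whose support among `samples` is below `threshold`.

    Assumes `samples` is non-empty.
    """
    everything = leaf_set(tree)
    sample_splits = [splits(s) for s in samples]

    def support(below):
        other = everything - below
        if len(below) < 2 or len(other) < 2:
            return 1.0  # trivial split, present in every tree
        split = frozenset([below, other])
        return sum(split in s for s in sample_splits) / len(sample_splits)

    def walk(node):
        if isinstance(node, str):
            return node
        kids = []
        for child in node:
            rebuilt = walk(child)
            if isinstance(rebuilt, str) or support(leaf_set(child)) >= threshold:
                kids.append(rebuilt)
            else:
                kids.extend(rebuilt)  # splice the child's children into the parent
        return tuple(kids)

    return walk(tree)


gene_tree = ((("a", "b"), "c"), ("d", "e"))
samples = [
    (("a", "c"), "b", ("d", "e")),
    (("a", "c"), ("b", ("d", "e"))),
]
print(collapse_weak(gene_tree, samples))
```

Here neither sample groups a with b, so that branch is collapsed into a multifurcation, while the branch separating d and e from the rest appears in both samples and is kept.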
ARTra: This is a program for inferring and distinguishing between additive and replacing horizontal gene transfer events. ARTra uses Duplication-Transfer-Loss (DTL) reconciliation to infer transfer events and then uses a trained machine learning classifier to classify the inferred transfers as additive or replacing. The machine learning classifier uses the error-prone classifications generated by several simple rule-based classification heuristics, along with some additional features, to generate an improved ensemble classification. The machine learning framework and rule-based heuristics used by ARTra are described in the ACM-BCB 2020 paper listed on the publications page. ARTra can be downloaded from http://compbio.engr.uconn.edu/software/ARTra
TreeFix: This is a program for very accurate reconstruction of eukaryotic gene trees. TreeFix takes as input a maximum likelihood gene tree topology, a known species tree, and a multiple sequence alignment for the gene family and outputs a more accurate gene tree topology that has statistically equivalent sequence support and better agreement with the species tree topology. Further technical details and experimental evaluation appear in the Systematic Biology paper listed on the publications page. TreeFix was programmed by Yi-Chieh Wu and can be downloaded from http://compbio.mit.edu/treefix/.
TreeFix-DTL: This is a program for very accurate reconstruction of microbial gene trees (with horizontal gene transfer). Like TreeFix above, TreeFix-DTL takes as input a maximum likelihood gene tree topology, a known species tree, and a multiple sequence alignment for the gene family and outputs a more accurate gene tree topology while accounting for horizontal gene transfer, gene duplication, and gene loss. Further technical details and experimental evaluation appear in the Bioinformatics paper listed on the publications page. TreeFix-DTL was programmed by Yi-Chieh Wu and can be downloaded from http://compbio.mit.edu/treefix-dtl/.
TreeFix-TP: This is a program for reconstructing highly accurate transmission phylogenies, i.e., phylogenies depicting the evolutionary relationships between infectious disease strains (viral or bacterial) transmitted between different hosts. TreeFix-TP is designed for scenarios where multiple strain sequences have been sampled from each infected host, and it uses the host assignment of each sequence sample to error-correct a given maximum likelihood phylogeny of the strain sequences. Specifically, given a maximum likelihood phylogeny, the multiple sequence alignment on which the phylogeny was built, and the host assignment for each sequence, TreeFix-TP searches around the maximum likelihood phylogeny to find an alternate error-corrected phylogeny which is equally well-supported by the sequence data and minimizes the number of necessary inter-host transmissions. TreeFix-TP can be downloaded from http://compbio.engr.uconn.edu/software/treefix-tp/
RF-Supertrees: This is a fast and accurate supertree program for rooted phylogenetic trees. It searches for a supertree that minimizes the total (rooted) Robinson-Foulds distance (i.e., symmetric difference) between the supertree and the input trees. RF-Supertrees implements efficient search algorithms described in the paper Robinson-Foulds Supertrees listed on the publications page, and can be downloaded from https://genome.cs.iastate.edu/rfsupertrees.
DupTree: This is a toolbox for constructing species phylogenies from genome-scale multi-locus data using gene tree parsimony. The idea is to find the species tree that best reconciles the input gene trees in terms of gene duplications. Joint programming work with Andre Wehe. This toolbox implements the fast local search algorithm described in the RECOMB’07 paper listed on the publications page. DupTree can be downloaded from https://genome.cs.iastate.edu/DupTree.
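To make the duplication cost concrete, the sketch below scores one candidate species tree against a set of gene trees using the standard least-common-ancestor (LCA) mapping: a gene tree node counts as a duplication when it maps to the same species tree node as one of its children. This is an illustration only and not DupTree's search algorithm. It assumes rooted trees encoded as nested Python tuples, that each gene tree leaf is labeled with the name of its species, and that every such species appears in the species tree. All function names are hypothetical.

```python
def species_clusters(species_tree):
    """List the leaf set of every node in the species tree."""
    out = []

    def walk(node):
        if isinstance(node, str):
            below = frozenset([node])
        else:
            below = frozenset().union(*(walk(c) for c in node))
        out.append(below)
        return below

    walk(species_tree)
    return out


def lca(clusters, species):
    """Smallest species tree cluster containing `species`, i.e. the LCA node."""
    return min((c for c in clusters if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene tree nodes that map to the same species node as a child."""
    clusters = species_clusters(species_tree)
    dups = 0

    def walk(node):
        nonlocal dups
        if isinstance(node, str):
            return lca(clusters, frozenset([node]))
        child_maps = [walk(c) for c in node]
        here = lca(clusters, frozenset().union(*child_maps))
        if here in child_maps:
            dups += 1
        return here

    walk(gene_tree)
    return dups


def total_duplications(gene_trees, species_tree):
    """Duplication cost of a candidate species tree over all gene trees."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


species_tree = (("A", "B"), "C")
gene_trees = [
    (("A", "B"), "C"),         # agrees with the species tree
    (("A", "B"), ("A", "C")),  # root requires one duplication
]
print(total_duplications(gene_trees, species_tree))
```

A search over species trees would evaluate this total for many candidate topologies and keep the one with the lowest cost.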
DupLoss and DeepC: These programs extend the DupTree program and allow the construction of species phylogenies, from genome-scale multi-locus data, under the duplication-loss and deep coalescence cost models, respectively. They implement the fast local search algorithms described in the APBC’10 paper listed on the publications page and are now available as part of the software package iGTP, which can be downloaded from https://genome.cs.iastate.edu/igtp/home.
HiDe: HiDe (short for Highway Detection) is a software package for inferring highways of horizontal gene transfer (representing large-scale horizontal transfer of genes) in the evolutionary history of a set of species. HiDe implements the highway detection method described in the 2013 paper listed on the publications page and was programmed by undergraduate summer student Guy Banay under my supervision. HiDe can be downloaded from http://acgt.cs.tau.ac.il/hide/.