Wei Wang, Jack Smith, Hussein Hejase and Kevin Liu. Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences

Abstract: Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiring assumptions about any particular parametric model for resampling purposes, the bootstrap and related techniques assume that sites are independent and identically distributed (i.i.d.). The i.i.d. assumption can be an over-simplification for many problems in computational biology and bioinformatics. In particular, sequential dependence within biomolecular sequences is often an essential biological feature due to biochemical function, evolutionary processes such as recombination, and other factors.

To relax the simplifying i.i.d. assumption, we propose a new non-parametric/semi- parametric sequential resampling technique that generalizes “Heads-or-Tails” mirrored inputs, a simple but clever technique due to Landan and Graur. The generalized procedure takes the form of random walks along either aligned or unaligned biomolecular sequences. We refer to our new method as the SERES (or “SEquential RESampling”) method.

To demonstrate the flexibility of the new technique, we apply SERES to two different applications – one involving aligned inputs and the other involving unaligned inputs. Using simulated and empirical data, we show that SERES-based support estimation yields comparable or typically better performance compared to state-of-the-art methods for both applications.

return to ‘Accepted Papers’


Tom Davot, Annie Chateau, Rodolphe Giroudeau and Mathias Weller. On the hardness of approximating the Linearization of Scaffolds sharing Repeated Contigs

Abstract: Solutions to genome scaffolding problems can be represented as paths and cycles in a “solution graph”.
However, when working with repetitions, such solution graph may contain branchings and, thus, they may not be uniquely convertible into sequences.
Having introduced various ways of extracting the unique parts of such solutions, we extend previously known NP-hardness results to the case that the solution graph is planar, bipartite, and subcubic, and show the APX-completeness in this case.

return to ‘Accepted Papers’


Cedric Chauve, Jingxue Feng and Liangliang Wang. Detecting introgression in Anopheles mosquito genomes using a reconciliation-based approach

Abstract: Introgression is an important evolutionary mechanism in insects and animals evolution. Current methods for detecting introgression rely on the analysis of phylogenetic incongruence, using either statistical tests based on expected phylogenetic patterns in small phylogenies or probabilistic modelling in a phylogenetic network context, in which case prior hypothesis on possible introgression events are needed. However, introgression leaves a phylogenetic signal similar to horizontal gene transfer, which suggests that introgression detection can also be approached through the gene tree / species tree reconciliation framework. This allows to account jointly for other evolutionary mechanisms such as gene duplication and gene loss. In this work, we apply this principle to a large dataset of Anopheles mosquito genomes and recover the extensive introgression that occurs in the gambiae complex, although with some variations compared to previous reports, and observe a strong support for an ancient introgression event involving An. christyi.

return to ‘Accepted Papers’


Manuel Lafond, Aniket Mane, Pedro Feijao and Cedric Chauve. The rooted SCJ median with single gene duplications

Abstract: The median problem is a classical problem in genome rearrangements. It aims to compute a gene order that minimizes the sum of the genomic distances to k>=3 given gene orders. This problem is intractable but in the related Single Cut-or-Join and breakpoint rearrangement models. Here we consider the rooted median problem, where we assume one of the given genomes to be ancestral to the median, which is itself ancestral to the other genomes. We show that in the Single Cut or Join model with single gene duplications, the rooted median problem is NP-hard. We also describe an Integer Linear Program for solving this problem, that we apply to simulated data, showing high accuracy of the reconstructed medians.

return to ‘Accepted Papers’


Thomas Dencker, Chris-Andre Leimeister, Michael Gerth, Christoph Bleidorn, Sagi Snir and Burkhard Morgenstern. Multi-SpaM: a Maximum-Likelihood approach to Phylogeny reconstruction using Multiple Spaced-Word Matches and Quartet Trees

Abstract: Word-based or `alignment-free’ methods for phylogeny reconstruction are much faster than traditional, alignment-based approaches, but they are generally less accurate. Most alignment-free methods calculate pairwise distances for a set of input sequences, for example from word frequencies, from so-called spaced-word matches or from the average length of common substrings. In this paper, we propose the first word-based phylogeny approach that is based on multiple sequence comparison and Maximum Likelihood. Our algorithm first samples small, gap-free alignments involving four taxa each. For each of these alignments, it then calculates a quartet tree and, finally, the program Quartet MaxCut is used to infer a super tree for the full set of input taxa from the calculated quartet trees. Experimental results show that trees calculated with our approach are of high quality.

return to ‘Accepted Papers’


Mukul S. Bansal. Linear-Time Algorithms for some Phylogenetic Tree Completion Problems under Robinson-Foulds Distance

Abstract: We consider two fundamental computational problems that arise when comparing phylogenetic trees, rooted or unrooted, with non-identical leaf sets. The first problem arises when comparing two trees where the leaf set of one tree is a proper subset of the other. The second problem arises when the two trees to be compared have only partially overlapping leaf sets. The traditional approach to handling these problems is to first restrict the two trees to their common leaf set. An alternative approach that has shown promise is to first “complete” the trees by adding missing leaves, so that the resulting trees have identical leaf sets. This requires the computation of an optimal completion that minimizes the distance between the two resulting trees over all possible completions.
We provide optimal linear-time algorithms for both completion problems under the widely-used Robinson-Foulds (RF) distance measure. Our algorithm for the first problem improves the time complexity of the current fastest algorithm from quadratic (in the size of the two trees) to linear. No algorithms have yet been proposed for the more general second problem where both trees have missing leaves. We advance the study of this general problem by proposing a biologically meaningful restricted version of the general problem and providing optimal linear-time algorithms for the restricted version. Our experimental results on biological data sets show that using completion-based RF distances can result in very different evolutionary inferences compared to traditional RF distances.

return to ‘Accepted Papers’


Alexey Zabelkin and Nikita Alexeev. Estimation of the true evolutionary distance under the INFER model

Abstract: Genome rearrangements are evolutionary events that shuffle genomic architectures. Usually the rearrangement distance between two genomes is estimated as the minimal number of rearrangement needed to transform one genome into another, which is usually referred to as the parsimony assumption.

Since in reality the parsimony assumption may or may not hold, the question of estimating the true evolutionary distance (i.e., the actual number of genome rearrangements between the genomes of two species) arises. While several methods for solving this problem have been developed, all of them have their own disadvantages. In the current paper we consider a very general model and provide a flexible estimator as well as the limits of applicability for the most popular estimation methods, such as the maximum parsimony method.

return to ‘Accepted Papers’


Yue Zhang, Chunfang Zheng and David Sankoff. Speciation and rate variation in a birth-and-death account of WGD and fractionation; the case of Solanaceae

Abstract: We derive the mixture of distributions of sequence similarity for duplicate gene pairs generated by repeated episodes of whole gene doubling. This involves integrating sequence divergence and gene pair loss through fractionation, using a birth-and-death process and a mutational model. We account not only for the timing of these events in terms of local modes, but also the amplitude and variance of the component distributions. This model is then extended to orthologous gene pairs, applied to the evolution of the Solanaceae, focusing on the genomes of economically important crops. We assess how consistent or variable fractionation is from species to species and over time.

return to ‘Accepted Papers’

Previous Events

2017: RECOMB-CG in Barcelona, Spain

2016: RECOMB-CG in Montréal, Canada

2015: RECOMB-CG in Frankfurt, Germany

2014: RECOMB-CG in New York – Cold Spring Harbor, US

2013: RECOMB-CG in Lyon – Villeurbanne, France

2012: RECOMB-CG in Niterói, Brazil

2011: RECOMB-CG in Galway, Ireland

2010: RECOMB-CG in Ottawa, Canada

2009: RECOMB-CG in Budapest, Hungary

2008: RECOMB-CG in Paris, France

2007: RECOMB-CG in San Diego, USA

2006: RECOMB-CG in Montréal, Canada

2005: RECOMB-CG in Dublin, Ireland

2004: RECOMB-CG in Bertinoro, Italy

2003: RECOMB-CG in Minneapolis, USA