The Evolution of Centromeric DNA Sequences

Abstract

For most eukaryotic species, the centromere is composed of millions of base pairs of tandemly repeated deoxyribonucleic acid (DNA) sequences. Centromere function is broadly conserved across eukaryotic phyla, yet centromeric DNA presents several unique conundrums for biologists, further complicated by the challenges of studying highly repeated regions of complex genomes. Contrary to the expectation that centromeric sequences would be constrained to maintain centromere function across species, these sequences are among the most rapidly evolving in any given genome. This discordance between functional constraint and sequence divergence, termed the ‘centromere paradox’, appears to defy basic laws of Mendelian inheritance. Multiple genetic mechanisms have been proposed to explain centromeric DNA complexity and rapid evolutionary divergence, taking into consideration the unique chromosome architecture and dynamics of the centromere during both mitosis and meiosis. Stochastic processes affecting sequence evolution and the selective constraint necessary for centromere protein recognition are balanced in an ongoing conflict that ultimately manifests as rapid centromeric DNA evolution.