My PhD thesis

G-banding. Different colors represent different G+C content.


Nucleotide content is one of the most obvious and most easily quantifiable features of a genome. Consequently, all reports of new genome sequences include a report on the nucleotide content, i.e., the frequencies of each of the four nucleotide bases in the genome sequence. Despite the ease with which nucleotide content can be measured, however, explaining the evolution of genomic nucleotide content has proved to be much more complicated.
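To make the definition above concrete, here is a minimal Python sketch of how nucleotide content and G+C content might be computed from a sequence. The function names and the toy sequence are illustrative assumptions, not code or data from the thesis, and ambiguous bases such as N are simply ignored.

```python
from collections import Counter


def nucleotide_content(seq):
    """Return the relative frequency of A, C, G and T in a DNA sequence.

    Characters other than A, C, G and T (e.g. N) are ignored; this is a
    simplifying assumption made for illustration.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """Return the combined frequency of G and C (the G+C content)."""
    freqs = nucleotide_content(seq)
    return freqs["G"] + freqs["C"]


if __name__ == "__main__":
    toy_sequence = "ATGCGCATTA"  # made-up sequence, not real genomic data
    print(nucleotide_content(toy_sequence))
    print(gc_content(toy_sequence))
```

For a real genome, the same counting would be applied to each chromosome or contig, or to sliding windows when looking at variation within a genome.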

A number of different explanations have been proposed. These include the effects of biased mutational patterns (i.e., neutralist models), such as biased DNA repair during recombination, as well as the effect of natural selection in shaping the genomic nucleotide content (i.e., selectionist models). A well-known example of the latter group of theories is the effect of temperature in selecting for higher GC content to enhance the thermostability of the DNA. However, many of these theories do not address the problem of variation in nucleotide content between different regions of a single genome, i.e., intragenomic compositional heterogeneity.

In this study, I investigate the evolutionary behaviour of genomic nucleotide content in a group of model organisms. I show that the current nucleotide content of a genome cannot, by itself, always explain the entire evolutionary history of that genome; the sequence of events that shaped it can be much more complicated than it appears. For example, I show that an unbiased nucleotide content does not necessarily imply a stationary nucleotide substitution model in an organism. I also show that the bias in the nucleotide content of a genome can change direction several times, and that these reversals can happen fast enough for their results to be traced within a single genus. I present an example of this evolutionary ebb and flow of genomic nucleotide content within the genus Plasmodium. Finally, I discuss how this exceptional behaviour of genomic nucleotide content can affect other features of living organisms, such as the substitution rates between different nucleotides and their protein content, and can consequently interfere with phylogenetic studies.

To explain the heterogeneity within a single genome, I investigate the relationship between gene length and the degree of nucleotide bias, and find that these two parameters show a significant negative correlation. This led me to propose a mechanism that can explain heterogeneity in the nucleotide content of a genome without invoking variation in mutational patterns between different parts of the same genome. My proposed model resembles Charlesworth’s “Background Selection” model. My findings highlight the importance of considering the evolutionary behaviour of genomic nucleotide content when studying different features of an organism, as well as when studying the evolutionary relationships between the organism of interest and related species.
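The kind of length-versus-bias comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: nucleotide bias is taken here to be the absolute deviation of G+C content from 0.5, the correlation is a plain Pearson coefficient, and the helper names are hypothetical. The thesis may define bias and test significance differently.

```python
import math


def gc_bias(seq):
    """Absolute deviation of G+C content from 0.5.

    Assumed definition of "nucleotide bias", chosen for illustration only.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc = (seq.count("G") + seq.count("C")) / acgt
    return abs(gc - 0.5)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def length_bias_correlation(genes):
    """Correlate gene length with gene-level bias over a list of sequences."""
    lengths = [len(g) for g in genes]
    biases = [gc_bias(g) for g in genes]
    return pearson(lengths, biases)
```

A negative value returned by `length_bias_correlation` would indicate that longer genes tend to be less biased. Establishing that such a correlation is significant requires a separate statistical test, which this sketch does not include.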

I completed my PhD thesis under the supervision of Professor Donal A. Hickey, an accomplished scientist.