Acronym: CHROMO
Collaborators: Carl Vangestel (JEMU-RBINS), Gontran Sonet (JEMU-RBINS), Frederik Hendrickx (RBINS), Steven Van Belleghem (KUL).
Year: 2021
Summary: Although the first draft genome assemblies for non-model organisms only started to emerge about 10-15 years ago, their quality has improved substantially in recent years. Sequencing entire genomes at sufficient depth became readily achievable with the emergence of short-read NGS technologies, but short reads clearly did not allow chromosome-level genomes to be reconstructed, even with the most sophisticated and computationally intensive assembly algorithms. The main reason is that genomes are not composed of a random sequence of the four nucleotides A, C, T and G, but contain sequences that are repeated multiple to even thousands of times throughout the genome (e.g. transposable elements). These non-unique sequences, often comprising 20% – 90% of the entire genome length, make it impossible to determine unambiguously which sequence lies upstream of a given repeat copy, and therefore result in highly fragmented or inaccurate genome assemblies. To circumvent these issues, long-read sequencing techniques such as PacBio and Oxford Nanopore (ONT), and methods that barcode single physical DNA molecules (10x, Illumina synthetic long reads), have been developed. These methods clearly improved assemblies, but assemblies remain fragmented when the length of a repetitive region exceeds the read length. Here, we propose the in-house application of a recently developed genomic technique that resolves genome assemblies to almost chromosome level. The method, called Chromosome Conformation Capture and available in different ‘flavors’ (3C, Hi-C, OmniC, MiniC), reconstructs the three-dimensional structure of the genome by means of crosslinks. More precisely, crosslinks are made between chromosomal regions that lie in close spatial proximity, and the crosslinked genome is subsequently digested with a restriction enzyme. The restriction ends are then ligated, producing chimeric sequences.
These chimeric sequences are then sequenced and mapped to the fragmented genome assembly. Contigs to which parts of the same chimeric reads map are considered to be physically close to each other in the original genome, which makes it possible to scaffold the contigs of a draft genome assembly up to chromosome level. The technique can be outsourced, but this remains highly expensive. Applying the methodology in-house at the RBINS would strongly reduce the price and, combined with the recently acquired ONT MinION, would yield genome assemblies of superior quality at a reasonable price. Generating Hi-C libraries, even with commercially available kits, is relatively challenging and cumbersome. An experimental phase is therefore expected to be required, for which we would like to request support from JEMU. To develop the application, we propose the genome of the dwarf spider Oedothorax gibbosus as a test case. This species is characterized by a unique male dimorphism. Previous research revealed that this dimorphism is controlled by a single locus with two alleles. The major difference between the alleles is an extensive insertion/deletion polymorphism of ~3 Mb that contains a key regulatory gene for sexual differentiation. Unfortunately, this insertion/deletion polymorphism proved to be the most repeat-rich region of the entire genome, which hampered a proper assembly and, hence, a full reconstruction of this locus. Applying Hi-C to this study organism is expected to strongly improve the assembly of this region and thus to provide unprecedented insight into the evolution of this locus and of profound morphological variation in general.
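
The contig-linking idea described above (chimeric reads whose parts map to different contigs indicate physical proximity) can be illustrated with a minimal Python sketch. It is not the pipeline planned for this project: the function names `count_contig_links` and `candidate_joins`, the `min_links` threshold and the toy contig names are all hypothetical. Real scaffolders work on aligned read files and typically normalise link counts, for example by contig length, which is omitted here.

```python
from collections import Counter


def count_contig_links(read_pairs):
    """Count chimeric read pairs that connect two different contigs.

    read_pairs: iterable of (contig_a, contig_b) tuples, one per chimeric
    read, naming the contigs to which its two parts were mapped.
    """
    links = Counter()
    for contig_a, contig_b in read_pairs:
        if contig_a == contig_b:
            # Both parts map to the same contig: no scaffolding information.
            continue
        # Store each contig pair in a fixed order so A-B and B-A are pooled.
        links[tuple(sorted((contig_a, contig_b)))] += 1
    return links


def candidate_joins(links, min_links=2):
    """Return contig pairs supported by at least min_links chimeric reads,
    strongest support first (ties broken by contig names)."""
    supported = [(pair, n) for pair, n in links.items() if n >= min_links]
    return sorted(supported, key=lambda item: (-item[1], item[0]))


# Toy input (hypothetical contig names, not real data).
toy_pairs = [
    ("ctg1", "ctg2"),
    ("ctg2", "ctg1"),
    ("ctg1", "ctg1"),
    ("ctg2", "ctg3"),
    ("ctg1", "ctg2"),
    ("ctg3", "ctg4"),
]

toy_links = count_contig_links(toy_pairs)
toy_joins = candidate_joins(toy_links, min_links=2)
print(toy_joins)
```

In this toy input only the ctg1/ctg2 pair is supported by more than one chimeric read, so it is the only candidate join. The threshold reflects that single links are expected by chance and carry little evidence on their own.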