A large population of putative non-coding transcripts identified by RNA-seq in Hydra

In a study recently published in BMC Genomics, Yvan Wenger and Brigitte Galliot established a simple and powerful strategy that combines Illumina and 454 reads to derive an extensive and accurate RNA-seq transcriptome from Hydra vulgaris.

This RNA-seq transcriptome contains 48'909 unique sequences, including splice variants, representing approximately 24'450 distinct genes. The authors used the Hydra genomic sequences previously published by Chapman et al. (Nature 2010) to compare the RNA-seq and the genome-predicted transcriptomes, and found 10'597 novel Hydra transcripts in the RNA-seq transcriptome, most of them present in genomic contigs but absent from the gene predictions. Only 5% of these novel transcripts encode evolutionarily conserved proteins, whereas the vast majority (7'103) encode short ORFs (<100 aa). At least 767 of those correspond to pseudogenes, whereas 81 match the definition of long non-coding transcripts.

This RNA-seq transcriptome also lacks 11'270 genome-predicted transcripts. These correspond either to expressed genes that went undetected under the RNA-seq conditions of this study, or to silent genes that might reflect recent evolutionary events, i.e. cnidarian genes still present in the Hydra genome but no longer expressed.
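
The logic of this two-way comparison can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that transcripts from both sources have already been matched and given shared identifiers (in practice this step would rely on sequence alignment), and the function name and toy identifiers are hypothetical.

```python
def compare_transcriptomes(rnaseq_ids, predicted_ids):
    """Split transcripts into shared, novel and missing groups.

    Assumption: a transcript found in both sources carries the same
    identifier in both lists (the matching itself is not shown here).
    """
    rnaseq = set(rnaseq_ids)
    predicted = set(predicted_ids)
    return {
        "shared": rnaseq & predicted,   # detected by RNA-seq and predicted from the genome
        "novel": rnaseq - predicted,    # detected by RNA-seq but not predicted
        "missing": predicted - rnaseq,  # predicted but not detected by RNA-seq
    }


# Toy identifiers for illustration only, not data from the study.
rnaseq_ids = ["t1", "t2", "t3", "t4"]
predicted_ids = ["t2", "t3", "t5"]

result = compare_transcriptomes(rnaseq_ids, predicted_ids)
for group in ("shared", "novel", "missing"):
    print(group, sorted(result[group]))
```

In this toy example, "novel" plays the role of the 10'597 unpredicted transcripts and "missing" that of the 11'270 predicted transcripts absent from the RNA-seq data.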

Hence, the comparative analysis of RNA-seq and genome-predicted transcriptomes identifies large populations of novel as well as missing transcripts, whose analysis provides testable predictions about gene regulation and gene evolution.