Jin-Wu Nam1,2,3 and David Bartel1,2,3,*
1Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2Howard Hughes Medical Institute
3Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
* Corresponding author: David Bartel
Thousands of long non-coding RNAs (lncRNAs) have been found in vertebrate animals, a few of which have known biological roles. To better understand the genomics and features of lncRNAs in invertebrates, we used available RNA-seq, poly(A)-site, and ribosome-mapping data to identify lncRNAs of C. elegans. We found hundreds of long intervening ncRNAs (lincRNAs), which had single- or multi-exonic structures that did not overlap protein-coding transcripts, and about seventy antisense lncRNAs (ancRNAs), which were complementary to protein-coding transcripts. Compared with protein-coding genes, the lncRNA genes tended to be expressed in a stage-dependent manner. Approximately 30% of the newly identified lincRNAs showed little signal for sequence conservation and mapped antisense to clusters of 22G or 26G endogenous siRNAs, as would be expected if they serve as templates and targets for these siRNAs. The other 70% tended to be more conserved and included lincRNAs with intriguing expression and sequence features associating them with processes such as dauer formation, male identity, sperm formation, and interaction with sperm-specific mRNAs. Our study provides a glimpse into the lncRNA content of a non-vertebrate animal and a resource for future studies of lncRNA function.