Complete sequencing and characterization of 21,243 full-length human cDNAs
- PMID: 14702039
- DOI: 10.1038/ng1285
Abstract
As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at approximately 58% compared with a peak at approximately 42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at approximately 42%, relatively low compared with that of protein-coding cDNAs.
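The GC-content comparison and the cluster counts in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `gc_percent` and `modal_bin`, the 2% bin width, and the handling of ambiguous bases are all assumptions made for illustration. Only the four counts at the bottom are taken from the abstract.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases.

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    """
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / acgt


def modal_bin(values, bin_width=2.0):
    """Lower edge of the most populated histogram bin (the 'peak').

    Ties are broken in favour of the lowest bin. The bin width is a
    hypothetical choice, not a value from the paper.
    """
    bins = Counter(int(v // bin_width) for v in values)
    top_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return top_bin * bin_width


# Counts quoted in the abstract (clusters unique to the FLJ collection).
protein_coding = 5416
noncoding = 5481
total_clusters = 10897
spliced_noncoding = 1378

# The two classes account for every unique cluster.
assert protein_coding + noncoding == total_clusters

# "About half" and "about one-fourth", as fractions.
coding_fraction = protein_coding / total_clusters
spliced_fraction = spliced_noncoding / noncoding
```

Applied to a real set of cDNA sequences, `modal_bin([gc_percent(s) for s in sequences])` would give a rough estimate of where the GC distribution peaks; the reported peaks of approximately 58% and 42% come from the paper, not from this sketch.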