A large amount of genome information is available to browse, along with several tools for searching genes, expression data, etc. More details and an introduction to this site are available on the "About CGRD" page, and tutorials on how to use it are on the "Tutorials" page. All pages were tested in Chrome (version 41.0), Firefox (version 37) and Safari (version 7.1). CGRD is not well supported in older versions of Internet Explorer. If CGRD does not work well in your browser, try one of the browsers above.

With the continuing decline in sequencing costs and increasingly stable bioinformatics analysis methods, more plant genomes have been published. For Gossypium spp., two Gossypium raimondii (D5) genomes were published in Nature Genetics and Nature in 2012, led by BGI and the Institute of Cotton Research of CAAS and by Paterson's lab, respectively. In 2014, the Institute of Cotton Research of CAAS published the Gossypium arboreum (A2) genome in Nature Genetics. On April 20, 2015, two genomes of Gossypium hirsutum TM-1 were published in Nature Biotechnology, led by the Institute of Cotton Research of CAAS and the Cotton Research Institute of Nanjing Agricultural University. As these examples show, more and more Gossypium genomes are being published, which means that cotton genome research has advanced in recent years and has had a profound impact on cotton biology research. On the other hand, some cotton genomes have been sequenced more than once, and this duplication of effort is worth reflecting on.

| Statistic | G.arboreum A2 (BGI-CGP, 2014) | G.raimondii D5 (BGI-CGP, 2012) | G.raimondii D5 (JGI, 2012) | G.barbadense AD (GCGI, 2015) |
|---|---|---|---|---|
| Genome Size (Mb) | 1,724 | 775 | 738 | 2,574 |
| Protein-coding Gene Count | 40,134 | 40,976 | 37,505 | 80,876 |
| Protein-coding Transcript Count | 40,134 | 40,976 | 77,267 | 109,918 |
| Total Gene Length (Mb) | 96.9 | 101.9 | 121.6 | 263.0 |
| Average Gene Length (bp) | 2,414 | 2,486 | 3,243 | 3,255 |
| TF Count | 2,555 | - | 2,632 | 4,767 |
| TE Count | - | 277,134/753,866/178,739/270,927 * | 864,706 | 3,316,389 |
| miRNA Count | 348 | - | - | - |
| lncRNA Count | - | - | - | 56,392 |

Notes: The symbol "*" indicates that the Transposable Element (TE) counts for G.raimondii BGI-CGP are reported separately for the different analysis methods (Protein Mask, De novo prediction, RepeatMasker with Repbase, Tandem Repeat). The symbol "-" means that the data is currently not available or is not provided in the corresponding publication.

In the gene family section, there are 25,311 gene families across three genomes (G.arboreum BGI-CGP, G.raimondii JGI and G.barbadense GCGI). These families cover 128,598 cotton genes in total, approximately 5 genes per family: 33,442 genes in the G.arboreum BGI-CGP genome, 34,259 genes in the G.raimondii JGI genome, and 62,698 genes in the G.barbadense GCGI genome, a ratio of approximately 1:1:2. The distribution of member counts per gene family can be downloaded here. The ratios of gene numbers within each family for the three genomes are displayed below. The main peaks of the Ga ratio and the Gr ratio are at 0.25, and there are three other small peaks, while the main peak of the Gb ratio is at 0.5.

Gene Family Ratio Distribution
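
The per-family ratios described above can be illustrated with a minimal sketch. It assumes each family is given as a list of genome labels, one per member gene; the labels "Ga", "Gr", "Gb" and the function name `family_ratios` are illustrative choices, not part of CGRD. A family that follows the overall 1:1:2 pattern lands exactly on the main peaks (0.25, 0.25, 0.5).

```python
from collections import Counter

GENOMES = ("Ga", "Gr", "Gb")  # assumed labels for the three genomes


def family_ratios(members):
    """Return the share of a family's genes contributed by each genome.

    members: list of genome labels, one entry per gene in the family.
    """
    counts = Counter(members)
    total = sum(counts.values())
    return {g: counts.get(g, 0) / total for g in GENOMES}


# Hypothetical family with one Ga gene, one Gr gene and two Gb genes (1:1:2)
ratios = family_ratios(["Ga", "Gr", "Gb", "Gb"])
print(ratios)

# Mean family size from the totals quoted in the text
mean_family_size = 128_598 / 25_311
print(round(mean_family_size, 2))
```

Dividing by the family total rather than by a fixed genome size keeps the three ratios of any family summing to 1, which is why a 1:1:2 family maps to 0.25, 0.25 and 0.5.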

In the gene collinearity section, 8,789 collinearity blocks were identified, with an average of 12 gene pairs per block. The distribution of pairs is shown below (the picture limits the x axis to [0, 100]; only 28 blocks have more than 100 pairs). The detailed data source is available here. The synteny blocks were grouped by their corresponding chromosome regions according to the gene coordinates in the collinearity blocks.

Gene Collinearity Pairs Distribution
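
As a rough sketch of how such a distribution can be summarized, the snippet below computes the mean number of pairs per block, a histogram restricted to the plotted range [0, 100], and the number of blocks that fall beyond it. The input values are hypothetical and the function name `summarize_blocks` is an assumption made for illustration; this is not the script used to produce the figure.

```python
from collections import Counter


def summarize_blocks(pair_counts, x_max=100):
    """Return (mean pairs per block, histogram within [0, x_max], blocks beyond x_max)."""
    mean_pairs = sum(pair_counts) / len(pair_counts)
    histogram = Counter(n for n in pair_counts if n <= x_max)
    beyond = sum(1 for n in pair_counts if n > x_max)
    return mean_pairs, histogram, beyond


# Hypothetical pair counts for six blocks (not CGRD data)
pair_counts = [5, 5, 8, 12, 30, 150]
mean_pairs, histogram, beyond = summarize_blocks(pair_counts)
print(mean_pairs, dict(histogram), beyond)
```

Counting the blocks beyond the axis limit separately corresponds to the "28 blocks" remark in the text: they are left out of the plot but still contribute to the mean.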

In the genome pairwise alignment section, we used the LAST program to generate the similarity segments within the 26 pseudo-chromosomes (G.raimondii JGI 2012 and G.arboreum BGI-CGP 2014). There are 500,431 alignments at the level where the segment size is larger than 10 kb, and 500,431 alignments at the level where the segment size is larger than 1 kb. Considering the performance of Gbrowse_syn, we load only the segments at the 10 kb and 1 kb levels into the database.
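
The size thresholds mentioned above can be sketched as a simple filter. The example assumes the LAST alignments are available in MAF format and measures segment size as the aligned length of the first sequence in each block; both are assumptions, since the page does not state how the sizes were measured. The sequence names and the helper names `read_maf_blocks` and `keep_longer_than` are hypothetical.

```python
def read_maf_blocks(lines):
    """Yield each MAF alignment block as a list of its split 's' (sequence) lines."""
    block = []
    for line in lines:
        line = line.strip()
        if line == "a" or line.startswith("a "):  # start of a new alignment block
            if block:
                yield block
            block = []
        elif line.startswith("s "):
            # fields: 's', name, start, aligned size, strand, sequence size, text
            block.append(line.split())
    if block:
        yield block


def keep_longer_than(blocks, min_size):
    """Keep blocks whose first sequence has an aligned size above min_size (bp)."""
    return [b for b in blocks if int(b[0][3]) > min_size]


# Hypothetical MAF excerpt; the alignment text is truncated for illustration
maf_text = """\
a score=100
s Gr.Chr01 1000 12000 + 55000000 ACGT
s Ga.Chr01 2000 11950 + 90000000 ACGT
a score=50
s Gr.Chr02 500 1500 + 60000000 ACGT
s Ga.Chr02 700 1490 + 80000000 ACGT
a score=10
s Gr.Chr03 10 300 + 45000000 ACGT
s Ga.Chr03 20 299 + 70000000 ACGT
"""

blocks = list(read_maf_blocks(maf_text.splitlines()))
over_10k = keep_longer_than(blocks, 10_000)
over_1k = keep_longer_than(blocks, 1_000)
print(len(blocks), len(over_10k), len(over_1k))
```

Every block that passes the 10 kb filter also passes the 1 kb filter, so the 1 kb level is always at least as large as the 10 kb level.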

Species Gossypium arboreum
Cultivar Shixiya1, SXY1
Description

A total of 41,330 protein-coding genes were identified in the G. arboreum genome, with an average transcript size of 2,533 bp (as determined by GLEAN) and a mean of 4.6 exons per gene. The genome encoded 431 microRNAs (miRNAs), 10,464 rRNAs, 2,289 tRNAs and 7,619 small nuclear RNAs (snRNAs). Among the annotated genes, 85.64% encoded proteins that showed homology to proteins in the TrEMBL database, and 68.71% were identified in InterPro. Over 96% of predicted coding sequences were supported by transcriptome sequencing data, which indicated high accuracy of G. arboreum gene predictions from the genome sequence. Orthologous clustering of the G. arboreum proteome with 3 closely related plant genomes identified 11,699 gene families in common, with 739 gene families that were present specifically in G. arboreum.

Publications Li et al., Genome sequence of the cultivated cotton Gossypium arboreum. Nat Genet, 2014 May 18;46(6):567-572.

Species Gossypium raimondii
Cultivar CMD 10
Description

The genome contains 40,976 protein-coding genes, with 92.2% of these further confirmed by transcriptome data. Evidence of the hexaploidization event shared by the eudicots as well as of a cotton-specific whole-genome duplication approximately 13-20 million years ago was observed. We identified 2,355 syntenic blocks in the G. raimondii genome, and we found that approximately 40% of the paralogous genes were present in more than 1 block, which suggests that this genome has undergone substantial chromosome rearrangement during its evolution.

Publications Wang K et al., The draft genome of a diploid cotton Gossypium raimondii. Nat Genet, 2012 Aug 26;44(10):1098-1103.

Species Gossypium raimondii
Cultivar Ulbr
Description

New insight into Gossypium biology is offered by a genome sequence of G. raimondii Ulbr. (chromosome number, 13) with ~8x longer scaffold N50 (18.8 versus 2.3 megabases (Mb)) compared with a draft, and oriented to 98.3% (versus 52.4%) of the genome. Across 13 pseudomolecules totalling 737.8 Mb, ~350 Mb (47%) of euchromatin span a gene-rich 2,059 centimorgan (cM), and ~390 Mb (53%) of heterochromatin span a repeat-rich 186 cM. Despite having the least-repetitive DNA of the eight Gossypium genome types, G. raimondii is 61% transposable-element-derived. Long-terminal-repeat retrotransposons (LTRs) account for 53% of G. raimondii, but only 3% of LTR base pairs derive from 2,345 full-length elements. The 37,505 genes and 77,267 protein-coding transcripts annotated comprise 44.9 Mb (6%) of the genome, largely in distal chromosomal regions.

Publications Paterson AH et al., Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature, 2012 Dec 20;492(7429):423-427.

Species Gossypium barbadense
Cultivar 3-79
Description

Cotton fiber is the world's leading natural fiber used in the manufacture of textiles. Gossypium is also the model plant in the study of polyploidization, evolution, cell elongation, cell wall development, and cellulose biosynthesis. G. barbadense L. is an ideal candidate for providing new genetic variations useful to improve fiber quality for its superior properties.

Publications submitted.

No further information is currently available about the Gossypium barbadense genome assembly.
