CGRD offers a large amount of cotton genome information to browse, along with several tools for searching genes, expression data, and more. More details about this site are available on the "About CGRD" page, and tutorials on how to use it are on the "Tutorials" page. All pages were tested in Chrome (version 41.0), Firefox (version 37), and Safari (version 7.1). Older versions of Internet Explorer are not well supported. If CGRD does not work well in your browser, try one of the browsers listed above.

About CGRD

The CGRD (Cotton Genome Resource Database) aims to collect all available genome data for Gossypium spp., such as genes, transposable elements (TEs), miRNAs, and lncRNAs, and to provide basic functions such as gene search and sequence alignment. Sequence query tools including BLAST and BLAT are available at CGRD. Furthermore, publicly available RNA-seq experiments from databases such as NCBI were downloaded and mapped to the cotton genomes to estimate gene and isoform expression. Displaying these expression data as line charts or heatmaps helps biologists identify expression patterns. CGRD also provides several ways to explore the relationships among different cotton genomes. The OrthoMCL program was used to generate clusters of similar proteins, and each cluster can be searched by any one of its members or by a homologous Arabidopsis thaliana gene. Synteny between all published cotton genomes is provided at both the gene level (with the MCScanX program) and the genome level (with the LAST program). Based on the KEGG Orthology annotation of the genes in the cotton genomes, CGRD maps genes to the elements of KEGG Pathway maps using both the pathway images and the KGML (KEGG Markup Language) files, so related genes can be viewed together in a single KEGG Pathway map. CGRD offers several ways to access the genome data: batch sequence retrieval, genome browsers (GBrowse and JBrowse), and bulk download from an FTP server. Full tutorials and documentation explain what CGRD is and how to use it, and the FAQ (Frequently Asked Questions) page answers common questions about its usage. For developers, CGRD provides basic APIs (application programming interfaces) for accessing all of its data.
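Searching a protein cluster "by one member" amounts to an index from each member ID to its cluster. CGRD's actual search code is not described here. The following is a minimal Python sketch that assumes the group-file layout written by OrthoMCL's orthomclMclToGroups step, with one line per group in the form `GROUP_ID: taxon|proteinId taxon|proteinId ...`. The taxon codes (`gar`, `grai`, `ath`) and gene IDs are hypothetical placeholders.

```python
def parse_groups(lines):
    """Parse OrthoMCL-style group lines: 'GROUP_ID: taxon|proteinId taxon|proteinId ...'.

    Returns (groups, index): groups maps each group ID to its member list, and
    index maps each bare protein ID to the group that contains it.
    """
    groups = {}
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members_field = line.partition(":")
        members = members_field.split()
        groups[group_id] = members
        for member in members:
            # Strip the taxon prefix so users can search by the bare protein ID.
            protein_id = member.split("|", 1)[-1]
            index[protein_id] = group_id
    return groups, index


def find_cluster(query_id, groups, index):
    """Return (group_id, members) for the cluster that contains query_id, or None."""
    group_id = index.get(query_id)
    if group_id is None:
        return None
    return group_id, groups[group_id]


# Hypothetical taxon codes and gene IDs, for illustration only.
GROUPS_TXT = [
    "CGRD_G1: gar|GaGene0001 grai|GrGene0001 ath|AT1G01010.1",
    "CGRD_G2: gar|GaGene0002 grai|GrGene0002",
]
groups, index = parse_groups(GROUPS_TXT)
print(find_cluster("AT1G01010.1", groups, index))
# ('CGRD_G1', ['gar|GaGene0001', 'grai|GrGene0001', 'ath|AT1G01010.1'])
```

Indexing by the bare protein ID means a query with an Arabidopsis ID and a query with a cotton ID go through the same single dictionary lookup.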
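Mapping KO-annotated genes onto a KEGG Pathway map can be pictured as matching the KO IDs listed on each KGML `entry` against a KO-to-gene table, then using the entry's `graphics` position to highlight the matching box on the map image. The following is a hedged Python sketch of that idea, not CGRD's implementation. The `ko_to_genes` input format, the gene IDs, and the small KGML fragment are made up for illustration, and only `type="ortholog"` entries (as used in KEGG reference maps) are handled.

```python
import xml.etree.ElementTree as ET


def map_genes_to_kgml(kgml_text, ko_to_genes):
    """Attach annotated genes to the ortholog boxes of a KEGG reference map.

    kgml_text   -- KGML XML for one pathway map.
    ko_to_genes -- dict mapping a KO ID such as 'K00844' to the gene IDs
                   annotated with that KO (hypothetical input format).
    Returns (entry_id, x, y, genes) for every ortholog entry with at least one
    annotated gene; x and y come from the entry's <graphics> element.
    """
    root = ET.fromstring(kgml_text)
    hits = []
    for entry in root.iter("entry"):
        if entry.get("type") != "ortholog":
            continue
        # The name attribute holds space-separated IDs, e.g. 'ko:K00844 ko:K12407'.
        kos = [n[len("ko:"):] for n in entry.get("name", "").split() if n.startswith("ko:")]
        genes = sorted({gene for ko in kos for gene in ko_to_genes.get(ko, [])})
        graphics = entry.find("graphics")
        if not genes or graphics is None:
            continue
        hits.append((entry.get("id"), int(graphics.get("x")), int(graphics.get("y")), genes))
    return hits


# Simplified, made-up KGML fragment and hypothetical gene IDs.
KGML = """<pathway name="path:ko00010" org="ko" number="00010" title="Glycolysis / Gluconeogenesis">
  <entry id="13" name="ko:K00844 ko:K12407" type="ortholog">
    <graphics name="K00844" type="rectangle" x="150" y="200" width="46" height="17"/>
  </entry>
  <entry id="14" name="ko:K01810" type="ortholog">
    <graphics name="K01810" type="rectangle" x="150" y="250" width="46" height="17"/>
  </entry>
</pathway>"""

hits = map_genes_to_kgml(KGML, {"K00844": ["GrGene0101", "GaGene0101"]})
print(hits)  # [('13', 150, 200, ['GaGene0101', 'GrGene0101'])]
```

Entry 14 is skipped because no gene carries K01810 in this toy table. A real map would simply yield more hits.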


Data Collection & Process

All annotation data and sequences for the cotton genomes were obtained from the Cotton Genome Project (Institute of Cotton Research, CAAS), the Phytozome site at JGI, unpublished data at GCGI, and CottonGen.org of ICGI. RNA-seq experiment data were downloaded from the NCBI SRA database, and KEGG Pathway maps and KGML files were downloaded from the KEGG website. The sequences of gene loci, transcripts, CDS, proteins, TEs, miRNAs, etc. were extracted from the genome reference with a Perl script, based on the GFF3-formatted annotation file. Gene families were built with the OrthoMCL pipeline from all cotton genome proteins together with the Arabidopsis thaliana proteins. TopHat2 and Cufflinks were used to map RNA-seq reads to the cotton genomes and to estimate expression levels. All sequences were formatted for BLAST with the formatdb command from BLAST 2.2.26, while BLAT uses the raw FASTA sequences directly.

The primary proteins of the cotton genomes (Gossypium arboreum BGI-CGP 2014, Gossypium raimondii JGI 2012, and Gossypium barbadense GCGI 2015) were pooled and compared with blastp using an e-value cutoff of 1e-05. The results were then processed with the MCScanX program. For genome-level synteny, each pseudo-chromosome sequence of Gossypium arboreum BGI-CGP 2014 and Gossypium raimondii JGI 2012 was split into its own FASTA file. Each pseudo-chromosome pair (for example, GaChr1-GaChr1 or GaChr1-GrChr01) was then used as input to the LAST program, giving 351 chromosome pairs in total. Redundant synteny blocks in the mirrored alignments were removed, with reference to the LASTZ program. Finally, all synteny blocks were merged and loaded into the gbrowse_syn database.

For data access, GBrowse and JBrowse were configured to visualize the gene models and other annotation data of the cotton genomes, and an FTP server was set up for bulk download. The FTP server provides the annotation files (GFF3 format, etc.), sequences (genome, gene loci, TEs, transcripts, proteins, etc.), and index files built with the bowtie2 2.2.3 and bwa 0.7.8-r455 aligners.
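Extracting gene, transcript, or CDS sequences from a genome reference according to a GFF3 annotation involves three details. GFF3 coordinates are 1-based and inclusive. Multi-segment features such as CDS must be joined in genomic order. Minus-strand features are reverse-complemented after joining. CGRD used a Perl script for this step, and the following Python sketch only illustrates the same idea under stated assumptions. It assumes the genome is already loaded into a dict of sequence strings, that the annotation contains no embedded `##FASTA` section, and that the grouping attribute (here `Parent`) holds a single value. The sequence names and IDs in the example are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_attributes(field):
    """Parse the GFF3 column-9 'key=value;key=value' string into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def extract_features(gff_lines, genome, feature_type, group_key="ID"):
    """Extract sequences for one feature type, joining segments that share group_key.

    GFF3 start..end (1-based, inclusive) maps to the Python slice [start-1:end].
    Segments are joined in genomic order; minus-strand results are reverse-complemented.
    """
    segments = defaultdict(list)
    strands = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        key = parse_attributes(cols[8]).get(group_key)
        if key is None:
            continue
        segments[key].append((start, end, genome[seqid][start - 1:end]))
        strands[key] = strand
    result = {}
    for key, pieces in segments.items():
        pieces.sort()  # genomic order by start coordinate
        seq = "".join(piece for _, _, piece in pieces)
        result[key] = revcomp(seq) if strands[key] == "-" else seq
    return result


# Hypothetical toy genome and annotation.
GENOME = {"Chr01": "AAACCCGGGTTTACGT"}
GFF = [
    "##gff-version 3",
    "\t".join(["Chr01", "demo", "CDS", "7", "9", ".", "+", "0", "ID=cds1;Parent=mrna1"]),
    "\t".join(["Chr01", "demo", "CDS", "1", "3", ".", "+", "0", "ID=cds2;Parent=mrna1"]),
    "\t".join(["Chr01", "demo", "CDS", "10", "12", ".", "-", "0", "ID=cds3;Parent=mrna2"]),
    "\t".join(["Chr01", "demo", "CDS", "13", "14", ".", "-", "0", "ID=cds4;Parent=mrna2"]),
]
cds = extract_features(GFF, GENOME, "CDS", group_key="Parent")
print(cds)  # {'mrna1': 'AAAGGG', 'mrna2': 'GTAAA'}
```

The CDS lines for `mrna1` are listed out of order on purpose. Sorting by start coordinate before joining still yields `AAAGGG`.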
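The blastp step uses an e-value cutoff of 1e-05 before the hits are passed to MCScanX. In practice the cutoff would normally be given to BLAST directly. The following Python sketch shows the equivalent post-hoc filter on 12-column tabular BLAST output, where the e-value is the 11th column (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The gene IDs and scores in the example are hypothetical.

```python
def parse_evalue(text):
    """Convert a BLAST e-value string to float.

    Defensive: tolerate abbreviated values such as 'e-180' in case a BLAST
    build writes them without a leading digit.
    """
    text = text.strip()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


def filter_blast_tabular(lines, evalue_cutoff=1e-5):
    """Keep 12-column tabular BLAST hits whose e-value (column 11) is <= cutoff."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        if parse_evalue(cols[10]) <= evalue_cutoff:
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical hits: e-values 0.0 and 1e-05 pass, 2e-04 is dropped.
HITS = [
    "\t".join(["GaGene0001", "GrGene0001", "98.5", "300", "4", "0", "1", "300", "1", "300", "0.0", "590"]),
    "\t".join(["GaGene0001", "GbGene0001", "60.0", "120", "40", "2", "5", "124", "8", "127", "1e-05", "45.1"]),
    "\t".join(["GaGene0002", "GrGene0009", "35.0", "80", "50", "3", "1", "80", "3", "82", "2e-04", "38.0"]),
]
kept = filter_blast_tabular(HITS)
print(len(kept))  # 2
```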
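The count of 351 LAST runs is consistent with taking every unordered pseudo-chromosome pair, self-pairs included, over 26 pseudo-chromosomes. This assumes 13 each for G. arboreum and G. raimondii, both diploids with n = 13. That gives C(26, 2) + 26 = 325 + 26 = 26 × 27 / 2 = 351. The following Python sketch generates those pairs and removes mirrored duplicate blocks. In a self-alignment such as GaChr1-GaChr1, every off-diagonal block is reported twice with its two sides swapped. The chromosome names follow the examples in the text (GaChr1, GrChr01), but the full naming scheme and the block tuple layout are assumptions, and this is not the LASTZ-based procedure CGRD actually used.

```python
from itertools import combinations_with_replacement

# Assumed names modeled on the text's examples; 13 pseudo-chromosomes per genome.
GA_CHROMS = [f"GaChr{i}" for i in range(1, 14)]
GR_CHROMS = [f"GrChr{i:02d}" for i in range(1, 14)]


def chromosome_pairs(chroms):
    """Every unordered pair, self-pairs included (GaChr1-GaChr1, GaChr1-GrChr01, ...)."""
    return list(combinations_with_replacement(chroms, 2))


def drop_mirror_blocks(blocks):
    """Drop mirrored duplicates from synteny blocks.

    Each block is ((chrom, start, end), (chrom, start, end)). Sorting the two
    sides gives a block and its swapped mirror copy the same key, so only the
    first occurrence is kept.
    """
    seen = set()
    unique = []
    for side_a, side_b in blocks:
        key = tuple(sorted((side_a, side_b)))
        if key not in seen:
            seen.add(key)
            unique.append((side_a, side_b))
    return unique


pairs = chromosome_pairs(GA_CHROMS + GR_CHROMS)
print(len(pairs))  # 351

# Hypothetical blocks: the second is the mirror copy of the first.
blocks = [
    (("GaChr2", 10, 500), ("GaChr2", 2000, 2490)),
    (("GaChr2", 2000, 2490), ("GaChr2", 10, 500)),
    (("GaChr1", 100, 900), ("GrChr01", 150, 950)),
]
print(len(drop_mirror_blocks(blocks)))  # 2
```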


Tools Summary
