PlantGDB

Frequently Asked Questions

If you don't find the answer to your question, please use our feedback form (top).

1. General Questions

What is PlantGDB?

PlantGDB provides sequence data for >70,000 plant species, custom EST assemblies (PUT) for over 150 species, web tools and plant genome browsers, as well as an outreach portal for plant genomics. For more information on PlantGDB, visit our About page or take a brief tour on our Help Home Page.

How can I contact PlantGDB?

Use the 'feedback' link at the top right corner of any PlantGDB web page. We will contact you within 24 hours. You are also welcome to contact any of the PlantGDB contacts listed under About.

I'm having problems viewing the PlantGDB site with my web browser.

PlantGDB has been optimized for use with Firefox 3, Safari, or Internet Explorer 7 / 8. Many advanced features require that JavaScript be enabled. If you encounter problems viewing any page at PlantGDB.org, please contact us using our feedback page. Please include a description of what didn't work as expected, and which web browser and operating system you were using. We will do our best to address the problem.

How often is PlantGDB's data updated?

PlantGDB's Public Plant Sequence data is updated every four months, coinciding with every other GenBank Version Release (December, April, and August). Transcript assemblies (PUT) are updated at this time and are typically made available 2-4 weeks after version update.
Genome data at PlantGDB are updated periodically when a new genome assembly becomes available, or when transcript data are significantly increased.
For more information, see FAQ categories 'Plant Sequence and PUT assemblies' and 'Genome Browsers' below.

How are PlantGDB's data stored and retrieved?

Sequence data and metadata are stored on our servers in three primary forms: 1) in MySQL databases, which store metadata and links to other data types; 2) in multiFASTA-formatted sequence files, for sequence retrieval using FASTACMD; 3) in indices for BLAST and GeneSeqer analysis.
For more information about how to access and download PlantGDB sequence data, see FAQ categories 'Plant Sequence and PUT assemblies' and 'Genome Browsers' below.

2. Genomes / Genome Browsers

What is the source of PlantGDB's genome data?

For an overview of sources and methods, each GDB has a Data and Methods page, accessed from the left menubar or via the Data and Methods Portal.

We obtain both genome sequence and gene model (annotation) data from original source repositories which may differ for each genome. The data source is identified on the genome browser home page under "Genome/Gene Models". In all cases, we provide both links to the original data source and local copies of all data files used in compiling our genome databases (see next f.a.q. item).
Other data displayed for each genome consists of splice-aligned transcript (EST, cDNA, TSA), splice-aligned related-species protein data, and microarray probe sequences (which are first matched to PUT assemblies and then positioned on the genome). Spliced alignments and probe positioning are carried out at PlantGDB using primary data downloaded from GenBank (transcript), genome repositories (proteins), and PLEXdb (probes).
Specialized datasets (e.g. Genome Survey Sequence assemblies, masking datasets) are obtained from original source databases as specified in the track header. More information can usually be gleaned by clicking on a track glyph and viewing the Description and/or Notes for that sequence.

How does PlantGDB's genome browser differ from GBrowse or Ensembl?

PlantGDB's genome focus is on accurate spliced alignments of transcript to genomes, a critical component of accurate genome annotation. The xGDB genome browser platform used at PlantGDB has unique features that make it useful for viewing and annotating genomes:

All splicing evidence can be viewed online and reproduced using web tools provided at PlantGDB.
A community annotation tool (yrGATE) and gene model incongruence-detection system (GAEVAL) are built in, to facilitate genome annotation.
Each xGDB has powerful BLAST tools and search tools to retrieve upstream sequence for motif analysis.
xGDB supports the DAS (Distributed Annotation Service) standard for cross-platform data display, and provides both DAS client and DAS server capabilities.
The complete xGDB code is available as open source software and can be custom-installed on a Linux server.

For more information, see the Genome Browser Help Page.

My web browser timed out when trying to view a genome region.

Likely reasons are that the region chosen is too large, or that the region is very heavily annotated with one track type (typically EST). In either case, the load on the graphics engine causes a long delay in track display times. Solutions:

Re-enter a set of coordinates that span a narrower region and try again.
If the problem remains, try unselecting the EST track type using the track control and re-submit the region request.
If you are unable to solve the problem, please contact us using the Feedback form, describing the region you were attempting to view.

How can I download all the data for a genome?

Each genome has a "Downloads" page, accessible from the left panel on the GDB home page. Or, access it directly using this url: http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Xx where Xx is the Genus/species abbreviation. On this page you will find:

FASTA files containing all the genomic and aligned data from the current GDB version
A GFF2- or GFF3-formatted file containing all annotations and their chromosomal locations and features.
The complete MySQL database, in a flat file format that can be used to recreate the database locally.
For some genomes, a 0README file is included to describe special data.

I've downloaded all the .pep files for my genome. I want to know where each gene is located on the chromosomes.

The genome coordinate information you want is contained in the GFF2- or GFF3-formatted file that accompanies each genome annotation (example: Gmax_109_gene.gff3.gz). These files are available in the same location as the other download data: either from each genome page (GDB left menu → Search/Download → Download - Data; e.g. http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Gm), or from the PlantGDB FTP site (Top Menu → Sequence → FTP Server; ftp://ftp.plantgdb.org/download/Genomes/).

Information on the original GFF (generic feature format) can be found here: http://www.sanger.ac.uk/resources/software/gff/spec.html, and the GFF3 format is described here: http://www.sequenceontology.org/gff3.shtml.
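
As an illustration, gene locations can be pulled out of a GFF3 file with a few lines of Python. This is a minimal sketch, not a PlantGDB tool: it assumes the nine tab-separated columns of GFF3, assumes that gene features carry the type "gene" and an ID attribute (GFF2 files use a different attribute syntax and would need adjusting), and the function names and sample lines are hypothetical.

```python
import gzip

def gene_locations(gff3_lines):
    """Yield (gene_id, seqid, start, end, strand) for each 'gene' feature."""
    for line in gff3_lines:
        # Skip blank lines, directives (##...) and comments (#...).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        seqid, _source, _type, start, end, _score, strand, _phase, attrs = cols
        # Column 9 holds semicolon-separated key=value pairs.
        attributes = dict(
            pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
        )
        yield attributes.get("ID", ""), seqid, int(start), int(end), strand

def gene_locations_from_file(path):
    """Read a gzip-compressed GFF3 file and return the gene locations as a list."""
    with gzip.open(path, "rt") as handle:
        return list(gene_locations(handle))

# Hypothetical sample lines, for demonstration only.
sample = [
    "##gff-version 3\n",
    "Gm01\tdemo\tgene\t100\t900\t.\t+\t.\tID=g1;Name=G1\n",
    "Gm01\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=t1;Parent=g1\n",
]
for record in gene_locations(sample):
    print(*record, sep="\t")
```

For a downloaded file, something like gene_locations_from_file("Gmax_109_gene.gff3.gz") would return one tuple per gene, which can then be matched against the IDs in the .pep files.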

I want to locate a specific coding sequence in the Medicago truncatula genome with its upstream and downstream noncoding surrounding sequences. I have only a FASTA sequence of the complete cDNA of my gene of interest.

You can accomplish this by using the BLAST tool and then the "download region" tool.

Paste your sequence at http://www.plantgdb.org/MtGDB/cgi-bin/blastGDB.pl (accessed via MtGDB left menu -> Blast MtGDB), select Mt pseudochromosomes as the target dataset and blastn as the search tool, then click "Run Blast".
Based on the top BLAST result alignment, enter the chromosome number and the left and right genomic coordinates into the "chr - start - end" inputs at the top of the MtGDB page and click 'Genome Context'.
This will display the genome context of the hit region. Use the Zoom button if desired to pad the region with additional sequence left and right, or enter the desired coordinates above as before.
From the Genome Context submenu, click the green "Download" button to load the "Search/Download From Region" page, pre-configured with the current coordinates.
Click "Display Genomic Sequence for Download".

Alternatively, if you have an accession number for the cDNA, search for it using the MtGDB left menu -> Search ID/Keyword tool.

If a result is returned, click "Retrieve Sequences:" to see options for retrieving up and/or downstream sequence.
If no result is returned, then the sequence is either of more recent origin than this GDB version, or else its alignment was insufficiently good to be accepted for display.

I am interested in identifying promoter regions. Can I download genomic sequence upstream of genes?

A. Yes, you can retrieve selected up/downstream sequences using the Search ID/Keyword tool:

From any GDB home page, click Search ID/Keyword on the left side menu
Enter IDs for one or more sequences (either aligned transcripts/proteins or gene models), or a keyword in quotes
Optionally, limit search to relevant data type(s) by clicking appropriate selections under Limit Search
Click Search to retrieve records. This may take up to a minute or more for large searches.
On the results page under Retrieve Sequences, select 5' region, enter desired range, and select whether you want to exclude other overlapping genes
Click the Sequence ID column header checkbox to select all sequences for retrieval (or click individual checkboxes to select a subset). [Note: if the retrieval set is too large the program will error out]
Click Retrieve FASTA to retrieve the desired sequences. This may take a minute or more for large datasets

B. If you need to retrieve ALL the upstream or downstream sequences from an annotated genome, you will need to download the genome data from PlantGDB and use appropriate tools on your local machine.

Below is a step-by-step guide to the process you will need to follow (you will need access to MySQL and NCBI blastall or a similar package):

Download the FASTA genome sequence and the genome database .sql from e.g. http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Zm

Create a local MySQL database from the .sql file and write a MySQL query to retrieve the upstream coordinates from each gene model. You will use the table called chr_gene_annotation, and your queries will look something like this:

							
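-- Forward-strand genes: the 1000 bases upstream lie to the left of l_pos
select geneId, chr, l_pos - 1000 as f_seq_start, l_pos - 1 as f_seq_end
from chr_gene_annotation where strand="f";
-- Reverse-strand genes: the 1000 bases upstream lie to the right of r_pos
select geneId, chr, r_pos + 1 as r_seq_start, r_pos + 1000 as r_seq_end
from chr_gene_annotation where strand="r";
-- For downstream regions, swap the coordinate expressions between the two queries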

Format the genome FASTA using e.g. formatdb with -o T (see Note below)
Create scripts to retrieve each sequence range as a FASTA file from each genome/chromosome using blastall's fastacmd (http://www.ncbi.nlm.nih.gov/BLAST/docs/fastacmd.html) or equivalent package.
For fastacmd, the following options apply for blastall versions before 2.2.21. [Note that NCBI has recently updated blast to BLAST+ 2.2.23 (View new blast information) and the command line syntax has changed].
- use -d to specify the indexed genome data target
- use -s to specify the chromosome in a multifasta file
- use the -L option to specify the range
- use the -S option to get appropriate strand from f_seq and r_seq if that's important
- use the -o option to give the output file a name according to the geneId (or use some other naming scheme as appropriate)

Example: fastacmd -d /path/to/genome_data -s chr1 -L1000,2000 -S2 -o filename1.fasta
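
The per-gene commands can be generated by a short script. The following is a minimal Python sketch, not part of PlantGDB: it assumes the query results were exported as rows of (geneId, chr, start, end, strand), and it uses only the fastacmd options listed above. The function name, the example rows, and the database path are hypothetical.

```python
import shlex

def fastacmd_command(db_path, gene_id, chrom, start, end, strand):
    """Build one fastacmd command line for a gene's flanking region.

    strand is "f" or "r" as stored in chr_gene_annotation; regions for
    reverse-strand genes are requested as reverse complement (-S 2).
    """
    start = max(1, start)  # a window near a chromosome end may run below 1
    strand_flag = "1" if strand == "f" else "2"
    args = [
        "fastacmd",
        "-d", db_path,               # formatdb-indexed genome
        "-s", chrom,                 # sequence within the multifasta
        "-L", f"{start},{end}",      # coordinate range
        "-S", strand_flag,           # 1 = as given, 2 = reverse complement
        "-o", f"{gene_id}.fasta",    # one output file per gene
    ]
    return " ".join(shlex.quote(arg) for arg in args)

# Hypothetical rows, standing in for the output of the MySQL queries.
rows = [
    ("geneA", "chr1", 1000, 2000, "r"),
    ("geneB", "chr2", -200, 799, "f"),
]
for row in rows:
    print(fastacmd_command("/path/to/genome_data", *row))
```

The printed lines can be saved to a shell script and run once the genome has been formatted. Clamping the start coordinate at 1 is a design choice for genes that lie within 1000 bases of a sequence end; the end coordinate is not checked against the chromosome length here.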

I would like to do a global download of all the 3' UTR regions in a genome. Is this possible?

We don't make this data directly available but you can derive it easily from our database tables which are available for download, if you have access to MySQL and a scripting language.
First download the appropriate genome MySQL database from http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Xx where Xx is the Genus/species abbreviation, e.g. Zm for maize. Once you create the database locally you can derive the coordinates as follows:

Find the table that stores gene model information; it is named either chr_gene_annotation (for chromosome-based browsers) or gseg_gene_annotation (for BAC or scaffold-based browsers).
The relevant columns are chr (or gseg_gi), l_pos, r_pos, CDSstart, CDSstop and strand.
A query such as the following will build a tabular output featuring the 3'UTR chr/coordinates, length and direction:

	mysql>SELECT geneID, chr, IF(strand="f", CDSstop, l_pos) AS left_position,
	IF(strand="f", r_pos, CDSstop) AS right_position, 
	IF(strand="f", r_pos-CDSstop, CDSstop-l_pos) AS length, strand
	FROM chr_gene_annotation;

Once you have the coordinates you can build a script to retrieve the data from the genome sequence (which is also available from the same download page referenced above), using fastacmd or perl, python or similar scripting language.
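
A retrieval script along these lines might look as follows. This is a minimal Python sketch under stated assumptions: coordinates are treated as 1-based and inclusive, the genome FASTA is small enough to hold in memory, and all function names and the demo sequence are hypothetical. The interval is returned exactly as the SQL query defines it, so whether the CDSstop base itself belongs to the UTR depends on the table's convention.

```python
def utr3_interval(l_pos, r_pos, cds_stop, strand):
    """Return (left, right, length) of the 3' UTR, mirroring the SQL IF() logic."""
    if strand == "f":
        return cds_stop, r_pos, r_pos - cds_stop
    return l_pos, cds_stop, cds_stop - l_pos

def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}; the ID is the first word of the header."""
    parts_by_id, current = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            parts_by_id[current] = []
        elif line and current is not None:
            parts_by_id[current].append(line)
    return {name: "".join(parts) for name, parts in parts_by_id.items()}

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def extract_region(chrom_seq, left, right, strand):
    """Slice 1-based inclusive coordinates; reverse-complement for strand 'r'."""
    region = chrom_seq[left - 1:right]
    if strand == "r":
        region = region.translate(_COMPLEMENT)[::-1]
    return region

# Hypothetical demo data: one 12-base "chromosome" and one reverse-strand gene.
genome = read_fasta([">chr1 demo", "AAACCC", "GGGTTT"])
left, right, length = utr3_interval(l_pos=2, r_pos=10, cds_stop=5, strand="r")
print(left, right, length, extract_region(genome["chr1"], left, right, "r"))
```

For a real genome, the lines would come from the downloaded FASTA file (for example, an open file handle), and the gene rows from the query output.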

How can I view PlantGDB's alignments in another genome browser?

xGDB supports the DAS (Distributed Annotation Service) standard for cross-platform data display, and provides both DAS client and DAS server capabilities. Several PlantGDB genomes have DAS-served data; see DAS Services for details.

For more information on DAS, see the Genome Browser Help Page.

3. PlantGDB Sequences & PUT Assemblies

What is the source of PlantGDB's sequence data?

PlantGDB downloads GenBank and UniProt sequence data approximately every four months, corresponding to every other GenBank Release. Sequence data is parsed according to a database schema, and individual sequence files are filtered to detect vector and repeat sequence. When you download FASTA-formatted sequence data from PlantGDB, you may see differences in the masking of repeat or vector regions, but the sequence is otherwise identical.

What are PUTs?

PUT = PlantGDB-assembled Unique Transcript. PlantGDB regularly assembles transcript sequences (EST and cDNA) and TSA (Transcriptome Shotgun Assemblies) for species with >10,000 sequences in GenBank, as well as by request for smaller or combined datasets. The resulting sequence assemblies (PUTs) are made available for search, download, BLAST, and spliced alignment using GeneSeqer.
PUT assemblies include both contigs (comprising multiple sequences) and singletons. They are named according to version number, genus_species, and sequence number.

For more information visit the EST Assembly Page (Home>Left Menu>EST Assembly).

What are TSAs? This is a new sequence type at GenBank.

TSAs are Transcriptome Shotgun Assemblies, and are computationally derived from a combination of ESTs and short reads submitted to the Short Read Archive. The submitter of the TSA sequences is responsible for their generation, not NCBI, and all sequences in a TSA must originate with the submitter. You can read more about the TSA submission process here: http://www.ncbi.nlm.nih.gov/genbank/TSA.html.

Where available, PlantGDB uses TSAs as part of its PUT assembly. You can read more about the PUT assembly process here: http://www.plantgdb.org/prj/ESTCluster/PUT_procedure.php

How do I download sequence from PlantGDB?

You can download sequence for any plant species by going to the Download portal (Home>Download>Sequence). Enter Genus/species and click 'Search'. (For popular species, use the shortcut "Featured Species" on the Home Page left menubar.)

To download PUT assemblies, go to the EST contig Download portal (Home>EST Assembly>Download)

To download large datasets, visit our ftp site at ftp.plantgdb.org where you can download all PUT assemblies or plant sequences using ftp.

How can I assess PUT directionality?

There are two ways you can assess PUT directionality:

A) Evaluate the PUT's tblastn orientation to top hit protein:

Download the "Similar Proteins" table from our Download Portal
Path: Home Page -> Download -> EST Assemblies -> [click a species directory] -> current version -> [genus_species.Similar.Protein.txt]
Example: http://www.plantgdb.org/download/download.php?dir=/Sequence/ESTcontig/Actinidia_chinensis/current_version
Search the PUT ID of interest and check columns 8 and 9 (start/end of query sequence) - if start>end, then orientation is reverse w/respect to tblastn hit protein.
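
The start/end comparison in method A is easy to automate for a whole table. The sketch below is illustrative only: it assumes the Similar Proteins file is tab-delimited, takes columns 8 and 9 as the query start and end as described above, and assumes (hypothetically) that the PUT ID is in column 1. The function names and sample rows are made up for the demonstration.

```python
def put_orientation(fields, start_col=8, end_col=9):
    """Classify one table row (list of fields) by query start/end; columns are 1-based."""
    start = int(fields[start_col - 1])
    end = int(fields[end_col - 1])
    return "reverse" if start > end else "forward"

def orientations(lines, put_id_col=1):
    """Map PUT ID -> orientation relative to the tblastn hit protein."""
    result = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            result[fields[put_id_col - 1]] = put_orientation(fields)
        except (IndexError, ValueError):
            continue  # header line or row without numeric coordinates
    return result

# Hypothetical rows; only columns 1, 8 and 9 matter for this sketch.
sample = [
    "PUT_ID\tprotein\tc3\tc4\tc5\tc6\tc7\tq_start\tq_end\n",
    "PUT-demo-1\tP1\t0\t0\t0\t0\t0\t900\t100\n",
    "PUT-demo-2\tP2\t0\t0\t0\t0\t0\t10\t400\n",
]
for put_id, direction in orientations(sample).items():
    print(put_id, direction)
```

If the PUT ID sits in a different column of the downloaded file, pass the appropriate put_id_col value.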

B) If PUT is splice-aligned to a genome in our Genomes list, view the PUT alignment details and assess its direction of transcription (from GeneSeqer analysis) versus its strand.

Open a Search ID/Keyword window in any genome browser (e.g. http://www.plantgdb.org/ZmGDB/):
Path: Home Page -> Genomes -> Search ID/Keyword (left menu) ->[Search page] paste PUT ID (e.g. PUT-1-171a-Zea_mays-10395) -> Click 'Search' -> [Result page] Click highlighted PUT ID -> [Record page] Click 'GeneSeqer Alignment' ->[output].
Click the "?" icon on the [Record page] next to the GeneSeqer link for hints on how to interpret the GeneSeqer output.

What is the current sequence count at PlantGDB?

PlantGDB's sequence data is updated every 4 months, coinciding with every other GenBank Release (odd numbers). For example, recent updates included V.165 (April 2008) and V.163 (December 2007).

How can I view or download the ESTs mapped to a PUT?

The Download page for any species provides two files that map ESTs to PUTs:

Genus_species.PUT_member.txt
Genus_species.alignment.txt

Alternatively, you can view or retrieve the EST components of an individual PUT from its "Search" page, e.g.

http://www.plantgdb.org/search/display/data.php?Seq_ID=PUT-157a-Oryza_sativa-6232

I was unable to find my species of interest in the Public Plant Sequences Download Portal

PlantGDB's taxonomic conventions will always reflect NCBI's current naming system since our data source is GenBank. Check the current taxonomic name for your species using GenBank's Taxonomy browser. It is possible that the genus and/or species name has changed.

New & Noteworthy

What comes after PlantGDB? (July 1, 2015): PlantGDB's NSF funding has ended and the website is no longer being updated. However, you can check out a related project from the Brendel Lab, xGDBvm, a virtual environment for genome annotation in the cloud. xGDBvm instances are now available to registered users of iPlant Atmosphere (3-14-2012)
New Location for PlantGDB (July 23, 2012): The Brendel Group is now located at Indiana University. The PlantGDB server has been migrated as well and is now hosted at the Indiana University School of Informatics and Computing. We don't anticipate any disruption in service, but if you encounter any problems with the website we encourage you to contact us using our Feedback Form. (3-14-2012)
StGDB - Solanum tuberosum new genome browser (Mar. 16, 2012): StGDB, a new genome database for the potato (Solanum tuberosum) is now available at PlantGDB (Genomes→Other→StGDB). Based on the JGI draft genome, VcGDB includes 14,971 protein-coding loci and 15,285 protein-coding transcripts on 413 scaffolds. Other data displayed include splice-aligned cDNAs, EST and PUTs, and splice-aligned related species proteins. (3-14-2012)
BrGDB - Brassica rapa chromosome-based genome browser (Mar. 16, 2012): BrGDB, the genome browser for rapeseed (Brassica rapa) is now updated at PlantGDB (Genomes→Other→BrGDB). Based on the BRAD draft genome, BrGDB includes 41,019 protein-coding transcripts on 10 chromosomes plus unanchored scaffolds (concatenated with 200 N spacer as "chr11"). Other data displayed include splice-aligned cDNAs, EST and PUTs, and splice-aligned related species proteins. (3-16-2012)
VcGDB - Volvox carteri new genome browser (Mar. 14, 2012): VcGDB, a new genome database for the multicellular green alga Volvox (Volvox carteri) is now available at PlantGDB (Genomes→Other→VcGDB). Based on the JGI draft genome, VcGDB includes 14,971 protein-coding loci and 15,285 protein-coding transcripts on 413 scaffolds. Other data displayed include splice-aligned cDNAs, EST and PUTs, and splice-aligned related species proteins. (3-14-2012)
Medicago genome updated (Feb 27, 2012): The Medicago truncatula (barrel medic) genome browser MtGDB has been updated to the new assembly / annotation version 3.5 using data deposited at phytozome. New transcript / protein spliced alignments and gene quality estimates have been calculated as well.
Rice genome updated (Feb 27, 2012): The Oryza sativa (rice) genome browser OsGDB has been updated to the new assembly / annotation version 7 using data deposited at phytozome. New transcript / protein spliced alignments and gene quality estimates have been calculated as well.
Cassava genome updated (Feb 27, 2012): The Manihot esculenta (cassava) genome browser MeGDB has been updated to the v4 assembly / v4.1 annotation that is available at phytozome. New transcript / protein spliced alignments and gene quality estimates have been calculated as well.
Populus annotation updated (Feb 27, 2012): The Populus trichocarpa (poplar) genome browser PtGDB has been updated to annotation version 2.2 using data deposited at phytozome. Genome assembly and spliced-alignments are unchanged.
Mimulus annotation updated (Feb 27, 2012): The Mimulus guttatus genome browser MgGDB has been updated to annotation version 4.3 using data deposited at phytozome. Genome assembly and spliced-alignments are unchanged.
Chlamydomonas annotation updated (Feb 27, 2012): The Chlamydomonas reinhardtii genome browser CrGDB has been updated to annotation version 4.3 using data deposited at phytozome. Genome assembly and spliced-alignments are unchanged.
Brachypodium annotation updated (Feb 13, 2012): The Brachypodium distachyon genome browser BdGDB has been updated to annotation version Bradi1.2 using data deposited at phytozome. Genome assembly and spliced-alignments are unchanged.
PlantGDB NSF Grant (Jan 31, 2012): The PlantGDB website is currently being managed under a new NSF Genome Research Grant, IOS-1126267 (IPGA: Characterization, Modeling, Prediction, and Visualization of the Plant Transcriptome.) Read more about IPGA.
PlantGDB at Maize Genetics Conf. (Jan 31, 2012): PlantGDB will be represented at the 54th Annual Maize Genetics Conference, March 15-18, 2012, in Portland, Oregon USA. Check out poster P56 "Discovery, annotation and expression analysis of arginine/serine (SR) proteins in maize using the Plant Genome Database PlantGDB". (1-28-2012)
GenBank Release 187 (Jan 31): GenBank Release 187.0 sequence data (close date 12-15-2011) have been processed and are available for downloading at PlantGDB. Twenty-three new or updated transcript assemblies have been created. Please note that indexing the new sequence data has been postponed due to infrastructure issues. (2-27-2012)
New Add Track feature for Genome Browsers (Dec 15, 2011): Users can add their own genome features (aligned transcripts, gene models, etc.) using our User Add Track tool. All that's needed is a properly formatted GFF3 file. This feature is currently available at ZmGDB and AtGDB and will be available soon for all GDBs. (12-15-2011)

What's Coming?

Genome updates (1-31-2012): We are in the process of prioritizing genomes at PlantGDB for update. In addition, newly-available plant genomes are being prioritized for inclusion in PlantGDB's Genome Browser suite. Our goal is to maintain current data for at least 25 genomes.
