PlantGDB Download Portal
This page describes the sequence data used to assemble PUTs (PlantGDB-assembled Unique Transcripts).
Raw sequences:
- Species.mRNA.EST.fasta: All mRNA sequences downloaded from the GenBank EST (expressed sequence tag) division.
- Species.mRNA.PLN.fasta: All mRNA sequences downloaded from the GenBank PLN (plant, fungal, and algal) division.
- Species.mRNA.HTC.fasta: All mRNA sequences downloaded from the GenBank HTC (high-throughput cDNA) division.
The above sets of sequences are combined for the subsequent assembly step.
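As a rough illustration of this combining step, the Python sketch below parses FASTA text and concatenates the records from each source in order. It is not PlantGDB's own code: the function names parse_fasta and combine_raw_sets are hypothetical, and the sketch assumes each input has already been decompressed to plain FASTA text.

```python
from typing import Iterator


def parse_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def combine_raw_sets(*fasta_texts: str) -> list[tuple[str, str]]:
    """Concatenate the records of several FASTA sources (e.g. EST, PLN, HTC)."""
    combined: list[tuple[str, str]] = []
    for text in fasta_texts:
        combined.extend(parse_fasta(text))
    return combined


if __name__ == "__main__":
    est = ">est_1\nACGT\nACGT\n"
    pln = ">pln_1\nGGCC\n"
    print(combine_raw_sets(est, pln))  # [('est_1', 'ACGTACGT'), ('pln_1', 'GGCC')]
```

The records are simply appended in source order; no de-duplication is attempted at this stage.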
Contamination and repetitive elements:
- Species.mRNA.VectorContaminants.fasta: mRNA sequences contaminated by cloning vector sequences. These sequences are excluded from the subsequent assembly step.
- Species.mRNA.BacterialContaminants.fasta: mRNA sequences contaminated by E. coli sequences. These sequences are excluded from the subsequent assembly step.
- Species.mRNA.Repetitive.fasta: mRNA sequences derived from known repetitive elements. These sequences are excluded from the subsequent assembly step.
- Species.mRNA.ContaminantsRepeatsFree.fasta: mRNA sequences that contain no E. coli or vector sequence and no known repetitive elements. These sequences are used for the subsequent assembly step.
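The screening itself (searching the sequences against vector, E. coli, and repeat libraries) is not described here. Assuming that step has already produced sets of flagged sequence IDs, the hypothetical sketch below shows how records could be split into the four files listed above. In this sketch a sequence flagged by more than one screen goes into every matching set, which may differ from PlantGDB's actual behavior; the function name is an illustration only.

```python
from typing import Iterable


def partition_by_screening(
    records: Iterable[tuple[str, str]],
    vector_ids: set[str],
    bacterial_ids: set[str],
    repeat_ids: set[str],
) -> dict[str, list[tuple[str, str]]]:
    """Split (id, sequence) records into excluded sets and a clean set."""
    flagged_sets = {
        "VectorContaminants": vector_ids,
        "BacterialContaminants": bacterial_ids,
        "Repetitive": repeat_ids,
    }
    out: dict[str, list[tuple[str, str]]] = {name: [] for name in flagged_sets}
    out["ContaminantsRepeatsFree"] = []
    for rec_id, seq in records:
        # Names of every screen that flagged this record (assumed upstream result).
        hits = [name for name, ids in flagged_sets.items() if rec_id in ids]
        for name in hits:
            out[name].append((rec_id, seq))
        if not hits:
            # Only unflagged records continue to the assembly step.
            out["ContaminantsRepeatsFree"].append((rec_id, seq))
    return out
```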
PolyA tail:
- Species.mRNA.MaskPolyA.fasta: Identical to the "Species.mRNA.ContaminantsRepeatsFree.fasta" file above, except that potential polyA tails are masked by replacing each A with an X.
- Species.mRNA.Short.fasta: mRNA sequences shorter than 50 bp after their polyA tails are trimmed off. These sequences are excluded from the subsequent assembly step.
- Species.mRNA.TrimPolyA.fasta: The remaining mRNA sequences, with the masked polyA tails trimmed off. These sequences are used for the subsequent assembly step.
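A minimal sketch of the mask, trim, and length-filter sequence, assuming a polyA tail is a run of at least 10 A's at the 3' end of a sequence. The 10-A threshold, the restriction to the 3' end, and the function names are assumptions made for illustration; only the 50 bp cutoff comes from this page.

```python
import re

MIN_POLYA_RUN = 10  # hypothetical: minimum run of A's treated as a polyA tail
MIN_LENGTH = 50     # from this page: shorter trimmed sequences are excluded


def mask_polya(seq: str, min_run: int = MIN_POLYA_RUN) -> str:
    """Replace a 3'-terminal run of at least `min_run` A's with X's."""
    match = re.search(rf"A{{{min_run},}}$", seq, flags=re.IGNORECASE)
    if match is None:
        return seq
    start = match.start()
    return seq[:start] + "X" * (len(seq) - start)


def trim_masked(seq: str) -> str:
    """Remove the trailing X's produced by mask_polya."""
    return seq.rstrip("X")


def polya_step(records, min_run=MIN_POLYA_RUN, min_length=MIN_LENGTH):
    """Return (masked, short, trimmed) record lists, mirroring the three files above."""
    masked, short, trimmed = [], [], []
    for rec_id, seq in records:
        m = mask_polya(seq, min_run)
        masked.append((rec_id, m))  # same length as the input, tail shown as X's
        t = trim_masked(m)
        # Short records are kept here in trimmed form (an arbitrary choice).
        (short if len(t) < min_length else trimmed).append((rec_id, t))
    return masked, short, trimmed
```

Keeping the masked records as a separate output mirrors the MaskPolyA file: sequence lengths are unchanged and only the tail characters differ, so the masked tails can be inspected before they are removed.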
Duplicates:
- Species.mRNA.Subsequence.fasta: mRNA sequences that are identified as near-identical sub-strings of other longer mRNA sequences. This set of sequences is excluded from the subsequent assembly step.
- Species.mRNA.PUTmember.fasta: mRNA sequences that are assembled into the final PUTs.
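This page does not say how near-identical subsequences are detected. As a simplified, hypothetical sketch, the code below treats a sequence as redundant only if it, or its reverse complement, occurs exactly inside a longer sequence that has already been kept; a real implementation would tolerate a small number of mismatches. The sketch assumes upper-case nucleotide sequences, and the function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_subsequences(
    records: list[tuple[str, str]],
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Separate records contained (exactly) in a longer kept record.

    Returns (kept, subsequences).
    """
    # Longest first, so a containing sequence is always examined before its
    # substrings; sorted() is stable, so equal-length records keep input order.
    ordered = sorted(records, key=lambda r: len(r[1]), reverse=True)
    kept: list[tuple[str, str]] = []
    subsequences: list[tuple[str, str]] = []
    for rec_id, seq in ordered:
        rc = reverse_complement(seq)
        # An exact duplicate of a kept record also counts as contained.
        if any(seq in longer or rc in longer for _, longer in kept):
            subsequences.append((rec_id, seq))
        else:
            kept.append((rec_id, seq))
    return kept, subsequences
```

The pairwise check is quadratic in the number of sequences, which is fine for illustration but would need an index (for example, a suffix-array or k-mer based search) at the scale of a full EST collection.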
Files for Download
- File or Directory Name / Date / File Size
- Silene_latifolia.mRNA.BacterialContaminants.fasta.bz2 / Feb-2-2012 / 14 bytes
- Silene_latifolia.mRNA.EST.fasta.bz2 / Feb-2-2012 / 698.49 KB
- Silene_latifolia.mRNA.NonRepetitiveSequence.fasta.bz2 / Feb-2-2012 / 9.99 MB
- Silene_latifolia.mRNA.PLN.fasta.bz2 / Feb-2-2012 / 38.33 KB
- Silene_latifolia.mRNA.PUTmemberSequence.fasta.bz2 / Feb-2-2012 / 9.67 MB
- Silene_latifolia.mRNA.RepetitiveSequence.fasta.bz2 / Feb-2-2012 / 5.08 KB
- Silene_latifolia.mRNA.Subsequence.fasta.bz2 / Feb-2-2012 / 496.83 KB
- Silene_latifolia.mRNA.TSA.fasta.bz2 / Feb-2-2012 / 10.3 MB
- Silene_latifolia.mRNA.VectorContaminants.fasta.bz2 / Feb-2-2012 / 14 bytes
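The files above are bzip2-compressed FASTA. A quick way to check how many sequences a file holds, without decompressing it to disk, is Python's standard bz2 module; the function name count_fasta_records is hypothetical.

```python
import bz2


def count_fasta_records(path: str) -> int:
    """Count FASTA records (header lines) in a bzip2-compressed file."""
    count = 0
    with bz2.open(path, "rt") as handle:  # decompresses on the fly, text mode
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count
```

For example, count_fasta_records("Silene_latifolia.mRNA.EST.fasta.bz2") returns the number of EST records in that download. bzip2 compresses empty input to exactly 14 bytes, so the two 14-byte contaminant files listed above are presumably empty, meaning no vector or E. coli contaminants were flagged for this species.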