Pages: |
Sequence information input files

SLiMFinder currently takes protein sequences in two formats: FASTA and UniProt. For full functionality, users are advised to enter proteins in one of these formats. FASTA files should, if possible, follow the description-line guidelines used by the UniProt database. Other formats are accepted, but certain visualisations may not be produced and other anomalies may occur.

NB. SLiMFinder is optimised for, and requires, at least three sequences for analysis. (Analysis for two sequences will be added at a future date.)

Examples of acceptable input formats are available here: UniProt format, FASTA format.

FASTA format details

SLiMFinder is quite flexible about the precise details of the FASTA file used for sequence input. However, to get maximum utility and to allow differentiation of sequences and source databases, it is recommended to use downloads from UniProt.

The basic requirement for FASTA sequences is that each description should be on one line that starts with ">" and is followed by one or more lines containing the actual sequence. The first word in each description should be unique. For example:

Most databases, homology search results etc. can be downloaded in FASTA format.

UniProt format

The UniProt format is a highly complex format described in detail at: http://ca.expasy.org/sprot/userman.html

The format contains one or more protein entries, each ending with a "//" end-of-entry delimiter. Each entry consists of several lines, each structured in a defined way but always beginning with a two-character line code. Currently SLiMFinder only uses the "Identification", "Sequence data" and "Feature Table data" fields from this format. "Identification" contains the protein name, which is used to identify the protein throughout the process. "Sequence data" contains the amino acid sequence of the protein, which is used for various parts of the analysis.
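
The entry structure described above can be sketched in a few lines of Python. This is a minimal illustration only, not SLiMFinder's own parser: the function name parse_uniprot and the two sample records are hypothetical and invented for this example. It assumes that the "Identification", "Feature Table data" and "Sequence data" fields correspond to the ID, FT and SQ line codes, that line data starts at column 6, and that the sequence follows the SQ line on lines beginning with spaces.

```python
from typing import Any, Dict, List

# Two made-up records, for illustration only (not real UniProt entries).
SAMPLE = """\
ID   TEST1_HUMAN             Reviewed;          20 AA.
FT   TRANSMEM      5     12       Potential.
SQ   SEQUENCE   20 AA;  2000 MW;  0000000000000000 CRC64;
     MKTLLVAGAV LLLAGCSSAA
//
ID   TEST2_HUMAN             Reviewed;           6 AA.
SQ   SEQUENCE   6 AA;  700 MW;  0000000000000000 CRC64;
     MSDNEQ
//
"""


def _new_entry() -> Dict[str, Any]:
    return {"id": None, "feature_lines": [], "sequence": ""}


def parse_uniprot(text: str) -> List[Dict[str, Any]]:
    """Collect the ID, raw FT lines and sequence of each '//'-terminated entry."""
    entries: List[Dict[str, Any]] = []
    current = _new_entry()
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            # End-of-entry delimiter: store the entry and start a new one.
            if current["id"] is not None:
                entries.append(current)
            current = _new_entry()
            in_sequence = False
            continue
        code, data = line[:2], line[5:]  # two-character line code, then the data
        if code == "ID":
            current["id"] = data.split()[0]  # first word is the entry name
        elif code == "FT":
            current["feature_lines"].append(data.rstrip())  # kept unparsed
        elif code == "SQ":
            in_sequence = True  # sequence lines follow the SQ header
        elif in_sequence and code == "  ":
            current["sequence"] += data.replace(" ", "")
    if current["id"] is not None:
        # Tolerate a final entry that lacks its '//' terminator.
        entries.append(current)
    return entries


if __name__ == "__main__":
    for entry in parse_uniprot(SAMPLE):
        print(entry["id"], len(entry["sequence"]), entry["feature_lines"])
```

The FT lines are deliberately kept as raw text here, because the exact column layout of feature positions is not covered in this section and should be checked against the UniProt user manual before being parsed.
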
The "Feature Table data" field is used in an integral way, so it is discussed in detail in the masking section.

Feature Table field

The Feature Table field states the position of, and gives information about, areas of interest in the proteins. There are a large number of defined keywords to describe these features; the features used for masking by SLiMFinder are given in the table below.

Code Description
For example:

Any one of these key names can be specified as a region to be masked by the SLiMFinder program. For example, all features with the key name "TRANSMEM" can be removed from the dataset before the sequences are searched for motifs, using the masking feature of SLiMFinder.

Motif Options

In this dialogue it is possible to change options relating to the search for motifs, for example to adjust the statistical weights used to discover motifs. It is also possible to adjust the BLAST parameters, and the filtering for terminal motifs and alpha helices.

Output Options

In this dialogue it is possible to adjust the output options. The output can be set to contain particular regular expressions, and thresholds can be set for the various filtering parameters. It is also possible to set the number of motifs displayed, the threshold for displaying motifs, and the minimum information content of the motifs. Finally, it is possible to set the Conservation Masking parameters. Gopher can be used to generate orthologues; the consmask option is a flag for the use of relative local conservation. Both require the user to select a search database to filter from.

Job Submission

Once interactive masking is complete, the Submit button will submit the dataset to the SLiMFinder queue. The status of the queue is available on the front page of the Bioware site. |