SLiMSearch Help Pages

Help Overview

Main Help
  • QuickStart. Just the basics to get going with the server.
  • Sequence Input. Details of the necessary sequence input requirements of the program.
  • Motif Input. Details of the necessary motif input requirements of the program.
  • Masking. Step by step information on how to mask your input dataset.
  • SLiMBuild. Additional SLiMBuild motif space options.
  • SLiMChance. Additional SLiMChance significance/filtering options.
  • Output. Quick overview of the output of the SLiMSearch server. (See example for details.)
  • References. A summary of the papers underpinning the server.
  • FAQ. Some Frequently Asked Questions and their answers.
Walkthrough
Walkthrough of example SLiMSearch analysis with screenshots.

Manual - (PDF)
Manual for the standalone version of SLiMSearch in PDF format. It may contain details that are not obvious from the server implementation of the program.

User-group
SLiMFinder user-group for additional community support.

Example Input
Example UniProt input file for proteins containing the Dynein light chain interaction motif.

Example Output
Fully functional results page corresponding to the example input run with default options.

References and Citations
Papers to cite when using results in publications.

Standard analysis

  1. Input UniProt accession numbers and click Get sequences to retrieve entries.

  2. Check entries have been retrieved correctly.

  3. Paste motifs to search into box.

  4. Select/check masking options.

  5. Click Submit job to submit your job to the Bioware queue.

  6. Monitor progress or bookmark the page and revisit later.

  7. Explore the results. Write down the job ID to retrieve results again later.

  8. Publish work. Cite SLiMSearch. Live long and prosper.

Sequence information input files

NB. For custom (UniProt) datasets, the SLiMSearch server is limited to 100 sequences. Evolutionary filtering is not available for genomic datasets.

Examples of acceptable input formats are available here:
UniProt format
FASTA format
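
Before uploading a custom dataset, it can help to confirm locally that a FASTA file stays within the 100-sequence limit noted above. The following is a minimal sketch, not part of SLiMSearch itself; the function names `parse_fasta` and `within_server_limit` are hypothetical and chosen for illustration only.

```python
MAX_SEQUENCES = 100  # server limit for custom datasets, as stated above


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def within_server_limit(text, limit=MAX_SEQUENCES):
    """True if the FASTA text holds no more sequences than the limit."""
    return len(parse_fasta(text)) <= limit
```

For example, `within_server_limit(open("my_proteins.fasta").read())` would report whether a local file (the file name is an assumption) is small enough to submit.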


Sequences can be selected in one of two ways:

UniProt Ids
The right-hand panel of the inputs section allows a list of UniProt Ids to be entered. These entries will be fetched and used as the input data for SLiMSearch. For full functionality (including Relative Local Conservation (RLC) masking), users are advised to enter protein data in this way.

or

Human proteome datasets
The SLiMSearch server also permits searching of the UniProt Human proteome. This can also be limited to specific subsets (nuclear, cytoplasmic, transmembrane) based on keywords in UniProt.

Motif input

Once a dataset has been selected, the user must input a set of motifs to search. The SLiMSearch server takes a list of motifs, typed or pasted directly into the text box. Motifs themselves are constructed from a number of regular expression elements, which are mostly standard but with a couple of additional elements to represent "3of5" motifs:

Element     Description
A           Single fixed amino acid.
[AB]        Ambiguity, A or B. Any number of options may be given, e.g. [ABC] = A or B or C.
<R:m:n>     At least m of a stretch of n residues must match R, where R is one of the above regular expression elements (single or ambiguity).
<R:m:n:B>   Exactly m of a stretch of n residues must match R and the rest must match B, where R and B are each one of the above regular expression elements (single or ambiguity). E.g. <F:1:2:[DE]> will match [DE]F or F[DE].
[^A]        Not A.
X or .      Wildcard positions (any amino acid).
.{m,n}      At least m and up to n wildcards.
R{n}        n repetitions of R, where R is any of the above regular expression elements.
^           Beginning of sequence.
$           End of sequence.
(R|S)       Match R or S, which are both themselves recognizable regular expressions. These motifs are not currently supported by the SLiMChance statistics and, as such, any motifs in this format will first be split into variants, e.g. (R|S)PP would be split into RPP and SPP and each searched separately.
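
The two expansions described in the table, splitting (R|S) alternatives into separate variants and enumerating the matches of an exact <R:m:n:B> element, can be sketched as follows. This is an illustrative sketch only and not the server's own code; the function names are hypothetical, and it assumes alternation groups are not nested.

```python
import itertools
import re

# Matches one innermost parenthesised group, e.g. "(R|S)".
ALT = re.compile(r"\(([^()]*)\)")


def split_alternatives(motif):
    """Expand every (R|S|...) group into separate motif variants."""
    match = ALT.search(motif)
    if match is None:
        return [motif]
    variants = []
    for option in match.group(1).split("|"):
        expanded = motif[:match.start()] + option + motif[match.end():]
        # Recurse in case the motif holds further groups.
        variants.extend(split_alternatives(expanded))
    return variants


def expand_exact(r, m, n, b):
    """Variants of <R:m:n:B>: exactly m of n positions are R, the rest B."""
    variants = []
    for r_positions in itertools.combinations(range(n), m):
        variants.append(
            "".join(r if i in r_positions else b for i in range(n))
        )
    return variants


print(split_alternatives("(R|S)PP"))    # ['RPP', 'SPP']
print(expand_exact("F", 1, 2, "[DE]"))  # ['F[DE]', '[DE]F']
```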

SLiMSearch accepts the same input formats as CompariMotif, including a plain list of regular expressions and output from SLiMDisc or SLiMFinder. Because the focus of SLiMSearch is short linear motifs, the maximum number of consecutive wildcards allowed by the server is nine. Motifs must have at least two defined (i.e. non-wildcard) positions.
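
The two constraints above (no more than nine consecutive wildcards, at least two defined positions) can be checked before submission. The sketch below is hypothetical and covers only a subset of the syntax: fixed residues, [..] ambiguities, X or . wildcards, and {n} or {m,n} repeats. It assumes that a variable-length wildcard such as .{m,n} counts as n wildcards towards the limit and m positions towards the minimum; the server may count differently.

```python
import re

MAX_WILDCARD_RUN = 9  # maximum consecutive wildcards, as stated above
MIN_DEFINED = 2       # minimum non-wildcard positions, as stated above

# One element (ambiguity, residue or wildcard) with an optional repeat.
TOKEN = re.compile(r"(\[[^\]]+\]|[A-Z.])(?:\{(\d+)(?:,(\d+))?\})?")


def check_motif(motif):
    """Return (defined_positions, longest_wildcard_run) for a simple motif."""
    core = motif.strip("^$")  # terminal anchors are not residue positions
    defined = 0
    run = longest = 0
    pos = 0
    while pos < len(core):
        match = TOKEN.match(core, pos)
        if match is None:
            raise ValueError(f"unsupported element at {pos}: {core[pos:]!r}")
        element, low, high = match.groups()
        if element in (".", "X"):
            # Assumption: count the longest possible wildcard stretch.
            run += int(high or low or 1)
            longest = max(longest, run)
        else:
            # Assumption: count the minimum number of defined positions.
            defined += int(low or 1)
            run = 0
        pos = match.end()
    return defined, longest


def is_acceptable(motif):
    """True if the motif satisfies both limits under the assumptions above."""
    defined, longest = check_motif(motif)
    return defined >= MIN_DEFINED and longest <= MAX_WILDCARD_RUN
```

Under these assumptions, `is_acceptable("[KR].TQT")` is true, while `is_acceptable("K.{0,10}R")` is false because the wildcard stretch can reach ten positions.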

Masking Options

Disorder masking

SLiMs tend to occur in disordered regions of proteins. The SLiMSearch server uses IUPRED (Dosztanyi et al. 2005) to predict regions of disorder with a relaxed score cut-off of 0.2. Residues predicted to be "intrinsically ordered" are masked out. This can be toggled on/off.
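
As an illustration of the cut-off described above, the sketch below masks residues whose disorder score falls below 0.2. It assumes that per-residue scores have already been obtained from IUPred and are passed in as a list; the function name, the use of "X" as the masking character and the treatment of a score exactly equal to the cut-off as disordered are all assumptions rather than details taken from the server.

```python
DISORDER_CUTOFF = 0.2  # relaxed cut-off quoted above


def mask_ordered(sequence, scores, cutoff=DISORDER_CUTOFF, mask_char="X"):
    """Replace residues scoring below the cut-off with the mask character."""
    if len(sequence) != len(scores):
        raise ValueError("need exactly one score per residue")
    return "".join(
        residue if score >= cutoff else mask_char
        for residue, score in zip(sequence, scores)
    )


print(mask_ordered("MKTAY", [0.1, 0.2, 0.5, 0.19, 0.9]))  # XKTXY
```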

Conservation masking

Conservation masking is used by default: metazoan orthologues are retrieved and underconserved residues are masked. For more details see:

Davey NE, Shields DC & Edwards RJ (2009):
Masking residues using context-specific evolutionary conservation significantly improves short linear motif discovery. Bioinformatics 25(4): 443-50.
[Bioinformatics.] [PubMed]



Feature masking

Defined UniProt features can be masked. Regions such as transmembrane regions, protein domains and inaccessible residues can be masked because they have a low likelihood of containing motifs.

The same mechanism also allows the user to identify specific regions of the proteins to which the search should be confined. For example, the user may wish to look at motifs occurring in the cytoplasmic regions of a set of proteins, or may have prior knowledge of a region possibly containing a functional motif.

There are two types of feature masking in SLiMSearch: Inclusive masking and Exclusive masking:
  • Inclusive masking will remove all of the sequence except the segment specified.
  • Exclusive masking will remove the segment specified.

Inclusive masking is performed first. This means that exclusively masked regions appearing inside an inclusively masked region will be removed.
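
The order of operations described above can be sketched as follows. This is a hypothetical illustration rather than the server's implementation: regions are assumed to be given as 1-based, inclusive (start, end) pairs, and masked residues are assumed to be replaced with "X".

```python
def apply_feature_masks(sequence, inclusive=(), exclusive=(), mask_char="X"):
    """Mask a sequence using 1-based, inclusive (start, end) regions."""
    length = len(sequence)
    keep = [True] * length
    if inclusive:
        # Inclusive masking first: keep only residues inside a listed region.
        keep = [False] * length
        for start, end in inclusive:
            for i in range(max(start - 1, 0), min(end, length)):
                keep[i] = True
    # Exclusive masking second: remove listed regions, even inside kept ones.
    for start, end in exclusive:
        for i in range(max(start - 1, 0), min(end, length)):
            keep[i] = False
    return "".join(res if k else mask_char for res, k in zip(sequence, keep))


# Keep residues 3-8 only, then remove residues 5-6 from within that segment.
print(apply_feature_masks("ABCDEFGHIJ", inclusive=[(3, 8)], exclusive=[(5, 6)]))
# XXCDXXGHXX
```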

Masking is based on UniProt features:
TOPO_DOM    Topological domain. These are converted into cytoplasmic and extracellular for masking.
TRANSMEM    Extent of a transmembrane region.
DOMAIN      Extent of a domain, which is defined as a specific combination of secondary structures organized into a characteristic three-dimensional structure or fold.

Output Description

An example results output for a Dynein Light Chain binding protein dataset is available here and in the screenshot walkthrough.

References and Citations

When using SLiMSearch results in a publication, please cite this webserver. (The publication is currently under review.)

In addition, SLiMSearch uses the following underlying software:

  • Altschul SF, Gish W, Miller W, Myers EW & Lipman DJ (1990): Basic local alignment search tool.
    J. Mol. Biol. 215:403-410.

If using IUPRED disorder prediction to mask input sequences and/or filter results, please cite:

  • Dosztanyi Z, Csizmok V, Tompa P & Simon I (2005): IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21:3433-3434.

If using Relative Local Conservation masking, in addition to the GOPHER citations (below) please cite:

  • Davey NE, Shields DC & Edwards RJ (2009): Masking residues using context-specific evolutionary conservation significantly improves short linear motif discovery. Bioinformatics 25(4): 443-50. [Bioinformatics.] [PubMed]

If using alignments of homologous proteins, generated by GOPHER, please cite:

  • Davey NE*, Edwards RJ*, Shields DC (2007): The SLiMDisc server: short, linear motif discovery in proteins. Nucleic Acids Res. 35(Web Server issue):W455-9. [Nucleic Acids Res.] [PubMed]
    *Joint first authors
  • Edgar RC (2004): MUSCLE: a multiple sequence alignment method with reduced time and space complexity.
    BMC Bioinformatics 5:113.

If using the SLiMChance evolutionary filtering, please cite:

FAQ

Q. Why can I only use conservation masking for UniProt entries downloaded through your site?
A. To save time, the server re-uses GOPHER alignments that have been made before. These are recognised by the accession number of the proteins. It is therefore vital to ensure that the same accession number always corresponds to the same protein sequence. Conservation masking of custom sequences can be performed using the downloadable version.