Pages: |
Walkthrough of example SLiMFinder analysis with screenshots.
Manual (PDF): Manual for the standalone version of SLiMFinder in PDF format. It may contain details that are not obvious from the server implementation of the program.
User-group: SLiMFinder user-group for additional community support.
Example Input: Example UniProt input file for proteins containing the Dynein light chain interaction motif.
Example Output: Fully functional results page corresponding to the example input, run with default options.
CompariMotif: Help for the accessory application, CompariMotif, for comparing SLiMFinder results with known motifs.
References and Citations: Papers to cite when using results in publications.

Standard analysis
Sequence information input files
NB. SLiMFinder is optimised for, and requires, at least three sequences for analysis. (Analysis for two sequences will be added at a future date.)
Examples of acceptable input formats are available here: UniProt format, FASTA format.

Two methods of entry are possible:

Raw protein sequences
SLiMFinder currently takes protein sequences in two formats: FASTA and UniProt long format. FASTA files should, if possible, follow the description-line guidelines used by the UniProt database. Other formats are accepted, but certain visualisations may not be produced and other anomalies may occur.

or UniProt IDs
The right-hand panel of the inputs section allows a list of UniProt IDs to be entered. These entries will be fetched and used as the input data for SLiMFinder. For full functionality (including RLC masking), users are advised to enter protein data in this way.

Fasta format details
SLiMFinder is quite flexible about the precise details of the FASTA format file used for sequence input. However, to get maximum utility and allow differentiation of sequences and source databases, it is recommended to use downloads from UniProt.
The basic requirement for FASTA sequences is that each description is on a single line that starts with ">" and is followed by one or more lines containing the actual sequence. The first word in each description should be unique. Most databases, homology search results, etc. can be downloaded in FASTA format.

UniProt Format
The UniProt format is a highly complex format described in detail at: http://ca.expasy.org/sprot/userman.html
The format contains one or more protein entries separated by a "//" end-of-entry delimiter. Each entry is made up of several lines, each structured in a defined way but always beginning with a two-character line code. Currently SLiMFinder only uses the "Identification", "Sequence data" and "Feature Table data" fields from this format.
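The entry structure described above can be illustrated with a minimal Python sketch. This is not SLiMFinder's own parser. The function name `parse_uniprot`, the example entry `TEST1_HUMAN` and its sequence are hypothetical, and the sketch assumes the older whitespace-separated feature layout (`FT  KEY  start  end  description`); feature lines in other layouts are simply skipped.

```python
# Hypothetical example entry, made up for illustration only.
EXAMPLE = """\
ID   TEST1_HUMAN             Reviewed;          20 AA.
FT   TRANSMEM      5     12       Helical.
SQ   SEQUENCE   20 AA;  2000 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QISFVKSHFS
//
"""


def parse_uniprot(text):
    """Split UniProt flat-file text into entries with ID, features and sequence."""
    entries = []
    entry = {"id": None, "features": [], "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        code = line[:2]  # every line begins with a two-character line code
        if code == "//":  # end-of-entry delimiter
            if entry["id"] is not None:
                entries.append(entry)
            entry = {"id": None, "features": [], "sequence": ""}
            in_sequence = False
        elif code == "ID":  # "Identification" line
            fields = line.split()
            if len(fields) > 1:
                entry["id"] = fields[1]
        elif code == "FT":  # "Feature Table data" line
            fields = line.split()
            # Assumed layout: FT KEY start end [description]
            if len(fields) >= 4 and fields[2].isdigit() and fields[3].isdigit():
                entry["features"].append((fields[1], int(fields[2]), int(fields[3])))
        elif code == "SQ":  # header of the "Sequence data" block
            in_sequence = True
        elif in_sequence and line.startswith("  "):
            # Sequence lines are indented and split into blocks by spaces.
            entry["sequence"] += "".join(line.split())
    return entries


entries = parse_uniprot(EXAMPLE)
print(entries[0]["id"], entries[0]["features"], entries[0]["sequence"])
```

Only the three fields named above are collected; all other line codes are ignored.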
"Identification" contains the protein name, which is used to identify the protein throughout the process. "Sequence data" contains the amino acid sequence of the protein, which is used for various parts of the analysis. The "Feature Table data" field is integral to masking and is discussed in detail in the masking section.

Feature Table field
The Feature Table field records the position of regions of interest in the protein, together with information about them. There are a large number of defined keywords to describe these features; the features used for masking by SLiMFinder are given in the table below.

Code Description
Any one of these key names can be specified as a region to be masked by SLiMFinder. For example, all features with the key name "TRANSMEM" can be removed from the dataset before the sequences are searched for motifs, using the masking feature of SLiMFinder.

Masking Options
By default, sequence masking is on. Unchecking the masking option switches off all masking.

Disorder masking
SLiMs tend to occur in disordered regions of proteins. The SLiMFinder server uses IUPRED (Dosztanyi et al. 2005) to predict regions of disorder with a relaxed score cut-off of 0.2. Residues predicted to be "intrinsically ordered" are masked out. This can be toggled on/off using the dismask option.
Because disorder masking uses a per-residue score, there are often single residues that fall just above or below the threshold in a region that is otherwise (dis)ordered. Regions can therefore be smoothed using the minregion option, which stipulates the minimum number of consecutive residues that must have the same disorder state. (Dis)ordered regions smaller than this are assimilated into the neighbouring regions, starting with the smallest (1aa regions) and working up until all regions are large enough; within each region size, the sequence is traversed from N-terminal to C-terminal.

Conservation masking
Conservation masking is used by default: metazoan orthologues are retrieved and underconserved residues are masked. For more details see:
Davey NE, Shields DC & Edwards RJ (2009):

Feature masking
If the entry format is UniProt, defined features can be masked. Regions such as transmembrane regions, protein domains and inaccessible residues can be masked because they have a low likelihood of containing motifs.
The same mechanism also allows the user to identify specific regions of the proteins to which the search should be confined. For example, the user may wish to look at motifs occurring in the cytoplasmic regions of a set of proteins, or may have prior knowledge of a region that possibly contains a functional motif. There are two types of feature masking in SLiMFinder: Inclusive masking and Exclusive masking:
Inclusive masking is performed first. This means that exclusively masked regions appearing inside an inclusively masked region will be removed. Masking is based on UniProt feature keys, for example TRANSMEM for transmembrane regions.
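The order of operations described above can be sketched in Python as follows. This is one reading of the text, not SLiMFinder's code: inclusive masking is assumed to keep only residues inside the named features, exclusive masking is then assumed to mask residues inside the named features, and masked residues are assumed to be replaced by "X". The function name `mask_by_features` and the toy sequence and coordinates are hypothetical.

```python
def mask_by_features(sequence, features, inclusive=(), exclusive=(), mask_char="X"):
    """Apply inclusive masking first, then exclusive masking.

    features: iterable of (key, start, end) with 1-based, inclusive positions.
    inclusive: feature keys to confine the search to (assumed meaning).
    exclusive: feature keys to remove from the search (assumed meaning).
    """
    n = len(sequence)
    # With no inclusive keys, every residue starts out as searchable.
    keep = [len(inclusive) == 0] * n

    # Step 1: inclusive masking keeps only residues inside the named features.
    for key, start, end in features:
        if key in inclusive:
            for i in range(max(start - 1, 0), min(end, n)):
                keep[i] = True

    # Step 2: exclusive masking removes residues inside the named features,
    # including any that lie within an inclusively kept region.
    for key, start, end in features:
        if key in exclusive:
            for i in range(max(start - 1, 0), min(end, n)):
                keep[i] = False

    return "".join(res if k else mask_char for res, k in zip(sequence, keep))


# Toy example: keep the TOPO_DOM region, then remove the TRANSMEM inside it.
toy_features = [("TOPO_DOM", 2, 8), ("TRANSMEM", 4, 5)]
masked = mask_by_features("ACDEFGHIKL", toy_features,
                          inclusive={"TOPO_DOM"}, exclusive={"TRANSMEM"})
print(masked)  # XCDXXGHIXX
```

Running the two passes in this fixed order is what makes an exclusive feature nested inside an inclusive one end up masked.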
Custom feature masking by case
If you cannot mask features directly from UniProt entries, regions of uploaded sequences can be masked by using upper or lower case. Simply make sure all regions to be masked are in one case and all regions to be searched are in the other, and enter "Upper" or "Lower" in the casemask box. This will mask out the specified case. Note that this option requires uploading sequences and therefore cannot be used in conjunction with conservation masking. To use both, please download the SLiMFinder application or contact us.

Additional masking options
For more details of these options, please refer to the SLiMFinder Manual.
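Returning to the minregion smoothing described under Disorder masking, the procedure can be sketched in Python as below. This is an illustrative interpretation rather than the SLiMFinder implementation. It assumes that a residue is disordered when its score is at or above the cut-off, that a too-small region is assimilated by switching it to the state of its neighbours, and that masked residues are written as "X". The function names and the example scores are hypothetical.

```python
def disorder_states(scores, cutoff=0.2):
    """True = disordered, False = ordered (assumes score >= cutoff is disordered)."""
    return [s >= cutoff for s in scores]


def smooth_regions(states, minregion):
    """Assimilate regions shorter than minregion into their neighbours.

    Region sizes are handled smallest first (1, 2, ...); for each size the
    sequence is traversed from the N-terminus to the C-terminus.
    """
    states = list(states)
    n = len(states)
    for size in range(1, minregion):
        i = 0
        while i < n:
            j = i
            while j < n and states[j] == states[i]:
                j += 1  # j is now one past the end of the current region
            if j - i == size and size < n:
                flipped = not states[i]
                for k in range(i, j):
                    states[k] = flipped  # region takes its neighbours' state
                # Step back to the start of the merged region before continuing.
                while i > 0 and states[i - 1] == states[i]:
                    i -= 1
            else:
                i = j
    return states


def mask_ordered(sequence, states, mask_char="X"):
    """Mask residues whose (smoothed) state is ordered."""
    return "".join(res if dis else mask_char for res, dis in zip(sequence, states))


# Hypothetical per-residue scores: one ordered residue sits inside a disordered run.
scores = [0.5, 0.6, 0.4, 0.1, 0.3, 0.5, 0.7, 0.1, 0.05, 0.1]
smoothed = smooth_regions(disorder_states(scores), minregion=3)
print(mask_ordered("ACDEFGHIKL", smoothed))  # ACDEFGHXXX
```

In the example, the single ordered residue at position 4 is absorbed into the surrounding disordered region, while the three ordered residues at the C-terminus are long enough to remain masked.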
SLiMBuild Options
A number of different options can be set to control how SLiMFinder constrains the motif search space during SLiMBuild motif construction. More details can be found in the SLiMFinder Manual.

Motif Search Options
Motif Occurrence Options
Ambiguity Options
UPC/BLAST Options
Special Options
SLiMChance/Filtering Options
Further options control the SLiMChance significance algorithm, which assesses the significance of motifs returned by SLiMBuild. Additional filtering options can also customise which motifs will be returned. More details can be found in the SLiMFinder Manual.

SLiMChance Options
Motif Filtering Options
Output Description
An example results output for a Dynein Light Chain binding protein dataset is available here and in the screenshot walkthrough.

Example Visualisations

References and Citations
When using SLiMFinder results in a publication, please cite the main PLoS One paper:
In addition, SLiMFinder uses the following underlying software:
If using IUPRED disorder prediction to mask input sequences and/or filter results, please cite:
If using Relative Local Conservation masking, in addition to the GOPHER citations (below) please cite:
If using alignments of homologous proteins, generated by GOPHER, please cite:
If using the SigV advanced SLiMChance statistics, please cite:
Contributing Labs
This server is hosted by the Clinical Bioinformatics group led by Prof. Denis Shields in the Conway Institute of Biomolecular and Biomedical Research. The tools have been developed by Rich Edwards (currently at The University of Southampton) and Norman Davey (currently at EMBL Heidelberg). This project is a collaboration between three institutions: the Conway Institute of Biomolecular and Biomedical Research at University College Dublin (Dublin, Ireland), the School of Biological Science at the University of Southampton (Southampton, England) and the European Molecular Biology Laboratory (Heidelberg, Germany).

Q.
Why can I only use conservation masking for UniProt entries downloaded through your site?
Q.
If significance is Sig <= 0.05 why is probcut set to 0.99 by default?
Q.
Why are the results slightly different for the server when running benchmark data from the original papers?
Q.
Why do my results not have any motifs with variable-length wildcards?
|