CompariMotif Help Pages

Delimited Amino Acid Frequency File

CompariMotif can read background frequencies from a delimited file with two columns. The first column should contain the single-letter amino acid (or nucleotide) code, and the second the frequency. For example:

AA  FREQ
A   0.074
R   0.042
N   0.044
D   0.059
C   0.033
E   0.058
Q   0.037
G   0.074
H   0.029
I   0.038
L   0.076
K   0.072
M   0.018
F   0.040
P   0.050
S   0.081
T   0.062
W   0.013
Y   0.033
V   0.068

Note that the order of the amino acids does not matter, but the file must have column headings in the first row. In principle, the following should not affect the reading of the file:

  • Column headers: these can be anything.
  • Number of columns: by default, the second column will contain frequencies. If there are more than two columns in a file and one of them has the header "FREQ", this column will be used.
  • Alphabet: Additional frequencies not used by CompariMotif (e.g. X or -), or a "Total" column, will be ignored.
  • Frequency sum: Frequencies for the appropriate amino acids will be rescaled to sum to 1.0.

However, for simplicity and safety, it is recommended to use a two-column whitespace-delimited file like the example, in which all frequencies sum to 1.0.
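
The reading rules above can be sketched in Python as follows. This is an illustration only, not CompariMotif's own code: the function name read_frequencies, the AMINO_ACIDS and NUCLEOTIDES constants and the whitespace-only splitting are all assumptions made for the example.

```python
import io

# Hypothetical alphabets for this sketch; CompariMotif's internal lists may differ.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
NUCLEOTIDES = frozenset("ACGT")


def read_frequencies(handle, alphabet=AMINO_ACIDS):
    """Parse a whitespace-delimited frequency table and rescale it to sum to 1.0.

    Assumed behaviour, following the rules described in the text:
    the first row holds column headings, a column headed "FREQ" is used
    if present (otherwise the second column), and rows whose symbol is
    not in the alphabet (e.g. X, - or Total) are ignored.
    """
    rows = [line.split() for line in handle if line.strip()]
    if not rows:
        raise ValueError("empty frequency file")
    header, data = rows[0], rows[1:]
    col = header.index("FREQ") if "FREQ" in header else 1

    raw = {}
    for fields in data:
        if len(fields) <= col:
            continue  # row too short to hold a frequency
        symbol = fields[0].upper()
        if symbol in alphabet:
            raw[symbol] = float(fields[col])

    total = sum(raw.values())
    if total <= 0:
        raise ValueError("no usable frequencies found")
    # Rescale so that the retained frequencies sum to 1.0.
    return {symbol: freq / total for symbol, freq in raw.items()}


# Example: a three-column file with an unused symbol and a total row.
example = io.StringIO(
    "AA  COUNT  FREQ\n"
    "A   10     0.2\n"
    "C   5      0.1\n"
    "X   1      0.05\n"
    "Total 16   0.35\n"
)
freqs = read_frequencies(example)
```

Here only A and C are kept, and their frequencies are rescaled from 0.2 and 0.1 to two thirds and one third.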

NB. One consequence of this flexibility is that an AA frequency file can be read in for nucleotide data without warnings or errors. If the above frequency table was read in for a comparison using dna=T, only the A, C, G and T frequencies would be used. These would be rescaled to sum to 1.0, giving the equivalent of reading in the following file (which is obviously not correct):

NT  FREQ
A   0.305
C   0.136
G   0.305
T   0.255
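
The figures in this table can be checked with a short calculation. The sketch below is an illustration rather than CompariMotif code: it takes the amino acid frequencies from the example file, keeps only A, C, G and T, and divides each by their sum (0.243).

```python
# Amino acid frequencies from the example file above.
aa_freq = {
    "A": 0.074, "R": 0.042, "N": 0.044, "D": 0.059, "C": 0.033,
    "E": 0.058, "Q": 0.037, "G": 0.074, "H": 0.029, "I": 0.038,
    "L": 0.076, "K": 0.072, "M": 0.018, "F": 0.040, "P": 0.050,
    "S": 0.081, "T": 0.062, "W": 0.013, "Y": 0.033, "V": 0.068,
}

# Keep only the letters that are also nucleotide codes.
kept = {nt: aa_freq[nt] for nt in "ACGT"}

# Rescale the retained frequencies to sum to 1.0, rounded to three places.
total = sum(kept.values())
rescaled = {nt: round(freq / total, 3) for nt, freq in kept.items()}

print(rescaled)  # {'A': 0.305, 'C': 0.136, 'G': 0.305, 'T': 0.255}
```

The rounded values sum to 1.001 rather than exactly 1.0, which is a rounding effect only.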

© RJ Edwards (2012). Last modified 13th August 2012.