SADMAMA

Significance Assessment of the Difference in MAtrix MAtches

SADMAMA (SM) was originally designed to address the question of whether one of two sets of sequences has more and/or better binding sites for a particular transcription factor than the other. The binding sites are modeled as matches to a position weight matrix (PWM), possibly gapped, which is presumed to be known.

However, SM can also be used to assess or scan PWM sites in a single set. For example, you can set the second set of sequences to a reference genomic file, in which case SM will compare the sites in the first set to the reference file. Alternatively, you can use SM to simply output all the sites above a certain threshold (specified as a percentage or as a constant), or to print the top k sites in each sequence of the input set. Finally, you can combine the latter option with a wrapper script that calls SM many times with the option "-permutePWM" (as well as "-loadMM" to save time) to assign significance relative to permutations of the PWM applied to the same input set.

For more details please refer to the paper.

You are welcome to download the C source code of the current version of SM. The older version that was used for the tests described in the paper is also available, though not recommended.

If SADMAMA helps your research, we'd appreciate it if you would cite our paper:

Uri Keich, Hong Gao, Jeffrey S Garretson, Anand Bhaskar, Ivan Liachko, Justin Donato and Bik K Tye.
Computational detection of significant variation in binding affinity across two sets of sequences with application to the analysis of replication origins in yeast. BMC Bioinformatics 2008, 9:372.

Version history:

12 May 2008: Initial version as described in the paper.
17 February 2010: "Gapped version": accepts gapped motifs and adds a "background symmetrization" option.
10 April 2011: Fixed a memory leak that affected only MC bootstrap tests.