An R package for controlling the FDR in imperfect matches to an incomplete database

Implementation of procedures for estimating the false discovery proportion (FDP) in a list of imperfect matches between products and a database of sources. Peptide-spectrum matches (PSMs), constructed when identifying tandem mass spectra by searching a peptide database, are a special case of such matches. For more information, see "Controlling the FDR in imperfect matches to an incomplete database" by Uri Keich and William Stafford Noble.

SM can also be used to assess or scan position weight matrix (PWM) sites in a single set. For example, you can set the second set of sequences to a reference genomic file, in which case SM will compare the sites in the first set to the reference file. Alternatively, you can use SM to simply output all the sites above a certain threshold (specified as a percentage or as a constant), or to print the top k sites in each sequence of the input set. Finally, you can combine the latter option with a wrapper script that calls SM many times with the option "-permutePWM" (as well as "-loadMM" to save time) to assign significance relative to permutations of the PWM applied to the same input set.
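
The scanning modes described above can be illustrated with a small, self-contained Python sketch. This is not SM's code and does not call SM: the function names (score_sites, top_k_sites, sites_above, permutation_p_value) are hypothetical, the PWM is assumed to be a list of per-position score rows ordered A, C, G, T, and "permuting the PWM" is assumed here to mean shuffling its columns, which may differ from what "-permutePWM" actually does. Only a constant threshold is shown, and the best site score is used as the test statistic purely for illustration.

```python
import random

BASES = "ACGT"  # assumed column order of each PWM row


def score_sites(seq, pwm):
    """Return (start, score) for every window of len(pwm) in seq.

    Assumes seq contains only A, C, G, T and is at least as long as the PWM.
    """
    width = len(pwm)
    sites = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pwm[i][BASES.index(base)] for i, base in enumerate(window))
        sites.append((start, score))
    return sites


def top_k_sites(seq, pwm, k):
    """Return the k highest-scoring sites in one sequence."""
    return sorted(score_sites(seq, pwm), key=lambda s: s[1], reverse=True)[:k]


def sites_above(seq, pwm, threshold):
    """Return all sites whose score is at least a constant threshold."""
    return [site for site in score_sites(seq, pwm) if site[1] >= threshold]


def permutation_p_value(seqs, pwm, n_perm=200, seed=0):
    """Empirical p-value of the best site score against column-shuffled PWMs.

    Assumption: shuffling PWM columns stands in for a PWM permutation.
    """
    rng = random.Random(seed)

    def best(matrix):
        return max(score for seq in seqs for _, score in score_sites(seq, matrix))

    observed = best(pwm)
    hits = 0
    for _ in range(n_perm):
        shuffled = pwm[:]      # copy the list of columns
        rng.shuffle(shuffled)  # permute column order in place
        if best(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy PWM that rewards the motif "ACG" (one row per motif position).
toy_pwm = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
toy_seq = "TTACGTT"
print(top_k_sites(toy_seq, toy_pwm, 1))   # [(2, 3)]
print(sites_above(toy_seq, toy_pwm, 2))   # [(2, 3)]
print(permutation_p_value([toy_seq], toy_pwm))
```

In a real wrapper script, each permutation would instead be one call to SM on the same input set, with the per-permutation results collected and compared to the result for the original PWM.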

You are welcome to download the C source code of the current version of SM. The older version that was used for the tests mentioned in the paper is also available, though not recommended.

If SADMAMA helps your research, we would appreciate it if you cited our paper:

Uri Keich, Hong Gao, Jeffrey S Garretson, Anand Bhaskar, Ivan Liachko, Justin Donato and Bik K Tye.
Computational detection of significant variation in binding affinity across two sets of sequences with application to the analysis of replication origins in yeast. BMC Bioinformatics 2008, 9:372.

