### Compile
	make


### Although SADMAMA (SM) was originally developed to compare binding sites between two sets of sequences, it can also be used to assess the sites in a single set.
	For example, you can assign a reference genomic set as the second set of sequences. Alternatively, you can use a dummy set as the second one.
	Input set files, as well as the genomic training file, should be in FASTA format.
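If you only care about the first set, the second set can be a small placeholder. The sketch below assumes that any small, valid FASTA file is accepted as -i2. The script, the file name `dummy_set2.fa` and the sequence are all hypothetical. It only fills the second-set slot and says nothing about whether such a tiny set is meaningful for the comparison tests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a tiny placeholder FASTA file to use as -i2
# when only the first set matters. File name and sequence are illustrative.
set -euo pipefail

dummy_set="${1:-dummy_set2.fa}"

# One FASTA record: a '>' header line followed by a single sequence line.
printf '>dummy_seq\nACGTACGTACGTACGTACGT\n' > "$dummy_set"
echo "wrote $dummy_set"
```

Pass the resulting file as `-i2 dummy_set2.fa` in place of input_set2 in the commands below.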
	

### To print all scores in the first set that exceed a given threshold, use:
	$SADMAMA_PATH/sadmama.ukx -w PWMs_file -i1 input_set1 -i2 input_set2 -t genomic_file_for_learning_model_or_threshold -printScoresGTT first_set -siteThresholdLearnedFrom 1e-4 both_strands nullTrainFile -tests -- -pwmPC 0.05 -m 4 both_strands -siteNullScore avg_strands -o output_file
	### The threshold above is learned from the training file, but you can also supply a hard-coded threshold with -siteThresholdFile.


### To print the top K scores (here, K = 2) from each sequence in the first set, use:
	$SADMAMA_PATH/sadmama.ukx -w PWMs_file -i1 input_set1 -i2 input_set2 -t genomic_file_for_learning_model_or_threshold -printTopKperSeq 2 first_set -tests -- -pwmPC 0.05 -m 4 both_strands -siteNullScore avg_strands -o output_file


### SM offers many statistical tests for comparing the two sets. Most of them should be avoided unless you are sure you know what you are doing.
	Safer choices include the hypergeometric and Mann-Whitney tests (see the paper for details on these and other tests):

	Hypergeometric test only:
	$SADMAMA_PATH/sadmama.ukx -w PWMs_file -i1 input_set1 -i2 input_set2 -t genomic_file_for_learning_model_or_threshold -siteThresholdLearnedFrom 1e-4 both_strands nullTrainFile -tests freqScoresGTT Hypergeometric -- -pwmPC 0.05 -m 4 both_strands -siteNullScore avg_strands -o output_file

	Hypergeometric test plus a Mann-Whitney test (multiple tests are separated by '+'):
	$SADMAMA_PATH/sadmama.ukx -w PWMs_file -i1 input_set1 -i2 input_set2 -t genomic_file_for_learning_model_or_threshold -siteThresholdLearnedFrom 1e-4 both_strands nullTrainFile -tests freqScoresGTT Hypergeometric + qualityScoresGTT MannWhitney -- -pwmPC 0.05 -m 4 both_strands -siteNullScore avg_strands -o output_file


### To use the bootstrap tests:
	$SADMAMA_PATH/sadmama.ukx -w PWMs_file -i1 input_set1 -i2 input_set2 -t genomic_file_for_learning_model_or_threshold -siteThresholdLearnedFrom 1e-4 both_strands nullTrainFile  -tests freqScoresGTT MC + qualityScoresGTT MC MC -- -numRandomSets 1000 -set1RandTrainFile _BOTH_ -set2RandTrainFile _BOTH_ -MCmodel bootstrap 10 protectSites -pwmPC 0.05 -m 4 both_strands -siteNullScore avg_strands -o output_file_name



### Example of a PWM file (a single PWM file may contain any number of matrices in this format):

>Oriscan (from Breier's gb-2004-5-4-r22-s2.xls)
0.40    0.10    0.07    0.43
0.23    0.07    0.13    0.57
0.33    0.00    0.03    0.63
0.37    0.00    0.10    0.53
0.03    0.00    0.03    0.93
0.13    0.00    0.00    0.87
0.10    0.03    0.00    0.87
0.90    0.03    0.03    0.03
0.07    0.23    0.00    0.70
0.47    0.00    0.47    0.07
0.00    0.00    0.00    1.00
0.00    0.07    0.00    0.93
0.07    0.00    0.00    0.93
0.37    0.00    0.00    0.63
0.03    0.13    0.73    0.10
0.07    0.10    0.17    0.67
0.23    0.07    0.13    0.57
<

>gapped_PWM: the format is "... min_gap max_gap"
	0.50    0.02    0.02    0.46
	0.00    0.06    0.76    0.18
	... 0 4
	0.04    0.10    0.24    0.62
	0.09    0.18    0.07    0.67
<
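Before running SM, you may want to sanity-check a hand-written PWM file. The sketch below defines `check_pwm_file`, a hypothetical awk-based checker that is not part of SADMAMA. Its rules are assumptions read off the example above:

- Each matrix opens with a `>` line and closes with `<`.
- Each position row has four columns that sum to roughly 1. The ±0.05 tolerance is an arbitrary choice that allows for rounding, as in the Oriscan rows.
- A gap line has the form `... min_gap max_gap` with min_gap <= max_gap.

```shell
#!/usr/bin/env bash
# Hypothetical PWM-file checker; the rules below are assumptions taken from
# the example format, not from SADMAMA's own parser.
set -euo pipefail

check_pwm_file() {
  awk '
    { sub(/^[ \t]+/, "") }                  # ignore leading indentation
    /^$/ { next }                           # skip blank lines
    /^>/ { name = $0; rows = 0; inside = 1; next }
    /^</ {
      if (!inside) { printf "line %d: < without matching >\n", NR; bad = 1 }
      else if (rows == 0) { printf "%s: no rows\n", name; bad = 1 }
      inside = 0; next
    }
    /^\.\.\./ {                             # gap line: ... min_gap max_gap
      if (NF != 3 || $2 > $3) { printf "line %d: bad gap line\n", NR; bad = 1 }
      next
    }
    {                                       # position row: 4 probabilities
      if (NF != 4) { printf "line %d: expected 4 columns, got %d\n", NR, NF; bad = 1; next }
      s = $1 + $2 + $3 + $4
      if (s < 0.95 || s > 1.05) { printf "line %d: row sums to %.2f\n", NR, s; bad = 1 }
      rows++
    }
    END {
      if (inside) { print "file ends inside a matrix (missing <)"; bad = 1 }
      exit bad                              # non-zero exit status on any problem
    }
  ' "$1"
}

# When given a file name, check it; otherwise just define the function.
if [ "$#" -gt 0 ]; then check_pwm_file "$1"; fi
```

For example, `./check_pwm.sh PWMs_file && echo OK` prints any problems it finds and exits with a non-zero status if there are any.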



### If you plan to run SADMAMA many times, you can save time by using these options:
	-saveMM <filename>	(save the estimated Markov model to <filename> in the first run)
	-loadMM <filename>	(in subsequent runs load previously saved Markov model from <filename>)
	
	Note that if you use '-loadMM <filename>', you should remove the '-t genomic_file_for_learning_model' and '-m 4 both_strands' parameters; otherwise you will receive an error.
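The save-once, load-many pattern can be scripted. The sketch below rests on these assumptions:

- All file names are placeholders.
- -saveMM and -loadMM go after `--`, next to -m. This is a guess about argument order, so check the tool's usage message.
- The script only prints the commands unless you set `DRY_RUN=0`.

It starts from the -printTopKperSeq example because that command does not learn a site threshold from -t. Following the note above, it drops -t and -m 4 both_strands from the runs that use -loadMM.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the -saveMM / -loadMM pattern; all file names are
# placeholders. Placing -saveMM/-loadMM after "--" next to -m is an ASSUMPTION
# about argument order. Commands are only printed unless DRY_RUN=0.
set -euo pipefail

SADMAMA="${SADMAMA_PATH:-.}/sadmama.ukx"
MODEL_FILE="${MODEL_FILE:-genomic_model.mm}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

sadmama_runs() {
  # First run: estimate the Markov model from the genomic file and save it.
  run "$SADMAMA" -w PWMs_file -i1 input_set1 -i2 input_set2 \
    -t genomic_file_for_learning_model -printTopKperSeq 2 first_set -tests -- \
    -pwmPC 0.05 -m 4 both_strands -siteNullScore avg_strands \
    -saveMM "$MODEL_FILE" -o output_file_1

  # Later runs: reuse the saved model. -t and "-m 4 both_strands" are
  # dropped because combining them with -loadMM gives an error.
  for input in input_set_a input_set_b; do
    run "$SADMAMA" -w PWMs_file -i1 "$input" -i2 input_set2 \
      -printTopKperSeq 2 first_set -tests -- \
      -pwmPC 0.05 -siteNullScore avg_strands \
      -loadMM "$MODEL_FILE" -o "output_${input}"
  done
}

sadmama_runs
```

Inspect the printed commands first. Then run `DRY_RUN=0 ./reuse_model.sh` to execute them.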


