CONSENSUS
CONSENSUS is a method that identifies the recognition pattern of a DNA-binding protein given only a collection of sequenced DNA fragments. Information about the position and orientation of the binding sites within the fragments is not needed. The method compares the "information content" of a large number of possible binding-site alignments to arrive at a matrix representation of the binding-site pattern.
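As a rough illustration of what "information content" of an alignment can mean, the sketch below scores a set of equal-length aligned sites by summing, over columns, the relative entropy of the observed base frequencies against a background. The function name `information_content`, the uniform background, and the use of bits (log base 2) are assumptions made for this example; the exact statistic and corrections used by CONSENSUS are not reproduced here.

```python
import math
from collections import Counter


def information_content(sites, background=None):
    """Sum over alignment columns of sum_b f_b * log2(f_b / p_b).

    sites      -- list of equal-length DNA strings (one candidate alignment)
    background -- assumed base probabilities; uniform if not given
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}  # assumption: uniform background
    n_sites = len(sites)
    width = len(sites[0])
    total = 0.0
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        for base, count in counts.items():
            freq = count / n_sites
            total += freq * math.log2(freq / background[base])
    return total


# A perfectly conserved 4-column alignment scores 2 bits per column.
conserved = information_content(["ACGT", "ACGT", "ACGT"])
# A column matching the background distribution carries no information.
uninformative = information_content(["A", "C", "G", "T"])
```

Under this definition, comparing candidate alignments amounts to preferring the one with the higher score, which is the kind of comparison the paragraph above describes.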
References: G.Z. Hertz and G.D. Stormo, Bioinformatics, 15, pp.563-577 (1999)
consensus options setting
home pages: http://ural.wustl.edu/
abstract: available here
About the OUTPUT
The program prints two different lists of matrices
Multiple EM for Motif Elicitation
MEME discovers one or more motifs in a collection of DNA sequences by using expectation maximization (EM) to fit a two-component finite mixture model to the set of sequences. The algorithm estimates how many times each motif occurs in each sequence in the dataset and outputs an alignment of the occurrences of the motif. MEME splits variable-length patterns into two or more separate motifs. MEME uses statistical modeling techniques to automatically choose the best width, number of occurrences and description for each motif.
home pages: http://meme.sdsc.edu/meme/
abstract: available here
meme options setting
Default Numbers of Sites for each Motif

type of distribution | minimum sites | maximum sites |
---|---|---|
one occurrence per sequence | n | n |
zero or one occurrence per sequence | sqrt(n) | n |
any number of repetitions per sequence | sqrt(n) | min(5*n, 50) |
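
The defaults in the table can be written out as a small lookup. This is only a sketch: the function name `default_site_bounds` and the short labels `"oops"`, `"zoops"` and `"anr"` are shorthand chosen for this example, `n` is assumed to be the number of sequences in the dataset, and `sqrt(n)` is left unrounded because the table does not say how it is rounded.

```python
import math


def default_site_bounds(n, distribution):
    """Return (minimum sites, maximum sites) following the table above.

    n            -- assumed to be the number of sequences in the dataset
    distribution -- "oops"  : one occurrence per sequence
                    "zoops" : zero or one occurrence per sequence
                    "anr"   : any number of repetitions per sequence
    """
    if distribution == "oops":
        return n, n
    if distribution == "zoops":
        return math.sqrt(n), n
    if distribution == "anr":
        return math.sqrt(n), min(5 * n, 50)
    raise ValueError(f"unknown distribution type: {distribution!r}")


# With 16 sequences, the "any number of repetitions" maximum is capped at 50.
bounds = default_site_bounds(16, "anr")
```

For small datasets the cap of 50 is not reached: with 4 sequences the maximum for "any number of repetitions" is 5*4 = 20.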
GIBBS sampler
The Gibbs sampler stochastically examines candidate alignments in an effort to find the best alignment as measured by the maximum a posteriori (MAP) log-likelihood ratio. The algorithm finds an optimized local alignment model for N sequences in time linear in N, and allows the simultaneous detection and optimization of multiple patterns and pattern repeats. The Gibbs sampler is usually run in one of two modes: the Bernoulli sampler and the Site sampler. The Bernoulli sampler works from an initial "guesstimate" of the number of elements of each motif type, but says nothing about how the elements are distributed through the dataset. With the Site sampler, each sequence must contain one motif. Melina uses the Bernoulli sampler by default.
References: Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F., and Wootton, J.C., Science, 262, pp.208-214, (1993)
gibbs sampler options setting
home pages: http://bayesweb.wadsworth.org/gibbs/
abstract: available here
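To make the sampling idea concrete, here is a minimal sketch of the Site sampler mode described above (one site per sequence). It is not the code of the Gibbs program: the function names, the uniform background, the pseudocount of 1 and the fixed iteration count are all assumptions for this example, and it only returns the final sampled positions rather than tracking the MAP score. Sequences are assumed to contain only the uppercase letters A, C, G and T.

```python
import random
from collections import Counter

ALPHABET = "ACGT"


def build_profile(sites, width, pseudocount=1.0):
    """Column-wise base probabilities from aligned sites, with pseudocounts."""
    total = len(sites) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile


def window_weights(seq, width, profile, background):
    """Motif-versus-background likelihood ratio for every window of seq."""
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for col, base in enumerate(seq[start:start + width]):
            ratio *= profile[col][base] / background[base]
        weights.append(ratio)
    return weights


def site_sampler(seqs, width, iterations=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    background = {b: 0.25 for b in ALPHABET}  # assumption: uniform background
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the motif model from every sequence except the current one.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, positions)) if j != i]
            profile = build_profile(others, width)
            # Resample this sequence's site in proportion to its likelihood ratio.
            weights = window_weights(seq, width, profile, background)
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions


example = ["ACGTACGTTGACA", "TTGACAGGGCCC", "CCCTTGACATTT"]
starts = site_sampler(example, width=6, iterations=20, seed=1)
```

Because each position is drawn at random rather than chosen greedily, different seeds can give different alignments; a fuller implementation would keep the best-scoring alignment seen during sampling.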
>BLOCKS 0.05 0.35 0.35 0.25
assigns 5% probability to 0 sites per sequence, 35% to 1 site, etc. The values will be normalized, so they need not add up to 1.
CORESEARCH
CORESEARCH is a program for identifying potential functional elements, such as protein binding sites, in DNA sequences solely from nucleotide sequence data. The algorithm is based on a search for n-tuples (n being the number of motif elements) that occur in at least a minimum percentage of the sequences with no mismatch or one mismatch, which may be at any position of the motif. In contrast to functional motifs, random motifs show no preferred pattern of mismatch locations within the motif or of conservation extending beyond the motif. Selection is carried out by maximizing the information content first for the n-tuple motif, then for a region containing the motif and finally for the complete binding site.
References: Wolfertstetter, F., Frech, K., Herrmann, G., and Werner, T., CABIOS, 12, pp.71-81, (1996)
coresearch options setting
home pages: http://www.gsf.de/biodv/coresearch.html
abstract: available here
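The first step of the CORESEARCH description, finding n-tuples that occur in a minimum percentage of the sequences with at most one mismatch, can be sketched as a brute-force search. The function names and the choice to take candidates from the tuples actually present in the input are assumptions for this example; the later selection by information content is not shown, and this is not the CORESEARCH implementation.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_in(candidate, seq, max_mismatch=1):
    """True if some window of seq matches candidate within max_mismatch."""
    n = len(candidate)
    return any(hamming(candidate, seq[i:i + n]) <= max_mismatch
               for i in range(len(seq) - n + 1))


def frequent_tuples(seqs, n, min_fraction):
    """Map each n-tuple found in at least min_fraction of seqs to its hit count."""
    # Assumption: only tuples that literally appear in some sequence are tried.
    candidates = {s[i:i + n] for s in seqs for i in range(len(s) - n + 1)}
    result = {}
    for cand in sorted(candidates):
        hits = sum(occurs_in(cand, s) for s in seqs)
        if hits / len(seqs) >= min_fraction:
            result[cand] = hits
    return result


example = ["AATTGACAGG", "CCTTGACATT", "GGTTGTCAAA", "ACACACACAC"]
# TTGACA is found exactly in two sequences and with one mismatch (TTGTCA) in a third.
found = frequent_tuples(example, n=6, min_fraction=0.75)
```

The search is quadratic in the number of candidate tuples and windows, which is fine for a handful of short sequences but would need indexing for realistic datasets.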