School of Mathematics and Statistics F07
University of Sydney
NSW 2006
Australia
+61 2 9351 2307
uri@maths.usyd.edu.au

Bioinformatics. My current research is motivated by, and mostly concentrated in, biological sequence analysis. It ranges from collaborative research on actual biological problems, such as the mapping of sequence motifs involved in DNA replication initiation, through the development of tools for the discovery and analysis of sequence motifs, to the design of novel computational approaches for the analysis of classical statistical tests. The following examples of projects I am working on will give you a better idea of what I mean.
Motif Finding - The identification of transcription factor binding sites is an important step in understanding the regulation of gene expression. To address this need, many motif-finding tools (or finders) have been described that can find short sequence motifs given, for example, only an input set of sequences. The motifs returned by these tools are evaluated and ranked according to some measure of statistical over-representation, the most popular of which is based on the information content, or entropy. Our interest here lies mainly in analyzing the statistical significance of a finder's output. This important area has lagged considerably behind the extensive development of the finders. While our goal is to design a reliable and usable significance analysis, we also show how such analysis can be leveraged to improve the actual motif finding process. Joint work with my former Ph.D. student Niranjan Nagarajan (now a Senior Research Scientist at the Genome Institute of Singapore) and my current (Cornell University) Ph.D. student Patrick Ng.
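To make the information content score mentioned above concrete, here is a minimal sketch in Python. It assumes an ungapped alignment of equal-length sites and a fixed background distribution, and it uses the common log-likelihood-ratio form of the score (sum over columns and letters of count * log(count / (n * background))). The function name `information_content` and the toy inputs are illustrative assumptions, not taken from any of the tools listed on this page.

```python
import math
from collections import Counter


def information_content(sites, background):
    """Entropy (log-likelihood-ratio) score of an ungapped alignment.

    sites:      list of equal-length strings, one aligned site per sequence
    background: dict mapping each letter to its background probability
    """
    n = len(sites)
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    total = 0.0
    for col in range(width):
        # Count how often each letter occurs in this alignment column.
        counts = Counter(site[col] for site in sites)
        for letter, count in counts.items():
            # Letters with zero count contribute nothing to the sum.
            total += count * math.log(count / (n * background[letter]))
    return total


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
conserved = ["ACGT", "ACGT", "ACGT", "ACGT"]    # perfectly conserved motif
scrambled = ["ACGT", "CGTA", "GTAC", "TACG"]    # columns match the background

score_conserved = information_content(conserved, uniform)
score_scrambled = information_content(scrambled, uniform)
```

A perfectly conserved alignment gets the largest score for its size, while an alignment whose columns match the background frequencies scores zero. A raw score alone does not say how surprising the alignment is, which is the role of the significance analysis described above.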
Study of Replication Origins - DNA replication is a fundamental process essential for cell proliferation. While the proteins involved in initiating DNA replication are essentially conserved from yeast to humans, the implicated sequence motifs that these conserved factors interact with are poorly understood outside of S. cerevisiae (baker's yeast). In collaboration with Cornell molecular biologist Bik Tye and her post-doctoral researcher Ivan Liachko, we are mapping replication origins in other yeast species, hoping to gain an understanding of the evolution of DNA replication origins. Joint work with Anand Bhaskar (a former Cornell student, now a Ph.D. candidate at UC Berkeley).
Computational Statistics - Our search for an efficient and accurate computation of motif significance led us to develop a new approach for exact tests (exact tests are ones where the significance of the test is evaluated directly from the underlying distribution rather than using an approximation). Borrowing ideas from large-deviation theory, the underlying mechanism of our approach is the exact numerical calculation of the exponentially shifted characteristic function of the test statistic. So far we have used this approach to develop faster exact algorithms for the classical multinomial goodness-of-fit test and the Mann-Whitney test. Joint work with Niranjan Nagarajan.
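As an illustration of what "exact" means here, the following Python sketch computes a one-sided exact p-value for the Mann-Whitney U statistic by counting arrangements with a textbook recurrence. This is not the shifted characteristic function method described above; it is a plain counting approach that is only practical for small samples. It assumes there are no ties, and the function names are illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_arrangements(m, n, u):
    """Number of orderings of m x-values and n y-values with U == u.

    U counts the pairs (x, y) with x > y. Conditioning on the largest
    value: if it is an x, it beats all n y-values; if it is a y, it
    adds nothing to U.
    """
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    return count_arrangements(m - 1, n, u - n) + count_arrangements(m, n - 1, u)


def mann_whitney_u(x, y):
    """Observed U: number of pairs with an x-value above a y-value (no ties)."""
    return sum(1 for a in x for b in y if a > b)


def exact_upper_p_value(x, y):
    """P(U >= observed U) when all orderings are equally likely."""
    m, n = len(x), len(y)
    u_obs = mann_whitney_u(x, y)
    tail = sum(count_arrangements(m, n, u) for u in range(u_obs, m * n + 1))
    return tail / comb(m + n, m)


# Toy data: every x exceeds every y, the most extreme ordering of 3 vs 3.
p_extreme = exact_upper_p_value([4, 5, 6], [1, 2, 3])
p_reverse = exact_upper_p_value([1, 2, 3], [4, 5, 6])
```

The p-value is read directly off the exact null distribution rather than from a normal approximation. The cost of this kind of direct enumeration grows quickly with the sample sizes, which is what motivates the search for faster exact algorithms.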
Statistical Methods in Bioinformatics: Semester 2, 2009
STAT 2911 - Probability and Statistical Models (Adv): Semester 1, 2009
GIMSAN – a novel tool for de novo motif finding that includes a reliable significance analysis
SADMAMA – computational tool for detection of significant variation in binding affinity across two sets of sequences
The FAST package – Fourier transform based Algorithms for Significance Testing of ungapped multiple alignments
csFFT/sFFT – computing the p-value of the information content (entropy score) of a sequence motif
BagFFT – computing the exact p-value of the llr statistic for multinomial goodness-of-fit test
GibbsILR – a Gibbs sampler based motif finder
Ph.D. in Mathematics, Courant Institute, New York University
Thesis title: Stationary Approximations to Non-Stationary Stochastic Processes.
Advisor: Prof. H. P. McKean
M.Sc. in Mathematics, Department of Mathematics, Technion - Israel Institute of Technology
Thesis title: A Generalization of the "Ahlswede Daykin Inequality".
Advisor: Prof. R. Aharoni
B.Sc. in Computer Science and Mathematics, Hebrew University of Jerusalem
Liachko I., Bhaskar A., Li C., Chung S.C.C., Tye B.K., and Keich U. A Comprehensive Genome-Wide Map of Autonomously Replicating Sequences in a Naive Genome. Submitted.
Oliver H.F., Orsi R.H., Ponnala L., Keich U., Wang W., Sun Q., Cartinhour S.W., Filiatrault M.J., Wiedmann M., and Boor K.J. Peering inside a killer's tool kit: mapping L. monocytogenes coding and noncoding RNAs. Submitted.
Nagarajan N., Keich U. Reliability and efficiency of algorithms for computing the significance of the Mann-Whitney test. Computational Statistics. In Press.
Ng P., Keich U. Factoring local sequence composition in motif significance analysis. Genome Informatics, 1:15-26, 2008.
Keich U., Gao H., Garretson JS., Bhaskar A., Liachko I., Donato J., Tye B. Computational detection of significant variation in binding affinity across two sets of sequences with application to the analysis of replication origins in yeast. BMC Bioinformatics, 9:372, 2008. (paper).
Ng P., Keich U. GIMSAN: a Gibbs motif finder with significance analysis. Bioinformatics, 24(19):2256-7, 2008.
Keich U., Ng P. A conservative parametric approach to motif significance analysis. Genome Informatics, 19:61-72, 2007. (preprint)
Nagarajan N., Keich U. FAST: Fourier transform based Algorithms for Significance Testing of ungapped multiple alignments. Bioinformatics, 24(4):577-8, 2008.
Ng P., Nagarajan N., Jones N., and Keich U. Apples to apples: improving the performance of motif finders and their significance analysis in the Twilight Zone. Bioinformatics, 22(14):e393-401, ISMB 2006. (preprint)
Nagarajan N., Ng P., Keich U. Refining motif finders with E-value calculations. Proceedings of the 3rd RECOMB Satellite Workshop on Regulatory Genomics, Singapore 2006. (preprint)
Keich U., Nagarajan N. A fast and numerically robust method for exact multinomial goodness-of-fit test. Journal of Computational and Graphical Statistics, 15(4):779-802, 2006. (preprint)
Nagarajan N., Jones N., and Keich U. Computing the p-value of the information content from an alignment of multiple sequences. Bioinformatics, Vol. 21, Suppl 1, ISMB 2005, i311-i318. (preprint)
Buhler J., Keich U., Sun Y. Designing Seeds for Similarity Search in Genomic DNA. Journal of Computer and System Sciences, Volume 70, Issue 3, May 2005, Pages 342-363. (preprint)
Keich U., and Nagarajan N. A Faster Reliable Algorithm to Estimate the p-Value of the Multinomial llr Statistic. Proceedings of the 4th International Workshop on Algorithms in Bioinformatics (WABI 2004), September 2004, Bergen, Norway. (preprint)
Keich U. sFFT: a faster accurate computation of the p-value of the entropy score. Journal of Computational Biology, Volume 12, Number 4, May 2005, Pages 416-430. (preprint)
Zhi D., Keich U., Pevzner P., Heber S., and Tang H. Checking for base-calling errors in repeats. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4(1):54-64, (2007). (preprint)
Keich U., Li M., Ma B., and Tromp J. On Spaced Seeds for Similarity Search. Discrete Applied Mathematics, 138(3):253-263, 2004. (preprint)
Buhler J., Keich U., Sun Y. Designing Seeds for Similarity Search in Genomic DNA. Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology (RECOMB-2003), April 2003, Berlin, Germany. (preprint)
Eskin E., Keich U., Gelfand M.S., Pevzner P.A. Genome-Wide Analysis of Bacterial Promoter Regions. Proceedings of the Pacific Symposium on Biocomputing (PSB-2003), January 2003, Kaua'i, Hawaii. (preprint)
Keich U., and Pevzner, P.A. Finding motifs in the twilight zone. Bioinformatics, Vol. 18 (2002), Issue 10, 1374-1381. (preprint)
Keich U., and Pevzner P.A. Subtle motifs: defining the limits of motif finding algorithms. Bioinformatics, Vol. 18 (2002), Issue 10, 1382-1390. (preprint)
Keich U. and Pevzner P.A. Finding motifs in the twilight zone. Proceedings of the Sixth Annual International Conference on Research in Computational Molecular Biology (RECOMB-2002), April 2002, Washington DC, USA, ACM Press. (preprint)
Keich U., A Stationary Tangent - the Discrete and Non-smooth Cases. Journal of Time Series Analysis, March 2003, vol. 24, no. 2, pp. 173-192(20). (preprint)
Cwikel M. and Keich U., Optimal decompositions for the K-functional for a couple of Banach lattices. Arkiv för Matematik, 39 (2001), No. 1, 27-64. (preprint)
Keich U., A Possible Definition of A Stationary Tangent. Stochastic Processes and Their Applications, 88 (2000), No. 1, 1-36. (preprint)
Keich U., Krein's Strings, the Symmetric Moment Problem, and Extending a Real Positive Definite Function. Communications on Pure and Applied Mathematics, 52 (1999), no. 10, 1315-1334. (preprint)
Keich U., On Lp Bounds for Kakeya Maximal Functions and the Minkowski Dimension in R2. Bulletin of the London Mathematical Society, 31 (1999), 213-221. (preprint)
Keich U., Absolute Continuity Between the Wiener and Stationary Gaussian Measures. Pacific Journal of Mathematics, Vol. 88 (1999), No. 1, 95-108. (preprint)
Keich U., The Entropy Distance Between the Wiener and Stationary Gaussian Measures. Pacific Journal of Mathematics, Vol. 88 (1999), No. 1, 109-128. (preprint)
Aharoni R. and Keich U., A Generalization of the Ahlswede Daykin Inequality. Discrete Mathematics, 152 (1996), 1-12.