Power of Pattern Counting in Molecular Sequence Analysis

Event Date: Wednesday, March 17, 2010 - 3:15pm

Event Date Details: Refreshments served at 3:00 PM

Event Location: South Hall 5607F

Dr. Fengzhu Sun (USC computational biology)

Title: Power of Pattern Counting in Molecular Sequence Analysis

Abstract: Pattern counting is frequently used in the analysis of one or multiple sequences. For individual sequences, the identification of transcription factor (TF) binding sites and other regulatory regions, referred to as motifs, is of fundamental importance. Despite extensive work on motif discovery, few studies have examined the power of detecting enriched patterns when motifs are present in the sequence. We present theoretical results and a web implementation for calculating the power of motif discovery using pattern counting.

Several statistics based on pattern counting have been used for sequence or genome comparison. However, their usefulness and limitations have rarely been studied. We provide both simulation and theoretical studies of D2, a widely used statistic for sequence comparison, and show that it has limited or no power in many situations. We also provide alternative, more powerful statistics for sequence comparison based on pattern counting.
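
For readers unfamiliar with D2, the sketch below shows the statistic as it is commonly defined in the alignment-free comparison literature: the sum, over all words of length k, of the product of that word's counts in the two sequences. The abstract does not spell out this definition, so treat it as an assumption; the function names kmer_counts and d2 and the choice of k are illustrative only and are not taken from the speaker's software.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Assumed D2 definition: sum over k-words w of count_A(w) * count_B(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b, so only shared words contribute.
    return sum(n * counts_b[w] for w, n in counts_a.items())


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the talk.
    print(d2("ACGTACGT", "TACGTTAC", 3))
```

The raw count used here is not centered or normalized in any way; the alternative statistics mentioned in the abstract are not reproduced in this sketch.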

Joint work with Ku SY, Luan YH, Reinert G, Waterman MS, and Zhai ZY.

1. Zhai ZY, Ku SY, Luan YH, Reinert G, Waterman MS, Sun FZ (2009) The Power of Detecting Enriched Patterns: An HMM Approach. Journal of Computational Biology, accepted.

2. Reinert G, Chew D, Sun FZ, Waterman MS (2009) Alignment free sequence comparison (I): statistics and power. Journal of Computational Biology 16:1-20.

3. Wan L, Reinert G, Sun FZ, Waterman MS (2009) Alignment free sequence comparison (II): theoretical power of comparison statistics. In preparation.