Characterising DNA/RNA signals with crisp hypermotifs: A case study on core promoters

Carey Pridgeon, David Corne

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

A common way to characterise important, conserved signals in nucleotide sequences, such as transcription factor binding sites, is through consensus sequences or consensus patterns. A well-known example is the "TATA-box" commonly found in eukaryotic core promoters. Such patterns are valuable because they offer insight into basic molecular biology processes and can support reasoning about the understanding, design and control of those processes. However, such patterns are rarely accurate; they give only a very approximate characterisation of the signal under study. At the opposite extreme, a signal may be characterised by a neural network, a high-order Markov model or a similar statistical model. These have better sensitivity and specificity, but they are unreadable and therefore do little to convey an understanding of the underlying molecular biology that could support insight or reasoning. We describe a simple pattern language, called crisp hypermotifs (CHMs), that yields highly readable patterns that support understanding and reasoning, yet achieve greater sensitivity and specificity than the approaches commonly used to characterise a signal crisply. We use evolutionary computation to discover high-performance CHMs from data. We argue that CHMs should be used in place of classical consensus motifs, and justify this with examples derived from a large dataset of mammalian core promoters. We provide CHM alternatives to the well-known core promoter TATA-box and Initiator patterns that have better sensitivity and specificity than their classical counterparts. © Springer-Verlag Berlin Heidelberg 2007.
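To make "crisp characterisation" and the sensitivity/specificity comparison concrete, the following minimal sketch scores a classical consensus pattern against labelled sequences. It is not the authors' CHM language, which the abstract does not define. Assumptions: the pattern TATAWAWR is one commonly cited IUPAC rendering of the TATA-box consensus; the helper names (`contains_motif`, `sensitivity_specificity`) are hypothetical; the toy sequences are invented for illustration; and only the forward strand is scanned.

```python
# Minimal sketch (not the authors' CHM method): crisp yes/no matching of an
# IUPAC consensus pattern, scored by sensitivity and specificity.

# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_at(seq: str, pattern: str, pos: int) -> bool:
    """True if every base in seq[pos:pos+len(pattern)] is allowed by the pattern."""
    return all(seq[pos + i] in IUPAC[code] for i, code in enumerate(pattern))


def contains_motif(seq: str, pattern: str) -> bool:
    """Crisp scan: True if the pattern matches anywhere on the forward strand."""
    seq = seq.upper()
    return any(matches_at(seq, pattern, p) for p in range(len(seq) - len(pattern) + 1))


def sensitivity_specificity(pattern, positives, negatives):
    """Sensitivity = TP / |positives|; specificity = TN / |negatives|."""
    tp = sum(contains_motif(s, pattern) for s in positives)
    tn = sum(not contains_motif(s, pattern) for s in negatives)
    return tp / len(positives), tn / len(negatives)


# Hypothetical toy data: "positives" stand in for promoter sequences with the
# signal, "negatives" for background sequences.
positives = ["GCTATAAAAGGC", "CCTATATAAGCG", "GGGCGCGCCC"]
negatives = ["ACGTACGTACGT", "TTTATAAATGC", "GGCCGGCCGG", "CACACACACA"]

sens, spec = sensitivity_specificity("TATAWAWR", positives, negatives)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.67 specificity=0.75
```

Any crisp pattern, whether a consensus string or a CHM, gives a yes/no answer per sequence, so the same counting could be used to compare the two on a shared labelled dataset.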

Original language: English
Title of host publication: Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics - 5th European Conference, EvoBIO 2007, Proceedings
Pages: 227-235
Number of pages: 9
Volume: 4447 LNCS
Publication status: Published - 2007
Event: 5th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics - Valencia, Spain
Duration: 11 Apr 2007 to 13 Apr 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4447 LNCS
ISSN (Print): 0302-9743

Conference

Conference: 5th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
Abbreviated title: EvoBIO 2007
Country/Territory: Spain
City: Valencia
Period: 11/04/07 to 13/04/07

