TY - JOUR
T1 - Sliding window discretization
T2 - A new method for multiple band matching of bacterial genotyping fingerprints
AU - Austin, Brian
AU - Dawyndt, Peter
AU - Gyllenberg, Mats
AU - Koski, Timo
AU - Lund, Tatu
AU - Swings, Jean
AU - Thompson, Fabiano L.
PY - 2004/11
Y1 - 2004/11
N2 - Microbiologists have traditionally applied hierarchical clustering algorithms as their mathematical tool of choice to unravel the taxonomic relationships between micro-organisms. However, the interpretation of such hierarchical classifications suffers from being subjective, in that a variety of ad hoc choices must be made during their construction. On the other hand, the application of more profound and objective mathematical methods - such as the minimization of stochastic complexity - for the classification of bacterial genotyping fingerprints data is hampered by the prerequisite that such methods only act upon vectorized data. In this paper we introduce a new method, coined sliding window discretization, for the transformation of genotypic fingerprint patterns into binary vector format. In the context of an extensive amplified fragment length polymorphism (AFLP) data set of 507 strains from the Vibrionaceae family that has previously been analysed, we demonstrate by comparison with a number of other discretization methods that this new discretization method results in minimal loss of the original information content captured in the banding patterns. Finally, we investigate the implications of the different discretization methods on the classification of bacterial genotyping fingerprints by minimization of stochastic complexity, as it is implemented in the BinClass software package for probabilistic clustering of binary vectors. The new taxonomic insights learned from the resulting classification of the AFLP patterns will prove the value of combining sliding window discretization with minimization of stochastic complexity, as an alternative classification algorithm for bacterial genotyping fingerprints. © 2004 Society for Mathematical Biology. Published by Elsevier Ltd. All rights reserved.
U2 - 10.1016/j.bulm.2004.02.004
DO - 10.1016/j.bulm.2004.02.004
M3 - Article
SN - 1522-9602
VL - 66
SP - 1575
EP - 1596
JO - Bulletin of Mathematical Biology
JF - Bulletin of Mathematical Biology
IS - 6
ER -