TY - JOUR

T1 - Non-parametric change-point estimation using string matching algorithms

AU - Johnson, Oliver

AU - Sejdinovic, Dino

AU - Cruise, Richard James Randon

AU - Piechocki, Robert

AU - Ganesh, Ayalvadi

PY - 2014/12

Y1 - 2014/12

N2 - Given the output of a data source taking values in a finite alphabet, we wish to estimate change-points, that is times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately estimate the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, establishing a fluid limit and using martingale arguments.

AB - Given the output of a data source taking values in a finite alphabet, we wish to estimate change-points, that is times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately estimate the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, establishing a fluid limit and using martingale arguments.

U2 - 10.1007/s11009-013-9359-2

DO - 10.1007/s11009-013-9359-2

M3 - Article

SN - 1387-5841

VL - 16

SP - 987

EP - 1008

JO - Methodology and Computing in Applied Probability

JF - Methodology and Computing in Applied Probability

IS - 4

ER -