Abstract
We propose a novel template-matching approach for discriminating handwritten from machine-printed text. We first pre-process the scanned document images by denoising them, excluding circles and lines, and segmenting them at the word-block level. We then align and match characters from a flexibly sized gallery against the segmented regions, using parallelised normalised cross-correlation. Experimental results on the Pattern Recognition & Image Analysis Research Lab-Natural History Museum (PRImA-NHM) dataset show that the algorithm is highly robust in classifying cluttered, occluded and noisy samples, as well as samples with a significant amount of missing data. The algorithm achieves an 84.0% classification rate with a false positive rate of 0.16 over the dataset. It requires no training samples and produces results that compare well with those of training-based approaches evaluated on the same benchmark.
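The matching step described above can be illustrated with a minimal, serial sketch of zero-mean normalised cross-correlation. This is not the authors' implementation: the function names (`ncc`, `best_match`, `classify_region`), the assumption that the gallery holds machine-printed character templates, the decision rule, and the `threshold` value of 0.8 are all hypothetical choices made for illustration, and the parallelisation mentioned in the abstract is omitted.

```python
import numpy as np


def ncc(patch, template):
    """Zero-mean normalised cross-correlation of two equal-sized arrays."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    if denom == 0:
        # A flat patch or template carries no structure to correlate.
        return 0.0
    return float((p * t).sum() / denom)


def best_match(region, template):
    """Slide the template over the region; return (best score, (row, col))."""
    rh, rw = region.shape
    th, tw = template.shape
    best, pos = -1.0, (0, 0)
    for r in range(rh - th + 1):
        for c in range(rw - tw + 1):
            score = ncc(region[r:r + th, c:c + tw], template)
            if score > best:
                best, pos = score, (r, c)
    return best, pos


def classify_region(region, gallery, threshold=0.8):
    """Hypothetical rule: a strong match to any gallery template
    labels the region as machine-printed, otherwise handwritten."""
    score = max(best_match(region, t)[0] for t in gallery)
    label = "machine-printed" if score >= threshold else "handwritten"
    return label, score


# Toy usage: embed a small cross-shaped template in an empty region.
template = np.array([[0, 1, 0],
                     [1, 1, 1],
                     [0, 1, 0]], dtype=float)
region = np.zeros((8, 8))
region[2:5, 3:6] = template
score, pos = best_match(region, template)
label, _ = classify_region(region, [template])
```

Because the score is normalised by the energy of both patch and template, it is insensitive to uniform brightness and contrast changes, which is one reason this measure is a common choice for scanned documents with uneven illumination.
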
Original language | English |
---|---|
Title of host publication | 2016 12th IAPR Workshop on Document Analysis Systems (DAS) |
Publisher | IEEE |
Pages | 399-404 |
Number of pages | 6 |
ISBN (Electronic) | 9781509017928 |
Publication status | Published - 13 Jun 2016 |