Abstract
We propose a novel template matching approach for discriminating handwritten from machine-printed text. We first pre-process the scanned document images by denoising, excluding circles and lines, and segmenting at the word-block level. We then align and match characters from a flexibly sized gallery against the segmented regions using parallelised normalised cross-correlation. Experimental results on the Pattern Recognition & Image Analysis Research Lab-Natural History Museum (PRImA-NHM) dataset show that the algorithm is highly robust when classifying cluttered, occluded and noisy samples, as well as samples with a significant amount of missing data. The algorithm achieves an 84.0% classification rate with a false positive rate of 0.16 on this dataset, requires no training samples, and produces results that compare well with training-based approaches evaluated on the same benchmark.
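To make the matching step concrete, the following is a minimal sketch of template matching with zero-mean normalised cross-correlation. It is an illustration under stated assumptions, not the implementation described in the paper. The function names (`ncc`, `best_match`, `classify_region`), the threshold of 0.8, and the decision rule that a strong gallery match indicates machine-printed text are all hypothetical. The sketch is also sequential, whereas the abstract describes a parallelised variant, and it omits the pre-processing and alignment stages.

```python
import math


def ncc(patch, template):
    """Zero-mean normalised cross-correlation of two equally sized 2-D grids."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    if len(p) != len(t):
        raise ValueError("patch and template must have the same size")
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t)
    )
    # A constant patch or template has zero variance; treat it as "no match".
    return num / den if den > 0 else 0.0


def best_match(region, template):
    """Slide the template over the region; return (best score, (row, col))."""
    th, tw = len(template), len(template[0])
    rh, rw = len(region), len(region[0])
    best = (-1.0, (0, 0))
    for r in range(rh - th + 1):
        for c in range(rw - tw + 1):
            patch = [row[c:c + tw] for row in region[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best


def classify_region(region, gallery, threshold=0.8):
    """Hypothetical rule: 'printed' if any gallery template matches strongly."""
    top = max(best_match(region, t)[0] for t in gallery)
    label = "printed" if top >= threshold else "handwritten"
    return label, top


# Usage: a 3x3 cross-shaped template embedded in a 5x5 region at row 1, col 2.
template = [[0, 1, 0],
            [1, 1, 1],
            [0, 1, 0]]
region = [[0, 0, 0, 0, 0],
          [0, 0, 0, 1, 0],
          [0, 0, 1, 1, 1],
          [0, 0, 0, 1, 0],
          [0, 0, 0, 0, 0]]
score, position = best_match(region, template)
label, _ = classify_region(region, [template])
```

Because the score is normalised by the local mean and variance, it is insensitive to uniform changes in brightness and contrast. This makes it a natural fit for scanned documents, and no training samples are needed beyond the gallery itself.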
| Original language | English |
|---|---|
| Title of host publication | 2016 12th IAPR Workshop on Document Analysis Systems (DAS) |
| Publisher | IEEE |
| Pages | 399-404 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781509017928 |
| DOIs | |
| Publication status | Published - 13 Jun 2016 |