A comparison of unsupervised abnormality detection methods for interstitial lung disease

Matt Daykin, Mathini Sellathurai, Ian Poole

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Abnormality detection, also known as outlier detection or novelty detection, seeks to identify data that do not match an expected distribution. In medical imaging, this could be used to find data samples with possible pathology or, more generally, to exclude samples that are normal. This may be done by learning a model of normality, against which new samples are evaluated. In this paper four methods, each representing a different family of techniques, are compared: one-class support vector machine, isolation forest, local outlier factor, and fast-minimum covariance determinant estimator. Each method is evaluated on patches of CT interstitial lung disease where the patches are encoded with one of four embedding methods: principal component analysis, kernel principal component analysis, a flat autoencoder, and a convolutional autoencoder. The data consists of 5500 healthy patches from one patient cohort defining normality, and 2970 patches from a second patient cohort with emphysema, fibrosis, ground glass opacity, and micronodule pathology representing abnormality. From this second cohort 1030 healthy patches are used as an evaluation dataset. Evaluation considers both accuracy (area under the ROC curve) and runtime efficiency. The fast-minimum covariance determinant estimator is demonstrated to have fair time scaling with dataset dimensionality, while the isolation forest and one-class support vector machine scale well with dimensionality. The one-class support vector machine is the most accurate, closely followed by the isolation forest and fast-minimum covariance determinant estimator. The embeddings from kernel principal component analysis are the most generally useful.
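The pipeline the abstract describes (embed patches, fit a model of normality on healthy data only, score a mixed test set by area under the ROC curve) can be sketched with scikit-learn. The synthetic Gaussian arrays below stand in for the CT patch cohorts, the PCA embedding is just one of the four the paper compares, and every parameter choice is an illustrative assumption rather than the paper's settings:

```python
# Minimal sketch of the comparison pipeline, using scikit-learn.
# Synthetic data stands in for the CT patch cohorts of the paper.
import numpy as np
from sklearn.decomposition import PCA  # the paper also compares kernel PCA and autoencoders
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.covariance import EllipticEnvelope  # FastMCD-based robust estimator
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# "Healthy" training cohort, plus a test set mixing held-out healthy
# samples with shifted "abnormal" ones (labels: 1 = abnormal).
X_train = rng.normal(0.0, 1.0, size=(500, 64))
X_test = np.vstack([rng.normal(0.0, 1.0, size=(100, 64)),
                    rng.normal(2.0, 1.5, size=(100, 64))])
y_test = np.concatenate([np.zeros(100), np.ones(100)])

# Embed patches in a lower-dimensional space (here: plain PCA).
embed = PCA(n_components=16).fit(X_train)
Z_train, Z_test = embed.transform(X_train), embed.transform(X_test)

# One detector per family compared in the paper.
detectors = {
    "one-class SVM": OneClassSVM(nu=0.1),
    "isolation forest": IsolationForest(random_state=0),
    "local outlier factor": LocalOutlierFactor(novelty=True),
    "fast-MCD (elliptic envelope)": EllipticEnvelope(random_state=0),
}
for name, det in detectors.items():
    det.fit(Z_train)  # learn a model of normality only
    # score_samples is higher for more-normal points; negate it so the
    # ROC treats larger scores as more abnormal.
    auc = roc_auc_score(y_test, -det.score_samples(Z_test))
    print(f"{name}: AUC = {auc:.3f}")
```

Negating `score_samples` is the one subtlety: all four scikit-learn estimators return higher scores for more-normal points, while `roc_auc_score` with these labels expects higher scores for abnormal ones.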

Language: English
Title of host publication: Medical Image Understanding and Analysis
Subtitle of host publication: MIUA 2018
Editors: Mark Nixon, Sasan Mahmoodi, Reyer Zwiggelaar
Publisher: Springer
Pages: 287-298
Number of pages: 12
ISBN (Electronic): 9783319959214
ISBN (Print): 9783319959207
DOIs: 10.1007/978-3-319-95921-4_27
State: Published - 21 Aug 2018
Event: 22nd Conference on Medical Image Understanding and Analysis 2018 - Southampton, United Kingdom
Duration: 9 Jul 2018 - 11 Jul 2018

Publication series

Name: Communications in Computer and Information Science
Publisher: Springer
Volume: 894
ISSN (Print): 1865-0929

Conference

Conference: 22nd Conference on Medical Image Understanding and Analysis 2018
Abbreviated title: MIUA 2018
Country: United Kingdom
City: Southampton
Period: 9/07/18 - 11/07/18

Cite this

Daykin, M., Sellathurai, M., & Poole, I. (2018). A comparison of unsupervised abnormality detection methods for interstitial lung disease. In M. Nixon, S. Mahmoodi, & R. Zwiggelaar (Eds.), Medical Image Understanding and Analysis: MIUA 2018 (pp. 287-298). (Communications in Computer and Information Science; Vol. 894). Springer. DOI: 10.1007/978-3-319-95921-4_27
Daykin, Matt, Mathini Sellathurai, and Ian Poole. "A comparison of unsupervised abnormality detection methods for interstitial lung disease." In Medical Image Understanding and Analysis: MIUA 2018, edited by Mark Nixon, Sasan Mahmoodi, and Reyer Zwiggelaar, 287-298. Communications in Computer and Information Science, vol. 894. Springer, 2018. DOI: 10.1007/978-3-319-95921-4_27
@inproceedings{c21393472c0a47f897401c03a70e085b,
title = "A comparison of unsupervised abnormality detection methods for interstitial lung disease",
abstract = "Abnormality detection, also known as outlier detection or novelty detection, seeks to identify data that do not match an expected distribution. In medical imaging, this could be used to find data samples with possible pathology or, more generally, to exclude samples that are normal. This may be done by learning a model of normality, against which new samples are evaluated. In this paper four methods, each representing a different family of techniques, are compared: one-class support vector machine, isolation forest, local outlier factor, and fast-minimum covariance determinant estimator. Each method is evaluated on patches of CT interstitial lung disease where the patches are encoded with one of four embedding methods: principal component analysis, kernel principal component analysis, a flat autoencoder, and a convolutional autoencoder. The data consists of 5500 healthy patches from one patient cohort defining normality, and 2970 patches from a second patient cohort with emphysema, fibrosis, ground glass opacity, and micronodule pathology representing abnormality. From this second cohort 1030 healthy patches are used as an evaluation dataset. Evaluation considers both accuracy (area under the ROC curve) and runtime efficiency. The fast-minimum covariance determinant estimator is demonstrated to have fair time scaling with dataset dimensionality, while the isolation forest and one-class support vector machine scale well with dimensionality. The one-class support vector machine is the most accurate, closely followed by the isolation forest and fast-minimum covariance determinant estimator. The embeddings from kernel principal component analysis are the most generally useful.",
author = "Matt Daykin and Mathini Sellathurai and Ian Poole",
year = "2018",
month = aug,
day = "21",
doi = "10.1007/978-3-319-95921-4_27",
language = "English",
isbn = "9783319959207",
series = "Communications in Computer and Information Science",
volume = "894",
publisher = "Springer",
pages = "287--298",
editor = "Mark Nixon and Sasan Mahmoodi and Reyer Zwiggelaar",
booktitle = "Medical Image Understanding and Analysis",
}

Daykin, M, Sellathurai, M & Poole, I 2018, A comparison of unsupervised abnormality detection methods for interstitial lung disease. in M Nixon, S Mahmoodi & R Zwiggelaar (eds), Medical Image Understanding and Analysis: MIUA 2018. Communications in Computer and Information Science, vol. 894, Springer, pp. 287-298, 22nd Conference on Medical Image Understanding and Analysis 2018, Southampton, United Kingdom, 9/07/18. DOI: 10.1007/978-3-319-95921-4_27

A comparison of unsupervised abnormality detection methods for interstitial lung disease. / Daykin, Matt; Sellathurai, Mathini; Poole, Ian.

Medical Image Understanding and Analysis: MIUA 2018. ed. / Mark Nixon; Sasan Mahmoodi; Reyer Zwiggelaar. Springer, 2018. p. 287-298 (Communications in Computer and Information Science; Vol. 894).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN
T1 - A comparison of unsupervised abnormality detection methods for interstitial lung disease
AU - Daykin, Matt
AU - Sellathurai, Mathini
AU - Poole, Ian
PY - 2018/8/21
Y1 - 2018/8/21
N2 - Abnormality detection, also known as outlier detection or novelty detection, seeks to identify data that do not match an expected distribution. In medical imaging, this could be used to find data samples with possible pathology or, more generally, to exclude samples that are normal. This may be done by learning a model of normality, against which new samples are evaluated. In this paper four methods, each representing a different family of techniques, are compared: one-class support vector machine, isolation forest, local outlier factor, and fast-minimum covariance determinant estimator. Each method is evaluated on patches of CT interstitial lung disease where the patches are encoded with one of four embedding methods: principal component analysis, kernel principal component analysis, a flat autoencoder, and a convolutional autoencoder. The data consists of 5500 healthy patches from one patient cohort defining normality, and 2970 patches from a second patient cohort with emphysema, fibrosis, ground glass opacity, and micronodule pathology representing abnormality. From this second cohort 1030 healthy patches are used as an evaluation dataset. Evaluation considers both accuracy (area under the ROC curve) and runtime efficiency. The fast-minimum covariance determinant estimator is demonstrated to have fair time scaling with dataset dimensionality, while the isolation forest and one-class support vector machine scale well with dimensionality. The one-class support vector machine is the most accurate, closely followed by the isolation forest and fast-minimum covariance determinant estimator. The embeddings from kernel principal component analysis are the most generally useful.
AB - Abnormality detection, also known as outlier detection or novelty detection, seeks to identify data that do not match an expected distribution. In medical imaging, this could be used to find data samples with possible pathology or, more generally, to exclude samples that are normal. This may be done by learning a model of normality, against which new samples are evaluated. In this paper four methods, each representing a different family of techniques, are compared: one-class support vector machine, isolation forest, local outlier factor, and fast-minimum covariance determinant estimator. Each method is evaluated on patches of CT interstitial lung disease where the patches are encoded with one of four embedding methods: principal component analysis, kernel principal component analysis, a flat autoencoder, and a convolutional autoencoder. The data consists of 5500 healthy patches from one patient cohort defining normality, and 2970 patches from a second patient cohort with emphysema, fibrosis, ground glass opacity, and micronodule pathology representing abnormality. From this second cohort 1030 healthy patches are used as an evaluation dataset. Evaluation considers both accuracy (area under the ROC curve) and runtime efficiency. The fast-minimum covariance determinant estimator is demonstrated to have fair time scaling with dataset dimensionality, while the isolation forest and one-class support vector machine scale well with dimensionality. The one-class support vector machine is the most accurate, closely followed by the isolation forest and fast-minimum covariance determinant estimator. The embeddings from kernel principal component analysis are the most generally useful.
UR - http://www.scopus.com/inward/record.url?scp=85052867121&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-95921-4_27
DO - 10.1007/978-3-319-95921-4_27
M3 - Conference contribution
SN - 9783319959207
T3 - Communications in Computer and Information Science
SP - 287
EP - 298
BT - Medical Image Understanding and Analysis
PB - Springer
ER -

Daykin M, Sellathurai M, Poole I. A comparison of unsupervised abnormality detection methods for interstitial lung disease. In Nixon M, Mahmoodi S, Zwiggelaar R, editors, Medical Image Understanding and Analysis: MIUA 2018. Springer. 2018. p. 287-298. (Communications in Computer and Information Science; vol. 894). DOI: 10.1007/978-3-319-95921-4_27