Generative versus sampling-based approaches to variability of class imbalance in visual anomaly detection

Nafi, Nasik Muhammad

dc.contributor.author	Nafi, Nasik Muhammad
dc.date.accessioned	2019-04-22T15:06:50Z
dc.date.available	2019-04-22T15:06:50Z
dc.date.graduationmonth	May
dc.date.issued	2019-05-01
dc.description.abstract	Data sets for visual anomaly detection are often stratified such that every stratum, or batch, in the data set suffers from a different magnitude of class imbalance. A common approach to this detection task is supervised inductive learning from labeled or partially labeled image data, which simultaneously segments the anomaly and classifies it. Many representations and algorithms for these learning tasks exhibit a preference (inductive bias) towards balanced data from each class and thus perform better with balanced data sets than with imbalanced ones. Such representations and algorithms are sensitive not only to the aggregate degree of class imbalance but also to its within-stratum variation. This sensitivity extends to learned representations, such as deep learning of intermediate visual features. Several oversampling-based techniques have been proposed to mitigate the skewness of the data. However, most synthetic oversampling techniques, such as the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), are suitable only for low-dimensional data, which limits their application in visual anomaly detection. Recently, deep generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have been established as effective approaches to augmenting high-dimensional image data. However, the literature lacks a detailed study of the learning process on a data set augmented to cope with variable imbalance across strata. We carried out an experiment to analyze the training phase and the final classifier performance when the more imbalanced batch is augmented using different approaches to achieve the same data ratio as the less imbalanced batch. We used classification on the merged batches as the baseline and compared the performance of the classifier on data sets augmented by simple oversampling, an adaptation of SMOTE, and a GAN-based generative model. Our results indicate that GAN-based augmentation is capable of avoiding overfitting and leads to better performance.
dc.description.advisor	William H. Hsu
dc.description.degree	Master of Science
dc.description.department	Department of Computer Science
dc.description.level	Masters
dc.identifier.uri	http://hdl.handle.net/2097/39692
dc.language.iso	en_US
dc.publisher	Kansas State University
dc.rights	© the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject	Variability of class imbalance
dc.subject	Sampling versus generative
dc.subject	Data augmentation
dc.subject	Visual anomaly detection
dc.subject	Generative adversarial network
dc.subject	Over-sampling and under-sampling
dc.title	Generative versus sampling-based approaches to variability of class imbalance in visual anomaly detection
dc.type	Thesis
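
The abstract describes augmenting the more imbalanced batch until it reaches the same class ratio as the less imbalanced batch. The following is a minimal sketch of that idea for the simple-oversampling case only, using plain Python. It is not the thesis's implementation: the function names (anomaly_ratio, oversample_to_ratio), the label convention (1 = anomalous, 0 = normal), and the batch sizes are all hypothetical and chosen purely for illustration.

```python
import math
import random


def anomaly_ratio(labels):
    """Ratio of anomalous (1) to normal (0) samples in one batch."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return n_pos / n_neg


def oversample_to_ratio(samples, labels, target_ratio, seed=0):
    """Duplicate randomly chosen anomalous samples until the batch's
    anomalous/normal ratio is at least target_ratio (simple oversampling)."""
    rng = random.Random(seed)
    minority = [x for x, y in zip(samples, labels) if y == 1]
    if not minority:
        raise ValueError("batch contains no anomalous samples to duplicate")
    n_neg = sum(1 for y in labels if y == 0)
    # Number of extra anomalous samples needed to reach the target ratio.
    n_extra = max(0, math.ceil(target_ratio * n_neg) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return samples + extra, labels + [1] * n_extra


# Hypothetical batches: A is less imbalanced (20:80), B is more imbalanced (5:95).
labels_a = [0] * 80 + [1] * 20
labels_b = [0] * 95 + [1] * 5
samples_b = [[float(i)] for i in range(100)]  # placeholder feature vectors

target = anomaly_ratio(labels_a)
aug_samples, aug_labels = oversample_to_ratio(samples_b, labels_b, target)
```

Because simple oversampling only repeats existing minority samples, it adds no new visual variation. SMOTE-style interpolation and GAN-based generation, the other two approaches the abstract compares, would replace the duplication step with synthesized samples.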

Files

Original bundle

Name: NasikMuhammadNafi2019.pdf
Size: 13.35 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission

Collections

K-State Electronic Theses, Dissertations, and Reports: 2004 -