Generative versus sampling-based approaches to variability of class imbalance in visual anomaly detection

dc.contributor.author: Nafi, Nasik Muhammad
dc.date.accessioned: 2019-04-22T15:06:50Z
dc.date.available: 2019-04-22T15:06:50Z
dc.date.graduationmonth: May (en_US)
dc.date.issued: 2019-05-01
dc.date.published: 2019 (en_US)
dc.description.abstract: Data sets for visual anomaly detection are often stratified such that each stratum, or batch, of the data set exhibits a different degree of class imbalance. A common approach to this detection task is to use supervised inductive learning from labeled or partially labeled image data to segment and classify the anomaly simultaneously. Many representations and algorithms for these learning tasks exhibit some preference (inductive bias) towards balanced data from each class and thus perform better with balanced data sets than with imbalanced ones. Such representations and algorithms are sensitive not only to the aggregate degree of class imbalance but also to its within-stratum variation. This includes representation learning, such as deep learning of intermediate visual features. Several oversampling-based techniques have been proposed to mitigate the skewness of the data. However, most synthetic oversampling techniques, such as the Synthetic Minority Over-sampling Technique (SMOTE) or Adaptive Synthetic Sampling (ADASYN), are suitable only for low-dimensional data, which limits their application in visual anomaly detection. Recently, deep generative models such as Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN) have been established as effective approaches to augmenting high-dimensional image data. However, the literature lacks a detailed study of the learning process on a data set augmented to cope with variable imbalance across strata. We carried out an experiment to analyze the training phase and the final classifier performance when the more imbalanced batch is augmented, using different approaches, to achieve the same class ratio as the less imbalanced batch. We took classification on the merged batches as the baseline and compared the performance of the classifier on data sets augmented by simple oversampling, an adaptation of SMOTE, and a GAN-based generative model. Our results indicate that GAN-based augmentation can avoid overfitting and leads to better performance. (en_US)
dc.description.advisor: William H. Hsu (en_US)
dc.description.degree: Master of Science (en_US)
dc.description.department: Department of Computer Science (en_US)
dc.description.level: Masters (en_US)
dc.identifier.uri: http://hdl.handle.net/2097/39692
dc.language.iso: en_US (en_US)
dc.subject: Variability of class imbalance (en_US)
dc.subject: Sampling versus generative (en_US)
dc.subject: Data augmentation (en_US)
dc.subject: Visual anomaly detection (en_US)
dc.subject: Generative adversarial network (en_US)
dc.subject: Over-sampling and under-sampling (en_US)
dc.title: Generative versus sampling-based approaches to variability of class imbalance in visual anomaly detection (en_US)
dc.type: Thesis (en_US)
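
The abstract describes augmenting the more imbalanced batch until it reaches the same class ratio as the less imbalanced batch, using simple oversampling or an adaptation of SMOTE. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes images have already been flattened into feature vectors, and the function names (`minority_needed`, `random_oversample`, `smote_like`) are hypothetical. The GAN-based variant is omitted because it requires a trained generator.

```python
import numpy as np


def minority_needed(n_major, n_minor, target_ratio):
    """Extra minority samples required so that minority/majority >= target_ratio."""
    needed = int(np.ceil(target_ratio * n_major)) - n_minor
    return max(needed, 0)


def random_oversample(X_minor, n_new, rng):
    """Simple oversampling: duplicate existing minority rows, drawn with replacement."""
    idx = rng.integers(0, len(X_minor), size=n_new)
    return X_minor[idx]


def smote_like(X_minor, n_new, k, rng):
    """SMOTE-style interpolation between a minority sample and one of its
    k nearest minority neighbours. Requires k < len(X_minor)."""
    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_minor[:, None, :] - X_minor[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, len(X_minor), size=n_new)        # seed samples
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                            # position along the segment
    return X_minor[base] + gap * (X_minor[chosen] - X_minor[base])
```

As an illustration with made-up counts: if the less imbalanced batch has 100 anomalies per 900 normal images and the more imbalanced batch has 20 anomalies per 950 normal images, `minority_needed(950, 20, 100 / 900)` gives the number of synthetic anomalies to generate for the second batch. Interpolating in raw pixel space is exactly where SMOTE-style methods struggle on high-dimensional images, which is the limitation the abstract points out.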

Files

Original bundle
Name:
NasikMuhammadNafi2019.pdf
Size:
13.35 MB
Format:
Adobe Portable Document Format