A sequential naïve Bayes classifier for DNA barcodes

Date

2015-03-04

Abstract

DNA barcodes are short strands of 255–700 nucleotide bases taken from the cytochrome c oxidase subunit 1 (COI) region of the mitochondrial DNA. It has been proposed that these barcodes may be used to differentiate between biological species. Current methods of species classification use distance measures that depend heavily on both evolutionary model assumptions and a clearly defined “gap” between intra- and interspecies variation. Such distance measures neither quantify classification uncertainty nor indicate how much of the barcode is necessary for classification. We propose a sequential naïve Bayes classifier for species classification to address these limitations. The proposed method is shown to provide accurate species-level classification on real and simulated data. It also quantifies the uncertainty of each classification and indicates how much of the barcode is necessary to classify a specimen.

Keywords

Naïve Bayes classifier, DNA barcoding, Phylogenetic analysis, Sequential analysis, Species classification, Species discovery
