Image-based deep learning approaches for plant phenotyping




The genetic potential of plant traits remains unexplored because of the limitations of available phenotyping methods. Deep learning can be used to build automatic tools for identifying, localizing and quantifying plant features in agricultural images. This dissertation describes the development and evaluation of state-of-the-art deep learning approaches for several plant phenotyping tasks: characterization of rice root anatomy from microscopic root cross-section images, estimation of sorghum stomatal density and area from microscopic images of leaf surfaces, and estimation of chalkiness in rice exposed to high night temperature from images of rice grains.
For the root anatomy task, anatomical traits such as root, stele and late metaxylem were identified using a deep learning model based on the Faster Region-based Convolutional Neural Network (Faster R-CNN) with a pre-trained VGG-16 backbone. The model was trained on root cross-section images in which the traits of interest were manually annotated as rectangular bounding boxes using the LabelImg tool. The traits were also predicted as rectangular bounding boxes, which were compared with the ground truth bounding boxes using the intersection over union (IoU) metric to evaluate detection accuracy. The predicted bounding boxes were subsequently used to estimate root and stele diameter, as well as late metaxylem count and average diameter. Experimental results showed that the trained models can accurately detect and quantify anatomical features, and are robust to image variations. Using the pre-trained VGG-16 network also made it possible to train accurate models with a relatively small number of annotated images, which makes this approach attractive for adaptation to new tasks.
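The box-based evaluation and measurement steps described for the root anatomy task can be illustrated with a minimal sketch. It is not the dissertation's code: the `Box` class, the function names, the `microns_per_pixel` calibration parameter, and the choice to approximate a diameter as the mean of box width and height are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def width(self) -> float:
        return max(0.0, self.x_max - self.x_min)

    @property
    def height(self) -> float:
        return max(0.0, self.y_max - self.y_min)

    @property
    def area(self) -> float:
        return self.width * self.height


def iou(pred: Box, truth: Box) -> float:
    """Intersection over union of a predicted and a ground truth box."""
    inter_w = max(0.0, min(pred.x_max, truth.x_max) - max(pred.x_min, truth.x_min))
    inter_h = max(0.0, min(pred.y_max, truth.y_max) - max(pred.y_min, truth.y_min))
    inter = inter_w * inter_h
    union = pred.area + truth.area - inter
    return inter / union if union > 0 else 0.0


def diameter_from_box(box: Box, microns_per_pixel: float = 1.0) -> float:
    """Assumption: a roughly circular structure has a diameter close to
    the mean of its box width and height, scaled by the calibration."""
    return 0.5 * (box.width + box.height) * microns_per_pixel


def summarize_metaxylem(boxes: Sequence[Box],
                        microns_per_pixel: float = 1.0) -> Tuple[int, float]:
    """Return (count, average diameter) for detected late metaxylem boxes."""
    if not boxes:
        return 0, 0.0
    diameters = [diameter_from_box(b, microns_per_pixel) for b in boxes]
    return len(boxes), sum(diameters) / len(diameters)
```

A detection is typically counted as correct when its IoU with a ground truth box exceeds a chosen threshold; the threshold used in the dissertation is not stated here.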
For estimating sorghum stomatal density and area, a deep learning approach for instance segmentation was used, specifically a Mask Region-based Convolutional Neural Network (Mask R-CNN), which produces pixel-level annotations of stomata objects. The pre-trained ResNet-101 network was used as the backbone of the model in combination with a feature pyramid network (FPN), which enables the model to identify objects at different scales. The Mask R-CNN model was trained on microscopic leaf surface images in which the stomata objects were manually labeled at pixel level using the VGG Image Annotator tool. The predicted stomata masks were counted and subsequently used to estimate the stomatal area. Experimental results showed a strong correlation between the predicted stomatal counts and areas and the corresponding manually produced values. As in the root anatomy task, this study showed that very accurate results can be obtained with a relatively small number of annotated images.
Working on the root anatomy detection and stomatal segmentation tasks showed that manually annotating data, in terms of bounding boxes and especially pixel-level masks, can be tedious and time-consuming, even when a relatively small number of annotated images are used for training. To address this challenge, a weakly supervised approach based on Gradient-weighted Class Activation Mapping (Grad-CAM) was used for the task of estimating chalkiness from images of rice grains exposed to high night temperatures. Instead of performing pixel-level segmentation of the chalkiness in rice images, the weakly supervised approach makes use of high-level annotations of images as chalky or not-chalky.
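For the stomatal measurements described earlier, the step from predicted instance masks to counts, areas and density can be sketched as follows. This is an illustrative sketch only: the function name, the `(N, H, W)` boolean mask layout, and the `microns_per_pixel` calibration parameter are assumptions, and density is taken here as count per square millimetre of the imaged field.

```python
import numpy as np


def stomatal_stats(masks, microns_per_pixel):
    """Summarize predicted stomata masks (hypothetical helper).

    masks: array-like of shape (N, H, W), one boolean mask per stoma.
    microns_per_pixel: assumed image calibration (side length of a pixel).
    Returns (count, per-stoma areas in square microns, density per mm^2).
    """
    masks = np.asarray(masks, dtype=bool)
    count = masks.shape[0]
    pixel_area_um2 = float(microns_per_pixel) ** 2

    # Area of each stoma = number of mask pixels times the area of one pixel.
    areas_um2 = masks.sum(axis=(1, 2)) * pixel_area_um2

    # Density = number of stomata divided by the imaged field area in mm^2.
    height, width = masks.shape[1:]
    field_mm2 = height * width * pixel_area_um2 / 1e6
    density_per_mm2 = count / field_mm2
    return count, areas_um2, density_per_mm2
```

Agreement with manual measurements could then be checked with a correlation coefficient, for example `np.corrcoef(predicted, manual)[0, 1]`.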
A convolutional neural network (e.g., ResNet-101) for binary classification is trained to distinguish between chalky and not-chalky images, and the gradients of the chalky class are subsequently used to compute a heatmap corresponding to the chalkiness area, as well as a chalkiness score for each grain. Experimental results on both polished and unpolished rice grains, using standard instance classification and segmentation metrics, showed that Grad-CAM can accurately identify chalky grains and detect the chalkiness area. The results also showed that the models trained on polished rice cannot be transferred to unpolished rice, suggesting that new models need to be trained and fine-tuned for other types of rice grains and possibly for images taken under different conditions.
In conclusion, this dissertation contributes to the field of deep learning by introducing new and challenging tasks that require adaptations of existing deep learning models. It also contributes to the field of agricultural image analysis and plant phenotyping by introducing fully automated high-throughput tools for identifying, localizing and quantifying plant traits that are of significant importance to breeding programs. All the datasets and models trained in this dissertation have been made publicly available so that the deep learning community can use them to further advance the state of the art on these tasks. The resulting tools have also been made publicly available as web servers so that the plant breeding community can use them on images collected for similar tasks. Future work will focus on adapting the models used in this dissertation to other similar tasks, and on developing similar models for other tasks relevant to the plant breeding community and to the agriculture community at large.
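The Grad-CAM step described above for chalkiness (gradients of the chalky class turned into a heatmap and a per-grain score) can be sketched with NumPy. The sketch assumes that the feature maps of the last convolutional layer and the gradients of the chalky-class score with respect to them have already been extracted from the classifier. The function names, the `grain_mask` input, the threshold of 0.5, and the definition of the score as the fraction of grain pixels above the threshold are all hypothetical; the dissertation's exact score definition is not given here.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap from feature maps and class-score gradients.

    activations: array of shape (K, H, W), feature maps of a conv layer.
    gradients:   array of shape (K, H, W), d(chalky score)/d(activations).
    Returns a heatmap of shape (H, W) scaled to [0, 1].
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)

    # Channel weights: global average of the gradients over spatial positions.
    weights = gradients.mean(axis=(1, 2))

    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = np.tensordot(weights, activations, axes=1)
    cam = np.maximum(cam, 0.0)

    peak = cam.max()
    return cam / peak if peak > 0 else cam


def chalkiness_score(cam, grain_mask, threshold=0.5):
    """Assumed score: fraction of grain pixels whose heatmap value reaches
    the threshold. cam and grain_mask must share the same (H, W) shape, so
    the heatmap is assumed to be already resized to the image resolution."""
    grain_mask = np.asarray(grain_mask, dtype=bool)
    grain_pixels = grain_mask.sum()
    if grain_pixels == 0:
        return 0.0
    chalky = (np.asarray(cam) >= threshold) & grain_mask
    return float(chalky.sum() / grain_pixels)
```

Because only image-level chalky or not-chalky labels are needed to train the classifier, a heatmap of this kind stands in for the pixel-level masks that full segmentation would require.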



Image processing, Crop science, Deep learning, Neural network



Doctor of Philosophy


Department of Computer Science

Major Professor

Doina Caragea