Understanding hidden neuron activations using structured background knowledge and deductive reasoning
Abstract
A central challenge in Explainable AI (XAI) is accurately interpreting hidden neuron activations in deep neural networks (DNNs). Accurate interpretations help demystify the black-box nature of deep learning models by explaining what the system internally detects as relevant in the input. While some existing methods show that hidden neuron activations can be human-interpretable, systematic and automated approaches that leverage background knowledge remain underexplored. This thesis introduces a model-agnostic post-hoc XAI method that uses a Wikipedia-derived concept hierarchy of approximately 2 million classes as background knowledge and employs OWL-reasoning-based Concept Induction to generate explanations. Our approach automatically assigns meaningful class expressions to neurons in the dense layers of Convolutional Neural Networks and outperforms prior methods both quantitatively and qualitatively. In addition, we argue that understanding neuron behavior requires not only identifying what activates a neuron (its recall) but also examining its precision, that is, how it responds to other stimuli, which we term the neuron's error margin; this adds granularity to neuron interpretation. To visualize these findings, we present ConceptLens, a tool that displays neuron activations and error margins as bar charts, offering insight into the confidence of each neuron's activations and an intuitive understanding of its behavior. Together, these contributions offer a holistic approach to interpreting DNNs, advancing the explainability and transparency of AI models.
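To make the recall/precision framing of neuron behavior concrete, the following is a minimal, hypothetical sketch rather than the thesis's actual implementation: given a dense-layer neuron's activations over a set of images and a boolean mask marking which images fall under an induced concept, it computes the neuron's recall, precision, and an error margin taken here as one minus precision. The function name, the thresholding scheme, and the exact error-margin formula are assumptions for illustration only.

```python
import numpy as np

def neuron_concept_stats(activations, concept_mask, threshold=0.0):
    """Hypothetical sketch: relate one neuron's activations to a target concept.

    activations  : per-image activation values of a single dense-layer neuron.
    concept_mask : True where the image is covered by the induced class expression.
    threshold    : activation cutoff above which the neuron counts as "firing"
                   (an assumed criterion, not necessarily the one used in the thesis).
    """
    fires = np.asarray(activations) > threshold
    concept = np.asarray(concept_mask, dtype=bool)

    # Recall: fraction of concept images that make the neuron fire.
    recall = (fires & concept).sum() / max(concept.sum(), 1)

    # Precision: fraction of firing images that actually belong to the concept.
    precision = (fires & concept).sum() / max(fires.sum(), 1)

    # One plausible reading of the "error margin": firing on other stimuli.
    error_margin = 1.0 - precision
    return recall, precision, error_margin

# Toy usage with made-up data.
acts = [0.9, 0.1, 0.7, 0.0, 0.8]          # neuron activations on five images
mask = [True, False, True, False, False]  # images covered by the induced concept
print(neuron_concept_stats(acts, mask))
```

A tool in the spirit of ConceptLens could then plot such per-neuron statistics as bar charts, with one bar per neuron-concept pair showing activation strength alongside the error margin.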