Deep learning with constraints for answer-agnostic question generation in legal text understanding

Date

2021-08-01

Abstract

The aim of this dissertation is to develop constraint-based methods that extend and improve on current deep learning models, such as transformers and sequence-to-sequence (seq2seq) models, for question generation from the text of legal agreements, particularly privacy policies. A privacy policy is a legally binding agreement between a customer and a service provider. This dissertation focuses on analyzing a privacy policy document to generate questions that capture entities and the relationships between them. Another area of focus is the generation of constraints based on domain knowledge and their application to the deep learning network during the question generation process. A possible use case of this research is the development of a test corpus for question answering systems in the privacy domain, since the shortage of sufficiently large corpora poses a key challenge in the development of question answering and question generation systems.

Question generation is the task of generating an interrogative sentence based on some text. Current approaches to question generation use sequence-to-sequence models with additional information such as answers, positions of the answers, part-of-speech details, and named entity tags, among others. The idea behind such approaches is that these models can benefit from additional information about the text (i.e., the sentence or paragraph). Recently, transformer-based approaches, which offer the benefit of the attention mechanism, have also been used for generating questions. Transformers have achieved state-of-the-art results in many natural language processing tasks, including text classification, machine translation, language understanding, co-reference resolution, and summarization. However, the contribution of transformers to question generation has not been as significant.

This research seeks to improve existing approaches by injecting domain knowledge, modeled as a combination of logical and linguistic constraints, into these deep learning models during the training and validation phases. This work also explores the design and implementation of different kinds of constraints that can better direct the deep learning model towards the expected output, which in this case means syntactically and semantically correct and relevant questions. Another contribution of this research is the creation of custom labels for named entities in the privacy policy domain. Results show that adding some form of domain-specific constraints improves the performance of the aforementioned models compared to the performance of state-of-the-art models on the test bed used in this work. For the given test bed, constrained seq2seq approaches perform better than the constrained transformer-based approach.
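
As a rough illustration of the idea of combining a model's loss with logical and linguistic constraints, the sketch below adds a penalty to a base loss for each constraint that a candidate question violates. It is not the dissertation's formulation: the constraint functions, the word list, the entity list, and the `constrained_loss` helper are all hypothetical names chosen for this example, and the base loss is supplied by hand rather than computed by a real model.

```python
from typing import Callable, List

# Assumption: a constraint maps (question, source text) to 0.0 when
# satisfied and 1.0 when violated. This interface is hypothetical.
Constraint = Callable[[str, str], float]

# Assumption: a small, illustrative list of question-opening words.
INTERROGATIVES = {"what", "who", "whom", "which", "when", "where",
                  "why", "how", "does", "do", "is", "are", "can", "will"}


def ends_with_question_mark(question: str, source: str) -> float:
    """Linguistic constraint: the output should end with '?'."""
    return 0.0 if question.strip().endswith("?") else 1.0


def starts_with_interrogative(question: str, source: str) -> float:
    """Linguistic constraint: the first word should open a question."""
    words = question.strip().lower().split()
    return 0.0 if words and words[0] in INTERROGATIVES else 1.0


def mentions_source_entity(entities: List[str]) -> Constraint:
    """Domain constraint: the question should mention a known entity."""
    def check(question: str, source: str) -> float:
        lowered = question.lower()
        return 0.0 if any(e.lower() in lowered for e in entities) else 1.0
    return check


def constrained_loss(base_loss: float, question: str, source: str,
                     constraints: List[Constraint],
                     weight: float = 1.0) -> float:
    """Add a weighted penalty for every violated constraint."""
    penalty = sum(c(question, source) for c in constraints)
    return base_loss + weight * penalty


# Hypothetical privacy-policy sentence and hand-picked entity mentions.
source = "We share your email address with third-party advertisers."
constraints = [
    ends_with_question_mark,
    starts_with_interrogative,
    mentions_source_entity(["email address", "advertisers"]),
]

# The same base loss is used for both candidates so that only the
# constraint penalty differs.
good = constrained_loss(2.0, "Who receives your email address?",
                        source, constraints)
bad = constrained_loss(2.0, "your data is shared", source, constraints)
print(good, bad)
```

The first candidate satisfies all three constraints and keeps its base loss, while the second violates all three and is penalized accordingly. A hard 0/1 penalty like this is not differentiable, so in an actual training loop it would more plausibly serve for reranking or validation, or be replaced by a differentiable surrogate.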

Keywords

Natural language processing, Deep learning, Legal text, Privacy policies

Graduation Month

August

Degree

Doctor of Philosophy

Department

Department of Computer Science

Major Professor

William H. Hsu

Date

2021

Type

Dissertation
