Deep learning with constraints for answer-agnostic question generation in legal text understanding

dc.contributor.author: Lamba, Deepti
dc.date.accessioned: 2021-08-10T20:38:13Z
dc.date.available: 2021-08-10T20:38:13Z
dc.date.graduationmonth: August [en_US]
dc.date.issued: 2021-08-01
dc.date.published: 2021 [en_US]
dc.description.abstract: The aim of this dissertation is to develop constraint-based methods that extend and improve on current deep learning models, such as transformers and sequence-to-sequence (seq2seq) models, for the problem of question generation based on the analysis of the text of legal agreements, particularly privacy policies. A privacy policy is a legally binding agreement between a customer and a service provider. This dissertation focuses on analyzing a privacy policy document to generate questions that capture entities and the relationships between them. Another area of focus is the generation of constraints based on domain knowledge and their application to the deep learning network during the question generation process. A possible use case of this research is the development of a test corpus for question answering systems in the privacy domain, because the shortage of sufficiently large corpora poses a key challenge in the development of question answering and question generation systems. Question generation is the task of generating an interrogative sentence based on some text. Current approaches to question generation use sequence-to-sequence models with additional information such as answers, positions of the answers, part-of-speech details, and named entity tags, among others. The idea behind such approaches is that these models can benefit from additional information about the text (i.e., sentence or paragraph). Recently, transformer-based approaches that offer the benefit of an attention mechanism have also been used for generating questions. Transformers have achieved state-of-the-art results in many natural language processing tasks, including text classification, machine translation, language understanding, co-reference resolution, and summarization. However, the contribution of transformers towards a task like question generation has not been as significant.
This research seeks to improve existing approaches by injecting domain knowledge, modeled as a combination of logical and linguistic constraints, into these deep learning models during the training and validation phases. This work also explores the design and implementation of different kinds of constraints that can better direct the deep learning model towards the expected output, which in this case means syntactically and semantically correct, relevant questions. Another contribution of this research is the creation of custom labels for named entities in the privacy policy domain. Results show that adding some form of domain-specific constraints improves the performance of the aforementioned models compared to the performance of state-of-the-art models on the test bed used in this work. For the given test bed, constrained seq2seq approaches perform better than the constrained transformer-based approach. [en_US]
dc.description.advisor: William H. Hsu [en_US]
dc.description.degree: Doctor of Philosophy [en_US]
dc.description.department: Department of Computer Science [en_US]
dc.description.level: Doctoral [en_US]
dc.identifier.uri: https://hdl.handle.net/2097/41629
dc.language.iso: en_US [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Deep learning [en_US]
dc.subject: Legal text [en_US]
dc.subject: Privacy policies [en_US]
dc.title: Deep learning with constraints for answer-agnostic question generation in legal text understanding [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle
Name: DeeptiLamba2021.pdf
Size: 874.42 KB
Format: Adobe Portable Document Format