On improving natural language processing through phrase-based and one-to-one syntactic algorithms

K-REx Repository

dc.contributor.author Meyer, Christopher Henry
dc.date.accessioned 2008-12-19T19:28:26Z
dc.date.available 2008-12-19T19:28:26Z
dc.date.issued 2008-12-19T19:28:26Z
dc.identifier.uri http://hdl.handle.net/2097/1096
dc.description.abstract Machine Translation (MT) is the practice of using computational methods to convert words from one natural language to another. Several approaches have been created since MT’s inception in the 1950s and, with the vast increase in computational resources since then, have continued to evolve and improve. In this thesis I summarize several branches of MT theory and introduce newly developed software applications, parsing techniques to improve Japanese-to-English text translation, and a new key algorithm to correct translation errors when converting from Japanese kanji to English. The overall translation improvement is measured using the BLEU metric (an objective, numerical standard in Machine Translation quality analysis). The baseline translation system was built by combining Giza++, the Thot Phrase-Based SMT toolkit, the SRILM toolkit, and the Pharaoh decoder. The input and output parsing applications were created as intermediaries to improve the baseline MT system, so as to eliminate artificially high improvement metrics. The baseline was measured with and without the additional parsing provided by the thesis software applications, and also with and without the thesis kanji correction utility. The new algorithm corrected many of the contextual definition mistakes that are common when converting Japanese text to English. By training the new kanji correction utility on an existing dictionary, identifying Japanese source text with a high number of possible translations, and checking the baseline translation against the other translation possibilities, I was able to increase the translation performance of the baseline system from minimum normalized BLEU scores of 0.0273 to maximum normalized scores of 0.081. The preliminary phase of making improvements to Japanese-to-English translation focused on correcting segmentation mistakes that occur when attempting to parse Japanese text into meaningful tokens.
The initial increase is artificially high because the baseline score was so low to begin with, and it is not indicative of future potential, but it was needed to create a reasonable baseline score. The final results of the tests confirmed that a significant, measurable improvement had been achieved by improving the initial segmentation of the Japanese text through parsing the input corpora, and by correcting kanji translations after the Pharaoh decoding process had completed. en
dc.language.iso en_US en
dc.publisher Kansas State University en
dc.subject Artificial Intelligence en
dc.subject Natural language processing en
dc.subject Japanese en
dc.subject Machine translation en
dc.subject Contextual syntax en
dc.subject Phrase-based translation en
dc.title On improving natural language processing through phrase-based and one-to-one syntactic algorithms en
dc.type Thesis en
dc.description.degree Master of Science en
dc.description.level Masters en
dc.description.department Department of Computing and Information Sciences en
dc.description.advisor William H. Hsu en
dc.subject.umi Artificial Intelligence (0800) en
dc.subject.umi Computer Science (0984) en
dc.subject.umi Language, Modern (0291) en
dc.date.published 2008 en
dc.date.graduationmonth December en
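
The abstract reports its results as BLEU scores. As a rough illustration of what that metric measures, here is a minimal sketch of sentence-level BLEU against a single reference, using clipped n-gram precision and a brevity penalty. It is not the thesis's evaluation code: the record does not say which BLEU implementation or normalization was used, and the function names `ngrams` and `bleu` and the example sentences are assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference (illustrative only)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, any zero precision makes the whole score zero.
            return 0.0
        log_precision_sum += math.log(clipped / total)
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, uniform weights.
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    candidate = "the cat sat on a mat".split()
    print(round(bleu(candidate, reference), 4))
```

Scores fall between 0 and 1, so values such as the 0.0273 and 0.081 quoted in the abstract sit at the low end of that range. Corpus-level BLEU, which MT evaluations normally report, pools the n-gram counts over all sentences before taking the precisions rather than averaging per-sentence scores.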
