Compiling Java in linear nondeterministic space

Date

2018-05-01

Authors

Donnoe, Joshua

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Shannon’s and Chomsky’s attempts to model natural language with Markov chains showed differing gauges of language complexity. These were codified with the Chomsky Hierarchy with four types of languages, each with an accepting type of grammar and au- tomaton. Though still foundationally important, this fails to identify remarkable proper subsets of the types including recursive languages among recursively enumerable languages. In general, with Rice’s theorem, it is undecidable whether a Turing machine’s language is re- cursive. But specifically, Hopcroft & Ullman show that the languages of space bound Turing machines are recursive. We show the converse also to be true. The space hierarchy theorem shows that there is a continuum of proper subsets within the recursive languages. With Myhill’s description of a linear bounded automata, Landweber showed that they accept a subset of the type 1 languages including the type 2 languages. Kuroda expanded the definition making the automata nondeterministic and showed that nondeterministic linear space is the set of type 1 languages. That only one direction was proven deterministically but both nondeterministically, would suggest that nondeterminism increases expressiveness. This is further supported by Savitch’s theorem. However, it is not without precedent for predictions in computability theory to be wrong. Turing showed that Hilbert’s Entschei- dungsproblem is unsolvable and Immerman disproved Landweber’s belief that type 1 lan- guages are not closed under complementation. Currently, a major use of language theory is computer language processing including compilation. We will show that for the Java programming language, compilability can be computed in nondeterministic linear space by the existence of a (nondeterministic) linear bounded automaton which abstractly computes compilability. The automaton uses the tra- ditional pipeline architecture to transform the input in phases. The devised compiler will attempt to build a parse tree and then check its semantic properties. The first two phases, lexical and syntactical analysis are classic language theory tasks. Lexical analysis greedily finds matches to a regular language. Each match is converted to a token and printed to the next stream. With this, linearity is preserved. With a Lisp format, a parse tree can be stored as a character string which is still linear. Since the tree string preserves structural information from the program source, the tree itself serves as a symbol table, which normally would be separately stored in a readable efficient manner. Though more difficult than the previous step, this will also be shown to be linear. Lastly, semantic analysis, including typechecking, and reachability are performed by traversing the tree and annotating nodes. This implies that there must exist a context-sensitive grammar that accepts compilable Java. Therefore even though the execution of Java programs is Turing complete, their compilation is not.

Description

Keywords

Computability, Compilation, Turing, Automata, Context, Sensitivity

Graduation Month

May

Degree

Master of Science

Department

Department of Computer Science

Major Professor

Torben Amtoft

Date

2018

Type

Thesis

Citation