Your lexical analyzer should output each token it identifies in the input MINI-L program. Each token should appear on its own line of output, and the tokens should appear in the same order as they appear in the input program. To facilitate grading, the tokens must be output in the format described in the table below.
There are two types of lexical errors that your lexical analyzer should catch. They are described below.
Note: for this phase of the project, even syntactically incorrect MINI-L programs may still be broken successfully into a list of tokens. Syntax errors will be caught in the next phase of this project.
The following table describes the different kinds of tokens that your lexical analyzer may output. Your lexical analyzer should ignore comments and whitespace (it should not output any tokens for these).
Lexical Pattern in the Inputted MINI-L Program | Token that Should Be Outputted |
program | PROGRAM |
beginprogram | BEGIN_PROGRAM |
endprogram | END_PROGRAM |
integer | INTEGER |
array | ARRAY |
of | OF |
if | IF |
then | THEN |
endif | ENDIF |
else | ELSE |
while | WHILE |
loop | LOOP |
endloop | ENDLOOP |
read | READ |
write | WRITE |
and | AND |
or | OR |
not | NOT |
true | TRUE |
false | FALSE |
- | SUB |
+ | ADD |
* | MULT |
/ | DIV |
= | EQ |
<> | NEQ |
< | LT |
> | GT |
<= | LTE |
>= | GTE |
identifier (e.g., "aardvark", "BIG_PENGUIN", "fLaMInGo_17", "ot73r") | IDENT XXXX [where XXXX is the identifier itself] |
number (e.g., "17", "101", "90210", "0", "8675309") | NUMBER XXXX [where XXXX is the number itself] |
; | SEMICOLON |
: | COLON |
, | COMMA |
( | L_PAREN |
) | R_PAREN |
:= | ASSIGN |
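As an illustration of the table above, here is a minimal sketch of a tokenizer written in Python rather than in whatever tool the project requires. It only covers the token mapping: the two lexical errors are not handled, and input such as "2n" or "n_" is not rejected here. The function name `tokenize` is hypothetical, and the comment syntax (`##` to the end of the line) is an assumption, since this section does not define how comments are written.

```python
import re

# Reserved words and the token each one produces (taken from the table).
KEYWORDS = {
    "program": "PROGRAM", "beginprogram": "BEGIN_PROGRAM",
    "endprogram": "END_PROGRAM", "integer": "INTEGER", "array": "ARRAY",
    "of": "OF", "if": "IF", "then": "THEN", "endif": "ENDIF", "else": "ELSE",
    "while": "WHILE", "loop": "LOOP", "endloop": "ENDLOOP", "read": "READ",
    "write": "WRITE", "and": "AND", "or": "OR", "not": "NOT",
    "true": "TRUE", "false": "FALSE",
}

# Operators and punctuation (taken from the table).
SYMBOLS = {
    "-": "SUB", "+": "ADD", "*": "MULT", "/": "DIV", "=": "EQ", "<>": "NEQ",
    "<": "LT", ">": "GT", "<=": "LTE", ">=": "GTE", ";": "SEMICOLON",
    ":": "COLON", ",": "COMMA", "(": "L_PAREN", ")": "R_PAREN", ":=": "ASSIGN",
}

# Longest symbols first, so ":=" is tried before ":" and "<=" before "<".
_symbol_alternatives = "|".join(
    re.escape(s) for s in sorted(SYMBOLS, key=len, reverse=True)
)

TOKEN_RE = re.compile(
    r"(?P<skip>\s+|##[^\n]*)"            # whitespace, or an ASSUMED "##" comment
    r"|(?P<number>[0-9]+)"
    r"|(?P<word>[A-Za-z][A-Za-z0-9_]*)"  # keyword or identifier
    r"|(?P<symbol>" + _symbol_alternatives + r")"
)


def tokenize(source):
    """Return the output lines for `source`, one string per token."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            # Proper error reporting is outside the scope of this sketch.
            raise ValueError(f"no token matches at offset {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "number":
            tokens.append(f"NUMBER {text}")
        elif kind == "word":
            tokens.append(KEYWORDS.get(text, f"IDENT {text}"))
        elif kind == "symbol":
            tokens.append(SYMBOLS[text])
        # kind == "skip": comments and whitespace produce no token
    return tokens


if __name__ == "__main__":
    print("\n".join(tokenize("n := n + 1; ## increment\n")))
```

Keywords are looked up only after a whole word has been matched, so a name such as "programs" is treated as an identifier rather than as the keyword "program" followed by "s".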
Your lexical analyzer should catch two different types of lexical errors. If either error is encountered while scanning a MINI-L program, your lexical analyzer should report the error message and then terminate immediately. The error message must include the line number, and the column position within that line, of the token associated with the error. The details are below.
Error Type 1: Unrecognized Symbol
Your lexical analyzer should report an error and terminate if an unrecognized symbol is encountered outside of a comment. For example, consider the following MINI-L program:
1. program test;
2. n : integer;
3. beginprogram
4. read n;
5. n := n + 1?
6. write n;
7. endprogram
In the above program, the "?" symbol at line 5 (which is outside of a comment) is not defined in the MINI-L language. Thus, your lexical analyzer should output an "unrecognized symbol" error when it encounters the "?" (along with line number and position number information of the problematic symbol). For example:
Error at line 5, column 14: unrecognized symbol "?"
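The position bookkeeping for this error can be sketched in Python as follows. Several things here are assumptions rather than part of the specification: the function name `first_unrecognized_symbol` is hypothetical, comments are assumed to start with `##` and run to the end of the line, line numbers are counted from 1, and columns are counted from 0 (which agrees with the "column 0" messages shown for identifiers at the start of a line). The sample input indents line 5 by four spaces, an assumption chosen so that the "?" lands at column 14.

```python
# Characters that can appear in some MINI-L token, plus blanks and tabs.
ALLOWED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_"
    "-+*/=<>;:,()"
    " \t"
)


def first_unrecognized_symbol(source, comment_start="##"):
    """Return the error message for the first bad symbol, or None."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        # Anything after the (assumed) comment marker is ignored.
        code = line.split(comment_start, 1)[0]
        for col, ch in enumerate(code):
            if ch not in ALLOWED:
                return (
                    f"Error at line {line_no}, column {col}: "
                    f'unrecognized symbol "{ch}"'
                )
    return None


SAMPLE = (
    "program test;\n"
    "n : integer;\n"
    "beginprogram\n"
    "    read n;\n"
    "    n := n + 1?\n"   # the "?" sits at 0-based column 14
    "    write n;\n"
    "endprogram\n"
)

if __name__ == "__main__":
    print(first_unrecognized_symbol(SAMPLE))
```

A real lexical analyzer would perform this check while producing tokens and would stop at the first error; the separate pass here only isolates how the line and column are derived.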
Error Type 2: Invalid Identifier
Your lexical analyzer should report an error and terminate if an invalid identifier is encountered. This can occur if the identifier starts with a digit or an underscore, or if the identifier ends with an underscore. For example, consider the following two MINI-L programs:
1. program test;
2. 2n : integer;
3. beginprogram
4. endprogram
1. program test;
2. n_ : integer;
3. beginprogram
4. endprogram
In both of the above programs, the identifier declared at line 2 is invalid. Thus, in both of these cases, your lexical analyzer should output an "invalid identifier" error when it encounters either the "2n" or the "n_". For example, in the first program above:
Error at line 2, column 0: identifier "2n" must begin with a letter
And in the second program above:
Error at line 2, column 0: identifier "n_" cannot end with an underscore
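One possible way to detect both cases is to grab the longest run of letters, digits, and underscores and then inspect its first and last characters. The Python sketch below takes that approach. The names `identifier_error` and `first_identifier_error` are hypothetical, columns are assumed to be 0-based as in the messages above, and comment handling is left out for brevity.

```python
import re

# A maximal run of letters, digits, and underscores.
WORD_RE = re.compile(r"[A-Za-z0-9_]+")
LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def identifier_error(lexeme, line_no, col):
    """Return an error message if `lexeme` is an invalid identifier, else None."""
    if lexeme.isascii() and lexeme.isdigit():
        return None  # a plain number such as "17" is not an identifier at all
    prefix = f'Error at line {line_no}, column {col}: identifier "{lexeme}"'
    if lexeme[0] not in LETTERS:
        return prefix + " must begin with a letter"
    if lexeme.endswith("_"):
        return prefix + " cannot end with an underscore"
    return None


def first_identifier_error(line, line_no):
    """Check every word on one source line and return the first error, if any."""
    for match in WORD_RE.finditer(line):
        message = identifier_error(match.group(), line_no, match.start())
        if message is not None:
            return message
    return None


if __name__ == "__main__":
    print(first_identifier_error("2n : integer;", 2))
    print(first_identifier_error("n_ : integer;", 2))
```

When a word both starts with a non-letter and ends with an underscore (for example "2n_"), this sketch reports only the first problem; the specification does not say which message should take precedence.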