Output Format for Lexical Analyzer

Your lexical analyzer should output each token identified in the input MINI-L program. Each token should appear on its own line of output, and the tokens should appear in the same order as they appear in the input program. To facilitate grading, the tokens must be output in the format described in the table below.

There are two types of lexical errors that your lexical analyzer should catch. They are described below.

Note: for this phase of the project, even syntactically incorrect MINI-L programs may still be tokenized successfully into a list of tokens. Syntax errors will be caught in the next phase of the project.

List of Tokens

The following table describes the different kinds of tokens that your lexical analyzer may output. Your lexical analyzer should ignore comments and whitespace (it should not output any tokens for these).

Lexical Pattern in the Inputted MINI-L Program	Token that Should Be Outputted
Reserved Words
program	PROGRAM
beginprogram	BEGIN_PROGRAM
endprogram	END_PROGRAM
integer	INTEGER
array	ARRAY
of	OF
if	IF
then	THEN
endif	ENDIF
else	ELSE
while	WHILE
loop	LOOP
endloop	ENDLOOP
read	READ
write	WRITE
and	AND
or	OR
not	NOT
true	TRUE
false	FALSE
Arithmetic Operators
-	SUB
+	ADD
*	MULT
/	DIV
Comparison Operators
=	EQ
<>	NEQ
<	LT
>	GT
<=	LTE
>=	GTE
Identifiers and Numbers
identifier (e.g., "aardvark", "BIG_PENGUIN", "fLaMInGo_17", "ot73r")	IDENT XXXX [where XXXX is the identifier itself]
number (e.g., "17", "101", "90210", "0", "8675309")	NUMBER XXXX [where XXXX is the number itself]
Other Special Symbols
;	SEMICOLON
:	COLON
,	COMMA
(	L_PAREN
)	R_PAREN
:=	ASSIGN
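The table above can be read as a longest-match lookup. The following Python sketch is one possible way to express it, not a required implementation. The names `RESERVED`, `SYMBOLS`, and `tokenize` are hypothetical, reserved words are assumed to be case-sensitive, and both comment skipping and the error checks described later are left out, since this table alone does not define them.

```python
import re

# Reserved words from the table, mapped to their token names.
RESERVED = {
    "program": "PROGRAM", "beginprogram": "BEGIN_PROGRAM",
    "endprogram": "END_PROGRAM", "integer": "INTEGER", "array": "ARRAY",
    "of": "OF", "if": "IF", "then": "THEN", "endif": "ENDIF", "else": "ELSE",
    "while": "WHILE", "loop": "LOOP", "endloop": "ENDLOOP", "read": "READ",
    "write": "WRITE", "and": "AND", "or": "OR", "not": "NOT",
    "true": "TRUE", "false": "FALSE",
}

# Two-character symbols come first so that "<=" is not split into "<" and "=".
SYMBOLS = [
    ("<>", "NEQ"), ("<=", "LTE"), (">=", "GTE"), (":=", "ASSIGN"),
    ("-", "SUB"), ("+", "ADD"), ("*", "MULT"), ("/", "DIV"),
    ("=", "EQ"), ("<", "LT"), (">", "GT"),
    (";", "SEMICOLON"), (":", "COLON"), (",", "COMMA"),
    ("(", "L_PAREN"), (")", "R_PAREN"),
]
SYMBOL_NAMES = dict(SYMBOLS)

TOKEN_RE = re.compile(
    r"(?P<word>[A-Za-z][A-Za-z0-9_]*)"
    r"|(?P<number>[0-9]+)"
    r"|(?P<symbol>" + "|".join(re.escape(text) for text, _ in SYMBOLS) + r")"
    r"|(?P<space>\s+)"
)


def tokenize(source):
    """Return the output lines for a well-formed MINI-L fragment."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            # Proper error reporting is outside the scope of this sketch.
            raise ValueError(f"cannot tokenize at offset {pos}")
        text = match.group()
        if match.lastgroup == "word":
            # A whole word is looked up first; anything else is an identifier.
            tokens.append(RESERVED.get(text, f"IDENT {text}"))
        elif match.lastgroup == "number":
            tokens.append(f"NUMBER {text}")
        elif match.lastgroup == "symbol":
            tokens.append(SYMBOL_NAMES[text])
        # Whitespace matches produce no token.
        pos = match.end()
    return tokens


print("\n".join(tokenize("n := n + 1;")))
```

For the input `n := n + 1;` this prints `IDENT n`, `ASSIGN`, `IDENT n`, `ADD`, `NUMBER 1`, and `SEMICOLON`, one per line. Matching a whole word before consulting the reserved-word table means that a name such as `programs` is reported as an identifier, not as `PROGRAM` followed by `s`.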

Lexical Errors to Catch

Your lexical analyzer should catch two different types of lexical errors. If any such error is encountered while scanning a MINI-L program, your lexical analyzer should report the error message and then terminate immediately. The error message must include the line number, and the column position within that line, of the token associated with the error. The details are below.

Error Type 1: Unrecognized Symbol

Your lexical analyzer should report an error and terminate if an unrecognized symbol is encountered that is outside of a comment. For example, consider the following MINI-L program:

1. program test;
2. n : integer;
3. beginprogram
4.    read n;
5.    n := n + 1?
6.    write n;
7. endprogram

In the above program, the "?" symbol at line 5 (which is outside of a comment) is not defined in the MINI-L language. Thus, your lexical analyzer should output an "unrecognized symbol" error when it encounters the "?" (along with line number and position number information of the problematic symbol). For example:

Error at line 5, column 14: unrecognized symbol "?"
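A minimal Python sketch of how the line and column of an unrecognized symbol might be tracked is shown here. The function name `first_unrecognized_symbol` and the `first_column` parameter are hypothetical. Whether columns are counted from 0 or from 1 is an assumption exposed as a parameter, and should be checked against the expected grading output. Comment skipping is omitted because the comment syntax is not defined in this section; a full analyzer would have to skip comment text before applying this check.

```python
# Every character that can appear in a token from the token table,
# plus whitespace. Anything else counts as unrecognized in this sketch.
ALLOWED_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_"
    "-+*/=<>;:,()"
    " \t\r\n"
)


def first_unrecognized_symbol(source, first_column=0):
    """Return the error message for the first bad symbol, or None."""
    # Lines are numbered from 1, as in the example programs.
    for line_no, line in enumerate(source.splitlines(), start=1):
        for offset, ch in enumerate(line):
            if ch not in ALLOWED_CHARS:
                column = offset + first_column
                return (
                    f"Error at line {line_no}, column {column}: "
                    f'unrecognized symbol "{ch}"'
                )
    return None
```

Because the analyzer must terminate after the first error, the function returns as soon as one bad symbol is found; the caller would print the message and exit.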

Error Type 2: Invalid Identifier

Your lexical analyzer should report an error and terminate if an invalid identifier is encountered. This can occur if the identifier starts with a digit or an underscore, or if the identifier ends with an underscore. For example, consider the following two MINI-L programs:

1. program test;
2. 2n : integer;
3. beginprogram
4. endprogram

1. program test;
2. n_ : integer;
3. beginprogram
4. endprogram

In both of the above programs, the identifier declared at line 2 is invalid. Thus, in both of these cases, your lexical analyzer should output an "invalid identifier" error when it encounters either the "2n" or the "n_". For example, in the first program above:

Error at line 2, column 0: identifier "2n" must begin with a letter

And in the second program above:

Error at line 2, column 0: identifier "n_" cannot end with an underscore
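The identifier rules can be sketched as a check on each maximal run of letters, digits, and underscores. The Python below is an illustration under stated assumptions: the function names are hypothetical, a run made only of digits is treated as a number, a run that starts with an underscore reuses the "must begin with a letter" wording, and the leading-character check is applied before the trailing-underscore check when both rules are violated.

```python
import re


def identifier_problem(lexeme):
    """Return why a letter/digit/underscore run is invalid, or None."""
    if re.fullmatch(r"[0-9]+", lexeme):
        return None  # a plain number, not an identifier
    if not re.match(r"[A-Za-z]", lexeme):
        # Covers both a leading digit and a leading underscore.
        return "must begin with a letter"
    if lexeme.endswith("_"):
        return "cannot end with an underscore"
    return None


def invalid_identifier_message(lexeme, line, column):
    """Format the error message, or return None for a valid lexeme."""
    problem = identifier_problem(lexeme)
    if problem is None:
        return None
    return f'Error at line {line}, column {column}: identifier "{lexeme}" {problem}'


print(invalid_identifier_message("2n", 2, 0))
print(invalid_identifier_message("n_", 2, 0))
```

With the line and column values from the two example programs, the two calls print the same messages as shown above. Scanning the whole run first, then classifying it, is what lets `2n` be reported as a single invalid identifier instead of the number `2` followed by the identifier `n`.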