cs152 Project Phase 1: Lexical Analyzer Generation Using flex

Start Date: 1/09/24
Due Dates:1/26/24

Grade Weight: 10% of total course grade
This project can be completed in groups of 3
Google Docs Group Signup Sheet

All projects can be turned in up to 1 week late. Each day the project is late, 3% will be deducted per day for up to 21%. After a week, projects will not be accepted.

Overview

For this first part of the class project, you will use the flex tool to generate a lexical analyzer for a high-level source code language you create. Create a language specfication for your custom programming language syntax, and create a name for your custom programming language (e.g. XYZ-L). The specifications would be written informally using code examples.

After designing your custom programming language syntax, you will create a lexical analyzer to parse your programming language. The lexical analyzer should take as input a program written in your custom language (i.e. XYZ-L), parse it, and output the sequence of lexical tokens associated with the program.

Here is an example language specification for a made up language MINI-L. To reiterate, you will be designing your own programming language, so your language specification will differ from the one we give you. You do not need to write a syntax diagram. That will be for Phase 2.

[The example MINI-L language specification is described in detail here.]
[The example MINI-L language output format for your lexical analyzer is described here.]

Here is some informal code examples for MINI-L:


Flex

Flex is a tool for generating lexical analyzers. Lexical analyzers scan text (a sequence of characters) and look for lexical patterns in the text. Flex requires an input file specifying a description for a lexical analyzer to generate. From this description, flex will automatically create a C program for you (called lex.yy.c) that will perform the lexical analysis.

In our department, flex is installed and can be used on "bolt".

[A brief introduction to flex can be found here.]
[For detailed information on flex here.]

For the complete flex documentation, run the command info flex from bolt.


Detailed Requirements

CS152 Language Specification Example
The following tasks will need to be performed to complete this phase of the project.
  1. Create your own programming language, and create a name for your programming language.
  2. Write the specification for a flex lexical analyzer for your custom programming language. For this phase of the project, your lexical analyzer need only output the list of tokens identified from an inputted program. Your programming language needs to have at the very least the following features:

    • Integer scalar variables (e.g. int var)
    • One Dimensional Arrays of Integers (e.g. int array[100])
    • Assignment statements (e.g. var = 100)
    • Arithmetic operators (+,-,*,/,%)
    • Associative parenthesis e.g. (a + b) * 2
    • Comparison operators (<,==,>,<=,>=,!=)
    • While Loops
    • "break" and "continue" loop control statements
    • If-then-else statements
    • Read and Write statements
    • Comments
    • Functions with multiple parameters and return exactly one single scalar result
    • There should be at least one function called main, and programs can have multiple functions.

    You may OPTIONALLY implement (not required, but may be helpful for more complicated programs):

    • Logical (boolean) AND and OR e.g. a < 10 || b > 5
    • Logical negation e.g. !(c > 20)
    • Binary (0b1010), Octal (0741), and/or Hexadecimal number formats (0xcafebabe).
    • Else-If clauses
    • For loops

    Example: write the flex specification in a file named mini_l.lex.

    Your lexer should have the following functionality:
    • The lexer should have correct rules for detecting valid, invalid identifiers based on what they defined in their language specification
    • The lexer should detect and remove comments correctly (e.g. //This is a comment should parse as one token)
    • The lexer should detect and report proper error messages for invalid identifiers and unrecognized symbols (e.g. 2a is an error)
    • Error messages should include line and column numbers and an informative, helpful error message.

  3. Run flex to generate the lexical analyzer for MINI-L using your specification.
    Example: execute the command flex mini_l.lex. This will create a file called lex.yy.c in the current directory.
  4. Compile your lexical analyzer. This will require the -lfl flag for gcc.
    Example: compile your lexical analyzer into the executable lexer with the following command: gcc -o lexer lex.yy.c -lfl. The program lexer should now be able to convert an inputted program into the corresponding list of tokens.
What you do NOT need to implement
You are implementing a simple toy compiler at the end, and you do not need to implement the complex features of real compilers. The MIL interpreter (which we will be using to compile your program) in the end only supports some basic operations, and does not implement the complex functionality of real compilers.
Language Examples
Create language examples to go with your language design. You need to create language examples for the following pieces of code:

Example Usage

Suppose your lexical analyzer is in the executable named lexer. Then for the MINI-L program fibonacci.min, your lexical analyzer should be invoked as follows:

cat fibonacci.min | lexer

The list of tokens outputted by your lexical analyzer should then appear as they do here. The tokens can be printed to the screen (standard out).

For program primes.min, the outputted tokens should look like this.

For program break.min, the outputted tokens should look like this.