MiniLangPP Compiler

Custom programming language compiler built from scratch with lexical analysis, parsing, and code generation

C++, Compiler Design, Programming Languages, Systems Programming

Project Overview

MiniLangPP is a compiler for a custom programming language, written from scratch in C++. It covers the main stages of compiler construction: lexical analysis, syntax parsing, semantic analysis, and code generation. The compiler translates MiniLangPP source code into x86-64 assembly, which is then assembled with NASM and linked with GCC to produce an executable.

The project applies formal language theory and automata (a finite state automaton recognizes tokens), parsing algorithms (recursive descent over an LL(1) grammar), and basic code optimization (peephole and constant folding). It serves as an end-to-end example of systems programming and compiler engineering.

Key Features

Lexical Analysis

Complete tokenization with support for keywords, identifiers, operators, and literals.
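
As a rough illustration of what tokenization involves, here is a minimal hand-written scanner. It is a sketch, not the project's actual Lexer: the TokenKind enum, the Token layout, and the operator set (">=", "==", "!=" in particular) are assumptions, and the keyword list is taken only from the sample program on this page.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical token model; the project's real Token type is not shown.
enum class TokenKind { Keyword, Identifier, Number, String, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
    int line;  // 1-based line where the token starts
};

// Keywords that appear in the sample program (assumed, not exhaustive).
inline bool isKeyword(const std::string& word) {
    static const std::unordered_set<std::string> keywords = {
        "int", "float", "string", "function", "if", "else", "return"};
    return keywords.count(word) > 0;
}

inline std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    int line = 1;
    std::size_t i = 0;
    const std::size_t n = src.size();
    auto at = [&](std::size_t k) { return static_cast<unsigned char>(src[k]); };

    while (i < n) {
        const char c = src[i];
        if (c == '\n') { ++line; ++i; continue; }
        if (std::isspace(at(i))) { ++i; continue; }
        if (c == '/' && i + 1 < n && src[i + 1] == '/') {  // line comment
            while (i < n && src[i] != '\n') ++i;
            continue;
        }
        const std::size_t start = i;
        if (std::isalpha(at(i)) || c == '_') {
            while (i < n && (std::isalnum(at(i)) || src[i] == '_')) ++i;
            std::string word = src.substr(start, i - start);
            TokenKind kind = isKeyword(word) ? TokenKind::Keyword : TokenKind::Identifier;
            tokens.push_back({kind, word, line});
        } else if (std::isdigit(at(i))) {
            // Simplification: digits and dots, so "1.2.3" is not rejected here.
            while (i < n && (std::isdigit(at(i)) || src[i] == '.')) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start), line});
        } else if (c == '"') {
            // Simplification: no escape sequences and no multi-line strings.
            ++i;
            while (i < n && src[i] != '"') ++i;
            tokens.push_back({TokenKind::String, src.substr(start + 1, i - start - 1), line});
            if (i < n) ++i;  // consume the closing quote
        } else {
            // Try two-character operators before single characters.
            const std::string two = src.substr(i, 2);
            const bool isTwo = two == "->" || two == "<=" || two == ">=" ||
                               two == "==" || two == "!=";
            tokens.push_back({TokenKind::Symbol, isTwo ? two : std::string(1, c), line});
            i += isTwo ? 2 : 1;
        }
    }
    return tokens;
}
```

Calling tokenize on the sample program's source yields one Token per keyword, identifier, literal, and operator, each tagged with its line so later phases can report positions.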

Syntax Parsing

Recursive descent parser with proper error handling and recovery mechanisms.
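
To make the idea of recursive descent with error recovery concrete, the sketch below parses a tiny arithmetic grammar, one function per grammar rule, and uses panic-mode recovery: on an error it records a message and skips to the next ';'. Everything here is illustrative. The MiniParser class, its grammar, and the choice to evaluate expressions directly instead of building an AST are assumptions made to keep the example short; the real parser returns AST nodes.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical grammar for this sketch (LL(1)):
//   program   -> { statement }
//   statement -> expr ';'
//   expr      -> term { ('+' | '-') term }
//   term      -> factor { '*' factor }
//   factor    -> NUMBER | '(' expr ')'
class MiniParser {
public:
    explicit MiniParser(std::string source) : src_(std::move(source)) {}

    std::vector<std::string> errors;  // one message per rejected statement

    // Returns the value of every statement that parsed cleanly.
    std::vector<long> parseProgram() {
        std::vector<long> values;
        for (skipSpace(); pos_ < src_.size(); skipSpace()) {
            try {
                long v = parseExpr();
                expect(';');
                values.push_back(v);
            } catch (const std::runtime_error& e) {
                errors.push_back(e.what());
                synchronize();  // panic mode: resume after the next ';'
            }
        }
        return values;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    void skipSpace() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }
    char peek() { skipSpace(); return pos_ < src_.size() ? src_[pos_] : '\0'; }
    std::runtime_error error(const std::string& what) const {
        return std::runtime_error("offset " + std::to_string(pos_) + ": " + what);
    }
    void expect(char c) {
        if (peek() != c) throw error(std::string("expected '") + c + "'");
        ++pos_;
    }
    void synchronize() {
        while (pos_ < src_.size() && src_[pos_] != ';') ++pos_;
        if (pos_ < src_.size()) ++pos_;  // consume the ';' itself
    }
    long parseExpr() {
        long v = parseTerm();
        for (char op = peek(); op == '+' || op == '-'; op = peek()) {
            ++pos_;
            long rhs = parseTerm();
            v = (op == '+') ? v + rhs : v - rhs;
        }
        return v;
    }
    long parseTerm() {
        long v = parseFactor();
        while (peek() == '*') {
            ++pos_;
            v *= parseFactor();
        }
        return v;
    }
    long parseFactor() {
        char c = peek();
        if (c == '(') {
            ++pos_;
            long v = parseExpr();
            expect(')');
            return v;
        }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            long v = 0;
            while (pos_ < src_.size() &&
                   std::isdigit(static_cast<unsigned char>(src_[pos_])))
                v = v * 10 + (src_[pos_++] - '0');
            return v;
        }
        throw error("expected a number or '('");
    }
};
```

Because recovery resumes at a statement boundary, one malformed statement produces one error and the statements after it are still parsed.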

Semantic Analysis

Type checking, scope resolution, and symbol table management.
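
One common way to combine scope resolution with a symbol table is a stack of hash maps, searched innermost first. The sketch below assumes that design. The SymbolTable and Type names are hypothetical, and the typing rule in addResultType (int promotes to float, strings only combine with strings) is a guess based on the sample program's use of toString, not a documented rule of the language.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type set, taken from the types used in the sample program.
enum class Type { Int, Float, String };

class SymbolTable {
public:
    SymbolTable() { enterScope(); }  // start with the global scope

    void enterScope() { scopes_.emplace_back(); }
    void exitScope() {
        if (scopes_.size() > 1) scopes_.pop_back();  // never pop the global scope
    }

    // Returns false if the name is already declared in the current scope.
    bool declare(const std::string& name, Type type) {
        return scopes_.back().emplace(name, type).second;
    }

    // Searches from the innermost scope outward, so inner names shadow outer ones.
    std::optional<Type> lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::unordered_map<std::string, Type>> scopes_;
};

// Assumed typing rule for '+': returns the result type, or std::nullopt
// when the operands are incompatible (a type error).
inline std::optional<Type> addResultType(Type lhs, Type rhs) {
    if (lhs == Type::String || rhs == Type::String) {
        if (lhs == rhs) return Type::String;  // string + string concatenates
        return std::nullopt;                  // mixing would need toString(...)
    }
    return (lhs == Type::Float || rhs == Type::Float) ? Type::Float : Type::Int;
}
```

A semantic pass would call enterScope and exitScope around each block or function body, declare on every variable declaration, and lookup on every use.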

Code Generation

Assembly code generation with basic optimization techniques.

Error Reporting

Comprehensive error messages with line numbers and suggestions.
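
Suggestions in error messages are often produced by comparing the offending word against known names with an edit distance. The sketch below shows that approach under stated assumptions: the message layout ("file:line: error: ..."), the formatError helper, and the distance threshold of 2 are all invented for illustration and may not match what the compiler prints.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance using two rolling rows of the DP table.
inline std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Builds a diagnostic and appends the closest candidate, if one is
// within an edit distance of 2 (an arbitrary threshold for this sketch).
inline std::string formatError(const std::string& file, int line,
                               const std::string& message,
                               const std::string& badWord,
                               const std::vector<std::string>& candidates) {
    std::string out = file + ":" + std::to_string(line) + ": error: " + message;
    std::string best;
    std::size_t bestDist = 3;  // only distances below this are suggested
    for (const std::string& candidate : candidates) {
        const std::size_t d = editDistance(badWord, candidate);
        if (d < bestDist) {
            bestDist = d;
            best = candidate;
        }
    }
    if (!best.empty()) out += " (did you mean '" + best + "'?)";
    return out;
}
```

For example, reporting the misspelling "retrun" against the candidates "return", "function", and "if" appends a suggestion for "return", since it is the only candidate within two edits.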

Standard Library

Built-in functions for I/O operations and basic data manipulation.

MiniLangPP Syntax

Here's an example of MiniLangPP syntax:

// Variable declarations
int x = 10;
float pi = 3.14159;
string message = "Hello, World!";

// Function definition
function factorial(int n) -> int {
    if (n <= 1) {
        return 1;
    } else {
        return n * factorial(n - 1);
    }
}

// Main program
function main() -> int {
    int result = factorial(5);
    print("Factorial of 5 is: " + toString(result));
    return 0;
}

Implementation Details

The compiler is implemented in C++17 with the following architecture:

#include <string>
#include <vector>

// Token and AST are assumed to be defined elsewhere in the compiler.
struct Token;
struct AST;

// Lexer - Tokenization
class Lexer {
public:
    std::vector<Token> tokenize(const std::string& source);
    Token nextToken();
    bool isKeyword(const std::string& word);
};

// Parser - Syntax Analysis
class Parser {
public:
    AST* parseProgram();
    AST* parseStatement();
    AST* parseExpression();
    void error(const std::string& message);
};

// Code Generator - Assembly Output
class CodeGenerator {
public:
    void generate(AST* tree);
    void emitInstruction(const std::string& instruction);
    void optimizeCode();
};

Compiler Phases:

  • Lexical Analysis: Finite state automaton for token recognition
  • Syntax Analysis: Recursive descent parser with LL(1) grammar
  • Semantic Analysis: Symbol table and type checking
  • Code Generation: Three-address code intermediate representation, lowered to assembly
  • Optimization: Basic peephole and constant folding optimizations
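
The last two phases can be sketched together: constant folding rewrites an expression tree so that operators with two constant operands are evaluated at compile time, and three-address code generation then emits one instruction per remaining operator. This is a minimal sketch, not the project's code. The Expr node layout, the helper constructors, the TacGen class, and the textual "t0 = a + b" format are all assumptions; peephole optimization is not shown.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Expr;
using ExprPtr = std::unique_ptr<Expr>;

// Hypothetical expression node for this sketch.
struct Expr {
    enum Kind { Const, Var, Binary };
    Kind kind = Const;
    long value = 0;    // used when kind == Const
    std::string name;  // used when kind == Var
    char op = 0;       // '+', '-' or '*' when kind == Binary
    ExprPtr lhs, rhs;
};

inline ExprPtr constant(long v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}
inline ExprPtr variable(std::string name) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Var;
    e->name = std::move(name);
    return e;
}
inline ExprPtr binary(char op, ExprPtr lhs, ExprPtr rhs) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Binary;
    e->op = op;
    e->lhs = std::move(lhs);
    e->rhs = std::move(rhs);
    return e;
}

// Constant folding: fold children first, then replace an operator whose
// operands are both constants with its value (overflow is ignored here).
inline ExprPtr fold(ExprPtr e) {
    if (e->kind != Expr::Binary) return e;
    e->lhs = fold(std::move(e->lhs));
    e->rhs = fold(std::move(e->rhs));
    if (e->lhs->kind == Expr::Const && e->rhs->kind == Expr::Const) {
        const long a = e->lhs->value, b = e->rhs->value;
        return constant(e->op == '+' ? a + b : e->op == '-' ? a - b : a * b);
    }
    return e;
}

// Three-address code: each instruction has at most one operator.
struct TacGen {
    std::vector<std::string> code;
    int nextTemp = 0;

    // Emits code for e and returns the operand that holds its value.
    std::string emit(const Expr& e) {
        if (e.kind == Expr::Const) return std::to_string(e.value);
        if (e.kind == Expr::Var) return e.name;
        const std::string a = emit(*e.lhs);
        const std::string b = emit(*e.rhs);
        const std::string t = "t" + std::to_string(nextTemp++);
        code.push_back(t + " = " + a + " " + e.op + " " + b);
        return t;
    }
};
```

For the expression x + 2 * 3, emitting without folding needs two instructions and a temporary for the product, while folding first reduces the tree to x + 6 and a single instruction.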

Usage

# Compile the compiler
g++ -std=c++17 -o minilangpp *.cpp

# Compile a MiniLangPP program
./minilangpp program.mlpp -o output.asm

# Assemble and link (using NASM and GCC)
nasm -f elf64 output.asm -o output.o
gcc output.o -o program
./program

Command Line Options:

  • -o <file> - Specify output file
  • -v - Verbose compilation output
  • -O - Enable optimizations
  • --ast - Print abstract syntax tree
  • --tokens - Print token stream
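
For readers curious how a driver might handle these flags, here is a small sketch of an argument parser for exactly the options listed above. The Options struct and parseArgs function are hypothetical, the default output name "output.asm" is an assumption (the project does not state a default), and the error policy of rejecting unknown flags is a design choice made for this example.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct Options {
    std::string input;
    std::string output = "output.asm";  // assumed default, not stated by the project
    bool verbose = false;      // -v
    bool optimize = false;     // -O
    bool printAst = false;     // --ast
    bool printTokens = false;  // --tokens
};

// Returns std::nullopt for an unknown flag, a missing -o argument,
// or a missing or repeated input file.
inline std::optional<Options> parseArgs(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == "-o") {
            if (i + 1 >= args.size()) return std::nullopt;
            opts.output = args[++i];
        } else if (arg == "-v") {
            opts.verbose = true;
        } else if (arg == "-O") {
            opts.optimize = true;
        } else if (arg == "--ast") {
            opts.printAst = true;
        } else if (arg == "--tokens") {
            opts.printTokens = true;
        } else if (!arg.empty() && arg[0] == '-') {
            return std::nullopt;  // unknown flag
        } else if (opts.input.empty()) {
            opts.input = arg;
        } else {
            return std::nullopt;  // more than one input file
        }
    }
    if (opts.input.empty()) return std::nullopt;
    return opts;
}
```

In a main function this would be called as parseArgs(std::vector<std::string>(argv + 1, argv + argc)), printing a usage message when it returns std::nullopt.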

Future Enhancements

  • Advanced Optimizations: Loop unrolling, dead code elimination
  • Debugging Support: Debug symbol generation and GDB integration
  • Object-Oriented Features: Classes, inheritance, polymorphism
  • Standard Library Expansion: File I/O, networking, data structures
  • IDE Integration: Syntax highlighting and IntelliSense support
  • Cross-Platform Support: ARM and x86 code generation