beluga

a standard C compiler


beluga is a standard C compiler being developed based on an earlier version of lcc. Like its ancestor, it supports C90 (to be precise, ISO/IEC 9899:1990); coverage is planned to extend to C99 and eventually to C11.

Compared to its parent, beluga carefully implements the language standard and thus provides production-quality diagnostics including caret diagnostics, range highlighting, typedef preservation and macro expansion tracking:

[Screenshot: enhanced front-end features]

The generated code is not highly optimized, but it is satisfactory for daily use. (This is a hobby project; it is never easy for me alone to catch up with production compilers like gcc and clang+llvm.)

beluga currently produces assembly output for x86 only (and uses an assembler from the target system). Thanks to its origin, however, it can be readily retargeted to other platforms. Support for 64-bit machines (like x86-64) requires significant new features to be implemented and is one of the most important goals of this project.

I'm also redesigning each part of the compiler for better structure (e.g., see below for an integrated preprocessor) and plan to completely replace the back-end interface and implementation to ease the adoption of more ambitious optimization techniques, mainly based on a CFG.

An integrated preprocessor

The preprocessor, formerly developed as a separate executable under the name sea-canary, has been integrated into the compiler. It reads source code and delivers tokens (not characters) to the compiler proper via a token stream (not via a temporary file). It is fairly fast, is correct enough to pass many complicated test cases, produces highly compact output and has rich diagnostics. For example, with the -Wtoken-paste-order option, it catches code that subtly depends on the unspecified evaluation order of the ## operator, like this:

#define concat(x, y, z) x ## y ## z
concat(3.14e, -, f)    /* non-portable */

and, due to the line mapper shared by the compiler, it pinpoints problematic spots as precisely as possible:

[Screenshot: range highlighting on a sub-expression from macro expansion]

The current version conforms to C90, but supports features like empty arguments and variadic macros introduced in C99 and widely used now.
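The features mentioned above can be illustrated with a small sketch. It is not taken from beluga's test suite, and all macro and function names here are invented for the example. It shows a token paste with a single ## (so no evaluation order is involved), a C99 variadic macro, and a C99 empty macro argument:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Only one ## is involved, so the result cannot depend on evaluation order. */
#define PASTE2(a, b) a ## b

/* Variadic macro (C99): forwards a format string and its arguments. */
#define FORMAT_INTO(buf, ...) sprintf(buf, __VA_ARGS__)

/* An empty first argument (C99) omits the qualifier at the call site. */
#define DECLARE(qual, type, name) qual type name

int paste_demo(void)
{
    int PASTE2(coun, ter) = 41;    /* declares a variable named counter */

    return counter + 1;
}

size_t format_demo(char *buf)
{
    DECLARE(, int, major) = 9;         /* expands to: int major = 9; */
    DECLARE(const, int, minor) = 0;    /* expands to: const int minor = 0; */

    assert(buf != NULL);
    FORMAT_INTO(buf, "C%d%d", major, minor);
    return strlen(buf);
}
```

By contrast, the concat() example above chains two ## operators, which is where the unspecified order starts to matter.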

[Screenshot: running beluga]

How to install

This package provides no automated way to install; beluga is not a big compiler and manual installation is not very difficult.

I wrote this guide based on an installation on my Gentoo Linux (x86-64) machine. Installing on non-Linux systems probably requires meaningful changes to the package; for example, the headers that replace part of the standard headers may vary.

In this document, the term beluga, depending on the context, refers to the compiler implementation or the whole package including a preprocessor and a driver.

Installation directory

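Before installation, you need to determine the directories into which built executables, supporting libraries and headers are placed. In this document, it is assumed that executables go into /usr/local/bin/, and that the support object and headers go under /usr/local/lib32/bcc/ (headers in its include/ subdirectory).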

As indicated by the paths, this means system-wide or global installation. Local installation is also possible by simply changing those paths to your local ones (always use absolute paths; e.g., /home/username/var/bin/ instead of ~/user/var/bin/ or ./var/bin/).

Some of this configuration has to be put into the files beluga.h, as.h and ld.h (all shown below), which are incorporated to build beluga's driver, bcc.

If you find a directory under bcc/ whose configuration meets your needs, changing the symbolic link bcc/host to point to that directory will save you labor.

beluga.h for my system looks like:

"/usr/local/bin/beluga",
"-U__GNUC__",
/* "-D__STRICT_ANSI__", */
"-D_POSIX_SOURCE",
"-D__i386__",
"-D__unix__",
"-D__linux__",
"-D__gnuc_va_list=va_list",
"$1",
"--include-system=/usr/local/lib32/bcc/include",
"--include-system=/usr/local/lib32/bcc/gcc/include",
"--include-system=/usr/local/include",
"--include-system=/usr/local/lib32/bcc/gcc/include-fixed",
"--include-system=/usr/include",
"--target=x86-linux",
"-v",
"-o", "$3",
"$2",

The first line invokes beluga installed in /usr/local/bin/ and the rest specify options to it. beluga.h is pulled in by #include, so C comments can be used to exclude unnecessary lines. "$1", "$2" and "$3" have special meanings: they are replaced by user-provided options, the input file name and the output file name, respectively. For example, when you run bcc with these options:

-I/path/to/headers -o foo.o -c foo.c

the options for the preprocessor (-I/path/to/headers in this example) replace $1 and foo.c replaces $2. $3 is replaced by a generated temporary file name through which the result is passed to the assembler.

Note that options for system header paths (starting with --include-system= above) follow $1 to let the preprocessor inspect user-provided paths given by -isystem first (if any); the driver translates -isystem into --include-system to deliver to the preprocessor.
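The placeholder substitution described above can be sketched as follows. This is a minimal illustration, not bcc's actual code: the function expand_template, the array sample_template and the temporary file name used in the usage note are all hypothetical, and the template is a shortened stand-in for beluga.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shortened stand-in for beluga.h (hypothetical); a driver could fill such
   an array by #including the configuration file between the braces. */
const char *const sample_template[] = {
    "/usr/local/bin/beluga",
    "$1",
    "--include-system=/usr/include",
    "-o", "$3",
    "$2",
    NULL
};

/* Expands tmpl into out, which has room for cap pointers including the
   terminating NULL. "$1" becomes every string in opts, "$2" becomes input
   and "$3" becomes output. Returns the argument count, or -1 if out is
   too small. */
int expand_template(const char *const tmpl[], const char *const opts[],
                    const char *input, const char *output,
                    const char *out[], size_t cap)
{
    size_t n = 0, i, j;

    assert(tmpl != NULL && opts != NULL && out != NULL && cap > 0);
    for (i = 0; tmpl[i] != NULL; i++) {
        if (strcmp(tmpl[i], "$1") == 0) {
            /* user options may be empty or expand to several arguments */
            for (j = 0; opts[j] != NULL; j++) {
                if (n + 1 >= cap)
                    return -1;
                out[n++] = opts[j];
            }
        } else {
            if (n + 1 >= cap)
                return -1;
            if (strcmp(tmpl[i], "$2") == 0)
                out[n++] = input;
            else if (strcmp(tmpl[i], "$3") == 0)
                out[n++] = output;
            else
                out[n++] = tmpl[i];
        }
    }
    out[n] = NULL;
    return (int)n;
}
```

Calling expand_template with the single option -I/path/to/headers, the input foo.c and a made-up temporary name such as /tmp/bcc1.s yields an argument vector in which the user option sits before the --include-system entry, matching the ordering rule explained above.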

beluga relies on an assembler and a linker from the target system, so you have to make sure the driver can access them by giving proper paths in as.h and ld.h.

This shows an example of as.h:

"/usr/bin/as",
"--32",
"$1",
"-o", "$3",
"$2",

The --32 option forces the assembler to accept x86 assembly code on an x86-64 system; beluga is a 32-bit compiler while my system is 64-bit.

ld.h looks complicated:

"/usr/bin/ld",
"-m", "elf_i386",
"-dynamic-linker", "/lib/ld-linux.so.2",
"/usr/lib32/crt1.o",
"/usr/lib32/crti.o",
"/usr/local/lib32/bcc/gcc/32/crtbegin.o",
"-L/usr/local/lib32/bcc/gcc/32",
"-L/usr/lib32",
"-L/lib32",
"-L/usr/x86_64-pc-linux-gnu/lib",
"-L/usr/lib",
"$2",
"-lc",
"/usr/local/lib32/bcc/gcc/32/crtend.o",
"/usr/lib32/crtn.o",
"-o", "$3",
"$1",
"/usr/local/lib32/bcc/xfloat.o",

In the linking phase, start-up code and supporting libraries are linked to build an executable, which explains why there are many options to ld. The last one, xfloat.o, is a support object file for the compiler-provided float.h.
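One quick way to check that the compiler-provided float.h and its support object are wired up correctly is to compile and link code that uses the float.h macros. The sketch below is only an illustration: the function names are invented here, and the limits are the minimum requirements of C90 (5.2.4.2.2), not values taken from beluga's sources.

```c
#include <assert.h>
#include <float.h>

/* Returns 1 if <float.h> meets the minimum limits required by C90,
   0 otherwise. */
int float_h_is_sane(void)
{
    return FLT_RADIX >= 2
        && FLT_DIG >= 6 && DBL_DIG >= 10 && LDBL_DIG >= 10
        && FLT_EPSILON <= 1E-5 && DBL_EPSILON <= 1E-9
        && FLT_MAX >= 1E+37 && DBL_MAX >= 1E+37
        && FLT_MIN <= 1E-37 && DBL_MIN <= 1E-37;
}

/* Aborts through assert() when the header looks broken. */
void check_float_h(void)
{
    assert(float_h_is_sane());
}
```

If linking such code fails with undefined symbols, the path to xfloat.o in ld.h is a likely place to look.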

Search paths for system headers, necessary start-up files and paths to system libraries can be inspected by running an existing compiler (for example, gcc) with an option to display program invocations as in:

gcc -v hello.c

Configuration macros

When building beluga, it is necessary to define several macros properly to select optional features and to pass environmental information.

The following macros are used by ocelot that beluga depends on:

The macros used in common include:

Macros for the preprocessor proper are:

Besides SYSTEM_HEADER_DIR, which is used at build time, there are two other ways to set search paths for system headers. One is, as already explained, giving --include-system options in beluga.h, and the other is using the environment variables CPATH and C_INCLUDE_PATH at run time. beluga searches for system headers in the order:

These all specify system header paths. -I options to the driver exist for non-system header paths, which are searched before the system paths. beluga does its best to ignore redundant paths and to keep non-system paths from overriding system ones; for example, -I /usr/include is silently ignored when bcc is built with --include-system=/usr/include.
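The rule about redundant paths can be sketched as below. This is a simplified model under stated assumptions, not beluga's implementation: paths are compared textually after dropping trailing slashes, and the name build_search_list and its helpers are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of path without trailing slashes (a lone "/" is kept). */
static size_t trimmed_len(const char *path)
{
    size_t n = strlen(path);

    while (n > 1 && path[n - 1] == '/')
        n--;
    return n;
}

/* Returns 1 if a and b name the same directory textually. */
static int same_dir(const char *a, const char *b)
{
    size_t la = trimmed_len(a), lb = trimmed_len(b);

    return la == lb && strncmp(a, b, la) == 0;
}

/* Returns 1 if path already appears among the first n entries of list. */
static int contains(const char *const list[], size_t n, const char *path)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (same_dir(list[i], path))
            return 1;
    return 0;
}

/* Merges user (-I) and system directories into out: user ones first, then
   system ones, skipping duplicates and any user directory that is also a
   system directory. out needs room for nuser + nsys entries. Returns the
   number of entries written. */
size_t build_search_list(const char *const user[], size_t nuser,
                         const char *const sys[], size_t nsys,
                         const char *out[])
{
    size_t n = 0, i;

    assert(out != NULL);
    for (i = 0; i < nuser; i++)
        if (!contains(sys, nsys, user[i]) && !contains(out, n, user[i]))
            out[n++] = user[i];
    for (i = 0; i < nsys; i++)
        if (!contains(out, n, sys[i]))
            out[n++] = sys[i];
    return n;
}
```

With user directories ./include and /usr/include/ and system directories /usr/local/include and /usr/include, the user-supplied /usr/include/ is dropped, so the system ordering is left untouched. A real implementation would probably also want to resolve symbolic links before comparing.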

Lastly, this macro is for the driver (bcc):

When passing a C string with the -D option, do not forget to escape double quotes with backslashes; for instance, -DTMP_DIR=\"var/tmp/\".
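The reason for the escaping is that the macro has to expand to a C string literal. A minimal sketch, assuming TMP_DIR is used as a string literal in the source; the fallback value "/tmp/" and the function temp_path are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback for builds that do not pass -DTMP_DIR=\"...\". */
#ifndef TMP_DIR
#define TMP_DIR "/tmp/"
#endif

/* Writes TMP_DIR followed by name into buf, which holds cap bytes.
   Returns 0 on success, -1 if the result would not fit. */
int temp_path(char *buf, size_t cap, const char *name)
{
    size_t need;

    assert(buf != NULL && name != NULL);
    need = strlen(TMP_DIR) + strlen(name) + 1;    /* +1 for the '\0' */
    if (need > cap)
        return -1;
    strcpy(buf, TMP_DIR);
    strcat(buf, name);
    return 0;
}
```

Without the backslashes, the shell strips the double quotes and the macro expands to bare tokens instead of a string, so calls such as strlen(TMP_DIR) no longer compile.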

Building beluga

The usual way to build beluga on a Unix-like machine is to run make in the project root as follows:

CFLAGS="-DMEM_MAXALIGN=4 -DHAVE_COLOR -DHAVE_ICONV -DSHOW_WARNCODE -DHAVE_REALPATH" make

If you are on an x86-64 machine, it is necessary to add -m32 to both CFLAGS and LDFLAGS:

CFLAGS="-DMEM_MAXALIGN=4 -DHAVE_COLOR -DHAVE_ICONV -DSHOW_WARNCODE -DHAVE_REALPATH -m32" LDFLAGS="-m32" make

(Make sure that your system is able to build binaries for x86. For example, running yum install glibc-devel.i686 libgcc.i686 on Fedora-based distros or sudo apt-get install gcc-multilib on Ubuntu Linux installs the necessary components.)

A successful build generates two executables, bcc and beluga, in the build/ directory.

Copying files

The generated executables have to be copied into the directory you chose. Assuming you are in the project root,

cp build/{bcc,beluga} /usr/local/bin/

will do that. (Of course, ensure you have proper permission, e.g., by letting sudo run that command.)

Also copy a support object and headers to override existing ones:

mkdir -p /usr/local/lib32/bcc
cp build/xfloat.o /usr/local/lib32/bcc/
cp -Lr build/include /usr/local/lib32/bcc/

beluga relies on existing libraries and their headers, and therefore needs to refer to them. To avoid hard-coding a path to these resources, it is useful to create a symbolic link to them, which is what /usr/local/lib32/bcc/gcc is for; for instance:

ln -s /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3 /usr/local/lib32/bcc/gcc

on my machine. This path to gcc's resources was also obtained from gcc -v.

The installation is now finished. By compiling a small program that uses standard headers,

bcc -W -Wall hello.c

you can check your installation. Or, by adding CC=bcc when invoking make to build beluga, you can see beluga compile itself.

Any troubles?

If you encounter any problem while installing beluga, let me know so that I can help. The version of your distro and a simple description of the problem would be enough.

Repository

You can browse the repository through the web, or clone it as follows:

git clone https://github.com/mycoboco/beluga.git

If you want to contribute to the project, simply fork it and send me pull requests.


Issue tracker

beluga uses GitHub as its issue tracker. If you'd like to file a bug or ask a question, do not hesitate to post a new issue; a self-describing title or a short text to deliver your idea would be enough. Although the issues I have posted are all written in English, nothing keeps you from posting in Korean.


Try it out

You can play with sea-canary and the front-end of beluga below; both were built by beluga itself. Not all common headers you can find on a Unix-like system are available in the sandbox; only 15 standard headers provided by C90 are allowed to be #included. The code you give this site is not saved on the server.

For simplicity's sake, both sea-canary and beluga for this try-it-out are configured to stop after 5 errors (reporting "too many errors"), and are restricted to handling code up to about 2MB in size. These are, of course, limitations only of this page and not of the implementations per se.

License

I do not wholly hold the copyright of this package; beluga is written based on lcc whose copyright is specified in LICENSE.lcc.


To the parts I added or modified (including the integrated preprocessor) and the programs I wrote from scratch, the following copyright applies:

Copyright (C) 2015-2017 by Jun Woong.

The package was written so as to conform with the Standard C published by ISO 9899:1990 and ISO 9899:1999.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

Questions or suggestions?

I'd be glad to hear your opinion. Do not hesitate to contact me via email (woong.jun at gmail.com).