Compiler Architecture

From Open Watcom


The architecture of Open Watcom compilers is not unusual. Like many other compilers, they are separated into a language-specific front end and an architecture-specific back end. These components are often referred to in shorthand; for instance, cfe is the C Front End, and cg is the Code Generator or back end.

Unlike some other compilers, however, Open Watcom does not separate the front and back ends into separate executables that communicate through intermediate files. Nor is there a formalized intermediate language that would be used for this purpose. Instead, the back end is structured as a library with a callable public interface.

Currently there are three front ends available: C, C++, and F77. There is no reason why other languages, such as Pascal or Ada, could not be supported. Around 1996, Watcom worked on a Java compiler using the same technology. The F77 front end is very old, with a lineage going back to the 1970s. The C compiler originated in the early 1980s; like the F77 front end, it was originally written in a language called WSL (Watcom Systems Language) and later converted into C. The C++ front end is far more modern and better written, and was initially developed around 1991.

Contrary to what many people believe, the Watcom back end was never x86 specific, and in fact x86 was not even the first target. Remnants of Waterloo C, which targeted MVS on IBM S/370, may still be found in the Open Watcom source code. In the 1990s, Watcom added support for RISC architectures and developed an Alpha AXP code generator, intended for use with Windows NT. A very incomplete PowerPC port also exists, as well as a MIPS R3000 port. A code generator for Sun SPARC was started a long time ago but never got very far.


C Front End

The cfe lives in the bld/cc directory (C Compiler). It is an old-school front end using a recursive-descent parser and a hand-written lexer. It has recently been modernized to support a number of useful C99 features. Compared to most other compilers, it is very fast and provides good diagnostics.

The cfe supports pre-compiled headers, although only macros and declarations can be used at this time, not function definitions. The data structures involved are processed in pchdr.c, which converts all pointers into offsets before they are written out to disk, and back into pointers when they are read back. Pre-compiled headers provide a significant performance boost, especially when using the notoriously bloated Win32 API include files.
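The pointer-to-offset conversion described above can be illustrated with a minimal sketch. Everything in it (the Node and DiskNode types, the function names, the index-plus-one encoding) is a hypothetical illustration of the general technique, not the actual layout or code in pchdr.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory record: a linked list of macro names. */
typedef struct Node {
    struct Node *next;      /* real pointer, valid only in this process */
    char         name[16];
} Node;

/* Hypothetical on-disk record: the link is stored position-independently. */
typedef struct DiskNode {
    uint32_t next_off;      /* array index + 1 of the next node, 0 = NULL */
    char     name[16];
} DiskNode;

/* Before writing: turn each pointer into an offset relative to 'base'. */
void nodes_to_disk(const Node *base, size_t count, DiskNode *out)
{
    for (size_t i = 0; i < count; i++) {
        out[i].next_off = base[i].next
            ? (uint32_t)(base[i].next - base) + 1
            : 0;
        memcpy(out[i].name, base[i].name, sizeof out[i].name);
    }
}

/* After reading: turn each offset back into a pointer relative to the
 * (possibly different) address where the data now lives. */
void nodes_from_disk(const DiskNode *in, size_t count, Node *base)
{
    for (size_t i = 0; i < count; i++) {
        base[i].next = in[i].next_off ? &base[in[i].next_off - 1] : NULL;
        memcpy(base[i].name, in[i].name, sizeof base[i].name);
    }
}
```

Storing offsets rather than addresses is what lets the data be reloaded at a different address in a later compiler run.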

The preprocessor is implemented primarily in cmac1.c and cmac2.c. It is not written as cleanly as it could be. Also note that the memory management is in some cases unnecessarily complex. The compiler was initially written to run on 16-bit x86 platforms, which posed a unique challenge. While the compiler can no longer be built to run on a 16-bit platform, some of the complexities still remain. On the upside, the fact that the compiler was developed to run on a memory- and performance-constrained host greatly contributes to its speed.

The top level parsing code is located in cdecl1.c. It calls other functions to parse declarations and function bodies. Program code is parsed in cstmt.c; this module processes blocks, conditionals, loops, etc.

The front end originally used quads (result/operator/operand/operand) to represent source code. For Watcom 11.0, it was converted to use a tree representation. The tree representation enables easy constant folding as well as certain semantic checks; folding is implemented in cfold.c.
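As a rough illustration of why a tree makes constant folding easy, the following sketch folds constant subtrees bottom-up. The Expr type and the function names are hypothetical and far simpler than anything in cfold.c; overflow handling and most operators are deliberately left out.

```c
#include <stdlib.h>

/* Hypothetical expression node kinds for this sketch only. */
typedef enum { EX_CONST, EX_VAR, EX_ADD, EX_MUL } ExprKind;

typedef struct Expr {
    ExprKind     kind;
    long         value;         /* meaningful for EX_CONST only */
    struct Expr *left, *right;  /* meaningful for EX_ADD / EX_MUL only */
} Expr;

Expr *expr_new(ExprKind kind, long value, Expr *left, Expr *right)
{
    Expr *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->kind = kind;
    e->value = value;
    e->left = left;
    e->right = right;
    return e;
}

/* Fold children first; if both turn out to be constants, replace the
 * operator node in place with the computed constant. */
Expr *expr_fold(Expr *e)
{
    if (e->kind != EX_ADD && e->kind != EX_MUL)
        return e;
    e->left = expr_fold(e->left);
    e->right = expr_fold(e->right);
    if (e->left->kind == EX_CONST && e->right->kind == EX_CONST) {
        long l = e->left->value;
        long r = e->right->value;
        long v = (e->kind == EX_ADD) ? l + r : l * r;
        free(e->left);
        free(e->right);
        e->kind = EX_CONST;
        e->value = v;
        e->left = e->right = NULL;
    }
    return e;
}
```

With this sketch, a tree for (2 + 3) * 4 collapses into a single constant node, while x + 2 * 3 keeps its addition and only the right operand is folded.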

Initialization of data objects is handled in cdinit.c. This code still uses quads, primarily because no advantage was expected from switching to a tree-based representation.

Once all code is parsed, checked for errors, and found acceptable, the back end is called. The interface to the cg may be found in cgen.c. Communication with the back end is bracketed by calls to BEInit and BEFini. Note that the front end provides callback functions to the back end, most of which are in cinfo.c and cmemmgr.c. These callbacks are used, e.g., to output messages or to query auxiliary information about symbols that the back end might need.
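The general shape of such a bracketed session with callbacks might look like the sketch below. All names and signatures here are hypothetical: sketch_be_init and sketch_be_fini merely stand in for BEInit and BEFini, whose real signatures are not reproduced, and the callback structure is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical set of callbacks the front end hands to the back end. */
typedef struct FrontEndCallbacks {
    void        (*message)(const char *text);   /* output a message */
    const char *(*symbol_name)(int sym_handle); /* query symbol info */
} FrontEndCallbacks;

/* --- back end side (hypothetical) --- */
static const FrontEndCallbacks *be_callbacks;
static int be_active;
static int be_functions_emitted;

int sketch_be_init(const FrontEndCallbacks *cb)
{
    if (be_active || cb == NULL)
        return 0;
    be_callbacks = cb;
    be_active = 1;
    be_functions_emitted = 0;
    return 1;
}

/* The back end calls back into the front end to learn about a symbol. */
int sketch_be_gen_function(int sym_handle)
{
    if (!be_active)
        return 0;               /* calls outside the bracket are refused */
    be_callbacks->message(be_callbacks->symbol_name(sym_handle));
    be_functions_emitted++;
    return 1;
}

int sketch_be_fini(void)
{
    int emitted = be_functions_emitted;
    be_active = 0;
    be_callbacks = NULL;
    return emitted;
}

/* --- front end side (hypothetical) --- */
char fe_last_message[64];

void fe_message(const char *text)
{
    snprintf(fe_last_message, sizeof fe_last_message, "%s", text);
}

const char *fe_symbol_name(int sym_handle)
{
    return sym_handle == 1 ? "main" : "unknown";
}
```

A front end would fill in a FrontEndCallbacks structure, call sketch_be_init, drive code generation, and finish with sketch_be_fini. Because the back end is a library, no intermediate file is needed at any point.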

There is relatively little architecture-specific code in the cfe. The largest portion of it is support for auxiliary pragmas and x86-specific extensions (near/far/huge pointers, based pointers, etc.). Inline assembly is processed separately by libraries in the wasm directory (for x86) and the as directory (for other platforms). The inline assembly support is shared by all front ends.

C++ Front End

To be written by people who know something about it.

Fortran Front End

Currently only Fortran 77, including some Watcom-specific extensions, is implemented.

Details soon to come.

Code Generator

The cg is kept separate from language specifics, and to a large extent its interface is also architecture independent, although obviously many of its internals are heavily architecture specific. The code generator is, predictably, located in the bld/cg directory.

Most of the public interface is implemented in be.c, but this module only routes calls to internal functions. Note that the cg may be built as a DLL, although this isn't currently used for shipping compilers.

As the cg is called, it builds a tree representation of the program. Instead of working with types such as int or float, the cg uses machine-specific types; note that it is written for two's-complement architectures with 8-bit bytes. The cg uses types such as U1 (unsigned, one byte, i.e. 8 bits), I4 (signed, 4 bytes, i.e. 32 bits), FS (single-precision IEEE float) or PT (pointer).
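To make the idea of machine-specific types concrete, here is a small sketch covering only the four types named above. The enum, the function names, and the decision to pass the pointer size as a parameter are assumptions made for illustration; the real cg type definitions are not reproduced here.

```c
/* Hypothetical identifiers for the four types mentioned in the text. */
typedef enum { TY_U1, TY_I4, TY_FS, TY_PT } SketchType;

/* Size in 8-bit bytes. The pointer size depends on the target and
 * memory model, so in this sketch the caller supplies it. */
unsigned sketch_type_size(SketchType t, unsigned pointer_size)
{
    switch (t) {
    case TY_U1: return 1;
    case TY_I4: return 4;
    case TY_FS: return 4;              /* IEEE single precision */
    case TY_PT: return pointer_size;
    }
    return 0;
}

/* Can the integer constant v be represented in integer type t,
 * assuming two's complement? */
int sketch_int_fits(SketchType t, long long v)
{
    switch (t) {
    case TY_U1: return v >= 0 && v <= 255;
    case TY_I4: return v >= -2147483647LL - 1 && v <= 2147483647LL;
    default:    return 0;  /* FS and PT are not integer types here */
    }
}
```

The point of such types is that a front end's int becomes a type of definite size and signedness before the back end ever sees it, so the back end needs no knowledge of source-language type rules.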

The tree representation is again constant-folded, and a number of language- and architecture-independent optimizations, such as strength reduction, are applied. Much of this functionality is implemented in treefold.c.

The tree representation is also used for inline expansion of functions. The front end decides which functions are to be inlined based on program source code and selected options. When a call is made to an inlined function, the cg will call the front end and ask it to process the callee. The expanded function becomes part of the tree representation of the caller.

Eventually the tree is converted into a sequence of pseudo-assembler instructions. This is a medium- to low-level language that is suitable for many optimizations and can be turned into machine code relatively easily. It uses an orthogonal result/instruction/operands representation. Results and operands are 'names', which may be constants, temporaries, memory locations, etc. During code generation, all references to temporaries are turned into accesses to registers or the stack. All instructions are reduced to a form that can be turned into a single machine instruction, or in rare cases a sequence of several instructions.
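A result/instruction/operands form with 'names' could be modelled roughly as follows. The types and the placement rule are hypothetical: sending the first few temporaries to registers and the rest to stack slots is a deliberately naive stand-in, not the cg's actual register allocator.

```c
#include <stddef.h>

/* Hypothetical kinds of 'names' for this sketch. */
typedef enum { NM_CONST, NM_TEMP, NM_MEMORY, NM_REGISTER, NM_STACK } NameKind;

typedef struct {
    NameKind kind;
    long     id;    /* constant value, temp number, register number, ... */
} Name;

typedef enum { OP_MOV, OP_ADD, OP_MUL } Opcode;

/* One pseudo-instruction: result = op(operands...). */
typedef struct {
    Opcode op;
    Name   result;
    Name   operands[2];
    int    num_operands;
} Ins;

/* Naive placement: temp N goes to register N if one exists, otherwise
 * to stack slot N - num_regs. Returns 1 if the name went to the stack. */
static int place_name(Name *nm, int num_regs)
{
    if (nm->kind != NM_TEMP)
        return 0;
    if (nm->id < num_regs) {
        nm->kind = NM_REGISTER;
        return 0;
    }
    nm->kind = NM_STACK;
    nm->id -= num_regs;
    return 1;
}

/* Rewrite every temporary reference; returns the number of references
 * that ended up on the stack. */
size_t assign_temps(Ins *code, size_t count, int num_regs)
{
    size_t on_stack = 0;
    for (size_t i = 0; i < count; i++) {
        on_stack += (size_t)place_name(&code[i].result, num_regs);
        for (int j = 0; j < code[i].num_operands; j++)
            on_stack += (size_t)place_name(&code[i].operands[j], num_regs);
    }
    return on_stack;
}
```

After assign_temps runs, no instruction refers to a temporary any more; each name is a constant, a memory location, a register, or a stack slot.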

A large portion of the code generator is architecture independent and is shared by all targets, 16-bit or 32-bit, x86 or otherwise. Architecture-specific code for IA-32 lives under bld/cg/intel, again with a large part shared and with separate 16-bit (bld/cg/intel/i86) and 32-bit (bld/cg/intel/386) portions. The RISC support is structured similarly, with bld/cg/risc containing the shared code and architecture-specific subdirectories underneath it. The S/370 support in bld/cg/s37 is unmaintained and does not build.

Architecture-specific subdirectories contain all low-level code, which handles several aspects of code generation. Reduction tables provide recipes for turning pseudo-assembly instructions into generatable code, including some low-level optimizations (e.g. turning multiplies into shifts, eliminating moves where source and destination are identical, etc.). The tables are supported by custom reduction functions, which are used for instance to turn long arithmetic into a sequence of instructions working with shorter operands; doing this instead of calling runtime support library functions enables further optimization.
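The two low-level optimizations named above can be sketched as a single rewriting function. This hand-coded version is only an illustration; the real reduction tables are data driven, and the instruction layout and names used here are hypothetical.

```c
/* Hypothetical, simplified instruction: dst = op(src [, imm]). */
typedef enum { R_MOV, R_MUL, R_SHL, R_NOP } ROp;

typedef struct {
    ROp  op;
    int  dst;   /* destination register number */
    int  src;   /* source register number */
    long imm;   /* immediate operand for R_MUL / R_SHL */
} RIns;

/* Returns n if v == 2^n with n >= 1, otherwise -1. */
int log2_exact(unsigned long v)
{
    int n = 0;
    if (v < 2 || (v & (v - 1)) != 0)
        return -1;
    while ((v >>= 1) != 0)
        n++;
    return n;
}

/* Apply one reduction if it matches; returns 1 if 'ins' was changed. */
int reduce(RIns *ins)
{
    if (ins->op == R_MUL && ins->imm > 0) {
        int shift = log2_exact((unsigned long)ins->imm);
        if (shift > 0) {            /* multiply by 2^n -> shift left by n */
            ins->op = R_SHL;
            ins->imm = shift;
            return 1;
        }
    }
    if (ins->op == R_MOV && ins->dst == ins->src) {
        ins->op = R_NOP;            /* move to itself does nothing */
        return 1;
    }
    return 0;
}
```

In this sketch a multiply by 8 becomes a shift left by 3, a multiply by 6 is left alone, and a move from a register to itself is turned into a no-op that a later pass could delete.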

An important part of architecture-specific code is the specification of calling conventions, which tend to vary among platforms, and where rules for variadic functions or passing arguments in registers tend to be fairly complex. Machine register descriptions are likewise provided, supplying information such as the overlap between the eax/ax/ah/al registers on IA-32. It is worth noting that x87 floating-point registers are handled separately because the x87 stack poses an extra challenge.
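One common way to describe register overlap is to give each independently addressable piece of a register its own bit, so that two registers overlap exactly when their bit sets intersect. The encoding below is a hypothetical sketch of that idea and is not taken from the Open Watcom register descriptions.

```c
/* A register is modelled as a set of the physical pieces it covers. */
typedef unsigned RegSet;

enum {
    RS_AL     = 1 << 0,                 /* bits 0..7 of eax   */
    RS_AH     = 1 << 1,                 /* bits 8..15 of eax  */
    RS_EAX_HI = 1 << 2,                 /* bits 16..31 of eax */
    RS_AX     = RS_AL | RS_AH,
    RS_EAX    = RS_AX | RS_EAX_HI,

    RS_BL     = 1 << 3,                 /* bits 0..7 of ebx   */
    RS_BH     = 1 << 4,
    RS_EBX_HI = 1 << 5,
    RS_BX     = RS_BL | RS_BH,
    RS_EBX    = RS_BX | RS_EBX_HI
};

/* Two registers overlap if they share at least one piece. */
int regs_overlap(RegSet a, RegSet b)
{
    return (a & b) != 0;
}

/* Register a fully contains register b. */
int reg_contains(RegSet a, RegSet b)
{
    return (a & b) == b;
}
```

With this encoding, writing ax is known to clobber al and ah and part of eax, while al and ah can be allocated independently of each other.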

The architecture-specific code also encodes instructions into machine code and emits object files. Unlike in some other compilers, a separate assembler pass is not used, which contributes to the speed of the compiler. The x86 target has hardcoded support for OMF output, while the RISC back end calls the OWL (Object Writer Library), which provides ELF or COFF output. The OWL is also used by the RISC assemblers (under bld/as) and is complemented by the ORL (Object Reader Library), which is used by many other tools (linker, librarian, disassembler, etc.).

The x86 cg is more refined than the RISC support and also noticeably more complex in order to provide support for the segmented x86 architecture in both 16-bit and 32-bit mode. All targets have inline assembly support, preferably using the #pragma aux syntax. The cg views inlined assembler code as a function call which is associated with an opaque block of code to be inserted directly into object code.

Most of the code generator's interesting work happens in generate.c, which performs most optimizations, including loop optimizations, scoreboarding, instruction scheduling, code hoisting, straightening and rearranging of basic blocks, dead code elimination, peephole optimizations, and so on. This module also calls the register allocator which goes through the reduction tables mentioned above and turns instructions into generatable form.
