Porting Guide

From Open Watcom

Revision as of 22:56, 27 January 2006

If you wish to port Open Watcom to a new platform (that is, a new CPU architecture and/or operating system), the following outlines a suggested sequence of steps. The list covers everything needed to fully support a new platform; if full support isn't required, some steps may be omitted. The ordering is also only a suggestion, and developers may have good reasons to change it.

  • Write a disassembler, that is, port wdis to the new platform. For RISC platforms with orthogonal instruction sets, this is pretty easy (about a day of work to get decent disassembly). If the object format is not supported (i.e., not 32-bit ELF/COFF or OMF), it might take significantly longer, because the ORL (Object Reader Library) will need to be extended first. A side benefit of this step is that it gives you a good understanding of the target instruction set (if you don't already have one), which will come in very handy later.
  • Port the debugger. You'll be glad you did. If the platform is Linux-based, this should be quite straightforward, and the trap file will likely need only minor modifications. If it's some other Unix, it still shouldn't be too hard; if it's something altogether different, it could be a larger effort. If the debug format isn't supported (i.e., not DWARF), that will take additional time. The debugger will reuse the disassembler you ported earlier. The trickiest part will be porting the MAD (Machine Architecture Description), which encodes the target architecture's calling conventions and basic type representation.

At this point, you should be able to debug programs on your target platform remotely, and if it's Unix, it shouldn't be too hard to port the entire debugger front end to it using the platform's native compiler. If the only other debugging option is gdb or no debugger at all, porting wd will save you tons of time later on. These first two steps may be quite useful on their own, even if none of the other tools need to be ported.

  • Port the assembler. Again, if it's a RISC platform that doesn't rely on a "smart" assembler with lots of pseudo-instructions (as MIPS does, for instance), this is not a huge amount of work. And again, if the object format is unsupported (not 32-bit ELF/COFF), expect additional effort extending the OWL (Object Writer Library). The assembler doesn't have to be complete, but it should be good enough to build the few assembly routines in the C runtime library (clib): things like longjmp can't really be written in C, and support routines for long division and similar operations are good candidates for assembly as well. The inline assembler will later be reused by the C and C++ compilers (for #pragma aux and __asm support), which can make interfacing with system calls very easy (see bld/clib/linux/h/sysmips.h for an example).
  • Port the C front end. This should not be too difficult; the biggest platform-specific thing to worry about will be functions with variable arguments, and those are almost certain to cause headaches, because the implementation varies wildly among platforms. Without a codegen, the C compiler will be an empty shell at this point, but you'll need it to exercise the codegen.
  • Now comes the really hard part: the codegen. You'll reuse what you learned earlier when porting the assembler. If the target is an orthogonal RISC platform, you will be able to clone one of the existing RISC codegens. If it's something weird or, god forbid, something as awful as x86, you have your work cut out for you. This part is by far the most difficult because it requires an understanding of how the codegen works; taking a quick look at bld/cg/doc/mipsnotes.txt may be a good idea. Every CPU architecture seems to have its quirks (delay slots, not-so-orthogonal instruction sets, alignment and/or operand size restrictions, and who knows what other weirdness), and there are no generic solutions. Note that vast swathes of the codegen won't require any changes whatsoever, as they're completely generic; most of the effort will be concentrated in the code that translates intermediate code (a kind of portable pseudo-assembly) into machine code.
  • Port the clib. Again, if the target is a Linux or Unix platform, it's probably not going to be terribly difficult; if it's something oddball that doesn't even remotely resemble POSIX, expect additional work. This step will probably overlap with the codegen port. Getting as far as a puts()-based hello world is not *that* difficult, but the fun starts with printf(), and math support is also guaranteed to be "interesting".
  • Port the linker. The same notes apply as earlier: if the object and executable formats are supported (ELF, PE, etc.), most of the work is done already. If it's something else, there's no telling exactly how much effort it will take.

By now you should be able to generate basic executables, at first using syscalls directly and later using the clib. Now you will really appreciate the debugger, because the codegen will be buggy and the programs are going to die a lot. You may also wish to work on the linker before the codegen and write your first trivial executables in assembly.

  • Run (and pass) tests. By the end of this step, the compiler should be able to pass the tests in bld/ctest/regress and, quite possibly with some extra tweaking, the tests in bld/clib/qa. The tests are not yet 100% comprehensive; however, if they all pass, you have a fairly solid compiler.

At this point it may also be instructive to start cross-compiling some utilities for the target platform (disassembler, linker, etc.). Once the port is solid enough, you should be able to generate a functioning compiler. By the time that compiler is good enough to compile itself properly, you should have a pretty solid codegen plus tools and basic runtime library support.

  • Optionally, continue with the C++ and perhaps F77 compilers and other tools. The profiler is an especially good candidate: if you have ported the debugger, you're almost done with the profiler, as it reuses many of the same components. Most of the command line tools are easy to port, but the GUI tools are an entirely different kettle of fish; porting all of them might conceivably require as much effort as porting the compiler itself.

Many of these steps can be done in parallel; for instance, the disassembler/debugger and the assembler are independent of each other. There are also two distinct porting strategies: either port the OW tools to the target platform first, using the native compiler, or cross-compile from a supported platform. I'd recommend the latter approach, but it's a question of personal preference and other project constraints. The host tools need to be solid, because otherwise you'll be juggling far too many balls at the same time; porting the codegen, clib, linker, etc. all at once is hard enough. It is recommended to use the target platform's tools at least in the initial stages, e.g. generate your own object files but use the existing linker. This works in the other direction as well: it's possible to port, e.g., the linker first and use existing tools to generate object files. It is usually a very good idea to ensure interoperability with the platform's "native" tools.

Needless to say, the above assumes that the target platform runs an actual OS. If it's some kind of restricted embedded system, the work will be easier in some ways (perhaps there's no need to port the clib) and much harder in others: it will be more difficult to test what you've done, and there will be no option to fall back to "known good" tools.
