From Open Watcom
The wasm assembler was created as a side effect of the inline assembly support in Watcom C/C++ compilers. Early versions of Watcom C supported the powerful #pragma aux, but did not include an inline assembler. Inline code had to be entered as a series of hexadecimal values, which was obviously not very easy to use.
Version 9.0 of Watcom compilers added support for inline assembly language. This required an implementation of a significant subset of a typical x86 assembler. The inline assembler can parse and encode x86 instructions, understands all addressing modes, handles expressions, and supports several directives.
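To illustrate what entering inline code as hexadecimal values meant, and what the inline assembler automated, here is a small sketch of one x86 encoding step: packing the ModR/M byte. It encodes `mov ax, ds` (opcode 8Ch /r) for 16-bit code, producing the bytes 8C D8 that a programmer previously had to type by hand. The enum and function names are hypothetical and are not taken from the Watcom sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers used in the ModR/M reg and r/m fields (16-bit code). */
enum reg16 { R16_AX, R16_CX, R16_DX, R16_BX, R16_SP, R16_BP, R16_SI, R16_DI };

/* Segment register numbers used in the reg field of the MOV Sreg forms. */
enum sreg { SR_ES, SR_CS, SR_SS, SR_DS, SR_FS, SR_GS };

/* Pack the ModR/M byte: mod (2 bits), reg (3 bits), r/m (3 bits). */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Encode "MOV r16, Sreg" (opcode 8Ch /r) for 16-bit code. mod = 3 means the
   r/m field names a register rather than a memory operand.
   Writes two bytes to out and returns the count. */
static size_t encode_mov_r16_sreg(uint8_t out[2], enum reg16 dst, enum sreg src)
{
    out[0] = 0x8C;
    out[1] = modrm(3, (unsigned)src, (unsigned)dst);
    return 2;
}
```

For example, `encode_mov_r16_sreg(buf, R16_AX, SR_DS)` stores 8Ch D8h. In 16-bit code the memory addressing modes differ mainly in the mod and r/m values and in any displacement bytes that follow, which is the kind of bookkeeping an inline assembler takes over.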
Version 10.0 of Watcom C/C++ shipped with wasm version 1.0, a simple standalone x86 assembler. The assembler was modeled after Microsoft MASM, but supported only a very limited set of directives. An important feature of wasm not found in other assemblers such as MASM or TASM was support for selecting the memory model and CPU on the command line. This made wasm ideal for building the Watcom runtime libraries, which often need several object files built from the same source, differing only in memory model and sometimes in CPU selection.
The Watcom build system, however, used a number of assemblers, mostly as an accident of history. MASM 5.0 and 5.1 were used by several projects, as was SLR Systems' OPTASM. One of the first tasks before the Open Watcom source code was released was converting all projects to use wasm. In some cases this was as easy as modifying the relevant makefile; in other cases the source files needed to be massaged.
Open Source Development
After the Open Watcom source was made available, wasm was significantly enhanced. The released source code only supported instruction sets up to the Pentium MMX. Support for the new PIII and P4 instructions was added, as well as for the 3DNow!, SSE and SSE2 instruction sets. Compatibility with MASM was much improved, although it remains very imperfect.
The goal is to make wasm as compatible with Microsoft MASM as possible. MASM 6.1 documentation should be taken as a basis, and behaviour should be equivalent to MASM 6.11 or newer. There have been very few significant changes in MASM since version 6.1 (the last commercial release), but 6.0 did have a few differences. There are several reasons why MASM was chosen as the 'gold standard' for wasm development:
- MASM is a very widely used assembler which many programmers are familiar with
- There is an enormous amount of assembly code written for MASM
- The MASM language is reasonably well documented
- MASM supports all features of x86 and OMF
- MASM is still being updated and new versions are shipped as part of Microsoft DDKs
- MASM is what wasm was written to be compatible with from the beginning
While the MASM language is not the most straightforward dialect of x86 assembly, the above benefits far outweigh this drawback. The current state of wasm can be summarized as follows:
- Instruction parsing and encoding works very well and is an almost 100% match for MASM. One known area of difference is the handling of default segments for external symbols; wasm currently disregards segment information for EXTERN symbols in some cases. This should be fixed, but is not likely to be much of a problem in practice; there are differences in this area between MASM 6.x and other assemblers, including MASM 5.x. There is probably little code out there that relies on the MASM 6.x semantics.
- Conditional assembly and include files work well. However, text equate expansion is problematic, not least because it is poorly documented in MASM manuals.
- A large number of MASM directives and operators are unimplemented. Some are accepted with a warning (mostly listing directives); others produce an error. Many of the unimplemented directives are for "high level" assembly, such as .IF/.ELSE/.ENDIF, .REPEAT/.UNTIL/.WHILE, TYPEDEF, PROTO, or INVOKE. These are not often used, but where they are, rewriting the source may require significant effort.
- Macro support is not great. Simple macros work well but complicated macros do not. Part of the problem is that the MASM preprocessor is poorly documented by Microsoft.
- Most MASM predefined symbols (@Cpu, @CurSeg, @data, @Date, etc.) are not implemented. These are probably not often used but most of them should be easy to implement.
- The internal representation of symbols and expressions needs work. MASM operators such as TYPE or .TYPE/OPATTR provide a fair amount of insight into how MASM works internally. Replicating their behavior is likely to generally improve MASM compatibility as a side effect.
- No listing files are generated. Producing full listings may be a waste of effort because wdis (the Open Watcom disassembler) does a very good job. However, it could be extremely helpful to produce a dump of the internal symbol table the way MASM does, especially for diagnostic purposes.
- Only OMF output is generated. This is not a problem when wasm is used together with other Open Watcom tools, but could be an issue for people wishing to use it with other toolchains. While OMF output support is required because neither COFF nor ELF supports all x86 features, support for the latter two formats would be very useful. This support should use the OWL (Object Writer Library) used by other Open Watcom tools.
- There is no real documentation. Nothing specific to wasm exists, and there is not even a list of differences from MASM that would let users fall back on the MASM documentation.
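As a rough check of the claim above that most MASM predefined symbols should be easy to implement, here is a sketch for two of them. It assumes the formats given in the MASM 6.1 documentation (@Date as mm/dd/yy, @Time as hh:mm:ss on a 24-hour clock). The function name is hypothetical, and a real assembler would capture the time once at startup and apply its own case-mapping rules to the symbol name instead of a plain strcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Return the expansion of @Date or @Time for the broken-down time t, or NULL
   if name is not one of these two symbols. buf must hold at least 9 bytes
   (8 characters plus the terminating NUL). */
static const char *predef_text(const char *name, const struct tm *t, char buf[9])
{
    const char *fmt;

    assert(name != NULL && t != NULL && buf != NULL);
    if (strcmp(name, "@Date") == 0)
        fmt = "%m/%d/%y";          /* mm/dd/yy */
    else if (strcmp(name, "@Time") == 0)
        fmt = "%H:%M:%S";          /* hh:mm:ss, 24-hour clock */
    else
        return NULL;
    return strftime(buf, 9, fmt, t) == 8 ? buf : NULL;
}
```

An assembler would call this with the result of `localtime()` taken at startup and substitute the returned text wherever the symbol appears.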
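Since OMF output has to be retained whatever else is added, the record framing any OMF writer must produce is worth spelling out. Each record is a type byte, a 16-bit little-endian length that counts the contents plus a trailing checksum byte, the contents, and the checksum itself, chosen so that all bytes of the record add up to zero modulo 256. The sketch below builds a THEADR (80h) record, which carries the module name as a length-prefixed string; the function names are hypothetical and not taken from the wasm sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Frame one OMF record into out. Returns the total record size,
   or 0 if the contents are too long or out is too small. */
static size_t omf_record(uint8_t *out, size_t cap, uint8_t type,
                         const uint8_t *data, size_t len)
{
    size_t total = 1 + 2 + len + 1;      /* type + length + contents + checksum */
    uint8_t sum = 0;

    if (len > 0xFFFE || total > cap)     /* length field must fit in 16 bits */
        return 0;
    out[0] = type;
    out[1] = (uint8_t)((len + 1) & 0xFF);
    out[2] = (uint8_t)((len + 1) >> 8);
    memcpy(out + 3, data, len);
    for (size_t i = 0; i < total - 1; i++)
        sum = (uint8_t)(sum + out[i]);
    out[total - 1] = (uint8_t)(0x100 - sum);  /* makes the byte sum 0 mod 256 */
    return total;
}

/* THEADR (80h): module name as a length-prefixed string. */
static size_t omf_theadr(uint8_t *out, size_t cap, const char *name)
{
    uint8_t buf[256];
    size_t n = strlen(name);

    assert(n <= 255);
    buf[0] = (uint8_t)n;
    memcpy(buf + 1, name, n);
    return omf_record(out, cap, 0x80, buf, n + 1);
}
```

Keeping the framing in one function like this is one way an OMF back end could sit beside other formats: the per-format code then only decides which records to emit.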
Design of a New wasm
After many discussions, a need for a new, or at least much improved, version of wasm has emerged. Some of the design goals are:
- Provide compatibility with Microsoft MASM and Borland TASM. The MASM syntax may be somewhat weird, but a great number of programmers are familiar with it and an enormous body of MASM source code exists. In addition, MASM 6.x is a powerful assembler in its own right, with many high-level features. TASM is likewise a powerful assembler that many programmers know, with syntax similar to MASM but cleaner. There is currently no free, open source, and portable clone of either MASM or TASM. Emulating MASM is logical because that's what wasm currently does; however, current wasm accepts a strange mix of MASM 6.x and 5.x syntax and is not quite compatible with either.
- Write the assembler in a modular fashion so that code could be reused for inline assembly within the Open Watcom compilers, and possibly reused for a 64-bit assembler.
- Support multiple output formats. Currently wasm supports only OMF. Support for OMF has to be retained because it's the most powerful x86 object format, but support for ELF, COFF and possibly Mach-O would be extremely helpful; this implies using the OWL library. Direct binary output might also be useful for certain projects (boot loaders, simple DOS COM programs etc.).
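Direct binary output means writing the assembled bytes with no object-file headers, relocations or symbols. For a DOS COM program the file is simply the code, loaded at offset 100h; for a PC boot sector it is exactly 512 bytes ending in the signature bytes 55h AAh. A minimal sketch of the boot-sector layout step follows; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Lay out raw code as a boot sector image: code first, zero padding,
   and the 55h AAh signature in the last two bytes.
   Returns 0 on success, -1 if the code does not fit. */
static int make_boot_sector(uint8_t sector[SECTOR_SIZE], const uint8_t *code, size_t len)
{
    assert(sector != NULL && (code != NULL || len == 0));
    if (len > SECTOR_SIZE - 2)
        return -1;
    memset(sector, 0, SECTOR_SIZE);
    memcpy(sector, code, len);
    sector[510] = 0x55;
    sector[511] = 0xAA;
    return 0;
}
```

With code = { 0xEB, 0xFE } (`jmp $`, an endless loop) the result is a complete, if useless, boot sector; writing it out takes a single fwrite of SECTOR_SIZE bytes.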
The following is a brief estimate of how reusable the current wasm source code is.
- The preprocessor may be partially usable, but has to be significantly reworked. Current wasm works on text buffers; this is unworkable, and an updated version needs to operate on tokenized input. The C/C++ compilers should also feed tokenized input to the assembler back end, in part because inline assembly syntax is not the same as that of the standalone assembler.
- File management could be reusable, but may need improvements.
- The expression evaluator will likely need a redesign. It has to properly support the .TYPE/SYMTYPE operator and handle different operator priorities for MASM and IDEAL mode. The current implementation is not good enough.
- The symbol table manager has to support various case sensitivities and also handle scoping properly. It may need to be redone from scratch.
- Instruction encoding should be reusable. This is probably the best part of current wasm.
- Output to OMF may be reusable, but may need to be reworked in order to support other output formats.
- The parser will likely have to be rewritten to support different syntaxes switchable at runtime.
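To make the point about tokenized input concrete, here is a minimal line tokenizer of the kind a reworked preprocessor could share with the compilers' inline assembler. It is only a sketch: the token kinds and names are hypothetical, numbers are simplified to "a digit followed by letters or digits" (so 0FFh and 10 both qualify), and strings, multi-character operators and MASM's more unusual identifier rules are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum tok_kind { TK_IDENT, TK_NUMBER, TK_PUNCT, TK_END };

struct token {
    enum tok_kind kind;
    const char *start;          /* points into the source line, not copied */
    size_t len;
};

static int is_ident_start(char c)
{
    return isalpha((unsigned char)c) || c == '_' || c == '@' || c == '$' || c == '?';
}

static int is_ident_char(char c)
{
    return is_ident_start(c) || isdigit((unsigned char)c);
}

/* Split one source line into tokens, stopping at the end of the line or at
   a ';' comment. A TK_END token is always stored last. Returns the number of
   tokens before TK_END, or -1 if max is too small. */
static int tokenize(const char *p, struct token *out, int max)
{
    int n = 0;

    assert(p != NULL && out != NULL);
    for (;;) {
        struct token *t;

        while (*p == ' ' || *p == '\t')
            p++;
        if (n >= max)
            return -1;
        t = &out[n];
        t->start = p;
        if (*p == '\0' || *p == ';') {
            t->kind = TK_END;
            t->len = 0;
            return n;
        }
        if (is_ident_start(*p)) {
            t->kind = TK_IDENT;
            while (is_ident_char(*p))
                p++;
        } else if (isdigit((unsigned char)*p)) {
            t->kind = TK_NUMBER;        /* e.g. 10, 0FFh, 1010b */
            while (isalnum((unsigned char)*p))
                p++;
        } else {
            t->kind = TK_PUNCT;         /* single-character operator or separator */
            p++;
        }
        t->len = (size_t)(p - t->start);
        n++;
    }
}
```

For `mov eax, [ebx+4] ; load` this yields eight tokens followed by TK_END. Because tokens point into the original line, a front end can produce the same structure from its own lexer without the assembler re-reading text.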
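The symbol-table requirement can be illustrated with a lookup that takes the case mode as a parameter and walks outward through nested scopes. The two modes roughly correspond to MASM's OPTION CASEMAP:ALL (case-insensitive) and OPTION CASEMAP:NONE (case-sensitive); the default NOTPUBLIC mode, which treats internal names case-insensitively but preserves the case of public and external names, is not modeled. A linear search stands in for a real hash table, and all names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum casemap { CASE_INSENSITIVE, CASE_SENSITIVE };

struct symbol {
    const char *name;
    int value;
};

struct scope {
    const struct scope *parent;   /* enclosing scope, NULL for the global one */
    struct symbol *syms;
    size_t count;
};

static int names_equal(const char *a, const char *b, enum casemap cm)
{
    if (cm == CASE_SENSITIVE)
        return strcmp(a, b) == 0;
    for (; *a && *b; a++, b++)
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
    return *a == *b;              /* equal only if both strings ended */
}

/* Search the innermost scope first, then the enclosing ones, so a local
   symbol shadows a global one with the same name. */
static const struct symbol *lookup(const struct scope *s, const char *name,
                                   enum casemap cm)
{
    assert(name != NULL);
    for (; s != NULL; s = s->parent)
        for (size_t i = 0; i < s->count; i++)
            if (names_equal(s->syms[i].name, name, cm))
                return &s->syms[i];
    return NULL;
}
```

With a global Count and a local count, a case-insensitive lookup of COUNT from the inner scope finds the local symbol, while a case-sensitive lookup of Count falls through to the global one.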
Sources of Information
MASM documentation is available in many places on the web. One of the sources is this. The Microsoft documentation is very helpful for people learning assembly. It is unfortunately not precise enough for implementors, although it does include the BNF grammar for MASM.
Documentation for the IBM ALP assembler, which is highly compatible with MASM 6.x, has been found to be very helpful. It contains detailed syntax diagrams and is very thorough. The IBM ALP reference is normally available as OS/2 INF files, but can also be accessed on the web; the web version is older (by about three years) than the INF file from the Toolkit.
Readers are encouraged to fill in links to comprehensive TASM documentation, if it exists.
JWasm, a fork in progress
JWasm is currently found on the JWasm page. It is still not known whether the fork will find its way back into the main OW development source system, but hopefully it eventually will. Right now this fork is the best alternative if you need a free MASM 6 compatible assembler. The fork is distributed under the original Open Watcom license. JWasm version 2.0 supports the AMD64 instruction set and is significantly more compatible with MASM than wasm is.