Advanced Debugging

From Open Watcom

Revision as of 10:09, 7 October 2006

A good debugger, as everyone knows, is a programmer's best friend. It can save countless hours when diagnosing and analysing problems, and in some instances only a debugger makes such analysis feasible. Debuggers can also be very helpful tools for discovering and learning the structure of a complex application.

This article is not an introductory text and assumes that readers are familiar with the basic principles and concepts of interactive debugging. The aim is to present several debugging techniques that may be considered advanced and non-obvious.

Brief History and Overview

As luck would have it, Open Watcom comes with a powerful debugger. It was first introduced in Watcom C version 8.0; up to version 9.5 of the Watcom tools, the debugger was called WVIDEO (Watcom Visual Interactive Debugging Execution Overseer). In version 10.0 (1994) it underwent a major overhaul and was renamed simply to Watcom Debugger, or wd for short. The debugger comes in two varieties: the console mode, text-oriented wd, and a GUI version called wdw (currently only available on Windows and OS/2). The two versions are functionally identical and in the following text, wd will be used to refer to both.

Debugger Architecture

One of the debugger's greatest strengths is its flexible architecture. The major components of the debugger are:

User interface
This is the visible portion, part of the actual wd executable. The interface is designed to be as generic as possible and not tied to any specific operating system, CPU architecture, debug information format, or programming language.
Trap file
At the heart of the debugger is a trap file, which allows the debugger interface to communicate with the target system. By default wd uses the 'std' trap file which talks directly to the host OS. However, other trap files can be used to provide a link to a remote system.
Machine Architecture Description
Also known as MAD. The MAD abstracts CPU specifics such as the register set or instruction set. The current list of MADs includes x86, Alpha AXP, PowerPC, and MIPS.
Debug Information Processor
Also known as DIP. The DIPs present various debug information formats to the debugger in a uniform manner. Currently supported are DWARF, CodeView, Watcom, and MAPSYM.
Expression parser
This component allows the debugger to parse and evaluate expressions according to the rules of various programming languages. Currently supported are C, C++, and Fortran.

Primarily for debug purposes, it is possible to build a version of the debugger that does not have the window-based user interface and only provides a simple command line, while retaining all capabilities related to remote debugging, cross-platform support, debug information parsing, and so on.

Remote Debugging

Strictly speaking, there is nothing 'advanced' about remote debugging. However, some programmers may not be aware of how useful it can be. There are two primary situations where remote debugging is the only type of debugging available:

  • cross debugging to other platforms
  • debugging of certain graphical and/or interactive applications

In the first instance, the target platform (i.e. the target OS and/or CPU architecture) is different from the host platform. In many cases, it is not possible to run the debugger on the target platform at all, perhaps because no keyboard or display is available, or because the target system isn't powerful enough to run a debugger. This is often the case with embedded development.

The second case is less common and occurs when the application can be run on the development system, but not debugged there. That may be because it takes over the screen or input devices in a way that prevents debugger use, or simply because using the debugger significantly interferes with the application's operation. Note that in this situation, it is possible to run the application on the development system and run the debugger on another machine.

Some developers prefer to use remote debugging even when they don't strictly have to, that is, when their target platform is the same as the host platform and their application doesn't preclude debugger use. It is often useful to have separate development and test machines, and not run the application under development on the host system.

Remote Links

There are many ways to connect the host system (i.e. the system running wd) to the target system (i.e. the system running the debuggee application). Not all links are supported on all platforms. The links are:

  • TCP/IP
  • Parallel link
  • Serial link
  • NetBIOS
  • Named pipes (Windows and OS/2 only)
  • VDM link (Windows and OS/2 only)

The VDM link is not remote in the physical sense, because the debugger and debuggee still run on the same machine. However, logically they are remote, because the debuggee runs in a virtual DOS box and a debug server is required. The debug server is, of course, the component that runs on the remote system and listens for communications from the debugger. The debug servers are named after the link type, such as tcpserv, parserv, serserv, and so on.

Note that it is possible to chain the links: for example, the debugger can communicate with an intermediate system over TCP/IP, while the intermediate machine uses a parallel link to the target.

Remote Setup

Experience shows that using file sharing is a good way to facilitate remote debugging. The host system must share the drive/directory containing the application and its files (but not necessarily the application's source code). The target machine maps the drive/directory and runs the application over the network. This scenario may take a bit of effort to set up initially, but is extremely easy to use afterwards because there's no need to copy files around.

In some situations, sharing files over a network may be impractical or impossible, but the application still needs to be somehow transferred to the target. Fortunately, wd can do just that: the download (-do) option transfers the application over the remote link. If the remote link is slow (such as a serial link), the download may naturally take some time. However, this may still be preferable to other methods, and download over a parallel link or a network is fast.

On some platforms (DOS and OS/2), a utility called rfx (Remote File eXchange) is also available. It uses the debug link and provides a simple interactive (or batch driven) environment to transfer files between host and target machine. See the Debugger Guide for details.

Memory Breakpoints

Memory corruption problems are probably the most insidious sort of bugs. The reason for the unpleasantness is that the cause of the problem tends to be quite distant from and unrelated to the effect. A program is crashing or displaying other undesirable behaviour; a quick inspection reveals that the program state is not what it should be, but it's not clear why. In simpler cases, a piece of data stored in memory is getting corrupted. In the most dreaded case, heap corruption occurs.

Heap Corruption

Not infrequently, heap corruption manifests itself as crashes in the heap manager when calling functions such as free() or realloc(). That is because these functions are passed a pointer which is dereferenced to arrive at internal bookkeeping data stored on the heap. Allocation functions like malloc() or calloc() typically only need to access data stored in a memory location somewhat separate from user data, and are hence much less susceptible to crashes caused by heap corruption.
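
To illustrate why free() is where the damage tends to surface, here is a minimal sketch of a toy allocator. It is not the Open Watcom heap manager; the names toy_alloc, toy_free, toy_header, TOY_MAGIC, and toy_demo are invented for this illustration, and real allocators keep far more elaborate bookkeeping. The only point is that a header sits directly in front of each user block, so overrunning one block lands in its neighbour's header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAGIC   0x48454150u     /* marker stored in every intact header */
#define TOY_SLOTS   64              /* arena size, in header-sized slots */

/* Bookkeeping record placed immediately in front of each user block. */
typedef struct toy_header {
    size_t      size;       /* requested size of the user block */
    unsigned    magic;      /* equals TOY_MAGIC while the block is intact */
} toy_header;

static toy_header   toy_arena[TOY_SLOTS];
static size_t       toy_used;       /* slots handed out so far */

/* Allocation consults only toy_used, which lives outside the arena,
   so it does not notice damage to headers already handed out. */
void *toy_alloc( size_t size )
{
    size_t      slots = size / sizeof( toy_header )
                      + ( size % sizeof( toy_header ) != 0 );
    toy_header  *hdr;

    if( slots >= TOY_SLOTS || toy_used + 1 + slots > TOY_SLOTS )
        return( NULL );
    hdr = &toy_arena[toy_used];
    hdr->size  = size;
    hdr->magic = TOY_MAGIC;
    toy_used  += 1 + slots;
    return( hdr + 1 );
}

/* Freeing steps back from the user pointer to reach the header, so this
   is where a damaged header is first noticed. Returns 0 on success and
   -1 if the header has been overwritten (or the block was already freed). */
int toy_free( void *ptr )
{
    toy_header  *hdr;

    assert( ptr != NULL );
    hdr = (toy_header *)ptr - 1;
    if( hdr->magic != TOY_MAGIC )
        return( -1 );
    hdr->magic = 0;
    return( 0 );
}

/* Overruns the first block by sizeof( toy_header ) bytes, then frees the
   second one. Returns the result of that toy_free() call. */
int toy_demo( void )
{
    char    *first  = toy_alloc( 16 );
    char    *second = toy_alloc( 16 );

    if( first == NULL || second == NULL )
        return( 0 );
    memset( first, 0xFF, 16 + sizeof( toy_header ) );
    return( toy_free( second ) );
}
```

In this sketch the faulty code is the memset() on the first block, yet the failure is reported when the second block is freed, which mirrors the distance between cause and effect described above.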

Digressing a little, it may be worth pointing out that when a library function crashes, programmers are typically quick to blame the runtime itself. More often than not, the problem turns out to be caused by user code and not a library bug. I've used the Watcom tools for a very long time and supported them for several years, and I have yet to find a bug in the memory allocator. Once or twice I was convinced the heap manager had to be at fault, only to discover that it was not. That's of course not to say the runtime is 100% error free - only that the chance of an error in the runtime vs. error in user code is probably well under 1:100.

As mentioned earlier, memory corruption often manifests itself very loudly. The trouble is finding out where it occurs and why. While debuggers aren't smart enough to answer the 'why', they can be very helpful in determining the 'where'.

Setting Breakpoints

It is usually not too difficult to find the memory location that is getting corrupted. To find out where the corruption occurs, memory breakpoints can be extremely helpful. To set a memory write breakpoint on a variable (including structure members etc.), simply right-click on the variable in the Locals window or the Watch window and select Break from the pop-up menu. Break on write can also be set on an arbitrary memory location from the Memory window.

Memory breakpoints, sometimes called watchpoints, can be very efficiently implemented with the help of hardware debug registers (386 and above on the x86 platforms). There is usually only a very limited number of hardware watchpoints that can be set at a time, but they allow the debuggee to execute at full speed, and the CPU will interrupt execution when the watched address is accessed. If hardware watchpoints aren't available, they can be implemented in software. However, that dramatically slows down the debuggee's execution (the debugger effectively has to single-step all code) and may not be usable.

Note that on modern operating systems, linear addresses tend to stay the same across multiple executions of a process; therefore, it is possible to set a breakpoint after the corruption occurs, then restart the debuggee and track the writes to that location.

Setting a simple memory write breakpoint may be enough to track down the source of a problem. The debugger will stop when any write to the address is performed; in many cases, there is only a small number of writes and it's not difficult to spot the one causing the corruption.

If there are too many writes to a location, it may be useful to modify the breakpoint parameters in the Break Point dialog (Break/View All, then Modify in pop-up menu for desired breakpoint). It is possible to specify a countdown value, for example to tell the debugger to stop only after the breakpoint is hit 50 times. Another option is to specify a condition for the breakpoint. This can be any debugger expression involving variable or register values etc.

Instrumenting Code

Another method to fight memory corruption and other problems is instrumenting the application with special debug code. The simplest is the very useful assert() macro. Just as a reminder, if an application encounters an assertion failure while it is run under wd, the debugger will stop at the point of the failure and display the assertion message.

When diagnosing heap corruption, the _heapchk() function can be extremely helpful. It performs a consistency check on the heap and validates its internal structures. Placing _heapchk() calls at strategic locations in the application is a good way to narrow down where heap corruption is occurring.
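
As a minimal sketch of such instrumentation, the following wraps _heapchk() in a macro that records where a check failed. The names heap_is_sane, heap_check_at, HEAP_CHECK, and make_greeting are invented for this example. The sketch assumes that _heapchk() is only available when building with Open Watcom (detected via __WATCOMC__); with other compilers the check is a stub that always succeeds, so the code still builds but verifies nothing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#if defined( __WATCOMC__ )
    #include <malloc.h>         /* _heapchk(), _HEAPOK, _HEAPEMPTY */
#endif

/* Returns non-zero if the heap passes its consistency check. */
int heap_is_sane( void )
{
#if defined( __WATCOMC__ )
    int     rc = _heapchk();

    return( rc == _HEAPOK || rc == _HEAPEMPTY );
#else
    return( 1 );    /* assumption: no _heapchk() here, nothing to check */
#endif
}

/* Reports the location of the first check that finds the heap damaged. */
void heap_check_at( const char *file, int line )
{
    if( !heap_is_sane() ) {
        fprintf( stderr, "%s(%d): heap is corrupted\n", file, line );
        abort();
    }
}

/* Checks compile away in release builds, just like assert(). */
#ifdef NDEBUG
    #define HEAP_CHECK()    ( (void)0 )
#else
    #define HEAP_CHECK()    heap_check_at( __FILE__, __LINE__ )
#endif

/* Hypothetical routine bracketed by heap checks: if the first check
   passes and the second fails, the damage happened in between. */
char *make_greeting( const char *name )
{
    char    *buf;

    assert( name != NULL );
    HEAP_CHECK();
    buf = malloc( strlen( name ) + sizeof( "Hello, " ) );
    if( buf != NULL ) {
        strcpy( buf, "Hello, " );
        strcat( buf, name );
    }
    HEAP_CHECK();
    return( buf );
}
```

Moving the pair of HEAP_CHECK() calls closer together on each run is a simple way to bisect the region where the corruption happens.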

Executing User Code

However, wd offers another - more flexible - interactive method. Consider the following example:

#include <stdio.h>
#include <malloc.h>

/* Returns non-zero value if heap is corrupted */
int check_heap( void )
{
    int     rc = 0;

    switch( _heapchk() ) {
    case _HEAPOK:
        printf( "OK - heap is good\n" );
        break;
    case _HEAPEMPTY:
        printf( "OK - heap is empty\n" );
        break;
    case _HEAPBADBEGIN:
        printf( "ERROR - heap is damaged\n" );
        rc = -1;
        break;
    case _HEAPBADNODE:
        printf( "ERROR - bad node in heap\n" );
        rc = -1;
        break;
    }
    return( rc );
}

int main( void )
{
    char    *buffer;
    char    *dummy;
    char    old;

    buffer = (char *)malloc( 80 );
    dummy  = (char *)malloc( 1024 );
    free( buffer );
    /* Deliberate access to freed memory, to demonstrate corruption */
    old = *buffer;

    /* Heap is good until now */
    *buffer = -1;

    /* Now the heap is corrupted. Fix it. */
    *buffer = old;
    free( dummy );
    return( 0 );
}

You can build the example with debug information on and run the executable under wd. Now enter the following command:

call check_heap

As a reminder, you can bring up the command window either through the File/Command menu, or by hitting colon (:), assuming default key bindings.

The debugger will execute the check_heap() function inside the debuggee, without disrupting the flow of execution. This should print the message

OK - heap is good

on the debuggee's console. Now step through the code and stop at the point where the heap is corrupted. Execute the

call check_heap()

command (just to see that the syntax is flexible and parentheses can be used). On the console you should now see

ERROR - bad node in heap

If you step past the next statement and repeat the exercise, the heap should be good again.

Debug Instrumentation

Note that the call command can pass parameters to the user routines, and that the arguments can be arbitrary debugger expressions. See the Debugger Guide for details.

The debug build of the Open Watcom code generator uses this technique to great effect. The debug version contains a number of functions that dump various internal structures in a human-readable way. While the debugger can of course easily display the contents of variables or structures, following a linked list or a tree (for example) in this manner is not practical. However, it may be possible to write a function that traverses such a structure and dumps the most relevant contents of its nodes.

Such a function can then be called at any point from inside the debugger (and remember, parameters may be passed to it). Because it is user code, there is no limit to what the function can do. It can validate a data structure, it can cross reference it with current program state, it can print to the console, it can read or write files, it can communicate over the network.

As mentioned earlier, the Open Watcom compilers use this method to facilitate debugging. The example above shows how to run consistency checks at arbitrary locations. It's likewise possible to write a routine that reads and dumps contents of internal hardware registers that are not directly accessible from the debugger. Any complex application can benefit from this technique.
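
The dump routine idea described above might look like the following sketch. The node type and the dump_list function are hypothetical stand-ins for an application's own data structure, not code from the Open Watcom code generator. The limit parameter is an extra safeguard assumed here so that a list corrupted into a cycle cannot hang the debuggee.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical list node, standing in for an application's own type. */
typedef struct node {
    int             id;
    const char      *name;
    struct node     *next;
} node;

/* Prints one line per node and returns the number of nodes printed.
   If more than 'limit' nodes are reachable, stops and returns -1. */
int dump_list( const node *head, int limit )
{
    const node  *p;
    int         count = 0;

    assert( limit > 0 );
    for( p = head; p != NULL; p = p->next ) {
        if( count == limit ) {
            printf( "... stopped after %d nodes (cycle?)\n", limit );
            fflush( stdout );
            return( -1 );
        }
        printf( "node %d at %p: id=%d name=%s\n", count, (const void *)p,
                p->id, p->name != NULL ? p->name : "(null)" );
        ++count;
    }
    printf( "%d node(s) total\n", count );
    /* Flush so the output is visible as soon as the call returns. */
    fflush( stdout );
    return( count );
}
```

With the debuggee stopped at a point where a list pointer is in scope (assumed here to be a variable named head), a command along the lines of call dump_list( head, 100 ) would print the list on the debuggee's console; the Debugger Guide describes the exact syntax for passing parameters.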

Debugger Commands

Because wd presents a friendly point-and-click interface to its users, many of them may be unaware that it also supports a fairly rich and powerful command language. For casual use, there's no need to resort to the debugger's command line. However, as the above section shows, entering debugger commands directly provides additional features and power.

Expressions

One of the simplest but most useful commands is print, which may be abbreviated as ?. A very basic example follows:

? 1 + 1

Rather predictably, this will print

2

in the debugger's log window. It is also obvious that the debugger can evaluate simple arithmetic expressions. Consider the following example:

int main( int argc, char **argv )
{
    int i;

    i = 42;

    return( 0 );
}

Build the sample program with debugging information on and run it under wd up to the point where the variable i has been set. Now try a few commands:

? i
? i - argc
? argv[0]

The last command printed an address. Since argv is an array of pointers, that's not unreasonable. But perhaps we'd prefer to see the string:

? {%s} argv[0]

Now let's see what the third character of the last string in the argument array is:

? {%c} argv[argc - 1][2]

Expressions may also refer to CPU registers and to a number of internal debugger variables. For instance, this might be a way to determine the stack frame depth of the current function on a 386 system:

? ebp - esp

Commands

There are of course many more commands. The call command has already been mentioned. To evaluate an expression, the do command (/ for short) may be used. For example

/ i = 10

will assign the value 10 to the variable i.

Most actions that can be performed through the user interface can also be accomplished via debugger commands. Let's assume for instance that you wish to set a breakpoint on a function called my_func(). You can navigate to the source file and line and set the breakpoint, or you can bring up the function list. Or you can use the break command (b):

b my_func

This may be quicker than navigating the user interface. If you're used to the vi editor, you will find the quit command (abbreviated to q) familiar. Bring up the command window with :, then enter

q

and you exit the debugger.

There are of course many more commands, but this article is not intended to be reference material. Please refer to the Debugger Guide for a comprehensive description of available commands, operators, and variables.

Scripts

Last but not least, it is possible to write user scripts. Let's say you frequently need to convert expressions to hexadecimal, and do it quickly. You can create a script called hex.dbg with the following contents:

* Convert expression to hex
print {%x} <1> <2> <3> <4>

The script will accept up to four arguments; note that an argument is a whitespace-delimited sequence of characters, hence

1+1

is one argument, while

1 + 1

is three arguments. To invoke the script, you can write simply

hex 33

and you'll see

0x21

printed in the debugger log window.

You may wish to examine the sample debug scripts located in the binw directory of your Open Watcom installation. Note that wd itself uses scripts to store configuration information and those scripts should not be edited manually.

Conclusion

The Open Watcom debugger, wd, is a very powerful and complex tool. To use it more effectively, programmers may find some of the techniques outlined in this article helpful. While the Debugger Guide is invaluable, some of the above methods may not be immediately obvious even to users thoroughly familiar with the reference material.
