Wgml Sequencing

From Open Watcom

Revision as of 20:05, 29 December 2010

Introduction

This page is intended to consolidate information, developed while working on other topics, about the sequence in which wgml 4.0 performs various actions. Although some "rounding out" of the topics is unavoidable, a comprehensive discussion of these topics lies in the future.

Duplicating the steps shown is mandatory only to the extent that following the same sequence as wgml 4.0 is needed to ensure that our wgml produces the same output file from the same input.

From time to time, statements are made about where and when text output occurred. This always refers to output intended to be part of the document, as opposed to the control codes or, in the test framework, identifying text, emitted as a result of interpreting the various compiled function blocks. It might be wondered how it was possible to be certain that no text output occurred when that output included space characters. These steps were taken to ensure accuracy in this matter:

  1. The :DEVICE block was given an :OUTTRANS block which translated " " to "|".
  2. Only %image() (never %text()) was used for the function block output, and any embedded spaces were not converted.
  3. All text output was interpreted.

Thus, spaces intended to appear in the document when printed appear as "|" characters and were quite obvious -- as was their absence.
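
The space-marking technique can be sketched in C. This is only an illustration, not the wgml 4.0 mechanism: the type outtrans_table and both function names are hypothetical, and the table is simplified to one output byte per input byte.

```c
#include <stddef.h>

/* Hypothetical stand-in for an :OUTTRANS table, simplified to a
 * single output byte for each input byte. */
typedef struct {
    unsigned char map[256];
} outtrans_table;

/* Start from the identity mapping, then translate " " to "|". */
static void outtrans_init_visible_spaces(outtrans_table *t)
{
    for (int i = 0; i < 256; i++) {
        t->map[i] = (unsigned char)i;
    }
    t->map[(unsigned char)' '] = '|';
}

/* Translate document text in place. Output that is not document text
 * (the %image() output in the tests) is simply never passed to this. */
static void outtrans_apply(const outtrans_table *t, char *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        text[i] = (char)t->map[(unsigned char)text[i]];
    }
}
```

With such a table in place, only text that went through the translation can contain "|" where a space was intended, which is what makes the presence or absence of text output easy to see.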

The test framework used implemented all of the function blocks. When a block is identified as not being interpreted when it is expected to be, that applies to the situation when that block exists: that is, the block exists, a context exists in which it is usually interpreted, and yet in this particular case it is not.

Blocks which do not exist, of course, cannot be interpreted. Every statement that a block is interpreted at a particular point must be understood as qualified by "if that block exists". Unless otherwise noted, the effect of a block not existing is identical to it existing and doing nothing whatsoever when interpreted.

Startup and General Processing

This section discusses what can be said, from the evidence available and from reasonable hypotheses, of how wgml 4.0 starts up document processing. A few insights into the general course of document processing are also documented.

The Initial Startup Sequence

The startup sequence appears to be:

  1. Process the command line and other startup activities (hypothetical, but quite likely).
  2. Extract the information for the specified device from the binary device library.
  3. Interpret the START :PAUSE block.
  4. Interpret the START :INIT block.
  5. If more than one document pass was specified, display "pass #1".
  6. Find the document specification.
  7. Make the document specification the current file.
  8. Find any layout file specified by the command line option LAYOUT and make it the current file.
  9. Process the layout.
    1. Do the "layout-processing sequence" for the current file.
    2. If the current file is not the document specification, make the document specification the current file and do the "layout-processing sequence" for it.
  10. The document specification is once again the current file, and the next item to be input is the first byte after the :eLAYOUT. tag or, if there was no :LAYOUT section, the first byte of the file. Any included files will be opened and made the current file as they are called for during the rest of the processing.
  11. Begin formatting the document.
  12. Interpret the DOCUMENT :PAUSE block.
  13. Interpret the DOCUMENT :INIT block.
  14. Perform an implicit %enterfont().

The "layout-processing sequence" is:

  1. Start with the :LAYOUT section (if any) in the current file.
  2. For each included layout file:
    1. Make that file the current file.
    2. Apply this sequence.
    3. When the end of the file is reached, return to the previous file and make it the current file.
  3. When the :eLAYOUT tag is found, or the end of the original current file is reached, the sequence ends.

A recursion check might be a reasonable part of the implementation of the "layout-processing sequence".
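
A minimal sketch of the "layout-processing sequence" with such a recursion check follows. It assumes an in-memory model in which each file simply lists the layout files it includes; the struct layout_file, the function process_layout() and the visit log are all hypothetical and are not taken from wgml 4.0 or from our wgml.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_INCLUDES 4

/* Hypothetical model of a file containing a :LAYOUT section. */
typedef struct layout_file {
    const char *name;
    int         includes[MAX_INCLUDES]; /* indexes of included layout files */
    size_t      include_count;
    bool        active;                 /* true while being processed */
} layout_file;

/* Apply the sequence to files[current]. visit_order records each file
 * as it becomes the current file. Returns false if a file includes
 * itself, directly or indirectly, or if the log is full. */
static bool process_layout(layout_file *files, int current,
                           int *visit_order, size_t *visited,
                           size_t max_visits)
{
    layout_file *f = &files[current];

    if (f->active || *visited >= max_visits) {
        return false;                       /* the recursion check */
    }
    f->active = true;
    visit_order[(*visited)++] = current;    /* now the current file */

    for (size_t i = 0; i < f->include_count; i++) {
        if (!process_layout(files, f->includes[i], visit_order,
                            visited, max_visits)) {
            f->active = false;
            return false;
        }
        /* end of the included file: files[current] is current again */
    }
    f->active = false;      /* :eLAYOUT found or end of file reached */
    return true;
}
```

Marking a file as active only while it is being processed allows the same layout file to be included twice in sequence while still rejecting a file that includes itself.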

How This Was Determined

There are several factors which determine how much detail wgml 4.0 reports during document processing. These items were used to maximize screen output in investigating sequencing:

  • The command-line option "incl" was used so that file names would be displayed as each file was used.
  • The command-line option "layout" was used to more fully show how the various files that can affect the layout were processed.
  • The command-line option "pass" was used to determine, so far as possible, what happened on each document pass.
  • The :PAUSE and :FONTPAUSE blocks were implemented so that they produced output if interpreted.
  • The blocks in the :DRIVER block were implemented so that they could be clearly identified in the output file.

Now consider this output from wgml 4.0:

WATCOM Script/GML V4.0 Copyright by WATCOM International Corp. 1985,1993.
Processing device information
*** START PAUSE block.
pass #1
Current file is 'e:\progdev\cpp\owtest\wgml\docs\plain.gml'
Current file is 'e:\progdev\cpp\owtest\wgml\docs\testlay.gml'
Processing layout
Current file is 'e:\progdev\cpp\owtest\wgml\docs\plain.gml'
Formatting document
*** DOCUMENT PAUSE block.
*** FONTPAUSE pause01.

Examination of the output file shows these blocks in this order:

  1. The START :INIT block :VALUE block
  2. The START :INIT block :FONTVALUE block (multiple instances)
  3. The DOCUMENT :INIT block :VALUE block
  4. The DOCUMENT :INIT block :FONTVALUE block (multiple instances)
  5. The :FONTSWITCH block :STARTVALUE block for :DEFAULTFONT 0
  6. The :FONTSTYLE block :STARTVALUE block for :DEFAULTFONT 0
  7. The :FONTSTYLE block :LINEPROC block :STARTVALUE block for :DEFAULTFONT 0
  8. The :FONTSTYLE block :LINEPROC block :FIRSTWORD block for :DEFAULTFONT 0

By using %setsymbol() and %image(%getstrsymbol()) (which returns a non-null result only after the %setsymbol(), and so the block it is in, has been interpreted), it can be shown that the blocks are interpreted in this order:

  1. The START :PAUSE block
  2. The START :INIT block
  3. The DOCUMENT :PAUSE block
  4. The DOCUMENT :INIT block
  5. The :FONTPAUSE block for :DEFAULTFONT 0
  6. The :FONTSWITCH block :STARTVALUE block for :DEFAULTFONT 0
  7. The :FONTSTYLE block :STARTVALUE block for :DEFAULTFONT 0
  8. The :FONTSTYLE block :LINEPROC block :STARTVALUE block for :DEFAULTFONT 0
  9. The :FONTSTYLE block :LINEPROC block :FIRSTWORD block for :DEFAULTFONT 0

It is, of course, this analysis that produced the bulk of the sequence given above.

The multiple instances of the :INIT block :FONTVALUE block are explored in more detail here.

Supplemental Tests

These tests helped in determining the relative order in which certain actions occurred, and in justifying treating certain sequences as independent of the text line output sequence.

If an invalid document specification is used, then the screen output is:

Processing device information
*** START PAUSE block.
****ERROR**** IO--001: For file 'none'
                       System message is 'No such file or directory'
                       Cannot open file

and the output file contains:

  1. The START :INIT block :VALUE block
  2. The START :INIT block :FONTVALUE block (multiple instances)

This shows that the START blocks are done before wgml 4.0 attempts to locate the document specification.

If a large amount of text, a large number of document passes, and the "Pause/Break" key on the keyboard are used with a device configured to write to an OS/2 printer from the DOS version of wgml 4.0, then OS/2's timeout for DOS printing causes two files to be captured. The first contains exactly what the sequence above indicates, down to and including the implicit %enterfont(); the second begins with the initial vertical positioning. This shows that the sequence above is output to the device on the first document pass -- and that the rest of the output to the device is done on the last document pass.

If the :LINEPROC 1 of :DEFAULTFONT 0 contains only the line pass number, then wgml 4.0 does emit the message

Abnormal program termination: Memory protection fault

but only after the sequence above has completed. It has no problems before that point with attempting to interpret the :LINEPROC block :STARTVALUE or :FIRSTWORD block as part of the explicit or implicit invocation of device function %enterfont(). This implies that the sequence shown above is indeed independent of the normal sequencing for text line output.

General Processing

Examination of the screen output (from the :DEVICE block) from wgml 4.0 when more than one document pass is specified produces some additional information:

  • The current "pass #" is emitted at the start of each document pass.
  • The same files are opened on each document pass in the same order, including any layout file specified on the command line or in an option file.
  • The layout, however, is only processed (that is, the message stating that it is being processed only appears) on document pass 1.
  • The DOCUMENT :PAUSE block, DOCUMENT :INIT block and virtual %enterfont(0) are done on document pass 1 (that is, the DOCUMENT :PAUSE block only appears on document pass 1).
  • The :FONTPAUSE block for :DEFAULTFONT 0 which is part of the implicit %enterfont() only appears on document pass 1.
  • The remaining :PAUSE and :FONTPAUSE blocks do not appear until the last document pass is done.

The Last Document Pass

This section uses concepts which are developed here and here.

The last document pass is when the bulk of the output file is produced. It is not clear whether or not the various function blocks are evaluated on each document pass. The strongest reason for believing that they are, namely that the global symbol table must have the same content at the same point on each document pass, turns out to be incorrect. This can be seen by entering these lines in any convenient document specification:

:P.&suzy.
:SET symbol='suzy' value='tom'
:P.&suzy.

and applying wgml 4.0 to it. The result is:

&suzy.

tom

with one document pass. But with two or more document passes, the result is:

tom

tom

from which it can be clearly seen that the global symbol table has the same entries at the start of one document pass (except the first) as it did at the end of the prior document pass.
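
The observed behavior can be reproduced with a very small model. This sketch assumes a global symbol table that is created once and never reset between document passes, and that an undefined symbol is left as literal text; the type symbol and the functions resolve() and run_passes() are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical one-entry global symbol table. */
typedef struct {
    bool defined;
    char value[16];
} symbol;

/* Resolve "&suzy.": an undefined symbol is left as literal text. */
static void resolve(const symbol *suzy, char *out, size_t size)
{
    snprintf(out, size, "%s", suzy->defined ? suzy->value : "&suzy.");
}

/* Run the three-line example for the given number of document passes,
 * keeping what the last pass produced for the two :P lines. */
static void run_passes(int passes, char first[16], char second[16])
{
    symbol suzy = { false, "" };    /* created once, before pass 1 */

    for (int pass = 1; pass <= passes; pass++) {
        resolve(&suzy, first, 16);      /* :P.&suzy. */
        suzy.defined = true;            /* :SET symbol='suzy' value='tom' */
        strcpy(suzy.value, "tom");
        resolve(&suzy, second, 16);     /* :P.&suzy. */
    }
}
```

Called with one pass, the first line stays "&suzy."; called with two or more, both lines become "tom", matching the output shown above.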

It is not clear if any other reason to evaluate each block (or some blocks) on each document pass exists. For this reason, our wgml will, at least initially, only interpret each block on one document pass (some, as indicated above, on the first document pass, and the remainder on the last document pass).
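
The rule just stated might be expressed as follows. The enum block_group and the function should_interpret() are hypothetical names used only for this sketch; the grouping follows the observations listed under General Processing.

```c
#include <stdbool.h>

/* Hypothetical grouping of function blocks by the document pass on
 * which they are interpreted. */
typedef enum {
    BLOCKS_FIRST_PASS,  /* START and DOCUMENT :PAUSE and :INIT blocks,
                           and the implicit %enterfont() */
    BLOCKS_LAST_PASS    /* all remaining blocks */
} block_group;

/* Each block is interpreted on exactly one document pass. */
static bool should_interpret(block_group group, int pass, int total_passes)
{
    if (group == BLOCKS_FIRST_PASS) {
        return pass == 1;
    }
    return pass == total_passes;
}
```

When only one document pass is specified, the first pass is also the last, so both groups are interpreted on it.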

This action is observed at the start of each major document section:

  1. Perform the initial vertical positioning.

and these actions are observed, immediately after the first instance of the initial vertical positioning, at the start of the last document pass:

  1. Set the left margin.
  2. Output the first text line.

The latter two steps occur only once, in whichever major document section occurs first.

These steps will now be discussed in greater detail.

The Observed Behavior

This section documents the behavior observed in wgml 4.0 as of the last time it was updated.

Initial Vertical Positioning

The initial vertical positioning occurs at the very start of these major document sections:

  • these sub-sections of the :FRONTM section:
    • the :TITLEP section;
    • the :ABSTRACT section;
    • the :PREFACE section;
    • the :FIGLIST section;
  • the :BODY section;
  • the :APPENDIX section;
  • the :BACKM section, as such;
  • the :BACKM section :INDEX. sub-section.

In addition to the document sections listed, a :FIG with the value "top" for attribute place has also been observed using the same procedure, at least when no "top banner" was specified so that the :FIG was the first element on the page. Other situations where this procedure occurs may be found when side-by-side comparisons of wgml 4.0 and our wgml become possible.

When the use of a struct to hold the contents of a page before outputting it was explored in the research code, it quickly became apparent that all of the above instances had one thing in common: all of them (except the "first instance", discussed next) required the output of a page that was not completely full of text before the start of the section or the appearance of the :FIG.

Not all documents will contain all of these sections, of course. The "first instance" occurs when the first section that exists in the document specification is found; each "subsequent instance" occurs when another section (or :FIG tag, as noted above) is found, and only for those that exist in the document specification.

The first instance of the initial vertical positioning is a straightforward adaptation of the normal vertical positioning sequence. Setting the value of field current_state.y_address and the value returned by device function %y_address() to the value of attribute y_start of the :PAGESTART block, and the value of field desired_state.y_address to the desired value and then applying the normal vertical positioning sequence will produce the observed effects. However, actual use of the normal vertical positioning sequence must only happen if the initial vertical value is different from the value of attribute y_start in the :PAGESTART block, that is, only when actual movement to a new line is needed. Note: at the time this was changed to its present form, our wgml was initializing the vertical position to "1" and then placing the first line on "2" for character devices, making it impossible to completely verify correct behavior. The above still reflects the implementation, as it seems quite reasonable and produced results which might have matched wgml 4.0 had wgml 4.0 placed the first text line on line 2.
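
The first instance might be sketched as shown here. The field names current_state.y_address and desired_state.y_address come from the text above; everything else (the struct output_model, the field y_address_fn standing for the value returned by %y_address(), and the stub for the normal vertical positioning sequence) is an assumption made for illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t y_address;
} position_state;

/* Hypothetical container for the state discussed in the text. */
typedef struct {
    position_state current_state;
    position_state desired_state;
    uint32_t       y_address_fn;      /* value returned by %y_address() */
    unsigned       positioning_calls; /* counts uses of the stub below */
} output_model;

/* Stub for the normal vertical positioning sequence: it is assumed
 * here to leave the current position equal to the desired position. */
static void normal_vertical_positioning(output_model *m)
{
    m->positioning_calls++;
    m->current_state.y_address = m->desired_state.y_address;
    m->y_address_fn = m->desired_state.y_address;
}

/* First instance of the initial vertical positioning. */
static void first_initial_vertical_positioning(output_model *m,
                                               uint32_t y_start,
                                               uint32_t first_line_y)
{
    m->current_state.y_address = y_start;   /* :PAGESTART y_start */
    m->y_address_fn = y_start;
    m->desired_state.y_address = first_line_y;

    /* Only position when actual movement to a new line is needed. */
    if (first_line_y != y_start) {
        normal_vertical_positioning(m);
    }
}
```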

The subsequent instances of the initial vertical positioning first produce a new document page and then perform the normal vertical positioning. The :NEWPAGE block and any :NEWLINE blocks which are interpreted are peculiar in that, if any device function %text() instances are executed, the output translation will be that of the default font (available font 0) without regard to the value returned by device function %font_number() and with no effect on whether or not a font switch occurs as part of outputting the next line of text. This effect disappears immediately after the last :NEWLINE block (if any) is interpreted.

This makes it impossible to use the variable holding the value returned by device function %font_number() to designate the font to use with :OUTTRANS. The field current_state.font_number cannot be used because it must retain its old value long enough to trigger a font switch (if appropriate). A different variable, which tracks the one used with %font_number() except when a new section is entered, was needed. For reasons given here, this variable is used not only to determine the font whose :OUTTRANS table is to be used but also the font whose font_height is to be used in determining how many device lines are needed (that is, as the number of device lines a single :NEWLINE with the value "1" for attribute advance actually moves).

If the :ABSOLUTEADDRESS block is defined, then, even though no :NEWLINE blocks appear, the value returned by the device function %y_address() in the :FONTSTYLE block :LINEPROC block :ENDVALUE block which now immediately follows the implicit %enterfont(0) in the case of the first instance is still found set to the desired vertical position. The actual positioning occurs when the first line of text is actually output, using the :ABSOLUTEADDRESS block (unless %dotab() is encountered and causes the :ABSOLUTEADDRESS block to be interpreted earlier).

The Flag set_margin

The flag set_margin will be "true" or "false" depending on what happened during the initial vertical positioning.

If the first instance of the initial vertical positioning did not skip any device pages, then the flag set_margin will be set to "true".

If the first instance of the initial vertical positioning did skip one or more device pages, then the value of the flag set_margin depends on two factors:

  1. Whether or not the :ABSOLUTEADDRESS block has been defined.
  2. Whether or not the :NEWPAGE block includes device function %flushpage().

If the :ABSOLUTEADDRESS block has not been defined and the :NEWPAGE block does not include device function %flushpage(), then the flag set_margin will be set to "true".

If either the :ABSOLUTEADDRESS block has been defined or the :NEWPAGE block does include device function %flushpage(), or both, then the flag set_margin will be set to "false". However, this does not mean that the :LINEPROC block :ENDVALUE block for line pass 1 for the default font does not occur. It occurs in one of three places:

  1. If both the :ABSOLUTEADDRESS block has been defined and the :NEWPAGE block does include device function %flushpage(), it occurs within the :NEWPAGE block at the point where device function %flushpage() is interpreted.
  2. If the :ABSOLUTEADDRESS block has been defined but the :NEWPAGE block does not include device function %flushpage(), it occurs after the first :NEWPAGE block.
  3. If the :ABSOLUTEADDRESS block has not been defined and the :NEWPAGE block does include device function %flushpage(), it occurs within the :NEWPAGE block after the last :NEWLINE block produced by the interpretation of device function %flushpage().

Since this flag only affects the procedures for setting the margin and the indent if the first element is a :P. tag with an indent, and since those procedures are only used after the first instance of the initial vertical positioning, the flag does not affect any major document section after the first to be processed.
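
The rules for the flag set_margin given in this section can be condensed into two small functions. The flag name comes from the text; the function names, the parameter names and the enum endvalue_site are hypothetical.

```c
#include <stdbool.h>

/* Where the :LINEPROC block :ENDVALUE block for line pass 1 of the
 * default font is interpreted. */
typedef enum {
    ENDVALUE_IN_MARGIN_SEQUENCE,  /* set_margin is "true" */
    ENDVALUE_AT_FLUSHPAGE,        /* inside :NEWPAGE, at %flushpage() */
    ENDVALUE_AFTER_FIRST_NEWPAGE, /* after the first :NEWPAGE block */
    ENDVALUE_AFTER_LAST_NEWLINE   /* inside :NEWPAGE, after the last
                                     :NEWLINE from %flushpage() */
} endvalue_site;

/* Value of set_margin after the first instance of the initial
 * vertical positioning. */
static bool compute_set_margin(bool skipped_pages,
                               bool has_absoluteaddress,
                               bool newpage_has_flushpage)
{
    if (!skipped_pages) {
        return true;
    }
    return !has_absoluteaddress && !newpage_has_flushpage;
}

static endvalue_site endvalue_location(bool skipped_pages,
                                       bool has_absoluteaddress,
                                       bool newpage_has_flushpage)
{
    if (compute_set_margin(skipped_pages, has_absoluteaddress,
                           newpage_has_flushpage)) {
        return ENDVALUE_IN_MARGIN_SEQUENCE;
    }
    if (has_absoluteaddress && newpage_has_flushpage) {
        return ENDVALUE_AT_FLUSHPAGE;
    }
    if (has_absoluteaddress) {
        return ENDVALUE_AFTER_FIRST_NEWPAGE;
    }
    return ENDVALUE_AFTER_LAST_NEWLINE;
}
```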

Establishing the Left Margin

The left margin is established in this context:

  • The font used is :DEFAULTFONT 0.
  • The value returned by device function %font_number() is "0".
  • The value returned by device function %x_address() is the value of the :PAGESTART block attribute x_start.
  • The value returned by device function %y_address() is the value set as part of the initial vertical positioning.
  • The :LINEPROC block involved is the line pass 1 block for the default font.

using this sequence:

  1. If the flag set_margin is "true":
    1. Interpret the :LINEPROC block :ENDVALUE block.
    2. Set the flag text_output to "false".
  2. Set desired_state.x_address to the value corresponding to the left margin.
  3. Set the value returned by device function %x_address() to the value of desired_state.x_address.
  4. If the flag set_margin is "true":
    1. Interpret the :LINEPROC block :STARTVALUE block.
    2. Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not, interpret the :LINEPROC block :STARTWORD block.

The device function %dotab(), if used in the blocks interpreted in the last two steps, will cause horizontal positioning to occur. This implies that current_state.x_address has the value "0".

If device function %dotab() is used in more than one block, it will only cause horizontal positioning in the first block. Thus, in terms of the model, the value of current_state.x_address is set to the current print head position after the horizontal positioning has occurred.

The items that need to be implemented, that is, that actually matter, are:

  • Set desired_state.x_address to the value corresponding to the left margin.
  • Set the value returned by device function %x_address() to the value of desired_state.x_address.

That is, only the internal state really matters.
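
The sequence for establishing the left margin can be sketched as below. The flags set_margin and text_output and the field desired_state.x_address are taken from the text; the struct margin_model, the log of interpreted blocks and the function names are assumptions, and interpreting a block is reduced to recording which block it was.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    BLK_ENDVALUE, BLK_STARTVALUE, BLK_FIRSTWORD, BLK_STARTWORD
} lineproc_block;

#define LOG_SIZE 8

/* Hypothetical model of the state used by the sequence. */
typedef struct {
    uint32_t       desired_x;     /* desired_state.x_address */
    uint32_t       x_address_fn;  /* value returned by %x_address() */
    bool           set_margin;
    bool           text_output;
    bool           has_firstword; /* is a :FIRSTWORD block defined? */
    lineproc_block log[LOG_SIZE]; /* blocks interpreted, in order */
    size_t         log_count;
} margin_model;

/* Stand-in for interpreting a :LINEPROC sub-block. */
static void interpret(margin_model *m, lineproc_block b)
{
    if (m->log_count < LOG_SIZE) {
        m->log[m->log_count++] = b;
    }
}

static void establish_left_margin(margin_model *m, uint32_t left_margin)
{
    if (m->set_margin) {
        interpret(m, BLK_ENDVALUE);
        m->text_output = false;
    }
    m->desired_x = left_margin;
    m->x_address_fn = m->desired_x;
    if (m->set_margin) {
        interpret(m, BLK_STARTVALUE);
        interpret(m, m->has_firstword ? BLK_FIRSTWORD : BLK_STARTWORD);
    }
}
```

When set_margin is "false", only the two assignments remain, which is the part identified above as actually mattering.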

First Text Line

The first text line is the first line actually output to the device. This can be part of a title page, or a heading (the :H0 heading was tested), or part of a banner (one with "body" and "top" specified was tested), or a horizontal line drawn with characters defined by the :BOX block (both boxes produced by control word .bx and by tag :FIG were used), and perhaps other features not yet discovered, as well as the first line of actual document text.

In most cases tested, the line is processed per the appropriate sequences, keeping in mind that the vertical positioning has already been done and so will not be done again; any indent is processed as if it were specified in the first text_chars instance of the line.

The known exceptions are discussed in the sub-sections below.

So far as I can tell, the items that need to be implemented, that is, that actually matter, are:

  • Set desired_state.x_address to the value corresponding to the left margin plus the indent.
  • Set the value returned by device function %x_address() to the value of desired_state.x_address.

That is, only the internal state really matters.

And it seems likely that the internal state could be set to the left margin plus the indent without setting it to the left margin first. Nothing appears to be using the value of the margin as opposed to that of the margin plus indent.

:P With Indent

When tag :P is encountered and an indent is specified, a second sequence appears, separately establishing the indent.

This is the context:

  • The font used is :DEFAULTFONT 0.
  • The value returned by device function %font_number() is "0".
  • The value returned by device function %x_address() is "0".
  • The value returned by device function %y_address() is the value set as part of the initial vertical positioning.
  • The :LINEPROC block involved is the line pass 1 block for the default font.

If device function %dotab() was encountered in establishing the left margin, the value returned by device function %x_address() will not be "0" but rather will reflect the horizontal positioning done as a result.

This is the sequence used:

  1. If the flag set_margin is "true":
    1. Interpret the :LINEPROC block :ENDVALUE block.
    2. Set the flag text_output to "false".
  2. Set desired_state.x_address to the value corresponding to the left margin plus the indent.
  3. Set the value returned by device function %x_address() to the value of desired_state.x_address.
  4. If the flag set_margin is "true":
    1. Interpret the :LINEPROC block :STARTVALUE block.
    2. Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not, interpret the :LINEPROC block :STARTWORD block.

The device function %dotab(), if used in any of the blocks interpreted in this sequence, will cause horizontal positioning to occur, if not done previously. The value of current_state.x_address is set to the current print head position after the horizontal positioning has occurred.

Thus, the indent is established in the same sense that the margin was. After this, the line is processed per the appropriate sequences, keeping in mind that the vertical positioning has already been done and so will not be done again.

Centered Titles Requiring a Font Switch

When outcheck.c was used to emulate a centered title requiring a font switch, it turned out that wgml 4.0 does not update the value returned by device function %x_address() until after the font switch. The value computed to center the title does not occur until the initial horizontal positioning is done.

An attempt was made in outcheck.c to do this by using an "empty" text_chars at the start of the text_line with the margin as the value of its x_address field and the font of the title. However, this also produced a :HTAB block moving to the left margin. It appears that this font switch is actually a part of the sequence discussed here. Or perhaps something more complicated is happening.

Fortunately, since none of this is visible in actual devices, it can be ignored for now. The font switch still occurs as part of the normal line processing.

Additional Observations

This section holds some of the additional items observed, in case they provide any insight into how wgml 4.0 works.

An initial text line (no tags or control words between it and the :BODY tag) produces the same effect as an initial text line starting with the :P tag: the value returned by device function %x_address() goes from "0" to the value of the attribute x_start of the :PAGESTART block to the value of the left margin to the value of the left margin plus the indent given in the :LAYOUT block for the :P tag, and the line begins at that indent. Placing a ".br" on the line above it, however, eliminates the last step and the line begins at the left margin.

The Actual Implementation

An attempt was made to implement the material in the preceding section, but reflection suggests that it is mostly unnecessary and various observations show that it is incomplete in any case. This section attempts to develop the theory which will underlie the eventual actual implementation.

Our wgml performs these actions a bit earlier than wgml 4.0 does. This is shown by attempting to produce the "paragraph indent" step with control words rather than tag :P. In the other cases tested, wgml 4.0 does not establish the position the text will actually appear at until the text itself is sent for output. Our wgml, in contrast, uses the horizontal and vertical positions computed before the first control word or tag or text is encountered.

This has led to fb_position(), the function written to perform the initial positioning, being redesigned so that it is done exactly once. It has also been rewritten for situations where device pages are not emitted at the start of the document (before the first text appears); as a result, the flag set_margin no longer exists.

Since this is all done with font 0, and since font 0 uses font style "plain" in all known devices, and since font style "plain" is not actually defined by any device but the default font style "plain" generated by gendev 4.1 is used, and since this version of font style "plain" produces no output whatsoever, it follows that nothing is actually emitted in real devices. So all those :LINEPROC function block invocations noted above can, in fact, be removed eventually (at present, the test devices make use of them to explore the state of the output code at various points). This leaves setting the system state and the actual initial vertical positioning (for devices not defining an :ABSOLUTEADDRESS block) as the only necessary actions.

And it might be wondered just how necessary that is -- that is, it might be wondered just why the starting position of the first text presented for output would not do just as well.

Possible Future Research

The WGML Reference discusses the various values of the various place attributes in terms of when the function blocks are interpreted, and the terminology poses some questions that may need to be looked at in the future. Consider this table, where the third column summarizes the event which causes the corresponding block to be interpreted:

Block    Place          Location
:INIT    START          wgml starts processing the input source
:INIT    DOCUMENT       wgml starts processing a document
:FINISH  DOCUMENT       wgml finishes processing a document
:FINISH  END            wgml finishes processing the input source
:PAUSE   START          wgml begins processing the source input
:PAUSE   DOCUMENT       wgml begins processing the document text
:PAUSE   DOCUMENT_PAGE  the beginning of each document page
:PAUSE   DEVICE_PAGE    wgml begins a new page on the output device

The last two lines will be discussed below.

It is an open question whether or not "the input source" and "the source input" refer to the same concept; it is quite likely that they do, although, technically, the first would refer to the stream from which the input is received, and the latter to the input itself.

It is also an open question whether or not "a document" and "the document text" refer to the same concept; and this case is less clear, since "a document" might refer to the "document specification", of which the "document text" is but a part. Unless, of course, what is meant by "the document text" is "the document specification text". Alternately, either or both could refer to the output file.

Whether it is worth while to investigate these questions is anybody's guess at this point.

A more interesting question is the distinction between "the input source/the source input" and "a document/the document text".

As far as the :PAUSE and :INIT blocks are concerned, the sequencing above suggests a straightforward interpretation: "input" includes everything wgml 4.0 uses to produce the document, including all command line options, while "document" refers to a subset of the "input", that is, the document specification itself.

And it has long been known (see the discussion toward the end of this section) that, if an END :FINISH block is present, any DOCUMENT :FINISH block will be ignored by wgml 4.0 -- which implies that these two events occur at the same time.

So, since the order of events is known, it is not clear that this question needs to be further examined either. Only time will tell how important it is to investigate these issues.

Turning now to DOCUMENT_PAGE and DEVICE_PAGE, the distinction between "document page" and "device page" is also a question that may or may not require further study. The WGML Reference defines a document page in this way in Section 15.10.2.1 PLACE Attribute:

A document page is the amount of output that WATCOM Script/GML 
formats for a page in the document. The document page may be 
smaller or larger than the physical page produced by the output 
device. If the page being printed is both the document page and the 
device page, the document page pause block takes precedence over 
the device page pause block.

A good example of this difference can be seen by using device TERM: generally, the document pages are longer than the screens (device pages), and the pauses are written to reflect the difference between starting a new page and continuing the current page.

This was also observed using a very simple document specification. Each DEVICE_PAGE or DOCUMENT_PAGE :PAUSE block interpretation was paired with an interpretation of the :NEWPAGE block in the output file. When the header and footer were re-enabled, headers and footers only appeared in conjunction with a DOCUMENT_PAGE :PAUSE block.
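
The precedence rule quoted from the WGML Reference can be sketched as a simple selection. The enum pause_place and the function select_page_pause() are hypothetical; how our wgml decides that a page starts a document page or a device page is not addressed here.

```c
#include <stdbool.h>

typedef enum {
    PAUSE_NONE,
    PAUSE_DEVICE_PAGE,   /* DEVICE_PAGE :PAUSE block */
    PAUSE_DOCUMENT_PAGE  /* DOCUMENT_PAGE :PAUSE block */
} pause_place;

/* Choose which page-level :PAUSE block applies when a new page starts.
 * If the page is both a document page and a device page, the document
 * page pause block takes precedence. */
static pause_place select_page_pause(bool starts_document_page,
                                     bool starts_device_page)
{
    if (starts_document_page) {
        return PAUSE_DOCUMENT_PAGE;
    }
    if (starts_device_page) {
        return PAUSE_DEVICE_PAGE;
    }
    return PAUSE_NONE;
}
```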

Outputting Lines

Our wgml is intended to replace the existing wgml 4.0 in the Open Watcom documentation build system, which means that it should produce the same output as wgml 4.0 when given the same inputs. These "outputs" are text files which are used, in the case of device WHELP, as input to whlpcvt.exe, and, in the case of PS, primarily for the creation of PDF files. This means that our wgml will not only have to emit lines of text but also do so identically to wgml 4.0. Clearly, as time goes by, exactly what this involves will become clearer and clearer.

It has been noted elsewhere that the code written so far necessarily depends on a model of wgml 4.0 which may prove to be, if not actually incorrect, then less useful than alternate models which might develop in the course of implementing our wgml. That is just as true here as anywhere else; the code that results from these investigations should be regarded as a useful first draft, subject to revision as needed.

The Physical Device Model

The physical device model that works best in describing how wgml 4.0 outputs a text line is that of a dot matrix printer. Even when the actual output is a disk file, the terminology still works. These are some notes on the terms used:

  • A page is a physical piece of paper, on which (a part of) the document is printed. When used of an output file, this is the text between any of
    • the start of the file and the first :NEWPAGE block;
    • any two successive :NEWPAGE blocks;
    • the final :NEWPAGE block and the end of the file.
  • A line is a specific vertical position on the page. When used of an output file, it is the location the line will have in the final product.
  • A print head is a physical device which prints a letter on the page. When used of an output file, it is a short form of print head position or print position.
  • The print head position or print position is the position of the print head. When used of an output file, it is the location the next character output will appear in the final product.
  • A line pass is the physical movement of the print head over a given line on the paper. The various "overprint" font styles are created by specifying multiple line passes over the same line. In an output file, of course, each line pass appears on a separate line, but, in the final product, they will be printed on top of each other. This term is often used to designate the actions taken by wgml 4.0 during each line pass.

Text Line Model

This section is based on a very simple, quite vague, model of how wgml 4.0 produces the output stream:

The value of wgml 4.0 lies in how it uses its layout, tags, control
words, symbols, macros, and so on to produce a document. Nonetheless,
if all that processing is factored out, wgml 4.0 can be said to
produce a sequence of text lines for output, preceded by the :INIT
block(s), followed by the :FINISH block, and with a small number of
other blocks embedded in the sequence of text lines (some startup
material, the :NEWPAGE block for document pages, the :HLINE block,
the :VLINE block, and the :DBOX block).

The rest of this section describes the model implemented for our wgml of how wgml 4.0 processes each text line.

A text line is a linked list of these structs:

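typedef struct text_chars {
    struct text_chars * next;         /* next instance in the text line */
    struct text_chars * prev;         /* previous instance in the text line */
    uint32_t            x_address;    /* horizontal base units */
    uint32_t            width;        /* horizontal base units */
    uint16_t            count;        /* characters in use */
    uint16_t            length;       /* characters allocated */
    text_type           type;         /* norm, sub, or sup */
    uint8_t             font_number;  /* available font to use */
    uint8_t             text[];       /* length bytes, not null-terminated */
} text_chars;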

This struct is said to control the sequence of characters stored in field text. This struct will be discussed in more detail below.

Each line of text is assembled as an instance of this struct:

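typedef struct text_line {
    struct text_line * next;         /* next line held by the page layout code */
    uint32_t           line_height;  /* vertical base units */
    uint32_t           y_address;    /* vertical base units */
    text_chars       * first;        /* first text_chars instance in the line */
    text_chars       * last;         /* last text_chars instance in the line */
} text_line;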

through this procedure:

  1. If a text_chars instance is left over from the previous line, make it the first text_chars instance in this line, adjusting its field x_address to the appropriate starting position for the line and adjusting the remaining line length.
  2. Acquire the next text_chars instance and determine its length. If it will fit on the current line, attach it to the list, set its field x_address appropriately and adjust the remaining line length.
  3. Repeat step 2 until the new text_chars instance will not fit on the line. Save this instance (and its length) for use with step 1 of the next line.
  4. Determine how many lines the print head needs to be moved and set the field y_address to the current line plus the number of lines times the largest value of wgml_font field line_height associated with the fonts used in the line.
  5. If justification is to be done, do it now.
  6. Output the text line to the device.
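
Steps 1 through 3 of the procedure can be sketched as follows. This is only a minimal illustration, not the production code: the names chunk, line, append_chunk() and fill_line() are hypothetical, chunk is a cut-down stand-in for text_chars, the widths are assumed to have been computed already in horizontal base units, and the space between words is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for text_chars: only the fields this sketch needs. */
typedef struct chunk {
    struct chunk *next;
    struct chunk *prev;
    uint32_t      x_address;
    uint32_t      width;
} chunk;

/* Cut-down stand-in for text_line. */
typedef struct {
    chunk *first;
    chunk *last;
} line;

/* Append c to the end of ln at horizontal position x. */
static void append_chunk(line *ln, chunk *c, uint32_t x)
{
    c->x_address = x;
    c->next = NULL;
    c->prev = ln->last;
    if (ln->last != NULL) {
        ln->last->next = c;
    } else {
        ln->first = c;
    }
    ln->last = c;
}

/*
 * Fill one line from src[*pos .. n-1]. carry is the instance left over
 * from the previous line, or NULL. Returns the instance that did not fit,
 * to be carried to the next line, or NULL when the input is exhausted.
 */
static chunk *fill_line(line *ln, chunk *carry, chunk *src, size_t n,
                        size_t *pos, uint32_t left_margin, uint32_t line_width)
{
    uint32_t x = left_margin;
    uint32_t remaining = line_width;

    assert(ln != NULL && pos != NULL);
    ln->first = ln->last = NULL;

    /* Step 1: a leftover instance always starts the new line. */
    if (carry != NULL) {
        append_chunk(ln, carry, x);
        x += carry->width;
        remaining = (carry->width < remaining) ? remaining - carry->width : 0;
    }

    /* Steps 2 and 3: keep attaching instances while they fit. */
    while (*pos < n) {
        chunk *c = &src[(*pos)++];
        if (c->width > remaining) {
            return c;           /* saved for step 1 of the next line */
        }
        append_chunk(ln, c, x);
        x += c->width;
        remaining -= c->width;
    }
    return NULL;
}
```

Calling fill_line() repeatedly, passing each returned instance back in as carry, yields one line per call until it returns NULL.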

Additional insights into the layout procedure can be found here.

Devices that do not define the :ABSOLUTEADDRESS block should have the same value for wgml_font field line_height in each wgml_font instance and in the text_line field as well. As noted here, this condition is satisfied by WHELP, and, given the complexity reflected here, imposing it greatly simplifies the implementation.
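
The arithmetic in step 4 of the procedure can be sketched as follows. The helper names max_line_height() and next_y_address() are hypothetical, as is the idea of passing the per-font line_height values as a plain array indexed by font number; the sketch simply follows the wording of step 4, in which the vertical address grows as the print head moves down the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest line_height among the fonts used on one line.
 * font_heights is indexed by font number (an assumed layout). */
static uint32_t max_line_height(const uint32_t *font_heights,
                                const uint8_t *fonts_used, size_t n_used)
{
    uint32_t tallest = 0;

    assert(font_heights != NULL && fonts_used != NULL);
    for (size_t i = 0; i < n_used; i++) {
        uint32_t h = font_heights[fonts_used[i]];
        if (h > tallest) {
            tallest = h;
        }
    }
    return tallest;
}

/* Step 4: current line plus (lines to move) times (largest line_height). */
static uint32_t next_y_address(uint32_t current_y, uint32_t lines_to_move,
                               uint32_t line_height)
{
    return current_y + lines_to_move * line_height;
}
```

For a device that satisfies the condition above, every entry in font_heights is the same, so max_line_height() always returns that one value.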

The fields in struct text_line are used in this way:

  • text_line.next is a pointer to the next text_line instance; this is intended to be used by the page layout code to accumulate groups of text_line instances for further processing before output.
  • text_line.line_height is used by the page layout code to contain the line height for this particular text_line instance, which in turn is intended to be used to determine whether a given text_line will appear on the current page or the next page.
  • text_line.y_address contains the position on the page of the line on which the text is to appear in vertical base units.
  • text_line.first is a pointer to the first text_chars instance.
  • text_line.last is a pointer to the last text_chars instance.

The fields in struct text_chars are used in this way:

  • text_chars.next is a pointer to the next text_chars instance.
  • text_chars.prev is a pointer to the previous text_chars instance.
  • text_chars.x_address contains the position on the line of the first character in the text controlled by this text_chars instance. It is given in horizontal base units.
  • text_chars.width contains the length of the text controlled by this text_chars instance. It is given in horizontal base units, and will be used to increment, among other things, the value returned by device function %x_address().
  • text_chars.count contains the number of characters in the text controlled by this text_chars instance.
  • text_chars.length contains the number of characters in the buffer allocated by this text_chars instance.
  • text_chars.type contains a value from an enum indicating the type of processing to be used with this text_chars instance. Current values are "norm", for normal processing; "sub", for subscript processing; and "sup", for superscript processing.
  • text_chars.font_number contains the number of the available font to be used with the text controlled by this text_chars instance.
  • text_chars.text is an array of length bytes, which starts with the first character in the text controlled by this text_chars instance.

It might seem that the field text_chars.text could be a pointer into a buffer containing (potentially) the entire text of the document. However, it turned out that isolating each "word" (as the WGML Reference puts it) and doing the input translation were inextricably linked, requiring the text to be copied to a different buffer anyway. That being the case, and since each text_line is a composite of multiple text_chars instances, it makes more sense to have the "different buffer" be specific to the text_chars instance rather than separate from it.

Character strings are not used because the input translation mechanism can be used to insert any byte at all, including a null byte, into the text.
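
The point about null bytes can be illustrated with a small sketch. The type counted_text and the function counted_text_new() are hypothetical; they keep only the count, length, and text fields of text_chars and show why the text must be handled by count rather than as a C string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for text_chars: a counted, not null-terminated, buffer. */
typedef struct counted_text {
    uint16_t count;     /* bytes in use */
    uint16_t length;    /* bytes allocated */
    uint8_t  text[];    /* may legitimately contain 0x00 */
} counted_text;

/* Allocate an instance holding a copy of count bytes. The caller frees it. */
static counted_text *counted_text_new(const uint8_t *bytes, uint16_t count)
{
    counted_text *ct;

    assert(bytes != NULL || count == 0);
    ct = malloc(sizeof(*ct) + count);
    if (ct == NULL) {
        return NULL;
    }
    ct->count = count;
    ct->length = count;
    if (count > 0) {
        memcpy(ct->text, bytes, count);   /* memcpy, never strcpy */
    }
    return ct;
}
```

With the three bytes 'a', 0x00, 'b', the count field is 3 while strlen() on the same buffer would stop after the first byte, which is exactly the loss the model avoids.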

It is important to notice two things that this model does not require:

  • This model does not require wgml to construct the entire page before outputting any part of it.
  • This model does not require that a text line to be output actually exist physically in a contiguous buffer.

The last point, of course, refers to the pre-output state of the text line: the result of outputting it may well be to place it into a contiguous output buffer, from which it is sent to the device (or output file).

Insights Into wgml 4.0 Layout

This section contains information developed during testing which provides insight into how wgml 4.0 does the layout for a document. It is necessarily incomplete. The file outcheck.c and functions within it will be referred to; this is a research program file designed to test some aspects of text output and, while its code may be useful as inspiration, the actual production code will likely be considerably more developed. Note that, although outcheck.c is just a research program, the text_line processing code in devfuncs.c and outbuff.c is intended to be production code, although it will probably need tuning as work continues.

These notes were compiled using command line option "script". As it happens, the rules are a bit different in most cases when option "wscript" is used instead. Some notes on the differences have been added, and others may be added in the future. The effects of option "noscript" (which is the default and so need not be explicitly used) on text output appear to be the same as that of "wscript". Of course, when "noscript" is in effect, all control words are treated as text.

When drawing horizontal lines using the characters defined by the :BOX block, the code doing layout clearly forms the entire line in a buffer and uses, in terms of the model, a single text_chars instance to control it. It also uses a special sequence to output it.

There are several other cases in which a complete line, including internal spaces, is controlled by a single text_chars instance and is never justified:

  • Text in a box created with tag :FIG -- but not the text given with a :FIGCAP.
  • Index entries -- but not the page number, at least, not as part of the same text_chars instance.
  • The title "Table of Contents" -- but not the entries.
  • The title "List of Figures" -- but not the entries.

The effect at first appeared to be identical to using ".co off" and independent of how the rest of the text was being processed. This turned out to be wrong, at least for the tag :FIG; the others have not been investigated in any detail at all yet.

Exploring the effects of ".co off", boxing using ".bx", other break-inducing control words such as ".br", and justification was interesting.

First, ".co off" itself with option "script" without tabbing but with justification on:

  1. All text in the current input record is placed in a single text_chars instance, which prevents justification.
  2. Inline tags, such as :HP3., in addition to changing the font, end the current input record and start another one: a simple case will produce three text_chars instances: one containing all text before the phrase, one containing all text in the phrase, and one containing all text after the phrase. Justification does not occur even though more than one text_chars instance is present in the text_line.
  3. It appeared that the end-of-line indicator was treated as a break, and this was said to be the main effect of ".co off". However, considering how the control words work and the way in which an input record is processed, it makes more sense to say that a break occurs at the start of each line -- affecting, as all breaks do, the last output text line processed.

Under the same conditions, with option "wscript", this "extension", documented in section "14.3.33 WSCRIPT" of the WGML Reference, was seen to occur:

When .CO OFF is set, lines which exceed the line length are split into two lines.

Next, tabbing, with option "script" or "wscript" and with justification on:

  1. The tab is treated as an internal input record terminator; text on the line before the tab is not justified; how many text_chars instances it occupies depends, doubtless among other things, on whether ".co off" or ".co on" is in effect.
  2. The text following the last tab to the end of the text_line is justified between that tab position and the right margin, provided ".co on" is in effect and no break is encountered.

Finally, a break-inducing control word (in particular, ".br"):

  1. The prior text_line is not justified, no matter how many text_chars instances it contains.

Blank lines (actually, any blank logical input record) are treated differently:

  • With option "script", a blank line acts very much like an ".sk" control word: a break occurs and a line is skipped.
  • With option "wscript", a blank line is ignored. Concatenation continues.

This "extension", documented in section "14.3.33 WSCRIPT" of the WGML Reference, probably applies:

Extra blanks between words are suppressed in concatenate mode.

That is, the blank line is treated as just another blank. Initial spaces and '\t' (0x09) characters are treated the same way: no break occurs and the extra spaces are removed (with option "script" both cause a break and appear in the output).
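
The "wscript" treatment of extra blanks can be sketched as follows. The function squeeze_blanks() is hypothetical, and several details are assumptions made only for the illustration: a tab inside the text is treated like a space, and trailing blanks are dropped, which sidesteps the question of the space added at the end of each input record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Collapse each run of spaces and tabs in buf[0 .. n-1] to one space,
 * dropping leading and trailing blanks. Works in place and returns the
 * new length. Trailing-blank handling is a simplification.
 */
static size_t squeeze_blanks(uint8_t *buf, size_t n)
{
    size_t out = 0;
    bool   pending = false;     /* a blank run was seen since the last word */

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == ' ' || buf[i] == '\t') {
            pending = true;
        } else {
            if (pending && out > 0) {
                buf[out++] = ' ';   /* one blank stands in for the whole run */
            }
            pending = false;
            buf[out++] = buf[i];
        }
    }
    return out;
}
```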

With either option "script" or "wscript", it appears that a break is all that is needed to prevent justification of a line, even when justification is on.

A text_chars instance will be said to be "empty" if its count field contains "0". Such an instance can be used to produce horizontal positioning using a specified font. The :LINEPROC block :STARTWORD and :ENDWORD blocks will appear in their normal positions. Such instances are presumed to be produced by the wgml 4.0 layout code, so the text line output sequence need only process the text_chars instance.

Note that the empty text_chars instances discussed here only appear with option "script", not with option "wscript". Thus, with option "wscript", these lines cannot be distinguished:

this is :HP1.a test:eHP1. of highlighted phrases
this is:HP1. a test:eHP1. of highlighted phrases
this is :HP1.a test :eHP1.of highlighted phrases
this is:HP1. a test :eHP1.of highlighted phrases

They can be with option "script" -- and, if, for the current device, the space characters of fonts 0 and 1 have different widths, they may even appear slightly different in the output.

In addition, all lines (at least on the first line pass) skip over the margin plus any indent using the font of the first word on that line, and never the font of the last word on the prior line.

The use of "empty" text_chars instances is unavoidable with option "script": consider how these tags are used:

:HP0. :HP1. :HP2. :HP3. :SF.

These are usually used to surround a phrase, starting with a non-space character. Except for the very first word in a paragraph or other layout element, this phrase will be preceded by text, which will be controlled by the preceding text_chars instance. The text controlled by a text_chars instance ends with the last non-space character. This leaves a space character which has the same font as the previous text_chars instance. When that font differs from the font specified by the tags shown, then an empty text_chars instance will be needed to ensure that that space is output. The value of the x_address fields of the empty text_chars instance and the first text_chars instance of the phrase will be identical: the horizontal positioning will be done by the empty text_chars instance, none will be done by the first text_chars instance of the phrase. Note that this information is based solely on tests of :HP1 versus default text (implicit :HP0). It does, however, apply to spaces that follow the text in an :HP1 phrase but are part of the phrase: it is not unique to the default font, although that is where it is most likely to be seen in practice. The function emulate_text_output() in outcheck.c produces empty text_chars instances quite naturally.

When the font styles of the empty text_chars instance had a subsequent line pass (variants of "plain" and "bold" with three :LINEPROCs each were used), then the horizontal positioning appears on the subsequent line passes as well, at least within the text_line.

An empty text_chars instance sometimes occurs at the start of a text line. This is actually an extension of the situation discussed above: in this case, the phrase starts a new line. Investigation, including the use of "uscore" (for variety) on the very first word of text, suggests that the behavior depends on the line pass. For the first line pass:

  1. The first text line of each document element (:TITLE, :H0, and :P were checked) uses the available font of the first text_chars instance.
  2. The subsequent text lines in the same document element (of those tested, applies to :P only) use the available font of the last text_chars instance in the preceding text_line instance.

Experience with outcheck.c suggests that this is not as complicated as it may appear: the very first text_chars produced from a block of text will necessarily use the available font associated with the first word in that block, for there is nothing else to use; an empty text_chars that appears at the end of a text_line will necessarily have the same font as the current phrase, and moving it to the front of the next text_line goes a long way to implementing the second item, although the details, as can be seen in emulate_text_output(), are a bit messy.

For the subsequent line passes, whether of the first text line or subsequent lines, these rules appear to apply (this was tested with :P and option "script" only):

  1. If all text_chars instances are associated with a font style that defines a :LINEPROC instance for that block, then the same rules apply as for the first line pass.
  2. In other cases, those rules are ignored and the available font of the first text_chars instance which uses a font style which has a :LINEPROC defined for that line pass is used for the initial horizontal positioning as well as the text output (if any).

The text_line itself will still start with the empty text_chars. If that text_chars corresponds to a :FONTSTYLE containing a :LINEPROC for this pass, then the first item will be observed; if it does not, then it will be skipped as will all subsequent text_chars instances until the first text_chars which corresponds to a :FONTSTYLE containing a :LINEPROC for this pass is found, at which point the second item will be observed. At least, that will be the starting point when the subsequent line pass processing is implemented.
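
The skipping described above can be sketched as follows. Everything named here is hypothetical: tc is a cut-down stand-in for text_chars, and pass_mask is an assumed per-font table in which bit (pass - 1) is set when the :FONTSTYLE for that font contains a :LINEPROC for that line pass. The sketch shows only the search; it does not interpret any blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PASSES 8    /* one bit per line pass in a uint8_t mask */

/* Cut-down stand-in for text_chars. */
typedef struct tc {
    struct tc *next;
    uint8_t    font_number;
    uint16_t   count;       /* 0 for an "empty" instance */
} tc;

/*
 * Return the first instance in the line whose font style defines a
 * :LINEPROC for the given pass, or NULL if no instance takes part in
 * this pass. pass_mask is indexed by font number.
 */
static const tc *first_for_pass(const tc *first, const uint8_t *pass_mask,
                                unsigned pass)
{
    assert(pass_mask != NULL);
    assert(pass >= 1 && pass <= MAX_PASSES);
    for (const tc *cur = first; cur != NULL; cur = cur->next) {
        if (pass_mask[cur->font_number] & (1u << (pass - 1))) {
            return cur;
        }
    }
    return NULL;
}
```

If the returned instance is the first one in the line, the first rule applies; otherwise the instances before it are skipped and the second rule applies.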

Again, the prior section is simply not applicable when option "wscript" is used. However, when tabbing was examined in more detail, it turned out that empty text_chars instances are produced whether "script", "wscript", or "noscript" is in effect, as documented here.

This section was tested only with "script"; however, since "wscript" causes extra spaces to be skipped, it seems likely that this behavior will not apply either, since the EOL appears to be treated as a space (see note above concerning blank lines).

Careful testing confirms that, if the document specification has a line break after the space encoded by the empty text_chars instance, then the behavior differs from that seen when no such line break exists. The actual behavior also depends on whether or not the empty text_chars has "0" as the value of its font_number field. That is to say, when the space between "first" and "sentence" is in the default font,

:P.This :HP1.is the :eHP1.first 
:HP1.sentence of:eHP1. 

then a text_chars instance which does horizontal positioning only is not produced, but rather a perfectly ordinary text_chars doing both horizontal positioning and text output, while

 :P.This :HP1.is the :eHP1.first :HP1.sentence of:eHP1. 

does produce a text_chars instance which does horizontal positioning only, and the next text_chars instance starts its text ("sentence") on the left margin. On the other hand, when the space between "is" and "the" is part of the :HP1 phrase (and so uses font style "bold"),

first:eHP1. paragraph. This :HP1.is :eHP1.
the second sentence in 

then a text_chars instance which does horizontal positioning only is produced, but the next text_chars instance starts its text ("the") further in from the left margin (wgml 4.0 prints two spaces immediately before "the"), while

first:eHP1. paragraph. This :HP1.is :eHP1.the second sentence in 

does produce a text_chars instance which does horizontal positioning only, and the next text_chars instance starts its text ("the") on the left margin. This was tested with various :DEVICEFONT and :FONTSTYLE instances: all that mattered is whether the default font (:DEFAULTFONT 0) was involved or not and whether a newline was present or not. Neither the :DEVICEFONT nor the :FONTSTYLE associated with :DEFAULTFONT 0 made any difference in the pattern shown.

Thus, intentionally or not, wgml 4.0 does distinguish between a space character and a space character followed by a newline character; it also treats the default font differently from the other fonts. However, most if not all of the above can be eliminated by simply not ending an input record with a space character. The documentation of control word .ct appears to apply to wgml 4.0: a space is added at the end of each input record, and at least some of the above may have more to do with two spaces at the end of the input record than with anything else.

When investigating tabbing, the test file showed some of the same phenomena as listed above because, as lines were re-organized, some acquired terminal space characters. When those characters were removed, one situation produced an effect, and it did it for both "script" and "noscript". When the two input records

all. >This>is>a>test>line>which>is>infested>with>wgml>tabs.
And this line shows one>tab, two>>tabs, and three>>>tabs.

were processed, the first was written past the margin on one line (as is normal for tabbing in certain situations) and so the second started a new line. This line, however, was indented by two spaces from the left margin. It did this even when no space was present at the end of the first input record. When the input records were changed to

all. >This>is>a>test>line>which>is>infested>with>wgml>tabs. And
this line shows one>tab, two>>tabs, and three>>>tabs.

then the second output line started at the left margin.

Actually implementing this behavior does not appear necessary at present because:

  1. Most of these effects only appear when an input record ends in a space, and it is not clear if that occurs in the Open Watcom documentation.
  2. The effects are undesirable. The correct behavior is surely whatever the normal rules would call for. Certainly unexpected indents cannot be considered to be how things should work.

Also, with concatenation "on" this

here we
.us have a
highlighted phrase

and this

here we:hp1. have a:ehp1. highlighted phrase

are, when tested using "script", identical, even though the first uses a newline to indicate the end of the highlighted phrase and the second does not. (Note: this was originally mentioned in case it throws some light on what happens when an input record ends in the middle of a line. It is not clear that it does. Normal processing, implemented by our wgml, splits input records at each tag, so it is not clear that the two are all that different internally.)

It should be noted that the example above will not produce the same result with .co off: the first example will appear as three separate lines, while the second will be a single line, whether "script" or "wscript" is in effect.

When the text controlled by a text_chars instance is too long to fit on the current line, under some conditions it is split instead of being moved to the next line. When this happens, the original text_chars instance now controls either just enough text to fit on the current text_line instance or just enough to fit if followed by a hyphen. The rest of the text is controlled by a new text_chars instance and, if that text satisfies the conditions for being split, it will be split in turn. This will continue until a text_chars instance is produced whose text does not satisfy the conditions for being split.

The conditions which cause the text controlled by the current text_chars instance to be split when it will not fit on the current line vary depending on other factors:

  1. If wgml tabs are part of the sequence, then the entire sequence will be part of the current text_line instance, separated out by tab stop: it will not be split. This is normal for wgml tabs.
  2. If .co is "off", the normal behavior for this condition, which depends on whether "script" or "wscript" (or "noscript") is in effect, is seen:
    1. If "script" is in effect, then the entire sequence will be part of the current text_line instance as-is: it will not be split.
    2. If "wscript" (or "noscript") is in effect, then the sequence will be added to the current text_line instance and split at the end of the line. No hyphenation occurs; the last character on the line will be the last one that will fit.
  3. If .co is "on", the result is affected by the way that constructs like "mid:hp1.dle:ehp1." are treated: they are treated, in terms of whether they will fit on the current line or must be moved to the next line, as if they were a single word. The observed behavior is consistent with a model in which each text_chars instance is processed in turn:
    1. If the current text_chars instance will fit if it is placed at the start of the next line, then the current text_line instance is sent for output processing, a new text_line instance is obtained, and the text_chars instance is used to start the new text_line instance. It is not split in this case.
    2. If the current text_chars instance will not fit even if placed at the start of a new text_line instance, and there is at least one space between the start of the text it controls and the last character of the text controlled by the last text_chars instance in the current text_line instance, then it is appended to the current text_line instance and split one hyphen's width (in the current font) from the end. A text_chars instance controlling a single hyphen is then inserted at the end of the current text_line instance. The current text_line instance is then sent for output processing, a new text_line instance is obtained, and the new text_chars instance is used to start the new text_line instance.
    3. If "wscript" or "noscript" is in effect, the current text_chars instance follows immediately after the last text_chars instance in the current text_line instance, and the total length of the "word" formed by all such text_chars instances will not fit on the current text_line instance, then the entire "word" is removed from the current text_line instance, which is sent for output processing, and used to start a new text_line instance. If the entire "word" is too long to fit on that new text_line instance, then it is split and hyphenated.
    4. If "script" is in effect, then the entire "word" formed by a contiguous sequence of text_chars instances is treated exactly as it would be treated if it were controlled by a single text_chars instance.
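
The choice of split point can be sketched as follows. The function split_point() is hypothetical, and it assumes that a per-character width array in horizontal base units is available. Passing a hyphen width of 0 gives the unhyphenated split seen with .co "off" and "wscript" (or "noscript"); passing the width of a hyphen gives the hyphenated split seen with .co "on".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return how many leading characters can stay on the current line.
 * widths[i] is the width of character i, remaining is the space left on
 * the line, and hyphen_width is the space to reserve for a trailing
 * hyphen (0 when no hyphen is to be added).
 */
static uint16_t split_point(const uint32_t *widths, uint16_t count,
                            uint32_t remaining, uint32_t hyphen_width)
{
    uint32_t room;
    uint32_t used = 0;
    uint16_t keep = 0;

    assert(widths != NULL || count == 0);
    if (hyphen_width > remaining) {
        return 0;               /* not even the hyphen fits */
    }
    room = remaining - hyphen_width;
    while (keep < count && used + widths[keep] <= room) {
        used += widths[keep];
        keep++;
    }
    return keep;
}
```

The characters from the returned index onward would be handed to a new text_chars instance, which is then tested for splitting in turn.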

There are some aspects of hyphenation that need to be mentioned:

  • For these points, it does not matter whether "script" or "wscript" (or "noscript") is in effect:
    • No attempt is made to find stems or other "proper" hyphenation points.
    • The width of a superscripted or subscripted word or phrase is computed as if those characters were to be displayed full-size. This may cause the line, when displayed, to not end at the right margin if the display width is less than the computed width. It should be possible to avoid this by always placing the subscript or superscript inside a highlighted phrase which uses a font which has been scaled to match the size actually displayed.
    • The hyphen is put into its own text_chars instance which is placed at the end of the text_line instance. The hyphen is always in the default font and is never subscripted nor superscripted.
  • When a "word" spread over more than one text_chars instance is involved and "wscript" or "noscript" is in effect, two additional points were noted:
    • If the text_chars instances which are part of the "word" which were processed before the current text_chars were moved to a new text_line instance, or were previously split so that the current text_line contains only a part of the "word", then the current text_chars instance is processed based entirely on the current text_line. As a practical matter, this can result in a "word" which is controlled entirely by a single text_chars instance ending up being placed on the current text_line instance and split there while simply placing part of it into a highlighted phrase or making part of it a subscript or superscript will cause it to be placed on a new line because it is too long for the current line but not too long to fit on a line by itself. This difference is quite visible.
    • If, in identifying the text_chars instances already a part of the current text_line instance which are part of the same "word" as the current text_chars instance, it turns out that all of the text_chars instances, that is, the entire line, is part of that "word", then the current text_chars is added to the current text_line and split: a blank line is not formed just because the "word" will not fit on the current text_line; the current text_line must begin with at least one text_chars instance that controls text that is not part of the "word".
  • The processing with ".co on" plus "script" would, if implemented, require either the abandonment of processing each text_chars individually (which works with ".co off" and with ".co on" and "wscript" or "noscript") or the ability to recall a text_line instance already committed and reprocess it. The first alternative, in the sense of forming and linking all text_chars instances in a "paragraph" (that is, the input text between two "breaks") and then forming them into text_line instances as a final step when the "break" is signalled, somehow grouping the text_chars instances forming a single "word" together and then deciding what to do, is probably better. However, since our wgml is unlikely to implement script and since outcheck.c is merely a research program, implementing this behavior can be postponed, at least for now.

Introducing wgml tabs into long words generally worked in wgml 4.0 as expected: the entire long word was placed on one line without being split. However, it also produced a variety of strange effects, some of which made no sense at all; for example, under certain conditions, a long word not containing a wgml tab separated from a prior long word containing a wgml tab nonetheless was not split at the end of the line: the space between them "counted" when it came to placing the second word on its own line, but not when it came to halting the influence of the wgml tab.

The function oc_process_text() illustrates how this information can be used to produce a reasonable text_line for output. Some of the stranger effects of mixing tabs with long words are not implemented, in part because the conditions under which they occurred were hard to determine, but mostly because not doing them makes the output look better. It should be noted that realistic instances of long words were only producible by telling wgml 4.0 to use 5 or more columns per page. This made long words such as "persistantly" too long to fit in a column. Since only the indexes appear to use columns, the OW documents should not need an exact replication of the wgml 4.0 effects.

The control words .hw and .hy are not used in the Open Watcom documents, and preliminary testing suggests that .hy is not, in fact, implemented in wgml 4.0: the value of system symbol syshy is "OFF" initially and remains "OFF" even if ".hy ON" or ".hy USER" is encountered.

If the information in wgml Fonts is carefully considered, then it is apparent that:

  • input translation occurs before width computation; and
  • width computation occurs before output translation.

Indeed, those steps must occur as part of forming the text_chars instances into a text_line instance. Output translation must occur much later, when the text is placed into the output buffer and so is not actually part of the layout process at all.
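
The ordering can be made concrete with a small sketch. The functions translate() and text_width() are hypothetical, and both translations are assumed to be simple one-byte-to-one-byte tables, which need not be true of real :INTRANS and :OUTTRANS blocks. The point is only that the width is computed from the text after input translation and is not affected by output translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Apply a one-byte-to-one-byte translation table in place. */
static void translate(uint8_t *text, uint16_t count, const uint8_t table[256])
{
    assert(text != NULL || count == 0);
    for (uint16_t i = 0; i < count; i++) {
        text[i] = table[text[i]];
    }
}

/* Sum the per-character widths, in horizontal base units. */
static uint32_t text_width(const uint8_t *text, uint16_t count,
                           const uint32_t widths[256])
{
    uint32_t total = 0;

    assert(text != NULL || count == 0);
    for (uint16_t i = 0; i < count; i++) {
        total += widths[text[i]];
    }
    return total;
}
```

In use, the layout code would call translate() with the input table, then text_width(), and store the result in text_chars.width; translate() with the output table would be applied only when the text is copied into the output buffer.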

Program Context Model

This section documents the program context used in the existing code.

A few definitions are needed, in no particular order at present:

  • The phrase text line will refer to text, whether resulting from ordinary text, a title page, a banner, a title, or any other named feature, which is to be printed out on the same line of a page by the device.
  • The phrase initial vertical positioning will refer to the establishment of the vertical position of the first text line in the output document.
  • The phrase initial horizontal positioning will refer to the horizontal location specified by the first text_chars instance in a text_line instance.
  • The phrase internal horizontal positioning will refer to the space between text_chars instances in a text_line instance.

This typedef struct is used to record the state of the page:

typedef struct {
    uint32_t    font_number;
    uint32_t    y_address;
    uint32_t    x_address;
} page_state;

and these two global variables are used:

page_state  current_state;
page_state  desired_state;

The variable current_state is used to contain the current font number and location of the print head. The variable desired_state is used to contain the number of the next font to be used and the next location of the print head. The values returned by device functions %font_number(), %x_address(), or %y_address() are contained in stand-alone file-level variables.

Four file-level flags are used:

bool    at_start;
bool    set_margin;
bool    textpass;
bool    uline;

The flag at_start is initialized to "true" and is used in the initial vertical positioning to trigger the interpretation of the :LINEPROC block :ENDVALUE block for line pass 1 of available font 0 at the appropriate point. It is then set to "false" and remains "false" for the remainder of program execution.

The flag set_margin is initialized to "false" and is set to "true" if, and only if, the value of the flag at_start is still "true" after any device pages have been skipped; the at_start flag is set to "false" at the same time. The effect is to confine the use of the at_start flag to three of the four cases where device pages are skipped, the exception being when the :ABSOLUTEADDRESS block is not defined and the :NEWPAGE block does not contain device function %newpage(). The sequences given here and here only interpret function blocks if the flag is set to "true". Since these sequences only occur at the start of the last document pass, the flag is never set back to "false" once it has been set to "true" in the current code.

The flag textpass is initialized to "false" and is primarily used to control whether or not text is to be output. It is set to "true" by device function %textpass() and as part of the "first text_chars instance" and the "new font text_chars instance" sequences when no :LINEPROC blocks at all are present in the :FONTSTYLE block. The value set will remain unchanged through all of the "subsequent text_chars instance" sequences which follow; these sequences are discussed here. When the implementation checks to see if the appropriate :LINEPROC block :ENDVALUE block should actually be interpreted, a value of "true" will result in the interpretation occurring; this check also sets its value to "false".

The flag uline is initialized to "false" and is used to control whether or not the character provided by the :UNDERSCORE block is to be output. It is set to "true" by device function %ulineon() and to "false" by device function %ulineoff(). These are the only functions that affect its value. When the implementation checks to see if the appropriate :LINEPROC block :ENDVALUE block should actually be interpreted, a value of "true" will result in the interpretation occurring; this check does not affect its value.
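The difference between the two flags in the :ENDVALUE check can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes that either flag being "true" is sufficient for the block to be interpreted, which the text implies but does not state outright.

```c
#include <stdbool.h>

/* File-level flags, named as in the declarations above. */
static bool textpass = false;
static bool uline    = false;

/* Returns true if the :LINEPROC block :ENDVALUE block should be
 * interpreted. Assumption: either flag being set is sufficient. */
bool endvalue_needed(void)
{
    bool needed = textpass || uline;
    textpass = false;   /* the check consumes textpass ...            */
    return needed;      /* ... but leaves uline for %ulineoff() alone */
}

/* Hypothetical stand-ins for the effect of the device functions. */
void df_textpass(void) { textpass = true; }
void df_ulineon(void)  { uline = true; }
void df_ulineoff(void) { uline = false; }
```

After df_textpass(), the first call to endvalue_needed() reports "true" and the second "false"; after df_ulineon(), every call reports "true" until df_ulineoff() is called.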

The Sequence for Text Lines

This sequence appears to apply to text lines, wherever they occur. It does not apply to horizontal and vertical lines using the characters defined by the :BOX block.

The sequence appears to be:

  1. Set the value of field desired_state.font_number to the value of field text_chars.font_number of the first text_chars instance.
  2. Set the value of field desired_state.x_address to the value of field text_chars.x_address of the first text_chars instance.
  3. Set the value of field desired_state.y_address to the value of field text_line.y_address.
  4. If the value of the flag textpass is "true", interpret the :LINEPROC block :ENDVALUE block using the value of the field current_state.font_number to identify the appropriate :FONTSTYLE block.
  5. Perform the normal vertical positioning.
  6. Set the value returned by device function %font_number() to the value of field desired_state.font_number.
  7. Perform the first line pass.
  8. For each subsequent line pass:
    1. Set the value of field desired_state.font_number to the value of field text_chars.font_number of the first text_chars instance.
    2. Set the value of field desired_state.x_address to the value of field text_chars.x_address of the first text_chars instance.
    3. Interpret the :LINEPROC block :ENDVALUE block using the value of the field current_state.font_number to identify the appropriate :FONTSTYLE block.
    4. Perform the overprint vertical positioning.
    5. Perform the subsequent line pass.
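The order of events in this sequence can be sketched by recording step labels rather than driving a real device. The enum line_step and the functions record() and text_line_steps() are hypothetical; the sketch assumes that the flag tested in step 4 is the textpass flag described earlier and that the test clears it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t font_number;
    uint32_t y_address;
    uint32_t x_address;
} page_state;

/* Hypothetical labels used only to record the order of events. */
typedef enum {
    STEP_LINEPROC_ENDVALUE,
    STEP_NORMAL_VERTICAL,
    STEP_FIRST_PASS,
    STEP_OVERPRINT_VERTICAL,
    STEP_SUBSEQUENT_PASS
} line_step;

/* Store a step if there is room; always count it. */
static size_t record(line_step *out, size_t n, size_t max, line_step s)
{
    if (n < max) {
        out[n] = s;
    }
    return n + 1;
}

/* Returns the number of steps in the sequence; only the first max_out
 * of them are stored in out[]. */
size_t text_line_steps(page_state *desired, uint32_t *font_number_value,
                       bool *textpass, uint32_t first_font, uint32_t first_x,
                       uint32_t line_y, unsigned line_passes,
                       line_step *out, size_t max_out)
{
    size_t n = 0;

    desired->font_number = first_font;                  /* step 1 */
    desired->x_address   = first_x;                     /* step 2 */
    desired->y_address   = line_y;                      /* step 3 */
    if (*textpass) {                                    /* step 4 */
        *textpass = false;
        n = record(out, n, max_out, STEP_LINEPROC_ENDVALUE);
    }
    n = record(out, n, max_out, STEP_NORMAL_VERTICAL);  /* step 5 */
    *font_number_value = desired->font_number;          /* step 6 */
    n = record(out, n, max_out, STEP_FIRST_PASS);       /* step 7 */

    for (unsigned pass = 2; pass <= line_passes; pass++) {  /* step 8 */
        desired->font_number = first_font;
        desired->x_address   = first_x;
        n = record(out, n, max_out, STEP_LINEPROC_ENDVALUE);
        n = record(out, n, max_out, STEP_OVERPRINT_VERTICAL);
        n = record(out, n, max_out, STEP_SUBSEQUENT_PASS);
    }
    return n;
}
```

Note that the :ENDVALUE interpretation is conditional before the first line pass but unconditional before each subsequent line pass.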

The Sequences for Boxing

These sequences appear to apply to horizontal and vertical lines using the characters defined by the :BOX block.

This section was not finished during the preliminary investigation; however, the topic is also discussed here, and that will do for now.

Related Topics

The Normal Vertical Positioning

The normal vertical positioning sequence is:

  1. If the values of the fields current_state.y_address and desired_state.y_address are different, then:
    1. If the value of desired_state.y_address is on a different device page from the value of current_state.y_address, do the "device page sequence" once for each device page that must be moved over to reach the correct device page.
    2. Set the value returned by device function %x_address() to the value of the :PAGESTART block attribute x_start.
    3. Set the value returned by device function %y_address() to the value of desired_state.y_address scaled to fall on the correct device page.
    4. Set the value of current_state.y_address to the last line of the device page before the current device page.
    5. If the :ABSOLUTEADDRESS block is not defined, do the "final vertical positioning" sequence.

The "device page sequence" is:

  1. Interpret the :NEWPAGE block.
  2. If the :ABSOLUTEADDRESS block is defined:
    1. Set the value of current_state.y_address to the last line of the device page before the current device page.
    2. Set the value returned by device function %y_address() to the value of current_state.y_address.
    3. If the value of the flag at_start is "true":
      1. Interpret the :LINEPROC block :ENDVALUE block for line pass 1 of available font 0.
      2. Set the value of the flag at_start to "false".

The "final vertical positioning" sequence is:

  1. If the number of lines requested is zero:
    1. Interpret the :NEWLINE block for which the value of attribute advance is "0". If no such block exists, then interpret the :NEWLINE block for which the value of attribute advance is "1".
  2. If the number of lines requested is greater than zero:
    1. Use one or more :NEWLINE blocks to position the print head vertically to the correct line.
  3. Replace the value of current_state.x_address with the value of the :PAGESTART block attribute x_start.
  4. Set the value returned by device function %x_address() to the value of current_state.x_address.

Note these two cases:

  • If the values of the fields current_state.y_address and desired_state.y_address are the same, nothing happens.
  • If the :ABSOLUTEADDRESS block is defined, then the print head still must be positioned using the :ABSOLUTEADDRESS block, but :NEWPAGE will have been used as required to reach the correct device page and all the values indicated above have been updated as shown.

The value of "0" is treated as valid to accommodate control word .sk -1, which is used in some macros in the Open Watcom documents.

The Overprint Vertical Positioning

The overprint vertical positioning sequence is:

  1. If the :ABSOLUTEADDRESS block is not defined:
    1. Interpret the :NEWLINE block for which the value of attribute advance is "0". If no such block exists, then interpret the :NEWLINE block for which the value of attribute advance is "1".
  2. Set the value returned by device function %x_address() and the value of current_state.x_address to the value of the :PAGESTART block attribute x_start.

Note this case:

  • If the :ABSOLUTEADDRESS block is defined, then the value returned by device function %x_address() and the value of current_state.x_address are still changed to the value of the :PAGESTART block attribute x_start but the print head is not moved.
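A minimal sketch of the overprint vertical positioning follows. The types device_caps and position_vars and the function overprint_vertical_positioning() are hypothetical names used for illustration; the function reports which :NEWLINE block would be interpreted rather than interpreting it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what the device definition provides. */
typedef struct {
    bool     has_absoluteaddress;  /* :ABSOLUTEADDRESS block defined   */
    bool     has_newline_0;        /* :NEWLINE with advance 0 defined  */
    bool     has_newline_1;        /* :NEWLINE with advance 1 defined  */
    uint32_t x_start;              /* :PAGESTART attribute x_start     */
} device_caps;

typedef struct {
    uint32_t x_address_value;      /* value returned by %x_address()   */
    uint32_t current_x_address;    /* current_state.x_address          */
} position_vars;

/* Returns the advance of the :NEWLINE block that would be interpreted,
 * or -1 if none is (either :ABSOLUTEADDRESS is defined or no suitable
 * :NEWLINE block exists). */
int overprint_vertical_positioning(const device_caps *dev,
                                   position_vars *pos)
{
    int newline_used = -1;

    if (!dev->has_absoluteaddress) {            /* step 1 */
        if (dev->has_newline_0) {
            newline_used = 0;
        } else if (dev->has_newline_1) {
            newline_used = 1;                   /* fallback */
        }
    }
    /* Step 2 happens whether or not the print head was moved. */
    pos->x_address_value   = dev->x_start;
    pos->current_x_address = dev->x_start;
    return newline_used;
}
```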

Implementation note: originally, code was inserted to increment the current state properly if no :NEWLINE block with the value "0" for attribute advance was found. This produced problems when invoked at the start of a new section: the first text line now had a vertical position that was above the current position. In effect, the document layout code and the text output code became unsynchronized. This code was removed.

Reflection suggests that this is a knotty topic: in order to work, some sort of offset would have to be computed and continually applied to the specified vertical positions in order to maintain synchronization. This would have the effect of making the document page longer than it was intended to be, and so would produce possibly unexpected (and unwanted) device pages. Since the device involved (HELP) is not used in producing the Open Watcom documents and since its output was apparently displayed on a computer screen rather than printed, not correcting for this is unlikely to cause any problems.

Device Pages

Device pages are kept track of by the text output code. The document layout code only tracks document pages. See Page Layout Subsystem for additional information.

These facts need to be considered:

  1. The value returned by device function %y_address() must be scaled to refer to the current page, since it may be used by the device to locate a position on that page.
  2. The value returned by device function %pages() is only affected by document pages, which change after the :NEWPAGE block is interpreted.

The number of lines on a device page is given in the attribute page_depth of the :DEVICE block. The correct line location on the current page is the result of the integer computation

desired_state.y_address mod page_depth

and the value returned by device function %y_address() must always be set using this computation.

The fact that a new device page is needed can be detected by testing whether the integer computation

desired_state.y_address / page_depth

is greater than zero (it should, of course, never be less than zero). Alternatively, the result of this division is the number of :NEWPAGE interpretations that must be done to reach the correct device page.

The value returned by device function %pages() must not be altered when a new device page is being reached.
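The two integer computations can be combined in a small helper. The type device_position and the function locate_on_device_page() are hypothetical names, and the sketch assumes page_depth is nonzero.

```c
#include <stdint.h>

/* Hypothetical helper type: where a vertical address lands. */
typedef struct {
    uint32_t pages_to_skip;  /* number of :NEWPAGE interpretations needed */
    uint32_t y_on_page;      /* value to be returned by %y_address()      */
} device_position;

/* Split a vertical address into the number of device pages to move
 * over and the line on the device page finally reached.
 * page_depth must be nonzero. */
device_position locate_on_device_page(uint32_t y_address,
                                      uint32_t page_depth)
{
    device_position pos;
    pos.pages_to_skip = y_address / page_depth;
    pos.y_on_page     = y_address % page_depth;
    return pos;
}
```

For example, with a page_depth of 20, a vertical address of 45 requires two :NEWPAGE interpretations and ends on line 5 of the device page reached.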

:NEWPAGE and :PAUSE

The WGML Reference states in section 15.10.2 PAUSE Block (in part):

DOCUMENT_PAGE

The pause block is evaluated at the beginning of each document
page. 
 
DEVICE_PAGE

The pause block is evaluated when WATCOM Script/GML begins a
new page on the output device.

and in section 15.9.5 NEWPAGE Block

The newpage block defines the method by which WATCOM Script/GML will start a
new page in the output, and must be specified.

The latter is not entirely correct.

The :NEWPAGE block is concerned primarily with finishing up the current page; the clearest explanation of an actual :NEWPAGE block is:

:CMT.To start a new page, the carriage return and form feed control
:CMT.characters are sent to the device.
:CMT.Execution of the data currently in the buffer is forced, and the
:CMT.recordbreak then ensures that these characters are sent to
:CMT.the device before any other data could be sent in the same record.
:CMT.This makes it possible to feed single sheets into the printer
:CMT.if there is a device_page pause in the corresponding device
:CMT.definition.

Although not all :NEWPAGE blocks use %recordbreak(), many of them do, and the purpose is clearly to flush the output buffer (and so the last bit of text for the current page) as well as to actually eject the page, which may or may not result in a new page appearing in the device.

The :PAUSE blocks are concerned primarily with the new page; that is, they allow processing to be paused until the next sheet of paper is inserted into the device.

This is quite clear in the case of TERM, where a "document page" is however long the :LAYOUT section specifies it to be, but a "device page" is only 20 lines. In this case, rather than inserting a new piece of paper, the :PAUSE blocks allow the user to view the current text before the screen is cleared and new text produced. Doing the document_page and device_page :PAUSE blocks before the :NEWPAGE block results in text appearing at the wrong location: for example, the document title will appear, not on its own page, as it should, but on the first text page of the document.

Thus, for both device pages and document pages, the appropriate :PAUSE block must be interpreted after the :NEWPAGE block is interpreted.

Since section 15.10.2 PAUSE Block also states (in part):

DOCUMENT_PAGE
If the page being printed is both the document
page and the device page, the document page pause block takes
precedence over the device page pause block. 

both the device_page and the document_page :PAUSE blocks must be implemented if a pause is needed after the interpretation of the :NEWPAGE block for a particular device; if only the device_page :PAUSE block is implemented, then no pause will occur when a new document page is needed.
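The choice of :PAUSE block can be sketched as a small decision function. The enum pause_kind and the function pause_after_newpage() are hypothetical; the name reflects the ordering argued for above, in which the selected :PAUSE block is interpreted only after the :NEWPAGE block.

```c
#include <stdbool.h>

typedef enum {
    PAUSE_NONE,
    PAUSE_DEVICE_PAGE,
    PAUSE_DOCUMENT_PAGE
} pause_kind;

/* Decide which :PAUSE block, if any, to interpret once the :NEWPAGE
 * block has been interpreted. */
pause_kind pause_after_newpage(bool new_document_page,
                               bool has_document_pause,
                               bool has_device_pause)
{
    if (new_document_page) {
        /* The document_page block takes precedence; if it is missing,
         * the device_page block is not used in its place. */
        return has_document_pause ? PAUSE_DOCUMENT_PAGE : PAUSE_NONE;
    }
    return has_device_pause ? PAUSE_DEVICE_PAGE : PAUSE_NONE;
}
```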

:ABSOLUTEADDRESS and :NEWLINE

When the :ABSOLUTEADDRESS block is interpreted, the effect is to do both horizontal and vertical positioning to the location specified by the values returned by device functions %x_address() and %y_address(). Part of interpreting the :ABSOLUTEADDRESS block is deemed to be:

  1. replacing the value of current_state.x_address with the value of desired_state.x_address; and
  2. replacing the value of current_state.y_address with the value of desired_state.y_address.

When the :ABSOLUTEADDRESS block does not exist, one or more :NEWLINE blocks are interpreted. Part of interpreting a :NEWLINE block is deemed to be:

  1. replacing the value of current_state.y_address with the value of the vertical position actually attained.

In either case, when this process is complete, then the value of current_state.y_address will be the same as the value of desired_state.y_address.

When more than one line needs to be skipped, wgml 4.0 appears to follow this simple algorithm:

  1. Interpret the :NEWLINE block the value of whose attribute advance is as large as possible, but no larger than the number of lines that need to be skipped.
  2. Reduce the number of lines that need to be skipped by the value of the attribute advance of the :NEWLINE block just interpreted.
  3. If more lines need to be skipped, return to step 1.

When all of the :NEWLINE blocks needed have been interpreted, then, if the at_start flag is "true", the :LINEPROC block :ENDVALUE block for line pass 1 of available font 0 is interpreted and the at_start flag is set to "false".
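The three-step algorithm for choosing :NEWLINE blocks is a greedy selection, which might be sketched as follows. The function choose_newlines() and its array representation of the available advance values are hypothetical; the at_start handling is omitted, and the sketch simply stops if no defined block fits the lines remaining.

```c
#include <stddef.h>
#include <stdint.h>

/* advances[] holds the advance attribute of each defined :NEWLINE block.
 * chosen[] receives the advance of each block selected, in order.
 * Returns the number of blocks selected; *remaining is set to the number
 * of lines that could not be covered (0 on success). */
size_t choose_newlines(const uint32_t *advances, size_t n_blocks,
                       uint32_t lines_needed,
                       uint32_t *chosen, size_t max_chosen,
                       uint32_t *remaining)
{
    size_t count = 0;

    while (lines_needed > 0 && count < max_chosen) {
        uint32_t best = 0;

        /* Step 1: the largest advance not exceeding what is needed. */
        for (size_t i = 0; i < n_blocks; i++) {
            if (advances[i] <= lines_needed && advances[i] > best) {
                best = advances[i];
            }
        }
        if (best == 0) {
            break;  /* an advance of 0 would make no progress */
        }
        chosen[count++] = best;
        lines_needed -= best;       /* step 2 */
    }                               /* step 3: loop while lines remain */
    *remaining = lines_needed;
    return count;
}
```

With blocks defined for advances 0, 1, and 3, a request for 7 lines selects the blocks for 3, 3, and 1 in that order.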

The examples given here are intriguing and suggest that additional research would be interesting on the use of the :NEWLINE blocks. As noted below, however, it really is not needed at this time. These topics suggest themselves; others may exist:

  1. Confirmation that the "font-from" is used to determine how much vertical space a :NEWLINE 1 may be expected to produce.
  2. Confirmation of how much vertical space :NEWLINE blocks with higher values for attribute advance are expected to produce.
  3. Investigation of the possibility that this code keeps track of the vertical position per the formatting code versus the actual vertical position per the device and compensates in some way.

The example given here which shows a :NEWLINE 2 block being used for a vertical advance of "5" when the block, one would think, would be doing either "4" or "6" suggests the possibility that any gaps between where the device is locating lines and where wgml 4.0 believes they are located are "handled" in the code deciding how to use the :NEWLINE blocks, possibly by compensating in some way for the difference. The code implementing the normal vertical positioning handles all device paging. This may or may not help resolve some of these "differences".

The sequence of :NEWPAGE invocations and values returned by %y_address() have been identical in every pair of files produced for devices which differ only in that one defines :ABSOLUTEADDRESS and the other does not. Thus, these "differences" must be handled identically for all devices, if they are handled at all.

On the other hand, mixing font sizes on a device which relies on :NEWLINE blocks may not be something that wgml 4.0 is actually intended to support. Certainly neither WHELP nor PS does this (PS uses :ABSOLUTEADDRESS, WHELP uses exactly one font, mono01, which is not scaled and in which each character has the same width), so it is not particularly urgent, except as investigating it sheds light on other issues. This might change if our wgml is released more widely, although I doubt that any device using scalable or different-sized fonts (and expecting the line height to vary with the font) would fail to use :ABSOLUTEADDRESS.

Applying Font Styles

This section deals with the sequence used by wgml 4.0 to output each text_chars instance for the current line pass. The discussion of the :FONTSTYLE block found here deals with how the block is used.

In researching this topic, the :FONTSTYLE block :LINEPROC blocks were set up to indicate when the various sub-blocks were interpreted. These indicators started and ended with a : character and included a two-letter abbreviation of the sub-block name, the line pass number, and a one to three letter abbreviation of the font style name. These indicators made it very clear where each block of each line pass of each style was being interpreted.

The First Line Pass Sequence

This section deals with the sequence usually used to output the first line pass.

This sequence is used for an entire text line, that is, for all of the text_chars instances in the current text_line instance, for the first line pass. It has some presuppositions:

  • The fields of desired_state have been set appropriately.
  • The value returned by device function %font_number() is that specified by the field desired_state.font_number.
  • The :LINEPROC block :ENDVALUE block, if needed, has been done.
  • The vertical positioning has been done.

The sequence itself is:

  1. Do the "first text_chars instance" sequence on the first text_chars instance in the text_line.
  2. Do the "subsequent text_chars instance" sequence on each text_chars instance in turn until a text_chars instance which contains a new value of field font_number is reached.
  3. Each time a text_chars instance which contains a new value of field font_number is reached:
    1. Do the "new font text_chars instance" sequence on that text_chars instance.
    2. Do the "subsequent text_chars instance" sequence on each text_chars instance in turn until a text_chars instance which contains a new value of field font_number is reached.

The "first text_chars instance" sequence appears to be:

  1. Set the value of the flag textpass to "false".
  2. Set the value returned by device function %x_address() to the value of the field desired_state.x_address.
  3. If a font switch is required, do one. If not, interpret the :FONTSTYLE block :STARTVALUE block.
  4. If no :LINEPROC exists, set the value of the flag textpass to "true"; otherwise:
    1. Interpret the :LINEPROC block :STARTVALUE block.
    2. Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not, interpret the :LINEPROC block :STARTWORD block.
    3. Interpret the :LINEPROC block :STARTWORD block if a font switch was not done.
  5. If the value of the textpass flag is "true":
    1. Do the initial horizontal positioning, if not already done.
    2. Print out the text controlled by the text_chars instance, if any.
  6. Update the value of current_state.x_address to reflect the current position of the print head.
  7. Set the value returned by device function %x_address() to the value of the field current_state.x_address.
  8. Interpret the :LINEPROC block :ENDWORD block.

If no font switch occurs and no :LINEPROC block :FIRSTWORD block was defined, then the :LINEPROC block :STARTWORD block will be interpreted twice in succession.

The "new font text_chars instance" sequence appears to be:

  1. Set the value of the flag textpass to "false".
  2. Set the value of field desired_state.font_number to the value of field font_number in the current text_chars instance.
  3. Set the value of field desired_state.x_address to the value of field x_address in the current text_chars instance.
  4. Set the value returned by device function %font_number() to value of the field desired_state.font_number.
  5. Interpret the :LINEPROC block :ENDVALUE block, using the value of the field current_state.font_number to identify the appropriate :FONTSTYLE block.
  6. Do a font switch. (It will be required by definition: if the value of field font_number has not changed, this sequence will not be in effect).
  7. If no :LINEPROC exists, set the value of the flag textpass to "true"; otherwise:
    1. Interpret the :LINEPROC block :STARTVALUE block.
    2. Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not, interpret the :LINEPROC block :STARTWORD block.
  8. If the value of the textpass flag is "true":
    1. Do the internal horizontal positioning, if not already done.
    2. Print out the text controlled by the text_chars instance, if any.
  9. Update the value of current_state.x_address to reflect the current position of the print head.
  10. Set the value returned by device function %x_address() to the value of the field current_state.x_address.
  11. Interpret the :LINEPROC block :ENDWORD block.

Note that the :LINEPROC block :STARTWORD block appears at most once, and only if no :LINEPROC block :FIRSTWORD block is defined.

The "subsequent text_chars instance" sequence appears to be:

  1. Set the value of field desired_state.x_address to the value of field x_address in the current text_chars instance.
  2. Set the value returned by device function %x_address() to the value of the field desired_state.x_address.
  3. Interpret the :LINEPROC block :STARTWORD block.
  4. If the value of the textpass flag is "true":
    1. Do the internal horizontal positioning, if any.
    2. Print out the text controlled by the text_chars instance, if any.
  5. Update the value of current_state.x_address to reflect the current position of the print head.
  6. Set the value returned by device function %x_address() to the value of the field current_state.x_address.
  7. Interpret the :LINEPROC block :ENDWORD block.

Since this sequence will only be used when the value of field font_number has not changed, no font switch will occur, and the :LINEPROC block :STARTWORD block will always appear.

If any of the blocks interpreted (including all blocks interpreted as part of a font switch) include device function %dotab() and certain other conditions are met, the horizontal positioning will be done during the interpretation of that block.

At this level, a font switch is "required" whenever the font number for the current text_chars instance differs from the value of current_state.font_number. The font switch sequence determines whether an actual "font switch", in the sense of actually interpreting any :FONTSWITCH block sub-blocks, is needed.

The Subsequent Line Pass Sequence

This section deals with the sequence usually used to output the subsequent line passes.

This turns out to be less straightforward than the first line pass sequence. One of the reasons for this appears to be that, while all font styles define a first line pass (if not done explicitly, then this is used), not all define any subsequent line passes. Thus, when processing a subsequent line pass, some text_chars instances may be associated with a font style which does nothing on that line pass. The one complication that cannot occur is a font style that skips a line pass: as stated here, the values of the :LINEPROC block attributes pass must, within a given :FONTSTYLE, start at "1" and be numbered consecutively (that is, no gaps are allowed).

The term processed will be used to refer to those text_chars instances which are associated with a font style which defines a :LINEPROC block for the current line pass. The term skipped will be used to refer to those text_chars instances which are associated with a font style which does not define a :LINEPROC block for the current line pass.

Testing was done with three font styles:

  • "plain", which had one line pass;
  • "plain2", which had two line passes; and
  • "bold", which had four line passes.

When a second line pass in which all text_chars instances were processed was examined, it was seen to be done with these presuppositions:

  • The fields of desired_state have been set appropriately.
  • The value returned by device function %font_number() is that specified by the last text_chars instance in the text_line.
  • The value returned by device function %x_address() is that resulting from the processing of the last text_chars instance in the text_line on the previous line pass.
  • The :LINEPROC block :ENDVALUE block has been done.
  • The overprint vertical positioning has been done.
  • The value returned by device function %x_address() was set to "x_start" by the overprint vertical positioning.

The sequence itself is identical to that given above. The "first text_chars instance" sequence differs only in omitting step 2: the value returned by device function %x_address() is not set to the value of the field desired_state.x_address on subsequent line passes. The other two subsequences are identical to those used above. This sequence has been implemented in our wgml.

This difference is what distinguishes the first line pass from the subsequent line passes as such. But what of line passes in which some text_chars instances are skipped?

Initially, these appeared to be a veritable zoo of different patterns. However, it soon became apparent that the pattern for each line pass depended almost entirely on which font styles were in use, and in which order. What mattered in most cases was where each skipped text_chars instance was located.

There are three positions in which a skipped text_chars instance may appear:

  1. It may appear at the start of the text_line, and so before the first text_chars instance which is processed (initial).
  2. It may appear within the text_line between two text_chars instances that are processed (medial).
  3. It may appear at the end of the text_line, that is, after the last text_chars instance to be processed (final).

There may, of course, be more than one skipped text_chars instance in any of those positions. Note that a line consisting entirely of skipped text_chars instances does not have the line pass in question (nothing appears), so the skipped/processed dichotomy only makes sense in a text_line that contains at least one of each.
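The three positions can be identified mechanically once it is known which text_chars instances are processed on the current line pass. The following is a minimal sketch; the enum chars_position and the function classify_skipped() are hypothetical names, and the array of flags stands in for the real text_chars list.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    POS_PROCESSED,
    POS_INITIAL,   /* skipped, before the first processed instance */
    POS_MEDIAL,    /* skipped, between two processed instances     */
    POS_FINAL      /* skipped, after the last processed instance   */
} chars_position;

/* processed[i] is true if the i-th text_chars instance is processed on
 * the current line pass. Returns false if nothing is processed, in
 * which case the line pass does not exist for this text_line and out[]
 * is left untouched. */
bool classify_skipped(const bool *processed, size_t count,
                      chars_position *out)
{
    size_t first = 0;
    size_t last = 0;
    bool any = false;

    for (size_t i = 0; i < count; i++) {
        if (processed[i]) {
            if (!any) {
                first = i;
            }
            last = i;
            any = true;
        }
    }
    if (!any) {
        return false;
    }
    for (size_t i = 0; i < count; i++) {
        if (processed[i]) {
            out[i] = POS_PROCESSED;
        } else if (i < first) {
            out[i] = POS_INITIAL;
        } else if (i > last) {
            out[i] = POS_FINAL;
        } else {
            out[i] = POS_MEDIAL;
        }
    }
    return true;
}
```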

The medial position appears to work this way:

  1. The :FONTSTYLE block :ENDVALUE block, preceded by the :LINEPROC block :ENDVALUE block, is interpreted at the end of the processed text_chars instance immediately preceding the skipped text_chars instance. Multiple skipped text_chars instances, even with different values for font_number, have no additional effect. The :FONTSTYLE block :ENDVALUE block is probably produced by whatever follows it, as is generally the case.
  2. If the next processed text_chars instance uses the same font style as the last processed text_chars instance, then the :FONTSTYLE block :STARTVALUE block is interpreted; however, since the :FONTPAUSE block is not interpreted, this is not the font switch sequence. The :LINEPROC block :ENDVALUE block does not appear between the :FONTSTYLE block :ENDVALUE block and the :FONTSTYLE block :STARTVALUE block.
  3. If the next processed text_chars instance uses a different font style than the last processed text_chars instance, then the full font switch is done, starting with the :FONTSTYLE block :ENDVALUE block (which thus appears twice) and including the :FONTPAUSE block. The :FONTSTYLE block :ENDVALUE block is preceded by the :LINEPROC block :ENDVALUE block.

The final position appears to be quite simple:

  1. The :FONTSTYLE block :ENDVALUE block, preceded by the :LINEPROC block :ENDVALUE block, is interpreted at the end of the processed text_chars instance immediately preceding the skipped text_chars instance. Multiple skipped text_chars instances, even with different values for font_number, have no additional effect. It is then followed by the :LINEPROC block :ENDVALUE block emitted as part of the setup for the next line or line pass.

The initial position shows this sequence twice:

  1. The :FONTSTYLE block :STARTVALUE block was done for "bold".
  2. The :LINEPROC block :STARTVALUE block was done for "bold" for the current line pass.
  3. The :LINEPROC block :FIRSTWORD block was done for "bold" for the current line pass. Prior testing included these blocks, so, if the :LINEPROC block :FIRSTWORD block is not defined, then the :LINEPROC block :STARTWORD block for "bold" for the current line pass would be done instead.

The sequence itself is normal; the duplication is not.

The behavior noted above for the initial and final positions of skipped text_chars instances also occurs in a different context:

  • The first text_chars instance is skipped on the current line pass.
  • The last text_chars instance is processed on the current line pass.
  • All text_chars instances were processed on the preceding line pass, which was a subsequent line pass, not a first line pass.

It must be noted that, even if the other conditions are met, if only the first line pass processed all text_chars instances, then this behavior does not occur. It is not clear why this happens; however, there should be no problem writing the code so that it does happen or not as appropriate.

Most, if not all, of this appears to be explained by these rules:

  1. If a text_chars is to be skipped and the prior text_chars was processed, then the font style is terminated.
  2. If a text_chars is to be processed and the prior text_chars was skipped, and no font switch is needed, then the font style is initialized.

This occurs in addition to any other output which would normally occur, which is why some sequences occur twice in the output file. The reason that this behavior occurs only on some subsequent line passes with skipped text_chars instances rather than on all of them remains undiscovered.
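The two rules can be expressed as a decision made at each boundary between adjacent text_chars instances. The enum style_action and the function transition_action() are hypothetical, and the sketch covers only the extra action the rules call for, not the output that would normally occur anyway.

```c
#include <stdbool.h>

typedef enum {
    ACT_NONE,
    ACT_TERMINATE_STYLE,   /* rule 1: the font style is terminated  */
    ACT_INITIALIZE_STYLE   /* rule 2: the font style is initialized */
} style_action;

/* prev_processed and cur_processed describe the prior and the current
 * text_chars instance on this line pass; font_switch_needed is true if
 * the current instance requires a font switch. */
style_action transition_action(bool prev_processed, bool cur_processed,
                               bool font_switch_needed)
{
    if (prev_processed && !cur_processed) {
        return ACT_TERMINATE_STYLE;
    }
    if (!prev_processed && cur_processed && !font_switch_needed) {
        return ACT_INITIALIZE_STYLE;
    }
    return ACT_NONE;
}
```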

The sequences involving skipped text_chars instances have not been completely implemented in the code for our wgml. The reasons are:

  1. Device functions other than %dotab(), %textpass(), %ulineon(), and %ulineoff() appear to occur only in line pass 1 in the devices available to me. Note: this turned out not to be true of device TASA, which does multiple line pass overprinting for both font styles involving "bold" text and the font styles involving the underscore character.
  2. Repeating device function %dotab() has no effect since the current and desired horizontal positions are set equal by the first invocation of device function %dotab().
  3. Repeating device functions %textpass(), %ulineon(), or %ulineoff() has no effect since all these functions actually do is set or clear a flag.
  4. The unimplemented sequences make no sense, that is, there is no apparent reason why they are needed.

Subsequent testing on a completely different topic showed that, in one instance, this sort of thing needed to be implemented. When a text fragment such as this:

:HP1.hi1::EHP1. with colon :HP1.hp1:EHP1.

is processed and wgml font 1 uses the font style "uline", underlining would stay on throughout, as the result using TASA shows:

0          hi1:   with  colon hp1
+          ______________________

Testing showed that, at least for this phrase, moving "with colon" into the initial or final position did not reproduce this problem: it appears to apply only to the medial position.

What was happening is quite clear: font style "uline" relies on the :LINEPROC block :ENDVALUE block to invoke device function %ulineoff(), and that block was not being invoked because the code was going from one text_chars instance using wgml font "1" to a second text_chars instance using wgml font "1", which resulted in no font switch-related blocks being interpreted.

The actual implementation is a bit different from the above, and so may need further tweaking, but it is consistent with the "rules" given above.

If a text_chars has been skipped in the medial position and the font number does not change, then these function blocks are interpreted in this order:

  1. The :FONTSTYLE block :LINEPROC block :ENDVALUE block.
  2. The :FONTSTYLE block :ENDVALUE block.
  3. Skip the space occupied by the text_chars instances that are skipped.
  4. The :FONTSTYLE block :STARTVALUE block.
  5. The :FONTSTYLE block :LINEPROC block :STARTVALUE block.
  6. The :FONTSTYLE block :LINEPROC block :FIRSTWORD block.

The first two steps effect the termination of the font style, and the last three effect the initialization of the same font style. Step 3, which lies between them, ensures that the intervening space will not be affected. The space is skipped using the usual method for horizontal movement within a text line, that is, using the :HTAB block if appropriate but not the :ABSOLUTEADDRESS block, even if the :ABSOLUTEADDRESS block is available.
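The six steps above can be sketched as an ordered list of actions. This is only an illustration: the function name, the tuple layout, and the string labels are hypothetical and do not come from the wgml source, and the case in which the font number changes is deliberately left out because the rule above does not cover it.

```python
def medial_skip_sequence(prev_font, next_font, skipped_width):
    """Ordered actions when text_chars instances are skipped in the
    medial position and the font number does not change.

    All names here are hypothetical; only the order of the blocks is
    taken from the list above.
    """
    if prev_font != next_font:
        # Not covered by this rule: the font number changes.
        return []
    return [
        # Steps 1-2: terminate the font style.
        ("interpret", ":FONTSTYLE :LINEPROC :ENDVALUE"),
        ("interpret", ":FONTSTYLE :ENDVALUE"),
        # Step 3: skip the space occupied by the skipped instances
        # (internal horizontal positioning, so no :ABSOLUTEADDRESS).
        ("skip", skipped_width),
        # Steps 4-6: initialize the same font style again.
        ("interpret", ":FONTSTYLE :STARTVALUE"),
        ("interpret", ":FONTSTYLE :LINEPROC :STARTVALUE"),
        ("interpret", ":FONTSTYLE :LINEPROC :FIRSTWORD"),
    ]
```

Returning data rather than performing output keeps the sketch independent of how the blocks are actually interpreted.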

The remaining sequences will, of course, be reconsidered if it should turn out that any of them are needed.

The :LINEPROC Block With %ulineon()/%ulineoff()

Most of the testing done in researching the sequencing used an :UNDERSCORE block which had a non-null character string as the value of attribute font. This attribute can instead take a number (which designates a :DEFAULTFONT block) or an empty string (which causes the underscore to use the same font as the text).

The complexity which resulted from this caused me to restart the testing using, first, just font style "plain" with no markup at all, and then to use "plain" and "bold". This has helped in isolating the effects of using device functions %ulineon() and %ulineoff() when the font attribute of the :UNDERSCORE block is a non-null character string. Testing was done with the overprint versions of both font style uline and font style uscore.

These modifications of the sequences above are found:

  • Only the horizontal positioning is actually done using the specified font style.
  • The :STARTVALUE block, :FIRSTWORD block, :STARTWORD block, and :ENDVALUE block are all done from line pass 2 of the specified font style.
  • The font is then changed. However, this is not quite an ordinary font change because no :FONTSTYLE block :STARTVALUE block is interpreted.
  • The underscore characters for the current text_chars instance are then emitted.
  • A normal font change is then done, starting with the :FONTSTYLE block :ENDVALUE block for font style "plain".

If the :UNDERSCORE block uses a numeric value for attribute font, then much the same happens as shown above. The differences are:

  • The :LINEPROC block :ENDWORD block is interpreted after the :FIRSTWORD block (and so before the :STARTWORD block) when the horizontal positioning (but not the initial horizontal positioning, that is, the left margin plus any indentation) is done.
  • If the font style associated with the designated :DEFAULTFONT block is the same as that assigned in the option file, then the :LINEPROC block :ENDVALUE block is interpreted after the underscores and before the :FONTSTYLE block :ENDVALUE block for that font style.
  • If the font style associated with the designated :DEFAULTFONT block is different from that assigned in the option file, then the only difference is that the :FONTSTYLE block :ENDVALUE block is for the font style associated with the designated :DEFAULTFONT block.

When the font attribute of the :UNDERSCORE block was given an empty string as its value, this was observed:

  • The normal font switch, ending with the :FONTSTYLE block :STARTVALUE block, was done.
  • The first text_chars instance, that is, the one preceded by the left margin, was done as a unit: first the initial horizontal positioning, then the underscore characters, with nothing in between. This was preceded by the :LINEPROC block :STARTVALUE block and (for font style "uline") the :FIRSTWORD block or (for font style "uscore", which had no :FIRSTWORD block) the :STARTWORD block.
  • The :FONTSTYLE block :STARTVALUE block was then repeated. No :LINEPROC sub-blocks occurred between the last underscore and this block.
  • The remaining text_chars instances were then done. Each was output as a unit (spaces followed by underscores for any text for font style "uscore", a solid line of underscores for font style "uline", that is, the usual and expected behavior) but was preceded and followed by :LINEPROC sub-blocks per the procedure shown above.
  • The actual sequence of :LINEPROC sub-blocks leading up to the second text_chars instance (that is, the first of the grouped instances) was (for font style "uline") the :STARTVALUE block, the :FIRSTWORD block, the :ENDWORD block, and the :STARTWORD block (in that order) or (for font style "uscore", which had no :FIRSTWORD block) the :STARTVALUE block, the :STARTWORD block, the :ENDWORD block, and the :STARTWORD block.

Subsequent testing using a third and fourth line pass (with an empty string for the value of the font attribute of the :UNDERSCORE block) showed different behavior from that listed above. As noted here, these passes have some output peculiarities of their own. Since none of this is currently implemented, there is little point in pursuing the details.

No known device defines an :UNDERSCORE block. As documented here, the result is that the current font is used. Our wgml currently implements this case only.

Our wgml does not, however, do so using any of the above behavior. Instead, it does a perfectly normal subsequent line pass but emits underscore characters instead of text at the appropriate point. The underscore characters are preceded by the :LINEPROC block :STARTWORD block and followed by the :LINEPROC block :ENDWORD block which, for underscoring as opposed to underlining, is necessary to turn underscoring on and off for each word. The only Open Watcom device which actually uses more than one line pass is TASA; and the output to TASA from our wgml and wgml 4.0 appears, so far, to be identical. This can be changed in the future, if any part or parts of the wgml 4.0 behavior turn out to actually be necessary.

If, at some point, the other cases (numeric font, non-null character string font) are implemented, the same procedure should be followed: a sensible method should be used at first, with the behavior noted above implemented only to the extent that it is actually necessary.

Drawing Lines Using Characters

This section is not about this situation:

  • The value of the attribute frame of tag :FIG is a character string.

Suppose the character string is 'abcde'. The output line which resulted with the test file was:

abcdeabcdeabcdeabcdeabcd

This line was output using the normal procedure for a text_line with a single text_chars instance. The only point of interest is that the entire "horizontal line" was formed as a unit. This turns out to be quite common when wgml 4.0 draws lines using characters, rather than the :HLINE block, the :VLINE block, or the :DBOX block.

While attempting to express the content seen in the output files generated by wgml 4.0 from the test document specifications in terms of the model and structs discussed here, it became apparent that, when the characters specified by the :BOX block are used to create a horizontal or a vertical line, the procedure for outputting those characters differs from the standard procedure given earlier. This section is about this alternate procedure.

A horizontal line appears as a single block, for example:

+----------------+

just as the line formed from the character string did above.

A vertical line appears as a single character, for example:

|

This is what caught my attention: these items are neither preceded by the :LINEPROC block :STARTWORD block nor followed by the :LINEPROC block :ENDWORD block.

The sequences discussed in this section apply to horizontal lines (not followed by text) associated with the use of both control word .bx and tag :FIG, and with both the top line and the bottom line of the box.

The sequence, which uses the normal font switch sequence if a font switch is required, is:

  1. If a font switch is required, do one. If not, interpret the :FONTSTYLE block :STARTVALUE block.
  2. Interpret the :LINEPROC block :STARTVALUE block.
  3. Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not defined, interpret the :LINEPROC block :STARTWORD block.
  4. Print out the horizontal or vertical line.
  5. If a font switch was done in step 1, do a font switch back to the original font. If not, interpret the :FONTSTYLE block :STARTVALUE block.
  6. Interpret the :LINEPROC block :STARTVALUE block.
  7. Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not defined, interpret the :LINEPROC block :STARTWORD block.

The value returned by device function %font_number() is what would be expected normally at all points. The value returned by device function %x_address() is constant throughout the sequence -- even after the line has been printed out, device function %x_address() returns the value designating the left margin.
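The seven steps above can be laid out as a small sketch. The function and parameter names are hypothetical, and the two flags saying whether a :FIRSTWORD block is defined are an assumption made so that the font used for the line and the original font can differ on this point.

```python
def line_drawing_sequence(font_switch_required,
                          line_font_has_firstword,
                          text_font_has_firstword):
    """Ordered actions for drawing a line from :BOX characters.

    Hypothetical names; only the step order follows the list above.
    """
    def word_block(has_firstword):
        # :FIRSTWORD if defined, otherwise :STARTWORD.
        if has_firstword:
            return ":LINEPROC :FIRSTWORD"
        return ":LINEPROC :STARTWORD"

    if font_switch_required:
        first, last = "font switch", "font switch back"
    else:
        first = last = ":FONTSTYLE :STARTVALUE"

    return [
        first,                                # step 1
        ":LINEPROC :STARTVALUE",              # step 2
        word_block(line_font_has_firstword),  # step 3
        "print line",                         # step 4
        last,                                 # step 5
        ":LINEPROC :STARTVALUE",              # step 6
        word_block(text_font_has_firstword),  # step 7
    ]
```

Note that no :STARTWORD/:ENDWORD pair brackets the printed line, which is the point that distinguishes this procedure from the standard one.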

If the value of attribute font of the :BOX block is a character string, then the font style "plain" is used. If the value of attribute font of the :BOX block is a number, then the font style specified by that :DEFAULTFONT is used, but only the first line pass :LINEPROC block is used (and, of course, only the :LINEPROC sub-blocks shown above are interpreted): no overprinting occurs.

The steps seen next fall into two categories:

  • what happens most of the time; and
  • what happens with vertical lines when tag :FIG is being used.

Testing has been patchy both because of my inexperience with Watcom GML (for example, no attempt was made to wrap text around a box and so no horizontal lines followed by text were observed) and because wgml 4.0 does not always emit vertical line characters where expected. Device TERM shows these problems clearly with my use of control word .bx; with tag :FIG, TERM shows the box with a vertical line on the right, something that my test driver never showed.

Most of the time, that is, for control word .bx and for tag :FIG for horizontal lines (at least those with no text following them) this happens next:

  1. The text controlled by the first text_chars instance, if any, is printed out.
  2. The :LINEPROC block :ENDWORD block is interpreted.

After that, the normal text output sequence is clearly in effect. The value of device function %x_address() is updated to reflect the position of the last character printed with this :LINEPROC block :ENDWORD block.

For tag :FIG, when the vertical line is drawn, this happens next:

  1. Interpret the :LINEPROC block :STARTVALUE block.
  2. Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not defined, interpret the :LINEPROC block :STARTWORD block.

The value returned by device function %x_address() in these steps refers to the start position for the text itself. Thus, if the left margin requires six spaces, seven characters will be printed (six space characters plus one vertical line character) but the value returned by %x_address() will increase from "6" to "8" without ever having the value "7". The resulting space character appears before the output text, that is, it is treated as normal horizontal positioning. This is followed by the output text itself, printed out almost as if it were a text_line with one text_chars instance, the difference being that the initial space character is treated as internal horizontal positioning (the :ABSOLUTEADDRESS block, if available, is not used).

There are also two other conclusions that may be drawn:

  • It appears that wgml 4.0 does not use the same procedure for outputting all possible text lines.
  • It appears that wgml 4.0 keeps track of the widths of characters output which are intended to actually appear in the final document in a way that all text line output sequences can access.

Presumably, the value of current_state.x_address is read by the :LINEPROC block :ENDWORD block or :ENDVALUE block, whichever comes first, and set to zero at some point after that but before additional characters are output.

Supplemental Tests

When the implicit %textpass() is used, there are no :LINEPROC blocks. The result is entirely consistent with the sequences shown with nothing appearing when the :LINEPROC blocks are interpreted; that is, the text appears but nothing else does.

When font style "bold" was modified to use three :LINEPROC blocks, identical except for the value of n (see the notes on the test setup above), the results showed that the third line pass was treated identically to the second line pass, except, of course, that the :LINEPROC sub-blocks were from the appropriate line pass. Thus, wherever "second line pass" is mentioned, the same remarks can be taken to apply to all subsequent line passes as well.

When a "plain" :FONTSTYLE block with a :LINEPROC 1 block with no device function %textpass() in its :STARTVALUE block is used, then the first line pass only was affected: the text_chars instances using font style "plain" were processed normally, except that neither the horizontal positioning nor the text appeared. Each text_chars instance was clearly present, marked by the :ENDWORD block (the :STARTWORD block appeared or not as indicated above). The values returned by device function %x_address() were updated exactly as they were when device function %textpass() was present; this is why, in the sequences above, the steps involving device function %x_address() and the x_address fields are shown as not depending on whether or not text is actually output.

This is, of course, completely different from the behavior on the second line pass, where the text_chars instances using font style "plain" are skipped by either spaces or, if available and more than eight spaces would be needed, the :HTAB block.

Removing device function %textpass() from the first line pass, second line pass, or both, :LINEPROC blocks of the overprint "bold" :FONTSTYLE produced the same effect: the text_chars instances using font style "bold" worked normally on the affected line pass, except that neither the horizontal positioning nor the text appeared.

Horizontal Positioning

The definitions of "initial horizontal positioning" and "internal horizontal positioning" are found here. This section discusses both, as well as the action of device function %dotab(). The reason for this is that there are at least three different patterns with which wgml 4.0 does horizontal positioning.

This is the core supposition of the model with regard to horizontal positioning:

Horizontal positioning only occurs when there is a difference 
between the values of the fields current_state.x_address 
and desired_state.x_address.

When horizontal positioning is complete, then the field current_state.x_address must have the same value as the field desired_state.x_address. It does not matter which of the patterns discussed next or which of the methods those patterns can use actually performs the horizontal positioning.

Extensive testing confirms that device function %dotab() always produces horizontal positioning when the core supposition calls for it.

The first pattern is the normal pattern for initial horizontal positioning:

  1. Set the value returned by device function %x_address() to the value of the field desired_state.x_address.
  2. If the :ABSOLUTEADDRESS block is available, it is used.
  3. Otherwise, if the :HTAB block is available, it is used if the number of spaces needed would be greater than eight or the horizontal spacing requested is not an even multiple of the width of the space character in the current font.
  4. Otherwise, spaces are used.

The second pattern is the normal pattern for internal horizontal positioning:

  1. Set the value returned by device function %x_address() to the value of the field desired_state.x_address.
  2. If the :HTAB block is available, it is used if the number of spaces needed would be greater than eight or the horizontal spacing requested is not an even multiple of the width of the space character in the current font.
  3. Otherwise, spaces are used.

Note that the :ABSOLUTEADDRESS block is not used, even if it is available.

The third pattern is the pattern used with device function %dotab():

  1. Set the value returned by device function %x_address() to the value of the field desired_state.x_address.
  2. If the :ABSOLUTEADDRESS block is available, it is used.
  3. Otherwise, spaces are used.

Note that the :HTAB block is not used, ever. This allows the :HTAB block, if desired, to be defined as device function %dotab().
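The three patterns differ only in which positioning methods they are allowed to consider, which a short sketch can make explicit. The function name, its parameters, and the pattern labels "initial", "internal", and "dotab" are hypothetical; the sketch also assumes that the desired position is never to the left of the current one.

```python
def choose_method(pattern, current_x, desired_x, space_width,
                  has_absoluteaddress, has_htab):
    """Pick the positioning method for one of the three patterns.

    pattern is "initial", "internal", or "dotab" (hypothetical labels).
    Returns None when the core supposition calls for no positioning.
    """
    distance = desired_x - current_x
    if distance == 0:
        return None
    # Step 1 of every pattern (updating %x_address()) is not modeled.
    spaces_needed, remainder = divmod(distance, space_width)
    htab_preferred = spaces_needed > 8 or remainder != 0

    # Patterns 1 and 3 use :ABSOLUTEADDRESS if available; pattern 2 never.
    if pattern in ("initial", "dotab") and has_absoluteaddress:
        return ":ABSOLUTEADDRESS"
    # Patterns 1 and 2 may use :HTAB; pattern 3 never does.
    if pattern in ("initial", "internal") and has_htab and htab_preferred:
        return ":HTAB"
    return "spaces"
```

For example, with a space width of 1 and both blocks defined, a move of 10 units selects :ABSOLUTEADDRESS for the first and third patterns and :HTAB for the second, while a move of 8 units in the second pattern falls back to spaces.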

These implementations may fail to produce the requested horizontal spacing under these conditions:

  1. The implementation of the second pattern when the horizontal spacing requested is not an even multiple of the width of the space character in the current font and the :HTAB block is not defined.
  2. The implementation of the third pattern when the horizontal spacing requested is not an even multiple of the width of the space character in the current font and the :ABSOLUTEADDRESS block is not defined.

In either case, the code will emit the largest number of space characters which produce a total spacing which is still less than the horizontal spacing requested.
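As a worked illustration of that shortfall, the following sketch computes how many space characters would be emitted and how much of the requested spacing is left uncovered. The function name and the sample widths are hypothetical.

```python
def spaces_fallback(distance, space_width):
    """Return (space count, shortfall) when only spaces can be used.

    The count is the largest whole number of spaces that does not
    exceed the requested distance; the shortfall is what remains.
    """
    count = distance // space_width
    shortfall = distance - count * space_width
    return count, shortfall
```

With a requested spacing of 25 units and a space width of 10, two spaces are emitted and 5 units are lost; when the request is an exact multiple of the space width, the shortfall is zero.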

These implementations are based on these assertions:

  1. For devices which do not define the :ABSOLUTEADDRESS block, justification is done in terms of space characters.
  2. All devices which define the :ABSOLUTEADDRESS block will also define the :HTAB block.

If this is correct, then neither the implementation of the second pattern nor that of the third pattern will ever produce less horizontal spacing than was requested. And, since device PS defines both the :ABSOLUTEADDRESS block and the :HTAB block, while device WHELP, which does not define the :ABSOLUTEADDRESS block, uses only one font, MONO01, which has a character width (for all characters) of "1", these assertions certainly apply to the production of the Open Watcom documents.

Device function %dotab() will produce horizontal positioning in accordance with the core supposition above in these blocks:

:FONTSTYLE 
    :STARTVALUE
    :ENDVALUE
    :LINEPROC
        :STARTVALUE
        :FIRSTWORD
        :STARTWORD
        :ENDVALUE
:FONTSWITCH
    :STARTVALUE
    :ENDVALUE     

Device function %dotab() was never observed in the test files to produce horizontal positioning in the :LINEPROC block :ENDWORD block, most likely because, at the point this block was interpreted, the core supposition never called for it to do so.

The initial horizontal positioning can consist of two elements, which can appear separately at the start of the output file:

  • The correct value to establish (skip over) the left margin.
  • The correct value for any further indentation which must be skipped over.

In this context, the term "indentation" includes at least these cases:

  • The layout specifies an indentation for the given line (for example, the first line of a paragraph may have an indentation specified).
  • On the second line pass, the horizontal spacing from the left margin to the first text_chars instance associated with a :FONTSTYLE block which has a :LINEPROC block defined for that line pass is treated as an indentation.

Other contexts may exist.

Investigation of the use of varying line heights also produced some interesting results since, as noted here, the height of the font (if given) is also used as its width, and this affected the horizontal positioning. Apparently, wgml 4.0 considers all glyphs to be square.

Here are two examples of what was seen when varying line heights were investigated:

  • When the value of attribute char_width is "1", then device function %default_width() returns "6" (when the value of attribute font_height is "600") and wgml 4.0 indents to "9", prints four characters at "33" and moves on to the next line.
  • When the value of attribute char_width is "2", then device function %default_width() returns "12" (when the value of attribute font_height is "600" and the value of both attribute horizontal_base_units and of attribute scale_basis is "6"), and a 1.5i margin plus indentation results in a blank line with right margin at 7i: 9 spaces x 12 = 108, while 7i x (72/6) = 84 pts. Printing resumes on line 2, which starts with an HTAB to move 6 characters, two characters ("Th") at 30 (108 - 84 + 6), a single "-" at 42, and then we get the next line.

The anomalous use of :HTAB suggests that these are all part of the same line from the viewpoint of the layout code, and that at least some of the "-" are from the line output code. This is all very murky. It is also something that is of interest only in determining what wgml 4.0 is doing, since a properly-designed device will most likely avoid these oddities and our wgml might reasonably do something different.

Switching Fonts

While this might appear to be a fairly simple topic, it turns out to have a few interesting quirks.

The test framework was set up so that each :DEVICEFONT associated a unique :FONTPAUSE and :FONTSWITCH with the font it names, and so that each :DEFAULTFONT associates a unique :FONTSTYLE with the font it names (which has the effect of tying the :DEVICEFONT which names the same font into that :DEFAULTFONT). Each :FONTPAUSE was implemented to increment a symbol, and each :FONTPAUSE block, each sub-block of each :FONTSTYLE block, and each :FONTSWITCH block to print it out as an "Instance" number. This allowed the :FONTPAUSE, :FONTSWITCH, and :FONTSTYLE blocks to be associated with each other unambiguously. The file "default.opt", using the FONT option, was then used to vary this setup for test purposes.

The program context model discussed here is extended in this section to include two flags:

  • do_always, which is set by the binary library parsing code; and
  • do_now, which is set separately for each font switch.

The following sections contain further details.

The Normal Sequence

The tests performed revealed the actual sequence of events in switching a font. This precondition must be satisfied:

The value returned by device function %font_number() must have been 
changed to that of the available font being switched to before this sequence 
is applied.

The sequence is:

  1. If the font switch involves two distinct :FONTSWITCH instances, set the value of the flag do_now to "true".
  2. If the font switch involves only one :FONTSWITCH instance, set the value of the flag do_now to the value of the flag do_always.
  3. If the value of the flag do_now is "false", then set it per the result of the :FONTSWITCH block :STARTVALUE block evaluation.
  4. Interpret the :FONTSTYLE block :ENDVALUE block for the available font being switched from.
  5. If the value of the flag do_now is true, then interpret the :FONTSWITCH block :ENDVALUE block for the available font being switched from.
  6. Interpret the :FONTPAUSE block for the available font being switched to.
  7. If the value of the flag do_now is true, then interpret the :FONTSWITCH block :STARTVALUE block for the available font being switched to.
  8. Interpret the :FONTSTYLE block :STARTVALUE block for the available font being switched to.
  9. Set the value of the field current_state.font_number to the value of the field desired_state.font_number.
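The nine steps above can be sketched as follows. The function name, the state dictionary with its "current_font" and "desired_font" keys, and the parameter startvalues_differ (standing in for the result of the :FONTSWITCH block :STARTVALUE block evaluation) are all hypothetical; the precondition on %font_number() is assumed to have been satisfied by the caller.

```python
def switch_font(state, distinct_fontswitch, do_always, startvalues_differ):
    """Return the blocks interpreted for one font switch, in order.

    Hypothetical names throughout; only the step order follows the
    sequence above.
    """
    # Steps 1-2: distinct :FONTSWITCH instances always switch;
    # otherwise start from the library-level flag do_always.
    do_now = True if distinct_fontswitch else do_always
    # Step 3: only consult the evaluation if still undecided.
    if not do_now:
        do_now = startvalues_differ

    blocks = [":FONTSTYLE :ENDVALUE (from)"]           # step 4
    if do_now:
        blocks.append(":FONTSWITCH :ENDVALUE (from)")  # step 5
    blocks.append(":FONTPAUSE (to)")                   # step 6
    if do_now:
        blocks.append(":FONTSWITCH :STARTVALUE (to)")  # step 7
    blocks.append(":FONTSTYLE :STARTVALUE (to)")       # step 8

    # Step 9: the switch is complete.
    state["current_font"] = state["desired_font"]
    return blocks
```

When one :FONTSWITCH instance is shared, do_always is false, and the evaluations agree, only the :FONTSTYLE and :FONTPAUSE blocks are interpreted.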

If any of the blocks shown contain device function %dotab(), then the horizontal positioning may occur when they are interpreted, as discussed here.

The :FONTSTYLE sub-blocks were included because including them in this sequence appears to make more sense than attempting to include them elsewhere. For example, as noted here, at times the :FONTSTYLE block :ENDVALUE block occurs twice, and the second occurrence is clearly part of a font switch while the first is clearly not. Thus, the structure of the output file suggests that they belong to this sequence when this sequence is invoked.

Evaluating Function Blocks

The WGML Reference states in part in Section 15.9.11.2 STARTVALUE Section:

When a switch between two fonts is necessary, the startvalue 
sections of the two fonts are evaluated. The font switch is only 
performed if the results of the two evaluations are different.

This is, at best, only partially correct:

  • If the :FONTSWITCH instances involved are distinct, then the :FONTSWITCH blocks are always interpreted, without regard to whether their "evaluations" are the same or different.
  • Even when the :FONTSWITCH instances are the same, if the 21 flags show that any one of these device functions:
%date() %font_number() %pages() %time() %wgml_header()

is present in the :FONTSWITCH block :STARTVALUE block, then the :FONTSWITCH blocks are always interpreted, without regard to whether their "evaluations" are the same or different.

The binary device library parsing code sets the flag do_always to "true" if any of those five device functions are used, and to "false" if none of them are used.
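A minimal sketch of how do_always might be derived is shown below. It assumes, purely for illustration, that the parsing code can supply the names of the device functions found in the :FONTSWITCH block :STARTVALUE block; the actual encoding of the 21 flags is not modeled, and the function and constant names are hypothetical.

```python
# The five device functions named above that force the switch.
ALWAYS_SWITCH = frozenset(
    ["%date", "%font_number", "%pages", "%time", "%wgml_header"]
)


def compute_do_always(functions_used):
    """True if any of the five forcing device functions is present.

    functions_used is a hypothetical iterable of device function
    names, written without the trailing parentheses.
    """
    return any(name in ALWAYS_SWITCH for name in functions_used)
```

Because the flag depends only on the compiled block, it can be computed once when the binary device library is parsed rather than at each font switch.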

The remaining seventeen functions whose presence is signaled by a flag are:

%default_width() %font_outname1() %font_outname2() %font_resident()
%font_height() %font_number() %font_space() %line_height()
%line_space() %page_depth() %page_width() %tab_width() %thickness()
%x_address() %x_size() %y_address() %y_size()

When the :FONTSWITCH instances are the same, then each of these, when it causes the only difference between the two "evaluations", does control whether or not the :FONTSWITCH blocks are actually interpreted.

This leaves all Type I and these Type II device functions to consider:

%add() %decimal() %divide() %getnumsymbol() %getstrsymbol() %hex()
%lower() %remainder() %subtract()

If this section is consulted, it will be seen that, for these seven Type II device functions:

%add() %decimal() %divide() %hex() %lower() %remainder() %subtract()

gendev 4.1 folds the entire expression into a single literal parameter, which it compiles as if it were the argument of device function %image(), unless a non-literal parameter which is not one of the seven device functions shown is present in the expression. But the only device functions that can be used as a non-literal parameter are precisely those involved with the 21 flags plus device functions %getnumsymbol() and %getstrsymbol().

Device functions %getnumsymbol() and %getstrsymbol(), when tested in contexts where they returned the same value, which was output rather than just tested, behaved exactly like the seventeen device functions listed above. The only difference is that their presence in a :FONTSTYLE block :STARTVALUE block cannot be detected by consulting the 21 flags.

As to the Type I functions, those that do not take parameters clearly cannot behave differently and those that do take parameters take precisely those functions whose presence is indicated by the 21 flags.

Since these evaluations are only done if the same :FONTSWITCH block :STARTVALUE block is used by both the "from" font and the "to" font, and the only difference is the font number (that is, the available font and its related :FONT block), it follows that the only device functions that can differ are those discussed here, each of which is associated with a specific one of the 21 flags:

Flag   Device Function
01     %font_outname1()
02     %font_outname2()
03     %font_resident()
06     %default_width()
16     %font_height()
17     %font_space()
18     %line_height()
19     %line_space()

These flags, then, must be made available for use in the evaluation process, the sequence for which has this precondition:

the value of the flag do_now is "false"

and contains these steps:

  1. Invoke each of the device functions listed above which are present in the :FONTSWITCH block :STARTVALUE block for each font.
  2. If any pair of results differ, set the value of the flag do_now to "true".
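The two evaluation steps can be sketched as a comparison of per-font results. The function name and the dictionaries mapping device function names to their values for each font are hypothetical stand-ins for however the results are actually obtained.

```python
# The eight font-dependent device functions from the table above.
FONT_DEPENDENT = (
    "%font_outname1", "%font_outname2", "%font_resident",
    "%default_width", "%font_height", "%font_space",
    "%line_height", "%line_space",
)


def evaluate_do_now(present, from_font, to_font):
    """Return the new value of do_now (precondition: it was false).

    present: names of device functions found in the shared
    :FONTSWITCH block :STARTVALUE block (per the flags).
    from_font, to_font: hypothetical dicts of name -> result.
    """
    for name in FONT_DEPENDENT:
        # Step 1: only functions actually present are invoked.
        if name in present and from_font[name] != to_font[name]:
            # Step 2: any differing pair forces the switch.
            return True
    return False
```

Nothing is interpreted or emitted here; only the return values of the device functions are compared.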

Thus, some of the 21 flags are now processed by the binary device parsing code for the :FONTSWITCH block and used in implementing the font switch sequence. Those for which no use has been discovered, however, are not and need not be. This includes all of the 21 flags used with the :FONTSTYLE block.

This method of evaluation, by avoiding actually interpreting the blocks, also avoids any side effects from such device functions as %dotab(), %enterfont(), or %flushpage(), which might have a negative effect on the output.

Modified Sequence Used With %ulineon()/%ulineoff()

When the character provided by the :UNDERSCORE block is used with device functions %ulineon() and %ulineoff() and, as noted here, a font name is given for use with the underscore character, the font switch sequence changes: the last step of the switch-to method, the interpretation of the :FONTSTYLE block :STARTVALUE block, does not occur.

Since all known devices use the default :UNDERSCORE block, and so use the current font for the underscore characters, this has not been implemented in our wgml, as it is not needed.
