Page Layout Subsystem

From Open Watcom

Revision as of 19:38, 4 December 2012

Scope and Status

This page discusses the Page Layout Subsystem. It is intended eventually to gather together in one place all available information on Page Layout as done by wgml.

While this page has developed beyond a focus on the device driver blocks to include some system symbols and some layout items, it is far from complete.

Binary Device Files

These binary device file items are involved:

  • The Attributes: page_width, page_depth, horizontal_base_units, and vertical_base_units.
  • The :PAGESTART block.
  • The :PAGEOFFSET block.
  • The :PAGEADDRESS block.

The following table presents these items for two device/driver pairs:

Item                      PS/PSDRV            ASA/ASADRV
page_width                 8500               132
page_depth                10920                66
horizontal_base_units      1000                10
vertical_base_units        1000                 6
x_start (:PAGESTART)        200                 0
y_start (:PAGESTART)      10800                 1
x_start (:PAGEOFFSET)       200                (0)
y_start (:PAGEOFFSET)       200                (0)
x_positive (:PAGEADDRESS)   yes              (yes)
y_positive (:PAGEADDRESS)   no               (yes)

The parenthesized values in the last column are the defaults placed in the binary file when the corresponding block is not present in the source.

These values must interact with any number of wgml tags dealing with headers, footers, pagination, and so forth. All of this will need to be explored in some way.

The :PAGEADDRESS block is documented in the WGML Reference, Section 15.9.12, as follows:

As text is placed on the output page, the X and Y components of the 
address are adjusted to make a new address. With some output 
devices, this adjustment is added (positive) to the address. The 
adjustment is subtracted (negative) with other output devices. The 
pageaddress block specifies whether the adjustment is positive or
negative. If the output device does not support page addressing, 
this block should not be specified. (See "Page Addressing" on page 
227 for more information).

Examination of the output intended for the PS device shows that, for that device, the values returned by device function %y_address() decrease as the print position moves down the page. Thus, the effect of a "no" in attribute y_positive is that the location of the next line is formed by subtraction. Presumably, a "no" value in attribute x_positive would have a similar effect, although no examples exist.
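
The effect of y_positive described above can be sketched in a few lines of Python. This is only an illustration of the observed behavior, not code taken from wgml: the function name and the idea of passing the line height as a parameter are assumptions made for the example. The numbers are the PS value of y_start from the table above and the font 0 line height of 167 reported later on this page.

```python
def next_line_address(y_address, line_height, y_positive):
    """Return the vertical address of the next output line.

    When y_positive is true the adjustment is added to the address;
    when it is false (as for the PS device) it is subtracted.
    """
    if y_positive:
        return y_address + line_height
    return y_address - line_height


# PS device: y_start is 10800 and y_positive is "no", so addresses decrease.
ps_second_line = next_line_address(10800, 167, y_positive=False)
```

Presumably the same shape would apply to x_positive, but, as noted above, no examples exist to confirm this.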

Another block that "should not be specified" if page addressing is not supported is the :ABSOLUTEADDRESS block. Since, as shown here, the :ABSOLUTEADDRESS block is a prerequisite to the :HLINE block, the :VLINE block, and the :DBOX block, those blocks fall under the same guideline.

Section 15.2 Page Addressing states:

A particular point on the output page is identified by a horizontal 
(X-axis) and a vertical (Y-axis) component. Together, the X and Y 
components designate the address of a point on the page. As each 
word and line of output is processed, the X and Y components of the 
address are adjusted to make a new address. Many devices restrict
the adjustment of the address. Other devices are known as point 
addressable or full page addressing devices, and allow any point on 
the page to be addressed.

WATCOM Script/GML assumes that the start of an output page is the 
upper left corner. The horizontal component of the page address is 
adjusted for each character placed on the output page. The vertical 
component of the page address is adjusted for each output line. The 
current X and Y address component values are available through the
%X_ADDRESS and %Y_ADDRESS device functions.

This suggests that device functions %x_address() and %y_address() are intended for use with devices that support page addressing, although testing shows that they return the same values at the same points whether the device supports page addressing or not.

The only documentation of the :PAGEOFFSET block

:CMT. Position past the unprintable region of the page

is suggestive, but hardly complete -- and some of the uses of the attributes of this block, as seen below, are a bit unexpected.

Lines and Columns

Vertical and horizontal space is measured in many different ways in wgml 4.0, as shown in the WGML Reference and the document Waterloo SCRIPT. Most of these are absolute: inch, millimeter, centimeter, cicero, pica. Others vary by font or device.

The "Em" unit, in the document Waterloo SCRIPT, is described this way:

[T]he "Em" unit is the character width of one blank character. It is the same as a pure numeric 
argument.

In the WGML Reference, it has apparently become three different units:

  • Characters, described as:
The number of characters. The width of a character is determined by the CPINCH command line option. 
The default option for this value is 10 characters per inch. Example: 23
  • Device Units, described as:
The number of characters. The width of the character zero (0) in the current font is used.  
Example: 23DV
  • Ems, described as:
The number of ems followed by the M symbol. The width of an em space is the width of the 
character ’M’ in the current font. Example: 9M

As it happens, values which specify a unit are treated somewhat differently from values that do not do so. Hence the next two sub-sections.

Unit Is Specified

The units allowed can be divided into those that are absolute and those that are relative to the device because they depend upon the metrics of the current font. In most cases this appears to be the default font.

The absolute units are:

  • Centimeters (CM)
  • Ciceros (C)
  • Inches (I)
  • Millimeters (MM) (only: wgml 4.0 will not accept "W", as Script apparently did)
  • Picas (P)

and the relative units are:

  • Device Units (DV)
  • Ems (M)

"Characters" and "pure numeric argument" are one and the same, and are discussed in the next section.

As is usually the case with wgml 4.0 and gendev 4.1, although the WGML Reference shows the units in all capital letters, each letter can be upper or lower case: thus, in addition to "CM", wgml 4.0 also accepts "cM", "Cm", and "cm".

wgml 4.0 imposes restrictions on the number of characters that can be used to specify the value. Here is a table summarizing them:

Unit              Whole Part               Fractional Part          Separator
C                 4                        4                        C
CM                4                        2                        .
DV                4                        2                        .
I                 4                        2                        .
M                 4                        (not allowed)            (not allowed)
MM                4                        2                        .
P                 4                        4                        P

The Units can be divided into three classes:

  1. Four that allow a decimal point followed by up to two digits (CM, DV, I, MM).
  2. Two that allow two sets of four digits separated by the Unit, but not a decimal point, with or without following digits, before or after the Unit (C, P).
  3. One that allows only four digits followed by the Unit (M).
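
The three classes can be summarized as patterns. The following Python sketch is one possible reading of the table, not the checks wgml 4.0 actually performs. Several points are assumptions not settled by the testing reported here: that an optional leading sign is part of the value, that at least one digit must precede the unit, and that a decimal point may be followed by no digits at all. The function name is invented for the example.

```python
import re

# Class 1: up to four digits, optional decimal point with up to two digits.
_DECIMAL = re.compile(r"[+-]?\d{1,4}(\.\d{0,2})?(CM|DV|I|MM)", re.IGNORECASE)
# Class 2: up to four digits, the unit letter, then up to four more digits.
_SPLIT = re.compile(r"[+-]?\d{1,4}[CP]\d{0,4}", re.IGNORECASE)
# Class 3: up to four digits followed by the unit letter only.
_EM = re.compile(r"[+-]?\d{1,4}M", re.IGNORECASE)


def is_valid_space_value(text):
    """Return True if text fits one of the three unit classes."""
    return any(p.fullmatch(text) for p in (_DECIMAL, _SPLIT, _EM))
```

Because the whole string must match, a value containing a space is rejected, which agrees with the rule stated below that these values cannot contain embedded spaces.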

In addition to the formatting rules that apply to values in general, additional formatting rules apply to values giving horizontal or vertical space measurements. If the value is delimited then:

  1. Spaces are not allowed before the Unit.
  2. Spaces are not allowed after the Unit.
  3. Spaces are not allowed before the start of the value.

Thus, a delimited value designating a horizontal or vertical space value cannot contain embedded spaces. If the value is not delimited then:

  1. Spaces before the value are skipped and are not considered part of the value.
  2. Spaces after the value are skipped and are not considered part of the value.
  3. Spaces inside the value are taken to terminate the value. The following part of the value and any following attribute/value pairs will be treated as text, which will cause defaults to be used for optional attributes or an error to be produced if any required attributes have not yet been processed.

Thus, an undelimited value designating a horizontal or vertical space value cannot contain embedded spaces either.

These values can only be delimited when used as attribute values; when used as control word operators, wgml 4.0 will not accept delimiters.

Limited testing shows what the various units actually represent:

  • Units CM (centimeters), I (inches), and MM (millimeters) represent exactly what their names imply, converted to horizontal or vertical base units. One CM is taken to be 0.39374 I; one MM is taken to be 0.0394 I. These are not quite the values given on the Web, although they are very close to them.
  • Unit DV (device units) represents horizontal or vertical base units directly. While this makes it device-dependent and so limited since it would have very different results with, say, the PS device than it would with the WHELP device, still, if a document is intended for use only with PS, it would allow very precise positioning of the various document elements. Although decimals are allowed, they are ignored, even when the value is rounded: '5.9DV' always produces the value "5". Rounding will affect negative values: '-5.9DV' produces the value "-4" when rounded.
  • Unit M (ems) represents the value given (no decimals allowed, as stated above) multiplied by either the width of the character "M" in horizontal base units (for horizontal distances) or the line height in vertical base units (for vertical distances). Vertical distances are also multiplied by the line spacing (single-space, double-space and so on). The actual values are documented to be dependent on the current font.
  • Units C (ciceros) and P (picas) are described in the WGML Reference in the same way, yet wgml 4.0 produces different results for C than for P. Some insight can be found on Wikipedia: the pica is based on the English inch and the cicero is based on the French inch. A third pica, based on "the Anglo-Saxon compromise foot of 1959" is said to be the "contemporary computer pica". Use of a spreadsheet and multiple data points, however, shows that wgml 4.0 is doing something a bit different:
    • The number of picas is multiplied by 0.1663, presumably producing the value in inches.
    • The number of ciceros is multiplied by 0.1776, presumably producing the value in inches.
    • The resulting value is then multiplied by the number of base units per inch.
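
The conversions listed above can be collected into a small Python sketch. It uses only the factors reported in this section and leaves the result unrounded. The function and table names are invented for the example, Decimal is used merely to keep the arithmetic exact, and unit M is omitted because it depends on font metrics that are not modeled here. How the digits after the C or P in a value such as "2P6" are interpreted is not covered by the data above, so the sketch takes the number of units as a plain number.

```python
from decimal import Decimal

# Inches per unit, as inferred from the testing described above.
INCHES_PER_UNIT = {
    "I": Decimal("1"),
    "CM": Decimal("0.39374"),
    "MM": Decimal("0.0394"),
    "P": Decimal("0.1663"),
    "C": Decimal("0.1776"),
}


def to_base_units(amount, unit, base_units_per_inch):
    """Convert a measurement to unrounded base units."""
    amount = Decimal(amount)
    unit = unit.upper()  # unit letters are accepted in either case
    if unit == "DV":
        # Device units are base units already; any decimals are ignored.
        return Decimal(int(amount))
    return amount * INCHES_PER_UNIT[unit] * base_units_per_inch
```

For the PS device (1000 base units per inch), one pica comes out as 166.3 base units and one cicero as 177.6.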

When used with an attribute whose values are rounded, these are all rounded as shown below (for the attributes tested).

No Unit Provided

The maximum number of digits allowed without a unit being designated is four (4). No decimals are allowed; nor is a decimal point, even within delimiters. When used with an attribute whose value is rounded, these are rounded, which affects negative values only: using '-5' with an attribute will become the value "-4" if rounding is done.

If the description of control word IN is consulted, the numeric argument is said to be the number of "spaces" to be indented. In the description of control word TB, however, the numeric argument is said to designate a "position". When tabbing was investigated, it turned out that the default tab stops were based, not on space width, as documented for control words, nor on the value of CPINCH, as documented for wgml 4.0, but on the documented default value of CPINCH (characters-per-inch):

tab-column-width = (horizontal_base_units/inch) / 10 characters-per-inch

Testing with the command-line option CPINCH showed that that value had no effect on the result. Of course, since character-mode devices typically use "10" for the value of attribute horizontal_base_units, for those devices each "column" can just as well be called a "space". But for the PS device, which uses "1000" for the value of attribute horizontal_base_units, the column-width is "100". This is, of course, nothing like the width of "one blank character". Well, not unless the space character for a particular font happens to have that width.

The implementation of control word IN in our wgml, in contrast, does use CPINCH: the horizontal space specified is the same column-width as used with control word TB only when CPINCH was "10" (the default value). When "CPI 20" was specified on the command line, it became clear that the formula used was:

column-width = (horizontal_base_units/inch) / characters-per-inch

This pretty clearly matches the description of "Characters" quoted above. It seems likely that this is how integer horizontal space measurements are interpreted throughout wgml 4.0 for tags and control words that also accept horizontal space measurements with units specified. Certainly the attributes width and x_off of tag :GRAPHIC work this way, as does control word BX.
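
The two formulas can be put side by side in Python. The function names are invented for the example, and the use of integer division is an assumption: the tests described above only used values that divide evenly.

```python
DEFAULT_CPINCH = 10  # documented default characters-per-inch


def tab_column_width(horizontal_base_units):
    """Column width observed for default tab stops (control word TB).

    The CPINCH command-line option has no effect here; the documented
    default of 10 is always used.
    """
    return horizontal_base_units // DEFAULT_CPINCH


def column_width(horizontal_base_units, cpinch=DEFAULT_CPINCH):
    """Column width observed for control word IN, which honors CPINCH."""
    return horizontal_base_units // cpinch
```

With the PS device, both give 100 at the default CPINCH, but with "CPI 20" the second gives 50 while the first stays at 100.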

Control word TB accepts only integers (or expressions) to designate tab stops, not horizontal measurements based on units. This may be why it works differently, whether that is deliberate or a result of coding the computation of tabs separately from that for horizontal space measurements in general. Since control word BX does respond to the CPINCH command-line option, and since control words TB and BX tend to be used together to form boxed text, boxes which work with the default CPI value will not look right unless the value for CPI is "10". Indeed, testing with "CPI 20" on a sample box with the PS device shows that even the horizontal lines bounding the box are not drawn correctly with "CPI 20": they are shifted slightly, but visibly, to the right. This restricts effective use of the BX command to documents produced with CPI 10, which seems to be inconsistent with allowing the CPI to vary at all.

The vertical space units, in the document Waterloo SCRIPT, are described identically to the horizontal space units. In particular, "the character width of one blank character" is said to be the meaning of a pure numeric argument when vertical spacing is being specified.

In the WGML Reference, these notes modify the descriptions given above:

A vertical space unit is specified in the same way as a horizontal space unit. An EM space 
specifies the number of lines, the height of a line determined by the current font, adjusted for 
the document spacing value currently in effect. For example, a vertical space value of ’2M’ with 
double spacing in effect results in four lines worth of space.

An integer number specifies the number of lines, the height of a line determined by the LPINCH 
command line option, adjusted for the document spacing value currently in effect. The default lines 
per inch value is 6.

A device unit space(DV) specifies the number of lines without the current document spacing 
accounted for. For example, a vertical space value of ’2DV’ with double spacing in effect results 
in two lines worth of space.

When an integer is used for the value of attribute depth of tag BINCLUDE or tag GRAPHIC or of attribute yoff of tag GRAPHIC, however, the behavior seen in wgml 4.0 is not that documented in the second paragraph above. Instead, it behaves as specified in the first paragraph: it is interpreted as a number of lines, and the space produced depends on the line height associated with the current font and the document spacing value. LPINCH has no effect on the result. This is, very probably, the normal manner in which integers provided for vertical spacing are treated by wgml 4.0.

The LPINCH option is used in computing the value of the system symbols "systm", "sysbm", "syshm", and "sysfm" at the start of each document pass:

systm = (vertical_base_units/inch * 6) / lines-per-inch
sysbm = systm
syshm = (vertical_base_units/inch * 1) / lines-per-inch
sysfm = syshm

The literals "1" and "6" are the default values of the conceptual entities "footing margin" and "heading margin" for "1", and "top margin" and "bottom margin" for "6". However, as shown, the initial values of the symbols are computed in vertical base units and affected by the LPINCH option. For example, with the PS device, the value of symbol "systm" will be "1000" normally because the default LPINCH value is "6". If LPINCH is used to set the value to "8", then the value of symbol "systm" will be "750".
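
As a check on the formulas and the example just given, here is a short Python sketch. The function name is invented for the example, and integer division is assumed because the observed values are whole numbers of vertical base units.

```python
def initial_margin_symbols(vertical_base_units, lpinch=6):
    """Initial values of systm, sysbm, syshm and sysfm for a document pass.

    6 is the default top/bottom margin in lines, 1 the default
    heading/footing margin in lines.
    """
    systm = (vertical_base_units * 6) // lpinch
    syshm = (vertical_base_units * 1) // lpinch
    return {"systm": systm, "sysbm": systm, "syshm": syshm, "sysfm": syshm}


ps_default = initial_margin_symbols(1000)           # PS, default LPINCH
ps_lpi_8 = initial_margin_symbols(1000, lpinch=8)   # PS, LPINCH set to 8
```

This reproduces the values "1000" and "750" for symbol "systm" quoted above.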

If, within the document pass, a line like this:

.tm 8

is encountered, the value of symbol "systm" will be "8" until the start of the next document pass, and similarly for the other three symbols. None of these control words is used by the document specifications for the Open Watcom documents, quite possibly with good reason.

It might be thought that

systm = 6 * syshm

However, this ignores the way integer math works: for the PS device with an LPI of "6", the value of symbol "syshm" will be 166 and 6 * 166 is 996, not 1000, the observed value of symbol "systm". Testing shows that these values are unrelated to the font metrics. In contrast to this, a value of "6" for attribute top_margin under the same conditions with the font 0 line-height being 167 produces a value of 1002, which is 6 * 167, not 1000. Thus, values computed using LPI and values computed using font metrics can be clearly distinguished.
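
The three values being contrasted can be reproduced directly. This Python fragment only restates the arithmetic from the paragraph above; the variable names are invented, and the line height of 167 is the font 0 value quoted there.

```python
VERTICAL_BASE_UNITS = 1000  # PS device
LPI = 6                     # default lines-per-inch
FONT0_LINE_HEIGHT = 167     # font 0 line height quoted above

syshm = VERTICAL_BASE_UNITS // LPI        # integer division
systm = (VERTICAL_BASE_UNITS * 6) // LPI  # multiply first, then divide
naive_systm = 6 * syshm                   # not what wgml 4.0 produces
top_margin = 6 * FONT0_LINE_HEIGHT        # font-metric based value
```

Multiplying before dividing keeps the full precision, which is why "systm" is 1000 rather than 996.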

Expressions and Relative Values

Expressions can be used with control words but only when no unit is provided. That is, all of these are not allowed by wgml 4.0:

1i+2 1+2i 1i+2i (1+2)i

Within an expression, each individual number must satisfy the requirements given above for values with no unit provided.

Since they are only used with control words, expressions cannot be delimited as such, although parentheses can be used to group subexpressions or entire expressions as desired. Expressions cannot contain any whitespace. There appears to be no practical limit on their length or complexity.

Relative values are identified by an initial "+" or "-" sign:

+3 -5 +1+5 -1+3 +2-3 -2-5 +(1+5) +(1-3) -(2+6) -(2-4) 

all are relative values, whether they are multi-part expressions or not, and whether they are enclosed in parentheses or not.

Some control words, such as TB and IN, require the values they are given to be in ascending order. For these control words, an initial "-" is treated as an error even if the result would be a larger value than the prior value; that is,

.bx 1 -(2-3)

would produce an error even though the second position would be 2: 1 - (2 - 3) = 1 - (-1) = 1+1 = 2. In effect, these control words allow relative addressing only with "+".
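
The treatment of relative values can be sketched in Python as follows. This is a hypothetical model, not the wgml parser: it supports only "+", "-" and parentheses (the only operators appearing in the examples above), it does not enforce the four-digit limit, and all function names are invented. It also assumes, following the worked example, that the leading sign is simply part of the expression that is added to the previous value.

```python
import ast


def evaluate(expr):
    """Evaluate an integer expression built from +, - and parentheses."""
    if any(ch.isspace() for ch in expr):
        raise ValueError("expressions cannot contain whitespace")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub)):
            left, right = walk(node.left), walk(node.right)
            return left + right if isinstance(node.op, ast.Add) else left - right
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
            value = walk(node.operand)
            return value if isinstance(node.op, ast.UAdd) else -value
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval"))


def is_relative(text):
    """A value is relative when it starts with '+' or '-'."""
    return text[:1] in ("+", "-")


def resolve(previous, text):
    """Return the position designated by text, given the previous one."""
    if is_relative(text):
        return previous + evaluate(text)
    return evaluate(text)


def allowed_in_ascending_list(text):
    """Control words requiring ascending values reject a leading '-'."""
    return not text.startswith("-")
```

Here resolve(1, "-(2-3)") gives 2, as computed above, yet allowed_in_ascending_list("-(2-3)") is false, which models the error reported for such control words.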

Rounding and Truncation

Since a device does positioning in terms of horizontal base units and vertical base units, the values must be converted to the appropriate base units. This cannot be guaranteed to produce a whole number of base units, yet a device can only use a whole number. wgml 4.0 uses at least three methods to deal with this mismatch:

  1. The value is truncated: any decimal part is ignored.
  2. The value, used with a tag attribute, is rounded:
    1. For positive values: any decimal part greater than 0.5 causes the value to be rounded up to the next-highest integer; otherwise, the decimal part is truncated.
    2. For negative values: any decimal part less than 0.5 causes the value to be moved toward zero to the next integer (so -5.4 becomes -4); otherwise, the decimal part is truncated (so -5.5 becomes -5).
  3. The value, used with a control word, is rounded the same way positive tag attribute values are rounded and then, if it is a relative value, the result is added or subtracted from the base value.

For a device with 10 horizontal base units and 10 vertical base units per inch, the rounding behavior can be made quite clear:

Value    Rounded
 0.55i    5
 0.56i    6
-0.54i    -4
-0.55i    -5

keeping in mind that the negative rounding shown only applies to tag attribute values.
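
The rounding rules can be written out as a Python sketch that reproduces the table above. It is a reconstruction from the observed data points only; how wgml 4.0 actually computes these values is not known, and the function names are invented. Fractions are used so that values such as 5.5 are compared exactly.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def round_attribute(value):
    """Round a tag attribute value (in base units) as observed in testing."""
    value = Fraction(value)
    whole = math.trunc(value)    # truncation toward zero
    frac = abs(value - whole)    # size of the decimal part
    if value >= 0:
        # Positive: round up only when the decimal part exceeds 0.5.
        return whole + 1 if frac > HALF else whole
    # Negative: a decimal part below 0.5 moves the value one step
    # toward zero; otherwise the decimal part is simply dropped.
    return whole + 1 if frac < HALF else whole


def apply_control_word(base, value, relative_sign=None):
    """Control word values round like positive attribute values; the
    result is then added to or subtracted from the base if relative."""
    rounded = round_attribute(abs(Fraction(value)))
    if relative_sign == "+":
        return base + rounded
    if relative_sign == "-":
        return base - rounded
    return rounded
```

Note that under this model a whole negative value such as -5 also becomes -4, which matches the behavior reported earlier for values given without a unit.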

This has been tested with and found to apply to all attributes of wgml tags which take a vertical or horizontal space measurement -- whether they are used in a LAYOUT section, in the text of the document specification, or even in a letter. It has also been tested with all control words described in the document Waterloo SCRIPT which take horizontal or vertical values.

The results of this testing can be summarized as follows:

  1. No horizontal space measurement for tag attributes is rounded.
  2. Most horizontal space measurements for control words are rounded, but some are not.
  3. All vertical space measurements are rounded.
  4. When a unit is present, no tag attribute value responds to CPI or LPI.
  5. When a unit is present, most control word values do respond to CPI or LPI (as appropriate).

Note: control word value response to CPI or LPI was detected using these tests:

  1. For horizontal values, rounding occurred with the default CPI (10) but not CPI 5.
  2. For vertical values, rounding occurred with LPI 10 but not the default LPI (6).

These tests do not, of course, actually guarantee that the values produced depend on the CPI or LPI in any other sense, only that they are affected by the CPI or LPI. Indeed, the results with CPI 5 are very hard to interpret as showing that the CPI value was actually being used to compute the value, even when it affected rounding. This may, of course, be in part a consequence of some horizontal values being rounded and/or affected by the CPI and others not rounded and/or not affected by the CPI.

Note that, when no unit is used (that is, the value is taken as either a column or a line count), then the CPI does affect horizontal values used with tags. Also, in at least one case (control word AD), the value used for positioning is not rounded and does not depend on the CPI, but the value of the associated system symbol ($ad or sysad) both is rounded and depends on the CPI.

Very few attributes accept negative values; those that do are rounded as shown above. Those found were:

  • attribute binding of the DEFAULT LAYOUT tag
  • attribute xoff of the GRAPHIC tag
  • attribute yoff of the GRAPHIC tag

A negative value for attribute binding moves the margins (on odd-numbered pages only) to the left instead of to the right.

Some attributes had no effect, that is, they behaved as if their value was always "0". This only occurred in some tags, not with all tags:

  • attribute align in tags APPENDIX, Hn, and TOCHn;
  • attribute pre_skip, but only in TOCH0;
  • attribute post_skip, but only in the TOCHn which is the last heading to be displayed according to the value of attribute toc_levels (the headings which are not displayed because of this setting effectively ignore all the attributes of the corresponding TOCHn tag).

Attribute pre_skip of tag APPENDIX had a peculiarity: the value "0" acted as the value "8" would have acted. Since the test device had 10 vertical base units per inch, it is possible that the value used was actually "0.8i". Similarly, a value of "0" for attribute post_skip of tag GL is treated as "1".

These attribute values were doubled when used. Testing was not done to try and determine whether this happened to the raw value or to the value after conversion to base units:

  • attribute left_adjust of tag FIGLIST
  • attribute left_adjust of tag INDEX
  • attribute right_adjust of tag INDEX
  • attribute left_adjust of tag TOC

Note that attribute right_adjust of FIGLIST and TOC was not doubled, making it less likely that this was caused by some other setting.

For control words, the sequences "h|+h|-h" and "v|+v|-v" always indicated that a value with a unit may be used; the single letters "h" and "v" almost always do so. The equivalent letters/sequences using "hn", "i", "w", and "h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8", and "h9" all work the same way. When "n" or "m" was used, the values either clearly could not use units (what would a page number of 1 inch be?) or were tested and found to reject or ignore values using them.

These values could not be evaluated for the reasons given:

  • Control word .cc end w: wgml 4.0 will not accept .cc begin, .cc end, or .cc inline, making it impossible to tell how w is treated.
  • Control word .cp end w: wgml 4.0 will not accept .cp begin, .cp end, or .cp inline, making it impossible to tell how w is treated.
  • Control word .df testfont v: wgml 4.0 will accept .df, but .bf will not recognize the font name, so it is impossible to test v.
  • Control word .dh has three items that wgml 4.0 will not accept, making it impossible to test h or v: .dh n hang h, .dh n spbf v, and tcof h; also, wgml 4.0 accepts .dh n tcin h, but it has no effect, either on the Table of Contents or on the headings themselves. Note that .dh n skbf v and .dh n spaf v do affect the headings in the document (albeit in the exact opposite way as that documented).
  • Control word .fb dump w is accepted by wgml 4.0; however, since wgml 4.0 usually (but not always) prints the content of the block immediately, w has no discernable effect and so cannot be tested.
  • Control word FK is accepted by wgml 4.0 but is either printed at the top of the current page (not the next page) or immediately; in either case, w has no discernable effect and so cannot be tested.
  • Control word .fn thresh v: this is accepted by wgml 4.0 but has no discernable effect on the output; indeed, quite ridiculous and obviously impossible values are accepted by wgml 4.0 without demur.
  • Control word LN with negative relative values: wgml 4.0 would only accept one value: "-1". This did indeed produce output at the start of the next page, so in that sense it works.
  • Control word PW: this was accepted by wgml 4.0, but had no visible effect on either margin, and so could not be tested.

The one instance where "h" was used when it should not have been, that is, when "n" should have appeared instead, was with control word TP. .tp set 2.5i (which appears in the examples) sets the first tab to 6, which is to say, it is ignored and treated as .tp, which restores the default tab table.

Horizontal values used with these control words are not rounded (when used for positioning):

AD CD PM 

CD illustrates that the behavior of some control words is not what the documentation suggests: these sequences

.cd 2 1i 1.5i
.cd 2 1i +0.5i 
.cd 2 1i -0.5i

all produce the same result, which means that the second value is added to the first (rather than the previously-established second column width) and the "-" sign is treated as if it were "+". Since CD, it appears, is used 45 times in the Open Watcom Documents (as shown here), this may affect how it is implemented.

Horizontal values used with these control words are rounded and show some response to the CPI (using CPI 5 causes the rounding to disappear):

BX IL IR LL 

Horizontal values used with these control words are rounded and show no response to the CPI:

CL HI IN OF UN

Vertical values used with these control words show some response to the LPI:

BM CC (.cc v only) CP (.cp v only) DH (.dh n skbf v and .dh n spaf v only)
FM HM LN PL SK SP TM

The fact that SK (and SP) will accept values like "1.5i" will, if this ability is used by the Open Watcom documents, require some rethinking of how the various skips are handled.

The control words BM, FM, HM, TM (and FS, HS), at least in these tests, only had an effect on output positioning when RT was in use -- they appear to have no effect if banners are in use instead.

Vertical values used with these control words show no response to the LPI (they round as if the LPI were 10 even when it is 6):

SL 

Although the documentation suggests otherwise, wgml 4.0 applies SL to character mode devices as well as to page addressing devices, but not in the same way:

  • For the PS device, .sl p12 restores single spacing; .sl 0 causes all subsequent lines to overwrite each other.
  • For a character device, .sl 0 restores single spacing; .sl p12 produces double spacing.

It is a good thing that control word SL does not appear to be used by the Open Watcom documents.

This has some implications for implementation. At the moment, I am inclined to adopt this simple method:

Both tag attributes and control words using these values will treat them identically.
Specifically, all horizontal values will be truncated, and all vertical values rounded.
The slight difference between rounding for tag attributes and control word values will be 
    preserved when a control word requiring it is implemented.
Any related system symbols that are retained should match the actual value used.

The basic idea, of course, is to have horizontal and vertical positions computed identically whether affected by tag attributes or by control words (and to have the associated system symbols accurately reflect the values being used). Having two different systems for computing horizontal positions, one never rounding, one usually rounding, makes very little sense, particularly when the same effect can be produced by a tag or a control word, and the result depends on which was used with the same value provided.

It is not clear whether rounding negative relative values will be needed. Most control words using vertical values are not, it appears, used in the Open Watcom documents. Those that appear to be are:

  • CC, which is used once with an integer value (and several times as a possible extension for C++ source code files).
  • CP, which is used quite a bit, is used in macros with macro parameters, making it hard to say what it is used with: this control word might need to round negative relative values.
  • DF, which is indeed used a lot, but not to define fonts; rather, it is redefined using DM.
  • PL, which indeed appears to exist (that is, one instance of ".pl " exists), but consulting the document produced shows that this is actually the compiler option "pl" used in an example (the "." terminates the symbol containing the appropriate slash preceding the option).
  • SK, which currently works only with integer values and which is used quite a bit: this control word might need to round negative relative values.

Implementation of negative relative value rounding can be postponed until it turns out to be needed.

The Open Watcom documents use 10 CPI and 6 LPI exclusively, mostly by not specifying them and so using the default values, which happen to be 10 CPI and 6 LPI. Treating the control word values identically to the tag attribute values, then, should have no effect on the Open Watcom documents.

It is harder to be certain of rounding; but these facts must be recognized:

  1. The device PS is unlikely to be affected, given that it has 1000 base units per inch.
  2. The device WHELP might be affected, if a horizontal value is used with a control word that rounds.
  3. For rounding to be a consideration, with inches and other units that only take two decimal digits, both digits must be used because, with 10 horizontal base units per inch, 0.1i becomes 1 horizontal base unit.

The only horizontal value I have seen that satisfies the third criterion is "0.25i", and that would be the same (2 columns or 2 characters) whether rounded or not. So I think it would make sense to see if horizontal rounding is actually needed for the Open Watcom documents before implementing it.

Horizontal Boundaries

Brief testing showed that wgml 4.0 does, in fact, support test devices which have the value "no" for attribute x_positive of the PAGEADDRESS block. However, since all known devices have the value "yes" for the attribute x_positive, our wgml does not check this value, but proceeds always as if it were "yes". This can, of course, be revisited should a device appear that requires it.

This section only discusses the left and right margins of the page as a whole, or, alternatively, a page with a single column. Column boundaries can, of course, be revisited if our wgml needs to produce multi-column pages.

The factors involved in computing the left margin (parenthesized names will be used in the formulas) are:

  • The value of attribute left_margin of the PAGE block in the LAYOUT block ("left-margin").
  • The value of attribute x_start of the PAGESTART block in the DEVICE block ("x-start").
  • The value of attribute x_start of the PAGEOFFSET block in the DEVICE block ("x-offset").

The computed left margin ("left-margin-comp") is:

if( left-margin > x-offset ) {
    left-margin-comp = max( left-margin + x-start - x-offset, x-start )
} else {
    left-margin-comp = x-start
}

In other words, x-start acts as the smallest allowed initial computed left margin.

The quantity "left-margin - x-offset", which is used as the value of the system symbol $pagelm, is not allowed to be negative: the minimum allowed value of $pagelm is "0".
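
The left-margin rules above can be sketched in C as follows. This is only an illustration: the function and parameter names are hypothetical and are not taken from the wgml source, and all values are assumed to be already converted to horizontal base units.

```c
/* Sketch of the left-margin rules; all names are illustrative only. */

static int max_int( int a, int b )
{
    return( a > b ? a : b );
}

/* Initial computed left margin: never smaller than x_start. */
int left_margin_comp( int left_margin, int x_start, int x_offset )
{
    if( left_margin > x_offset ) {
        return( max_int( left_margin + x_start - x_offset, x_start ) );
    }
    return( x_start );
}

/* Value of $pagelm: left_margin - x_offset, but never negative. */
int pagelm( int left_margin, int x_offset )
{
    return( max_int( left_margin - x_offset, 0 ) );
}
```

With the PS values from the table at the top of this page (x-start 200, x-offset 200), a left margin of 1000 gives a computed left margin of 1000 and a $pagelm of 800, while a left margin of 100 gives a computed left margin of 200 and a $pagelm of 0.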

The factors involved in computing the right margin (parenthesized names will be used in the formulas) are:

  • The value of attribute right_margin of the PAGE block in the LAYOUT block ("right-margin").
  • The value of attribute page_width of the DEVICE block ("page-width").
  • The value of attribute x_start of the PAGEOFFSET block in the DEVICE block ("x-offset").

The computed right margin ("right-margin-comp") is:

right-margin-comp = min( right-margin + x-start - x-offset, page-width )

In other words, page-width acts as the largest allowed right margin.
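
A minimal C sketch of the right-margin formula, under the same assumptions as the formula itself: the names are hypothetical (not from the wgml source) and all values are in horizontal base units.

```c
/* Sketch of the right-margin formula; all names are illustrative only. */

static int min_int( int a, int b )
{
    return( a < b ? a : b );
}

/* Computed right margin: never larger than page_width. */
int right_margin_comp( int right_margin, int x_start, int x_offset,
                       int page_width )
{
    return( min_int( right_margin + x_start - x_offset, page_width ) );
}
```

With the PS values (x-start 200, x-offset 200, page_width 8500), a right margin of 7000 is used as given, while a right margin of 9000 is clamped to 8500.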

A number of errors related to horizontal positioning are checked for and reported by wgml 4.0; of these, our wgml traps and reports those errors caused by these conditions:

  1. right-margin > page-width -- which seems reasonable, since any output past page-width would appear on the platen;
  2. left-margin-comp >= right-margin-comp -- which seems reasonable, since the left margin should be to the left of the right margin and there should be some space between them.
  3. right-margin - x-offset < '0.25i' where '0.25i' is
    1. in horizontal base units for the PS device or a test device configured to mimic PS; or
    2. based on 10CPI for other devices

It is very difficult to characterize the "other devices". These factors appear to have been ruled out:

  1. The driver name begins with "ps", unless the test device actually mimics the PS device metrics.
  2. The value of attribute y_positive is set to "no" and y-start is set to 66, producing a subtractive device with character metrics.
  3. The value of attribute horizontal_base_units is increased to 100 or to 1000. The results clearly show that the required minimum value is decreased by 10 or 100 in these cases.

For now, our wgml will simply enforce a minimum right margin of '0.25i'. The smaller values allowed for the "other" devices in some instances are probably not important: even with the PS device, a right margin of '0.25i' is not useable, since any tag or control word that affects horizontal positioning is likely to produce one error or another about not having enough horizontal space to work in. This minimum right margin, in other words, is not actually useable (in the sense that wgml 4.0 will actually process the document with it) except, possibly, in very simple documents.

The quantity "right-margin - x-offset", which may seem an odd quantity to base an error message on, is used as the value of the system symbol $pagerm.

Error messages involving command-line option BIND are discussed in the following sub-section.

Option BIND and Attribute binding

Command-line option BIND is documented in the WGML Reference section 14.3.2 in this way:

The two option values specify the default page margin values for the odd and even 
pages. If the value for even margin is not specified, the first value applies to both odd
and even pages. The initial default value is zero.

However, in wgml 4.0, the value of BIND appears to have only one use: triggering various error messages. It has no effect on either margin of any page seen in testing. It is most certainly not used as a "default page margin", if only because the default layout specifies a left margin. Overriding that default with the value of "0" for the attribute left_margin of the PAGE block in the LAYOUT block produces a left margin based on "0" regardless of any values used with command-line option BIND.

Attribute binding of the DEFAULT block in the LAYOUT, in contrast, is used to modify both the left and right margins of some pages (identified as "odd numbered") but not others, as stated in the WGML Reference. It was not seen triggering any error messages in testing, even when both margins were far to the right of the right edge of the page.

Command-line option BIND does not appear to be used in building the Open Watcom documents. The attribute binding is explicitly set to 0 at all occurrences except for mspslay.gml in directory wgmlref, where a line setting it to '0.25i' is commented out.

What is not clear is what these values are intended to accomplish. The names suggest a relationship to the process of binding a book, and there have been and may still be in use some binding techniques that need the right margin on left-hand pages and the left margin on right-hand pages to be a bit wider than might otherwise be needed so that the text is placed far enough from the binding that it can be read. But whether that is what is going on here is anybody's guess. I suppose that, if the "odd numbered" pages were always right-hand pages, then the normal right margin could be set to be correct for left-hand pages and the value of the attribute used to shift the right-hand page margins outwards with the distance between the two margins remaining the same.

In our wgml, neither has any effect on either margin of any page. If this needs to be changed, it is suggested that:

  • A clear understanding of how the values are expected to be used is obtained.
  • Any margin that is placed beyond the left or right edge of the paper triggers an error.
  • If command-line option BIND triggers errors, then it also affects the margin(s).
  • If the attribute binding affects the margin(s) of any page, then it also triggers errors when appropriate.

Vertical Page Boundaries

This section is based primarily on testing done with a document specification containing text only -- no banners, no footnotes, no widowing, no other distractors. Limited additional testing shows that these items, provided the layout is set up to start them on the "first line", all show up on the same "first line" as that used with the document specification containing text only:

  • Section titles ("ABSTRACT", "APPENDIX", "PREFACE" or their user-specified replacements).
  • The first line of text produced by a "top" banner.

It thus seems quite likely that all of the contents of an output page must fit within the boundaries discussed here.

The code for our wgml was very helpful in identifying many of the details reported in this section.

Note: there are control words which affect page layout; the one for top margin is .tm. However, consulting Keyword Statistics shows that it is not used in the Open Watcom documents, and so it will be ignored for now.

The Basics

This is an idealization of how the vertical boundaries are computed.

Conceptually, there are three values that need to be computed in vertical base units:

  1. The value which will cause the first line of text to appear in the proper position: page-top.
  2. The value which will cause the last line of text to appear in the proper position: page-bottom.
  3. The depth of the page: page-depth.

The actual computations are affected by the value of attribute y_positive of the PAGEADDRESS block in the DRIVER block. This controls whether the vertical position increases or decreases as the output moves down the page.

Thus, if we wish to define page-depth in terms of page-top and page-bottom, we can write

page-depth = | page-top - page-bottom |

using the absolute value brackets to compensate for the fact that page-top can be either greater or less than page-bottom, depending on the value of the attribute y_positive. As a computation, this can be shown in pseudo-code as:

if( y_positive == 0 ) {
    page-depth = page-top - page-bottom;
} else {
    page-depth = page-bottom - page-top;
}

and the wgml code does in fact contain many sections that are "twinned", as it were, in this fashion.

In actual practice, however, it is page-depth that is given explicitly (as the value of attribute depth of the PAGE block in the LAYOUT block), and so page-bottom that must be computed. However, that may not be necessary: because the depth of each item placed on a page can be computed as a positive integer, and page-depth is a positive number, page-depth can be used directly to determine when a page is full without having to use the value of the attribute y_positive, producing simpler code.

Computing page-top, even on a basic level, is not as simple: there are at least three items that affect where the first line on a page will be located. These are:

  • The value of attribute y_start of the PAGESTART block in the DEVICE block.
  • The value of attribute top_margin of the PAGE block in the LAYOUT block.
  • The line height of the first line of text.

The Real page-depth Value

The value of attribute depth of the PAGE block in the LAYOUT block is converted by wgml 4.0 to vertical base units and is rounded using the algorithm given above. The attribute depth will not accept negative values, so only the algorithm for positive values is used.

When the wgml code was examined, it was reducing the page depth by the value of the attribute y_start of the :PAGEOFFSET block ("y-offset"). The position of the first line is not affected. This is quite apparent when the output text line at which our wgml and wgml 4.0 break the first page is compared with and without this adjustment. The pseudocode for this is:

page-depth -= y-offset; // for the page depth

Direct comparison shows that, when the system symbol $paged is given this value by our wgml, and the line

.ty $paged = &$paged.

is placed in the text file, then both wgml 4.0 and our wgml are assigning the same value to $paged. This system symbol is output by the PS driver in the START :INIT block, and so appears in every file produced for the PS device.

The Real page-top Value

If testing is done with a test device set up this way:

  • The metrics are typical of a character-mode device: 6 lines-per-inch, 66 lines-per-page.
  • The value of attribute y_positive of the PAGEADDRESS block in the DRIVER block is "no".
  • The value of attribute y_start of the PAGESTART block in the DEVICE block is "66".
  • None of the prefixes associated with augmented devices are present in the driver name, so that no "enhancements" are triggered.

then it is clear that the behavior seen with the PS device actually applies to any device for which the value of the attribute y_positive is "no" and the value of attribute y_start is a positive value clearly related to the number of vertical base units on a page. These devices will be referred to here as subtractive. Devices for which the value of the attribute y_positive is "yes" will be referred to as additive.

The actual computation of page-top turns out to involve several values (the terms in parentheses will be used in the formulas and discussions below):

  1. The value of attribute y_start of the PAGESTART block in the DEVICE block ("y-start").
  2. The value of attribute y_start of the PAGEOFFSET block in the DEVICE block ("y-offset").
  3. The value of attribute y_positive of the PAGEADDRESS block in the DRIVER block ("y-positive").
  4. The value of attribute top_margin of the PAGE block in the LAYOUT block, which must always be zero or positive and which is rounded using the algorithm discussed above.
  5. The line-height of the default font ("default-font-line-height").

The term top-margin will be used for the final rounded value produced for the attribute top_margin by wgml 4.0.

The difference between additive and subtractive devices is quite marked, and the basic reason appears to be this:

  • In an additive device, y-start represents the first printable line, unless it is "0".
  • In an additive device where the value of y-start is "0", and in a subtractive device, y-start represents a vertical position located one vertical base unit above the first available print position.

Unfortunately, what y-start represents does not, by itself, determine how simple or complicated computing page-top is. For example, for subtractive devices the location of the first line of text on the page is a simple matter of subtraction; but for additive devices the situation is much more complicated.

When the DV Unit was examined, an interesting fact was discovered: instead of top-margin itself, wgml 4.0 first forms what will be referred to as the net-top-margin:

if( y-offset < top-margin ) {
    net-top-margin = top-margin - y-offset
} else { 
    net-top-margin = 0
}

The appearance of y-offset, that is, of the value of the attribute y_start of the PAGEOFFSET block, is very interesting. It implies that the net-top-margin begins at the physical top of the page and y-start starts below the space specified by y-offset; thus, the actual top margin will be the larger of y-offset and top-margin. While this appears to be the case with the PS device (where y-start is 10,800 vertical base units, 200 vertical base units below 11,000, the number of vertical base units on an 11 inch page, and y-offset is 200 vertical base units), it is not clear that this is the case for the character-mode devices available in the Open Watcom repository, since y-start is "1" and y-offset is "0".

If testing is done with a test device set up this way:

  • The metrics are typical of a character-mode device: 6 lines-per-inch, 66 lines-per-page.
  • The value of attribute y_positive of the PAGEADDRESS block in the DRIVER block is "yes".
  • The value of attribute y_start of the PAGESTART block in the DEVICE block is "0".
  • The value of attribute y_start of the PAGEOFFSET block in the DEVICE block is "0".
  • The value of attribute top_margin of the PAGE block in the LAYOUT block is "0".
  • The value of attribute depth of the PAGE block in the LAYOUT block is "66".
  • The value of attribute threshold of the WIDOW block in the LAYOUT block is "1".
  • All the fonts are associated with a line-height of "1".
  • There are no banners or other items affecting vertical positioning.

then, if a document specification is written so that the 66th line in the output file is inside a paragraph, the lines on the first page will be numbered from "1" to "66", not "0" to "65". This confirms the statement that a y-start of "0" designates a position above the first printable line. Presumably, for some older printers, the top of the paper was aligned in such a way that a NEWLINE block was needed to actually place the first line of the paper (as opposed to the platen) under the print head. The character devices available in the Open Watcom repository, which use the value of "1" for y-start, do not start the output with a NEWLINE block.

Now consider the first line of text to appear on a page. This is the formula that would be the clearest possible:

if( y_positive == 0 ) {
    position = page-top - max( default-font-line-height, actual-line-height) 
} else {
    position = page-top + max( default-font-line-height, actual-line-height) 
}

The "actual-line-height" includes any other factors which cause lines to be skipped before the first line of a page.

The effect of this formula is quite interesting: given a document for which

  • The space to be skipped (from whatever source) before the first line is the same on each page.
  • If different default fonts are used on different pages, they all have the same line height associated with them.
  • On any page, the line height associated with the default font is no smaller than the line height associated with any font used on that page.

then the first line of each page in the document will start at the same vertical position. This, no doubt, improves the appearance of the document when printed.
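
The first-line formula can be sketched in C as follows. The function and parameter names are hypothetical, and page_top is simply taken as an input here; all values are in vertical base units.

```c
/* Sketch of the first-line position formula; names are illustrative only. */

static int max_int( int a, int b )
{
    return( a > b ? a : b );
}

/* y_positive is 0 for a subtractive device and non-zero for an additive one. */
int first_line_position( int page_top, int default_font_line_height,
                         int actual_line_height, int y_positive )
{
    int height = max_int( default_font_line_height, actual_line_height );

    if( y_positive == 0 ) {
        return( page_top - height );    /* positions decrease down the page */
    }
    return( page_top + height );        /* positions increase down the page */
}
```

For example, with a page-top of 10800 on a subtractive device and a default font line height of 167, a first line whose own height is 167 or less is placed at 10633, whatever font it uses.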

Now the computation of page-top can be considered. For a subtractive device, it is quite simple:

page-top = y-start - net-top-margin

but for an additive device it is much more complicated.

The first step is to realize that, given both a non-zero y-start and a non-zero net-top-margin, they are not added together; instead, we have what will be referred to as net-y-start:

net-y-start = max( y-start, net-top-margin )

That is, the greater distance from the top of the page will be skipped.

The second step is to compute what will be referred to as the "y-start-correction":

if( y-start > net-top-margin ) {
    y-start-correction = min( y-start - net-top-margin, default-font-line-height)
} else {
    y-start-correction = 0
}

It is this value that makes the formula for the position of the first line on a page work.

The final step is to compute page-top itself:

page-top = net-y-start - y-start-correction

This produces a value equal to the value inferred for wgml 4.0 by subtracting the relevant line height from the position produced for the first line.

The two formulas for page-top, one for subtractive devices and one for additive devices, appear to produce the correct values.
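
Putting the steps of this section together, a C sketch of the page-top computation might look like this. The function and parameter names are hypothetical (they are not taken from the wgml source), top_margin is assumed to be the final rounded value, and all values are in vertical base units.

```c
/* Sketch of the page-top computation; names are illustrative only. */

static int max_int( int a, int b ) { return( a > b ? a : b ); }
static int min_int( int a, int b ) { return( a < b ? a : b ); }

/* The part of top_margin that extends below y_offset, or 0. */
int net_top_margin( int top_margin, int y_offset )
{
    return( y_offset < top_margin ? top_margin - y_offset : 0 );
}

/* y_positive is 0 for a subtractive device and non-zero for an additive one. */
int page_top( int y_start, int y_offset, int top_margin,
              int default_font_line_height, int y_positive )
{
    int ntm = net_top_margin( top_margin, y_offset );
    int net_y_start;
    int y_start_correction = 0;

    if( y_positive == 0 ) {
        return( y_start - ntm );        /* subtractive: simple subtraction */
    }
    net_y_start = max_int( y_start, ntm );
    if( y_start > ntm ) {
        y_start_correction = min_int( y_start - ntm, default_font_line_height );
    }
    return( net_y_start - y_start_correction );
}
```

For an additive character device with y-start "1", y-offset "0", a top margin of "0" and a line height of "1", this gives a page-top of 0, so that adding the line height places the first line at position 1. A y-start of "0" gives the same result, in keeping with the observation that lines are numbered from "1" in both cases.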

page-bottom and Related Issues

Testing for this section used the subtractive character test device described in the prior section. Some of the terminology defined in the prior section is also used here, as is the distinction between device and document pages.

With respect to the document page, page-bottom is defined exactly as stated above:

if( y_positive == 0 ) {
    page-bottom = page-top - page-depth;
} else {
    page-bottom = page-top + page-depth;
}

This applies to both document text if there is no bottom banner, and the last line of the banner area if there is a bottom banner (whether text appears there or not).

With respect to the device page, however, a new factor comes into play:

  • The value of attribute page_depth in the DEVICE block ("page_depth").

and the formula (using "page-bottom-dp" for the device page value) becomes:

if( y_positive == 0 ) {
    page-bottom-dp = page-top - min( page_depth, page-depth)
} else {
    page-bottom-dp = page-top + min( page_depth, page-depth)
}

This makes it possible to say that

a device page occurs whenever the current line would be printed below page-bottom-dp

For an additive device, this works fine and, in fact, the TERM device depends on it to avoid displaying more text than a standard DOS terminal screen can show at one time.

For a subtractive device, however, this can produce negative vertical positioning values. This can be observed with device PS by manipulating the value of attribute depth of the PAGE block in the LAYOUT block: since y-start is 10800 and y-offset is 200, a value of '11i' for the attribute depth will produce a page-depth of 10800, and so a page-bottom of 0. Larger values of the attribute depth will produce negative vertical positions, down to a limit set by the value of the attribute page_depth, which is 10920: the lowest position is -120 (10800 - 10920), which can be produced by making the value of the attribute depth '11.12i'. Values of the attribute depth greater than '11.12i' will trigger a device page.
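
The PS arithmetic in the preceding paragraph can be reproduced with a short C sketch. The names are hypothetical, the depth is assumed to be already converted to vertical base units (so '11i' is 11000 and '11.12i' is 11120), and page-top is taken to be 10800, that is, y-start with a net top margin of 0.

```c
/* Sketch of page-depth, page-bottom and page-bottom-dp; names are
 * illustrative only. */

static int min_int( int a, int b ) { return( a < b ? a : b ); }

/* Page depth after the reduction by y_offset described earlier. */
int net_page_depth( int depth, int y_offset )
{
    return( depth - y_offset );
}

/* Bottom of the document page. */
int page_bottom( int page_top, int page_depth, int y_positive )
{
    return( y_positive == 0 ? page_top - page_depth : page_top + page_depth );
}

/* Bottom of the device page: limited by the DEVICE attribute page_depth. */
int page_bottom_dp( int page_top, int device_page_depth, int page_depth,
                    int y_positive )
{
    int limit = min_int( device_page_depth, page_depth );

    return( y_positive == 0 ? page_top - limit : page_top + limit );
}
```

A depth of 11000 gives a page-depth of 10800 and a page-bottom of 0; a depth of 11120 gives a page-depth of 10920 and a page-bottom-dp of -120, and any larger depth leaves page-bottom-dp at -120.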

Unfortunately, when an output file prepared for the PS device contains device pages, the PostScript interpreter will report errors and not display it properly. The reason for this is that the document page numbers are emitted into the output file, and they are not updated for device pages. Since wgml 4.0 does not support this situation properly (it should, presumably, be an augmentation), our wgml will report an error. This decision can, of course, be revised if necessary.

Now consider bottom banners -- not bottom banners as such, but rather how they affect the location of the last line of non-banner text. This appears to be determined by these factors:

  • The value of page-bottom.
  • A value to be referred to as "ban-depth".

The ban-depth is computed as follows:

if( a unit was used to specify the value of  attribute depth of the BANNER block in the 
        LAYOUT block ("ban-depth-orig") ) {
    ban-depth = value of ban-depth-orig in vertical base units
} else {
    ban-depth = ban-depth-orig * ban-line-height
}

where "ban-line-height" is the largest line_height associated with any of the fonts specified in the various BANREGION blocks. This will quite likely not be the same as the line_height associated with the default font.

The lowest possible position of the last text line, which will be referred to as "page-bottom-botban", is then:

if( y_positive == 0 ) {
    page-bottom-botban = page-bottom + ban-depth;
} else {
    page-bottom-botban = page-bottom - ban-depth;
}

If there is no bottom banner, then page-bottom and page-bottom-botban have the same value.
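
A C sketch of the two computations above, with hypothetical names. It assumes that a banner depth given with a unit has already been converted to vertical base units by the caller, and that a missing bottom banner is represented by a depth of 0.

```c
/* Sketch of ban-depth and page-bottom-botban; names are illustrative only. */

/* has_unit is non-zero if the BANNER depth was specified with a unit, in
 * which case ban_depth_orig is assumed to be in vertical base units already;
 * otherwise ban_depth_orig is a number of lines. */
int banner_depth( int ban_depth_orig, int has_unit, int ban_line_height )
{
    if( has_unit ) {
        return( ban_depth_orig );
    }
    return( ban_depth_orig * ban_line_height );
}

/* Lowest possible position of the last line of non-banner text. */
int page_bottom_botban( int page_bottom, int depth, int y_positive )
{
    if( y_positive == 0 ) {
        return( page_bottom + depth );  /* subtractive: move back up the page */
    }
    return( page_bottom - depth );
}
```

For instance, a bottom banner of 3 lines with a ban-line-height of 167 has a ban-depth of 501, so on a subtractive device with page-bottom at 0 the last text line can be no lower than 501.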

Text may not actually appear on page-bottom-botban due to the effects of line spacing, widowing, specified skips, and perhaps other features of wgml 4.0. However, non-banner text will never appear below page-bottom-botban.

Once our wgml was producing the same vertical positions as wgml 4.0 (for the test cases tried, if not for all possible situations), an intriguing behavior was noticed: when the value of y-start was 10800, the value of the attribute depth was "11i", and the value of y-offset was greater than 200, then the value of page-bottom-botban for subtractive devices became

page-bottom-botban = y-offset - 200

Testing showed that our wgml and wgml 4.0 were behaving the same way, and were placing the last line of text above the bottom banner identically. While this seems quite strange (where, for example, does the value "200" come from?), since our wgml behaves identically to wgml 4.0 in this respect, it is by definition correct.

The Height of the First Line

The first line on a page is treated a bit differently from the subsequent lines.

The term "first line" is a bit ambiguous when used in discussing how a page is laid out. In most cases, it excludes any text printed as part of the top banner, if one is present; that is, it refers to the first line of what might be called "ordinary text", if it weren't for the fact that it includes section headings. However, in this section, the meaning is different: the "first line" is literally the first line to appear on the page, even if it is part of a top banner.

The first line on a page, then, is positioned following this principle:

The line height used will be the greater of the first line's height (including the effect of 
any vertical line skips or the applicable spacing) and the height associated with the default font.

The "default font" is the value of attribute font of the DEFAULT tag in the LAYOUT section. So far, no situation where some other font was used as the default font for this purpose has been found.

This applies to both PS and character devices; indeed, it was first noticed in character devices, where it was considered (wrongly) as related to character devices with variable-height fonts; in fact, it applies to all devices, it is just that it has no visible effect on devices where only one font height is in use with all fonts.

Vertical Page Internals

This section discusses the internal vertical organization of the page. It is based on preliminary testing and is definitely not complete. It builds on the "page-top" and "page-bottom" values defined and computed above.

An output page consists of four parts:

  1. the top banner;
  2. the page-width part;
  3. the main part; and
  4. the bottom banner.

Banners are quite interesting: each banner has a specified depth, which is reserved for it at the top or bottom of the page on which it appears. If there is not enough vertical space on a page for a banner, an error of this sort appears:

LO--003: For banner with docsect = abstract and place = top
         For banner with docsect = abstract and place = bottom
         Depth of banner(s) too large for a page

This creates a very simple situation: either the banners fit or document processing stops. Banners never flow onto the next page.

The top banner starts at page-top and ends at a value called the page-width-top, which is also the top of the page-width part of the page. If there is no top banner, then page-width-top = page-top. The basic relationship between them is:

top-banner depth = | page-top - page-width-top |

The bottom banner starts at bottom-banner-top, which is the end of the main part of the page, and extends to page-bottom. If there is no bottom banner, then bottom-banner-top = page-bottom. The basic relationship between them is:

bottom-banner depth = | bottom-banner-top - page-bottom |

The page-width part is used to hold items which extend across the width of the page even when the document text will have more than one column; it is this feature that requires it to be treated separately. So far, these items have been identified as going into this part:

  1. A section heading which is to be displayed and which is for a section which always starts on a new page.
  2. A FIG with place TOP and width PAGE.

Only one item can appear in the page-width part on each page; additional items awaiting placement are said to be deferred.

The page-width part ends at a value called main-top. If the page-width part has no contents, then main-top = page-width-top. The basic relationship between them is:

section heading or FIG depth = | page-width-top - main-top |

The main part extends from main-top to bottom-banner-top, that is, its depth can be defined as:

main part depth = | main-top - bottom-banner-top |
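
The four page-level boundaries and depths described so far can be gathered into a small C sketch. The structure and function names are hypothetical and do not reflect how wgml actually stores these values. Using absolute differences lets the same code serve both additive and subtractive devices.

```c
/* Sketch of the page-level vertical boundaries; names are illustrative only. */
#include <stdlib.h>

typedef struct {
    int page_top;           /* top of the top banner                     */
    int page_width_top;     /* end of top banner, top of page-width part */
    int main_top;           /* end of page-width part, top of main part  */
    int bottom_banner_top;  /* end of main part, top of bottom banner    */
    int page_bottom;        /* end of bottom banner                      */
} page_bounds;

int top_banner_depth( const page_bounds *p )
{
    return( abs( p->page_top - p->page_width_top ) );
}

int page_width_part_depth( const page_bounds *p )
{
    return( abs( p->page_width_top - p->main_top ) );
}

int main_part_depth( const page_bounds *p )
{
    return( abs( p->main_top - p->bottom_banner_top ) );
}

int bottom_banner_depth( const page_bounds *p )
{
    return( abs( p->bottom_banner_top - p->page_bottom ) );
}
```

As long as the five boundaries are in order down the page, the four depths add up to | page-top - page-bottom |, which is the page-depth.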

The main part itself contains one or more columns, each of which is divided into three parts:

  1. the column main part;
  2. the bottom FIG part; and
  3. the footnote part.

Each column part is composed of one or more elements.

These items can be placed only in the first element in the column main part:

  1. A section heading which is to be displayed and which is for a section which only starts on a new page when the last page of the prior section is full in the same sense that a page within a section is said to be full.
  2. A FIG with place TOP and width COLUMN.

Only one such item can appear per column; additional items awaiting placement are deferred.

Otherwise, the column main part is where the bulk of the document is placed. As the page fills up, other items that go into this section may be deferred.

The column main part starts at main-top and ends at a value called the bottom-fig-top, which is also the top of the bottom FIG part of the column. If there is no bottom FIG, then bottom-fig-top = footnote-top (defined below). The basic relationship between them is:

column main depth = | main-top - bottom-fig-top |

The bottom FIG part starts at bottom-fig-top and ends at a value called the footnote-top, which is also the top of the footnote part of the column. If there are no footnotes, then footnote-top = bottom-banner-top. The basic relationship between them is:

bottom FIG depth = | bottom-fig-top - footnote-top |

Only one FIG with place BOTTOM may be placed on any page. Testing suggests that, if a FIG with place TOP has been deferred while processing the same page as that on which a FIG with place BOTTOM would otherwise be put, that FIG with place BOTTOM is also deferred. It is not clear if a FIG with place BOTTOM is allowed to displace an element already in the column main part, or if it is deferred instead.

The footnote part starts at footnote-top and ends at bottom-banner-top, which is also the bottom of the column and, of course, the top of the bottom banner. The basic relationship between them is:

footnote depth = | footnote-top - bottom-banner-top |

Footnotes are placed in this section as they are encountered; it is not yet clear if wgml 4.0 permits them to displace elements in the column main part or if they are deferred at that point instead. Preliminary testing showed that footnotes following a FIG with place BOTTOM that was deferred are themselves deferred to follow that FIG.

Note that FIGs with explicit widths, and those with position INLINE, still need to be explored, as do the details of FIG placement in the bottom FIG part and footnote placement in the footnote part noted above and the interaction between FIG and footnote. There may be other issues that arise as well.

Vertical Line Skips

This section deals with "skips". Parts of it are based on the wgml code as it existed before page-oriented output was implemented.

There are several types of vertical line skips used in wgml 4.0 (the terms used to denote them when expressed in vertical base units appear in quotes after each is listed):

  • Values of attribute post_skip ("post-skip").
  • Values of attribute pre_skip ("pre-skip").
  • Values of attribute pre_top_skip ("pre-top-skip").
  • Values of attribute skip ("int-skip").
  • Those created by control word SK ("sk-skip").
  • Those created by control word SP ("sp-skip").

It is possible that others exist but have not yet been identified. The attributes, which apply to many tags in the LAYOUT section, will be treated as part of the "element" which a given tag in the LAYOUT section describes. On the page, this means that the skip (to the extent that it is used) will precede or follow whatever that element produces in the output file. To a great extent, this section is about the conditions under which, and the extent to which, a skip is actually used in producing the output file.

The WGML Reference distinguishes pre-skip, pre-top-skip, and post-skip in this way:

  • Attribute post_skip follows the element, but is ignored at the end of the page.
  • Attribute pre_skip precedes the element, but is ignored at the top of the page.
  • Attribute pre_top_skip precedes the element, even at the top of the page.
  • The pre-skip of the current element and any post-skip from the prior element are "merged".
  • The pre-top-skip of the current element and any post-skip from the prior element are "merged".

It also describes int-skip in this way:

  • Attribute skip is used between repeated instances of those elements using it.

Nothing is said about how sk-skip and sp-skip are treated, although the Waterloo SCRIPT document does explain how they worked in Script 88.1.

Nor is anything said about how pre-skip and pre-top-skip interact. This might seem either superfluous or obvious; that is, it might be thought that one of these conditions applied:

  1. A LAYOUT tag can have an attribute pre_skip or an attribute pre_top_skip, but not both.
  2. When a tag has both, then pre-top-skip is used at the top of the page and pre-skip is used everywhere else.

But this is not the case. The first LAYOUT tag discovered which has both the attribute pre_skip and the attribute pre_top_skip was APPENDIX. This appeared to be something that could be worked with: by setting the APPENDIX attribute page_eject to "no" and so ensuring that a heading would be emitted it would be possible to test the pre-skip and pre-top-skip with the post-skip of the LAYOUT tag P as well as the control words SK and SP. However, the heading did not appear; investigation showed that a (normal, that is, non-LAYOUT) H1 tag was needed to produce the heading.

Investigating skips can be confusing because the actual space between two lines includes the height of the lower line. Testing with the fonts set so that they all have the same height makes it much easier to determine the number of lines skipped.

The second option shown above is so attractive that two terms will be used: "top-skip" for the skip to be done at the top of a page and "subs-skip" (for "subsequent skip") for the skip to be done elsewhere on the page.

Conversion to Vertical Base Units

The attributes (post_skip, pre_skip, pre_top_skip, and skip) all allow the use of units, and so are converted to vertical base units as discussed above.

The control words SK and SP take only an integer indicating the number of lines in addition to two other operands ("A" or "ABS" and "C" or "COND"). The difference between them appears to be the difference between attributes pre_skip and pre_top_skip: an sk-skip at the top of a page (or column) is ignored while an sp-skip at the top of a page (or column) is not.

Both allow the value "-1", which causes a line to be overprinted. Even using various-sized fonts in PS, the effect is that the line following the codeword is printed at the exact same position as the preceding line without regard to which line, if either, has a higher line height.

The value "0" causes SK and SP to behave exactly as BR does: a new line starts, but with no intervening blank line(s).

The space skipped is converted to vertical base units using this formula:

(number-of-lines * spacing * vertical-base-units/inch) / lines-per-inch

If the "A" or "ABS" option is present, then "spacing" is taken to be "1"; otherwise it is the unconverted value of the attribute spacing, as discussed here.

With the PS device, a default line height of 167 (that is, 12 points or 1/6th of the number of vertical base units per inch [1000] rounded up), and spacing equal to "1", a value of "3" for any of the attributes (post_skip, pre_skip, pre_top_skip, and skip) produces a skip of "501" (that is, 3 x 167) regardless of the LPI value. An SK or SP value of "3" produces "500" ((3*1000)/6) for the default LPI of "6" and "375" ((3*1000)/8) for an LPI of "8".
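The arithmetic above can be checked with a minimal sketch. The function and parameter names are hypothetical and are not taken from the wgml source. The use of truncating integer division is an assumption (the quoted examples divide exactly), and the attribute case covers only the spacing of "1" described above.

```python
def control_word_skip(lines, spacing, vbu_per_inch, lpi, absolute=False):
    """Convert an SK/SP line count to vertical base units.

    Mirrors (number-of-lines * spacing * vertical-base-units/inch) / lines-per-inch.
    With the "A"/"ABS" operand, spacing is taken to be 1.
    Truncating division is an assumption, not confirmed behavior.
    """
    if absolute:
        spacing = 1
    return (lines * spacing * vbu_per_inch) // lpi


def attribute_skip(lines, line_height):
    """Skip for post_skip/pre_skip/pre_top_skip/skip when spacing is 1."""
    return lines * line_height


# PS device: 1000 vertical base units per inch, default line height 167.
print(control_word_skip(3, 1, 1000, 6))   # SK/SP at LPI 6
print(control_word_skip(3, 1, 1000, 8))   # SK/SP at LPI 8
print(attribute_skip(3, 167))             # attribute value, independent of LPI
```

The sketch makes the difference visible: the control words depend on the LPI value, while the attributes depend on the line height.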

Merging Skips

Throughout the following subsections, the value of attribute spacing (discussed here) is "1".

Within each subsection, skips not explicitly mentioned are taken to have the value "0".

These results are reported in terms of a character-mode test device using NEWLINE blocks; a test device using the ABSOLUTEADDRESS block instead of NEWLINE blocks shows the same number of lines skipped, but the ABSOLUTEADDRESS block only appears once, when the text is to be output.

sk-skip and sp-skip

For the first non-banner element, for values greater than "0", these rules apply:

  • When SK is used by itself, it is ignored.
  • When SP is used by itself, it is used.
  • When both are used, then the value used with SK is ignored and the value used with SP is used.
  • When SP is used twice in succession, then the sum of the two values is used.

Thus, SK is indeed ignored at the top of the page.

For the subsequent non-banner elements, for values greater than "0", these rules apply:

  • When SK is used twice in succession, then the larger of the two values is used.
  • When SK precedes SP, then the larger of the two values is used.
  • When SP precedes SK, then the sum of the two values is used.
  • When SP is used twice in succession, then the sum of the two values is used.

When the value of attribute spacing is greater than "1", it is the value given with the SK or SP control word multiplied by the value of attribute spacing which is used when the larger of two values is chosen. Use of operand "A" or "ABS" works for both SK and SP exactly as documented, so given this:

.sk 3
.sk 4 A

with the value "2" for the relevant attribute spacing, the skip used would be "6".
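The rules for subsequent non-banner elements can be summarized in a short sketch, assuming only two successive control words with values greater than "0". The names are hypothetical. Applying the spacing multiplier before summing (and not only before taking the larger value) is an assumption, since the text above only states it for the "larger of two values" case.

```python
def effective_lines(lines, spacing, absolute=False):
    """Lines requested by one SK/SP; "A"/"ABS" forces spacing to 1."""
    return lines * (1 if absolute else spacing)


def merge_pair(first_word, first_lines, second_word, second_lines):
    """Merge two successive control words (values > 0, not at top of page).

    Observed rules: SK,SK and SK,SP use the larger value;
    SP,SK and SP,SP use the sum. The second word therefore
    does not change which rule applies.
    """
    if first_word == "sk":
        return max(first_lines, second_lines)
    return first_lines + second_lines


# ".sk 3" followed by ".sk 4 A" with spacing 2
spacing = 2
print(merge_pair("sk", effective_lines(3, spacing),
                 "sk", effective_lines(4, spacing, absolute=True)))
```

Written this way, the outcome depends only on which control word comes first.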

This behavior differs from what is documented in the Waterloo SCRIPT document in these ways:

  • The same rules apply when "C" or "COND" is present; that is, this operand has no effect.
  • For the subsequent non-banner elements, SK and SP do not behave identically.

When the value used with SP or SK is "0", the above rules very likely apply, but, given the properties of "0", the effect is that it will always be ignored. A break will still occur, so this usage is not entirely without effect.

When the value "-1" is used with other values, the rules are:

  • This pattern
.sk -1
.sk 3

produces a skip of three lines followed by the next text line.

  • This pattern
.sk -1
.sp 3

produces a skip of two lines followed by the next text line.

  • This pattern
.sp -1
.sk 3 or .sp 3

produces a NEWLINE with "0" as the value of attribute advance and then a skip of three lines followed by the next text line.

  • This pattern:
.sk 3 or .sp 3
.sk -1

produces a skip of two lines followed by the next text line (which effectively overwrites the third line specified as being skipped).

  • This pattern:
.sk 3 or .sp 3
.sp -1

produces a skip of three lines followed by the next text line. (Literally: with the character-mode test device using NEWLINE blocks, it skips 2 lines, then 1 line, and then 1 line, thus clearly skipping the three lines first.)

When the value of the applicable spacing attribute is greater than "1", then ".sk -1" has the effect of overwriting the last blank line inserted because of the spacing. At the top of a page, even when the value of the applicable spacing attribute is "1", it is ignored. This means that a decision is made to place it at the top of the next page, rather than to overwrite the last line of the prior page, even though the effects of ".sk -1" are said to include not counting the next line against the number of lines allowed on a page.

In contrast, ".sp -1" behaves even more oddly; in addition to the effects shown above:

  • When used by itself, when the value of the applicable spacing attribute is "1", then:
    • For the first non-banner element, two separate NEWLINEs with "1" as the value of attribute advance appear.
    • For the subsequent non-banner elements, a NEWLINE with "0" as the value of attribute advance is emitted, but it is then followed by a NEWLINE with "1" as the value of attribute advance.

SP was investigated because this page suggests that it is used by the Open Watcom documents. It should be implemented to mimic the behavior in wgml 4.0 only if necessary; if possible, it should be implemented the same as SK, except that it applies at the top of a page.

pre-skip and pre-top-skip

The rules appear to be:

  1. For the first non-banner element, pre-top-skip is used unaltered.
  2. For the subsequent non-banner elements, these rules apply:
    1. If pre-skip is "0", then the pre-top-skip is used.
    2. If pre-top-skip is "0", then the pre-skip is used.
    3. If both are not "0", then these rules apply:
      1. If the pre-skip is less than twice the pre-top-skip, then the pre-skip is ignored.
      2. If the pre-skip is more than twice the pre-top-skip, then the value used is
pre-skip - pre-top-skip

At least, that is the case with pre-skip of "1" and "2" and pre-top-skip of "0" through "5", and with a pre-top-skip of "1" and "2", and a pre-skip of "0" through "5".
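A minimal sketch of these rules follows; the function name is hypothetical. The rules as stated do not cover a pre-skip exactly twice the pre-top-skip, but both branches give the same result there (the pre-top-skip), so the sketch is unambiguous at that point. Like the rules, it is only backed by the limited range of values tested.

```python
def merge_pre_skips(pre_skip, pre_top_skip, first_on_page=False):
    """Merge pre-skip and pre-top-skip as observed with wgml 4.0."""
    if first_on_page:
        return pre_top_skip            # rule 1: used unaltered
    if pre_skip == 0:
        return pre_top_skip            # rule 2.1
    if pre_top_skip == 0:
        return pre_skip                # rule 2.2
    if pre_skip > 2 * pre_top_skip:
        return pre_skip - pre_top_skip # rule 2.3.2
    return pre_top_skip                # rule 2.3.1 (pre-skip ignored)


for pre in range(6):
    print(pre, [merge_pre_skips(pre, pts) for pts in (1, 2)])
```

For non-negative values, all of the subsequent-element cases reduce to max(pre-top-skip, pre-skip - pre-top-skip), which may be a simpler way to implement them if further testing bears the rules out.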

pre-skip and post-skip

The rules appear to be:

  • For the first non-banner element, pre-skip is ignored.
  • For the first non-banner element, since the prior element must be the last non-banner element on the prior page, post-skip is ignored.
  • For the subsequent non-banner elements, the larger of pre_skip or post_skip is used.

pre-top-skip and post-skip

The rules appear to be:

  • For the first non-banner element, since the prior element must be the last non-banner element on the prior page, post-skip is ignored.
  • For the subsequent non-banner elements, the larger of pre_top_skip or post_skip is used.

Computing top-skip

This section describes how to compute the skip to be used on the first non-banner element placed on a page. But first, what it means to say that a skip is "ignored" requires clarification.

Suppose a non-banner element exists which has a post-skip of "3" and it happens that this element is placed on a page so that only two lines are left below it. In this case, the following (non-banner) element (provided the merged skip is still "3") will be placed on the next page and the two lines at the bottom of the page left blank. So, in a very real sense, two of the three lines to be skipped are, in fact skipped; only the third (the line which would be expected to appear before the next element) is ignored, and only for positioning that element on its page. Thus, top-skip is never used to determine on which page an element will be placed, but only its position on that page if it happens to be the first non-banner element on that page.

The rules appear to be:

  • post-skip, pre-skip, and sk-skip are ignored.
  • pre-top-skip and sp-skip interact very oddly; some observations:
    • With no sk-skip, sp-skip is divided between the prior page and the current page.
    • With an sk-skip, even "0", sp-skip, if greater than "0" becomes a one-line skip unless sp-skip > sk-skip + 2, in which case the value used is
sp-skip - sk-skip + 2 

Testing was not sufficient to completely describe what is going on. If this affects the Open Watcom documents, then it can be explored further.

pre-top-skip is not split between pages unless it is larger than the page-depth. This has been observed specifically with the TITLE tag. In this case, blank pages are emitted and the pre_top_skip is reduced by the page-depth until the first TITLE's text appears (or would appear, if there is no text) on that page.

However, this can only be done when it is known that the top-skip is to be used. Thus, at least until SP is implemented, this is the formula for top-skip:

top-skip = pre-top-skip

Implementing SP may require that its value be kept separate, as it may be split under different circumstances from pre_top_skip.

Computing subs-skip

This section describes how to compute the skip to be used on those non-banner elements placed on a page which follow the first such element. As noted above, this value is also used to determine whether a given element will fit on the current page or will need to be placed on the next.

Although there are several pieces to the puzzle, they turn out to fit together reasonably clearly. Items not mentioned in a particular rule are understood to be "0", and all of those mentioned are understood to be greater than "0". The rules appear to be:

  • pre-skip and pre-top-skip are merged as discussed above ("cur-el-skip").
  • post-skip and sk-skip are merged by taking the larger value ("pre-el-skip").
  • pre-el-skip and cur-el-skip are merged by taking the larger value.
  • sp-skip is added to the result of merging pre-el-skip and cur-el-skip.

So the formula for subs-skip turns out to be

subs-skip = sp-skip + max( merged( pre-skip, pre-top-skip ), max( post-skip, sk-skip ) )
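The formula can be written out as a short sketch. All names are hypothetical, and the helper restates the pre-skip and pre-top-skip merge rules given earlier on this page for subsequent non-banner elements. All values are assumed to be in the same units (lines or vertical base units).

```python
def merged(pre_skip, pre_top_skip):
    """pre-skip / pre-top-skip merge for subsequent non-banner elements."""
    if pre_skip == 0:
        return pre_top_skip
    if pre_top_skip == 0:
        return pre_skip
    if pre_skip > 2 * pre_top_skip:
        return pre_skip - pre_top_skip
    return pre_top_skip


def subs_skip(pre_skip, pre_top_skip, post_skip, sk_skip, sp_skip):
    """subs-skip = sp-skip + max(cur-el-skip, pre-el-skip)."""
    cur_el_skip = merged(pre_skip, pre_top_skip)   # current element
    pre_el_skip = max(post_skip, sk_skip)          # prior element and SK
    return sp_skip + max(cur_el_skip, pre_el_skip)


print(subs_skip(pre_skip=5, pre_top_skip=2, post_skip=1, sk_skip=2, sp_skip=1))
```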

Testing was not sufficient to be certain that this completely describes what is going on with sp-skip. If sp-skip causes other effects with the Open Watcom documents, then it can be explored further.

Implementation Notes

The key to implementing the vertical line skips and the blank lines (when they need to be output) is a common function which performs these actions:

  • Takes the structs encoding the current element's post_skip, pre_skip, and pre_top_skip attribute values and also the integers encoding the current spacing attribute value and the current font.
  • Obtains the pre-skip and pre-top-skip values in vertical base units.
  • Obtains the sk-skip and sp-skip values in vertical base units.
  • Resolves the existing post-skip value (from an extern variable containing the prior element's value), the produced pre-skip and pre-top-skip values, and the sk-skip and sp-skip values into the top-skip and subs-skip appropriate to the current element.
  • Stores the values of the top-skip and subs-skip into extern variables.
  • Replaces the value of the extern variable storing the post-skip value with the value of the attribute post_skip of the current element in vertical base units.
  • Replaces the value of the extern variable storing the spacing with the correct value in vertical base units based on the parameters.
  • Replaces the value of the extern variable storing the space occupied by any blank lines to be output with the value in vertical base units computed from the current number of such lines.

At the end, then, these globals will have the correct values:

  • One storing the top-skip to be used with the current element.
  • One storing the subs-skip to be used with the current element.
  • One storing the post-skip of the current element, to be used with the next element.
  • One storing the current spacing, to be used with the individual text lines.
  • One storing the space occupied by blank lines which need to be output.

The number of blank lines which need to be output is accumulated as they are encountered and identified, and is stored in an extern variable. Note: as discussed below, this variable will only have a non-zero value when a blank line is encountered while concatenation is off.

The existing implementation of SK was modified slightly to do this:

  • Set an extern flag if the value is "-1".
  • Clear the same extern flag if the value is not "-1".
  • Replace the existing value, stored in an extern variable, with a value depending on whether or not the current value is "-1":
    • if it is, use the existing value minus one;
    • if it is not, use the larger of the current and existing values.
  • Provide a function to convert the current value (treating "-1" as "0") to vertical base units.
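The modified SK handling can be sketched as follows. This is an illustration only, not the wgml code: the class and member names are hypothetical, the extern variables are modeled as instance attributes, the "A"/"ABS" operand is left out, and clamping a negative accumulated value to "0" on conversion is an assumption about what treating "-1" as "0" means.

```python
class SkState:
    """Accumulated state for successive SK control words."""

    def __init__(self):
        self.overprint = False   # stands in for the extern flag
        self.lines = 0           # stands in for the extern value

    def apply(self, value):
        """Process one ".sk value"."""
        self.overprint = (value == -1)
        if value == -1:
            self.lines -= 1                      # existing value minus one
        else:
            self.lines = max(value, self.lines)  # larger of the two

    def to_vbu(self, spacing, vbu_per_inch, lpi):
        """Convert to vertical base units, treating negatives as 0."""
        return (max(self.lines, 0) * spacing * vbu_per_inch) // lpi


state = SkState()
state.apply(3)
state.apply(-1)
print(state.lines, state.overprint)
```

With these rules, ".sk 3" followed by ".sk -1" yields two lines and ".sk -1" followed by ".sk 3" yields three, which agrees with the patterns reported above for wgml 4.0.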

Note: implementation of SP is deferred until it is needed and its usage can be examined to see how many of its peculiarities are used by the Open Watcom document specifications.

The treatment of "-1" is intended to match how it is treated for SK by wgml 4.0. Since SK causes a break, the text following an ".sk -1" will be placed in the first line of a new element. The extern flag is used when vertical positioning is done to treat the line as having a height of "0" except at the top of a page, where it is ignored. Within a page, this will place it at the same vertical position as:

  • The value produced by applying the spacing, provided it is greater than zero.
  • The value produced by applying the top-skip or subs-skip (whichever is applicable), provided it is greater than zero.
  • The value assigned previously to the last line of the prior element.

depending on how the vertical position to be adjusted by the current line's line height was computed. The height of the line will also be treated as "0" when determining how much space it will occupy on the page, unless not doing so would place it at the top of the next page.

When an instance of the struct encoding the next element is created, then this is done:

  • The fields overprint, top_skip, subs_skip, spacing, and blank_lines are copied from the extern variables.
  • The extern flag set when ".sk -1" is encountered is cleared.

For text, in most cases, this is done when full or partial text lines are finalized (e.g., justification) and added to the existing element: if there is no existing element, a new instance is obtained. When a break occurs, the last partial line is added to the element and the element is added to the page.

When the element is processed, the fields overprint, top_skip, subs_skip, spacing, and blank_lines are used both to determine if the element will fit on the page and when assigning vertical positions. Assignment of vertical positions is deferred until the starting point of each part of the page has been finalized.

In some cases, currently restricted to the tags found within the TITLEP/eTITLEP pair, to section headings, and to banners, the element is obtained and initialized by the implementing function. These functions also complete each element they create and add it to the page. Except for banners, these items all do vertical line spacing in the context of the system discussed here, although they may (in calling the function discussed above) pass unusual values as parameters: those with an attribute skip pass the value of their attribute pre_top_skip for the first use of the tag only and the value of their attribute skip for subsequent uses of the tag. The list tags have similar peculiarities.

Banners do not participate in this system in any way. Instead, they use the field top_skip to hold the appropriate value, based on their attribute voffset, to be skipped from the top of the banner.

A curious phenomenon was noted in testing: if a sequence such as

:P.Some text.
:P. :P. Some more text.

is encountered, then, unless both attribute post_skip and attribute pre_skip of LAYOUT tag P are "0", the space between the paragraphs will be twice what it would normally be plus one blank line and, when they are both "0", a single blank line will appear. Indeed, testing with a test device shows that wgml 4.0 actually repositions the device for each tag P and that a character-oriented device will output lines at that point. Since the only character device whose output is changed is TASA, it was hoped that our wgml would not have to do this; however, further testing showed otherwise.

The first indication that an empty element was needed was with tag AUTHOR. If the first AUTHOR tag contains no text, but it will fit entirely on the first page, then the second AUTHOR line appears following the "skip" at the top of the next page. However, if the top-skip of the first AUTHOR tag will fit but the empty text line will not, then the top-skip is used at the top of the next page. Simply merging the skips does not reproduce this distinction: an empty element is the simplest way to do it.

Further testing using GRAPHIC showed that the TITLE tag worked the same way. Indeed, so far every tag used with TITLEP works this way, in part because every attribute pre_skip and skip actually works the way attribute pre_top_skip works. Once this effect was found, it turned out that something similar applies to tag P as well: when there is no top-skip, the subs-skip itself does not appear at the top of the next page, but the blank line does, followed by the subs-skip for the following paragraph. So another way of looking at this is that the blank line must be output to position the following elements properly.

But this did not apply to tag PC: an empty PC has no effect on vertical spacing (or, so far as I can tell, anything else). And an sk-skip before a PC tag with no text is still effective, so the effect is as if the PC did not exist. And tag LI appears to work the same way.

When the LAYOUT is altered so that NOTE does not have a prefix string, then it appeared that NOTE worked as P and the TITLEP tags did, but it doesn't: only the blank line is emitted, the skips are not used, unless preceded by an sk-skip, in which case the .sk is used but the blank line is not emitted. It is, in other words, as if "1" were substituted for whatever the original pre-skip was, and is merged normally with the sk-skip. Tag LP appears to work the same way.

It seems unlikely that this confusion was intentional. It seems more likely that, in each case, it is a consequence of how the tag is implemented, and may depend on as-yet undiscovered factors.

For now, tags PC, LI, NOTE (with no text), and LP are all ignored when empty, but swallow any preceding sk-skip. If the Open Watcom documents turn out not to use this "feature", all tags to which the issue applies should be treated the same way. Which way is not, however, clear at this point.

As to the list blocks SL/eSL, UL/eUL, and OL/eOL: these are not allowed to be empty by wgml 4.0. They must contain at least one LI or LP tag. Indeed, the first actual tag following the SL, UL, or OL tag (as opposed to a control word or a macro, whether the macro is used as a control word or a tag, all of which are allowed) must be LI, LP, or one of the General Elements (CMT, IMBED, INCLUDE, and SET); bare text is also not allowed. This has been implemented.

The ADDRESS/eADDRESS block revealed a peculiarity: if it is small enough to fit on an empty page, but cannot fit on the current page, it is placed on the next page instead. The address, in other words, is kept together if possible. If this is not possible, it appears to be output as a sequence of normal text elements. This much has been implemented.

Other parts of the ADDRESS/eADDRESS block handling have not been implemented:

  • When the block will not fit on any page, wgml 4.0 was observed placing the text of an ALINE on line 59 of a text device with only 58 lines per page, and our wgml, with the same test block, was observed to go to line 60. This still needs to be investigated; our wgml should honor the specified page depth, unless the OW docs require a more relaxed approach.
  • In another test, a GRAPHIC tag appeared to cause the block to be split even if it would all fit on the current page. This will be explored when the GRAPHIC tag is implemented.
  • If a .co off/.co on block containing blank lines follows eADDRESS or eTITLEP, and it is large enough that both the blank lines and the ADDRESS/eADDRESS block would not fit on the current page, then the ADDRESS/eADDRESS block appears on the next page, as if the blank lines were considered part of the block. This is likely to be implemented only if needed by the OW documents, since it makes no sense.

Similarly, with the PS device, when blank lines occur in the middle of a page, wgml 4.0 can place text below the stated end of the printable area (as defined by the value of attribute depth in the PAGE LAYOUT tag). It is not entirely clear when this happens; what is clear is that the presence of a bottom banner prevents it. It is possible that the value of $lc, which appears to count the number of lines per page without regard to the height of the default font, may be allowing lines to be output until one of them is entirely below the stated lower edge of the printable area. Or it may be something else altogether. This, also, is unlikely to be implemented unless needed by the OW documents.

Vertical Line Spacing

This section discusses how the value of attribute spacing of several tags in the LAYOUT section is used. The WGML Reference describes the use of these attributes in this way:

This attribute accepts a positive integer number. The spacing
determines the number of blank lines that are output between text
lines. If the line spacing is two, each text line will take two lines in
the output. The number of blank lines between text lines will
therefore be the spacing value minus one. 

This is very nearly true for devices for which every font is defined to have the same height, but doesn't work quite as stated for devices for which fonts of different heights are defined.

It is only "very nearly true" because the top line of a page does not use the spacing. This, of course, is simply the way double-spaced typed documents, for example, were generally expected to behave: the extra blank line was placed between lines on the page, but not before the first one. However, for wgml 4.0, the "first line" is the first line following the top banner, if a top banner is present.

This may not appear to be the case in some situations: if a vertical line skip applies to the first line, that skip will occur and may be affected by the spacing as noted above; however, the spacing as such is not used with the first line.

There is a certain ambiguity in the use of the term "spacing":

  • It can mean the value of the currently-applicable attribute, an integer number of lines.
  • It can mean the actual distance, in vertical base units, to be inserted between lines.

These are distinct even for a character-mode device, where each line has height "1": the second value is one less than the first in this case.

For a device such as PS, the decremented number of lines is multiplied by a line height. However, when a device uses fonts of different heights, then the height of a line is the largest height associated with any font appearing in that line. Testing with varying font heights shows that the font height used is the default height applicable to the element in which the line appears.
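The distinction between the two senses of "spacing" can be shown with a small sketch. The function name is hypothetical; the height passed in is assumed to be the default font height applicable to the element, as the testing described above indicates.

```python
def spacing_distance(spacing, element_font_height):
    """Extra distance, in vertical base units, inserted between text lines.

    spacing: value of the applicable spacing attribute (a line count).
    element_font_height: default font height for the element, not the
    height of the individual line.
    """
    return (spacing - 1) * element_font_height


print(spacing_distance(2, 1))     # character-mode device, double spacing
print(spacing_distance(2, 167))   # PS device, 12-point default font
print(spacing_distance(1, 167))   # single spacing adds nothing
```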

Thus, if a section heading is made large enough to require two lines, and that section is given the value of "2" for its spacing attribute, the lines will be separated by an additional number of vertical base units equal to the height of whatever font is specified for the heading. The text in that section will also be double-spaced, but the font will be the default font or a font specified for a highlighted phrase or other applicable element.

For lines which are not part of an element with a specific spacing value, the value used will be that given to the spacing attribute of the DEFAULT tag. Similarly, for such lines, the height used will be that associated with the font attribute of the DEFAULT tag. Under no conditions will the height used be the height of that line, except by coincidence.

Blank Input Lines

Blank input line handling depends on two items:

  • Whether "script", "noscript", or "wscript" is in effect.
  • Whether concatenation is on or off.

Ignoring "script", since the Open Watcom documents all use "wscript", and noting that, if "noscript" is in effect, concatenation will be on throughout the document since that is the default and there is no way to change it except the use of control word CO, this leaves the interaction of "wscript" and CO to be considered.

If concatenation is on, blank input lines are treated as whitespace. Since line-end characters are also treated as whitespace, and since, with "wscript" in effect, the amount of whitespace between non-whitespace characters is not relevant, this amounts to ignoring blank input lines.

So blank input lines only affect the Open Watcom documents when concatenation is off.

The basic effect is that a blank line appears in the output file. This is not done by actually emitting a blank line; instead the vertical position of the next output line is adjusted to skip the required space.

However, this is no more a "skip" in the sense discussed above than it is an empty text line: when the empty line would appear at the bottom of a page but the following text would not, then the following text uses the appropriate top skip; but when the empty line would appear at the top of the next page, then both it and the appropriate subsequent skip are used. So the vertical positioning is set as if the line were actually being output, but actual output does not occur.

This applies to each blank line separately: if there are two such lines, and one can fit on the bottom of a page, the second will go to the top of the next page, at least as far as the effect on the vertical position of the following text line is concerned.

More precisely, it applies to the space occupied by the skipped lines in vertical base units: the space is applied to the bottom of the first page and, if not exhausted, the remainder is applied to the top of the next page.
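The division of blank-line space between two pages can be sketched as below. The function and parameter names are hypothetical, and the sketch covers only the basic split described here, not the peculiarities listed afterwards.

```python
def split_blank_space(blank_space, space_left_on_page):
    """Split blank-line space (vertical base units) across a page break.

    Returns (space used at the bottom of the current page,
             remainder carried to the top of the next page).
    """
    used_here = min(blank_space, space_left_on_page)
    return used_here, blank_space - used_here


# Two blank lines of height 167 with room for only one on this page.
print(split_blank_space(334, 167))
```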

The implementation section above has some notes on how this was implemented.

A few additional peculiarities were noted:

  1. When testing with the PS device, a situation was seen one day, but not the next, in which the current page needed enough space not only for the blank lines but also for the subs-skip and the text itself to prevent the text from appearing on the next page.
  2. When testing blank lines at the end of the ABSTRACT section, an instance was found in which wgml 4.0 did not move a line to the next page, even though there was not enough room to print it properly. It is as if, in some contexts, wgml 4.0 uses a different skip distance for the blank lines than it does in other contexts.
  3. When testing blank lines between the eTITLEP and ABSTRACT tags, as the number of blank lines was increased, at some point the ADDRESS/eADDRESS block moved to a second page. It is as if the blank lines following the block are somehow used to determine whether it will fit on the current page.

No attempt has been made to implement these.

That wgml 4.0 does different things on different days is so strange that it must be incorrect, and yet, while testing new section processing, I suddenly found wgml 4.0 putting the section heading, on multi-column pages, at the top of the first column regardless of the value of attribute page_eject. I had earlier confirmed, by explicit and detailed testing, that wgml 4.0 was, in this same situation, placing the section heading full-width at the top of the page except when the value of page_eject was "no". This sort of behavior can best be attributed to the influence of other factors which I did not take notice of but which varied from one occasion to the other.

When a Page is "Full"

While this might seem to be a no-brainer, there are actually two definitions in use:

  1. within a section; and
  2. at the end of a section.

This discussion depends on the page parts discussed above.

Within a section, a page is "full" when the main part of the page is full, that is, when no further elements can be added to it. In practical terms, this means that at least one element that would normally be placed in the main part of the last column on the page has been deferred. The point, of course, is that, if such a deferred element does not exist, then continuing to process the document specification may produce additional elements that can be placed in the main part of the last column on the page.

However, at the end of a section (including the end of the document) there will be no more elements from the document specification until the new section starts (at the end of the document, there won't be any more at all). A page is "full" in this situation as long as any element, for any part of the page or last column, is still deferred.
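The two definitions can be put side by side in a small sketch. The function and parameter names are hypothetical, and deferred elements are reduced to simple counts for illustration.

```python
def page_is_full(at_section_end, deferred_main_last_column, deferred_other_parts):
    """Decide whether the current page counts as "full".

    deferred_main_last_column: elements deferred from the main part of
    the last column on the page.
    deferred_other_parts: elements deferred for any other part of the
    page or last column.
    """
    if at_section_end:
        # Any deferred element at all forces another page.
        return deferred_main_last_column > 0 or deferred_other_parts > 0
    # Within a section, only the main part of the last column matters.
    return deferred_main_last_column > 0


print(page_is_full(False, 0, 2))   # within a section
print(page_is_full(True, 0, 2))    # at the end of a section
```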

Thus, if multiple FIGs with place TOP and width PAGE are placed in a short document section, enough pages containing the banners and exactly one such FIG will be produced at the end of that section. This is most obvious when the next section forces a new page; when it does not, preliminary testing suggests that multiple "pages" may be combined into one. Clearly, this point requires more investigation.
