GML Tag Notes

From Open Watcom

Introduction

This page is intended to accumulate notes on how wgml 4.0 uses the various tags. The information presented here should be used to update the WGML Reference so that it actually describes wgml 4.0.

Since I have been working with those tags used with the device library, the entries will include and may well be dominated by those tags. It might be wondered how the information here differs from that presented in discussing the device library.

The pages which deal with the device library are concerned primarily with where the tags appear in the source file and how they are encoded in the binary file. This page is intended to discuss how those tags (or, more correctly, the blocks defined by these tags) are used by wgml 4.0.

Attribute/Value Pairs

This section is based on research done for tags :BINCLUDE and :GRAPHIC, but is believed to apply to all tags, or, at least, to all non-LAYOUT tags. The rules presented are consistent with the WGML Reference and the behavior of wgml 4.0.

The values used with control words appear to have different rules. In particular, control word .tb is said to allow "/" as a delimiter.

The rules for attribute/value pairs used with tags appear to be:

  1. An equal sign (=) must follow the attribute name.
  2. White space may occur before or after the equal sign.
  3. White space, a period (.), or the end of the input line terminates the value.
  4. The attribute name, equal sign, and value must all appear on the same input line. (If multiple attribute/value pairs exist for a given tag, each attribute/value pair must be on the same input line, but different attribute/value pairs may be on different lines.)
  5. Any value may be delimited. It may then contain white space.
  6. There are two delimiters: the double quote (") and the single quote or apostrophe (').
  7. If a value starts with a delimiter, it must end with the same delimiter.
  8. A value containing a period (.) must be delimited, as the period will be taken as the end of the tag if it is not.
  9. If the delimiter in use occurs twice within the value as written, it becomes a single instance of the same delimiter within the value as used.

Thus, the value "'p''age'" in the document specification will cause the value "p'age" to be assigned to that attribute.

Function get_att_value() implements these rules.
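
The rules above can be sketched in Python. This is only an illustration of the rules as listed, not the code of get_att_value(); the function name parse_pair and its return shape are hypothetical.

```python
DELIMITERS = ('"', "'")  # rule 6

def parse_pair(line):
    """Parse one attribute/value pair from the start of one input line.

    Returns (name, value, rest_of_line); raises ValueError if malformed.
    """
    i, n = 0, len(line)
    # The attribute name runs up to white space or the equal sign.
    while i < n and not line[i].isspace() and line[i] != "=":
        i += 1
    name = line[:i]
    while i < n and line[i].isspace():          # rule 2
        i += 1
    if i >= n or line[i] != "=":                # rules 1 and 4
        raise ValueError("'=' must follow the attribute name on the same line")
    i += 1
    while i < n and line[i].isspace():          # rule 2
        i += 1
    if i >= n:                                  # rule 4
        raise ValueError("the value must be on the same line")
    if line[i] in DELIMITERS:                   # rules 5 and 7
        quote = line[i]
        i += 1
        chars = []
        while True:
            if i >= n:
                raise ValueError("missing closing delimiter")
            if line[i] == quote:
                if i + 1 < n and line[i + 1] == quote:
                    chars.append(quote)         # rule 9: doubled delimiter
                    i += 2
                    continue
                i += 1                          # closing delimiter
                break
            chars.append(line[i])
            i += 1
        return name, "".join(chars), line[i:]
    start = i
    # Rules 3 and 8: white space, a period, or end of line ends the value.
    while i < n and not line[i].isspace() and line[i] != ".":
        i += 1
    return name, line[start:i], line[i:]
```

For example, parse_pair("file='p''age'") yields the value p'age, while an undelimited value such as rule.eps would be cut off at the period.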

Odd and Even

This section discusses the terms "odd" and "even" as used in the WGML Reference.

The WGML Reference describes the use of "odd" and "even" as values of attribute page_eject, used by such sections as ABSTRACT, in this way:

The values odd and even will place the heading on a new page if the
parity (odd or even) of the current page number does not match the
specified value. 

However, wgml 4.0 actually starts the new section with a new page of the proper parity and places the heading, if any, on that page. If that is not the next page after the end of the prior section, then the prior section is extended by one page: the banners for the prior section appear on this page, which is otherwise blank.

And the page number used here is not what "current page number" might imply: it is, in fact, the absolute page number, which starts at "1" for the first page of the document and is incremented by 1 for each subsequent page. It is never reset.

The banners which use "topodd", "topeven", "botodd", or "boteven", on the other hand, use the value that "current page number" does imply: the page number of the current section. This value can be reset at the start of quite a few sections.

The control word PA also uses the value of the "current page number" to determine the parity of the next page. It inserts a blank page when needed to give that page a page number of the correct parity.

This is easily verified using the PS device: this device requires wgml 4.0 to insert into the output file the absolute page number, which GhostView will display at the bottom of its window. It is quite easy to see that, with a one-page Title Page and an ABSTRACT section that is marked "even" and which resets the page count, the first page in the ABSTRACT section is on page 2 of the document but uses the "topodd" and "botodd" banners with the $pgnuma symbol showing "1".
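
The interplay of the two page numbers can be sketched as follows. This is a minimal model of the behavior described above, not wgml 4.0 code; the function and parameter names are hypothetical, and any page_eject value other than "odd" or "even" is assumed to impose no parity requirement.

```python
def start_section(abs_page, section_page, page_eject, reset_page_number):
    """Model the first page of a new section.

    abs_page and section_page are the numbers of the prior section's last
    page. Returns (blank_pages, abs_page, section_page, top_banner).
    """
    blank_pages = 0
    abs_page += 1
    section_page += 1
    if page_eject in ("odd", "even"):
        want_odd = page_eject == "odd"
        # The parity check uses the absolute page number, which is never reset.
        if (abs_page % 2 == 1) != want_odd:
            # The prior section is extended by one otherwise blank page.
            blank_pages = 1
            abs_page += 1
            section_page += 1
    if reset_page_number:
        section_page = 1
    # Banners are chosen by the section's own page number.
    top_banner = "topodd" if section_page % 2 == 1 else "topeven"
    return blank_pages, abs_page, section_page, top_banner
```

With a one-page Title Page followed by an ABSTRACT marked "even" that resets the page count, start_section(1, 1, "even", True) gives absolute page 2, section page 1, and the "topodd" banner, matching the observation above.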

BINCLUDE and GRAPHIC

These tags are discussed together because they are said to work with "graphic or non-textual data" (tag BINCLUDE) or "a graphic image file" (tag GRAPHIC). It would, then, seem reasonable that they share at least some behavioral characteristics.

Including graphic data in a text document, at the time wgml 4.0 was developed, was still a problem awaiting a general solution. It should come as no surprise, then, that these tags are very much device-specific solutions.

What may be surprising is that, in the Open Watcom documents, they are not used with binary data, as such, at all. Instead, they are used with text files. It should be kept in mind that wgml 4.0 treats input files as a sequence of bytes, and so this discussion ignores multi-byte encodings. And, since wgml 4.0 only exists in DOS and OS/2 versions, the only "newline character" known is the newline sequence consisting of the two bytes "0x0d 0x0a".

Binary Data

To say that a file contains binary data is to say that any byte value may appear in the file. In particular, it means that byte values below 0x20 may appear in the file, including several which can cause problems if the data is processed as text.

Some binary files encode the binary data so that it can be processed as text. The phrase true binary data will be used to exclude such data, since it does not pose the same problems.

The Open Watcom documents do use files containing true binary data: when processed for device WHELP, BMP image files are specified with HBMP. However, HBMP is not a tag but rather a macro, and all it does is pass the name of the BMP file on to whlpcvt. The file itself is not processed in any way by wgml 4.0.

When a BMP file was used with tag BINCLUDE, the output was the same as that documented below.

When a record type was used with the file, these results occurred only with the BMP file, not the text files, probably because the text files used with BINCLUDE have lines that fit in the relevant buffers:

  • If device TERM is used, then output halts and this message is produced:
IO--004: System message is 'No space left on device'
         Error number is 12
         Output operation failed
  • If device PS is used, then output halts and this message is produced:
IO--011: Output file's record size is too small for
         the device 'ps'
  • The remaining devices tested (mostly TASA, but also HELP and WHELP as a check) reported no such problems, yet each of them has an output file record size which is certainly too small for the amount of data produced, especially when a record type was specified (81,383 bytes from the start of the second line to the end at the first 0x0a byte).

When the BMP file was used with tag GRAPHIC with device PS, this error message resulted:

IO--011: Output file's record size is too small for
         the device 'ps'

Otherwise, the behavior is the same as is discussed below.

The WGML Reference has this to say about tag GRAPHIC:

If the image file is not a PostScript graphic, a special validity check is performed on the file to 
determine if it is a WATCOM GKS PXA image file. If it is not a PXA file, it is assumed to be a 
PostScript graphic file. PXA files are supported with PostScript, HP LaserJet Plus, and IBM PC 
Graphic printers, although grey scales are only supported with a PostScript device.

This suggests that GRAPHIC, at one time, did process a graphics file, although whether the PXA file contained true binary data or not is unknown.

Some information on GKS is available: the file DOCS\DOC\FG\fgkslib.old documents it. It was (essentially) a 16-bit DOS TSR ("Graphics Kernel System") which (as other programs did at the time) was oriented toward displaying graphics on as many graphics adapters (Hercules, Tandy, Monochrome, CGA, VGA, perhaps others) as possible, and which included an internal device, the Pixel Array, which produced PXA files. Since the same function that produced PXA files also produced metafiles (MET), it is likely that PXA was a true binary format. This one file (DOCS\DOC\FG\fgkslib.old) is the only trace found in the repository: neither GKS, nor source for GKS, nor any PXA files can be found. It is, of course, possible that the DOS graphics package was developed from GKS.

Image Files Actually Used in the Open Watcom Documents

Since BINCLUDE and GRAPHIC are only used with device PS in the Open Watcom Document Build System, and searching the repository shows that they are invariably used with PS (or EPS) files, examining those files may be useful. There are several sets of PS or EPS files which produce images (there are many more PS files which do other things), all of which contain files which are definitely used except as noted:

  • Those stored in docs\doc\gml, created by various programs.
  • Those stored in docs\doc\wgmlref\screens, no creator given.
  • Those stored in docs\doc\lr\gp, created by CSG Graphics Screen Capture.
  • That stored in docs\gml\help, no creator given, and not used.
  • Those stored in docs\ps\tmp, created by bmp2eps.

Those stored in docs\doc\gml are:

  • light2.eps, created by CSG Bitmap to EPS Converter
  • ltning.eps, created by FreeHand 7.0
  • owlogo.eps, created by Corel Draw
  • pwrs.eps, named "UNTITLED.CDR from CorelDRAW!"
  • rule6x8.eps, no creator given.
  • rule7x9.eps, no creator given.
  • rule.eps, no creator given.

Of these, light2.eps is not used; the others are used at least once each.

None of these contain true binary data.

The files

rule6x8.eps rule7x9.eps rule.eps (in docs\doc\gml)

are the only files used with BINCLUDE. A different version of rule.eps is found in docs\gml\help, but is not actually used. These files draw the double line at the top of the first page of each chapter. They contain no binary data in any form, and so are pure text files.

The files in docs\doc\wgmlref\screens use a lot of PS macros defined in ezamble.ps (found in docs\doc\gml); however, consulting a PostScript manual shows that they use hexadecimal string literals. These objects are strings enclosed in "< >" pairs and consisting of two-hexadecimal-digit pairs, each of which defines a single byte value. In other words, these are text files with encoded binary data.

The files in docs\doc\lr\gp, created by CSG Graphics Screen Capture, also provide the data in two-hexadecimal-digit pairs, but without the "< >". It appears that these files define macros which process this data and make it useable by the PS interpreter; what is clear, however, is that these are also text files with encoded binary data.

The files in docs\ps\tmp, created by bmp2eps, specify two filters to be used in decoding the data:

  • the filter ASCII85Decode and
  • the filter RunLengthDecode

Although the data clearly does not consist of two-hexadecimal-digit pairs, it is also not true binary data: the lines are completely even (no unprintable characters are displayed, the only character values below 0x20 are those involved in marking the end of each line). The binary data was first encoded by the RunLengthEncode filter and then by the ASCII85Encode filter. The filters are all part of the PostScript language, and so these files also are text files with encoded binary data.
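
To illustrate why such a file remains a text file, here is a small Python sketch of the two encodings. The run-length functions follow the format of the PostScript RunLengthEncode and RunLengthDecode filters (a length byte of 0 to 127 copies the next n+1 bytes, 129 to 255 repeats the next byte 257-n times, and 128 marks end of data); the ASCII85 step uses Python's standard base64 module. The sample data and function names are made up for the illustration, and nothing here is taken from bmp2eps itself.

```python
import base64

def rle_encode(data: bytes) -> bytes:
    """Encode data in the PostScript RunLengthEncode format."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out += bytes([257 - run, data[i]])      # repeated byte
            i += run
        else:
            j = i + 1
            # Gather literal bytes until a repeat starts or 128 are collected.
            while (j < len(data) and j - i < 128
                   and not (j + 1 < len(data) and data[j] == data[j + 1])):
                j += 1
            out += bytes([j - i - 1]) + data[i:j]   # literal run
            i = j
    out.append(128)                                 # end-of-data marker
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Decode data in the PostScript RunLengthDecode format."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n == 128:
            break
        if n < 128:
            out += data[i:i + n + 1]
            i += n + 1
        else:
            out += bytes([data[i]]) * (257 - n)
            i += 1
    return bytes(out)

raw = bytes([0, 0, 0, 0, 13, 10, 255, 1, 2, 2])     # "true binary" sample
text = base64.a85encode(rle_encode(raw), adobe=True)
```

Here text contains only printable characters, even though raw contains several byte values below 0x20, and decoding in the reverse order recovers raw.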

Indeed, it seems likely that PostScript is limited to text files, although, given the existence of the filters, this is not much of a limitation.

The result is clear: we need only consider how to implement BINCLUDE and GRAPHIC for text files. There is no need to worry about processing true binary data.

BINCLUDE

As noted above, it is only necessary here to consider text files and how they are processed by tag BINCLUDE.

The attributes for BINCLUDE are all required. They are used in this way:

  • attribute file provides the name of the input file to process;
  • attribute depth provides the vertical space the contents of the input file will occupy when those contents are processed; and
  • attribute reposition advises wgml whether or not it needs to reposition the print position.

Note that attribute reposition is device-specific, which is an unusual feature for a tag.

The value of attribute reposition is not, however, the only device-specific aspect of BINCLUDE: since the text in the input file appears in the output file as-is, it must be compatible with the device being used. In particular, because a PostScript interpreter expects to see output text enclosed in parentheses, any text not so enclosed will be treated as PostScript language statements, which means that the text in the input file must be, in fact, composed of PostScript language statements, which, of course, would generally be meaningless to any other device.

Three files, all in docs\doc\gml, are actually used with BINCLUDE:

rule6x8.eps rule7x9.eps rule.eps

These contain PostScript language statements. Although some testing was done with a text file containing ordinary text (that is, not containing PostScript language statements), the discussion will be mostly concerned with these three files.

Attributes depth and reposition

The attribute reposition is used only to determine how the vertical position of the text following the BINCLUDE tag is computed. The precise behavior depends on the value of the attributes reposition and depth:

  1. When the value of attribute reposition is "start", then the vertical position of the following text is adjusted by the value of depth. If the value of attribute depth is not "0", then a character device will output blank lines. A value of "0" for attribute depth, of course, will have no effect.
  2. When the value of attribute reposition is "end", then the value of attribute depth is ignored and the vertical position of the following text is not affected. With a character device, of course, any newline sequences output by BINCLUDE will, in fact, change the vertical position on the printed page or in the output file.

The vertical positioning of the following text is done in the normal fashion; BINCLUDE behaves as if it is merely incrementing a value that produces the vertical position of the next text item.
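
As a rough sketch of that rule, assuming positions and depths are expressed in the same units and measured as a distance down the page (the function name and this convention are assumptions made for the illustration):

```python
def position_after_binclude(current, depth, reposition):
    """Return the vertical position used for the text following BINCLUDE."""
    if reposition == "start":
        # The following text is moved down by the depth; "0" changes nothing.
        return current + depth
    if reposition == "end":
        # The depth is ignored and the position is left alone.
        return current
    raise ValueError("reposition must be 'start' or 'end'")
```

This models only the computed position; on a character device, newline sequences contained in the included file still move the actual output position.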

Record Types

Record types have some surprising effects.

If the record type used is "(t:0)", then the following error appears:

IO--001: For file '(t:0)rule.eps'
         System message is 'No such file or directory'
         Cannot open file

That is, the record type is treated as part of the file name instead of being detached and (in the case shown) "rule.eps" being passed to the operating system as the file to open (which is far more likely to work than "(t:0)rule.eps" is).

When any of these forms of the record type is used:

(t) (t:) (t:<n>) (where <n> is any positive integer)

then the basic action of BINCLUDE can be given as:

the text in the input file is placed in the output file almost without alteration

The exception is:

every "0x0a" byte is replaced by a newline sequence ("0x0d 0x0a")

However, this exception has an exception:

if a "0x0a" byte is immediately preceded by a "0x0d", then it is not replaced

That is, existing newline sequences ("0x0d 0x0a") are not affected. For DOS and OS/2 (and Windows), this means that an actual text file will be emitted exactly as-is.

Unless otherwise noted, "record type" will refer hereafter to a record type of the form "(t)", "(t:)", or "(t:<n>)".

The file rule7x9.eps does not have a newline sequence ("0x0d 0x0a") at the end of its second line, and that line does not appear in the output file when a record type is given. It does appear when no record type is given, so another effect of providing a record type is:

the last line of the file is ignored

It should be noted that, when the last line of text ends with a newline sequence ("0x0d 0x0a"), then the file ends in an empty line which does not end with a newline sequence ("0x0d 0x0a"), which then becomes the line which is ignored.
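
The effect of a record type on the file contents, as described in this section, can be sketched in Python. This only illustrates the observed behavior and is not the wgml 4.0 implementation; the function name is hypothetical, and treating a file with no newline at all as a single ignored last line is an assumption.

```python
def binclude_with_record_type(data: bytes) -> bytes:
    """Model BINCLUDE output for a text file given with (t), (t:) or (t:<n>)."""
    out = bytearray()
    prev = None
    for b in data:
        if b == 0x0A and prev != 0x0D:
            out += b"\r\n"      # a bare 0x0a becomes 0x0d 0x0a
        else:
            out.append(b)       # everything else is copied as-is
        prev = b
    # The last line (whatever follows the final newline) is ignored.
    cut = out.rfind(b"\n")
    return bytes(out[:cut + 1])
```

A file whose last line of text ends with a newline sequence is therefore emitted in full, since the ignored last line is empty.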

Output Sequencing

The output produced by the BINCLUDE tag can be divided into three steps:

  1. Prior text is flushed.
  2. A prefix, which may be empty, is emitted.
  3. The data from the input file is placed in the output file.

The first step simply ensures that all text occurring before the BINCLUDE tag is written to the output file before the BINCLUDE tag writes any. The text is not followed by a newline sequence ("0x0d 0x0a").

The second step depends on a variety of factors. For most elements produced by the BINCLUDE tag, that is, those that follow another element, these conditions apply:

Condition             PS device                      character device
depth = 0, no skips   positioned to the last print position (both devices):
                      uses ABSOLUTEADDRESS block     does nothing at all, which has the same effect
depth = 0, skips      same as with no skips; the skips "pass through" to the following element (both devices)
depth > 0, no skips   positioned to left margin of current line (both devices)
depth > 0, skips      positioned to left margin of a line reflecting the skips (both devices)

The skips referred to are any or all of:

  • blank lines;
  • SK values greater than "0"; and
  • attribute post_skip values greater than "0" from the preceding block

These are merged normally.

For those elements produced by the BINCLUDE tag that are the first element on the page (which, in this case, means that there is no top banner as well as no section heading or other element before the BINCLUDE element) these conditions apply:

Condition     PS device                            character device
depth = 0     positioned to the values in          does nothing at all; effect is to position
              the PAGESTART block                  to "0,0"
depth > 0     positioned to the left margin and the vertical position in the PAGESTART block
              (both devices)

When positioning occurs, it is done using the usual methods: the ABSOLUTEADDRESS block for the PS device, and the appropriate NEWLINE block and perhaps the TAB block for character devices. When resetting to the left margin of the same line, a character device will treat this as an overprint.

The third step, that is, output of the input file, depends on whether or not a record type is provided, as described in the WGML Reference:

The required attribute file specifies the name of the file to include. The value of the
attribute is a character string, and may be any valid file name. The input file is
processed as containing binary data. If the input is text data, a record type such as
"(t:80)" must be prefixed to the file name.

More precisely, if no record type is given, then output is done in this way:

  1. The data is broken up into 80-byte blocks.
  2. A newline sequence ("0x0d 0x0a") is emitted after each 80-byte block except the last.
  3. A newline sequence ("0x0d 0x0a") is emitted after the last 80-byte block if the value of the attribute depth is not "0".

This, it appears, is what "processed as containing binary data" means. As noted in the discussion of record types above, when a record type is given, and the file is, in fact, a text file, then output is done this way:

The data is output as-is, with almost no processing at all.

The exception, discovered during implementation, is:

When the value of the attribute depth is "0", then any final newline sequence ("0x0d 0x0a") 
found in the input file is omitted.

This is all quite apparent in the test devices, where the text just appears, inserted into the normal sequencing used to output any preceding text, to position the print head, and that used to set up for any following text, and where comparison with the original will show "extra" newline sequences ("0x0d 0x0a") in various locations. This is especially clear when wdump /b is used to display the contents byte by byte.

Breaking the input into 80-byte blocks can, of course, cause words to be split between the blocks, an effect clearly visible with both character devices and the PS device and which generally, with the PS device, results in the PostScript interpreter encountering an error and halting.

Of the three files actually used with BINCLUDE, rule6x8.eps and rule7x9.eps are always used without a record type; rule.eps is used both with and without a record type.

When used without a record type, they work because the first line is empty, that is, contains two bytes ("0x0d 0x0a"), and the second line has 78 characters in it so that the first 80-character buffer corresponds to the first line of text, at least in operating systems using a two-character sequence for end-of-line. Since the second line is followed by a newline sequence ("0x0d 0x0a"), they would probably work even in Linux, the first block starting and ending with a Linux newline character. The newline sequences ("0x0d 0x0a") inserted in the output cause a blank line to appear in the output file that was not in the input file. Since there is only one more line, and it has 77 characters, no PostScript language elements get split.
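
The 80-byte blocking used when no record type is given can be sketched as follows. This is a model of the three output rules listed earlier, not the wgml 4.0 code; the function name is hypothetical, and the sample data merely imitates the layout described above (an empty first line, a 78-character second line, and a 77-character last line).

```python
def binclude_without_record_type(data: bytes, depth, block_size=80) -> bytes:
    """Model BINCLUDE output when the file is processed as binary data."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # A newline sequence follows every block except the last ...
    out = b"\r\n".join(blocks)
    # ... and follows the last block only when depth is not 0.
    if blocks and depth != 0:
        out += b"\r\n"
    return out

sample = b"\r\n" + b"A" * 78 + b"\r\n" + b"B" * 77
result = binclude_without_record_type(sample, depth=0)
```

In result, the inserted newline sequence lands exactly at the end of the 78-character line, so the only visible change is an extra blank line; with other line lengths the break would fall in the middle of a line.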

The file rule7x9.eps does not have a newline at the end of its second line, and, as noted above when the effects of providing a record type were discussed, that means that that line does not appear in the output file when a record type is given. It does appear when no record type is given, which is probably why it is used that way in the Open Watcom documents.

The Following Element

The following element is positioned normally, taking into account that any skips preceding the BINCLUDE tag may have been used in positioning the BINCLUDE element and so no longer be in effect or may have been passed through and so still available.

Depending on the value of the attribute depth of the BINCLUDE block, both the value of the attribute post_skip of the element before the BINCLUDE block and the value of the attribute pre_skip of the current block will, in some cases, have separate effects (before and after the BINCLUDE block) and, in other cases, will only affect the element following the BINCLUDE block, but will be merged. And similarly for the other skips: how they affect the BINCLUDE block and the element following the BINCLUDE block will differ depending on the value of the BINCLUDE attribute depth.

Text Following the BINCLUDE Tag

While the README file for wgml33.zip lists BINCLUDE as one of a set of tags that are treated as being followed by tag PC, even if no tag is present, in point of fact, text following the end of the BINCLUDE tag up to the end of the current logical input record is ignored by wgml 4.0.

That is to say, given this line:

:BINCLUDE file='rule.eps' depth=0 reposition=start. Text one :P.Text two

"Text one" will be ignored completely, being part of the same logical record as the BINCLUDE tag, which ended with the "." following "start", but "Text two" will be processed normally, as the P tag starts a new logical record.

An Unresolved Problem

There is an alternate version of rule.eps in docs\gml\help. This version differs from that in docs\doc\gml by spreading the PostScript commands over multiple lines. It also draws three lines rather than two; at least, the PostScript language statements appear to do so: the actual output is indistinguishable from that of the version in docs\doc\gml. The last line does end with a newline sequence ("0x0d 0x0a").

This file must be used with a record type, since, if it is not, at least one PostScript operator is split over two 80-byte groups and the PostScript interpreter halts and reports an error.

When this file is used with a record type, then the last operator "restore" is merged with the print-positioning PostScript code following, producing the token "restore1000" (for a one-inch margin) and the PostScript interpreter halts and reports an error.

Several workarounds have been found:

  • a space may be added after the "restore";
  • a second newline sequence ("0x0d 0x0a") may be inserted at the end of the file; or
  • the sequence of PostScript language commands which draw the third line can be reformatted so that they are all on one line in the file.

There is no apparent reason for this behavior. Examining the file with wdump does not reveal any suggestive 80-byte group boundaries. The workarounds do not suggest anything either. Of course, this version of the file is not actually used, apparently with good reason.

Incidentally, the "third line" overlays one of the other two, and so does not actually appear in the document.

GRAPHIC

As noted above, it is only necessary here to consider text files and how they are processed by tag GRAPHIC.

Since we are concerned only with PostScript files used with the PS device, GRAPHIC has two required attributes and four optional attributes. The required attributes are used in this way:

  • attribute file provides the name of the input file to process; and
  • attribute depth provides the vertical space the contents of the input file will occupy when those contents are processed.

The optional attributes are:

  • attribute width is not, in fact, used in the Open Watcom Documents; if it were, it would specify the width of the graphic;
  • attribute scale is used to increase or decrease the size of the image;
  • attribute xoff is used to give the horizontal position of the lower-left corner of the graphic on the page; and
  • attribute yoff is used to give the vertical position of the lower-left corner of the graphic on the page.

Output Sequencing

The output produced by the GRAPHIC tag can be divided into four steps:

  1. Prior text is flushed.
  2. A prefix is emitted.
  3. The data from the input file is placed in the output file.
  4. A postfix is emitted.

The first step simply ensures that all text occurring before the GRAPHIC tag is written to the output file before the GRAPHIC tag writes any.

For character mode devices, the space which the image would occupy in the PS device is skipped. This is documented in the WGML Reference:

Documents can be proofed on devices which are not supported by the graphic tag. If the device is 
not supported, the appropriate amount of white space is left for the graphic. All space value 
attributes are linked to the current font being used in the document.

This was confirmed through testing.

The net effect is that the last three steps are replaced by a sequence of NEWLINE blocks. This creates an interesting situation: while blank lines (discussed next) affect the position of the element following the element resulting from the GRAPHIC tag in all devices, the question of whether they or the other skips precede or follow the GRAPHIC element only makes sense for the PS device. In a character mode device, only the number of NEWLINE blocks used to position the next element is affected.

Since one of the skips applies generally to all devices, all of the skips will be discussed before the prefix specific to the PS device is discussed. The skips referred to are any or all of:

  • blank lines;
  • SK values greater than "0"; and
  • attribute post_skip values greater than "0" from the preceding block

These are treated in the normal manner, when they are used at all.

At the top of the page, skips are treated as they usually are, so this table documents how skips are treated when the element resulting from the GRAPHIC tag is not at the top of the page but rather follows a line of text:

Condition     PS device                      character device
SK            precedes GRAPHIC element       ignored (not used at all)
blank lines   precede GRAPHIC element        affects position of next element
post_skip     precedes GRAPHIC element       ignored (not used at all)

Turning now to the PS device, the PS prefix consists of these lines:

1000 10800 am 
/graphobj save def /showpage { } def
1000 10800 6000 1000 0 -1000 100 graphhead
%%BeginDocument: acc4.ps

The first line is an ABSOLUTEADDRESS block positioning the graphic at the left margin of the same line on which the last bit of text appeared. If there was no preceding text, then the value of attribute y_start of the PAGESTART block is used. (This was tested at the very start of the document, but it should apply whenever there is no preceding text on the current page.)

The second line never varies.

The third line is clearly feeding parameters to the macro "graphhead", which is defined in the material emitted by the INIT block of the PS device driver. The parameters turn out to be (in order, from left to right):

  1. The left margin of the current column (which .in, at least, does affect).
  2. The vertical position of the last line of text preceding the GRAPHIC modified appropriately by any skips that apply.
  3. The value of attribute width, or the current column width (which .in, at least, does affect on both the left and the right).
  4. The value of attribute depth (which is required for the PS device).
  5. The value of attribute xoff, or "0".
  6. The value of attribute depth plus the value of attribute yoff, multiplied by "-1". If attribute yoff is not given a value, then this is "-1" times the value of attribute depth.
  7. The value of attribute scale, or "100".

When a page has only one column, then "column width" becomes "page width".
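
The parameter list can be turned into a small sketch that builds the third prefix line. It assumes all values have already been converted to the device units that appear in the sample output; the function name and keyword arguments are hypothetical.

```python
def graphhead_line(left_margin, v_pos, column_width, depth,
                   width=None, xoff=0, yoff=0, scale=100):
    """Build the line that passes the seven parameters to graphhead."""
    params = [
        left_margin,                                    # 1. column left margin
        v_pos,                                          # 2. vertical position
        width if width is not None else column_width,   # 3. width or column width
        depth,                                          # 4. depth
        xoff,                                           # 5. xoff, or 0
        -(depth + yoff),                                # 6. -(depth + yoff)
        scale,                                          # 7. scale, or 100
    ]
    return " ".join(str(p) for p in params) + " graphhead"
```

Called as graphhead_line(1000, 10800, 6000, 1000), it reproduces the third line of the sample prefix shown above.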

The fourth line will vary depending on the file name.

The input file content is then streamed to the output device.

The PS postfix consists of these lines:

%%EndDocument
graphobj restore

Neither line ever varies.

Multiple Columns

Using a test file with two GRAPHIC lines separated by text and changing the number of columns to "2" had a very interesting effect:

  • The first page showed the second column, which started with the second image, properly positioned on the right side of the page.
  • The second page showed the first column, which started with the first image, properly positioned on the left side of the page.

The values passed to PS macro graphhead clearly showed that the column left margins and widths were being used, and that they responded to the left and right values provided by control word .in.

Some experimentation showed this effect to be quite stable:

  1. If output text preceded the first image, then that text appeared in the first column of the first page. The first image and the text following it still appear in the first column of the second page.
  2. If the depth of the first image was reduced so that both images fit into the same column, they both appeared in the first column of the second page.
  3. If "passes 2" was used on the command line, the output was unchanged.

This does, however, appear to be restricted to the PS driver: when used with a character device (TERM), both columns did appear on the same page, with the text (and space reserved for the images) properly distributed between the columns.

Required Attributes

Attribute file can be used with or without a record type. Using record type "(t:0)" produces this error:

IO--001: For file '(t:0)acc4.ps'
         System message is 'No such file or directory'
         Cannot open file

That is, the record type is treated as part of the file name, making it unlikely that the file will be found. Otherwise, the record type, if specified, does not have any effect on the output produced.

Attribute depth is required for PostScript but not for the Watcom GKS PXA format; since that format is not used in the Open Watcom documents, attribute depth is required for our purposes. When the value "0" is used for attribute depth, this error is produced:

SN--087: The GRAPHIC depth must be greater than zero

The WGML Reference also describes what happens if the value of attribute depth is not as large as the image:

If the specified depth is less than the size of the actual graphic, the difference in size is taken 
off the top of the graphic image.

Testing confirmed that, when the value of attribute depth is less than '5.71i', the top of the image is clipped. Since the Open Watcom documents use '2.5i' for most images (including this one) and are unlikely to be displaying only part of the image, attribute scale must be effective in reducing the image size to fit within 2.5 inches.
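
As a worked example of that inference, the following sketch computes the largest whole-number scale at which a 5.71-inch image would fit in a 2.5-inch depth. It assumes that scale is a percentage, as the WGML Reference describes, and that clipping is applied to the scaled image; the function name is hypothetical, and the scale values actually used in the Open Watcom documents are not taken from this calculation.

```python
import math

def max_scale_to_fit(natural_height, depth):
    """Largest whole-number scale (percent) at which the image fits in depth."""
    return math.floor(100 * depth / natural_height)

fit = max_scale_to_fit(5.71, 2.5)   # heights in inches
```

Under these assumptions, any scale up to the computed value would show the whole image within the reserved depth.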

Optional Attributes

Attribute width is described in the WGML Reference, in relevant part, in this way:

The width attribute allows you to specify the width of the graphic. The attribute value
page specifies that the graphic will be as wide as the page, even if the document is
formatted for more than one column. The attribute value column specifies that the
graphic shall be one column wide. If a horizontal space unit is used as the attribute
value, the graphic will have the width specified by the attribute value. If the graphic is
larger than the specified width, the difference in size is taken off the right hand side of
the graphic image.

The Open Watcom documents do not, so far as I can tell, use attribute width. Brief testing showed that:

  • The value "page" has no effect when more than one column is specified.
  • If the value given is too small, the image is indeed cut off on the right side.
  • If the width specified is larger than the page width, this error message results:
SN--092: The GRAPHIC width is greater than the page width
  • If the width is specified but the value is "0", this error message results:
SN--093: The GRAPHIC width must be greater than zero

It appears that the values "page" and "column" are superfluous in wgml 4.0: either of them, or no value at all, is treated as "column" is said to behave in the WGML Reference. Except, of course, for the image being cut off (since the image does not appear), this applies to character devices as well as to the PS device.
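The width behavior observed above can be summarized in a short sketch. This is only an illustration of the test results, not the wgml code: the function name resolve_graphic_width and its parameters are hypothetical, and all widths are assumed to have been converted to one common numeric unit already. Negative widths were not tested and are not handled.

```python
# Error texts are quoted from the notes above; everything else is illustrative.
SN_092 = "SN--092: The GRAPHIC width is greater than the page width"
SN_093 = "SN--093: The GRAPHIC width must be greater than zero"


def resolve_graphic_width(value, column_width, page_width):
    """Return (width, error) for a GRAPHIC width attribute (hypothetical helper).

    value is None (attribute omitted), "page", "column", or a number
    expressed in the same unit as column_width and page_width.
    """
    # Observed: "page", "column", and no value at all behave like "column".
    if value is None or value in ("page", "column"):
        return column_width, None
    if value == 0:
        return None, SN_093
    if value > page_width:
        return None, SN_092
    # An explicit width is used as given; a wider image is cut off on the right.
    return value, None
```

For example, resolve_graphic_width("page", 300, 600) yields the column width, 300, which is the "no effect" result reported for multi-column pages.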

Attribute scale works as documented in the WGML Reference: it is taken as the numerator of a fraction whose denominator is "100". Thus, if it is omitted, the image is reproduced at full size. If "0" is used, it is passed to the PostScript interpreter, and the image does not appear (the space is still reserved). This error message:

SN--001: Number is too large or contains invalid characters

appears in three situations:

  1. The value begins with a plus sign ('+').
  2. The value begins with a minus sign ('-').
  3. The value is greater than 0x7FFFFFFF.

The third situation can also be expressed as: the value, taken as a 32-bit signed integer, is negative.
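The three situations can be expressed as a small check. This is a sketch under stated assumptions: the names scale_is_rejected, as_signed_32, and scale_factor are hypothetical, the value is assumed to consist of an optional sign followed by decimal digits, and the signed reinterpretation only matches the third situation for values that fit in 32 bits.

```python
INT32_MAX = 0x7FFFFFFF


def scale_is_rejected(value: str) -> bool:
    """True if value falls into one of the three observed SN--001 situations."""
    if value.startswith("+") or value.startswith("-"):
        return True
    # Assumption: what remains is a run of decimal digits.
    return int(value) > INT32_MAX


def as_signed_32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a signed integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n


def scale_factor(value: str) -> float:
    """The scale attribute is the numerator of a fraction over 100."""
    return int(value) / 100
```

With these definitions, "2147483647" is accepted, "2147483648" is rejected, and as_signed_32(2147483648) is negative, which is the alternative statement of the third situation.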

For values of attribute scale over "100", the image is clipped on the top and right side to the depth and width of the image given to the graphhead macro by the PS interpreter.

Attributes xoff and yoff are documented this way in the WGML Reference:

The xoff and yoff attributes specify an offset into the graphic. Some images are saved
so that they will print in the middle of a blank page. By specifying the amount of space
from the lower left corner of this blank page to the lower left hand corner of the
printable graphic with the offset attributes, WATCOM Script/GML can shift the
graphic to position it properly on the page. The value of the attributes can be a vertical
space unit, with negative values being allowed.

If attribute xoff is omitted, the value "0" is used; the value "0" may be given explicitly for this attribute. A positive value will clip the left side of the image and shift the result to the left side of the image area. A negative value will clip the right side of the image and shift the result to the right side of the image area.

The value of attribute yoff is first added to the value of attribute depth and then multiplied by -1. If it is omitted, the value "0" is used; the value "0" may be given explicitly for this attribute. A positive value will clip the bottom of the image and shift the result to the bottom of the image area. A negative value will clip the top of the image and shift the result to the top of the image area. If the negative value is greater than the value of attribute depth, then the value passed to graphhead will be positive. This will, of course, cause the entire image to be clipped, but the space is still reserved.
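The arithmetic described for yoff can be written out directly. This is a minimal sketch: the function name graphhead_y is hypothetical, and depth and yoff are assumed to be integers in the same vertical unit.

```python
def graphhead_y(depth: int, yoff: int = 0) -> int:
    """Value derived from yoff as described: add depth, then multiply by -1."""
    return -(depth + yoff)
```

With a depth of 1800 units, an omitted yoff gives -1800, a yoff of 200 gives -2000, and a yoff of -2000 (a negative value larger in magnitude than depth) gives +200, the positive case in which the entire image is clipped.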

The clipping will only be visible if the image itself is large enough that part of it would otherwise extend out of the specified area; that is to say, the actual image size, if smaller than the size given to graphhead, may result in no visible clipping, just shifting of the entire image.

It should be clear from this that GRAPHIC allows a great deal of control over an image: how much space it can occupy, which part of the image actually appears, and how it is scaled. Fortunately, this control is implemented in the PostScript language; all wgml has to do is pass on the values of the attributes correctly.

The Following Element

The element following the GRAPHIC tag will usually be positioned in this way:

  1. The vertical starting position of the element produced by the GRAPHIC tag is the starting point.
  2. The value of the attribute depth is then used to adjust the vertical position.
  3. Any skips between the GRAPHIC tag and the following element will then be applied normally.

The positioning itself is done normally: if the ABSOLUTEADDRESS block is defined, then that block is used; otherwise, the normal pattern of NEWLINE blocks appears. The horizontal position will be whatever is appropriate, generally the left margin or paragraph indent.

When the default font is not "0", then the font will be changed to font 0 (not the default font) before the next element is positioned. This can affect the position of the next element; in particular, the vertical starting position of the output of an HLINE block is affected (and the line itself is done in font 0, although that probably doesn't make any difference since no font is involved).

Since all Open Watcom documents appear to use "0" as the default font, this behavior is not implemented. Also, it is not clear that it makes sense to do this, so not implementing it might reasonably be considered to be fixing a bug in wgml 4.0.

Text Following the GRAPHIC Tag

The README file for wgml33.zip lists GRAPHIC as one of a set of tags that are treated as being followed by tag PC, even if no tag is present, and text following the GRAPHIC tag is indeed treated this way, up to a point, by wgml 4.0.

That is to say, given this line:

:GRAPHIC file='acc4.ps' depth='5.71i' width=page. test text

"test text" will appear in the normal vertical position at the left margin. This matches the PC tag only if the values of all three of its attributes happen to be "0".

This appears to be true generally: text, not preceded by a tag, which follows the various tags listed in the README file for wgml33.zip is placed at the left margin of the next line (which control word IN does affect). The PC tag attributes appear to be ignored.

Implementation

These are notes about the tags that were discovered during implementation. The details may not be entirely correct!

Both tags are forbidden in the LAYOUT section by wgml 4.0. This, of course, makes sense, since they are not LAYOUT tags.

If they appear before or after the LAYOUT section, but before the GDOC tag, then wgml 4.0 also objects. This means that a document specification cannot start with either of these tags; if nothing else, text must come first (to set the state to GDOC BODY). A BODY or FRONTM tag, with no preceding GDOC tag, does allow these tags to be used without any preceding text (presumably by setting the state to GDOC plus the tag used).

Both tags appear to be allowed in all other segments (FRONTM, BODY, APPENDIX, BACKM) by wgml 4.0. Some interesting effects were seen because tag GRAPHIC responds to the number of columns specified for a page: between the GDOC tag and the FRONTM tag, and between the FRONTM tag and the TITLEP tag, the width of the image was affected by the number of columns -- but only when just one pass was done. When two passes were done, the width of the image showed that the page had only one column. If placed in both locations, all four images will, if their depths permit, appear on the same page, although up to four pages can be produced if their depths prevent any two of them appearing on the same page. The same remarks apply if the ABSTRACT tag or the PREFACE tag appears immediately after the FRONTM tag.

Also, although a detailed examination was not done, both tags were accepted by wgml 4.0 between the TITLEP and TITLE tags.

When these tags are placed in the INDEX section, the actual Index did not start on the same page, but rather on the next page. In contrast to this, when these tags are placed between the TITLEP and TITLE tags, the TITLE tag's text appeared on the same page, below the images. The same applies to text following these tags in the APPENDIX and BACKM sections, as well as text following the ABSTRACT tag and the PREFACE tag.

Both are allowed after the eGDOC tag; however, only BINCLUDE actually produced an image. If there is only one pass and a TOC or FIGLIST tag appeared, the Table of Contents and/or List of Figures appear(s) between the last part of the document (in the test file, the Index) and this image. This may provide a clue to the otherwise elusive distinction between the "END" and "DOCUMENT" FINISH blocks.

BINCLUDE

BINCLUDE was implemented for both the PS device and for character devices, in part because some details are easier to check with a character mode device. Although I write the following notes as if they applied absolutely, it is always possible that future experience will require some adjustment to the code.

The effect of the depth on the position of the following element, and the treatment of any skips between the preceding element and the BINCLUDE tag, have been implemented. This was done easily using the existing facilities of the page-oriented output system.

The prior text and prefix output are done as described above. As it happens, the actual flushing of prior text occurs after any prefix output has been done; in effect, it must be flushed as well. Such are the perils of working with the internals of buffered output!

The output was done almost as wgml 4.0 does it, as described above, with one exception: no newline characters were suppressed or last lines ignored when processing files with record types. Record types are not used by the Open Watcom documents except with the HELP device, which is associated with an earlier version of the DOS help system. The Open Watcom documents use device WHELP for help file production, so these differences should not cause a problem.

No attempt was made to reproduce, avoid, or even check on the behavior of our wgml when the version of rule.eps in docs\gml\help is used. This file is not used by the Open Watcom documents.

The code is distributed over several files: binclude.c contains the code that processes the attributes and sets up the doc_element, handling the depth and skips in the usual manner if appropriate; docpage.c contains code to insert the doc_element into the proper page, compute the proper vertical position, and invoke the implementing function on page output; outbuff.c contains the implementing function, which calls a helper function in devfuncs.c to handle the prefix output but which flushes the prior output and outputs the file itself.

Skipping trailing text in the same logical record, while processing the next logical record normally, was implemented. This could be easily changed to match how GRAPHIC behaves, if it is desired to bring our wgml into better conformance with the README file for wgml33.zip.

GRAPHIC

When processing the attributes, the appropriate default values are assigned.

The positioning has been implemented to match the behavior documented above. The treatment of SK and the value of attribute post_skip in character mode devices was handled by zeroing out the globals g_skip and g_post_skip before resolving the skips using the usual function.

The bulk of implementing GRAPHIC concerned the PS device, as might be expected since that is where it actually does more than insert blank space into the output.

This turned out to be fairly simple: the prefix and suffix were a matter of forming and outputting text. The graphhead line required the use of sprintf_s, but this turned out to work quite well. These lines were not subjected to output translation, although they are in wgml 4.0, because doing so really doesn't make any sense: they are PostScript language statements and changing characters in them can only lead to trouble.

The input file itself was streamed from input to output; no checks of buffer length were done. This can, of course, be replaced with a more elaborate system if one is needed in the future.

The weird multicolumn effects require some comments.

The Open Watcom documents, so far as I can tell, except in the Indexes, do not use multiple columns. The Indexes do not include GRAPHIC statements. So the problem is not likely to be seen in the Open Watcom documents. This means that it does not need to be duplicated by our wgml: our wgml can do rather better.

I should note that, when tested with enough text to fill several pages, the effects became even weirder: the first n-1 pages consist of 2nd columns containing the rest of the text, and the nth page contains the graphic plus however much of the text that follows it will fit. Note that none of the pages contained two columns unless text preceded the GRAPHIC statement, in which case the first page still showed the text preceding the GRAPHIC statement in its first column. Testing with 3 columns shows each page having two columns (middle and right) except the first (when text precedes the GRAPHIC statement), the last (which contains the graphic) and, for this file, the two before the last, which had individual sentences from the text that appeared on the last page with 2 columns in the middle column at the top.

Fortunately, there does not appear to be any necessity to do things this way: if the PS file from the 2 column test is edited to move the lines

.072 .072 scale
2 setlinecap
showpage
%%Page: #2 2

to just before the final "showpage", then the first column does appear on the first page; however, it starts at the top of the page, even if it was preceded by text (the text is overwritten). The implication is clear: if we output the page correctly (everything on one page in the proper position), there is no reason to believe that the PostScript interpreter will have any problems displaying it. Since wgml 4.0 does this for character devices, it should be possible for our wgml to do it as well.

A brief test with tag FIG showed none of these problems, and also that the value "page" for attribute width works as documented with tag FIG. These problems are specific to tag GRAPHIC.

Trailing text in the same logical record is set up for processing normally by our wgml; the resulting behavior matches that of wgml 4.0 and could easily be extended to the BINCLUDE tag.

There was one interesting detail discovered: our wgml was passing the skips through (with the PS device) to the following text. This was because the additional text is processed at the same point as any logical input record consisting entirely of text. These logical input records are usually part of a paragraph and resetting the skips at this point would be inappropriate. So, GRAPHIC zeroes out g_subs_skip and g_top_skip immediately after using them. All tags that allow text following them to be processed (rather than skipped) will probably need to do the same.

FIGLIST

This information was discovered while exploring rounding.

FIGLIST, as such, is pretty straightforward and this discussion is included so that it may be contrasted with INDEX and TOC.

Since the FIGLIST LAYOUT tag does not have the attributes used with a heading, the default layout uses a top banner to put the "List of Figures" line at the top of the FIGLIST output.

Each line in the FIGLIST can be treated as a text line; the entire list can be treated as a single element. To be sure, the attribute left_adjust is doubled before being applied, but the other attributes tested all worked as expected.

FONTPAUSE

The WGML Reference states in section 15.10.3.3 FONTPAUSE Attribute:

The fontpause attribute specifies a character value which is the 
font pausing method to be used when switching into the font.

and in section 15.10.5 FONTPAUSE Block:

In some cases, the font switch may require physical intervention 
at the output device by the operator. Examples of such an
intervention would be changing a print wheel or color ribbon. 

This section uses terminology discussed here to describe the various blocks.

With my test device and test driver files I used ten test font files. For each of these fonts, a separate FONTPAUSE was defined. Since the fonts were numbered from "01" to "10", the FONTPAUSE blocks were named (that is, had for their value of attribute type) "pause01" through "pause10". These fonts were used in specific contexts:

  • "01" through "06" were used with DEFAULTFONT blocks 0 through 5 (and so paired with font styles "plain", "bold", "uline", "uscore", "ulbold", and "usbold" respectively).
  • "07" and "08" were used with the BOX and UNDERSCORE blocks, respectively.
  • "09" and "10" were reserved for use with the command-line FONT option.

Each FONTPAUSE block was configured to identify itself when interpreted by wgml 4.0. The "pause02" block was also used to contain the function sequences being tested; a "FONT" line in default.opt was used to vary the font used with the DEFAULTFONT 0 (and so interpreted first) between font "01" and font "02", which aided in testing.

The results reported here can be expanded to show which FONTPAUSE corresponded to which instance:

The output for the two situations with minimal FONTPAUSE blocks was:

instance "pause01" is first   "pause02" is first
   1     pause01              pause02  
   2     pause04              pause04
   3     pause08              pause08
   4     pause04              pause04
   5     pause08              pause08
   6     pause04              pause04
   7     pause01              pause02
   8     pause02              pause02
   9     pause01              pause02
  10     pause02              pause02
  11     pause01              pause02

The variation between "pause04" and "pause08" is the result of using the corresponding DEVICEFONTs for the FONTSTYLE with the value "uscore" for its attribute type ("pause04") and for the value of attribute font in the UNDERSCORE block ("pause08").

As discussed here, the only change occurs when both fonts (those using "pause01" and "pause02" in the first column and those using "pause02" in the second column) use the style "plain": the last two "pause02" lines disappear, a result of the fact that font style "plain" only requires one line pass while font style "bold" requires two. This, of course, means that when a FONTPAUSE is interpreted depends not only on the DEVICEFONT it is associated with and the font switching process but also on the font style it is associated with in the DEFAULTFONT block, making the description quoted above not quite complete. Of course, the font style does this by requiring multiple line passes, which in turn require additional font switches, so the description is correct as far as it goes.

The discussion here also notes that a FONTPAUSE will be interpreted, in some instances, even when the FONTSWITCH blocks are not. One of those situations, as might be expected, is that the fonts being switched are the same font. The problem is that they can be associated with different font styles.

If the example given above of manually changing the ribbon is used so that, for example, FONTSTYLE "bold" prints text in red while FONTSTYLE "plain" does not, then associating the same FONTPAUSE with both FONTSTYLE instances is going to cause problems: the operator will not be able to tell whether to change the ribbon or not.

The only tool available to distinguish between the two FONTSTYLE instances is the device function %font_number(). Unfortunately, the command line option FONT can remap both the font and the font style assigned to a given DEFAULTFONT and so to a particular %font_number(). What is really needed is a %font_style() function, but none exists.

The net effect is that, if a FONTPAUSE is needed, it may be a very bad idea to use the corresponding font (that is, DEVICEFONT, which maps the font name to the font pause) with more than one FONTSTYLE or more than one DEFAULTFONT (which maps the DEVICEFONT to a FONTSTYLE), depending on just what the FONTPAUSE is intended to accomplish.

Implementation Notes

The FONTPAUSE block occupies a very odd position: there is no need to implement it at all, since it is not used in any DEVICE block known to me; and yet it is so useful in analysing the use of the FONTSWITCH and FONTSTYLE blocks that, inevitably, its implementation in wgml 4.0 is also made quite clear:

The FONTPAUSE block is interpreted whenever a font switch is 
called for. When the font switch occurs, then the FONTPAUSE 
is interpreted after the ENDVALUE block of the font being switched
from (if any) and before the STARTVALUE block of the font being
switched to; even if the font switch does not actually occur, the
FONTPAUSE block is still interpreted. 

The situations in which a font switch does not actually occur when called for are discussed here.
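The ordering stated above can be modeled as a list of (block, font) pairs. This is a sketch only: the function font_switch_sequence and its arguments are hypothetical, and it assumes that the FONTPAUSE interpreted is the one associated with the font being switched to, following the WGML Reference wording "when switching into the font".

```python
def font_switch_sequence(old_font, new_font, switch_occurs=True):
    """List the blocks interpreted when a font switch is called for.

    old_font is None when there is no font being switched from.
    switch_occurs is False for the cases where the switch is called
    for but does not actually happen.
    """
    sequence = []
    if switch_occurs and old_font is not None:
        sequence.append(("ENDVALUE", old_font))
    # The FONTPAUSE is interpreted whether or not the switch occurs.
    sequence.append(("FONTPAUSE", new_font))
    if switch_occurs:
        sequence.append(("STARTVALUE", new_font))
    return sequence
```

A call such as font_switch_sequence("02", "02", switch_occurs=False) returns only the FONTPAUSE entry, which is the case that makes sharing one FONTPAUSE between font styles risky.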

FONTSTYLE

This block is not documented in the WGML Reference. As a result, a detailed examination of how it and each of its sub-blocks is used is unavoidable. This may take some time to assemble and organize properly.

The STARTVALUE Block

This section discusses the FONTSTYLE block STARTVALUE block. The LINEPROC block also has a STARTVALUE block; it is discussed in its own section.

There is reason to believe that the FONTSTYLE block STARTVALUE and ENDVALUE blocks are not intended to be used as an ON/OFF pair: when an extremely simple test file, one containing nothing but text organized into paragraphs with the P tag (that is, no header, no title, no TOC, no index, no footers, no markup), was processed and examined, the FONTSTYLE block ENDVALUE block never appeared. The FONTSTYLE block STARTVALUE block, on the other hand, appeared at the start of each text line.

The FONTSTYLE block STARTVALUE block is always followed immediately by the LINEPROC block STARTVALUE block. The LINEPROC block STARTVALUE block is usually preceded immediately by the FONTSTYLE block STARTVALUE block; the exceptions are discussed here (http://www.openwatcom.org/index.php/GML_Tag_Notes#The_STARTVALUE_Block_2).

The FONTSTYLE block STARTVALUE block appears in these contexts:

  • As part of the action of device function %enterfont().
  • As part of the normal font switch sequence (but not of the alternate font switch, used with device functions %ulineon()/%ulineoff()).
  • As part of the first line pass font style application sequence, when a font switch is not required.
  • As part of the subsequent line pass font style application sequence, when a font switch is not required.
  • As part of the alternate font style application sequence, when a font switch is not required.

The ENDVALUE Block

This section discusses the FONTSTYLE block ENDVALUE block. The LINEPROC block also has an ENDVALUE block; it is discussed in its own section.

The ENDVALUE block occurs in these contexts:

  • During a font switch.
  • During a subsequent line pass, under certain conditions.

It does not appear in this context:

  • The current font style is the last (or only) font style used in the text_line.

It is, of course, this fact that prevents the FONTSTYLE block STARTVALUE and ENDVALUE blocks from being used as an ON/OFF switch.

The context in which it is interpreted differs in the two cases listed above:

  • During a font switch, the ENDVALUE block of the font style associated with the font being switched from is interpreted in the context of the font being switched to, with which it may or may not be associated (nothing prevents two DEFAULTFONT blocks from associating the same font style with two different fonts).
  • During a subsequent line pass, the ENDVALUE block is interpreted outside of a font switch, and then it is interpreted in the context of the font it is associated with.

This suggests that the ENDVALUE block should not depend on any of the device functions which return values associated with the current font.

Usage Notes

It is reasonably clear from the two prior sections that the FONTSTYLE block STARTVALUE and ENDVALUE blocks are not, in fact, used by wgml 4.0 as an ON/OFF switch.

On the other hand, the DRIVER block in HELPDRV.PCD in the Open Watcom repository does this in font style "bold"

  • the STARTVALUE block emits "0x1b" followed by "b", and
  • the ENDVALUE block emits "0x1b" followed by "p"

which certainly looks like an ON/OFF switch switching the style to "bold" and back to "plain".

This can only work if the targeted device does not require that the FONTSTYLE block STARTVALUE and ENDVALUE blocks be used as an ON/OFF switch to function properly. Possible examples of how a device might do this are:

  • the device resets itself to its default state at the end of each line; or
  • the device has no memory: it can process these codes repeatedly and the effect is exactly the same as if it processed them once.

The LINEPROC Block

These blocks define exactly what actions the device is to take to implement the font style. Each LINEPROC block defines the actions to take on one specified line pass.

No LINEPROC Present

The LINEPROC block is entirely optional; if none is present, then wgml 4.0 behaves exactly as if this LINEPROC block was present:

:LINEPROC
   pass = 1
   :STARTVALUE
       %textpass()
   :eSTARTVALUE
:eLINEPROC

Empty LINEPROC Instances

At the very end of this section, it is noted that a LINEPROC of this form:

:LINEPROC
   pass = 1
:eLINEPROC

is accepted and compiled by gendev 4.1 as if it contained an ENDVALUE block with no device functions present.

When a font style using such a LINEPROC block is used by wgml 4.0, however, the result is this message:

Abnormal program termination: Memory protection fault

regardless of which line pass it is assigned to.

Examination of the output file shows that wgml 4.0 does not produce this error until it reaches the line pass affected while printing out the text to which the font style is being applied.

Sub-block Usage

Since each LINEPROC block must contain at least one sub-block, and since, as discussed here, each sub-block must contain at least one device function, the discussion now turns to the various sub-blocks, starting with an overview.

The LINEPROC block contains five sub-blocks. Examination of the test documents shows that, when a text_chars instance is being processed, they generally appear in these positions:

  • The STARTVALUE block and FIRSTWORD block appear either before the first text_chars instance of the line or before the first text_chars instance with a new value for field font_number is processed. If the FIRSTWORD block is not defined, then the STARTWORD block appears in its place. These blocks appear before the STARTWORD block as such.
  • The STARTWORD block appears before each text_chars instance which does not follow a font switch, even if this results in it appearing twice in a row because the FIRSTWORD block is not defined.
  • The ENDWORD block appears after each text_chars instance.
  • The ENDVALUE block appears after the last text_chars instance (and after the ENDWORD block).

To be specific, blocks which appear before the text_chars instance appear before the spaces (or HTAB block or ABSOLUTEADDRESS block) used to position the print head at the point where the first non-space character is to appear, unless, of course, device function %dotab() is involved.

From this, it appears that three ON/OFF switches exist:

  • The first pairs STARTVALUE with ENDVALUE, and applies to each set of consecutive text_chars instances with the same value for field font_number.
  • The second pairs FIRSTWORD with ENDVALUE, and applies to each set of consecutive text_chars instances with the same value for field font_number.
  • The third pairs STARTWORD with ENDWORD, and applies to most text_chars instances (those whose associated font style defines a FIRSTWORD block and which follow a font switch are the exception).

The above reflects two observed rules:

  1. If the FIRSTWORD block is not defined, then the STARTWORD block appears in every context where the FIRSTWORD block appears when it is defined, without known exception.
  2. When a font switch occurs, there is no STARTWORD block (if a FIRSTWORD block exists) or no second STARTWORD block (if no FIRSTWORD block exists).

It does not matter if the FIRSTWORD block consists entirely of "%image('')", which produces no output or side effects of any kind; the only requirement is that it be defined.

The only exception to the second rule involves the drawing of the top line using BOX block characters with tag FIG and the Index (that is, in these cases no font switch occurred, and in no case did the second STARTWORD block appear in drawing such lines). This probably means that the STARTWORD block, as such, is not used when drawing horizontal or vertical lines using the characters defined in the BOX block.
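The observed positions and the two rules can be combined into a small model that lists the sub-blocks for one line pass. This is a sketch of one reading of the observations, not the wgml code: the function subblock_sequence is hypothetical, the BOX-drawing exception is ignored, the ENDVALUE block is assumed to close each run of text_chars instances with the same font_number, and the first text_chars instance of a line is assumed not to follow a font switch unless first_follows_switch is set.

```python
def subblock_sequence(font_numbers, has_firstword, first_follows_switch=False):
    """Model the LINEPROC sub-blocks around a line of text_chars instances.

    font_numbers holds the font_number of each text_chars instance in order.
    has_firstword says whether the font style defines a FIRSTWORD block.
    """
    sequence = []
    previous = None
    for index, font in enumerate(font_numbers):
        new_run = index == 0 or font != previous
        follows_switch = first_follows_switch if index == 0 else font != previous
        if new_run:
            if index > 0:
                sequence.append("ENDVALUE")  # closes the previous run (assumption)
            sequence.append("STARTVALUE")
            # Rule 1: STARTWORD stands in for an undefined FIRSTWORD.
            sequence.append("FIRSTWORD" if has_firstword else "STARTWORD")
        # Rule 2: no (second) STARTWORD after a font switch.
        if not follows_switch:
            sequence.append("STARTWORD")
        sequence.append(f"text[{index}]")
        sequence.append("ENDWORD")
        previous = font
    if font_numbers:
        sequence.append("ENDVALUE")
    return sequence
```

Calling subblock_sequence([0, 1], has_firstword=False) shows both effects at once: STARTWORD appears twice in a row before the first instance, and only once before the second instance, which follows a font switch.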

The FIRSTWORD block and ENDVALUE block are regularly used to implement underlining, that is, where every character in the affected phrase (but not any preceding whitespace) is underlined, including internal spaces.

The STARTWORD block and ENDWORD block are regularly used to implement underscoring, that is, where every non-space character in the affected phrase is underlined, but whitespace is not.

Considering the above information, these suggestions might be made:

  • When a FIRSTWORD block is called for, if no such block is defined, then a STARTWORD block is used instead. This suggests that LINEPROC blocks which actually implement underscoring should not define a FIRSTWORD block.
  • If a FIRSTWORD block is defined, then, in some cases, no STARTWORD block will appear. This suggests that LINEPROC blocks which actually implement underlining should not define a STARTWORD block.
  • If no FIRSTWORD block is defined, then, in some cases, the STARTWORD block will be interpreted twice in succession. This suggests that the STARTWORD block, if defined, should be defined in such a way that it can be interpreted twice in succession without causing problems for the device.

The STARTVALUE Block

As shown here, this block is the only place where device function %textpass() may be used; as noted here, whether or not that function is present determines whether or not the output text actually appears in the output file.

The LINEPROC block STARTVALUE block is usually preceded immediately by a FONTSTYLE block STARTVALUE block; known exceptions are:

  • In some cases, as part of the preparation for the first text line, as discussed here.
  • In some cases, as part of drawing a box using the characters in the BOX block when processing tag FIG, as discussed here.

The LINEPROC block STARTVALUE block is always followed immediately by the LINEPROC block FIRSTWORD block. Furthermore, the LINEPROC block FIRSTWORD block only appears when immediately preceded by the LINEPROC block STARTVALUE block.

Device function %ulineon() can also be placed in this block. The effect, as shown by the tests done so far, is indistinguishable from placing device function %ulineon() in the FIRSTWORD block instead.

Unlike the FIRSTWORD block, a FONTSTYLE block which differs from the overprint "uscore" FONTSTYLE block discussed below only in that the line pass 2 LINEPROC block has a STARTVALUE block works normally, i.e., the first word is underscored.

This block is interpreted at the start of each text_chars instance which has a value for the field font_number which is different from the value in the prior text_chars instance. However, there is at least one context in which it is interpreted at the start of each text_chars instance, as discussed at the end of this section; although the appearance of the STARTVALUE block is not mentioned, it does in fact appear each time just as the ENDVALUE block does. A closer examination of this issue will eventually be done.

The FIRSTWORD Block

The LINEPROC block FIRSTWORD block only appears when immediately preceded by the LINEPROC block STARTVALUE block. Furthermore, the LINEPROC block STARTVALUE block is always followed immediately by the LINEPROC block FIRSTWORD block.

As shown here, this block can contain device function %ulineon(); indeed, the overprint FONTSTYLE "uline" discussed below does exactly that.

However, device function %ulineon() can also be placed in the STARTVALUE block. The effect, as shown by the tests done so far, is indistinguishable from placing device function %ulineon() in the FIRSTWORD block.

A FONTSTYLE block which differs from the overprint "uscore" FONTSTYLE block discussed below only in that the line pass 2 LINEPROC block has a FIRSTWORD block results in the first word not being underscored. As noted at the end of the section on sub-block usage, it is generally best to implement only one of the FIRSTWORD and STARTWORD blocks.

This block is also allowed to contain device function %ulineoff(). Since device function %ulineoff() must be preceded by device function %ulineon() in the same LINEPROC block, the STARTVALUE block must contain %ulineon() or gendev 4.0 will not process the source file.

When a FONTSTYLE block with a line pass 2 LINEPROC block with a STARTVALUE block containing device function %ulineon() and a FIRSTWORD block containing device function %ulineoff() is tested, then the result is:

  • If the %ulineon() function is preceded by %dotab(), then the initial horizontal positioning (left margin) is output. Nothing else appears, although various LINEPROC block sub-blocks are interpreted.
  • If the %ulineon() function is not preceded by %dotab(), then nothing whatsoever appears on the second line pass, although various LINEPROC block sub-blocks are interpreted (that is, a second line pass does occur).

This block is interpreted at the start of each text_chars instance which has a different value for field font_number than the previous text_chars instance had. However, there is at least one context in which it is interpreted at the start of each text_chars instance, as discussed here. A closer examination of this issue will eventually be done.

The STARTWORD Block

As shown here, this block can contain device function %ulineon(); indeed, the overprint FONTSTYLE "uscore" discussed below does exactly that. And, if by "uscore" is meant a font style which underscores words but not spaces, then the STARTWORD block is where device function %ulineon() needs to be.

This block is also able to contain device function %ulineoff(). Three cases exist, and the results are recorded here.

When a FONTSTYLE block with a line pass 2 LINEPROC block with a STARTVALUE block containing device function %ulineon() and a STARTWORD block containing device function %ulineoff() is tested, and no FIRSTWORD block is present, then the result is:

  • If the %ulineon() function is preceded by %dotab(), then the initial horizontal positioning (left margin) is output. Nothing else appears, although various LINEPROC block sub-blocks are interpreted.
  • If the %ulineon() function is not preceded by %dotab(), then nothing whatsoever appears on the second line pass, although various LINEPROC block sub-blocks are interpreted (that is, a second line pass does occur).

When a FONTSTYLE block with a line pass 2 LINEPROC block with a STARTVALUE block containing device function %ulineon() and a STARTWORD block containing device function %ulineoff() is tested, and a FIRSTWORD block is present, then the result is:

  • If the %ulineon() function is preceded by %dotab(), then the first word (only) is underscored.
  • If the %ulineon() function is not preceded by %dotab(), then the initial horizontal positioning (left margin) and the first word (only) are underscored.

These results also occur when the %ulineon() (with or without preceding %dotab()) is in the FIRSTWORD block rather than the STARTVALUE block.

This block is interpreted at the start of each text_chars instance, except as documented here.

The ENDWORD Block

As shown here, this block can contain device function %ulineoff(). Indeed, the overprint "uscore" discussed below requires this block to contain device function %ulineoff() in order to work properly.

This block is interpreted at the end of each text_chars instance, that is, after the text has been output. When used with %ulineoff(), the observed behavior is much less clear, although the effect (stopping the output of underscore characters with the last character output previously) is quite clear. Additional research will need to be done.
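
One reading of the observations recorded above for the STARTVALUE, FIRSTWORD, STARTWORD, and ENDWORD blocks can be sketched in Python. This is a simplified model, not the wgml code: the function name and the has_firstword parameter are hypothetical, the ENDVALUE block is left out, and the documented exceptions (including the case where the STARTWORD block is interpreted twice in succession) are ignored. The idea that the STARTWORD block stands in for a missing FIRSTWORD block is an assumption which happens to fit the %ulineon()/%ulineoff() test results.

```python
def subblock_sequence(font_numbers, has_firstword=True):
    """Return the order in which LINEPROC sub-blocks would be interpreted
    for a run of text_chars instances, given each instance's font_number.
    """
    events = []
    previous_font = None
    for font in font_numbers:
        if font != previous_font:
            # A change in font_number starts a new value.
            events.append("STARTVALUE")
            # Assumption: STARTWORD stands in when no FIRSTWORD is defined.
            events.append("FIRSTWORD" if has_firstword else "STARTWORD")
        else:
            # Same font as the prior instance: only STARTWORD applies.
            events.append("STARTWORD")
        events.append("text")
        # ENDWORD follows the text of every text_chars instance.
        events.append("ENDWORD")
        previous_font = font
    return events
```

For example, subblock_sequence([0, 0, 1]) models two text_chars instances in font 0 followed by one in font 1; the STARTVALUE block appears only before the first and third instances.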

The ENDVALUE Block

As shown here, this block can contain device function %ulineoff(). Indeed, the overprint "uline" discussed below requires this block to contain device function %ulineoff() in order to work properly.

This block is only interpreted under certain conditions. It has been observed in these contexts:

  1. When a NEWPAGE block is interpreted.
  2. When a NEWLINE block is interpreted.
  3. As part of establishing the left margin before text output begins.
  4. As part of processing the first text line.
  5. As part of the sequence for processing text lines.
  6. Presumably as part of the sequence(s) for boxing, although this needs more work.
  7. As part of the "new font text_chars instance" sequence used in the first line pass sequence.
  8. As part of the "new font text_chars instance" sequence used in the subsequent line pass sequence as discussed here.
  9. As part of the sequence used with device function %ulineon() and %ulineoff(), as discussed here.
  10. When a FINISH block is processed.
  11. If no FINISH block is defined, at the very end of text output.

In general, it occurs when something new starts, text has been output, and the block has not already been interpreted. This is implemented by encapsulating it in a function, which interprets this block if either the textpass flag or the uline flag has the value "true". This function also sets the value of the textpass flag to "false". So far, this appears to work quite well.
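
The encapsulating function just described might look something like the following Python sketch. The class and method names are hypothetical, and the list of interpreted blocks is only a stand-in for actually interpreting the ENDVALUE block. The uline flag is left untouched because the description above only mentions the textpass flag being reset.

```python
class LineProcState:
    """Hypothetical holder for the two flags mentioned in the text."""

    def __init__(self):
        self.textpass = False   # set when text has been output
        self.uline = False      # set while underlining is active
        self.interpreted = []   # stand-in for the interpreted blocks

    def end_value(self):
        """Interpret the ENDVALUE block if either flag is "true"."""
        if self.textpass or self.uline:
            self.interpreted.append("ENDVALUE")
        # Resetting textpass prevents a second interpretation until
        # more text has been output.
        self.textpass = False
```

Calling end_value() twice after a single text output interprets the block only once, which matches the "has not already been done" condition.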

This block is also interpreted by extremely specialized functions or parts of functions which use the at_start and set_margin flags. These functions or parts of functions are only active at the very start of the final document pass because these flags are only "true" for a very brief period at the start of the final document pass.

All four of these flags are also discussed here.

Implementing Font Styles

The implementation of a particular font style depends on the characteristics of the device.

Some devices define separate fonts for each style, which are then paired in the DEFAULTFONT instances with font style "plain". Of course, this can lead to a very large number of FONT blocks. The PS DEVICE block does this for the "times" font.

Some devices perform some actions themselves. Thus, the WHELP DEVICE block pairs the same DEVICEFONT block with various FONTSTYLE blocks -- and then implements those styles with a single LINEPROC which uses the STARTVALUE block and ENDVALUE block as an ON/OFF switch to cause (presumably) the program WHLPCVT to implement the desired style.

Device PSDRV also provides definitions of the usual font styles which vary between having one LINEPROC and using the FIRSTWORD block and ENDVALUE block for underlining and the STARTWORD block and ENDWORD block for underscoring, but in both cases emitting PostScript commands rather than using device functions %ulineon() and %ulineoff(). On the other hand, for font styles involving "bold", the two-line-pass approach discussed below is used.

It is clearly not possible to discuss all possible implementations of any particular style. The following sections will focus on implementations of font styles which rely entirely on the device functions and the behavior of wgml 4.0. For one thing, these are the definitions which are referred to by the page on sequencing, particularly (but not necessarily exclusively) the section on applying font styles.

A Font Style That Prints Nothing Out

This may seem like an odd choice, since no examples exist and it would seem to have no value beyond testing, but it does represent a minimal case.

This font style:

:FONTSTYLE
   type='redact'
   :LINEPROC
      pass=1
      :STARTVALUE
         %image('')
      :eSTARTVALUE
   :eLINEPROC
:eFONTSTYLE

is accepted by both gendev 4.1 and wgml 4.0, and, since the argument to device function %image() is an empty string, produces precisely nothing when applied.

As to utility, in theory, this could be used whenever it is necessary to maintain two versions of a document: one with all information included for internal use, and one with certain information redacted for external use. If the external version uses the above FONTSTYLE for font style "redact", then blank places will occur where the material to be removed would otherwise have been. If the internal version uses a different version of FONTSTYLE "redact", one which does an explicit or implicit %textpass(), then the internal version will show all the information. In practice, this would require a fair amount of thought and planning; however, text that is never printed is more secure than text which is printed and then blacked out.

Overprint "bold"

This LINEPROC prints the same text twice, starting at the same position each time:

:FONTSTYLE
   type=bold
   :LINEPROC
      pass=1
      :STARTVALUE
         %textpass()
      :eSTARTVALUE
   :eLINEPROC
   :LINEPROC
      pass=2
      :STARTVALUE
         %textpass()
      :eSTARTVALUE
   :eLINEPROC
:eFONTSTYLE

The version used in testing had additional %image() statements to help in detecting the sequence of events.

Overprint "uline"

This LINEPROC prints the text line once, and then prints underscore characters, starting at the same position each time:

:FONTSTYLE
   type=uline
   :LINEPROC
      pass=1 
      :STARTVALUE
         %textpass()
      :eSTARTVALUE
   :eLINEPROC
   :LINEPROC
      pass=2
      :FIRSTWORD
         %dotab()
         %ulineon()
      :eFIRSTWORD
      :ENDVALUE
         %dotab()
         %ulineoff()
      :eENDVALUE
   :eLINEPROC
:eFONTSTYLE

Preliminary testing showed that this does, indeed, place the underscore character under every character included in the set of contiguous text_chars instances using this font style, including spaces between text_chars instances and any final text_chars instances which have no text but only generate spaces. Each text_chars instance is underlined separately. The initial horizontal positioning (left margin plus indentation), however, was not underlined.

Overprint "uscore"

This LINEPROC prints the text line once, and then prints underscore characters under each word, starting at the same position each time:

:FONTSTYLE
   type=uscore
   :LINEPROC
      pass=1 
      :STARTVALUE
         %textpass()
      :eSTARTVALUE
   :eLINEPROC
   :LINEPROC
      pass=2
      :STARTWORD
         %dotab()
         %ulineon()
      :eSTARTWORD
      :ENDWORD
         %dotab()
         %ulineoff()
      :eENDWORD
   :eLINEPROC
:eFONTSTYLE

Preliminary testing showed that this does, indeed, place the underscore character under each text_chars instance's text. For each text_chars instance, the horizontal positioning is done (using spaces only, never HTAB, apparently) and then enough underscore characters are emitted to place one under each non-blank character in the text_chars instance's text.
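
For a character device with a fixed-width font, the difference between the second line pass of "uline" and that of "uscore" can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch ignores the left margin and the division of the line into text_chars instances; it only shows which character positions receive an underscore.

```python
def uline_pass(text):
    """Second line pass for "uline": an underscore under every
    character position, including the spaces between words."""
    return "_" * len(text)


def uscore_pass(text):
    """Second line pass for "uscore": an underscore under each
    non-blank character, with spaces left as spaces."""
    return "".join(" " if ch == " " else "_" for ch in text)
```

Applied to the phrase "the first sentence", uline_pass() yields an unbroken run of underscores while uscore_pass() leaves a gap under each space.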

Overprint "ulbold"

This LINEPROC prints the text line once, and then prints underscore characters, and then prints the text line again, starting at the same position each time:

:FONTSTYLE
   type=ulbold
   :LINEPROC
      pass=1 
      :STARTVALUE
         %textpass()
      :eSTARTVALUE
   :eLINEPROC
   :LINEPROC
      pass=2
      :FIRSTWORD
         %dotab()
         %ulineon()
      :eFIRSTWORD
      :ENDVALUE
         %dotab()
         %ulineoff()
      :eENDVALUE
   :eLINEPROC
   :LINEPROC
      pass=3
      :STARTVALUE
         %textpass()
      :eSTARTVALUE
   :eLINEPROC
:eFONTSTYLE

At least, that is what it should do. Testing proceeds.

In some cases, the "bold" part is done by issuing control codes to the device while the underlining is done as shown in line pass 2 above.

Overprint "usbold"

This LINEPROC prints the text line once, and then prints underscore characters under each word, and then prints the text line again, starting at the same position each time:

:FONTSTYLE
   type=usbold
   :LINEPROC
      pass=1 
      :STARTVALUE
         %textpass()
      :eSTARTVALUE
   :eLINEPROC
   :LINEPROC
      pass=2
      :STARTWORD
         %dotab()
         %ulineon()
      :eSTARTWORD
      :ENDWORD
         %dotab()
         %ulineoff()
      :eENDWORD
   :eLINEPROC
   :LINEPROC
      pass=3
      :STARTVALUE
         %textpass()
      :eSTARTVALUE
   :eLINEPROC
:eFONTSTYLE

At least, that is what it should do. Testing proceeds.

In some cases, the "bold" part is done by issuing control codes to the device while the underscoring is done as shown in line pass 2 above.

Overprint Font Style Notes

These notes are based on examining how overprint font styles work with the PostScript device.

None of this has been implemented. The primary reason for this is that the Open Watcom document build system does not, so far as I can tell, actually use any overprint font styles. All font effects seen are produced by using specific fonts with font style plain, except for shading, which uses font style shade. Both plain and shade have only one pass.

The overprint font styles "bold", "ulbold", and "usbold" in the PostScript DRIVER block use two line passes with device function %textpass() to produce the "bold" effect. The font styles which do underlining or underscoring, however, use PostScript macros uline and euline instead of device functions %ulineon() and %ulineoff(). How wgml 4.0 would, in theory, use %ulineon() and %ulineoff() with the PostScript device had to be investigated with a test device whose DRIVER block's defined name starts with "ps", which causes wgml 4.0 to treat it as a PostScript device.

The simplest part of this is discovering why the PostScript DRIVER block does not use device functions to produce underscores. Consider this phrase:

the first sentence

for underlining, this should be produced:

the first sentence
__________________

for underscoring, this should be produced:

the first sentence
___ _____ ________

however, what wgml 4.0 produces for underlining would look more like this (the actual start position of the underscore blocks is very hard to relate to the positions of the words, but the underscore blocks almost certainly do not form a continuous line):

the first sentence
__  ____  _______

and, what wgml 4.0 produces for underscoring would look like this:

the first sentence
__  ___   ______

while what our wgml produces for underlining would look more like this (the actual start position of the underscore blocks is very hard to relate to the positions of the words, but the underscore blocks almost certainly do not form a continuous line):

the first sentence
__  ____  ________

and, what our wgml produces for underscoring would look like this:

the first sentence
__  ___   ______

The results for underscoring are identical. The results for underlining are a bit different. But neither is correct.

For our wgml, at least, this is the result of dividing the space to be underscored by the width of an underscore character using integer math with no rounding. Since wgml 4.0 gets much the same result, it is, most likely, doing the same thing. What is seen above is the result of truncation errors, which are unavoidable when using a single character to fill the space occupied by text in a variable-width font. Hence the use of PostScript macros uline and euline, which turned out to work quite well when tested.
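
The truncation can be illustrated with a few lines of Python. The function names and the widths used below are made up for the example and are not taken from any actual font; the only point is that integer division discards the remainder, so the underscores can fall short of the text they are meant to cover.

```python
def underscore_count(span_width, underscore_width):
    """Number of whole underscore characters that fit in the span,
    using integer division with no rounding."""
    return span_width // underscore_width


def uncovered_width(span_width, underscore_width):
    """Width left without an underscore because of the truncation."""
    return span_width % underscore_width
```

With a hypothetical span of 1234 horizontal base units and an underscore 500 units wide, only two underscores are emitted and 234 units at the end of the span are left bare.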

Creating bold text by overprinting would not, one would think, work very well with a page-drawing device such as PostScript: it works with a typewriter or impact printer because more ink is deposited at the same position, but, with PostScript, one would think that each position either has a dot or does not have a dot and so would look the same no matter how many times the same text is output starting at the same position.

It should be no surprise, then, to learn that wgml 4.0 does not output the second pass text to the same position as is used on the first pass. Instead, for PostScript devices, wgml 4.0 adds 6 horizontal base units (that is, 0.006 inch) to the address used in the first pass. This can be seen when using the PS device with justification on (so that each word is positioned individually); it is not simply an effect of a test device.
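
As a small worked example of the arithmetic, assuming 1000 horizontal base units per inch (which is what "6 units = 0.006 inch" implies) and assuming the offset is the constant 6 observed so far, the second-pass address could be computed as in this Python sketch. The names are hypothetical; whether the value is really constant is one of the open questions.

```python
HORIZONTAL_BASE_UNITS_PER_INCH = 1000  # assumed from "6 units = 0.006 inch"
BOLD_OFFSET_UNITS = 6                  # the offset observed with PostScript


def second_pass_address(first_pass_address):
    """Horizontal address used for the second %textpass() line pass."""
    return first_pass_address + BOLD_OFFSET_UNITS


def units_to_inches(units):
    """Convert horizontal base units to inches."""
    return units / HORIZONTAL_BASE_UNITS_PER_INCH
```

A word placed at address 1000 (one inch) on the first pass would thus be placed at address 1006 on the second pass.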

This initially seemed to produce a very fertile topic for investigation. These items would need to be investigated to fully understand what wgml 4.0 is doing:

  1. Does this apply to all devices, at least potentially? Or is it specific to either devices which do page addressing (that is, define an ABSOLUTEADDRESS block) or PostScript devices only?
  2. Does this apply to all font styles with "bold" in their name? Or does it apply only to certain font styles, specifically bold, ulbold, and usbold?
  3. Does this apply to all subsequent passes? Or only to those on which text is emitted? Or only on specific passes: pass 2 for "bold" and pass 3 for "ulbold" and "usbold"?
  4. Is the value added always "6"? Or does it vary with the number of horizontal base units per inch? If it does vary with the number of horizontal base units per inch, how is it computed?

Questions 2 and 3 have the potential to provide reasons for three of the standard font style names and for why it is the third pass of font styles ulbold and usbold which contains the second use of device function %textpass().

When the output of wgml 4.0 and our wgml produced using an option file containing

( font 0 times-roman			    2 10
( font 1 times-roman bold		    2 10
( font 2 times-roman uline		    2 10
( font 3 times-roman uscore		    2 10

(that is, configured to actually use the overprint font styles) was examined, it became clear that this was far too complicated to work on unless and until it becomes necessary. The results leading to this conclusion were:

  1. Our wgml, despite printing the text in the same position both times, produced output that looked very similar to wgml 4.0. The output from wgml 4.0 was a bit darker, so the slight offset does have an effect.
  2. With some files, overprint bold looks quite nice, especially when produced by wgml 4.0 (that is, with the offset).
  3. With other files, overprint bold looks horrible: it looks as if each letter were separately printed as opposed to a single letter printed a bit wider than usual. The gap is so large that the result can barely be read. This happened with our wgml as well as with wgml 4.0.

The files used were a bit different; the one producing a good result was very simple, the other file was less simple. And the first and third items suggest that PostScript itself may be doing something in this situation, something that doesn't always look very good. There seems to be little point in implementing a feature that is never used by the Open Watcom document build system and which may depend more on how PostScript behaves than on how wgml behaves.

INDEX Structure

This information was discovered while exploring rounding.

Although the INDEX LAYOUT tag does have the attributes used with a heading, and they do work as might be expected, the default layout uses a top banner to put the "Index" line at the top of the index. This is probably because the heading would only occur on the first page, but the banner appears at the top of every page in the index.

Both attributes left_adjust and right_adjust are doubled before being applied, but the other attributes tested all worked as expected.

The main insight provided by the testing was into what constitutes an "element". This turned out to be, at a minimum:

  • The single-letter headings which introduce each part of the index.
  • Entries generated by different In tags.

This applies to the attributes post_skip and pre_skip of the In tags. Attribute skip controls the spacing within a set of entries generated by the same In tag.

Thus, an index might have:

      A
<I1 pre_skip>
<several I1 tag-produced lines>
<I1 post_skip merged with I2 pre_skip>
<several I2 tag-produced lines>
<I2 post_skip>
      B

This should be helpful when the time comes to generate the index.

LAYOUT

This is a collection of notes on the use of layout files.

  1. There should only be one layout file, at most, provided on the command-line. Providing a second causes them to be listed as "current", in inverse order, but also causes problems with document formatting later on, so this should probably be an error.
  2. Files intended for inclusion in LAYOUT sections should not start with the LAYOUT tag; it will cause an error if present: it appears that LAYOUT sections do not nest. The error message is not very specific.
  3. If a file intended for inclusion in a LAYOUT section ends with the eLAYOUT tag, the effect appears to be indistinguishable from what would happen if it were not present, so this should probably be an error.
  4. Both the file given with the command-line option LAYOUT and the document specification file may contain the LAYOUT and eLAYOUT tags. These layout sections are not nested, but are processed sequentially.

The layout file specified on the command line, if any, is processed before the layout section, if any, in the document specification.

A clarification on the first point above: what appears to have happened in the specific case tested is this:

  1. The document specification was made "current".
  2. The second LAYOUT file was made "current".
  3. The first LAYOUT file was made "current".
  4. The first LAYOUT file was processed.
  5. The LAYOUT section in the document specification was processed.
  6. wgml 4.0 produced the specified header (top banner).
  7. wgml 4.0 then reacted to the second LAYOUT file, which contained the H0 layout tag and attributes, by complaining that an attribute was found instead of text.

This suggests that, once wgml 4.0 concluded the LAYOUT section was done, it proceeded to process the remaining items as part of the document, starting with the second LAYOUT file, presumably because it was still part of a linked list of such files. It accepted H0 as a valid tag -- but the document tag H0 does not take the same set of attributes as the layout tag H0. That wgml 4.0, in this context, recognized a layout attribute as an (illegal) attribute shows that the tag processing code is always aware of all of the attribute names.

TOC

This information was discovered while exploring rounding.

Since the TOC LAYOUT tag does not have the attributes used with a heading, the default layout uses a top banner to put the "Table of Contents" line at the top of the TOC output.

The attribute left_adjust is doubled before being applied, but the other attributes tested all worked as expected in the sense of not being doubled.

In this case, an "element" is a set of lines generated by the same Hn tag. The main insight provided by the testing was into how the attributes post_skip, pre_skip, and skip actually work. While completeness cannot be guaranteed, these rules appear to be correct:

  • Attribute post_skip is only applied when the level of the TOCHn tag changes.
  • Attribute pre_skip is only used under these conditions:
    • a non-zero value must be given; and
    • the value for skip is "0" or
    • the transition must be from a higher-priority heading to a lower-priority heading

Thus, if TOCH1 has non-zero values for both pre_skip and skip, then skip is always used between contiguous H1-produced lines but pre_skip is used when an H1-produced line follows an H0-produced line while skip is used when an H1-produced line follows an H2-produced line.

  • No "merge" occurs.
  • If either attribute pre_skip or attribute skip is "0" the other is used normally, except that skip appears where pre_skip would normally be expected to be in effect, except for TOCH0.
  • The TOCH0 pre_skip is ignored even when skip is "0".
  • The lowest-priority headings cannot follow a heading of lower priority, so the effect will not be seen with the lowest-priority allowed, and never with TOCH6.
  • The attribute post_skip of the lowest level displayed is treated as if its value were "0".
  • Since TOCH6, when displayed in the TOC, is always the lowest level displayed, its attribute post_skip is always treated as if the value were "0".
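
The rules given above for choosing between pre_skip and skip can be restated as a Python sketch. This is only an encoding of the rules as listed, which may not be complete; the function name and parameters are hypothetical, levels are the n of the Hn tags (so a smaller number is a higher-priority heading), and post_skip is not modelled at all.

```python
def leading_skip(prev_level, cur_level, pre_skip, skip):
    """Pick the spacing applied before a TOC line produced by an Hn tag.

    prev_level and cur_level are heading levels (0 for H0 up to 6 for H6);
    pre_skip and skip are the TOCHn attribute values for cur_level.
    """
    if cur_level == 0:
        # The TOCH0 pre_skip is ignored even when skip is 0.
        return skip
    # pre_skip needs a non-zero value and either a skip of 0 or a
    # transition from a higher-priority heading to a lower-priority one.
    if pre_skip != 0 and (skip == 0 or prev_level < cur_level):
        return pre_skip
    # Otherwise skip appears, even where pre_skip might be expected.
    return skip
```

With TOCH1 values pre_skip=2 and skip=1, an H1-produced line gets 1 after another H1-produced line, 2 after an H0-produced line, and 1 after an H2-produced line, as in the example above.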

This is quite a mess; finding attribute skip being used as if it were attribute pre_skip was a bit of a shock. The key organizing principle appears to be the heading level, that is, which of the Hn tags was used to generate the heading (and so the TOC entry), with H0 being the highest level and H6 the lowest. The default LAYOUT uses indents (literally -- the TOCHn attribute is indent) to show these levels, and the behavior probably makes more sense when that is done than it did in the testing, which used a flat structure to keep it as simple as possible.
