Wgml Fonts

From Open Watcom

Revision as of 19:54, 28 December 2010

Introduction

The WGML Reference uses the term "font" in several different contexts:

  • The :FONT block, which is said to "define" a font.
  • The :DEVICEFONT block.
  • The :DEFAULTFONT block.
  • The :FONTPAUSE block.
  • The :FONTSWITCH block.
  • The :FONTSTYLE block (not actually in the WGML Reference, but clearly related).
  • The FONT command-line option.
  • A set of available or selected fonts.

The question, then, is: just what is a "font" in wgml 4.0?

The :DEFAULTFONT Block

The WGML Reference Tutorial, in section 3.2.6 Selecting Device Information, has this informative statement:

The set of fonts define the character sets and their attributes 
used to produce the text in the document. The fonts numbered zero 
through three are used by the GML tags :HP0 through :HP3. Font 
numbers greater than three may be referenced in the layout or
by the :SF GML tag.

Indeed, these tag attributes:

  • font
  • number_font
  • string_font

always have the value shown as a number and are normally described as:

This attribute accepts a non-negative integer number. If a font
number is used for which no font has been defined, WATCOM
Script/GML will use font zero.

Every device is required to provide a :DEFAULTFONT block with the value "0" for its font attribute.

The WGML Reference Section 14.3.9 FONT states:

The specified font-number is assigned a particular font. The font 
numbers zero through three correspond to the highlight-phrase 
tags :hp0 through :hp3. Font numbers greater than three (up to a 
maximum of 255) may be used in the layout section or with the :sf
tag.

Each device has a list of available fonts defined with it. The font-
name value is selected from these defined fonts, and must be 
specified.

The font-attribute value specifies an attribute for the defined 
font. If the font attribute is not specified, the attribute PLAIN 
is set. 

This refers to an "attribute" because the WGML Reference predates the :FONTSTYLE block. For wgml 4.0, the font-attribute value must name a :FONTSTYLE block.

In the context of the device library, the WGML Reference Section 15.10.7 BOX Block describes the attribute font in part this way:

The font attribute may be either a non-negative integer number or a 
character value. If a number is specified, the font of the box will 
be the default font with the corresponding font number. A character 
attribute value must be a font name defined in a device font block. 

Examination of the device library information accessible here shows this structure in a device definition for fonts:

:DEFAULTFONT block
   :DEVICEFONT block
      :FONT block
      :FONTPAUSE block
      :FONTSWITCH block
   :FONTSTYLE block

The :DEFAULTFONT block contains a numeric attribute font; in binary form, these blocks appear as an array with the value of attribute font acting as the array index. It is, therefore, tempting to equate "font" with ":DEFAULTFONT instance"; indeed, in most of the sections which deal with this issue, that is taken for granted.

But it does not quite work, and the proof of that is the result of exploring the concept of "selected fonts", as they relate to the multiple invocations of the :INIT block :FONTVALUE block, which are documented here.

Multiple :FONTVALUE Block Instances

The WGML Reference, in Section 15.9.2.3 FONTVALUE Section states, in part:

WATCOM Script/GML selects the fonts being used in the document. For 
each of the selected fonts, the fontvalue section is evaluated. 
Device functions, such %default_width, will return the values 
appropriate for the selected font.

By using distinctive values for their font_out_name1 attributes and placing %image(%font_outname1()) in the :FONTVALUE blocks, it is possible to be a bit more precise about which fonts are "selected":

  • each :DEFAULTFONT;
  • the font named for use in the :UNDERSCORE block, if any; and
  • the font named for use in the :BOX block, if any.

As discussed here, not only is the :UNDERSCORE block itself entirely optional, but the value of its attribute font may be a font number or an empty string. The :FONTVALUE blocks, however, are only interpreted using the font for the :UNDERSCORE block if a font name is specified.

As discussed here, the :BOX block, which is mandatory, can take either a font name or a font number as the value of its attribute font. The :FONTVALUE blocks, however, are only interpreted using the font for the :BOX block if a font name is specified.

Using the same font name multiple times showed, first, that the :FONTVALUE blocks are interpreted for each :DEFAULTFONT block, even if the same font name is used in more than one :DEFAULTFONT block. This results in multiple instances with the same font's values being found in the output file.

Using the same font name in an :UNDERSCORE block or the :BOX block as in a :DEFAULTFONT block will cause the :FONTVALUE blocks to be interpreted multiple times for that font, with multiple instances appearing in the file. So long as the font names used in the :UNDERSCORE block or the :BOX block are different from each other, this will be done with each of them separately.

Using the same font name in both the :UNDERSCORE block and the :BOX block causes the :FONTVALUE blocks to be interpreted only once for that font name for those two blocks, rather than twice. However, if the same font name is also used in one or more :DEFAULTFONT blocks, then multiple instances will still be found in the output file -- all but one from the :DEFAULTFONT blocks.

Modifying the :FONTVALUE blocks to display the value returned by device function %font_number() showed that, if the :BOX and :UNDERSCORE blocks used different font names and if the :DEFAULTFONT blocks were numbered from "0" through "5", then the :UNDERSCORE block's font was associated with a value of "6" and the :BOX block's font was associated with a value of "7". When both the :BOX and :UNDERSCORE blocks used the same name, it was associated with a value of "6". So, the "selected fonts" consist of the :DEFAULTFONT blocks defined in the :DEVICE block plus a generated :DEFAULTFONT block for each distinct font name used with the :BOX or :UNDERSCORE block (if any). In the course of investigating other topics, it became apparent that the font style used for these generated :DEFAULTFONT blocks is "plain".
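The numbering observed above can be summarized in a short sketch. This is an illustration only, not the wgml 4.0 code: the struct generated_fonts and the function assign_generated_fonts are hypothetical names, and the case where only the :BOX block names a font is an assumption, since the tests described above did not cover it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: the font numbers given to the :UNDERSCORE and
 * :BOX fonts, or -1 when no font is generated for that block. */
typedef struct {
    int underscore_font;
    int box_font;
    int total_fonts;    /* size of the resulting set of "selected fonts" */
} generated_fonts;

/* default_count: number of :DEFAULTFONT blocks, numbered 0..default_count-1.
 * underscore_name, box_name: the font name given in the block, or NULL (or
 * "") when the block gives a font number, an empty string, or is absent. */
generated_fonts assign_generated_fonts(int default_count,
                                       const char *underscore_name,
                                       const char *box_name)
{
    generated_fonts r = { -1, -1, default_count };
    int has_us  = underscore_name != NULL && underscore_name[0] != '\0';
    int has_box = box_name != NULL && box_name[0] != '\0';

    assert(default_count > 0);      /* font 0 is mandatory */

    if (has_us) {
        r.underscore_font = r.total_fonts++;    /* :UNDERSCORE comes first */
    }
    if (has_box) {
        if (has_us && strcmp(underscore_name, box_name) == 0) {
            r.box_font = r.underscore_font;     /* same name: shared entry */
        } else {
            /* ASSUMPTION: a lone :BOX name takes the next free number. */
            r.box_font = r.total_fonts++;
        }
    }
    return r;
}
```

With six :DEFAULTFONT blocks and two different names, this gives "6" for the :UNDERSCORE font and "7" for the :BOX font; with the same name in both, both are "6".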

The reference to :FONTVALUE blocks is a reminder that, as discussed here, these blocks may occur within an :INIT block in any order and in any number, and will be interpreted in the order in which they appear in the :INIT block.

The FONT Command-Line Option

Another factor here is the use of command-line option FONT. Comparison with the :DEFAULTFONT block shows that it is very much the same thing. When the font number is the same as an existing :DEFAULTFONT block, then the information from FONT replaces that in the :DEFAULTFONT block. When the font number is higher, then the set of :DEFAULTFONT blocks is extended to include the highest number used, and then any :DEFAULTFONT blocks needed for the :BOX block or the :UNDERSCORE block are created.

When it was implemented, these details became apparent:

  • Only the first two values (font_number and font_name) are required.
  • If there are no additional values, then font_style is set to NULL and font_space and font_height to "0".
  • If there is one additional value, there are two alternatives:
    1. It is a font_style.
    2. It is a font_space.

It cannot be a font_height because a font_space of one sort or another must be present for a font_height to be recognized.

  • If there are two additional values, there are two alternatives:
    1. They are a font_style and a font_space.
    2. They are a font_space and a font_height.

They cannot be a font_style and a font_height because a font_space of one sort or another must be present for a font_height to be recognized.

  • If there are three additional values, then, of course, they are the font_style, font_space, and font_height, in that order.

The font_space may be entered as "0" (or, as seen in the Open Watcom document build option files, ".0") or as an empty string, "''" (which is documented in the WGML Reference). Either form produces a font_space of "0" and allows the next value to be interpreted as the font_height.
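The alternatives listed above can be sketched as a small classifier. The text does not say how a font_style is told apart from a font_space, so the test used here (a font_space is "''" or starts with a digit or a period) is purely an assumption; font_extras, looks_like_space, and classify_font_extras are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the optional FONT values. NULL means "not given";
 * a caller would then use NULL for the style and 0 for space and height. */
typedef struct {
    const char *font_style;
    const char *font_space;
    const char *font_height;
} font_extras;

/* ASSUMPTION: a value is a font_space if it is the quoted empty string ''
 * or begins with a digit or '.'; anything else is taken as a font_style. */
static int looks_like_space(const char *v)
{
    return strcmp(v, "''") == 0
        || isdigit((unsigned char)v[0])
        || v[0] == '.';
}

/* Classify the 0..3 values that follow font_number and font_name.
 * Returns 0 on success, -1 for a combination not described above. */
int classify_font_extras(const char *const *vals, int count, font_extras *out)
{
    int i = 0;

    assert(out != NULL);
    assert(vals != NULL || count == 0);
    out->font_style = out->font_space = out->font_height = NULL;
    if (count < 0 || count > 3) {
        return -1;
    }
    if (i < count && !looks_like_space(vals[i])) {
        out->font_style = vals[i++];        /* optional leading style */
    }
    if (i < count) {
        if (!looks_like_space(vals[i])) {
            return -1;                      /* a height needs a space first */
        }
        out->font_space = vals[i++];
    }
    if (i < count) {
        out->font_height = vals[i++];       /* only reachable after a space */
    }
    return i == count ? 0 : -1;
}
```

Because a font_height is only accepted after a font_space, a style followed directly by a height is rejected, matching the rule stated above.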

Non-contiguous Font Numbers

Nothing forces the font numbers to be contiguous; gaps can be introduced in the source file, as discussed here, and also by the font-number values used with command-line option FONT. This is not, however, how they are intended to be used.

The WGML Reference states in many locations that

If a font number is used for which no font has been defined, 
WATCOM Script/GML will use font zero.

This is only true if no gaps exist in the font numbers. Testing showed that the :FONTVALUE block of the :INIT block is interpreted for each of the skipped font numbers. Further testing using these fonts showed that they have these characteristics:

  1. The font name, font height, and font space are the same as those of the next-to-the-last actual font preceding the first skipped font.
  2. The font style is "plain".

In the binary format itself, of course, these fonts have a distinctive (and empty) entry. The values of the fields are supplied by wgml 4.0, not by gendev 4.1.

The wgml 4.0 Font

From the above, it should be clear that a "wgml font" cannot simply be a :DEFAULTFONT instance, because wgml 4.0 creates new "wgml fonts" under some conditions.

So, then, what is a wgml 4.0 font? These characteristics appear to apply:

  • It is part of an array, with documented limits from "0" to "255" (that is, the range of a uint8_t).
  • It contains at least the information in a :DEFAULTFONT instance, which means that it can be used to access a lot of other items, as shown above.
  • It is not the same as a :DEFAULTFONT instance -- or, rather, the array is not the same as the :DEFAULTFONT instance array.
  • The concept of an "empty" array entry does not appear to exist.
  • It must provide access to the :INTRANS, :OUTTRANS, and :WIDTH blocks.

Taken as a whole, the array of wgml 4.0 fonts is what is referred to as the "available fonts".

This is the wgml_font typedef struct:

typedef struct {
    cop_font            *   bin_font;
    fontswitch_block    *   font_switch;
    code_text           *   font_pause;
    fontstyle_block     *   font_style;
    outtrans_block      *   outtrans;
    uint32_t                default_width;
    uint32_t                em_base;
    uint32_t                font_height;
    uint32_t                font_space;
    uint32_t                line_height;   
    uint32_t                line_space;
    uint32_t                spc_width;
    uint32_t                width_table[0x100];
    char                    font_resident;
    uint8_t                 shift_count;
    char                    shift_height[4];
} wgml_font;

These fields can be filled using information from any of three sources:

  1. The fields in a DefaultFont struct.
  2. The values provided to a FONT command-line option.
  3. The FontAttribute of a BoxBlock or an UnderscoreBlock (either or both), if that FontAttribute provides a field font_name which (in the UnderscoreBlock) has a non-empty value.

The field bin_font points to the binary :FONT block designated by the field font_name or the FONT command-line option value "font-name".

The fields font_switch and font_pause are based on the DeviceFont instance whose font_name has the same value as the defined name which designated the value of the field bin_font.

The field font_switch points to the binary :FONTSWITCH block designated by the field DeviceFont.font_switch.

The field font_pause points to the binary :FONTPAUSE block designated by the field DeviceFont.font_pause.

The field font_style points to the binary :FONTSTYLE block designated by the field DefaultFont.font_style, or the FONT command-line option value "font-attribute", or the value "plain" if this wgml_font is being created from a BoxBlock or an UnderscoreBlock.

The field outtrans points to the correct table specified by an :OUTTRANS block. If the FontBlock encoding the :FONT block associated with this wgml_font contained a non-empty :OUTTRANS block, then that :OUTTRANS block will be used. Otherwise, if the DeviceBlock encoding the :DEVICE block contained a non-empty :OUTTRANS block, then that :OUTTRANS block will be used. If no non-empty :OUTTRANS block exists in either location, then the field will be NULL.
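The precedence just described can be sketched as follows. The struct here is a hypothetical stand-in for the real outtrans_block, and representing "no non-empty :OUTTRANS block" by a NULL pointer is an assumption made for the sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real outtrans_block; only its identity
 * matters for the selection rule. */
typedef struct {
    unsigned char table[0x100];
} outtrans_block;

/* font_outtrans:   table from the :FONT block, or NULL if none/empty.
 * device_outtrans: table from the :DEVICE block, or NULL if none/empty.
 * The :FONT table wins; the :DEVICE table is the fallback; else NULL. */
const outtrans_block *select_outtrans(const outtrans_block *font_outtrans,
                                      const outtrans_block *device_outtrans)
{
    if (font_outtrans != NULL) {
        return font_outtrans;
    }
    return device_outtrans;     /* may itself be NULL */
}
```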

The field default_width is computed as explained below. It may or may not contain the same value as the attribute char_width. If this wgml_font is being created from a BoxBlock or an UnderscoreBlock, then this field will have the value "1".

The field em_base contains the width, in horizontal_base_units, of the character capital M ("M"), which is the basis for the Horizontal Space Unit and "Em", as stated in section 8.2 of the WGML Reference.

The field font_height contains the value of the field DefaultFont.font_height, or the FONT command-line option value "font-height", or the value "0" if this wgml_font is being created from a BoxBlock or an UnderscoreBlock or no other value was specified.

The field font_space contains the value of the field DefaultFont.font_space, or the FONT command-line option value "font-space", or the value "0" if this wgml_font is being created from a BoxBlock or an UnderscoreBlock or no other value was specified.

The fields line_height and line_space are computed as explained in the next section. They may or may not contain the same values as the corresponding attributes. If this wgml_font is being created from a BoxBlock or an UnderscoreBlock, then the field line_height will have the value "1" and the field line_space will have the value "0".

The field spc_width contains the width, in horizontal_base_units, of the space character (" "). Since at least one space must be allowed for between each pair of text_chars instances, this value should prove helpful.

The field width_table contains the width table, in horizontal base units. This will differ from the width table in the struct pointed to by bin_font when the value of field scale_basis is not "0", since in that case the table in the struct pointed to by bin_font is in terms of the value of the field scale_basis.

The field font_resident will contain either 'y' or 'n', depending on the value of the field DeviceFont.resident or the value 'n' if this wgml_font is being created from a BoxBlock or an UnderscoreBlock.

The fields shift_count and shift_height are used in creating subscripted or superscripted text for a PS device; the values used are computed as explained here. This format is intended to make the values as easy and fast to use as possible. If this wgml_font is being created from a BoxBlock or an UnderscoreBlock, or when the device being used is not a PS device, then the field shift_count will have the value "0" and the field shift_height will be an empty string.

It is, of course, at the point that the wgml_font instances are created that any skipped fonts are given their values. The code for our wgml uses the font 0 information to initialize any skipped fonts. These will produce :INIT block :FONTVALUE blocks just as the other wgml_font instances do. Font numbers greater than the number of available fonts will be replaced with font number 0.
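A minimal sketch of these two rules, using a reduced, hypothetical stand-in for wgml_font (mini_font) rather than the real struct. It copies all of font 0 into each skipped entry; any special handling of the font style is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for wgml_font. */
typedef struct {
    const char *font_name;
    uint32_t    font_height;
    uint32_t    font_space;
    int         defined;    /* nonzero if a real definition was supplied */
} mini_font;

/* Give every skipped entry a copy of the font 0 information. */
void fill_skipped_fonts(mini_font *fonts, size_t count)
{
    assert(fonts != NULL && count > 0 && fonts[0].defined);
    for (size_t i = 1; i < count; i++) {
        if (!fonts[i].defined) {
            fonts[i] = fonts[0];
            fonts[i].defined = 0;   /* still remembered as a skipped font */
        }
    }
}

/* Map a requested font number onto the array of available fonts:
 * anything outside the array is replaced with font number 0. */
size_t resolve_font_number(size_t requested, size_t count)
{
    return requested < count ? requested : 0;
}
```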

Computing Line Heights

This section attempts to relate the material in several sections of the WGML Reference (15.7.13 FONT_HEIGHT, 15.7.14 FONT_SPACE, 15.7.21 LINE_HEIGHT, 15.7.22 LINE_SPACE, 15.8.1.4 LINE_HEIGHT Attribute, 15.8.1.5 LINE_SPACE Attribute, 15.8.1.6 SCALE_BASIS Attribute, 15.10.1.9 VERTICAL_BASE_UNITS Attribute, 15.10.4.3 FONT_HEIGHT Attribute, and 15.10.4.4 FONT_SPACE Attribute) and the documented relation of 72 points per inch to the actual behavior of wgml 4.0, so that it can be duplicated by our wgml when the array of wgml fonts (that is, the array of "available fonts") is initialized during the loading of the binary device library.

The value of attribute vertical_base_units is specified in the :DEVICE block; that of attribute scale_basis, if given, is specified in the :FONT block. Neither appears to be altered in any way by wgml 4.0.

The values of attributes font_height and font_space are specified, if they are specified at all, in the :DEFAULTFONT block. The attributes line_height and line_space are specified, if they are specified at all, in the :FONT block. They are available to the driver through device functions %font_height(), %font_space(), %line_height(), and %line_space().

The manual references would suggest that there are two main factors:

  • The use of the value of attribute vertical_base_units versus the use of the value of attribute scale_basis.
  • The use of the values of attributes font_height and font_space versus the use of the values of attributes line_height and line_space.

It must be understood that all of these attributes have values in the binary device library: if nothing else was provided in the source file, then the value is "0".

Some of these values become required when others are or are not present.

  1. As documented here, at least one of the attributes line_height and scale_basis must be present.
  2. If attribute scale_basis has a non-zero value, then attribute font_height must also have a non-zero value.
  3. Any font generated because the :BOX block or :UNDERSCORE block provides a font name will have the value "0" for attribute font_height. Thus, any font used in this way must not have a non-zero value for scale_basis.
  4. The value of attribute vertical_base_units must be non-zero.

A value of "0" for attribute vertical_base_units appears to have various effects:

  1. If a font is generated because the :BOX block or :UNDERSCORE block provides a font name, the result is a divide-by-zero error.
  2. If the :BOX block or :UNDERSCORE block provides a font number, or the :UNDERSCORE block does not appear, then font name "''" is searched for and not found.

If the attributes font_height, font_space, and scale_basis are all "0", then the situation is quite simple if not, perhaps, what might be expected:

  1. The values returned by the device functions %font_height() and %font_space() are "0", that is, the default value of the corresponding attributes.
  2. The documented use of the values of attributes line_height and line_space is correct: adding them together does provide the vertical distance moved by the print head, that is, the height of the line used in the document.
  3. The value returned by device function %line_height() is the sum of the values of the attributes line_height and line_space.
  4. The value returned by device function %line_space() is the value of the attribute line_space, and so represents that part of the value returned by device function %line_height() which is not occupied by text.
  5. The value of attribute vertical_base_units has no effect on the computed value returned by device function %line_height().

The design implication is that, if the font is not scaled, the values of attributes line_height and line_space must represent the correct amount of space, in vertical base units.

The manual clearly states that the values returned by device function %line_space() and by device function %line_height() are to be added together to give the height of the line on the page. This is clearly not the case.

From the error produced when attribute scale_basis has a non-zero value and attribute font_height has the value "0", and from the manual references, it might be believed that changing the value of attribute scale_basis has an effect when attributes font_height (and, optionally, font_space) are specified. Nothing could be farther from the observed reality.

The table below holds for any allowed values whatsoever of attributes line_height, line_space, and scale_basis. For the values shown for attributes font_height and font_space, the device functions %font_height() and %font_space() return the value of the corresponding attribute, and the device functions %line_height() and %line_space() return the values given. The first column ("vertical") is the value of attribute vertical_base_units. The last column ("delta") is the size of the change in the value returned by %y_address() when the vertical position is changed by one line.

vertical font_height  font_space  line_height  line_space  delta
 1000        10            2        167           28         167
 6(12)        6            0          1            0           1
 6(12)       12            0          1            0           1
 6(12)       12           12          2            1           2
 8(9)         6            0          1            0           1
 8(9)        12            0          2            0           2
 8(9)        12           12          3            2           3

The first line is for the PS device. The parenthesized values show the equivalent in points, at 72 points per inch, for the test devices. While initially confusing, this appears to be what is happening:

  • The value returned by device function %line_height() in this case is computed this way:
(font_height + font_space) * vertical_base_units
------------------------------------------------
                        72

It is currently rounded up to the next integer, as this gives the same results as wgml 4.0 for the values tested.

  • The units involved are:
points * vertical_base_units/inch
---------------------------------
         points/inch

so the value is clearly in vertical_base_units.

  • If the value of attribute font_space is "0", then the value returned by device function %line_space() will be "0" as well.
  • If the value of attribute font_space is not "0", then the value returned by device function %line_space() in this case is computed first by computing how much of the value returned by %line_height() is occupied by the space specified by attribute font_height:
font_height * vertical_base_units
--------------------------------
              72

This is then rounded normally and subtracted from the value returned by device function %line_height(); this produces the correct results for the test cases. This value will also be in vertical_base_units.
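The two computations above can be sketched in integer arithmetic as follows. The function names are hypothetical, and "rounded normally" is taken to mean rounding halves up, which is an assumption; the results agree with every row of the table above.

```c
#include <assert.h>
#include <stdint.h>

#define POINTS_PER_INCH 72u

/* Value returned by %line_height() when font_height is non-zero:
 * (font_height + font_space) * vertical_base_units / 72, rounded up. */
uint32_t scaled_line_height(uint32_t font_height, uint32_t font_space,
                            uint32_t vertical_base_units)
{
    assert(vertical_base_units != 0);
    uint64_t n = ((uint64_t)font_height + font_space) * vertical_base_units;
    return (uint32_t)((n + POINTS_PER_INCH - 1) / POINTS_PER_INCH);
}

/* Value returned by %line_space(): 0 when font_space is 0; otherwise the
 * line height minus the part occupied by font_height, where that part is
 * rounded to the nearest integer (ASSUMPTION: halves round up). */
uint32_t scaled_line_space(uint32_t font_height, uint32_t font_space,
                           uint32_t vertical_base_units)
{
    if (font_space == 0) {
        return 0;
    }
    uint64_t n = (uint64_t)font_height * vertical_base_units;
    uint32_t text = (uint32_t)((n + POINTS_PER_INCH / 2) / POINTS_PER_INCH);
    return scaled_line_height(font_height, font_space, vertical_base_units)
           - text;
}
```

For the PS device row (vertical_base_units 1000, font_height 10, font_space 2) this yields a line height of 167 and a line space of 28.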

The rounding criteria may, or may not, need to be adjusted when side-by-side comparisons of wgml 4.0 and our wgml are possible. Detailed investigation was thwarted by this error message:

SN--011: Left and right margins are too close together

which at first appears to make no sense at all, since the vertical spacing would not normally be expected to affect the space between the left and right margins; however, as discussed below it appears that it does, in some situations.

From the above results these conclusions follow:

  1. If attribute font_height is given a non-zero value, then the values, if any, given for attributes line_height, line_space, and scale_basis have no effect on the values returned by device functions %line_height() and %line_space().
  2. The "delta", that is, the number of device lines used by wgml 4.0 for a single line on the page, is equal to the value returned by %line_height().
  3. The value returned by %line_space() is the amount of space in %line_height() which is not occupied by the text.

Investigation of fonts generated as a result of using a font name with the :BOX or :UNDERSCORE block shows that the value "1" is returned by device function %line_height() and "0" by device functions %line_space(), %font_height(), and %font_space().

Computing Character Widths

This section, like that above, attempts to relate the material in several sections of the WGML Reference (15.7.10 DEFAULT_WIDTH, 15.7.13 FONT_HEIGHT, 15.7.14 FONT_SPACE, 15.7.21, 15.8.1.6 SCALE_BASIS Attribute, 15.8.1.9 CHAR_WIDTH Attribute, 15.10.1.8 HORIZONTAL_BASE_UNITS Attribute, 15.10.4.3 FONT_HEIGHT Attribute, and 15.10.4.4 FONT_SPACE Attribute) and the documented relation of 72 points per inch to the actual behavior of wgml 4.0, so that it can be duplicated by our wgml when the array of wgml fonts (that is, the array of "available fonts") is initialized during the loading of the binary device library.

The explanation of device function %default_width() provides a useful starting point. It states that

The result of this device function is a numeric value which 
represents the default width of a character in the current font. 
When the font changes in the document, the value returned by this 
function will change accordingly. 

Unfortunately, it does not specify what units are used.

The value of attribute horizontal_base_units is specified in the :DEVICE block; that of attribute scale_basis, if given, and of attribute char_width are specified in the :FONT block. None of these appears to be altered in any way by wgml 4.0.

The attributes font_height and font_space are specified, if they are specified at all, in the :DEFAULTFONT block. They are available to the driver through device functions %font_height() and %font_space().

It must be understood that all of these attributes have values in the binary device library: if nothing else was provided in the source file, then the value is "0".

If the value of attribute horizontal_base_units is "0", then wgml 4.0 produces this message:

Abnormal program termination: Divide overflow

and halts. Thus, this attribute must have a non-zero value.

If the value of attribute scale_basis is "0", then the value returned by device function %default_width() is simply the value of attribute char_width. It is clearly given in horizontal_base_units.

If the value of attribute scale_basis is not "0", then the value returned by device function %default_width() is not so easily described. Here are some values ("horizontal" is for "horizontal_base_units"; "result" is the value returned by device function %default_width()):

horizontal  scale_basis  char_width  font_height  result
1000        72000        250         10           35
10 (7.2)    8 ( 9)       1           12           15
10 (7.2)    6 (12)       1            9           15
10 (7.2)    6 (12)       1           12           20
 6 (12)     8 ( 9)       1           12            9
 6 (12)     6 (12)       1            9            9
 6 (12)     6 (12)       1           12           12

The first line is for the PS device; the others are various values used in the test devices. The parenthesized values show the equivalent in points, at 72 points per inch, for the test devices. While initially confusing, this appears to be what is happening:

  • The value returned by device function %default_width() in this case is computed this way:
horizontal_base_units * font_height * char_width
------------------------------------------------
                     scale_basis

The value is then rounded, but not in the normal sense of the word, as discussed below.

  • The units involved are:
(horizontal_base_units/inch) * points * scale_basis_units
---------------------------------------------------------
              (scale_basis_units/inch)

from which it would appear that the value returned by device function %default_width() is in horizontal_base_units-points. However, testing with PS makes it clear that the value is, in fact, in horizontal_base_units: apparently, the value of attribute font_height is treated as a dimensionless factor when used horizontally.

The rounding algorithm was changed when investigation of right-justification revealed a problem with the procedure for computing the width of groups of characters ("words"). Examining the problem itself suggested that it was the result of adding the character widths in scale_basis_units first and then converting the sum to horizontal_base_units and that converting each character first and then adding the results together would give more accurate results. But discrepancies still remained, and they turned out to be caused by the way the character widths were being rounded.

The existing rounding procedure was very simple: it always rounded up. Two sets of widths were then compared for groups of ten identical characters (every character from the space character to '~' was used): the widths computed using a spreadsheet, and the widths actually used by wgml 4.0, as determined with a special document specification and test driver. The comparison confirmed that adding the converted character widths was definitely correct, and showed that the characters whose computed widths did not match those used by wgml 4.0 all had computed widths very close to integer values. In particular, if characters whose computed widths had decimal parts up to 0.06 were rounded down rather than up, while those whose computed widths had decimal parts of 0.11 or more were rounded up, then the results of adding together the character widths matched the values used by wgml 4.0 exactly.

If the font_height was reduced to "1", then, to four decimal places, each digit added to the base size of a character added this to the result:

 1     .0139
 2     .0278
 3     .0417
 4     .0556
 5     .0694
 6     .0833
 7     .0972
 8     .1111
 9     .1250
10     .1389

Since the behavior of wgml 4.0 clearly showed that the criterion for rounding up must be greater than 0.06 and less than or equal to 0.11, it must fall within the range of values shown. Further testing placed the point of rounding between .0972 and .1111. In the code, 0.1 is used: values with decimal parts less than 0.1 are left as-is (rounded down); those with decimal parts greater than or equal to 0.1 are incremented (rounded up). Since only one font was investigated, it is possible that further refinement of the rounding criterion will be needed when other fonts come into play.
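The width formula and the 0.1 rounding criterion can be sketched together in integer arithmetic. The names scale_width and word_width are hypothetical, and the sketch assumes the product of the three factors fits in 64 bits. The second function follows the finding above that each character is converted first and the converted widths are then added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one width to horizontal_base_units. With scale_basis 0 the
 * width is already in horizontal_base_units and is returned unchanged.
 * Otherwise: horizontal_base_units * font_height * width / scale_basis,
 * rounded up only when the fractional part is at least 0.1. */
uint32_t scale_width(uint32_t width, uint32_t font_height,
                     uint32_t horizontal_base_units, uint32_t scale_basis)
{
    assert(horizontal_base_units != 0);
    if (scale_basis == 0) {
        return width;
    }
    uint64_t n = (uint64_t)horizontal_base_units * font_height * width;
    uint64_t whole = n / scale_basis;
    uint64_t rem   = n % scale_basis;
    if (rem * 10 >= scale_basis) {  /* fractional part >= 0.1 */
        whole++;
    }
    return (uint32_t)whole;
}

/* Width of a group of characters: convert each character, then add.
 * raw_table holds unconverted widths indexed by character code. */
uint32_t word_width(const unsigned char *text, size_t len,
                    const uint32_t raw_table[0x100], uint32_t font_height,
                    uint32_t horizontal_base_units, uint32_t scale_basis)
{
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++) {
        total += scale_width(raw_table[text[i]], font_height,
                             horizontal_base_units, scale_basis);
    }
    return total;
}
```

For the PS device row above (horizontal_base_units 1000, scale_basis 72000, char_width 250, font_height 10) the exact quotient is about 34.72, so the result is 35.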

It is worth observing that, for the PS device, the value of attribute char_width is clearly in scale basis units, while that returned by device function %default_width() is clearly in horizontal_base_units. For the test devices, the value of attribute char_width is clearly in horizontal_base_units. This ambiguity is, of course, exactly how the WGML Reference describes the situation, and it applies to the :WIDTH table as well.

Control Words .TI and .TR

This section discusses apparent differences between the descriptions of these control words in the document Waterloo SCRIPT and their behavior as observed when investigating their effects on the :INTRANS block and the :OUTTRANS block.

Some items are puzzling but correct: thus, in the discussion of .DC, we read:

TI (Translate on Input): defines an escape character that causes the
character immediately following it to be translated according to
the .TI translate table currently in effect.  The initial default
value and the "OFF" value is a blank. 

Using the :CONVERT tag to produce the default format shows that a blank, that is, a space character, is still the initial default value. This, of course, makes no sense at all: if a space were actually the default escape character, then the first letter of every word would be subject to input translation. This is clearly not happening in wgml 4.0! The implication is that wgml 4.0 by default doesn't consider the space character (or, by extension, any "whitespace" character) to be significant -- that is, it supports the model developed here in that spaces are, generally, not included in the text controlled by a text_chars instance.

The control word .TI itself has three forms; .TR also uses two of them. The first, unique to .TI, involving the operand SET, is documented in part with this statement:

If .TI is used without any operands, the translate table specified by 
the TRANSLATE option when SCRIPT was invoked will be reinstated and 
any previously defined escape character will be nullified. 

Testing clearly shows that, in wgml 4.0, this does not happen: the table associated specifically with .TI is cleared but the escape character is still active and any :INTRANS block tables still exist and are used. Similarly, if .TR is used without any operands, then the table associated with .TR is cleared but any active table associated with an :OUTTRANS block is still in effect.

On the other hand, if the operand SET is used with no argument, then "the previously defined escape character will be nullified". That is to say, no input translation at all will occur after

.ti set

is encountered.

Control word .TR has no effect on the width table: the width used by wgml 4.0 is the width of the unconverted character, which may or may not match the width of the character converted to. This issue is discussed in the context of the :OUTTRANS block.

The strangest statement, which applies to both .TI and .TR, is:

s <s|t>: This form of the control word allows a single source character
to be translated into itself if no "target" character is specified.

No examples are given, and wgml 4.0, when faced with either of

.ti t t|u
.tr t t|u

produces this error message:

SC--049: A single character or a two character hexadecimal
         value must be specified

Using "<t|u>" in place of "t|u" does not improve matters. Separating the characters with spaces avoids the error; however, wgml 4.0 treats this as mapping "t" to "t" and "|" to "u", hardly the effect desired. When these control words are added to the WGML Reference, this variant should be omitted or listed as not supported.

The documentation of .TR includes some information on the character translation list, which includes this statement:

The last pair in a .TR line may consist of only a "source" character,
which indicates that the character is to be translated into itself.

This effect does occur, either after a ".TR" (clearing the table) or not (redefining the character to itself), but only for the table associated with the .TR control word: any active table associated with an :OUTTRANS block is still in effect. Testing shows that this applies to .TI as well: the net effect of

.ti i v j k
.ti i

is that the translation table associated with .TI will convert "j" to "k" but "i" will remain "i".
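The behavior just described for the table owned by the control word can be sketched in C. This is only an illustration under stated assumptions: the type trans_table and the functions table_clear() and table_set_pairs() are hypothetical names, not taken from wgml 4.0 or from our wgml, and the operands are assumed to have been split into single characters already. The sketch covers only the control word's own table; it says nothing about :INTRANS or :OUTTRANS tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: entry c holds the translation of byte c. */
typedef struct {
    unsigned char map[256];
} trans_table;

/* ".TI" or ".TR" with no operands: every character maps to itself again. */
void table_clear(trans_table *t)
{
    assert(t != NULL);
    for (int c = 0; c < 256; c++) {
        t->map[c] = (unsigned char)c;
    }
}

/* Apply operands already split into single characters: consecutive
 * characters form source/target pairs, and a trailing unpaired source
 * is translated into itself. */
void table_set_pairs(trans_table *t, const unsigned char *ops, size_t count)
{
    size_t i = 0;

    assert(t != NULL && (ops != NULL || count == 0));
    for (; i + 1 < count; i += 2) {
        t->map[ops[i]] = ops[i + 1];
    }
    if (i < count) {
        t->map[ops[i]] = ops[i];    /* lone source character */
    }
}
```

Applying the operands "i v j k" and then the single operand "i" to a cleared table leaves "j" mapped to "k" and "i" mapped to itself, which matches the test result reported above.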

One possible explanation of this concern with specifying that a character is to be mapped to itself involves the TRANSLATE option of the SCRIPT program, which is not available in wgml 4.0:

TRanslate | NOTRanslate
Causes character translation using a translate table that contains
as default lower to uppercase mapping. Other types of translation
may be specified with the .TR control word.

The default was NOTRanslate; however, if TRanslate was specified in the command-line, then the mechanisms for specifying that a character returns itself when no translation exists might have had the effect of modifying this default table. There is, of course, no way to be sure.

The :INTRANS Block

The :INTRANS block poses a problem because there can be two of them; there is also an input translation table associated with control word .TI to consider. This section will start, however, with a discussion of the :INTRANS blocks.

In the source file, multiple :INTRANS blocks can exist but these are consolidated into a single compiled block as discussed here. This is not the problem. The problem is that both :DEVICE blocks and :FONT blocks can have an :INTRANS block, and these are not combined in their compiled form, since they are in two different blocks.

To provide a concrete example, consider these :INTRANS blocks, first in the :DEVICE block:

:INTRANS
   a b
   c d
:eINTRANS

and now in the :FONT block:

:INTRANS
   a c
   b e
:eINTRANS

Characters to be translated are preceded by an escape character. If we take "/" as the escape character and ask how the sequence "///a" will be translated by those :INTRANS blocks, then these possibilities exist:

  • "/b" if the tables are merged and the font table is the base table ("a c" is overwritten by "a b");
  • "/c" if the tables are merged and the device table is the base table ("a b" is overwritten by "a c");
  • "e" if the tables are applied sequentially and the device table is applied first ("///a" becomes "/b" which becomes "e");
  • "d" if the tables are applied sequentially and the font table is applied first ("///a" becomes "/c" which becomes "d").

Performing a test with the layout modified to use "/" as the escape character (the default is no escape character), the :INTRANS blocks shown, and the sequence "///a" produces "/c": the tables are merged and the device table is the base table.

Further testing with .TI showed clearly that the input translation table associated with .TI is applied first, but not sequentially: the effect is to merge all three tables, starting with the device table, then adding the font table, and finally the .TI table.

However, the implementation of the merger is actually done by first looking the character up in the table associated with .TI and then, if not found, in the table in the :FONT block and then, if not found, in the table in the :DEVICE block.

This greatly simplifies things: a simple lookup can be used by wgml only when the input translation escape character appears. When to use it would be hard to determine through testing, but, fortunately, the WGML Reference clearly states when it is done in Section 8.8 Input Translation:

Input translation is performed when text is separated into words. 
The translated character is not examined during these operations, 
providing a method for bypassing the normal processing rules of 
WATCOM Script/GML. 
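The lookup order described above (the .TI table, then the :FONT table, then the :DEVICE table) can be sketched in C as follows. The names in_table, intrans_lookup() and intrans_apply() are hypothetical and are not taken from either wgml. Two further assumptions are made: a blank escape character is treated as "no escape character", in line with the observation that the default blank causes no translation, and an escape character at the very end of the text is copied unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input translation table with explicit "entry exists" flags. */
typedef struct {
    unsigned char has[256];   /* non-zero if byte c has an entry */
    unsigned char to[256];    /* translation, valid when has[c] is non-zero */
} in_table;

/* Look the character up in the .TI table, then the :FONT table, then the
 * :DEVICE table; a NULL table is skipped. No entry: character unchanged. */
unsigned char intrans_lookup(const in_table *ti, const in_table *font,
                             const in_table *device, unsigned char c)
{
    const in_table *order[3] = { ti, font, device };

    for (int i = 0; i < 3; i++) {
        if (order[i] != NULL && order[i]->has[c]) {
            return order[i]->to[c];
        }
    }
    return c;
}

/* Copy in to out (which must hold strlen(in) + 1 bytes), translating only
 * the character that follows each escape character. Returns the length. */
size_t intrans_apply(const in_table *ti, const in_table *font,
                     const in_table *device, unsigned char esc,
                     const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];

        /* Assumption: a blank escape character means "off". */
        if (esc != ' ' && c == esc && in[i + 1] != '\0') {
            c = intrans_lookup(ti, font, device, (unsigned char)in[++i]);
        }
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

With the :DEVICE and :FONT tables shown earlier, no .TI table, and "/" as the escape character, this sketch turns "///a" into "/c", the result observed with wgml 4.0: the second "/" has no entry and so stands for itself, and "a" is found in the :FONT table before the :DEVICE table is consulted.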

Testing with various :WIDTH blocks shows that the width used is that of the translated character, not the original character.

The example illustrates using "/ " to avoid "the normal space expansion", a term not otherwise explained. Testing these items:

///a( ) and ///a(/ )

shows that, without the translation escape character, the closing parenthesis is treated as a separate word (that is, is controlled by a separate text_chars instance), and so is subject to having additional space inserted by the justification process, while, with the translation escape character, the space and closing parenthesis are treated as part of the same word (that is, are controlled by the same text_chars instance) as the "///a(", so that no additional spaces are inserted between the parentheses even if justification is on.

Implementation note: testing with wgml 4.0 in connection with tabbing suggests that wgml 4.0 will not replace an input character with a null byte, at least not when .ti is used to provide the translation. Our wgml will allow this. Perhaps it should not; but then, perhaps it should not allow any value less than 0x20 to be used.

The :OUTTRANS Block

The :OUTTRANS block poses a problem because there can be two of them.

In the source file, multiple :OUTTRANS blocks can exist but these are consolidated into a single compiled block as discussed here. This is not the problem. The problem is that both :DEVICE blocks and :FONT blocks can have an :OUTTRANS block, and these are not combined in their compiled form, since they are in two different blocks.

To provide a concrete example, consider these :OUTTRANS blocks, first in the :DEVICE block:

:OUTTRANS
   a b
   c d
:eOUTTRANS

and now in the :FONT block:

:OUTTRANS
   a c
   b e
:eOUTTRANS

Output translation differs from input translation in two ways:

  1. For output subject to output translation, every character is checked.
  2. More than one character may be used to replace a single character.

When the above :OUTTRANS tables are implemented, the character "a" becomes:

  • "c" if both are provided; and
  • "b" if only the table in the :DEVICE block is provided.

From which the rule for multiple :OUTTRANS tables is seen to be:

If the :FONT block contains an :OUTTRANS table, then that table, 
and that table alone, is used. If the :FONT block does not contain
an :OUTTRANS table but the :DEVICE block does, then the :OUTTRANS
table in the :DEVICE block is used.

This applies to the output of device function %text() when used in the :DRIVER block and to the output of text intended to become part of the document.
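The selection rule can be sketched in C. The names out_table, outtrans_select() and outtrans_char() are hypothetical; the table layout (one optional byte sequence per character, in the spirit of the multiple-byte form) is an assumption made for illustration and is not the layout of the binary device file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output translation table in a multiple-byte style. */
typedef struct {
    const char *seq[256];   /* replacement bytes, or NULL if no entry */
    size_t len[256];        /* number of bytes in seq[c] */
} out_table;

/* The :FONT table, if present, is used alone; otherwise the :DEVICE table
 * is used. Either pointer may be NULL, meaning "no such table". */
const out_table *outtrans_select(const out_table *font, const out_table *device)
{
    return font != NULL ? font : device;
}

/* Write the output bytes for c to out and return how many were written.
 * out must be large enough for the longest replacement sequence. */
size_t outtrans_char(const out_table *active, unsigned char c, char *out)
{
    assert(out != NULL);
    if (active != NULL && active->seq[c] != NULL) {
        memcpy(out, active->seq[c], active->len[c]);
        return active->len[c];
    }
    out[0] = (char)c;
    return 1;
}
```

With both example tables present, "a" becomes "c" and "c" is left alone, because the "c d" entry in the :DEVICE table is never consulted; with only the :DEVICE table, "a" becomes "b".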

Since more than one character can be used to replace a given character, which presumably produce exactly one character when the output device encounters them, an investigation of the :WIDTH table was made with these results:

  1. The amount added to the value returned by device function %x_address() does not depend on how many characters replace a single character during output translation.
  2. The width used is the width of the untranslated character, even when only one character is used for the translation.

Thus, the width table entry for the untranslated character must contain the width of the character that will result when the translation is done.

The :OUTTRANS blocks take one of two forms in the binary device file; our parser puts both forms into the more-complicated multiple-byte form. However, only the PS device uses an :OUTTRANS table, and it uses multiple characters. Thus, so far as the Open Watcom documentation build system goes, only the multiple-byte form exists. It would be possible to produce either (basically, there would be two pointers, one to the 1-byte form and one to the 2-byte form, at most one of which would be non-NULL), and this might be considered if our wgml is released more broadly, but the overhead of determining which was present on a character-by-character basis or even a text_chars instance by text_chars instance basis might make it simpler to accept that the table will always be in the multi-byte form.

The :WIDTH Block

There is only one :WIDTH block in the compiled file, since, as noted here, multiple :WIDTH blocks in the source file are merged into a single block in the compiled file.

The binary :WIDTH block can be created by gendev 4.1 in various sizes; however, the binary form provided by our parser is always an array of 4-byte values, since wgml will have to accommodate 4-byte values and it makes little sense to use more than one size of variable for the same purpose in wgml.

Both the field width in the struct text_chars and the fields em_base, dv_base, and spc_width of typedef struct wgml_font require the use of the :WIDTH block. The material presented here on computing the value returned by device function %default_width() suggests that this is not necessarily a straightforward operation.

A function has been implemented which takes a pointer to one or more characters, a count of the number of characters involved, and the number of the wgml_font involved, and returns the correct result of adding up the values from the :WIDTH table and, if the attribute scale_basis has a non-zero value, converting the result to horizontal base units.
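The general shape of such a function might look like the following C sketch. It is not the function used in our wgml: the names font_widths, to_hbu_fn, unscaled() and text_width() are hypothetical, and the conversion to horizontal base units is left to a caller-supplied function because its details belong to the discussion of %default_width() rather than to this section. Falling back to font zero for an undefined font number is an assumption borrowed from the WGML Reference's description of font-number attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-font data: 4-byte widths plus the scale_basis value. */
typedef struct {
    uint32_t width[256];    /* one entry per character code */
    uint32_t scale_basis;   /* 0: widths are already in base units */
} font_widths;

/* Caller-supplied conversion from scale basis units to horizontal base
 * units; the real computation is not reproduced in this sketch. */
typedef uint32_t (*to_hbu_fn)(uint32_t raw, uint32_t scale_basis);

/* Stand-in conversion that returns the value unchanged. It is NOT the
 * conversion wgml performs; it only lets the sketch be exercised. */
uint32_t unscaled(uint32_t raw, uint32_t scale_basis)
{
    (void)scale_basis;
    return raw;
}

/* Add up the :WIDTH entries for count characters in the given font and
 * convert the total only when scale_basis is non-zero. */
uint32_t text_width(const font_widths *fonts, size_t font_count,
                    size_t font_number, const unsigned char *text,
                    size_t count, to_hbu_fn convert)
{
    uint32_t sum = 0;

    assert(fonts != NULL && font_count > 0);
    assert(text != NULL || count == 0);
    if (font_number >= font_count) {
        font_number = 0;    /* assumption: undefined fonts use font zero */
    }
    for (size_t i = 0; i < count; i++) {
        sum += fonts[font_number].width[text[i]];
    }
    if (fonts[font_number].scale_basis != 0 && convert != NULL) {
        sum = convert(sum, fonts[font_number].scale_basis);
    }
    return sum;
}
```

Keeping the conversion behind a function pointer means the summing loop stays the same for the test devices, whose widths are already in horizontal_base_units, and for the PS device, whose widths are not.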

Font Styles

Because the :FONTSTYLE block is not documented in the WGML Reference, some notes on how it is used may be appropriate.

Testing confirms that the value of the attribute type of the :FONTSTYLE block is matched to either of two items:

  • the attribute fontstyle of the :DEFAULTFONT block; and
  • the "font-attribute" used with the FONT command-line option.

This match is used by wgml 4.0 to identify and apply the desired font style as needed. Indeed, if a nonexistent font style is named, then wgml 4.0 produces this message:

SN--098: For font style 'fred'
         Font style name is not defined

These restrictions, found by testing, on the value of the type field of the :FONTSTYLE block should apply wherever the value is used:

  • It cannot contain embedded spaces.
  • It cannot contain more than 79 characters.

Testing shows that the font style name is converted to all-lower-case, at least in :FONTSTYLE blocks.
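These observations could be captured by a small C helper such as the one sketched below. The name normalize_style_name() and the macro STYLE_NAME_MAX are hypothetical. Rejecting an empty name is an assumption, since the tests described here do not cover that case, and the sketch rejects any space at all rather than only embedded ones.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define STYLE_NAME_MAX 79   /* longest font style name accepted */

/* Validate a font style name and store its lower-case form in out.
 * Returns 1 on success, 0 if the name is empty (an assumption), longer
 * than 79 characters, or contains a space. */
int normalize_style_name(const char *name, char out[STYLE_NAME_MAX + 1])
{
    size_t len;

    assert(name != NULL && out != NULL);
    len = strlen(name);
    if (len == 0 || len > STYLE_NAME_MAX) {
        return 0;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];

        if (c == ' ') {
            return 0;
        }
        out[i] = (char)tolower(c);
    }
    out[len] = '\0';
    return 1;
}
```

Normalizing the name once, when it is read, would allow the :FONTSTYLE type, the :DEFAULTFONT fontstyle attribute, and the FONT command-line font-attribute to be compared with a plain string comparison.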

It is clear from the WGML Reference that, prior to version 3.33, wgml used a defined set of keywords for font styles:

in :DEFAULTFONT   in FONT
  PLAIN             PLAIN
  UNDERSCORE        USCORE
  BOLD              BOLD
  USBOLD            USBOLD
  UNDERLINE         ULINE
  ULBOLD            ULBOLD

However, it is equally clear from the discussion of the :UNDERLINE block that USCORE, at least, is no longer a keyword and that a :FONTSTYLE for USCORE must be defined if it is to be used.

It appears that, with version 3.33, these terms lost their keyword status, and all font styles had to be defined explicitly in the :DRIVER blocks. Since the forms used with the command-line option FONT were likely to be embedded in existing document build systems, while those associated with the :DEFAULTFONT block were entirely contained in the source files for the binary device library, implementations for the command-line set were provided: USCORE and ULINE but not UNDERSCORE or UNDERLINE.

The separation of the document specification from the device library and the provision of definitions for the font styles "plain", "bold", "uline", "uscore", "ulbold", and "usbold" in most drivers implies the ability to allow any document to be output to any device. Some devices have capabilities which other devices lack, and so additional, device-specific font styles are appropriate for some devices. This is less of a problem than might at first appear, for two reasons:

  • The specialized font styles can be used with :DEFAULTFONT instances, so that a document specification can use the same :DEFAULTFONT instance and get the appropriate result, but not the same result, on different devices.
  • The FONT command-line option can be used to change the font style used by a specific document specification with a specific device. This does require that the command lines used with different devices differ.

A specific site, such as the Open Watcom build system, may use only a few devices (WHELP, PS) for all of its output. If the system is elaborate enough (if it uses batch files or make files to select command-line option files for instance), then the command-line options may vary depending on both the device and the document specification to produce the desired use of font styles.
