Wgml Fonts

From Open Watcom

Introduction

The WGML Reference uses the term "font" in several different contexts:

  • The :FONT block, which is said to "define" a font.
  • The :DEVICEFONT block.
  • The :DEFAULTFONT block.
  • The :FONTPAUSE block.
  • The :FONTSWITCH block.
  • The :FONTSTYLE block (not actually in the WGML Reference, but clearly related).
  • The FONT command-line option.
  • A set of available or selected fonts.

The question, then, is: just what is a "font" in wgml 4.0?

The :DEFAULTFONT Block

The WGML Reference Tutorial, in section 3.2.6 Selecting Device Information, has this informative statement:

The set of fonts define the character sets and their attributes 
used to produce the text in the document. The fonts numbered zero 
through three are used by the GML tags :HP0 through :HP3. Font 
numbers greater than three may be referenced in the layout or
by the :SF GML tag.

Indeed, these tag attributes:

  • font
  • number_font
  • string_font

always have the value shown as a number and are normally described as:

This attribute accepts a non-negative integer number. If a font
number is used for which no font has been defined, WATCOM
Script/GML will use font zero.

Every device is required to provide a :DEFAULTFONT block with the value "0" for its font attribute.

The WGML Reference Section 14.3.9 FONT states:

The specified font-number is assigned a particular font. The font 
numbers zero through three correspond to the highlight-phrase 
tags :hp0 through :hp3. Font numbers greater than three (up to a 
maximum of 255) may be used in the layout section or with the :sf
tag.

Each device has a list of available fonts defined with it. The font-
name value is selected from these defined fonts, and must be 
specified.

The font-attribute value specifies an attribute for the defined 
font. If the font attribute is not specified, the attribute PLAIN 
is set. 

This refers to an "attribute" because the WGML Reference predates the :FONTSTYLE block. For wgml 4.0, the font-attribute value must name a :FONTSTYLE block.

In the context of the device library, the WGML Reference Section 15.10.7 BOX Block describes the attribute font in part this way:

The font attribute may be either a non-negative integer number or a 
character value. If a number is specified, the font of the box will 
be the default font with the corresponding font number. A character 
attribute value must be a font name defined in a device font block. 

Examination of the device library information accessible here shows this structure in a device definition for fonts:

:DEFAULTFONT block
   :DEVICEFONT block
      :FONT block
      :FONTPAUSE block
      :FONTSWITCH block
   :FONTSTYLE block

The :DEFAULTFONT block contains a numeric attribute font; in binary form, these blocks appear as an array with the value of attribute font acting as the array index. It is, therefore, tempting to equate "font" with ":DEFAULTFONT instance"; indeed, in most of the sections which deal with this issue, that is taken for granted.

But it does not quite work. The proof comes from exploring the concept of "selected fonts" as it relates to the multiple invocations of the :FONTVALUE block of the :INIT block, which are documented here.

Multiple :FONTVALUE Block Instances

The WGML Reference, in Section 15.9.2.3 FONTVALUE Section, states, in part:

WATCOM Script/GML selects the fonts being used in the document. For 
each of the selected fonts, the fontvalue section is evaluated. 
Device functions, such %default_width, will return the values 
appropriate for the selected font.

By using distinctive values for their font_out_name1 attributes and placing %image(%font_outname1()) in the :FONTVALUE blocks, it is possible to be a bit more precise about which fonts are "selected":

  • each :DEFAULTFONT;
  • the font named for use in the :UNDERSCORE block, if any; and
  • the font named for use in the :BOX block, if any.

As discussed here, not only is the :UNDERSCORE block itself entirely optional, but the value of its attribute font may be a font number or an empty string. The :FONTVALUE blocks, however, are only interpreted using the font for the :UNDERSCORE block if a font name is specified.

As discussed here, the :BOX block, which is mandatory, can take either a font name or a font number as the value of its attribute font. The :FONTVALUE blocks, however, are only interpreted using the font for the :BOX block if a font name is specified.

Using the same font name multiple times showed, first, that the :FONTVALUE blocks are interpreted for each :DEFAULTFONT block, even if the same font name is used in more than one :DEFAULTFONT block. This results in multiple instances with the same font's values being found in the output file.

Using the same font name in an :UNDERSCORE block or the :BOX block as in a :DEFAULTFONT block will cause the :FONTVALUE blocks to be interpreted multiple times for that font, with multiple instances appearing in the file. So long as the font names used in the :UNDERSCORE block or the :BOX block are different from each other, this will be done with each of them separately.

Using the same font name in both the :UNDERSCORE block and the :BOX block causes the :FONTVALUE blocks to be interpreted only once for that font name for those two blocks, rather than twice. However, if the same font name is also used in one or more :DEFAULTFONT blocks, then multiple instances will still be found in the output file -- all but one from the :DEFAULTFONT blocks.

Modifying the :FONTVALUE blocks to display the value returned by device function %font_number() showed that, if the :BOX and :UNDERSCORE blocks used different font names and if the :DEFAULTFONT blocks were numbered from "0" through "5", then the :UNDERSCORE block's font was associated with a value of "6" and the :BOX block's font was associated with a value of "7". When both the :BOX and :UNDERSCORE blocks used the same name, it was associated with a value of "6". So, the "selected fonts" consist of the :DEFAULTFONT blocks defined in the :DEVICE block plus a generated :DEFAULTFONT block for each distinct font name used with the :BOX or :UNDERSCORE block (if any). In the course of investigating other topics, it became apparent that the font style used for these generated :DEFAULTFONT blocks is "plain".
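The numbering observed above can be sketched in C. This is a minimal illustration, not the wgml 4.0 code: the names has_name and count_selected_fonts are hypothetical, an empty or NULL name stands for "no font name was given", names are compared exactly (the actual comparison rules are an assumption), and the case where only the :BOX block names a font is assumed to follow the same pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if a font name was actually supplied. */
static int has_name(const char *name)
{
    return name != NULL && name[0] != '\0';
}

/*
 * Returns the total number of "selected fonts" and reports the font
 * number given to the :UNDERSCORE and :BOX fonts (-1 if none is
 * generated). default_count is the number of :DEFAULTFONT blocks,
 * assumed to be numbered contiguously from 0.
 */
static int count_selected_fonts(int default_count,
                                const char *underscore_name,
                                const char *box_name,
                                int *underscore_number,
                                int *box_number)
{
    int next = default_count;

    *underscore_number = -1;
    *box_number = -1;

    if (has_name(underscore_name)) {
        *underscore_number = next++;
    }
    if (has_name(box_name)) {
        if (has_name(underscore_name)
                && strcmp(underscore_name, box_name) == 0) {
            *box_number = *underscore_number;  /* same name: one shared entry */
        } else {
            *box_number = next++;
        }
    }
    return next;
}
```

With six :DEFAULTFONT blocks and two different names, this assigns 6 to the :UNDERSCORE font and 7 to the :BOX font; with the same name in both, both are assigned 6.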

The reference to :FONTVALUE blocks is a reminder that, as discussed here, these blocks may occur within an :INIT block in any order and in any number, and will be interpreted in the order in which they appear in the :INIT block.

The FONT Command-Line Option

Another factor here is the use of command-line option FONT. Comparison with the :DEFAULTFONT block shows that it is very much the same thing. When the font number is the same as an existing :DEFAULTFONT block, then the information from FONT replaces that in the :DEFAULTFONT block. When the font number is higher, then the set of :DEFAULTFONT blocks is extended to include the highest number used, and then any :DEFAULTFONT blocks needed for the :BOX block or the :UNDERSCORE block are created.

When it was implemented, these details became apparent:

  • Only the first two values (font_number and font_name) are required.
  • If there are no additional values, then font_style is set to NULL and font_space and font_height to "0".
  • If there is one additional value, there are two alternatives:
    1. It is a font_style.
    2. It is a font_space.

It cannot be a font_height because a font_space of one sort or another must be present for a font_height to be recognized.

  • If there are two additional values, there are two alternatives:
    1. They are a font_style and a font_space.
    2. They are a font_space and a font_height.

They cannot be a font_style and a font_height because a font_space of one sort or another must be present for a font_height to be recognized.

  • If there are three additional values, then, of course, they are the font_style, font_space, and font_height, in that order.

The font_space may be entered as "0" (or, as seen in the Open Watcom document build option files, ".0") or as an empty string, "''" (which is documented in the WGML Reference). Either produces a font_space of "0" and allows the next value to be interpreted as the font_height.
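The alternatives listed above can be sketched as a small classifier for the values that follow font_number and font_name. This is an illustration only: FontOptionExtras, looks_like_space, and classify_font_extras are hypothetical names, and the test used to tell a font_space from a font_style (the value is "''" or begins with a digit or a period) is an assumption, not something established by the testing described here. Values are kept as strings; converting them to numbers is left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the optional FONT values, kept as text. */
typedef struct {
    const char *font_style;   /* NULL if not given */
    const char *font_space;   /* "0" if not given or given as '' */
    const char *font_height;  /* "0" if not given */
} FontOptionExtras;

/* ASSUMPTION: a font_space is '' or begins with a digit or a period. */
static int looks_like_space(const char *value)
{
    unsigned char first = (unsigned char)value[0];

    return strcmp(value, "''") == 0 || isdigit(first) || first == '.';
}

/* Classifies 0 to 3 extra values; returns 0 on success, -1 otherwise. */
static int classify_font_extras(const char *const *extra, int count,
                                FontOptionExtras *out)
{
    int i = 0;

    out->font_style = NULL;
    out->font_space = "0";
    out->font_height = "0";

    if (count < 0 || count > 3) {
        return -1;
    }
    if (i < count && !looks_like_space(extra[i])) {
        out->font_style = extra[i++];          /* optional leading style */
    }
    if (i < count) {
        if (!looks_like_space(extra[i])) {
            return -1;                         /* a height needs a space first */
        }
        out->font_space = strcmp(extra[i], "''") == 0 ? "0" : extra[i];
        i++;
    }
    if (i < count) {
        out->font_height = extra[i++];
    }
    return i == count ? 0 : -1;
}
```

Under these assumptions, the pair "''" and "12" is read as a font_space of "0" followed by a font_height of "12", while two style-like values in a row are rejected.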

Non-contiguous Font Numbers

Nothing forces the font numbers to be contiguous; gaps can be introduced in the source file, as discussed here, and also by the font-number values used with command-line option FONT. This is not, however, how they are intended to be used.

The WGML Reference states in many locations that

If a font number is used for which no font has been defined, 
WATCOM Script/GML will use font zero.

This is only true if no gaps exist in the font numbers. Testing showed that the :FONTVALUE block of the :INIT block is interpreted for each of these font numbers. Further testing using these fonts showed that they have these characteristics:

  1. The font name, font height, and font space are the same as those of the next-to-the-last actual font preceding the first skipped font.
  2. The font style is "plain".

In the binary format itself, of course, these fonts have a distinctive (and empty) entry. The values of the fields are supplied by wgml 4.0, not by gendev 4.1.

The wgml 4.0 Font

From the above, it should be clear that the reason a "wgml font" cannot simply be a :DEFAULTFONT instance is that wgml 4.0 creates new "wgml fonts" under some conditions.

So, then, what is a wgml 4.0 font? These characteristics appear to apply:

  • It is part of an array, with documented limits from "0" to "255" (that is, the range of a uint8_t).
  • It contains at least the information in a :DEFAULTFONT instance, which means that it can be used to access a lot of other items, as shown above.
  • It is not the same as a :DEFAULTFONT instance -- or, rather, the array is not the same as the :DEFAULTFONT instance array.
  • The concept of an "empty" array entry does not appear to exist.
  • It must provide access to the :INTRANS, :OUTTRANS, and :WIDTH blocks.

This is the current version in our wgml code:

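typedef struct {
    cop_font            *   bin_font;       // binary :FONT block
    fontswitch_block    *   font_switch;    // binary :FONTSWITCH block
    code_text           *   font_pause;     // binary :FONTPAUSE block
    fontstyle_block     *   font_style;     // binary :FONTSTYLE block
    uint16_t                font_height;
    uint16_t                font_space;
} WgmlFont;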

These fields can be filled using information from any of three sources:

  1. The fields in a DefaultFont struct.
  2. The values provided to a FONT command-line option.
  3. The FontAttribute of a BoxBlock or UnderscoreBlock (either or both), if that FontAttribute provides a field font_name which (in the UnderscoreBlock) has a non-empty value.

The field bin_font points to the binary :FONT block designated by the field font_name or the FONT command-line option value "font-name".

The fields font_switch and font_pause are based on the DeviceFont instance whose font_name has the same value as the defined name which designated the value of the field bin_font.

The field font_switch points to the binary :FONTSWITCH block designated by the field DeviceFont.font_switch.

The field font_pause points to the binary :FONTPAUSE block designated by the field DeviceFont.font_pause.

The field font_style points to the binary :FONTSTYLE block designated by the field DefaultFont.font_style, or the FONT command-line option value "font-attribute", or the value "plain" if this WgmlFont is being created from a BoxBlock or an UnderscoreBlock.

The field font_height contains the value of the field DefaultFont.font_height, or the FONT command-line option value "font-height", or the value "0" if this WgmlFont is being created from a BoxBlock or an UnderscoreBlock.

The field font_space contains the value of the field DefaultFont.font_space, or the FONT command-line option value "font-space", or the value "0" if this WgmlFont is being created from a BoxBlock or an UnderscoreBlock.

It is, of course, at the point that the WgmlFont instances are created that any skipped fonts are given their values. The code for our wgml uses the font 0 information to initialize any skipped fonts. These will still be used with :FONTVALUE blocks. Font numbers greater than the number of available fonts will be replaced with font number 0.
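The handling of skipped and out-of-range font numbers described above can be sketched as follows. This is a simplified illustration, not our wgml code: SketchFont stands in for WgmlFont, and the defined flag, fill_skipped_fonts, and resolve_font_number are hypothetical.

```c
#include <stddef.h>

/* Simplified stand-in for WgmlFont; the fields are hypothetical. */
typedef struct {
    const char *font_name;  /* stands in for bin_font and related fields */
    int         defined;    /* nonzero if a source supplied this entry */
} SketchFont;

/* Gives every skipped entry the values of font 0 (assumed defined). */
static void fill_skipped_fonts(SketchFont *fonts, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (!fonts[i].defined) {
            fonts[i] = fonts[0];
        }
    }
}

/* Maps a font number beyond the available fonts to font number 0. */
static size_t resolve_font_number(size_t requested, size_t count)
{
    return requested < count ? requested : 0;
}
```

Filling the gaps once, when the array is built, means that later lookups never have to deal with an "empty" entry.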

The :INTRANS Block

The :INTRANS block poses a problem because there can be two of them.

In the source file, multiple :INTRANS blocks can exist but these are consolidated into a single compiled block as discussed here. This is not the problem. The problem is that both :DEVICE blocks and :FONT blocks can have an :INTRANS block, and these are not combined in their compiled form, since they are in two different blocks.

To provide a concrete example, consider these :INTRANS blocks, first in the :DEVICE block:

:INTRANS
   a b
   c d
:eINTRANS

and now in the :FONT block:

:INTRANS
   a c
   b e
:eINTRANS

Characters to be translated are preceded by an escape character. If we take "/" as the escape character and ask how the sequence "///a" will be translated by those :INTRANS blocks, then these possibilities exist:

  • "/b" if the tables are merged and the font table is the base table ("a c" is overwritten by "a b");
  • "/c" if the tables are merged and the device table is the base table ("a b" is overwritten by "a c");
  • "e" if the tables are applied sequentially and the device table is applied first ("///a" becomes "/b" which becomes "e");
  • "d" if the tables are applied sequentially and the font table is applied first ("///a" becomes "/c" which becomes "d").

Performing a test with the layout modified to use "/" as the escape character (the default is no escape character), the :INTRANS blocks shown, and the sequence "///a" produces "/c": the tables are merged and the device table is the base table.

The "merger" of the two input translation tables could be virtual: the function doing the translation could look the character up in the table in the :FONT block first and then, if not found, check the table in the :DEVICE block. Alternatively, the merger can be done physically and the merged table made accessible through the WgmlFont struct; this might be more efficient. This greatly simplifies things: a simple lookup can be used by wgml only when the input translation escape character appears. When to use it would be hard to determine through testing, but, fortunately, the WGML Reference clearly states when it is done in Section 8.8 Input Translation:

Input translation is performed when text is separated into words. 
The translated character is not examined during these operations, 
providing a method for bypassing the normal processing rules of 
WATCOM Script/GML. 

Testing with various :WIDTH blocks shows that the width used is that of the translated character, not the original character.
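The "virtual" merger of the two input translation tables can be sketched as a lookup that consults the :FONT table first and falls back to the :DEVICE table. This is an illustration only: TransPair, TransTable, lookup, and intrans_char are hypothetical names, and the list-of-pairs layout is chosen for readability and is not the binary format.

```c
#include <stddef.h>

/* One "from to" line of an :INTRANS block (illustrative layout). */
typedef struct {
    unsigned char from;
    unsigned char to;
} TransPair;

typedef struct {
    const TransPair *pairs;
    size_t           count;
} TransTable;

/* Returns 1 and sets *out if ch has an entry in table, else 0. */
static int lookup(const TransTable *table, unsigned char ch,
                  unsigned char *out)
{
    if (table == NULL) {
        return 0;
    }
    for (size_t i = 0; i < table->count; i++) {
        if (table->pairs[i].from == ch) {
            *out = table->pairs[i].to;
            return 1;
        }
    }
    return 0;
}

/* Translates the character that follows the escape character. */
static unsigned char intrans_char(const TransTable *font_table,
                                  const TransTable *device_table,
                                  unsigned char ch)
{
    unsigned char out;

    if (lookup(font_table, ch, &out) || lookup(device_table, ch, &out)) {
        return out;
    }
    return ch;  /* no entry in either table: unchanged */
}
```

With the example tables, "a" is translated to "c" (the :FONT entry overrides the :DEVICE entry), while "c" is still translated to "d" through the :DEVICE table.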

The example illustrates using "/ " to avoid "the normal space expansion", a term not otherwise explained. Testing these items:

///a( ) and ///a(/ )

shows that, without the translation escape character, the closing parenthesis is treated as a separate word (that is, is controlled by a separate TextChars instance), and so is subject to having additional space inserted by the justification process, while, with the translation escape character, the space and closing parenthesis are treated as part of the same word (that is, are controlled by the same TextChars instance) as the "///a(" (which, with the above test setup, appears as "/c(" in the output).

The :OUTTRANS Block

The :OUTTRANS block poses a problem because there can be two of them.

In the source file, multiple :OUTTRANS blocks can exist but these are consolidated into a single compiled block as discussed here. This is not the problem. The problem is that both :DEVICE blocks and :FONT blocks can have an :OUTTRANS block, and these are not combined in their compiled form, since they are in two different blocks.

To provide a concrete example, consider these :OUTTRANS blocks, first in the :DEVICE block:

:OUTTRANS
   a b
   c d
:eOUTTRANS

and now in the :FONT block:

:OUTTRANS
   a c
   b e
:eOUTTRANS

Output translation differs from input translation in two ways:

  1. For output subject to output translation, every character is checked.
  2. More than one character may be used to replace a single character.

When the above :OUTTRANS tables are implemented, the character "a" becomes:

  • "c" if both are provided; and
  • "b" if only the table in the :DEVICE block is provided.

From which the rule for multiple :OUTTRANS tables is seen to be:

If the :FONT block contains an :OUTTRANS table, then that table, 
and that table alone, is used. If the :FONT block does not contain
an :OUTTRANS table but the :DEVICE block does, then the :OUTTRANS
table in the :DEVICE block is used.

This applies to the output of device function %text() when used in the :DRIVER block and to the output of text intended to become part of the document.
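The rule for multiple :OUTTRANS tables can be sketched as a table selection followed by a per-character lookup. This is an illustration under assumptions: OutTransTable, select_outtrans, and outtrans_char are hypothetical names, and the array of replacement strings is a simplification, not the binary format.

```c
#include <stddef.h>

/* Illustrative table: one replacement string per character, or NULL. */
typedef struct {
    const char *replacement[256];
} OutTransTable;

/* The :FONT table, if present, is used alone; else the :DEVICE table. */
static const OutTransTable *select_outtrans(const OutTransTable *font_table,
                                            const OutTransTable *device_table)
{
    return font_table != NULL ? font_table : device_table;
}

/* Returns the replacement for ch, or NULL if ch is output unchanged. */
static const char *outtrans_char(const OutTransTable *table,
                                 unsigned char ch)
{
    return table != NULL ? table->replacement[ch] : NULL;
}
```

Unlike the input translation sketch, there is no fallback here: with the example tables and both present, "c" is not translated at all, because the :FONT table has no entry for it and the :DEVICE table is never consulted.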

Since more than one character can be used to replace a given character, and the replacement sequence presumably produces exactly one character when the output device encounters it, an investigation of the :WIDTH table was made, with these results:

  1. The amount added to the value returned by device function %x_address() does not depend on how many characters replace a single character during output translation.
  2. The width used is the width of the untranslated character, even when only one character is used for the translation.

Thus, the width table entry for the untranslated character must contain the width of the character that will result when the translation is done.

The :OUTTRANS blocks take one of two forms in the binary device file; our parser puts both forms into the more-complicated multiple-byte form. However, only the PS device uses an :OUTTRANS table, and it uses multiple characters. Thus, so far as the Open Watcom documentation build system goes, only the multiple-byte form exists. It would be possible to produce either (basically, there would be two pointers, one to the 1-byte form and one to the 2-byte form, at most one of which would be non-NULL), and this might be considered if our wgml is released more broadly, but the overhead of determining which was present on a character-by-character basis or even a TextChars instance by TextChars instance basis might make it simpler to accept that the table will always be in the multi-byte form.

The :WIDTH Block

There is only one :WIDTH block in the compiled file, since, as noted here, multiple :WIDTH blocks in the source file are merged into a single block in the compiled file.

The binary :WIDTH block can be created by gendev 4.1 in various sizes; however, the binary form provided by our parser is always an array of 4-byte values, since wgml will have to accommodate 4-byte values and it makes little sense to use more than one size of variable for the same purpose in wgml.

The preliminary sequence for forming TextLine instances discussed here suggests that a function which takes a TextChars instance as its parameter and sets the value of its length field to the total length of the characters to be output is the solution most likely to work properly.
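Such a function might look like the following sketch. The struct SketchTextChars and its fields are hypothetical stand-ins for TextChars, whose actual layout is not given here. The width table is the array of 4-byte values described above, and the sketch assumes it is indexed by the character as stored in the instance, before output translation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TextChars instance. */
typedef struct {
    const unsigned char *text;    /* characters controlled by this instance */
    size_t               count;   /* number of characters in text */
    uint32_t             length;  /* total width, set by set_text_length() */
} SketchTextChars;

/* Sums the table widths of the characters and stores the total. */
static void set_text_length(SketchTextChars *chars,
                            const uint32_t width[256])
{
    uint32_t total = 0;

    for (size_t i = 0; i < chars->count; i++) {
        total += width[chars->text[i]];
    }
    chars->length = total;
}
```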

Font Styles

Because the :FONTSTYLE block is not documented in the WGML Reference, some notes on how it is used may be appropriate.

Testing confirms that the value of the attribute type of the :FONTSTYLE block is matched to either of two items:

  • the attribute fontstyle of the :DEFAULTFONT block; and
  • the "font-attribute" used with the FONT command-line option.

This match is used by wgml 4.0 to identify and apply the desired font style as needed. Indeed, if a nonexistent font style is named, then wgml 4.0 produces this message:

SN--098: For font style 'fred'
         Font style name is not defined

These restrictions, found by testing, on the value of the type field of the :FONTSTYLE block should apply wherever the value is used:

  • It cannot contain embedded spaces.
  • It cannot contain more than 79 characters.

Testing shows that the font style name is converted to all-lower-case, at least in :FONTSTYLE blocks.
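The restrictions and the case conversion found by testing can be sketched as a single normalizing function. This is an illustration only: FONT_STYLE_MAX and normalize_font_style are hypothetical names, and rejecting an empty name is an assumption rather than a tested result.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define FONT_STYLE_MAX 79   /* longest font style name found by testing */

/*
 * Copies name into out in lower case. Returns 0 on success, or -1 if
 * the name is empty (an assumption), too long, or contains a space.
 */
static int normalize_font_style(const char *name,
                                char out[FONT_STYLE_MAX + 1])
{
    size_t len = strlen(name);

    if (len == 0 || len > FONT_STYLE_MAX) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)name[i];

        if (ch == ' ') {
            return -1;
        }
        out[i] = (char)tolower(ch);
    }
    out[len] = '\0';
    return 0;
}
```

Normalizing both the :FONTSTYLE type and the name being looked up in the same way would make the match independent of the case used on the command line, if that is in fact how wgml 4.0 behaves.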

It is clear from the WGML Reference that, prior to version 3.33, wgml used a defined set of keywords for font styles:

in :DEFAULTFONT   in FONT
  PLAIN             PLAIN
  UNDERSCORE        USCORE
  BOLD              BOLD
  USBOLD            USBOLD
  UNDERLINE         ULINE
  ULBOLD            ULBOLD

However, it is equally clear from the discussion of the :UNDERSCORE block that USCORE, at least, is no longer a keyword and that a :FONTSTYLE for USCORE must be defined if it is to be used.

It appears that, with version 3.33, these terms lost their keyword status, and all font styles had to be defined explicitly in the :DRIVER blocks. Since the forms used with the command-line option FONT were likely to be embedded in existing document build systems, while those associated with the :DEFAULTFONT block were entirely contained in the source files for the binary device library, implementations for the command-line set were provided: USCORE and ULINE but not UNDERSCORE or UNDERLINE.

The separation of the document specification from the device library and the provision of definitions for the font styles "plain", "bold", "uline", "uscore", "ulbold", and "usbold" in most drivers implies the ability to allow any document to be output to any device. Some devices have capabilities which other devices lack, and so additional, device-specific font styles are appropriate for some devices. This is less of a problem than might at first appear, for two reasons:

  • The specialized font styles can be used with :DEFAULTFONT instances, so that a document specification can use the same :DEFAULTFONT instance and get the appropriate result, but not the same result, on different devices.
  • The FONT command-line option can be used to change the font style used by a specific document specification with a specific device. This does require that the command lines used with different devices differ.

A specific site, such as the Open Watcom build system, may use only a few devices (WHELP, PS) for all of its output. If the system is elaborate enough (if it uses batch files or make files to select command-line option files for instance), then the command-line options may vary depending on both the device and the document specification to produce the desired use of font styles.
