Wgml Sequencing
From Open Watcom
Introduction
This page consolidates information, developed while working on other topics, about the sequence in which wgml 4.0 performs various actions. Although some "rounding out" of these topics is unavoidable, a comprehensive discussion of them lies in the future.
Duplicating the steps shown is mandatory only to the extent that following the same sequence as wgml 4.0 is needed to ensure that our wgml produces the same output file from the same input.
From time to time, statements are made about where and when text output occurred. This always refers to output intended to be part of the document, as opposed to the control codes or, in the test framework, identifying text, emitted as a result of interpreting the various compiled function blocks. It might be wondered how it was possible to be certain that no text output occurred when that output included space characters. These steps were taken to ensure accuracy in this matter:
- The :DEVICE block was given an :OUTTRANS block which translated " " to "|".
- Only %image() (never %text()) was used for the function block output, and any embedded spaces were not converted.
- All text output was interpreted.
Thus, spaces intended to appear in the document when printed appear as "|" characters and were quite obvious -- as was their absence.
The test framework used implemented all of the function blocks. When a block is identified as not being interpreted when it is expected to be, that applies to the situation when that block exists: that is, the block exists, a context exists in which it is usually interpreted, and yet in this particular case it is not.
Blocks which do not exist, of course, cannot be interpreted. Every statement that a block is interpreted at a particular point must be understood as qualified by "if that block exists". Unless otherwise noted, the effect of a block not existing is identical to it existing and doing nothing whatsoever when interpreted.
Startup
While wgml 4.0 can be assumed to begin by processing its command line, this is not something which, at present, can be investigated. However, by using the START and DOCUMENT :PAUSE and :INIT blocks, some of the initial steps can be identified and placed in their proper order.
Consider this output from wgml 4.0 in processing an existing document specification with command-line option "incl" specified (the output continues with items which depend on the document specification):
Processing device information
*** START PAUSE block.
Current file is 'e:\progdev\cpp\owtest\wgml\docs\ex1.gml'
Current file is 'e:\progdev\cpp\owtest\wgml\docs\testlay.gml'
Processing layout
Formatting document
*** DOCUMENT PAUSE block.
Examination of the output file shows this (redacted, and showing only the items corresponding to those shown above):
*START INIT VALUE block*
*START INIT FONTVALUE block* (multiple instances)
*DOCUMENT INIT VALUE block*
*DOCUMENT INIT FONTVALUE block* (multiple instances)
>SW01 (:FONTSWITCH :STARTVALUE for :DEFAULTFONT 0)
By using %setsymbol() and %image(%getstrsymbol()) (which returns a non-null result only after the %setsymbol(), and so the block containing it, has been interpreted), it can be shown that the blocks are interpreted in this order:
- START :PAUSE block
- START :INIT block
- DOCUMENT :PAUSE block
- DOCUMENT :INIT block
- :FONTPAUSE for :DEFAULTFONT 0
- :FONTSWITCH :STARTVALUE for :DEFAULTFONT 0
If an invalid document specification is used, then the screen output is:
Processing device information
*** START PAUSE block.
****ERROR**** IO--001: For file 'none'
System message is 'No such file or directory'
Cannot open file
The output file contains:
*START INIT VALUE block*
*START INIT FONTVALUE block* (multiple instances)
The WGML Reference, in Section 15.9.2.3 FONTVALUE Section states, in part:
WATCOM Script/GML selects the fonts being used in the document. For each of the selected fonts, the fontvalue section is evaluated. Device functions, such %default_width, will return the values appropriate for the selected font.
By using distinctive values for their font_out_name1 attributes and placing %image(%font_outname1()) in the :FONTVALUE blocks, it is possible to be a bit more precise about which fonts are "selected":
- each :DEFAULTFONT;
- the font named for use in the :UNDERSCORE block, if any; and
- the font named for use in the :BOX block, if any.
As discussed here, not only is the :UNDERSCORE block itself entirely optional, but the value of its attribute font may be a font number or an empty string. The :FONTVALUE blocks, however, are only interpreted using the font for the :UNDERSCORE block if a font name is specified.
As discussed here, the :BOX block, which is mandatory, can take either a font name or a font number as the value of its attribute font. The :FONTVALUE blocks, however, are only interpreted using the font for the :BOX block if a font name is specified.
Using the same font name multiple times showed, first, that the :FONTVALUE blocks are interpreted for each :DEFAULTFONT block, even if the same font name is used in more than one :DEFAULTFONT block. This results in multiple instances with the same font's values being found in the output file.
Using the same font name in an :UNDERSCORE block or the :BOX block as in a :DEFAULTFONT block will cause the :FONTVALUE blocks to be interpreted multiple times for that font, with multiple instances appearing in the file. So long as the font names used in the :UNDERSCORE block or the :BOX block are different from each other, this will be done with each of them separately.
Using the same font name in both the :UNDERSCORE block and the :BOX block causes the :FONTVALUE blocks to be interpreted only once for that font name for those two blocks, rather than twice. However, if the same font name is also used in one or more :DEFAULTFONT blocks, then multiple instances will still be found in the output file; all but one of them come from the :DEFAULTFONT blocks.
Modifying the :FONTVALUE blocks to display the value returned by device function %font_number() showed that, if the :BOX and :UNDERSCORE blocks used different font names and if the :DEFAULTFONT blocks were numbered from "0" through "5", then the :UNDERSCORE block's font was associated with a value of "6" and the :BOX block's font was associated with a value of "7". When both the :BOX and :UNDERSCORE blocks used the same name, it was associated with a value of "6". So, the "selected fonts" consist of the :DEFAULTFONT blocks defined in the :DEVICE block plus a generated :DEFAULTFONT block for each distinct font name used with the :BOX or :UNDERSCORE block (if any). In the course of investigating other topics, it became apparent that the font style used for these generated :DEFAULTFONT blocks is "plain".
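The font numbering observed above can be summarized in a short C sketch. It assumes that each font attribute has already been classified, so that only a non-empty font name is passed as a string (a font number or an empty string is passed as NULL or ""); `FontAssignment` and `assign_generated_fonts` are hypothetical names. The case where only the :BOX block names a font was not tested; the sketch simply gives that font the first free number.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result: the font numbers given to the generated
 * :DEFAULTFONT instances, or -1 where none was generated. */
typedef struct FontAssignment {
    int underscore_font;  /* font number used for the :UNDERSCORE font */
    int box_font;         /* font number used for the :BOX font */
    int total_fonts;      /* :DEFAULTFONT blocks plus generated ones */
} FontAssignment;

/* Only a non-empty string counts as a font name. */
static int is_font_name(const char *name)
{
    return name != NULL && name[0] != '\0';
}

FontAssignment assign_generated_fonts(int defaultfont_count,
                                      const char *underscore_name,
                                      const char *box_name)
{
    FontAssignment a = { -1, -1, defaultfont_count };

    /* A named :UNDERSCORE font gets the first free font number. */
    if (is_font_name(underscore_name)) {
        a.underscore_font = a.total_fonts++;
    }
    /* A named :BOX font gets the next number, unless it has the same
     * name as the :UNDERSCORE font, in which case that number is reused. */
    if (is_font_name(box_name)) {
        if (a.underscore_font >= 0 && strcmp(box_name, underscore_name) == 0) {
            a.box_font = a.underscore_font;
        } else {
            a.box_font = a.total_fonts++;
        }
    }
    return a;
}
```

With six :DEFAULTFONT blocks (0 through 5), distinct names yield 6 and 7, and a shared name yields 6 for both, matching the observations.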
As the reference to :FONTVALUE blocks might suggest, the discussion so far reflects an artificiality in that it shows a specific order of :VALUE and :FONTVALUE blocks. As discussed here, these blocks may occur within an :INIT block in any order and in any number, and will be interpreted in the order in which they appear in the :INIT block. Henceforth, each :INIT block will be treated as an opaque unit.
The order of events on startup can be summarized as:
- Extract the information for the specified device from the binary device library.
- Interpret the START :PAUSE block.
- Interpret the START :INIT block.
- Find and open the document specification.
- Find and open any layout files, whether included from within the document specification or with the command line option LAYOUT.
- Process the layout.
- Begin formatting the document.
- Interpret the DOCUMENT :PAUSE block.
- Interpret the DOCUMENT :INIT block.
- Perform an implicit %enterfont(0) invocation.
The last step occurs whether device function %enterfont(0) is present or not. See %enterfont(0) for more information on this. Also see this section for additional items that appear in conjunction with %enterfont(0) and afterwards for which no good explanation exists.
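The startup order summarized above can be written as a fixed table of steps driven in sequence. This is a sketch only: the step labels, `run_startup`, and the example callback `fail_on_spec` are hypothetical, and the real work of each step is left to the callback. The callback shows how a document specification that cannot be opened stops the sequence right after the START :INIT block, which is consistent with the output file observed for the invalid specification.

```c
#include <assert.h>
#include <string.h>

/* Startup steps, in the order observed for wgml 4.0. */
static const char *const startup_steps[] = {
    "load device from binary device library",
    "START :PAUSE",
    "START :INIT",
    "open document specification",
    "open layout files",
    "process layout",
    "begin formatting",
    "DOCUMENT :PAUSE",
    "DOCUMENT :INIT",
    "implicit %enterfont(0)",
};

#define STARTUP_STEP_COUNT ((int)(sizeof startup_steps / sizeof startup_steps[0]))

/* Hypothetical driver: hands each step to do_step in order and stops at
 * the first step that fails. Returns the index of the failing step, or
 * STARTUP_STEP_COUNT if every step succeeded. */
int run_startup(int (*do_step)(const char *step, void *ctx), void *ctx)
{
    for (int i = 0; i < STARTUP_STEP_COUNT; i++) {
        if (!do_step(startup_steps[i], ctx)) {
            return i;
        }
    }
    return STARTUP_STEP_COUNT;
}

/* Example callback: counts the steps attempted (ctx points to an int)
 * and simulates a document specification that cannot be opened. */
int fail_on_spec(const char *step, void *ctx)
{
    ++*(int *)ctx;
    return strcmp(step, "open document specification") != 0;
}
```

For example, `run_startup(fail_on_spec, &n)` with `n` starting at 0 returns 3 (the index of the failing step) and leaves `n` at 4.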
Possible Future Research
The WGML Reference discusses the various values of the various place attributes in terms of when the function blocks are interpreted, and the terminology poses some questions that may need to be looked at in the future. Consider this table, where the third column summarizes the event which causes the corresponding block to be interpreted:
Block | Place | Event
:INIT | START | wgml starts processing the input source
:INIT | DOCUMENT | wgml starts processing a document
:FINISH | DOCUMENT | wgml finishes processing a document
:FINISH | END | wgml finishes processing the input source
:PAUSE | START | wgml begins processing the source input
:PAUSE | DOCUMENT | wgml begins processing the document text
:PAUSE | DOCUMENT_PAGE | the beginning of each document page
:PAUSE | DEVICE_PAGE | wgml begins a new page on the output device
The last two lines will be discussed below.
It is an open question whether or not "the input source" and "the source input" refer to the same concept; it is quite likely that they do, although, technically, the first would refer to the stream from which the input is received, and the latter to the input itself.
It is also an open question whether or not "a document" and "the document text" refer to the same concept; this case is less clear, since "a document" might refer to the "document specification", of which the "document text" is but a part. Unless, of course, what is meant by "the document text" is "the document specification text".
Whether it is worthwhile to investigate these questions is anybody's guess at this point.
A more interesting question is the distinction between "the input source/a source input" and "a document/the document text".
As far as the :PAUSE and :INIT blocks are concerned, the sequencing above suggests a straightforward interpretation: "input" includes everything wgml 4.0 uses to produce the document, including all command line options, while "document" refers to a subset of the "input", that is, the document specification itself.
And it has long been known (see the discussion toward the end of this section) that, if an END :FINISH block is present, any DOCUMENT :FINISH block will be ignored by wgml 4.0, which implies that these two events occur at the same time.
So, since the order of events is known, it is not clear that this question needs to be further examined either. Only time will tell how important it is to investigate these issues.
Turning now to DOCUMENT_PAGE and DEVICE_PAGE, the distinction between "document page" and "device page" is also a question that may or may not require further study. The WGML Reference defines a document page in this way in Section 15.10.2.1 PLACE Attribute:
A document page is the amount of output that WATCOM Script/GML formats for a page in the document. The document page may be smaller or larger than the physical page produced by the output device. If the page being printed is both the document page and the device page, the document page pause block takes precedence over the device page pause block.
A good example of this difference can be seen by using device TERM: generally, the document pages are longer than the screens (device pages), and the pauses are written to reflect the difference between starting a new page and continuing the current page.
This was also observed using the very simple document specification described here. Each DEVICE_PAGE or DOCUMENT_PAGE :PAUSE block interpretation was paired with an interpretation of the :NEWPAGE block in the output file. When the header and footer were re-enabled, headers and footers only appeared in conjunction with a DOCUMENT_PAGE :PAUSE block.
A related topic is the apparent ability of device function %flushpage() to produce a footer/header when invoked within an :HTAB block, as noted here. This, of course, raises the possibility that each function block is interpreted on each pass, if only to catch page breaks introduced by device function %flushpage(). Ultimately, these questions lead to an exploration of how wgml 4.0 divides the document into pages, a topic which will have to be explored eventually, but for which no foundation has yet been laid.
Other aspects of the concept of "selected fonts" will need to be explored, such as the use of the command line option FILE to associate font names and font styles with font numbers for which no :DEFAULTFONT block was defined in the :DEVICE block: do these become generated :DEFAULTFONT instances? How do they interact with any such instances resulting from the use of a font name with the :BOX block or :UNDERSCORE block? Other ways of pairing, in effect, :DEVICEFONT instances and :FONTSTYLE instances will also have to be explored for their effect on the number of :DEFAULTFONT instances used for a given document specification.
Outputting Lines
This section attempts to state the order in which the various function blocks are interpreted in the course of outputting a single line of text. At any given time, it should be more-or-less coordinated with the section on applying font styles.
Since our wgml will not only have to emit lines of text but also do so identically to wgml 4.0, the accuracy of this section will need to be constantly improved over the course of the project.
Hypothetical Structs
It has been noted elsewhere that parts of the code written so far necessarily depend on a model of wgml 4.0 which may prove to be, if not actually incorrect, less useful than alternate models which develop in the course of implementing our wgml. Nowhere is that more true than in this section. The structs developed here must not be considered final; they exist to serve two goals:
- To allow a test program to be written that can exercise the line-output function which the research here is expected to produce without having to actually do the work wgml 4.0 does in document formatting and layout; and
- To illustrate the sort of information that our wgml will have to acquire and store, albeit probably not in these exact structs.
That is to say, while the planned line output function will use these structs as input, our wgml may well break this function up and distribute its parts into the parts of wgml that do page and line layout, where the information needed is available as a result of the layout process itself, rather than pack all that information into an actual struct and then pass it to this exact function. Then again, something like these structs may turn out to be something wgml needs in computing the layout anyway. At this point, it is not possible to say.
A line of text to be output will be represented by these structs:
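#include <stdint.h>

/* Declared innermost first so that each type is defined before use. */
typedef struct TextChars {
    int32_t        x_start;      /* position of the first character on the line */
    uint16_t       repetitions;  /* number of times the group is output */
    uint16_t       count;        /* number of characters in the group */
    const uint8_t *chars;        /* the characters; not null-terminated */
} TextChars;

typedef struct TextSegment {
    uint16_t   font;             /* font number (:DEFAULTFONT) */
    uint16_t   count;            /* number of TextChars instances */
    TextChars *text;             /* array of count TextChars instances */
} TextSegment;

typedef struct TextLine {
    int32_t      y_start;        /* line on which the text is to appear */
    uint16_t     count;          /* number of TextSegment instances */
    TextSegment *segment;        /* array of count TextSegment instances */
} TextLine;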
The positional fields are signed in case they need to be used with negative numbers, as for relative positioning or for devices that position the print head relative to the bottom or right side of the page. This may turn out to be unnecessary; however, most 32-bit attribute values are limited to $7FFFFFFF, that is, to the positive values of an int32_t, which suggests that negative values may occur within wgml 4.0 in some contexts.
These fields are presumed to be used in this way:
- TextLine.y_start encodes the position on the page of the line on which the text is to appear. Note the distinction between "the line on which the text is to appear", which is a number, and "the text line", which is the text which is to appear.
- TextLine.count contains the number of TextSegment instances (each of which encodes a segment or text segment).
- TextLine.segment is an array of TextSegment instances, one for each segment in the text line to be output.
- TextSegment.font encodes the font number, which, for now at least, is taken to be a binary :DEFAULTFONT block and so to specify a :FONTSTYLE instance, a :DEVICEFONT instance and (through the :DEVICEFONT instance) a :FONTPAUSE instance and a :FONTSWITCH instance.
- TextSegment.count contains the number of TextChars instances in the text segment.
- TextSegment.text is an array of TextChars instances, one for each group of one or more non-space characters.
- TextChars.x_start encodes the position on the line of the first character in the group of non-space characters.
- TextChars.repetitions contains the number of times the group of non-space characters is to be output.
- TextChars.count contains the number of characters in the group of non-space characters.
- TextChars.chars contains the group of one or more non-space characters.
The TextChars struct is a bit misleading in that it implies that all characters are allowed. At this time, I don't know how true that is, as it depends on what wgml 4.0 accepts as input. However, the reason a count is used instead of a char * to a null-terminated string is that I expect the value usually to be a pointer into a buffer containing (potentially) the entire text of the document, which will not consist of null-terminated strings. In these cases the value of the field repetitions would be "1". This would allow the text to be processed to be separated, at least conceptually, from the structure encoding how it is to be output.
In some cases, the TextChars field chars would point at a single character, such as one of the boxing characters or the underscore character. In this case, the field repetitions might well contain a value larger than "1", indicating that the character is to be printed multiple times.
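As a minimal sketch of how the fields repetitions, count and chars might be consumed together, the function below copies a TextChars group into an output buffer. It assumes that chars points into a larger buffer that is not null-terminated, and that an empty instance has count 0 (chars may then be NULL); `expand_text_chars` is a hypothetical name, and positioning is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct TextChars {
    int32_t        x_start;
    uint16_t       repetitions;
    uint16_t       count;
    const uint8_t *chars;   /* not null-terminated; may be NULL if count is 0 */
} TextChars;

/* Copies the character group into out, repetitions times, and returns
 * the number of bytes written; returns 0 if out is too small (and, of
 * course, for an empty instance). */
size_t expand_text_chars(const TextChars *tc, uint8_t *out, size_t out_size)
{
    size_t total = (size_t)tc->count * tc->repetitions;

    assert(tc->count == 0 || tc->chars != NULL);
    if (total > out_size) {
        return 0;
    }
    for (uint16_t r = 0; r < tc->repetitions; r++) {
        for (uint16_t i = 0; i < tc->count; i++) {
            *out++ = tc->chars[i];
        }
    }
    return total;
}
```

A word taken from the document buffer uses repetitions 1 and a count equal to its length, while a single underscore character with repetitions 5 produces five underscores.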
In the WGML Reference section 9.36 FIG, one of the values for attribute frame is "'character string'". If specified, this string is used to create the rule lines. A brief test confirms that this is used as might be expected: first the pattern is repeated as many times as needed to produce the rule line, and, if the length of the rule line is not a multiple of the length of the character string, then enough characters from the start of the character string are used to finish the rule line.
This could be accomplished with two TextChars instances: the first with the field chars pointing to the start of the character string, the field count containing the length of the character string, and the field repetitions containing the number of times it needs to be printed out, the second with the field count containing the number of characters needed to complete the rule line, and the field repetitions containing the value "1".
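The two-instance approach for a FIG frame string can be sketched as follows. This assumes a monospaced frame string, so that the x_start of the second instance can be computed from a per-character width `char_width`; `make_rule_line` and that parameter are hypothetical. When the rule length is an exact multiple of the string length, only one instance is needed, and when the rule is shorter than the string, only the partial instance is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct TextChars {
    int32_t        x_start;
    uint16_t       repetitions;
    uint16_t       count;
    const uint8_t *chars;
} TextChars;

/* Fills out[] so that the frame string is repeated to exactly rule_len
 * characters. Returns the number of TextChars instances used (1 or 2),
 * or 0 if there is nothing to do. */
int make_rule_line(const uint8_t *pattern, uint16_t pattern_len,
                   uint16_t rule_len, int32_t x_start, int32_t char_width,
                   TextChars out[2])
{
    int used = 0;

    if (pattern_len == 0 || rule_len == 0) {
        return 0;
    }
    uint16_t whole = rule_len / pattern_len;  /* complete copies of the string */
    uint16_t rest  = rule_len % pattern_len;  /* leading characters still needed */

    if (whole > 0) {
        out[used].x_start = x_start;
        out[used].repetitions = whole;
        out[used].count = pattern_len;
        out[used].chars = pattern;
        used++;
    }
    if (rest > 0) {
        /* Starts after the complete copies, using the start of the string. */
        out[used].x_start = x_start + (int32_t)whole * pattern_len * char_width;
        out[used].repetitions = 1;
        out[used].count = rest;
        out[used].chars = pattern;
        used++;
    }
    return used;
}
```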
Positioning Model
This material reflects how the situation appeared at the time it was last updated. It has become clear that additional research will be needed to produce a correct model of how positioning is tracked and directed, that is, a model which produces the same output as produced by wgml 4.0.
The term current position will be used for the location specified by the values returned by the device functions %x_address() and %y_address(). Examination of the test output clearly shows that these values (in particular, the value returned by device function %x_address()) always indicate where the next character to be output will appear. These values are updated by every action that changes the current position.
The term desired position will be used to indicate the position to which the print head needs to be moved. The model for this is:
- The vertical value is given in field TextLine.y_start.
- The horizontal value is given in field TextChars.x_start for each TextChars instance. For each TextSegment instance, the value for the first TextChars instance in that TextSegment is used. For the TextLine as such, the value in the first TextChars instance in the first TextSegment is used.
- Various items are used to position the print head. Two of these, :ABSOLUTEADDRESS and :HTAB, are function blocks; one of them, %dotab(), is a device function; and an ordinary function must exist, at least conceptually, which emits a given number of spaces.
- The items used to position the print head use the current position (that is, the values returned by the device functions %x_address() and %y_address()) to determine where the print head currently is, and the values of fields TextLine.y_start and TextChars.x_start (for the appropriate TextChars instance) to determine the desired position.
- During the execution of an item used to position the print head, the values returned by the device functions %x_address() and %y_address() may not be correct; however, both before and after the print head is positioned, these values are always correct.
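The positioning model can be sketched in C using the structs described above. This is not how wgml 4.0 is known to work: `Position`, `desired_position` and `position_with_spaces` are hypothetical names, x is taken to be 0 for a line with no text, and the space-based item assumes that the distance is a whole number of spaces of width `space_width`. The tracked position is changed only once the move is complete, so it is correct both before and after positioning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal versions of the hypothetical structs described above. */
typedef struct TextChars {
    int32_t        x_start;
    uint16_t       repetitions;
    uint16_t       count;
    const uint8_t *chars;
} TextChars;

typedef struct TextSegment {
    uint16_t   font;
    uint16_t   count;
    TextChars *text;
} TextSegment;

typedef struct TextLine {
    int32_t      y_start;
    uint16_t     count;
    TextSegment *segment;
} TextLine;

/* A position as %x_address() and %y_address() would report it. */
typedef struct Position {
    int32_t x;
    int32_t y;
} Position;

/* Desired position for a whole TextLine: its y_start, and the x_start of
 * the first TextChars in the first TextSegment (0 if there is none). */
Position desired_position(const TextLine *line)
{
    Position p = { 0, line->y_start };

    if (line->count > 0 && line->segment[0].count > 0) {
        p.x = line->segment[0].text[0].x_start;
    }
    return p;
}

/* Hypothetical space-emitting positioning item: moves right to target_x
 * with spaces of width space_width. Returns the number of spaces, or -1
 * (leaving the position unchanged) if spaces cannot make the move. */
int position_with_spaces(Position *current, int32_t target_x, int32_t space_width)
{
    int32_t distance = target_x - current->x;

    if (space_width <= 0 || distance < 0 || distance % space_width != 0) {
        return -1;
    }
    /* Emitting the spaces themselves is omitted from this sketch. */
    current->x = target_x;
    return (int)(distance / space_width);
}
```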
The Device Model
The device model that works best in describing how wgml 4.0 outputs a text line is that of a dot matrix printer. Since the Open Watcom documentation system produces help files with the WHELP device and PDF files with the PS device (in both cases, after some post-processing by other software), the actual output consists of disk files. While the following is probably obvious to everyone, just to be clear, here are the terms used and their meanings when used here:
- The phrase print head refers to the position where the next character output will appear in the final product, that is, to the "current position" as defined above.
- The term pass in this context refers to passes of the print head over the same line on the paper. Here it is used to refer to the processing required specifically for each such pass.
The terminology will be regularized eventually but, for now, a certain flexibility will be noticed. For example, rather than being a synonym of "print head", "current position" will probably become a shortened form of "current position of the print head".
The Sequence
The sequence, at this point, appears to be:
- If no :ABSOLUTEADDRESS block is available, use the :NEWLINE block(s) available to position the print head to the start of the correct line for the current pass and update the current position.
- Identify the segments which have a :LINEPROC for the current pass.
- Merge segments as appropriate.
- Apply the font style to each segment in turn.
Identifying The Segments
For each pass, the first step is to identify the text segments which are to be included in that pass. It should be kept in mind that the :LINEPROC blocks are numbered from "1" to the number of passes required by the enclosing :FONTSTYLE block: if a :FONTSTYLE has any :LINEPROCs, it has a :LINEPROC for pass 1.
For the first pass, every text segment is considered to have a :LINEPROC, either one which it defines, or, if the implicit %textpass() is to be used, this :LINEPROC:
:LINEPROC
pass=1
:STARTVALUE
%textpass()
:eSTARTVALUE
:eLINEPROC
For the remaining passes, only those TextSegment instances associated with a :FONTSTYLE block with a :LINEPROC for that pass are included.
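Selecting the segments for a pass reduces to a simple predicate on each segment's font style. The sketch assumes that a font style is summarized by its number of :LINEPROC blocks (0 meaning that the implicit %textpass() :LINEPROC is used) and that these are numbered contiguously from 1; `FontStyle`, `has_lineproc` and `select_segments` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a :FONTSTYLE block. */
typedef struct FontStyle {
    const char *name;
    uint16_t    lineproc_count;  /* number of :LINEPROC blocks; 0 if none */
} FontStyle;

/* Every segment takes part in pass 1 (explicitly or through the implicit
 * %textpass() :LINEPROC); later passes need a :LINEPROC for that pass. */
int has_lineproc(const FontStyle *style, uint16_t pass)
{
    return pass == 1 || (pass > 1 && pass <= style->lineproc_count);
}

/* Writes to selected[] the indexes of the segments (given here only by
 * their font styles) included in the pass; returns how many there are. */
size_t select_segments(const FontStyle *const *segment_styles, size_t n,
                       uint16_t pass, size_t *selected)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (has_lineproc(segment_styles[i], pass)) {
            selected[count++] = i;
        }
    }
    return count;
}
```

For a plain-bold-plain line where "bold" has two :LINEPROC blocks, pass 1 selects all three segments and pass 2 selects only the "bold" one.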
Merging Segments
While researching the font switch sequence, an interesting phenomenon was discovered: two segments, with the same font but different font styles ("plain" and "bold"), were merged and treated as a single segment: no font switch, no :FONTSTYLE block :STARTVALUE block.
Both font styles, on their first pass, just did a %textpass() ("bold" doing it explicitly, "plain" doing it implicitly, as discussed here). What needs to be examined is whether or not this only applies to :LINEPROC block :STARTVALUE blocks, whether or not this only applies to function blocks which do nothing but %textpass(), and whether or not the function of the lp_flag (since it is set for :LINEPROC block :STARTVALUE blocks containing only %textpass()) has been discovered.
For now, each pass is presumed to begin with merging segments as appropriate. This will be clarified further.
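Because the merging rule is still unresolved, the following sketch encodes only one hypothesis: adjacent segments are merged when they share a device font and the :LINEPROC block :STARTVALUE blocks for the current pass contain nothing but %textpass() (which is what the lp_flag may indicate). `SegmentInfo`, its fields and `merge_segments` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment information for one pass. */
typedef struct SegmentInfo {
    uint16_t device_font;    /* :DEVICEFONT shared via the :DEFAULTFONT */
    int      textpass_only;  /* :LINEPROC :STARTVALUE is only %textpass() */
    size_t   chars_count;    /* number of TextChars instances */
} SegmentInfo;

/* Merges runs of adjacent mergeable segments in place, so that no font
 * switch or :FONTSTYLE :STARTVALUE occurs between them. Returns the new
 * number of segments. */
size_t merge_segments(SegmentInfo *seg, size_t n)
{
    size_t out = 0;

    if (n == 0) {
        return 0;
    }
    assert(seg != NULL);
    for (size_t i = 1; i < n; i++) {
        SegmentInfo *last = &seg[out];

        if (last->device_font == seg[i].device_font
            && last->textpass_only && seg[i].textpass_only) {
            last->chars_count += seg[i].chars_count;  /* absorb segment i */
        } else {
            seg[++out] = seg[i];                      /* keep it separate */
        }
    }
    return out + 1;
}
```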
Notes on :NEWLINE
When a :NEWLINE block with the value "2" for attribute advance was provided, it was used between the paragraphs (the default layout presumably specifies this). This both confirms that wgml 4.0 will make use of the various :NEWLINE blocks and that positioning the print head for the next paragraph is part of this sequence, controlled by the value of TextLine.y_start.
Every :NEWLINE block is presumed to return the print head to its leftmost position. As a result, after any :NEWLINE block has been interpreted, the value returned by the device function %x_address() must be set to "0" to reflect the true position of the print head. However, the test files show that, on pass 1, the value returned by the device function %x_address() is already "0". Thus, on the first pass, these rules appear to apply:
- Some combination of :NEWLINE blocks will be used, as needed, to move the print head from the current position to the leftmost position of the line specified in TextLine.y_start.
- If there is no need to change the print head position (the current position and the desired position are identical), then no :NEWLINE block is interpreted at all.
On the first line of text (whether of the body or of an :H0. heading), the test file output shows no :NEWLINE block at all, since the use of the :NEWLINE blocks discussed here does indeed leave the device at the start of the correct line for the first line of text output.
On the second or subsequent pass, the value returned by the device function %x_address() is not "0" but rather the position after the last character output on the current line. In this case, a :NEWLINE 0 is used if available to reposition the print head to the start of the current line. If a :NEWLINE 0 is not provided, then the required :NEWLINE 1 will be used, and the value returned by the device function %x_address() is changed to "0" subsequently.
Interpreting what is happening here is not easy: on the first pass, the value returned by the device function %x_address() is "0" within the :LINEPROC block, but on the second pass it is not "0" until the next block is interpreted. Indeed, a more detailed examination of when these values change will eventually be needed.
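The :NEWLINE rules can be sketched as a planning function. The assumptions are significant: vertical positions are counted in lines, the blocks are chosen greedily (the largest advance that does not overshoot), which has not been verified against wgml 4.0, and moving up the page is not handled. `plan_newlines` is a hypothetical name. Because :NEWLINE 1 is required, the greedy loop always completes when the desired line is below the current line.

```c
#include <assert.h>
#include <stddef.h>

/* available[] lists the advance values of the defined :NEWLINE blocks
 * (for example 0, 1, 2). Returns the number of advances written to
 * plan[], or -1 if the move cannot be planned. */
int plan_newlines(int current_line, int current_x, int desired_line,
                  const int *available, size_t n_available,
                  int *plan, size_t plan_size)
{
    size_t used = 0;
    int remaining = desired_line - current_line;

    if (remaining < 0) {
        return -1;             /* moving up is outside this sketch */
    }
    if (remaining == 0) {
        if (current_x == 0) {
            return 0;          /* already there: no :NEWLINE at all */
        }
        if (plan_size == 0) {
            return -1;
        }
        /* Later pass: use :NEWLINE 0 if defined, else the required :NEWLINE 1. */
        plan[0] = 1;
        for (size_t i = 0; i < n_available; i++) {
            if (available[i] == 0) {
                plan[0] = 0;
            }
        }
        return 1;
    }
    while (remaining > 0) {
        int best = 0;

        for (size_t i = 0; i < n_available; i++) {
            if (available[i] > best && available[i] <= remaining) {
                best = available[i];
            }
        }
        if (best == 0 || used == plan_size) {
            return -1;
        }
        plan[used++] = best;
        remaining -= best;
    }
    return (int)used;
}
```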
Future Research
Using the default layout, which produces a single-spaced document, the line numbers, as reported by device function %y_address() were successive multiples of "2" whether :ABSOLUTEADDRESS was available or not. When the layout was altered to specify a double-spaced document, the first non-zero line number was still "2", but after that they were incremented by "4". Curiously, the number of lines skipped over by the various :NEWLINE blocks was correct: one for single-spaced, two for double-spaced. An :ABSOLUTEADDRESS block written to use this value as-is would seem to have incorrect line spacing. This may depend on the test devices, that is, it may be an artifact that does not affect real devices.
The value of the :DEVICE block attribute vertical_base_units was "6"; the value of the :FONT block attribute line_height was "1": could wgml 4.0 be trying to fit three lines into one inch?
Applying Font Styles
This section deals with the sequence used by wgml 4.0 to output each segment for the current pass. The discussion of the :FONTSTYLE block found here deals with how the block is used.
In researching this topic, the :FONTSTYLE block :LINEPROC blocks were set up to indicate when the various sub-blocks were interpreted. These indicators started and ended with a : character and included a two-letter abbreviation of the sub-block name, the pass number, and a one to three letter abbreviation of the font style name. These indicators made it very clear where each block of each pass of each style was being interpreted.
The Single TextSegment Sequence
This sequence describes how to output a single TextSegment instance. The sequence appears, at present, to be:
- If a font switch is required, do one. If not, interpret the :FONTSTYLE block :STARTVALUE block.
- Interpret the :LINEPROC block :STARTVALUE block.
- Interpret the :LINEPROC block :FIRSTWORD block.
- Process each TextChars instance.
- Interpret the :LINEPROC block :ENDVALUE block.
If any of the blocks interpreted in the first three steps (including all blocks interpreted as part of a font switch) include device function %dotab() and certain other conditions are met, the initial horizontal positioning will be done during that step.
At this level, a font switch is "required" whenever the font number for the segment to be processed differs from the current font number. The font switch sequence determines whether an actual "font switch", in the sense of actually interpreting any :FONTSWITCH block sub-blocks, is needed.
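The single TextSegment sequence can be traced with a small sketch in which interpreting a block simply appends its name to a buffer. The names `Trace`, `interpret` and `output_segment` are hypothetical, the whole font switch sequence is collapsed into one trace entry, and the %dotab() positioning case is not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical trace buffer standing in for block interpretation. */
typedef struct Trace {
    char buf[256];
} Trace;

static void interpret(Trace *t, const char *block)
{
    size_t len = strlen(t->buf);

    snprintf(t->buf + len, sizeof t->buf - len, "%s;", block);
}

/* Outputs one TextSegment with chars_count TextChars instances. A font
 * switch is required when the segment's font differs from the current
 * font; otherwise the :FONTSTYLE :STARTVALUE block is interpreted. */
void output_segment(Trace *t, int *current_font, int segment_font, int chars_count)
{
    if (segment_font != *current_font) {
        interpret(t, "font switch");
        *current_font = segment_font;
    } else {
        interpret(t, ":FONTSTYLE :STARTVALUE");
    }
    interpret(t, ":LINEPROC :STARTVALUE");
    interpret(t, ":LINEPROC :FIRSTWORD");
    for (int i = 0; i < chars_count; i++) {
        interpret(t, "TextChars");
    }
    interpret(t, ":LINEPROC :ENDVALUE");
}
```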
The TextChars Sequence: One TextSegment
When the TextLine contains exactly one TextSegment, then for each TextChars instance, the sequence is:
- The :LINEPROC block :STARTWORD block is interpreted. If this block includes device function %dotab(), then the horizontal positioning will occur during the interpretation of this block.
- The print head is positioned horizontally to the first non-space character in the current TextChars instance (if an explicit or implicit %textpass() was present in the :LINEPROC block :STARTVALUE block and it has not previously been done as a result of device function %dotab() being encountered in certain prior blocks).
- The text pointed to by the current TextChars instance is then printed out (if an explicit or implicit %textpass() was present in the :LINEPROC block :STARTVALUE block).
- The :LINEPROC block :ENDWORD block is interpreted.
When the implicit %textpass() is used, there are no :LINEPROC blocks. The result is entirely consistent with the sequences shown with nothing appearing when the :LINEPROC blocks are interpreted; that is, the text appears but nothing else does.
Testing was done with three :LINEPROC blocks, identical except for the value of n (see the notes on the test setup above). The results showed that each pass was treated identically, except, of course, that the :LINEPROC sub-blocks were from the appropriate pass.
Testing was done using a "plain" :FONTSTYLE block with a :LINEPROC 1 block with no device function %textpass() in its :STARTVALUE block. The results were entirely consistent with the above, except that neither the horizontal positioning nor the text appeared. Each TextChars instance was clearly marked by a :STARTWORD block/:ENDWORD block pair.
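The per-TextChars sequence, including the case without %textpass(), can be traced the same way. The flags stand for conditions described above: whether an explicit or implicit %textpass() is present, whether the :STARTWORD block contains %dotab(), and whether %dotab() in an earlier block already did the initial positioning. How %dotab() interacts with these is only partly known, so that part is an assumption; `output_text_chars` and the other names are hypothetical, and a trace entry for a block that does not exist stands for doing nothing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Trace {
    char buf[256];
} Trace;

static void interpret(Trace *t, const char *block)
{
    size_t len = strlen(t->buf);

    snprintf(t->buf + len, sizeof t->buf - len, "%s;", block);
}

/* Outputs count TextChars instances of a single TextSegment.
 * already_positioned affects only the first instance. */
void output_text_chars(Trace *t, int count, int textpass,
                       int startword_dotab, int already_positioned)
{
    int positioned = already_positioned;

    for (int i = 0; i < count; i++) {
        interpret(t, ":STARTWORD");
        if (textpass && startword_dotab && !positioned) {
            /* Positioning happens while :STARTWORD is interpreted. */
            interpret(t, "position via %dotab()");
            positioned = 1;
        }
        if (textpass && !positioned) {
            interpret(t, "position");
        }
        if (textpass) {
            interpret(t, "text");
        }
        interpret(t, ":ENDWORD");
        positioned = 0;  /* each later instance has its own x_start */
    }
}
```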
The TextChars Sequence: Multiple TextSegment Instances
When a test file using :HP1. tags, with :DEFAULTFONT 0 using font style "plain" and :DEFAULTFONT 1 using font style "bold", was processed by wgml 4.0, different patterns of multi-TextSegment TextLines were generated. The font style "bold" used was the overprint "bold" :FONTSTYLE block with the test markers described above, so that the position of each :LINEPROC sub-block could be verified.
The sequence given above still applies; however, examination of these lines revealed some interesting additional features:
- it is possible for a TextChars instance to contain only horizontal positioning information;
- the initial horizontal positioning on pass 1 usually occurs in a manner requiring knowledge of more than one TextLine;
- the :FONTSTYLE block :ENDVALUE block is sometimes, but not always, emitted during the second pass outside of the font switch process;
- the :LINEPROC block :STARTWORD block is sometimes, but by no means always, not interpreted before the current TextChars instance. This, of course, modifies the first step of the sequence given above.
Empty TextChars Instances
When an :HP1. phrase begins (as is only natural) with a non-space character, then the spacing before that character becomes a TextChars instance belonging to the previous TextSegment. This TextChars instance points to no text: only the field x_start contains valid information. Presumably, the field count contains "0" and the field chars is a NULL pointer. This produces one or more spaces surrounded by the :STARTWORD and :ENDWORD blocks (all observed instances were the second or subsequent TextChars instance in the TextSegment instance, so spaces were used without regard to the availability of an :ABSOLUTEADDRESS block or an :HTAB block).
In terms of the model being used, this can be presumed to be done by the page layout code of wgml 4.0 and presented to the text output code with the empty TextChars instances already present in the TextSegment instances affected.
Initial Horizontal Positioning
By "initial horizontal positioning" is meant the use of an :ABSOLUTEADDRESS block, an :HTAB block, or spaces to skip over the left margin plus any indentation and position the print head properly for the first non-space character of the first TextChars instance of the first TextSegment whose associated :FONTSTYLE block has a :LINEPROC defined for the current pass.
During testing, it appeared that this was sometimes done using a different :DEFAULTFONT instance from that of the first TextSegment (for the first pass) or the first TextSegment using font style "bold" (for the second pass). Further investigation, including the use of "uscore" (for variety) on the very first word of text, suggests that these are the rules:
- The first text line uses, on the first pass, the :DEFAULTFONT of the first TextSegment.
- All subsequent text lines use, on the first pass, the :DEFAULTFONT of the last TextSegment in the preceding TextLine.
- All text lines use, on the second pass, the :DEFAULTFONT of the first TextSegment which uses a font style which has a :LINEPROC defined for that pass.
In terms of the model being used, the correct :DEFAULTFONT to use can be presumed to be determined by the page layout code of wgml 4.0 (which, after all, deals with text lines not passes and so "sees" all the TextSegments, not just those with :LINEPROC blocks for a particular pass) and presented to the text output code with the appropriate initial TextSegment instance already present in the TextLine provided.
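The three rules can be written as a small selection function. This is a C sketch only: the text_segment type, its has_lineproc array (limited to three passes here) and the function name are hypothetical, and -1 is used, as an assumption, both for "no preceding line" and for "no segment has a :LINEPROC for this pass".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal view of a TextSegment: only what the rules need. */
typedef struct {
    unsigned int font;          /* :DEFAULTFONT number */
    bool         has_lineproc[4]; /* [p]: style defines :LINEPROC p (index 0 unused) */
} text_segment;

/* Returns the :DEFAULTFONT used for the initial horizontal positioning.
 *   pass:           1, 2 or 3
 *   segs, nsegs:    the TextSegments of the current TextLine
 *   prev_last_font: :DEFAULTFONT of the last TextSegment of the preceding
 *                   TextLine, or -1 for the first text line. */
static int initial_position_font(int pass, const text_segment *segs,
                                 size_t nsegs, int prev_last_font)
{
    assert(pass >= 1 && pass < 4 && nsegs > 0);
    if (pass == 1) {
        /* First line: first segment; later lines: last segment of the prior line. */
        return prev_last_font < 0 ? (int)segs[0].font : prev_last_font;
    }
    /* Later passes: first segment whose font style has a :LINEPROC for this pass. */
    for (size_t i = 0; i < nsegs; i++) {
        if (segs[i].has_lineproc[pass])
            return (int)segs[i].font;
    }
    return -1; /* nothing is output on this pass */
}
```

Because the pass-1 rule needs the previous TextLine, the sketch takes prev_last_font as a parameter, which matches the presumption that the page layout code resolves this before the text output code runs.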
The :FONTSTYLE Block :ENDVALUE Block
As discussed here, this block never appeared in the output file produced by wgml 4.0 from a text file using only font style "plain", but, when the text file using both "plain" and "bold" was used, it appeared in several contexts in addition to a font switch:
- In a line with plain-bold-plain TextSegments, on the second pass, that is, when the TextSegment using "bold", while the last to produce output on that pass, was not the last TextSegment in the TextLine. In this case, the :LINEPROC block :ENDVALUE block appeared next, followed by the normal sequence for the next text line, including a complete font switch and so another interpretation of the same :FONTSTYLE block :ENDVALUE block (but not of the :LINEPROC block :ENDVALUE block).
- In a line with bold-plain-bold TextSegments, on the second pass, after the first (but not the second) TextSegment using font style "bold"; the :LINEPROC block :ENDVALUE block did not appear. The :FONTSTYLE block :STARTVALUE block appeared next (as per the above procedure: since both are :DEFAULTFONT 1 no font switch is needed), followed by the second TextSegment using font style "bold".
- In the subsequent line, which was bold-plain with the TextSegment using "bold" being, in the source, part of the same :HP1. phrase as the last TextSegment of the preceding line, the TextSegment using font style "bold" was followed, on the second pass, by the :FONTSTYLE block :ENDVALUE block, and then the :LINEPROC block :ENDVALUE block. As above, what followed was the next text line being output.
If the second example is considered, then this makes a certain amount of sense, and was what suggested that the :FONTSTYLE block :STARTVALUE block and :ENDVALUE block might be an ON/OFF switch and that it is the omission of the :ENDVALUE block which might be the anomaly. The :LINEPROC block :ENDVALUE block is equally inexplicable under either interpretation.
Implementation Notes
There appear to be three choices in implementing this behavior:
- Implement the behavior found: the :FONTSTYLE block :ENDVALUE block is translated at the end of each TextSegment except when that TextSegment is the last TextSegment in the TextLine provided for processing.
- Restrict the translation of the :FONTSTYLE block :ENDVALUE block to the font switch sequence.
- Enhance the first step in this sequence so that, when a font switch is not required, the :FONTSTYLE block :ENDVALUE block of the former font style is interpreted before the :FONTSTYLE block :STARTVALUE block of the new font style. The idea here is to make these blocks function as an ON/OFF switch; this would also require the :FONTSTYLE block :ENDVALUE block to be interpreted immediately before the :FINISH block.
Very few :FONTSTYLE blocks use the :FONTSTYLE block :STARTVALUE and :ENDVALUE blocks, and none of them are used in the Open Watcom document build system. So, in a sense, it does not matter which implementation is adopted, as long as our wgml is only used within the Open Watcom build process. If it is released more widely, it might matter. The available examples all appear to expect these blocks to function as an ON/OFF switch, so it might be reasonable to adopt the third option (that is, to treat the non-pairing of these blocks as a wgml 4.0 bug to be fixed in our wgml) and then reconsider if problems are reported by users.
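The third option amounts to a one-bit state machine. The C sketch below is an assumption, not wgml 4.0 behavior: fontstyle_start() is called wherever a :FONTSTYLE block :STARTVALUE block would be interpreted (switch-to, or a new TextSegment with no font switch), and fontstyle_end() wherever an :ENDVALUE block would be (switch-from, or just before :FINISH). The names are hypothetical, and "S" and "E" stand in for actually interpreting the blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical ON/OFF tracker for the :FONTSTYLE :STARTVALUE/:ENDVALUE pair. */
typedef struct {
    bool on;       /* a :STARTVALUE was interpreted without its :ENDVALUE */
    char log[64];  /* "S"/"E" record of interpretations, for illustration */
} style_switch;

static void log_block(style_switch *s, const char *b)
{
    strncat(s->log, b, sizeof s->log - strlen(s->log) - 1);
}

/* Interpret :STARTVALUE, first closing a style left on (option 3). */
static void fontstyle_start(style_switch *s)
{
    if (s->on)
        log_block(s, "E");
    log_block(s, "S");
    s->on = true;
}

/* Interpret :ENDVALUE only if the style is currently on. */
static void fontstyle_end(style_switch *s)
{
    if (s->on) {
        log_block(s, "E");
        s->on = false;
    }
}
```

For the bold-plain-bold second pass, two consecutive fontstyle_start() calls followed by fontstyle_end() give "SESE", so every :STARTVALUE block is paired with an :ENDVALUE block. Options 1 and 2 need no state at all.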
The :LINEPROC Block :STARTWORD Block
Here we have a possible bug in wgml 4.0: the :STARTWORD block is not interpreted in some contexts where it is expected to be. Some examples of these contexts follow. Note that the :ENDWORD block always appeared after the text (or, if there was no text, after the positioning) for the given TextChars instance.
For a line with the pattern plain-bold-plain, this was observed:
- On the first pass, the first TextChars instance in the TextSegment using font style "bold" and the first TextChars instance in the second (but not the first) TextSegment using font style "plain" were not preceded by :STARTWORD.
- On the second pass, the first TextChars instance in the TextSegment using font style "bold" was not preceded by :STARTWORD.
For a line with the pattern bold-plain-bold, this was observed:
- On the first pass, the initial horizontal positioning was not preceded by :STARTWORD.
- On the first pass, for the remaining TextSegment instances, whether using font style "plain" or font style "bold", the first TextChars instance (only) was not preceded by :STARTWORD.
- On the second pass, the horizontal positioning between the two TextSegments using font style "bold" was not preceded by :STARTWORD.
In the source code, the second "bold" TextSegment was actually the first word of a two-word :HP1. block. The next text line started with the second word in this block, followed by a single "plain" TextSegment. The result was interesting:
- On the first pass, the first TextChars instance of the TextSegment using font style "plain" was not preceded by :STARTWORD.
- On the second pass, the initial horizontal positioning was not preceded by :STARTWORD.
Instances where :STARTWORD does appear included:
- Before the second or subsequent TextChars instance in any TextSegment of any TextLine on any pass.
- On the first pass of the plain-bold-plain line, before the initial horizontal positioning.
- On the second pass of the bold-plain-bold line, before the initial horizontal positioning.
- On the first pass of the bold-plain line, before the initial horizontal positioning.
Testing to see whether this is a bug, that is, whether or not a device can successfully use an output file produced by wgml 4.0 when :STARTWORD/:ENDWORD is being used as an ON/OFF switch, will have to be done eventually.
Implementation Notes
If this is a bug, our wgml should fix it. Even if it is not a bug, our wgml should probably be written to always use :STARTWORD at the start of each TextChars instance, unless this causes problems with the Open Watcom documentation system.
Of course, if wgml is released more widely and this causes problems for users, then the behavior of our wgml can be altered as needed -- unless this is a bug which we have fixed, in which case a good explanation of why it makes sense to behave as wgml 4.0 does would be needed.
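The behavior proposed for our wgml can be sketched as follows. This is a hypothetical C illustration: out_buf, put() and emit_text_chars() are invented names, "<SW>" and "<EW>" stand in for interpreting :STARTWORD and :ENDWORD, and positioning is shown as plain spaces for simplicity.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical output buffer standing in for the output file. */
typedef struct {
    char   buf[256];
    size_t len;
} out_buf;

static void put(out_buf *o, const char *s, size_t n)
{
    assert(o->len + n < sizeof o->buf);
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

/* Emit one TextChars instance; unlike wgml 4.0, :STARTWORD is never skipped. */
static void emit_text_chars(out_buf *o, unsigned int spaces,
                            const char *text, size_t count)
{
    put(o, "<SW>", 4);                 /* :STARTWORD, every time */
    for (unsigned int i = 0; i < spaces; i++)
        put(o, " ", 1);                /* horizontal positioning */
    if (count > 0)
        put(o, text, count);           /* the characters, if any */
    put(o, "<EW>", 4);                 /* :ENDWORD, as wgml 4.0 always does */
}
```

A positioning-only (empty) instance still produces a balanced "<SW> <EW>" pair, so a device that treats the two blocks as an ON/OFF switch never sees an unmatched :ENDWORD.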
Supplemental Tests
Testing was done with font style "bold" modified to use three :LINEPROC blocks, identical except for the value of n (see the notes on the test setup above). The results showed that the third pass was treated identically to the second pass, except, of course, that the :LINEPROC sub-blocks were from the appropriate pass. Thus, wherever "second pass" is mentioned, the same remarks can be taken to apply to all subsequent passes as well.
Testing was done using a "plain" :FONTSTYLE block with a :LINEPROC 1 block with no device function %textpass() in its :STARTVALUE block. This affected the first pass only: the TextSegment instances using font style "plain" were processed normally, except that neither the horizontal positioning nor the text appeared. Each TextChars instance was clearly present, marked by the :ENDWORD block (the :STARTWORD block appeared or not as indicated above).
This is, of course, completely different from the behavior on the second pass, where the entire TextSegment using font style "plain" is skipped by either spaces or, if available and more than eight spaces would be needed, the :HTAB block.
Removing device function %textpass() from the first-pass :LINEPROC block, the second-pass :LINEPROC block, or both, of the overprint "bold" :FONTSTYLE block produced the same effect: on each affected pass, the TextSegment instances using font style "bold" were processed normally, except that neither the horizontal positioning nor the text appeared.
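The supplemental results give three ways a TextSegment can be treated on a pass. The C sketch below summarizes them. The enum and function names are hypothetical, and it assumes that whether a :LINEPROC block exists for the pass, and whether its :STARTVALUE block contains %textpass(), are already known.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of one TextSegment on one pass. */
typedef enum {
    SEG_SKIPPED,      /* no :LINEPROC for this pass: stepped over by spaces or :HTAB */
    SEG_MARKERS_ONLY, /* :LINEPROC without %textpass(): word blocks, no positioning or text */
    SEG_NORMAL        /* positioning and text are output */
} seg_handling;

static seg_handling classify_segment(bool has_lineproc, bool has_textpass)
{
    if (!has_lineproc)
        return SEG_SKIPPED;
    return has_textpass ? SEG_NORMAL : SEG_MARKERS_ONLY;
}

/* Are the horizontal positioning and the characters output? */
static bool outputs_text(seg_handling h)
{
    return h == SEG_NORMAL;
}

/* Is each TextChars instance still marked by the :LINEPROC word blocks? */
static bool interprets_word_blocks(seg_handling h)
{
    return h != SEG_SKIPPED;
}
```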
Horizontal Positioning
Horizontal positioning is done using two different procedures: one when it results from the use of device function %dotab() ("dotab-initiated horizontal positioning") and the other when it does not ("non-dotab-initiated horizontal positioning").
Horizontal positioning only occurs when needed. Even when :ABSOLUTEADDRESS (which could presumably be used multiple times without causing a problem) is available, if the current location equals the desired location, no horizontal positioning will occur. The remainder of this section applies when horizontal positioning is needed.
Device function %dotab() will produce horizontal positioning whenever encountered in these blocks:
- the :FONTSTYLE block :STARTVALUE block
- the :LINEPROC block :STARTVALUE block
- the :LINEPROC block :FIRSTWORD block
- the :LINEPROC block :STARTWORD block
Device function %dotab() will produce horizontal positioning in these blocks under certain conditions:
- the :FONTSWITCH block :STARTVALUE block
- the :FONTSWITCH block :ENDVALUE block
- the :FONTSTYLE block :ENDVALUE block
The conditions are not entirely clear at this point. The observed characteristics are:
- this is the first pass of the current line;
- the prior pass was a second pass of the previous line;
- an "extra" instance of the :FONTSTYLE block :ENDVALUE block occurred; and
- the last TextSegment in the prior pass was not the last TextSegment in the first pass of the previous line.
The last two conditions, of course, are, for wgml 4.0, identical: wgml 4.0 interprets the :FONTSTYLE block :ENDVALUE block of the last TextSegment in a second pass precisely when that TextSegment was not the last segment in the first pass of the TextLine.
Device function %dotab() was never observed in the test files to produce horizontal positioning in these blocks, possibly because it was not needed under the conditions tested:
- the :LINEPROC block :ENDWORD block
- the :LINEPROC block :ENDVALUE block
The :LINEPROC block :ENDWORD block does sometimes appear immediately after horizontal positioning (in those cases where the TextChars instance has no non-space characters associated with it); however, since the procedures used for dotab-initiated horizontal positioning and non-dotab-initiated horizontal positioning differ, it was possible to determine that these cases were not produced by device function %dotab().
Horizontal positioning initiated by device function %dotab() appears, at present, to follow these rules:
- If the :ABSOLUTEADDRESS block is available, it is used.
- Otherwise, spaces are used.
Horizontal positioning not initiated by device function %dotab() appears, at present, to follow these rules:
- If the :ABSOLUTEADDRESS block is available, it is used for the initial horizontal positioning (left margin plus indentation).
- Otherwise, if the :HTAB block is available, it is used if the number of spaces needed would be greater than eight.
- Otherwise, spaces are used.
These are quite different. Thus, if the :ABSOLUTEADDRESS block is available, it will be used between TextChars instances if the horizontal positioning results from device function %dotab(); otherwise, it will only be used before the first TextChars instance in the TextLine. It is this difference that made it clear that device function %dotab() in the :LINEPROC block :ENDWORD block was not, in fact, responsible for the horizontal positioning (spaces) immediately before the marker of that block in the file: had %dotab() been responsible, the :ABSOLUTEADDRESS block would have appeared instead of spaces. Also, the :HTAB block, even if available, is never used for horizontal positioning that results from device function %dotab(); otherwise, it is used as indicated above.
In this context, the term "indentation" includes at least these cases:
- The layout specifies an indentation for the given line (for example, the first line of a paragraph may have an indentation specified).
- On the second pass, the horizontal spacing from the left margin to the start of the first TextSegment using a font style which has a :LINEPROC for that pass is treated as an indentation.
Other contexts may exist.
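The two rule sets can be combined into one decision function. This C sketch is hypothetical. In particular, it assumes the caller has already worked out the number of spaces a move would need and whether the move is the initial horizontal positioning (left margin plus indentation, in the sense just described). It encodes the rules as observed so far, not a verified specification.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { POS_NONE, POS_ABSOLUTEADDRESS, POS_HTAB, POS_SPACES } pos_method;

/* Hypothetical summary of which :DEVICE blocks exist. */
typedef struct {
    bool has_absoluteaddress; /* :ABSOLUTEADDRESS block exists */
    bool has_htab;            /* :HTAB block exists */
} device_caps;

/* from, to:  current and desired horizontal positions
 * spaces:    number of spaces the move would need (computed by the caller)
 * via_dotab: the positioning was requested by %dotab()
 * initial:   initial horizontal positioning for the TextLine */
static pos_method choose_positioning(const device_caps *dev, unsigned int from,
                                     unsigned int to, unsigned int spaces,
                                     bool via_dotab, bool initial)
{
    if (from == to)
        return POS_NONE; /* positioning only occurs when needed */
    if (via_dotab)       /* dotab-initiated: :HTAB is never used */
        return dev->has_absoluteaddress ? POS_ABSOLUTEADDRESS : POS_SPACES;
    if (initial && dev->has_absoluteaddress)
        return POS_ABSOLUTEADDRESS;
    if (dev->has_htab && spaces > 8)
        return POS_HTAB;
    return POS_SPACES;
}
```

With both blocks available, a nine-space move between TextChars instances uses :ABSOLUTEADDRESS if %dotab() requested it, but :HTAB otherwise. This is the difference used above to rule out %dotab() in the :ENDWORD block.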
Switching Fonts
Initial observations suggested that the sequence for a font switch would not be hard to determine:
- Interpret the :FONTSWITCH block :ENDVALUE block for the :DEFAULTFONT instance being switched from.
- Interpret the :FONTPAUSE block for the :DEFAULTFONT instance being switched to.
- Interpret the :FONTSWITCH block :STARTVALUE block for the :DEFAULTFONT instance being switched to.
As a result, it was believed that the real issue here would come from the behavior observed when working on the :FONTPAUSE block, where, in certain situations, only the :FONTPAUSE block was interpreted and, in other situations, none of the three function blocks was interpreted, even when all three existed. However, the sequence shown turned out to be incomplete, which became another issue.
The WGML Reference states in part in Section 15.9.11.2 STARTVALUE Section:
When a switch between two fonts is necessary, the startvalue sections of the two fonts are evaluated. The font switch is only performed if the results of the two evaluations are different.
The verb "evaluate" is used in connection with a function block in several places in the WGML Reference (most of them discussing the meaning of the allowed values of the various place attributes, but also in discussing the interpretation of the :FONTVALUE blocks found in :INIT blocks). In these cases, "evaluating the function block" clearly results in the expected text appearing in the output file. Here, however, it appears to mean "save the results of interpreting each :FONTSWITCH :STARTVALUE block in a temporary buffer and compare them".
To investigate the meaning of "evaluate" here, the following items will be varied:
- Same :DEVICEFONT instance (and, therefore, same :FONTSWITCH block and :FONTPAUSE block) versus different :DEVICEFONT instances.
- Same :FONTSTYLE instance versus different :FONTSTYLE instances.
- A :FONTSTYLE instance with no function blocks versus a :FONTSTYLE instance with function blocks present.
- A uniform :FONTSWITCH block :STARTVALUE block versus a variform :FONTSWITCH block :STARTVALUE block.
and here are some notes on those items:
- The :FONTSTYLE is involved because the fonts switched between are :DEFAULTFONTs, which associate a :DEVICEFONT with a :FONTSTYLE. Also, this was a potential area where wgml 4.0 might have used the 21 flags discussed here; since the flags are associated with the :STARTVALUE blocks of both :FONTSWITCH and :FONTSTYLE blocks, including the :FONTSTYLE blocks was considered prudent.
- "Uniform" means that the block produces the same output regardless of which :DEFAULTFONT instance it is used with; "variform" means that the block produces different output depending on which :DEFAULTFONT instance it is used with.
The basic test framework is set up so that each :DEVICEFONT associates a unique :FONTPAUSE and :FONTSWITCH with the font it names, and so that each :DEFAULTFONT associates a different :FONTSTYLE with each of the :DEVICEFONTs named in a :DEFAULTFONT. This was enhanced by having each :FONTPAUSE increment a symbol and having each :FONTPAUSE block and the :STARTVALUE block of each :FONTSWITCH block and each :FONTSTYLE block print it out as an "Instance" number. This allowed the :FONTPAUSE, :FONTSWITCH, and :FONTSTYLE blocks to be associated with each other unambiguously. The file "default.opt" was used to set up each test case, using two FONT options.
The test cases were:
- Test Case A:
( font 0 tfon01 plain 9.0 9.0
( font 1 tfon02 bold 9.0 9.0
which, so far as font name and font style go, is identical to the unaltered test setup.
- Test Case B:
( font 0 tfon01 bold 9.0 9.0
( font 1 tfon02 bold 9.0 9.0
that is, different fonts, the same font style, and the :FONTSTYLE block does contain function blocks.
- Test Case C:
( font 0 tfon01 plain 9.0 9.0
( font 1 tfon02 plain 9.0 9.0
that is, different fonts, the same font style, and the :FONTSTYLE block does not contain function blocks.
- Test Case D:
( font 0 tfon02 plain 9.0 9.0
( font 1 tfon02 bold 9.0 9.0
that is, identical fonts, different font styles.
- Test Case E:
( font 0 tfon02 bold 9.0 9.0
( font 1 tfon02 bold 9.0 9.0
that is, identical fonts, the same font style, and the :FONTSTYLE block does contain function blocks.
- Test Case F:
( font 0 tfon02 plain 9.0 9.0
( font 1 tfon02 plain 9.0 9.0
that is, identical fonts, the same font style, and the :FONTSTYLE block does not contain function blocks.
And the results with uniform :STARTVALUE blocks were:
Test Case | Nr of :FONTPAUSEs | Nr of :FONTSWITCH :STARTVALUEs
A | 11 | 11
B | 11 | 11
C | 9 | 9
D | 11 | 7
E | 11 | 7
F | 9 | 9
For test cases C and F, the "missing" :FONTPAUSE instances are the last two, and they are not present because font style "plain" only requires one pass, while font style "bold" requires two, and the second pass requires a switch first from tfon01 to tfon02 and then a switch back to tfon01 again. The section on outputting entire lines discusses this form of "pass" in more detail in relation to sequencing.
For test cases D and E, the "missing" :FONTSWITCH block :STARTVALUE blocks were those associated with the last four instances, which would have been from tfon02 to tfon02 to print the first pass and then the second pass of the font style "bold".
These results are consistent with the statement above that wgml 4.0 evaluates the :STARTVALUE blocks of both :FONTSWITCH blocks and does not make the switch if they produce identical results. The fact that :FONTPAUSE blocks are interpreted even when the corresponding :FONTSWITCH blocks are not shows that wgml 4.0 considers a font switch to occur, in the sense that triggers the :FONTPAUSE block, even when it does not occur in the sense of interpreting the :FONTSWITCH function blocks.
Since the :STARTVALUE blocks, when interpreted, were producing output like this:
>SW01 Instance: 9 >SW02 Instance: 10
it might be wondered in what sense they were "uniform". However, since the :ENDVALUE block is interpreted before the :FONTPAUSE block, and since both the :ENDVALUE block and the :STARTVALUE block are missing when the font switch, in some sense, does not occur, it seems likely that, at the time the two :STARTVALUE blocks are compared, the Instance number is the same for both blocks and so they would, in fact, have identical outputs. Since the Instance number is taken from a symbol, they would also have identical compiled forms and the 21 flags would be identical since the same device functions would be present.
If the :STARTVALUE block emits, for example, the value returned by %font_outname1(), so that the device can select the proper font, then the same :STARTVALUE block would emit different values for different fonts even though the compiled code and the 21 flags were identical. It would make very little sense for wgml 4.0 to evaluate such a :STARTVALUE block with the same font each time, so it seems likely that wgml 4.0 does, in fact, evaluate each :STARTVALUE block with the font it is associated with. If so, including the result of %font_number() in the output should make the two outputs differ even though the compiled code (and the 21 flags) are identical.
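The hypothesized comparison can be modeled by evaluating each :STARTVALUE block into a buffer, with its own font number, and comparing the results. This is a C sketch under that assumption only. The startvalue_fn type, the two sample blocks and font_switch_needed() are hypothetical stand-ins for interpreting compiled function blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for interpreting a :FONTSWITCH :STARTVALUE block
 * into a buffer instead of the output file. */
typedef void (*startvalue_fn)(char *buf, size_t size, int font_number, int instance);

/* A "uniform" block: prints only the Instance symbol. */
static void sw_uniform(char *buf, size_t size, int font_number, int instance)
{
    (void)font_number;
    snprintf(buf, size, ">SW Instance: %d", instance);
}

/* The same block after adding the result of %font_number() to its output. */
static void sw_with_font(char *buf, size_t size, int font_number, int instance)
{
    snprintf(buf, size, ">SW font %d Instance: %d", font_number, instance);
}

/* Evaluate both blocks, each with its own font, and compare the results. */
static bool font_switch_needed(startvalue_fn from_sv, int from_font,
                               startvalue_fn to_sv, int to_font, int instance)
{
    char a[128], b[128];
    from_sv(a, sizeof a, from_font, instance);
    to_sv(b, sizeof b, to_font, instance);
    return strcmp(a, b) != 0;
}
```

Under this model, test cases D and E skip the :FONTSWITCH blocks with the uniform block but not once %font_number() is output. That is the prediction the enhanced test framework checks.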
At this point the test framework was enhanced in two ways:
- The result of device function %font_number() was output.
- The :ENDVALUE blocks of the :FONTSWITCH and :FONTSTYLE blocks were modified to print out the result of %font_number() and the instance number.
Redoing the last three test cases produced these results:
Test Case | Nr of :FONTPAUSEs | Nr of :FONTSWITCH :STARTVALUEs
D | 11 | 11
E | 11 | 11
F | 9 | 9
which shows that wgml 4.0 does, indeed, identify a difference in the output of a :FONTSWITCH block :STARTVALUE block which is not detectable otherwise.
The output files produced also revealed a few additional bits of information, and the actual sequence of events in switching a font can now be described, starting with a high-level view of the complete procedure:
- Change the value reported by %font_number() to that of the new font.
- Determine whether or not the output of the :STARTVALUE block(s) of the :FONTSWITCH block(s) is identical.
- Perform the switch-from procedure (given below).
- Perform the switch-to procedure (given below).
The "switch-from" and "switch-to" procedures are separated out because device function %enterfont() (explicit or implicit) does the switch-to procedure without a switch-from procedure.
If any of the blocks shown contain device function %dotab(), then the horizontal positioning may occur when they are interpreted, as discussed here.
This is the "switch-from" procedure:
- Interpret the :FONTSTYLE block :ENDVALUE block for the :DEFAULTFONT instance being switched from.
- Interpret the :FONTSWITCH block :ENDVALUE block for the :DEFAULTFONT instance being switched from.
The second step will only occur if the comparison referred to above shows that it is necessary to actually use the :FONTSWITCH block :ENDVALUE block.
This is the "switch-to" procedure:
- Interpret the :FONTPAUSE block for the :DEFAULTFONT instance being switched to.
- Interpret the :FONTSWITCH block :STARTVALUE block for the :DEFAULTFONT instance being switched to.
- Interpret the :FONTSTYLE block :STARTVALUE block for the :DEFAULTFONT instance being switched to.
The second step will only occur if the comparison referred to above shows that it is necessary to actually use the :FONTSWITCH block :STARTVALUE block.
The :FONTSTYLE sub-blocks were included based on these observations:
- When the :FONTSTYLE block :ENDVALUE block is interpreted, the font number used is the number of the :DEFAULTFONT instance being switched to. Thus, it is clearly interpreted after the value returned by device function %font_number() has been updated.
- The :FONTSTYLE block :STARTVALUE block that follows the :FONTSWITCH block :STARTVALUE block as part of the implicit %enterfont(0) following the DOCUMENT :INIT block is followed by another instance of the same :FONTSTYLE block :STARTVALUE block before any actual text output occurs.
At this point, including them in this sequence appears to make more sense than attempting to include them elsewhere.
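The complete font switch, as now understood, can be sketched in C. The names are hypothetical, each logged word stands for interpreting the corresponding function block, and the result of the :STARTVALUE comparison is passed in as startvalues_differ, on the assumption that it has been computed beforehand. switch_to() is kept separate because %enterfont() uses it without a switch-from.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical context: an event log plus the value %font_number() reports. */
typedef struct {
    char log[256];
    int  font_number;
} switch_ctx;

static void interp(switch_ctx *c, const char *block)
{
    strncat(c->log, block, sizeof c->log - strlen(c->log) - 1);
    strncat(c->log, " ", sizeof c->log - strlen(c->log) - 1);
}

static void switch_from(switch_ctx *c, bool switch_needed)
{
    interp(c, "style-end");          /* :FONTSTYLE :ENDVALUE */
    if (switch_needed)
        interp(c, "switch-end");     /* :FONTSWITCH :ENDVALUE */
}

/* Also performed, on its own, by %enterfont(). */
static void switch_to(switch_ctx *c, bool switch_needed)
{
    interp(c, "pause");              /* :FONTPAUSE, always */
    if (switch_needed)
        interp(c, "switch-start");   /* :FONTSWITCH :STARTVALUE */
    interp(c, "style-start");        /* :FONTSTYLE :STARTVALUE */
}

static void switch_font(switch_ctx *c, int new_font, bool startvalues_differ)
{
    c->font_number = new_font;             /* step 1: update %font_number() */
    switch_from(c, startvalues_differ);    /* step 3 (step 2 done by caller) */
    switch_to(c, startvalues_differ);      /* step 4 */
}
```

When the two :STARTVALUE outputs are identical, only the :FONTSTYLE blocks and the :FONTPAUSE block are interpreted. This matches test cases D and E before the framework was enhanced.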

