System Symbol Notes
This page contains notes on specific system symbols or sets of system symbols which came to my attention while working on other topics, which appeared to be worth exploring, and which turned out to be worth recording information on.
As such, it is very much a set of preliminary notes.
Line and Page Numbers
This page originated in the macro topsect, discovered while determining whether the tab stops established with control word .tb used non-default alignment or fill char values, which contained these lines:
.if &syslc. gt 3 .do begin
. .in
. .tb 1 _/&syscl.
. .tb set $
$$
. .tb set
. .tb
.do end
This has the effect of drawing a continuous line of underscores from column 1 to the right boundary of the column (syscl), but only if there are more than three lines left (syslc) on the page. For a one-column page, of course, the value of syscl is the right margin of the page.
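The effect just described can be restated as a minimal Python sketch. It is an illustration only, not how wgml produces the line (wgml fills between tab stops): the function name `topsect_rule` is hypothetical, and the sketch assumes that columns are character cells, so that a line from column 1 to syscl is syscl fill characters long.

```python
def topsect_rule(syslc, syscl, fill="_"):
    """Return the rule line the macro fragment would draw, or None.

    Assumption: columns are character cells, so a line from column 1
    through column syscl consists of syscl fill characters.
    """
    # The macro only draws the line if more than three lines remain.
    if syslc > 3:
        return fill * syscl
    return None


if __name__ == "__main__":
    print(topsect_rule(10, 20))  # enough room: a 20-character rule
    print(topsect_rule(3, 20))   # too close to the page bottom: None
```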
The core of the translation of text from input to output is a function which takes, as its input, a buffer of text and which produces, as its output, lines of output text as they are filled in or as breaks are encountered. There is no easily-predictable relationship between input lines and output lines, so it would seem that syslc would have to be set each time a line is output. However, since macro substitution is done before the resulting text is submitted, it would seem that syslc would not be reliable in our wgml: this appeared to be a potentially serious limitation on how our wgml could process text.
Examination of the list of system set variables in Waterloo SCRIPT revealed several in the same section that dealt with page numbers. Although page numbers should be less of a problem, since a new page is marked by printing the banners, which is where page numbers would generally appear, it is possible for an output paragraph to be split over a page boundary, potentially producing problems where a system variable has the wrong value when processed because the page number changes before the text containing it is printed.
Both concerns turned out to be non-existent: in wgml 4.0, neither syslc nor the page number symbols are handled in a way that would cause the problems expected. Instead, they pose different problems, or, better perhaps, different puzzles.
Page Numbers in Script and WGML
A distinction exists between two sets of page number system symbols:
sysapage syspage sysppage syspn
which can be called the script group, and
syspgnuma syspgnumad syspgnumc syspgnumcd syspgnumr syspgnumrd
which can be called the wgml group.
The difference appears when the :BANREGION block is used:
- When a member of the script group is used in the contents attribute of a :BANREGION block, a GP-fault results.
- When a member of the wgml group is used in the contents attribute of a :BANREGION block, wgml 4.0 replaces it with the current page number in the specified format.
The following remarks apply to both groups and to the value returned by device function %pages() as well.
Line Number Symbols
There are others, but the ones that concern us here are
syslc sysline syslst
The simplest is syslst: this has the value "0" at the start of each pass. It has the value "1" after each output line. However, it is reset to "0" under conditions which are not clear:
- It is not reset after a blank line, even when "script" (which treats blank lines as breaks) is in effect, when .co is "on". With .co "off", results vary with the test file used.
- It is not reset after at least some control words that do not cause a break, even when the rest of the line is empty.
- It is reset at the end of at least some sections (after :eTITLEP, for example).
- It is reset after at least some tags that cause a break, if the rest of the line is empty.
- It is reset after at least some control words that cause a break, if the rest of the line is empty.
- It is not reset after at least some control words that cause a break, even when the rest of the line is empty.
Testing was very limited: .ul (value not reset), :P (value reset), :FIG. (value reset to "0", after which wgml 4.0 hung), .br (value reset), and .sk (value not reset). If this symbol ever needs to be implemented, more testing will be needed. This behavior may contain information about how wgml 4.0 processes text, although extracting that information would probably take more effort than it is worth.
The symbols syslc and sysline are more interesting. They are described in this way in the Waterloo SCRIPT documentation:
SYSLC    The Number of lines remaining in the current column. Unprinted footnotes and keeps are ignored. Value is '0' at top of page.
SYSLINE  The current output line number on the page now being formatted.
These initial observations still appear to be worth making:
- The same values, generally speaking, appear in both the PS device and in character devices: the values displayed are not given in vertical base units.
- The values change when the LPI option is used.
- The following relation holds on each line of output:
&syslc. + &sysline. ≈ N
- The value N is the same whether there is a banner at the top of the page or not.
- The value N is related to the value of the attribute depth of the :PAGE block of the :LAYOUT block and the LPI value:
N ≈ value(depth) * LPI
The sum of the values of syslc and sysline is only said to be approximately equal to N because it can be N-1 rather than N. Since, given their definitions, they would be expected to sum to a constant value, the only surprise is that this does not happen all the time.
The value N is only said to approximate the result of the multiplication because, for PS, N is 57 and, for a character device, it is 58, even when the same value for depth ('9.66i') and the default LPI (6) are used. This table compares the values of N for each device for various values of LPI:
LPI   PS    Character device
6     57    58
8     76    77
10    95    97
12    114   116
Further testing shows that this is the exact formula for N:
N = (value(depth) * LPI) - ((y_offset * LPI) / vbu)
where "y_offset" is the value of the attribute y_start of the :PAGEOFFSET block which is, of course, given in vertical base units, "vbu" is the number of vertical base units per inch, and the division is integer division.
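The formula can be checked against the table with a short Python sketch. Several things here are assumptions, not taken from the source: the function name `compute_n`, the rounding of the first term to the nearest whole line, and the device metrics, which are hypothetical values chosen only because they are consistent with the table (they are not read from any device definition).

```python
def compute_n(depth_inches, lpi, y_offset, vbu):
    """N = (value(depth) * LPI) - ((y_offset * LPI) / vbu)."""
    # Assumption: the first term is rounded to the nearest whole line.
    first = round(depth_inches * lpi)
    # The text states that this division is integer division.
    second = (y_offset * lpi) // vbu
    return first - second


# Hypothetical metrics, chosen only to be consistent with the table.
CHAR_DEVICE = {"y_offset": 0, "vbu": 6}
PS_DEVICE = {"y_offset": 200, "vbu": 1000}

if __name__ == "__main__":
    print("LPI  PS   Character device")
    for lpi in (6, 8, 10, 12):
        ps = compute_n(9.66, lpi, **PS_DEVICE)
        char = compute_n(9.66, lpi, **CHAR_DEVICE)
        print(f"{lpi:<4} {ps:<4} {char}")
```

With these assumed values the sketch reproduces the table rows; the real y_start and vertical base unit values of the devices tested may differ.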
The data generated for this is confusing and need not be kept since it can be regenerated whenever needed. The values of syslc and sysline are computed independently of the vertical positions actually used; any apparent relationship turns out to be coincidental and dependent on the metrics of the device (that is, the values of LPI, y_offset, and vbu). The values of syslc and sysline do not vary by a fixed amount when a page consisting of one-line paragraphs is considered; it takes the first four lines or so for the values to settle down, as it were, and increase or decrease by a constant amount. The values of syslc and sysline do scale with the value of LPI; their failure to always sum to N suggests that they are computed individually. So far as can be seen, at least one line on the following page will show the values of syslc and sysline appropriate to a line on the current page before the values are reset. Although banners do not affect N, the number of lines output before syslc is re-initialized is affected by bottom banners (it isn't that more lines are moved to the next page, but rather, that the same number of lines are moved but fewer appear on the first page).
Intriguing as all this is, it is unnecessary at this point and may never be necessary, depending on whether or not syslc is actually used in the Open Watcom documents and, if it is used, how it is used, for some uses may permit a more sensible implementation than that found in wgml 4.0. As to the effect on text processing, it is clear that it will not be possible to determine if the computation of syslc would affect how it is done -- for example, whether it would require the function forming output lines to return immediately after a line is processed, even though the current input buffer has not been completely processed, so that syslc and sysline can be updated before symbol substitution occurs.
When Page Number Symbols Update
In normal text, all page number symbols update when the value of syslc is reset, that is, up to three paragraphs into the new page. If they were always updated in the second paragraph, this could be explained very simply: since their values are substituted before the text is formed into an output line, their values will have the old page number even though the text ends up on a new page.
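The simple explanation mentioned above (substitution before line formation) can be illustrated with a toy Python model. This is not wgml 4.0's implementation and does not account for the observed delay of up to three paragraphs; the names `format_pages` and `lines_per_page` are hypothetical, and each paragraph is reduced to a template plus a line count.

```python
def format_pages(paragraphs, lines_per_page):
    """Toy formatter: paragraphs is a list of (template, line_count).

    Returns a list of (page_number, text) pairs, one per output line.
    """
    page = 1
    used = 0
    output = []
    for template, line_count in paragraphs:
        # Symbol substitution happens once, before any line is placed.
        text = template.replace("&syspage.", str(page))
        for _ in range(line_count):
            if used == lines_per_page:
                # Page break in the middle of the paragraph.
                page += 1
                used = 0
            output.append((page, text))
            used += 1
    return output


if __name__ == "__main__":
    paragraphs = [("page &syspage.", 2)] * 3
    for page, text in format_pages(paragraphs, lines_per_page=3):
        print(page, text)
```

In this model the second paragraph is split across the page boundary, so its second line lands on page 2 while still reading "page 1"; the third paragraph, substituted after the break, reads "page 2".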
In banners, the wgml group is updated by the time the top banner is printed. This, however, can be explained by positing that the banner is interpreted after the value of the symbol used has changed. The value returned by device function %pages() is changed immediately after the :NEWPAGE block is interpreted.
All of this is consistent with how our wgml updates page numbers.
The Open Watcom documents use these symbols:
syslc syspage $pgnuma
The symbol $pgnuma appears in the values of attribute contents in :BANREGION blocks. This is not remarkable; however, this is the only symbol in the wgml group to be used, which appears odd, since the Open Watcom documents begin by numbering their pages in lower-case Roman numerals. However, in that case a slightly different mechanism is used: the non-symbol "pgnumr" is given as the sole value of the attribute contents.
The symbol syspage is used twice in this context:
.newpage odd
..sr tcpage=&SYSPAGE
and the newpage macro is defined (only once) as
.gt newpage add newpage
.dm newpage begin
.pa
.dm newpage end
The control word .pa is described as
This control word causes a break. When it is encountered, the rest of the current page is skipped, any saved footnote lines are printed, the Footing Space lines are printed, and a new page is begun. If no operand is specified then the next page will be numbered sequentially one more than the current page number.
and limited testing suggests that the value of syspage will be correct after .pa has had its effect.
The symbol syslc is used only in the context shown above:
.if &syslc. gt 3 .do begin
. .in
. .tb 1 _/&syscl.
. .tb set $
$$
. .tb set
. .tb
.do end
This is part of macro topsect. This macro is defined in a large number of files, and only one of the definitions uses syslc. Since topsect is used in defining other macros, it at first appeared to be impractical to say when, or if, it is actually used. However, by temporarily inserting the line
.ty Gotcha! syslc = &syslc.
in the .dm block containing syslc and then producing the Open Watcom documents, it has been possible to verify that syslc is never used. (Additional testing showed that control word .ty would produce output if the macro was invoked and, indeed, that some of the lines sent to the terminal when producing the Open Watcom documents are generated by control word .ty.)
So it appears that syslc is not needed. Ironically, this also applies to the use of "_" as a wgml tab fill character: this is also unique to this particular definition of topsect.