Augmented Devices

From Open Watcom

Revision as of 00:15, 7 July 2012

Introduction

The WGML Reference in 15.3 Augmented Device Definitions states:

Certain device operations are not selectable through the device and 
driver definitions. WATCOM Script/GML augments some device 
definitions by directly supporting these operations.

The augmented device definitions are recognized by the starting 
characters of the driver name. For example, HPLDRV is recognized as 
the name of a driver definition for the HP LaserJet printer.

Name Prefix    Augmented Device
HPL            HP LaserJet
HPLP           HP LaserJet Plus
MLT            Multiwriter V (emulation mode)
MLTE           Multiwriter V (express mode)
PCG            IBM PC Graphics
PS             PostScript

If the driver definition name begins with HPL (but not HPLP), the 
value returned by the %X_ADDRESS and %TABWIDTH device functions is 
in terms of decipoints instead of dots.

Exactly what functionality is provided is not otherwise specified. Since the PostScript device is part of the Open Watcom document system, it was investigated and is being implemented. The other devices can wait until needed. The basic test method is described here; as the ability to compare our wgml with wgml 4.0 improved, additional tests were done, revealing additional enhancements. This is an ongoing process.

The test device and driver source files were copied and renamed to start with "ps". Only the device and driver files using the :ABSOLUTEADDRESS block, the :HLINE block, the :VLINE block, and the :DBOX block were used, because the actual PostScript driver and device provide these blocks. The extension for the output file was "pst".

It is important to understand that the device definitions were otherwise identical, so any difference could only have resulted from the appearance of the prefix "ps". A diff program was then used to see what happened.

A general result was that only the value of attribute defined_name for the :DRIVER block must start with "ps" for the effect to be seen. Neither the defined name of the :DEVICE block nor either member (file) name needs to start with "ps". This is, of course, what the material quoted above says, if "driver name" is understood to be the :DRIVER block's "defined name".
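The prefix rule can be illustrated with a minimal sketch. The function name augmented_device is hypothetical, and ignoring case is an assumption (the tests used "ps" while the quoted table shows "PS"); how wgml 4.0 actually performs the comparison was not investigated here.

```python
# Prefix table taken from the WGML Reference excerpt quoted above.
AUGMENTED_PREFIXES = {
    "HPL": "HP LaserJet",
    "HPLP": "HP LaserJet Plus",
    "MLT": "Multiwriter V (emulation mode)",
    "MLTE": "Multiwriter V (express mode)",
    "PCG": "IBM PC Graphics",
    "PS": "PostScript",
}


def augmented_device(defined_name):
    """Return the augmented device for a :DRIVER defined_name, or None."""
    name = defined_name.upper()  # assumption: the comparison ignores case
    # Try longer prefixes first so that "HPLP..." is not mistaken for "HPL".
    for prefix in sorted(AUGMENTED_PREFIXES, key=len, reverse=True):
        if name.startswith(prefix):
            return AUGMENTED_PREFIXES[prefix]
    return None
```

With this sketch, augmented_device("pstest") and augmented_device("PSDRV") both give "PostScript", while a name with no listed prefix gives None.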

Device PS

This section documents the augmentations found so far that are done whenever the defined name of the :DRIVER block for a device starts with "ps". Since some of them involve PostScript language statements, any such device should be, in some sense, a PostScript device.

The results fall into three categories:

  • Those involving the available fonts.
  • Those affecting the output of PostScript language statements.
  • Those affecting the output of document text, including whitespace.

The next sections discuss these results.

The Available Fonts

The PS :DEVICE block uses a font name ('courier') with the :BOX block. Normally, this would result in an available font being generated for the 'courier' font and the :FONTVALUE blocks in the :INIT blocks would be interpreted for this generated available font.

This, however, does not happen. The reason is probably that the PSDRV :DRIVER block defines the :HLINE block, the :VLINE block, and the :DBOX block, so the :BOX block is never used, and so the font it defines is never used.

When the test driver is used, an available font is generated if the :UNDERSCORE block uses a non-empty font name. This is surprising: all versions of the PSDRV :DRIVER block define the macros @uline and @euline and use them instead of device functions %ulineon() and %ulineoff(), so this font is clearly never needed either.

This was implemented very easily: any additional available fonts based on the :BOX block or the :UNDERSCORE block are not created if the PS device is being used.

PostScript Language Statements

When PostScript language statements are being emitted, then output records are terminated at the end of a word. As discussed here, wgml 4.0 normally flushes the output record when it has exactly the number of characters specified by the file specification, although it will do so earlier if device function %recordbreak() is encountered.

In this case, when the last word in the text would normally not fit into the output record, then the output record is flushed before the last word is inserted (the "last word", of course, becomes the first word of the next record).

Interestingly, if the word is followed by a space, that space must also fit into the output record. The space itself, however, is skipped. If followed by more than one space, the first space is skipped and the remaining spaces are placed in the output record, until it is full; any remaining spaces appear at the start of the next output record.

To illustrate this, consider this device function invocation:

%image("Font Number: ")

This inserts the character string shown into the output record. When the output record would normally end with "Numb", then, for the pstest device but not the normal test device, the output record ends with "Font" and "Number: " appears at the start of the next output record (the space between them does not appear in either output record). However, "Number: " will also appear at the start of the next output record even when "Number:" would fit on the end of the preceding output record. Altering the string shows that "Nmber: " will appear at the end of the same output record as "Font ", no matter how many spaces follow it (the first being skipped).
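The word-boundary rule can be sketched as follows. This is a simplified model, not the wgml 4.0 code: the name break_statement is hypothetical, the output record is assumed to start empty, words are assumed to be separated by single spaces only, and a word together with its following space is assumed to be the unit that must fit.

```python
import re


def break_statement(text, record_size):
    """Split a PostScript statement into output records at word boundaries."""
    records = []
    current = ""
    # Each token is a word plus at most one following space.
    for token in re.findall(r"\S+ ?", text):
        if len(token) > record_size:
            # Loosely mirrors the situation reported as IO--011.
            raise ValueError("record size is too small for this word")
        if len(current) + len(token) > record_size:
            # Flush before the word; the separating space is dropped.
            records.append(current[:-1] if current.endswith(" ") else current)
            current = ""
        current += token
    if current:
        records.append(current)
    return records
```

For a record size of 12, "Font Number: " is split after "Font" even though "Number:" alone would fit, while "Font Nmber: " stays in one record, matching the observations above.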

When device function %text() is used instead of %image(), so that output translation occurs, then the behavior is exactly the same, but it is the translated word that must fit. This means that multi-byte output translations are not treated as described here, but rather as an integral and inseparable part of the translated word.

Testing showed that, if an entire word cannot be inserted into an empty output record, then wgml 4.0 emits this error message:

IO--011: Output file's record size is too small for
         the device 'pstest'

and exits. This is unlikely to happen with actual documents and the PS device, however.

Document Text

This category contains several distinguishable enhancements.

Text Output As Such

Text which is intended to appear in the final document is treated this way:

  • It is preceded by a "(" character, which does not undergo output translation and which never appears in the final position of the output record.
  • It is followed by ")", which does not undergo output translation and which is allowed to appear in the final position of the output record.
  • It is followed by either " shwd " or " sd ", which does undergo output translation and which is treated as discussed here.
  • If the horizontal position at which the current text is to begin was established by the :HTAB block, then " sd " is used.
  • If the horizontal position at which the current text (including any preceding spaces) is to begin was established by either the :ABSOLUTEADDRESS block or the emission of characters (including space characters), then " shwd " is used.
  • Test devices which start with "ps" but which do not define "sd" or "shwd" still have these sequences inserted by wgml 4.0.

Subjecting the " sd " or " shwd " to output translation only works because none of the characters involved are, in fact, translated. This is necessary because these are PostScript language statements, and must appear as-is in the output file in order to work properly. The implementation ensures this by not subjecting them to output translation at all. In effect, it treats them as if they were inserted as the result of device function %image() rather than device function %text().
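As a minimal sketch of the wrapping just described (the function name ps_text and its flag are hypothetical, and any escaping of parentheses inside the text is not covered by the material above and is not modelled):

```python
def ps_text(text, positioned_by_htab):
    """Wrap document text the way the PS augmentation is described as doing."""
    # " sd " follows text positioned by the :HTAB block; " shwd " is used otherwise.
    operator = " sd " if positioned_by_htab else " shwd "
    # The parentheses and the operator are emitted as-is, with no output translation.
    return "(" + text + ")" + operator
```

For example, ps_text("amazingly", True) gives "(amazingly) sd " and ps_text("Here,", False) gives "(Here,) shwd ".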

Both "shwd" and "sd" are defined in the PS :DRIVER block in psdrv.pcd:

%image('/sd { exch currentpoint exch pop moveto shwd } def')
%image('/shwd {show} def')

It should be clear from this that the PS augmentation and the PS driver are closely connected.

The definition of "sd" is clearly intended to work with the definition of :HTAB:

:HTAB.
   :value.
      %image(%decimal(%x_address()))
      %image( ' ' )
   :evalue.
:eHTAB.

and so to depend on the value of %x_address() being available on, presumably, a stack used by the PostScript interpreter.
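The stack behavior that "sd" relies on can be traced with a small simulation. This is only a sketch: simulate_sd is a hypothetical helper, a Python list stands in for the PostScript operand stack, and the movement of the current point caused by showing the text is not modelled.

```python
def simulate_sd(operands, current_point):
    """Trace "exch currentpoint exch pop moveto shwd" on a list used as a stack."""
    stack = list(operands)                       # expected: [x, text], text on top
    stack[-2], stack[-1] = stack[-1], stack[-2]  # exch         -> [text, x]
    stack.extend(current_point)                  # currentpoint -> [text, x, cx, cy]
    stack[-2], stack[-1] = stack[-1], stack[-2]  # exch         -> [text, x, cy, cx]
    stack.pop()                                  # pop          -> [text, x, cy]
    y = stack.pop()                              # moveto consumes x and y
    x = stack.pop()
    shown = stack.pop()                          # shwd is defined as show
    return (x, y), shown, stack
```

Called as simulate_sd([1366, "amazingly"], (1300, 9965)), with a made-up current point, it reports a move to (1366, 9965) before the text is shown: the new horizontal position comes from the number emitted by :HTAB and the vertical position is kept.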

The implementation of the :HTAB block sets a flag; the function pre_text_output() sets a flag and, for a PS device, inserts the "(" correctly; the function post_text_output() clears two flags and, for a PS device, does this:

  1. inserts ")"
  2. inserts " shwd " or " sd ", as appropriate.

At present, this appears to produce the best match to the wgml 4.0 output, given the manner in which PostScript language statements are processed.

Examining the pattern in which these functions are used is the best way to find out how the desired output is produced.

The Sequence "() shwd "

This sequence amounts to outputting nothing at all. These notes are from the preliminary investigation. It occurs in some contexts and not in others:

  • It is inserted before the :ABSOLUTEADDRESS block in these contexts:
    • subsequent paragraph lines (left margin "6");
    • text used with or generated by a :FIG tag (left margin "8");
    • text used with the :BACKM tag (left margin "9");
    • the title-line "Index" (left margin "21");
    • the letters used to organize the Index (left margin "8");
    • the Index entries (left margin "6" or "7", depending on level);
    • the Table of Contents entries (left margin "6");
    • the List of Figures entries (left margin "6");
  • It is not inserted before the :ABSOLUTEADDRESS block in these contexts:
    • the first line of a paragraph (left margin "6", total "9");
    • the title-line "Table of Contents" (left margin "6", total "15");
    • the title-line "List of Figures" (left margin "6", total "16");

Notes:

  • Since the sequence "() shwd " represents no output at all, it is invisible in non-PS devices but may still be occurring. In other words, some occurrences may be part of text output in general, not PS-specific.
  • The sequence "() shwd " sometimes appears in other contexts than proximity to the :ABSOLUTEADDRESS block.
  • The sequence "() shwd " does not appear in a simple PS file; however, it does not appear in a simple test device file either, so that may not mean much. A more complicated PS file will need to be examined.

Additional result: inserting text to move a normal "shwd" to the start of the next output record while filling the current output record caused the following :HP1. phrase to move to a new line. This produced an :ABSOLUTEADDRESS block for the new line followed by a "() shwd ". That "() shwd " pretty clearly corresponded to the space following the additional material and preceding the :HP1. block; or rather, it was not the space itself, but was located exactly where the space would be if the new line had not intervened, the current font being the default font (the switch to the :HP1. font followed). The possibility that this is the result of an empty text_chars at the start of a line needs to be explored.

Another additional result: When exploring tabbing with groups of multiple wgml tabs, each tab was encoded in an empty text_chars instance, even when "wscript" was in use. This produced the "() shwd " with both wscript and script. It did this in the test program outcheck as well as wgml 4.0, with no alteration of the output code itself. This "augmentation", then, may turn out to be nothing of the sort and, indeed, a consequence of the text output sequences already discovered.

Text That Spans Two Output Records

Here is what a paragraph looks like when output with the PS driver:

@fs2 1000 10633 am (This is the first sentence of the very first paragraph. Th\
is is the second sentence in that paragraph.) shwd 1000 10299 am (Here is the \
second paragraph. This is the second sentence in the second paragraph.) shwd 
1000 9965 am (Here, amazingly enough, is a third paragraph! It was added to se\
e what happens when a new page is) shwd 1000 9798 am (needed.) shwd 1000 9464
am (Now that a DEVICE_PAGE has been triggered, let's try for a DOCUMENT_PAGE! \
Previous tests) shwd 1000 9297 am (suggest that it shouldn't take much. Well, \
perhaps a bit more. That produced a second) shwd 1000 9130 am (DEVICE_PAGE, bu\
t DOCUMENT_PAGE is supposed to take priority when both are reached.) shwd 1000 
8963 am (Success!) shwd 

As can be seen, when output text is too long to put into one output record, then the last character in the first record becomes "\" and the first character in the second record is the character that would otherwise have appeared where the "\" appears.

Testing shows that the PostScript interpreter will insert an extra space if a "(" character has been encountered and the output record does not end with "\". This is the reason that a "(" cannot occur in the final position of an output record: since no "\" follows it, an extra space will be inserted into the document. At least, that is the behavior when justification is off.

Investigation showed that wgml 4.0 will not end an output record containing output text with anything except "\" (or ")", if the text ends at the next-to-last position).
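The continuation rule can be sketched as follows. The name split_text and the parameter used (the number of characters already in the current output record) are hypothetical. The special handling that keeps "(" out of the final position is not modelled.

```python
def split_text(text, record_size, used=0):
    """Split output text over records, ending each full record with a backslash."""
    if record_size < 2 or not 0 <= used < record_size:
        raise ValueError("need record_size >= 2 and 0 <= used < record_size")
    records = []
    room = record_size - used
    while len(text) > room:
        keep = room - 1  # the final position is reserved for the backslash
        records.append(text[:keep] + "\\")
        text = text[keep:]  # the displaced character starts the next record
        room = record_size
    records.append(text)
    return records
```

For example, split_text("(Success!)", 10, used=8) returns a first piece consisting of "(" followed by a backslash, and a second piece "Success!)"; removing each trailing backslash and joining the pieces recovers the original text.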

Testing during the implementation of control word TB revealed that there may be further details to be discovered here. A series of fixes was attempted, but each led to a new problem. Since this was not actually relevant to control word TB, and was not actually needed for the full test file, but rather appeared in a smaller test file used to isolate problems for diagnosis and solution, work was postponed. It is possible that the sequences for subscript and superscript require special handling with regard to determining how they work when they fall at the end of an output line.

Spacing For Justified Text

This is what text output looks like with the PS device when justification is "on":

@fs2 1000 10633 am (This is the first sentence of the very first paragraph. Th\
is is the second sentence in that paragraph.) shwd 1000 10299 am (Here is the \
second paragraph. This is the second sentence in the second paragraph.) shwd 
1000 9965 am (Here,) shwd 1366 (amazingly) sd 2016 (enough,) sd 2524 (is) sd 
2653 (a) sd 2759 (third) sd 3099 (paragraph!) sd 3822 (It) sd 3959 (was) sd 
4220 (added) sd 4622 (to) sd 4775 (see) sd 4989 (what) sd 5321 (happens) sd 
5855 (when) sd 6210 (a) sd 6316 (new page is) sd 1000 9798 am (needed.) shwd 
1000 9464 am (Now) shwd 1314 (that) sd 1598 (a) sd 1710 (DEVICE_PAGE) sd 2752 
(has) sd 2996 (been) sd 3317 (triggered,) sd 3946 (let's) sd 4236 (try) sd 4456
 (for) sd 4676 (a) sd 4787 (DOCUMENT_PAGE!) sd 6175 (Previous) sd 6736 (tests)
sd 1000 9297 am (suggest) shwd 1517 (that) sd 1841 (it) sd 2009 (shouldn't) sd 
2660 (take) sd 2999 (much.) sd 3542 (Well,) sd 3949 (perhaps) sd 4512 (a) sd 
4663 (bit) sd 4908 (more.) sd 5427 (That) sd 5796 (produced) sd 6445 (a) sd 
6596 (second) sd 1000 9130 am (DEVICE_PAGE,  but) shwd 2371 (DOCUMENT_PAGE) sd 
3721 (is) sd 3877 (supposed) sd 4492 (to) sd 4671 (take) sd 4990 (priority) sd 
5519 (when) sd 5900 (both) sd 6235 (are) sd 6491 (reached.) sd 1000 8963 am (S\
uccess!) shwd 

This appears to be what is happening:

  1. The last line of a paragraph, and so any single-line paragraph (for example, the first two paragraphs in the text shown), are not justified.
  2. Lines that are justified use "<num> (<text>) sd" to produce the correct position, but only if the requested spacing is not an integer multiple of the width of a space character.

The first point is plainly visible in the resulting document. The second is deduced from the item

6316 (new page is) sd 1000 9798 am 

in the seventh line. This shows that spaces can be used in text output with sd, and the output from the :ABSOLUTEADDRESS block ("1000 9798 am") that follows clearly shows that the "sd" follows the text and the number.

It seems reasonable to suppose that the :HTAB block will still be used, when the spacing requested is an even multiple of the width of a space character, as described here. This, of course, is normal behavior, not an augmentation.

Since "sd" is used when an :HTAB block has been interpreted, the augmentation can be summarized as:

the :HTAB block is used when the distance to be moved is not an even multiple of the space width.

This has been implemented in the appropriate horizontal positioning sequences, and this was done so that it would apply to all page-addressing devices. This analysis is here, then, not because this is a PS device augmentation, but because device PS is the only device used with the Open Watcom documents which is a page-addressing device, and so the implementation is inevitably based on how wgml 4.0 does justification with device PS.
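The summarized rule can be illustrated with a sketch that groups the words of a justified line. Everything here is hypothetical: the name justify_groups, the tuple layout, and the widths used in the example are invented for illustration, and only the exact-multiple test comes from the analysis above.

```python
def justify_groups(words, space_width):
    """Group the words of one line into the text strings that would be emitted.

    words is a list of (x_start, width, text) tuples in horizontal base units.
    A gap that is an exact multiple of space_width is filled with spaces; any
    other gap starts a new group, which would be positioned using :HTAB.
    """
    groups = []  # each entry: [x_start, x_end, text]
    for x, width, text in words:
        if groups:
            gap = x - groups[-1][1]
            if gap > 0 and gap % space_width == 0:
                groups[-1][2] += " " * (gap // space_width) + text
                groups[-1][1] = x + width
                continue
        groups.append([x, x + width, text])
    return [(start, text) for start, _, text in groups]
```

With a made-up space width of 100, justify_groups([(6316, 300, "new"), (6716, 350, "page"), (7166, 150, "is")], 100) returns [(6316, "new page is")], which would be emitted in the form seen above as "6316 (new page is) sd".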

Finally, the lines

1000 9464 am (Now) shwd 1314 (that) sd 1598 (a) sd 1710 (DEVICE_PAGE) sd 2752 
(has) sd 2996 (been) sd 3317 (triggered,) sd 3946 (let's) sd 4236 (try) sd 4456

suggest, since the initial "(" in "(has)" would certainly fit in the final position of the preceding output record, that, even when justification is on, wgml 4.0 will not end a line with "(".

Subscripts and Superscripts

These script functions are used for subscripts:

s' 'sub()

and these for superscripts:

S' 'sup()

For character-mode devices, or at least for those in the Open Watcom repository, these do nothing. For device PS, special PostScript command sequences are emitted.

In wgml 4.0, the sequences used for superscripting with S' are effective and have this form:

 41 0 41 rmoveto .7 .7 scale

appears before the superscript text and

 1 .7 div dup scale neg 0 exch rmoveto

appears after the superscript text.

In wgml 4.0, the sequences used for subscripting with s' are the same two, reversed. This does not work; however, these sequences do work:

 41 0 41 neg rmoveto .7 .7 scale

used before the subscript text and

 1 .7 div dup scale 0 exch rmoveto

used after the subscript text.

The 'sup() and 'sub() functions use the same sequences, but wgml 4.0 has problems distinguishing between the PostScript command sequences and the output text.

The only value that varies is "41": it depends on the value of the font_height attribute of the :DEFAULTFONT block or the FONT command-line option. The formula is quite simple: the font_height, expressed in vertical base units, is multiplied by 3 and divided by 10 using integer arithmetic.

From this a theory of what the PostScript commands do can be formulated: the print position is moved up or down by 0.3 of the font height, and then the font is scaled to 0.7 of its nominal height and the text printed, after which both effects are reversed.
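The formula and the working sequences can be combined in a short sketch. The name script_sequences is hypothetical, the sequences are returned without surrounding spaces, and the font height of 138 in the example is an invented value chosen only because it yields 41.

```python
def script_sequences(font_height, superscript=True):
    """Return the (before, after) sequences for superscript or subscript text.

    font_height is in vertical base units; the shift is 3/10 of it, computed
    with integer arithmetic as described above.
    """
    shift = font_height * 3 // 10
    if superscript:
        before = f"{shift} 0 {shift} rmoveto .7 .7 scale"
        after = "1 .7 div dup scale neg 0 exch rmoveto"
    else:
        # The working subscript form moves down first and back up afterwards.
        before = f"{shift} 0 {shift} neg rmoveto .7 .7 scale"
        after = "1 .7 div dup scale 0 exch rmoveto"
    return before, after
```

Note that the first number in each "before" sequence is left on the operand stack while the text is shown, and is consumed by the rmoveto in the "after" sequence to undo the vertical shift.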

For device PS, then, subscripting and superscripting can be applied to any font and to any character in that font, without any restrictions.

There are, however, some restrictions on what can be subscripted or superscripted. In particular, a highlighted phrase cannot occur within the text to be subscripted or superscripted, that is, this sort of thing can not be expected to work:

&s':hp1.text.ehp1. &S':hp1.text.ehp1. &'sub(:hp1.text.ehp1.) &'sup(:hp1.text.ehp1.)

The reason for this is clear when the sequences used to implement subscripting and superscripting are considered: the font in effect before the :HPn tag is encountered will be scaled, not the font intended. Embedding the function in the highlighted phrase would work if wgml 4.0 didn't mess up its output.

The tendency of wgml 4.0 to produce bad output made implementing the output code for these functions very difficult. The errors are not that hard to correct: with a little planning and some effort with a text editor on the output file, an output file that demonstrates the various points can be created. It was therefore possible to confirm that various planned aspects of the implementation would work. The result, however, will not be what wgml 4.0 does, but rather what it would do had its implementors had the time to ensure correct output.

It should be noted that the Open Watcom documents only use S', and never embedded in a highlighted phrase (at least, not without other text both before and after the use of S'): wgml 4.0 works precisely in the only context in which it must work for the Open Watcom documents to be created. This can hardly be considered a coincidence.

One very interesting technical point was discovered when using a test device: the point at which the sequences are emitted differs between correct and incorrect output. To be specific:

  • When incorrect output is produced, the sequences are produced after the :LINEPROC block :FIRSTWORD block (or :STARTWORD block) and before the :LINEPROC block :ENDWORD block. They thus appear to be an inherent part of text output itself.
  • When correct output is produced, the sequence used before the text occurs before the :LINEPROC block :FIRSTWORD block (or :STARTWORD block) and the sequence used after the text occurs after the :LINEPROC block :ENDWORD block.

In this case, "correct" means that the resulting file can be displayed, without alteration, by GhostView.

As it happens, the PS device, as used with the Open Watcom documents, does not implement the :LINEPROC block :FIRSTWORD block, :STARTWORD block, or :ENDWORD block, so this feature will be implemented as part of the text output. At least, that will be how it is done first; testing will determine the final implementation.

When a superscript is placed at the start of a line by wgml 4.0, PostScript does not display it. Investigation showed that wgml 4.0 was placing the superscript initial sequence before interpreting the :ABSOLUTEADDRESS block to position PS at the start of the next line. Moving the positioning sequence so that it was in front of the superscript initial sequence resulted in the line being displayed by PS. Testing showed that the code in devfuncs.c interprets the :ABSOLUTEADDRESS block before it emits the superscript initial sequence, so our wgml does this correctly, and does it correctly for subscripts as well.
