Device Function Language

From Open Watcom

Revision as of 17:31, 18 July 2009

Introduction

This page focuses on the device functions as such. Closely related (so closely that some cross-referencing is inevitable) pages are:

  • Device File Blocks, which documents where device functions occur in the :PAUSE and :FONTPAUSE blocks;
  • Driver File Blocks, which documents where device functions occur in almost every included block; and
  • Device Functions, which documents those parts of the binary file format within which the compiled form of the device functions is placed.

From the viewpoint of the page Binary Device Files, then, this page is just another part of the topic it addresses.

However, considered by itself, this page will do much more than document a part of the source and binary formats used by the device library. It will document the device function language, which is to say:

  • which device functions are recognized by gendev 4.1 and implemented by wgml 4.0;
  • a description of the language they form: how they are grouped, how they interact, how they can be used; and
  • a description of the compiled form of these functions.

All of the statements made here were confirmed by actual test. Many of them match statements made in the WGML Reference or in the README file producible from the WGML 3.33 Update and should be taken as confirming the documentation, not as new discoveries. Those which contradict the documentation, or which provide information not included in it, are new discoveries.

Definitions

In April, 1982, I purchased a book by R. G. Loeliger titled Threaded Interpreted Languages. I then implemented a TIL: not FORTH (the system I was using provided a line editor that reacted to "@" by erasing the text and starting over), but the same sort of language. I managed a cross-compiler (which, in the TIL world, means that it produced stand-alone programs rather than packages that had to be invoked within the interpreter) and an assembler for the Z-80 that used the Zilog instruction formats (i.e., not RPN formats). The latter involved treating the opcodes as verbs, that is, as the names of functions, rather than processing them as data, a technique that may or may not reappear when gendev is written. This work was interrupted when I purchased my first "IBM PC" clone, and thus could obtain software with which useful work could be done.

That constitutes my entire experience in compiler theory and practice. If any of the definitions offered here are wrong, please let me know (on the newsgroup) and I will make corrections.

A device function name is a token which gendev 4.1 recognizes as naming a device function and so compiles into the CodeBlock it is creating.

The term value block tag refers to any or all of the tags in this table:

Start Tags      End Tags
:ENDVALUE       :eENDVALUE
:ENDWORD        :eENDWORD
:FIRSTWORD      :eFIRSTWORD
:FONTVALUE      :eFONTVALUE
:STARTVALUE     :eSTARTVALUE
:STARTWORD      :eSTARTWORD
:VALUE          :eVALUE

In addition, the term start tag will be used to refer to the tags in the first column, and the term end tag will be used to refer to the tags in the second column, when it is clear that it is value block tags that are under discussion. These tags are said to correspond when they are on the same line in the table above.

The term function block is used to refer to the device functions placed between a start tag and its corresponding end tag.

The term function sequence is used to refer to a Type I device function and all of its parameters, and all of its parameters' parameters, to whatever depth they exist.

Although gendev is generally said to "encode" the attributes in its source files in the binary files it produces, gendev is said to compile the function blocks into the field function of the corresponding CodeBlock.

While wgml is generally said to "use" or "apply" the data encoded in binary device files, wgml is said to interpret the field function of each CodeBlock it uses.

These codes were intended to refer to the various types of function block and CodeBlock, and occasionally are so used, although so far I have found it clearer to cite the exact context instead:

Source Binary Context
FB00   CB00   a :VALUE block
FB02   CB02   a :FONTVALUE block
FB04   CB04   an :ENDVALUE block not within a :LINEPROC block
FB05   CB05   a :STARTVALUE block not within a :LINEPROC block
FB08   CB08   an :ENDVALUE block within a :LINEPROC block
FB09   CB09   a :STARTVALUE block within a :LINEPROC block
FB28   CB28   an :ENDWORD block
FB29   CB29   a :STARTWORD block 
FB49   CB49   a :FIRSTWORD block

They consist, of course, of FB or CB followed by the corresponding CodeBlock designator.

The term parameter block is used to refer to either of the two structs identified below used to hold the parameters of device functions that have been compiled.

Two successive %text() functions with literal parameters are compiled as if they were a single %text() function with the parameters concatenated. The %image() device function is compiled the same way. The term merged will be used to refer to this situation; that is, the %text() or %image() functions will be said to be "merged".
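The merging just described can be sketched in Python. This is only an illustration of the definition, not a reconstruction of gendev's code: the representation of a function block as a list of (name, literal) pairs, the helper name merge_literals, and the assumption that a run of more than two calls merges into one are all hypothetical. Names are assumed to be already folded to lower case, and a parameter of None stands for "no literal parameter".

```python
def merge_literals(calls):
    """Merge runs of same-named %text()/%image() calls that have literal parameters.

    `calls` is a hypothetical list of (name, literal) pairs; literal is None
    when the call has no literal parameter (for example %recordbreak()).
    """
    mergeable = {"text", "image"}
    merged = []
    for name, literal in calls:
        previous = merged[-1] if merged else None
        if (
            name in mergeable
            and literal is not None
            and previous is not None
            and previous[0] == name
            and previous[1] is not None
        ):
            # Same function, both literal: concatenate into the earlier call.
            merged[-1] = (name, previous[1] + literal)
        else:
            merged.append((name, literal))
    return merged


calls = [
    ("text", "Just a test of the DOCUMENT PAUSE block."),
    ("text", "Press enter to start the document."),
    ("recordbreak", None),
]
result = merge_literals(calls)
```

Here result holds a single "text" entry whose literal is the two strings joined, followed by the untouched "recordbreak" entry.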

Some device functions have similar names and are occasionally referred to as a group:

  • %binaryN() is used to refer to any or all of
%binary() %binary1() %binary2() %binary4()
  • %ifX() is used to refer to any or all of
%ifeqn() %ifeqs() %ifnen() %ifnes()

Device Function List

The device functions given in the WGML Reference are:

%add() %binary1() %binary2() %binary4() %cancel() %clear3270() 
%clearPC() %date() %decimal() %default_width() %divide()
%flushpage() %font_height() %font_number() %font_outname1()
%font_outname2() %font_resident() %font_space() %hex()
%image() %line_height() %line_space() %page_depth() %page_width()
%pages() %recordbreak() %remainder() %sleep() %subtract()
%tab_width() %text() %thickness() %time() %wait() %wgml_header()
%x_address() %x_size() %y_address() %y_size()

The additional device functions given in the README file producible from the WGML 3.33 Update are:

%endif() %getnumsymbol() %getstrsymbol() %ifeqn() %ifeqs() %ifnen()
%ifnes() %lower() %setsymbol()

Note: the README actually shows "%endif" (no parentheses); however, all instances shown are "%endif()" and, in fact, removing the parentheses produces this note:

String is = <endif>

and this error message:

DF--001: Unrecognized device function tag

from gendev 4.1. Clearly, "%endif()" is the correct form.

The research program findfunc.exe identified these completely undocumented functions in the :DEVICE and :DRIVER blocks available to me:

%binary() %dotab() %enterfont() %textpass() %ulineoff() %ulineon()

%binary() was not, in fact, found with findfunc, but was discovered when research was done to see if %binary2() or %binary4() were used anywhere.

It is, of course, possible that others are recognized by gendev 4.1 and wgml 4.0 but, since they are neither documented nor used, they cannot be identified. Well, provided I've found all the device functions used in the source files available to me.

Grammar

I am using (or abusing) the term "grammar" to refer to all aspects of the source form of the device function language.

Available Documentation

The WGML Reference provides basic information about the device functions it documents.

The README file producible from the WGML 3.33 Update provides one-line descriptions of the additional device functions it lists.

Additional sources of documentation do exist. The first source consists of error messages listed in The WGML Reference. The second source consists of :CMT. lines in the various source files available to me.

Orthography

This section deals with how the language is written. These rules apply:

  1. Only alphabetic letters, the underscore character, and digits are used in device function names.
  2. Only 7-bit ASCII character encodings are used.
  3. No device function names contain spaces.
  4. Each alphabetic letter can be in upper or lower case.
  5. Once the start tag of a function block has been seen, no other tags may appear except the end tag which, of course, terminates the function block. In particular, neither :CMT. nor :INCLUDE may appear.
  6. Each ( must be matched by a ). An end tag does not close open parentheses.
  7. Multiple parameters to the same device function must be separated by commas.
  8. Spaces may not be used between the device function name and the preceding %.
  9. Spaces may not be used between the device function name and the following (.
  10. Whitespace can be used between device function names and, within the parentheses, before and/or after the parameters. This allows function blocks to be written on multiple lines, if desired.

Rules 1, 2, and 3 are based entirely on examination of the known device function names.

Rule 4 is based on actual testing with gendev.

With regard to rule 5, a typical error message (one exists for each end tag) produced by gendev when a tag is encountered which is not the expected end tag is:

SN--046: Expecting :evalue tag

With regard to rules 6 and 7, omitting a closing parenthesis produces this note:

Parameter = recordbreak , Tag = image

and this error message from gendev:

DF--005: Commas must separate device function parameters

when a device function is encountered next and the error message

DF--004: Not a valid character in a device function

(presumably a reference to :) when the end tag is encountered next.

With regard to rules 8, 9, and 10, this example is offered:

When one of the :VALUE blocks in my test.pcd file,

%image("*** START PAUSE block.")
%recordbreak()

is modified to be

%image ("*** START PAUSE block.")
%recordbreak()

then gendev issues this note:

String is = <image ("*** START PAUSE block.")%recordbreak()>

and this error message from gendev:

DF--001: Unrecognized device function tag

If the first line is modified to:

% image("*** START PAUSE block.")

then the same error results with, of course, a slightly different note:

String is = < image("*** START PAUSE block.")%recordbreak()>

from which these conclusions can be drawn:

  • since % does not appear in the string shown in the note, it is not part of the device function name but instead marks where the device function name begins;
  • the string shown in the note extends to the end of the function block; this is probably done to make the location of the error as clear as possible; and
  • whitespace is generally allowed between device function names and within the parentheses before and/or after the parameter.

It is, of course, true that rules 8 and 9 may reflect the fact that none of the device function names actually used begins or ends with a space, rather than that they cannot begin or end with a space. It is simply not possible to distinguish the two cases.
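Rules 1, 4, 8, and 9 can be illustrated with a short Python sketch. The regular expression and the two helper names are assumptions made for illustration; nothing here is taken from gendev itself. The sketch also ignores string literals, so a % inside a quoted parameter would be reported as a malformed opener.

```python
import re

# Rules 1, 8, 9: letters, digits, and underscores written directly
# between "%" and "(" with no intervening spaces.
NAME = re.compile(r"%([A-Za-z0-9_]+)\(")


def function_names(block):
    """Return the lower-cased names of well-formed '%name(' openers (rule 4)."""
    return [match.group(1).lower() for match in NAME.finditer(block)]


def malformed_openers(block):
    """Return the offsets of '%' characters not followed directly by 'name('."""
    return [
        index
        for index, char in enumerate(block)
        if char == "%" and not NAME.match(block, index)
    ]


good = '%image("*** START PAUSE block.")\n%recordbreak()'
bad_after = '%image ("*** START PAUSE block.")\n%recordbreak()'
bad_before = '% image("*** START PAUSE block.")\n%recordbreak()'
```

With these inputs, good yields both names and no malformed openers, while each of the two modified blocks reports the % at offset 0, which corresponds to the cases where gendev issued DF--001.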

Device Function Types

The WGML Reference distinguishes two types of device functions with respect to whether or not they directly produce output to the device:

The result of some device functions will be used as final values
for the sequence being defined. A final value is sent directly to
the output device. Some of the device functions produce results 
which are not suitable for use as a final value. The result of 
this type of function must be supplied as a parameter value to a 
device function which can produce a final value.

The function-by-function documentation then identifies these functions as producing "results which are not suitable for use as a final value":

%add() %date() %decimal() %default_width() %divide() %font_height()
%font_number() %font_outname1() %font_outname2() %font_resident()
%font_space() %hex() %line_height() %line_space() %page_depth() 
%page_width() %pages() %remainder() %subtract() %tab_width()
%thickness() %time() %wgml_header() %x_address() %x_size()
%y_address() %y_size()

It might be thought that the remaining functions:

%binary1() %binary2() %binary4() %cancel() %clear3270() %clearPC()
%flushpage() %image() %recordbreak() %sleep() %text() %wait()

all produce a "final value". This, however, is not the case: the only characteristic they have in common is that they cannot be used as parameters to other functions.

These are the functions documented to produce a "final value":

%binary1() %binary2() %binary4() %image() %text()

These are documented to have an effect only when used in a :DEVICE block, not when used in a :DRIVER block:

%clear3270() %clearPC() %wait()

which leaves functions without an explicit grouping:

%cancel() %flushpage() %recordbreak() %sleep()

although they could be considered "control functions".

The WGML Reference then makes this statement:

Prior to transmitting the device function sequences to the output 
device, WATCOM Script/GML translates each character of the sequence 
into another character. The translation values are defined in the 
font definitions used with the device. Some of the device functions 
produce final values which will not be translated.

In actual fact, per the documentation, only the %text() function's output is translated. Of course, of the other functions which actually produce a final value, translating the output of %binary1(), %binary2(), and %binary4() (which insert uint8_t, uint16_t, and uint32_t values, respectively) would not make much sense. The only other such function is %image() and, in fact, the only documented difference between %image() and %text() is that the result of the %image() function is not translated, while that of %text() is.

Based on the above, these device function types can be distinguished:

  • Type Ia device functions produce final values.
  • Type Ib device functions are used to control the process.
  • Type Ic device functions are used for user interaction.
  • Type II device functions can only be used as arguments to another device function.

Investigation of the Type II device functions suggests these sub-types:

  • Type IIa device functions are used for mathematical operations.
  • Type IIb device functions provide values from a :DEVICE, a :DRIVER, or a :FONT block.
  • Type IIc device functions provide formatting.
  • Type IId device functions do various other things.

To categorize all of the functions, not just those given in the WGML Reference, I took advantage of the fact that wgml executes the CodeBlocks produced from the :INIT block with the value "start" for the attribute place almost immediately, even before looking for the document specification file. wgml can thus be used to produce the output of any device function or combination of device functions allowed in an :INIT block very easily.

Starting with a complete list of all device functions, these caused gendev to emit this message:

DF--008: This tag at start of device function sequence is invalid

thus showing that they are Type II functions:

%add() %date() %decimal() %default_width() %divide() %font_height() 
%font_number() %font_outname1() %font_outname2() %font_resident() 
%font_space() %getnumsymbol() %getstrsymbol() %hex() %line_height() 
%line_space() %lower() %page_depth() %page_width() %pages() 
%remainder() %subtract() %tab_width() %thickness() %time() 
%wgml_header() %x_address() %x_size() %y_address() %y_size()

That leaves these as the Type I functions:

%binary() %binary1() %binary2() %binary4() %cancel() %clear3270() 
%clearPC() %dotab() %endif() %enterfont() %flushpage() %ifeqn()
%ifnen() %ifeqs() %ifnes() %image() %recordbreak() %setsymbol()
%sleep() %text() %textpass() %ulineoff() %ulineon() %wait() 

Dividing them into the three subtypes given above required the use of both gendev and wgml, and, in the process, produced the initial information on sequencing and function signatures. The functions of each sub-type are:

Type Ia ("final"):

%binary() %binary1() %binary2() %binary4() %image() %text() 

Type Ib ("control"):

%cancel() %dotab() %endif() %enterfont() %flushpage() %ifeqn()
%ifnen() %ifeqs() %ifnes() %recordbreak() %setsymbol() %sleep()
%textpass() %ulineoff() %ulineon()

Type Ic ("user interaction"):

%clear3270() %clearPC() %wait() 

Type IIa ("math"):

%add() %divide() %remainder() %subtract() 

Type IIb ("device info"):

%default_width() %font_height() %font_number() %font_outname1()
%font_outname2() %font_resident() %font_space() %line_height() 
%line_space() %page_depth() %page_width() %pages() %tab_width()
%thickness() 

Type IIc ("formatting"):

%decimal() %hex() %lower() 

Type IId ("other"):

%date() %getnumsymbol() %getstrsymbol() %time() %wgml_header() 
%x_address() %x_size() %y_address() %y_size()
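The classification above can be captured as a lookup table. The following Python sketch is offered only as a convenience for checking a function's type; the names SUBTYPES, TYPE_OF, and head_error are hypothetical, and the function names are stored in lower case on the assumption (rule 4) that case does not matter.

```python
# Sub-type membership copied from the lists above, folded to lower case.
SUBTYPES = {
    "Ia": "binary binary1 binary2 binary4 image text",
    "Ib": "cancel dotab endif enterfont flushpage ifeqn ifnen ifeqs ifnes "
          "recordbreak setsymbol sleep textpass ulineoff ulineon",
    "Ic": "clear3270 clearpc wait",
    "IIa": "add divide remainder subtract",
    "IIb": "default_width font_height font_number font_outname1 font_outname2 "
           "font_resident font_space line_height line_space page_depth "
           "page_width pages tab_width thickness",
    "IIc": "decimal hex lower",
    "IId": "date getnumsymbol getstrsymbol time wgml_header "
           "x_address x_size y_address y_size",
}

TYPE_OF = {
    name: subtype for subtype, names in SUBTYPES.items() for name in names.split()
}


def head_error(name):
    """Return "DF--008" if a Type II function heads a sequence, else None.

    Raises KeyError for a name that is not in the table.
    """
    subtype = TYPE_OF[name.lower()]
    return "DF--008" if subtype.startswith("II") else None
```

For example, head_error("text") returns None, while head_error("add") returns "DF--008", matching the test used above to separate the two types.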

Device Function Signatures

This is the information accumulated as a result of investigating other topics. The binary file was examined to make some of these determinations.

A few notes on parameters:

  • gendev does object if it does not find the required number of parameters;
  • gendev does not object if it finds more than the required number of parameters;
  • gendev only compiles the required number of parameters, moving from left to right; additional parameters are ignored (at least for %emit());
  • random tests showed functions accepting strings or numbers without regard to the documented requirements.

There are five return value/parameter types:

  • numeric, that is, a sequence of digits; for a literal parameter, either decimal or, if preceded by $, hexadecimal; for a return value, some suitable integer;
  • uint8_t, uint16_t and uint32_t are used for one-byte, two-byte and four-byte integers (the two-byte and four-byte integers are little-endian);
  • character, that is, a sequence of characters; for a literal parameter, it is enclosed in delimiters;
  • symbol, which is the same as character but is used as the name of a user-defined symbol rather than as a value; and
  • void, which, as usual, means that the function takes no parameter or returns no value, depending on where it is used.
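As an illustration of the numeric literal forms, here is a minimal Python sketch of a parser. The function name parse_numeric is hypothetical, and accepting lower-case hexadecimal digits and surrounding whitespace is an assumption; neither was tested against gendev.

```python
import string


def parse_numeric(literal):
    """Parse a numeric literal: decimal digits, or '$' followed by hex digits."""
    text = literal.strip()
    if text.startswith("$"):
        digits, base, allowed = text[1:], 16, string.hexdigits
    else:
        digits, base, allowed = text, 10, string.digits
    if not digits or any(char not in allowed for char in digits):
        raise ValueError("not a numeric literal: %r" % literal)
    return int(digits, base)
```

So parse_numeric("37") gives 37 and parse_numeric("$1F") gives 31, while a string such as "fred" raises ValueError.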

This is for the Type I device functions. "Returns" indicates what, if anything, is inserted into the output buffer. "Parameters" reflects the documented types.

Function    Returns   Parameters           Side Effects
binary      uint8_t   numeric
binary1     uint8_t   numeric
binary2     uint16_t  numeric
binary4     uint32_t  numeric
cancel      void      character
clear3270   void      void
clearPC     void      void
dotab       void      void
endif       void      void
enterfont   void      numeric              invoke :FONTSWITCH
flushpage   void      void
ifeqn       void      numeric, numeric
ifnen       void      numeric, numeric
ifeqs       void      character, character
ifnes       void      character, character
image       character character
recordbreak void      void                 flushes the buffer
setsymbol   void      symbol, character
sleep       void      numeric              hangs gendev
text        character character
textpass    void      void                 insert current text
ulineoff    void      void                 wgml underscore off
ulineon     void      void                 wgml underscore on
wait        void      void

This is for the Type II device functions. "Returns" indicates what the function returns. "Parameters" reflects the documented types.

Function         Returns   Parameters           Side Effects
add              numeric   numeric, numeric
date             character void
decimal          character numeric
default_width    numeric   void
divide           numeric   numeric, numeric
font_height      numeric   void
font_number      numeric   void
font_outname1    character void
font_outname2    character void
font_resident    character void
font_space       numeric   void
getnumsymbol     numeric   symbol
getstrsymbol     character symbol
hex              character numeric
line_height      numeric   void
line_space       numeric   void
lower            character character
page_depth       numeric   void
page_width       numeric   void
pages            numeric   void
remainder        numeric   numeric, numeric
subtract         numeric   numeric, numeric
tab_width        numeric   void
thickness        numeric   void
time             character void
wgml_header      character void
x_address        numeric   void
x_size           numeric   void
y_address        numeric   void
y_size           numeric   void

The return types map to specific output-related functions that can accept that type of parameter:

Return Type      Useable With
character        %image(), %lower(), %text()
numeric          %add(), %binary(), %binary1(), %binary2(), 
                 %binary4(), %decimal(), %divide(), %hex(), 
                 %remainder(), %subtract()

Ultimately, of course, a Type I function must be used to actually insert bytes into the output buffer. By using %decimal() or %hex() with functions having numeric return values, %image() and %text() can insert the result of any Type II function. The functions %binary(), %binary1(), %binary2(), and %binary4(), in contrast, only work correctly with functions providing numeric return values: when used with character values, for example, %wgml_header() ("V4.0 PC/DOS"), they produce:

Function Sequence                 Decimal Result  Hex Result
%image(%decimal(%wgml_header()))  394944          606C0
%image(%hex(%wgml_header()))      395040          60720
%text(%decimal(%wgml_header()))   395136          60780
%text(%hex(%wgml_header()))       395232          607E0

It appears possible that wgml 4.0 is treating a char * as if it were a uint16_t.

Device Function Notes

This section contains the notes pertaining to gendev 4.1. The page Device Function Notes contains the notes pertaining to wgml 4.0.

The initial version of our gendev will have to behave very much like gendev 4.1. The primary difference will be that it emits error messages which identify the problem encountered, instead of relying on "Abnormal program termination" messages, which give no useful indication of the problem. Also, it should not hang when the device function %sleep() is encountered.

General Rule

The general rule for gendev 4.1 is this:

  • Extensive testing has shown that gendev 4.1 will allow any device function to occur in any block, except as noted below.

%sleep()

The function %sleep(), when used with a literal parameter, hangs gendev (whether DOS version 3.33, DOS version 4.1, or OS/2 version 4.1) and drives processor usage to its maximum, per Task Manager (Windows XP)/System Activity Monitor (OS/2)! This function is documented to cause "WATCOM Script/GML to suspend document processing for the specified number of seconds". None of the source files available to me use this function.

When used in this way:

%setsymbol("fred","1")
%sleep(%getnumsymbol("fred"))

gendev 4.1 successfully produces the binary file. wgml 4.0, however, then proceeds to hang!

There is additional information here. It records a coding error on the part of gendev 4.1 when compiling %sleep() with a non-literal parameter which may, or may not, explain why wgml 4.0 hangs.

%textpass(), %ulineoff(), and %ulineon()

Extensive testing has shown that these functions can only be used in a :LINEPROC block within a :FONTSTYLE block.

Within a :LINEPROC block within a :FONTSTYLE block, these functions can appear in these sub-blocks:

  • %textpass() is only allowed in :STARTVALUE blocks;
  • %ulineon() is allowed in :STARTVALUE, :FIRSTWORD, and :STARTWORD blocks but not in :ENDWORD or :ENDVALUE blocks; and
  • %ulineoff() is allowed in :FIRSTWORD, :STARTWORD, :ENDWORD, and :ENDVALUE blocks, but not in :STARTVALUE blocks.

These rules apply to their interaction:

  1. At most one %textpass() can be used in a :LINEPROC block.
  2. Neither %ulineon() nor %ulineoff() can be used in the same function block as %textpass().
  3. %ulineon() must be found before %ulineoff() in the same :LINEPROC block.
  4. If %ulineon() is found, then %ulineoff() must be present in the same :LINEPROC block.

When a %textpass() function is used where it is not allowed, this message is emitted by gendev:

SN--023: Invalid location for a TEXTPASS directive

When more than one %textpass() function is used in the :STARTVALUE block of a :LINEPROC block, this error message results:

SN--024 More than one TEXTPASS directive specified in :lineproc

When a %ulineoff() function is used where it is not allowed, this message is emitted by gendev:

SN--052 Invalid location for a ULINEOFF directive

When a %ulineon() function is used where it is not allowed, this message is emitted by gendev:

SN--051 Invalid location for a ULINEON directive

When %textpass() and either (or both) of %ulineon() or %ulineoff() occur in the same function block, gendev presents this message:

SN--029 Both a TEXTPASS directive and a ULINEON or ULINEOFF found in a :lineproc

If %ulineoff() is found anywhere in a :LINEPROC block and no %ulineon() precedes it in that same :LINEPROC block (they do not have to be in the same sub-block), then this message results:

SN--060 Expecting a ULINEON directive before a ULINEOFF

If %ulineon() is found anywhere in a :LINEPROC block and no %ulineoff() follows it in that same :LINEPROC block (they do not have to be in the same sub-block), then this message results:

SN--061 No corresponding ULINEOFF directive for a ULINEON

It is their characterization as "directives" in the messages shown that suggested the name "Directive" for one of the structs found in the parameter block.
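The placement and interaction rules for these three functions can be summarized in a small Python checker. This is a sketch under stated assumptions, not gendev's logic: a :LINEPROC block is represented as a hypothetical list of (sub-block tag, list of function names) pairs in source order, all problems are collected (gendev presumably stops at the first one), and only the message codes quoted above are used.

```python
ALLOWED = {
    "textpass": {"startvalue"},
    "ulineon": {"startvalue", "firstword", "startword"},
    "ulineoff": {"firstword", "startword", "endword", "endvalue"},
}
LOCATION_MESSAGE = {"textpass": "SN--023", "ulineon": "SN--051", "ulineoff": "SN--052"}


def check_lineproc(sub_blocks):
    """Return the message codes triggered by one :LINEPROC block."""
    problems = []
    textpass_count = 0
    seen_on = False      # some %ulineon() has appeared so far
    pending_on = False   # the latest %ulineon() has no %ulineoff() after it yet
    for tag, functions in sub_blocks:
        names = [name.lower() for name in functions]
        for name in names:
            if name in ALLOWED and tag.lower() not in ALLOWED[name]:
                problems.append(LOCATION_MESSAGE[name])
        if "textpass" in names and ("ulineon" in names or "ulineoff" in names):
            problems.append("SN--029")
        for name in names:
            if name == "textpass":
                textpass_count += 1
                if textpass_count == 2:
                    problems.append("SN--024")
            elif name == "ulineon":
                seen_on = True
                pending_on = True
            elif name == "ulineoff":
                if not seen_on:
                    problems.append("SN--060")
                pending_on = False
    if pending_on:
        problems.append("SN--061")
    return problems
```

A block with %ulineon() in :STARTVALUE and %ulineoff() in :ENDVALUE produces an empty list; dropping the %ulineoff() produces ["SN--061"].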

Memory Protection Faults

This section discusses the situations in which gendev 4.1 emits the message

Abnormal program termination: Memory protection fault

and halts.

This occurs when these device functions:

%add() %divide() %remainder() %subtract() 

are used, with two literal arguments, directly as the parameter of %image() or %text(). The specific function sequences tested were:

%image(%add(1,2))       %image(%divide(2,1))
%image(%remainder(1,2)) %image(%subtract(2,1))
%text(%add(1,2))        %text(%divide(2,1))
%text(%remainder(1,2))  %text(%subtract(2,1))

These function sequences resulted in the Memory protection fault in these blocks:

START :PAUSE  DOCUMENT :PAUSE  DOCUMENT_PAGE :PAUSE  DEVICE_PAGE :PAUSE

Binary Output Values

This section discusses the output values of these device functions:

%binary() %binary1() %binary2() %binary4()

If the value passed to %binary() requires more than one byte to express it, then only the lower-order byte is used by gendev when the function is compiled. %binary1() behaves the same way.

%binary2() is used in a few drivers, but only with the parameter "0", producing two null bytes. Testing shows that it will encode two-byte values, in little-endian form. If given a value that requires more than two bytes, only the lower-order two bytes are encoded.
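The truncation and byte order described for %binary(), %binary1(), and %binary2() can be reproduced with a few lines of Python. The helper name encode_binary is hypothetical, and widths other than 1 and 2 are deliberately rejected here, since the behaviour of %binary4() is not settled.

```python
def encode_binary(width, value):
    """Encode value as %binary()/%binary1() (width 1) or %binary2() (width 2).

    Only the lower-order bytes are kept; multi-byte output is little-endian.
    """
    if width not in (1, 2):
        raise ValueError("only one-byte and two-byte widths are sketched here")
    mask = (1 << (8 * width)) - 1
    return (value & mask).to_bytes(width, "little")
```

For instance, encode_binary(2, 0) gives the two null bytes seen in the drivers, and encode_binary(1, 0x1234) keeps only the byte 0x34.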

%binary4() is not used, so far as I can tell, in the source files available to me. Testing shows that gendev emits this error message

SN--001: Number is too large or contains invalid characters

for values above "$7FFFFFFF". Testing also showed that "$7FFFFFFF" was compiled into "0xFFFFFFFF". Additional testing only increased my confusion, and there is no point to it anyway, since %binary4(), as noted above, is not used.

Consideration should be given to adding a new Type I device function, %nulls(), which would take a numeric parameter and generate the indicated number of nulls, if gendev/wgml is ever released for general use.

Compiled Form

Introduction

When I first encountered the compiled form of the function block, I looked at this very simple :VALUE. block:

:value.
   %text( "Just a test of the START PAUSE block." )
:evalue.

but I presented it like this:

2C 00 
FF FF 00 16 25 00  J  u  s  t     a     t  e  s
t     o  f     t  h  e     S  T  A  R  T     P
A  U  S  E     b  l  o  c  k  .

This turned out to be a mistake, one of many made in the course of my investigation: although 0x002C was clearly the length, I did not actually show 0x002C bytes. I was also misled by the use of "0x00" to designate a CodeBlock compiled from a :VALUE block into believing that two nulls followed the CodeBlock.

A more accurate picture of the encoding can be given by enclosing the data bytes within a tabular array:

2C 00 
     00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000 FF FF 00 16 25 00  J  u  s  t     a     t  e  s
0010  t     o  f     t  h  e     S  T  A  R  T     P
0020  A  U  S  E     b  l  o  c  k  .  00 

This made it clear that, for device function %text() with an explicit string parameter, the CodeBlock ends with a "0x00" byte. When I began analysing the :DRIVER block, I quickly realized that the byte after this "0x00" byte was, in fact, a designator for the next CodeBlock (if there was a next CodeBlock).

I also drew these conclusions:

  • The function %text() is compiled as "0xFF 0xFF 0x00 0x16".
  • The value "0x25 0x00" is a count of the number of bytes in the parameter of the function %text(), but not the final "0x00".

The second conclusion is easier to verify if the same format is applied to the parameter:

25 00  
     00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000  J  u  s  t     a     t  e  s  t     o  f     t  
0010  h  e     S  T  A  R  T     P  A  U  S  E     b  
0020  l  o  c  k  .

I then considered a more complicated (and more realistic, for a :PAUSE block) :VALUE. block:

:value.
   %text( "Just a test of the DOCUMENT PAUSE block." )
   %text( "Press enter to start the document." )
   %recordbreak()
   %wait()%clearpc()
:evalue.

which, presented in the same format as before, is compiled as:

5D 00 
     00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000 4E 00 00 16 4A 00  J  u  s  t     a     t  e  s
0010  t     o  f     t  h  e     D  O  C  U  M  E  N
0020  T     P  A  U  S  E     b  l  o  c  k  .  P  r
0030  e  s  s     e  n  t  e  r     t  o     s  t  a 
0040  r  t     t  h  e     d  o  c  u  m  e  n  t  .
0050  00 01 00 00 01 01 00 00 25 FF FF 00 1E 

These conclusions can apparently be drawn:

  • The strings in the two %text() function invocations are concatenated into a single parameter.
  • The function %text() now appears to be compiled as "0x4E 0x00 0x00 0x16": thus, the actual encoding may be "0x00 0x16", that is, two bytes rather than four.
  • If the "0x4E 0x00" is a count, it counts the same bytes as the "0x4A 0x00", just including "0x00 0x16 0x4A 0x00" in the count. In particular, it does not count the "0x00" after the string.
  • If the "0x00" after the string is taken as part of the encoding of function text and the encoding is taken as four bytes, the remaining encodings are:
01 00 00 01
01 00 00 25
FF FF 00 1E

It was suggested by my co-implementor that:

  • Perhaps the "0xFFFF" string marks the last or only function.

When I first tested a Type II function, a further clarification resulted. The :VALUE block is:

:value.
   %text( "Just a test of the INIT block." )
   %text( %font_outname1() )
   %recordbreak()
:evalue.

and the result, displayed as above, is:

46 00 
     00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000 22 00 00 16 1E 00  J  u  s  t     a     t  e  s
0010  t     o  f     t  h  e     I  N  I  T     b  l
0020  o  c  k  . 00 1A 00 10 16 FF FF 0D 00 FF FF 00 
0030 00 00 00 00 00 37 00 00 FF FF FF FF 00 00 00 00 
0040 00 00 FF FF 00 01

This illustrates two things:

  • "0x10 0x16" is a %text() device function with a single-parameter parameter block.
  • Parameter blocks, even when the parameter is a Type II function, are quite different from the struct used for Type I functions.

If the error message used to identify Type II device functions is recalled:

DF--008: This tag at start of device function sequence is invalid

it is now possible to characterize a function block as:

a linked list of device function sequences

where each sequence is headed by a Type I device function.

The struct used for this linked list is:

FunctionList {
    uint16_t  offset;
    uint8_t   parameter_type;
    uint8_t   byte_code;
}

The field offset counts the number of bytes from the field byte_code to the first byte of the next offset. This field will have the value "0xFFFF" when the last compiled function is reached.
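
The traversal this implies can be sketched in Python. This is only an illustration, not code from wgml: the name walk_function_list is hypothetical, the 16-bit fields are assumed to be little-endian (as the dumps suggest), and the sample bytes are those of the second dump above.

```python
import struct

# Bytes of the second dump above (70 bytes, after the "46 00" length).
SAMPLE = (
    bytes.fromhex("22 00 00 16 1E 00")
    + b"Just a test of the INIT block."
    + bytes.fromhex(
        "00 1A 00 10 16 FF FF 0D 00 FF FF 00 00 00 00 00 00"
        " 37 00 00 FF FF FF FF 00 00 00 00 00 00 FF FF 00 01"
    )
)

def walk_function_list(block: bytes):
    """Yield (parameter_type, byte_code, payload) for each function sequence."""
    pos = 0
    while True:
        offset, parameter_type, byte_code = struct.unpack_from("<HBB", block, pos)
        code_pos = pos + 3                 # position of the field byte_code
        if offset == 0xFFFF:               # last compiled function
            # No length is available here, so the rest of the block is returned.
            yield parameter_type, byte_code, block[code_pos + 1:]
            return
        next_pos = code_pos + offset       # first byte of the next offset field
        yield parameter_type, byte_code, block[code_pos + 1:next_pos]
        pos = next_pos

for parameter_type, byte_code, payload in walk_function_list(SAMPLE):
    print(f"type 0x{parameter_type:02X} code 0x{byte_code:02X} payload {len(payload)} bytes")
```

Run against the sample, this finds three sequences: a %text() with no parameter block, a %text() with a parameter block, and a %recordbreak.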

The following sections discuss the values of the fields parameter_type and byte_code.

Two notes on investigative technique:

  • In many cases, where the visible test was not clear enough as to what was going on, wgml's screen output was redirected to a text file and that file was examined using wdump.
  • In those same cases, the output file was also examined using wdump.

Literal Parameters

This section discusses literal parameters and how they are treated by gendev when it is compiling a function block.

For %image() and %text(), this is quite simple: no parameter block is present; instead, an instance of this struct appears:

CharParameter {
    uint16_t  count;
    char      data[count];
    char      null = 0x00;
}

It might be thought that having both a count and a null terminator is a bit much, but, as will be seen shortly, the value of field data can include nulls, so the value of field count must be used to extract the data.
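
A minimal sketch of reading a CharParameter, assuming a little-endian count; the name read_char_parameter is hypothetical. The data is sliced by count rather than by searching for the terminator, so embedded nulls survive.

```python
import struct

def read_char_parameter(buf: bytes, pos: int = 0):
    """Return (data, position after the terminating null)."""
    (count,) = struct.unpack_from("<H", buf, pos)
    start = pos + 2
    end = start + count
    if end >= len(buf):
        raise ValueError("CharParameter runs past the end of the buffer")
    if buf[end] != 0x00:
        raise ValueError("missing null terminator after data")
    # Slice by count: the data itself may contain 0x00 bytes.
    return buf[start:end], end + 1

data, next_pos = read_char_parameter(bytes.fromhex("04 00 73 75 7A 79 00"))
print(data, next_pos)
```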

This may not apply to %text(), since text editors don't generally allow non-characters to be entered into strings; however, once a function is written to process a CharParameter for output for %image(), that function will probably be used for %text() as well. For one thing, such a function must take output record length into account as well; it will not be a simple "write the chars and return" function.

A CharParameter used with %image() can also contain null bytes in its data field, but for a very different reason.

Consider this function block:

:value.
   %binary(3) %binary1(4) %binary2(5) %binary4(6)
:evalue.

gendev compiles this into:

FF FF 00 15
   08 00 03 04 05 00 06 00 00 00 00

which is indistinguishable from a compiled %image() device function.

Although the exact process cannot be reconstructed, it can be conceptualized as having two steps:

  • each of the %binaryN() functions was converted to the compiled form of an %image() function with the correct byte(s) in its CharParameter; then
  • since the resulting forms met the criteria for being merged, they were merged into the compiled form of a single %image() function whose CharParameter had, for the value of its data field, the merged values of the data fields of all of the CharParameter blocks formed in the first step.
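
The two conceptual steps can be modeled as follows. This is a sketch of the observed result, not of gendev's actual process: the name compile_binary_calls is hypothetical, and the byte widths and little-endian order are read off the compiled output shown above.

```python
import struct

# Bytes contributed by each function, as implied by the compiled output above.
WIDTHS = {"binary": 1, "binary1": 1, "binary2": 2, "binary4": 4}

def compile_binary_calls(calls):
    """Model a run of %binaryN() calls as one final %image() function."""
    # Step 1: each call becomes the data of its own CharParameter.
    pieces = [value.to_bytes(WIDTHS[name], "little") for name, value in calls]
    # Step 2: the pieces are merged into a single CharParameter.
    data = b"".join(pieces)
    header = struct.pack("<HBB", 0xFFFF, 0x00, 0x15)   # last function, %image()
    return header + struct.pack("<H", len(data)) + data + b"\x00"

compiled = compile_binary_calls(
    [("binary", 3), ("binary1", 4), ("binary2", 5), ("binary4", 6)]
)
print(compiled.hex(" ").upper())
```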

This preprocessing of literals by gendev extends to several other cases. Note: although all tests were performed with %image(), %text() very likely behaves the same way.

The Type II device function %lower() is documented in the README file produceable from the WGML 3.33 Update by the statement "returns the lower case of the string" (where "the string" is its parameter). When the parameter is a literal string, this is actually done by gendev, so that both of these invocations:

%image(%lower("SUZY"))
%image("suzy")

are compiled as:

08 00 00 15 
   04 00 73 75 7A 79 00                             suzy

The Type II device functions %hex() and %decimal() convert their numeric parameters into hexadecimal and decimal character representations (respectively). If they are, in turn, used as the parameter of %image(), then that representation is treated as a character literal. Thus, both of these invocations:

%image(%hex(15))
%image("f")

are compiled as:

05 00 00 15
   01 00 66 00                                      f

and both of these invocations

%image(%decimal(15))
%image("15")

are compiled as:

06 00 00 15
   02 00 31 35 00                                   15


If the parameter of %hex() is a character literal, something strange happens:

%image(%hex("fred"))
09 00 00 15 
   05 00 31 37 37 35 63 00                          1775c

The parameter is, of course, supposed to be numeric.

The Type II device functions %add(), %divide(), %remainder(), and %subtract() take two numeric parameters and produce a numeric result. If the parameters are both literals, then the result is treated as a literal. So far as I can tell, this applies to any level of nesting. For example, the invocation

      %image(%decimal(%add(3,%add(3,2))))

is compiled as:

05 00 00 15
   01 00 38 00                                      8

since 3 + 3 + 2 is 8. Similarly, the invocation

%binary(%add(%add(3,%add(3,2)),15))

is compiled as:

05 00 00 15
   01 00 17 00                                      0x17

since 15 + 8 is 23 in decimal, which is 17 in hexadecimal notation.
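
The folding of literal arithmetic can be sketched as a small evaluator. The names fold and image_literal are hypothetical, and integer division for %divide() is an assumption that was not tested here; the encoding follows the compiled forms shown above.

```python
import struct

OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "divide": lambda a, b: a // b,      # assumption: integer division
    "remainder": lambda a, b: a % b,
}

def fold(expr):
    """Reduce a nested (name, left, right) tuple of literals to one number."""
    if isinstance(expr, int):
        return expr
    name, left, right = expr
    return OPS[name](fold(left), fold(right))

def image_literal(data: bytes) -> bytes:
    """Encode a non-final %image() with a literal CharParameter."""
    # offset counts byte_code (1) + count (2) + data + null (1)
    return struct.pack("<HBBH", len(data) + 4, 0x00, 0x15, len(data)) + data + b"\x00"

# %image(%decimal(%add(3,%add(3,2))))
print(image_literal(str(fold(("add", 3, ("add", 3, 2)))).encode("ascii")).hex(" "))
# %binary(%add(%add(3,%add(3,2)),15))
print(image_literal(bytes([fold(("add", ("add", 3, ("add", 3, 2)), 15))])).hex(" "))
```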

This behavior has implications for parsing these functions. They do not affect wgml, which can simply emit the characters in the CharParameter for each %image() function it encounters. They do affect the research programs copparse and cfparse, which are intended to be completed with a parser that provides enough information to reconstruct the source code for a given binary device file.

For copparse and cfparse, this will probably turn out to be the best method possible:

  • Non-character bytes will be presented as arguments to %binary().
  • Character bytes will be presented, as strings, as arguments to %image() or to %text(), depending on which binary code (0x15 or 0x16) is present.
  • It will not be possible to distinguish such invocations as "%image(%decimal(%add(3,%add(3,2))))" from the form in which they will be reported, "%image("8")".
  • It will not be possible to distinguish such invocations as "%image(%lower("FRED"))" from the form in which they will be reported, "%image("fred")".

So the goal has to be, not to reproduce the source file (impossible in any case since :CMT. lines are not compiled), but to produce a source file which, when processed by gendev, produces the same binary file which copparse or cfparse analysed.
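
One way the reporting rule might look is sketched below. It is not taken from copparse or cfparse: the name reconstruct is hypothetical, and the choice to treat only printable ASCII other than the double quote as "character bytes" is an assumption made to keep the quoting simple.

```python
NAMES = {0x15: "image", 0x16: "text"}

def reconstruct(byte_code: int, data: bytes) -> str:
    """Report CharParameter data as source text that should recompile alike."""
    name = NAMES[byte_code]
    parts = []
    run = bytearray()
    for b in data:
        if 0x20 <= b <= 0x7E and b != 0x22:    # printable, not a double quote
            run.append(b)
            continue
        if run:                                 # close the pending string
            parts.append("%" + name + '("' + run.decode("ascii") + '")')
            run.clear()
        parts.append("%binary(" + str(b) + ")")
    if run:
        parts.append("%" + name + '("' + run.decode("ascii") + '")')
    return " ".join(parts)

print(reconstruct(0x15, b"suzy"))
print(reconstruct(0x15, bytes([3, 4, 5, 0, 6, 0, 0, 0])))
```

The second call reports eight single-byte %binary() invocations rather than the original mix of %binary1(), %binary2() and %binary4(), which is the kind of loss described above: the source differs, but the compiled bytes should not.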

Parameter Blocks

These are the values of field FunctionList.parameter_type which have been seen and their inferred meanings:

  • 0x00 indicates no parameter block at all.
  • 0x10 indicates that a parameter block is present.

The values shown do not fully determine the behavior of wgml: while "0x10" shows that a parameter block is present, "0x00" does not mean that no parameter is present, only that no parameter block exists, as can be seen from the examples above using %text(). Thus, wgml must use its own knowledge of %image() and %text() to determine what to expect:

Parameter Type  Function           wgml Action
0x00            %image()           expect character parameter only
0x00            %text()            expect character parameter only
0x00            other              expect no parameter
0x10            any                expect parameter block
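
The table amounts to a small decision function. The sketch below uses a hypothetical name, expected_payload, and returns descriptive strings only; rejecting other parameter_type values is a choice made here, since only "0x00" and "0x10" have been seen.

```python
IMAGE, TEXT = 0x15, 0x16   # byte codes of %image() and %text()

def expected_payload(parameter_type: int, byte_code: int) -> str:
    """Say what should follow the byte_code of a Type I device function."""
    if parameter_type == 0x10:
        return "parameter block"
    if parameter_type == 0x00:
        if byte_code in (IMAGE, TEXT):
            return "character parameter only"
        return "no parameter"
    # Only 0x00 and 0x10 have been observed; anything else is rejected.
    raise ValueError(f"unknown parameter_type 0x{parameter_type:02X}")

print(expected_payload(0x00, 0x16))
print(expected_payload(0x00, 0x01))
print(expected_payload(0x10, 0x16))
```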

The remainder of this section discusses parameter block structure. It only applies in cases where the Parameter Type is "0x10".

If a character parameter is given in the source, then the CharParameter struct, discussed in Literal Parameters, is used as part of the parameter block. Several other structs are also useful in understanding parameter block structure.

The following discussion is based on limited testing. It will be modified as needed when the parsing code is written.

The first struct encountered is a Parameter struct:

Parameter {
    uint16_t offset1;
    uint16_t offset2;
    uint16_t offset3;
    uint16_t offset4 = 0x0000;
}

Discussion of how these fields are used must be postponed until the full structure of the parameter block has been examined.

Each parameter block begins with a header:

ShortHeader {
    Parameter parameter;
}

or:

LongHeader {
    Parameter parameter;
    uint16_t  value = 0x0000;
    uint16_t  nulls = 0x0000;
}

Each parameter is encoded in this struct:

Directive {
    char      op_code;
    Parameter parameter;
    uint16_t  value;
    uint16_t  nulls = 0x0000;
}

Now the various fields will be discussed.

The field nulls contains two null bytes under all tested device function sequences.

The value of the field value will always be "0x0000" in a LongHeader instance; in Directive instances where the parameter was not given as a numeric literal, its value will also be "0x0000", but when a literal numeric parameter was used, then it will contain the value of that literal.

The values observed for the field op_code are:

  • 0x00 if a literal character parameter was given, in which case a CharParameter is appended to the Directive instance with the value of that parameter.
  • 0x3C if a literal numeric parameter was given, in which case the field value in the Directive instance contains the value of that parameter.
  • other values are the byte code of the Type II device function which was used as the parameter.
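
A sketch of reading one Directive and classifying its op_code follows. The names Directive, read_directive and describe are hypothetical Python counterparts of the struct above, little-endian 16-bit fields are assumed, and the sample bytes are the Directive for %font_outname1() from the dump shown earlier.

```python
import struct
from typing import NamedTuple

class Directive(NamedTuple):
    op_code: int
    offset1: int
    offset2: int
    offset3: int
    offset4: int
    value: int
    nulls: int

DIRECTIVE = struct.Struct("<B6H")   # 1 + 4*2 + 2 + 2 = 13 bytes

def read_directive(buf: bytes, pos: int = 0) -> Directive:
    return Directive(*DIRECTIVE.unpack_from(buf, pos))

def describe(d: Directive) -> str:
    if d.op_code == 0x00:
        return "literal character parameter (a CharParameter follows)"
    if d.op_code == 0x3C:
        return f"literal numeric parameter with value {d.value}"
    return f"Type II device function with byte code 0x{d.op_code:02X}"

sample = bytes.fromhex("37 00 00 FF FF FF FF 00 00 00 00 00 00")
print(describe(read_directive(sample)))
```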

The ShortHeader struct has only been seen used correctly when the first parameter is a character string. If there are no parameters, or there is only one parameter (whether it is a character string or not), or if the second parameter is a character string but the first is not, then the LongHeader struct is used.

When device function %sleep() with a non-literal parameter was set up for interpretation, however, it turned out that a ShortHeader is used in this case. As noted below, it is encoded in such a way that the interpreter expects a LongHeader. Initially, the interpreters were written to compensate for this; however, the problem goes deeper: if the device function used with %sleep() itself takes a parameter, then the offset to that parameter is also affected, and the nature of the interpreter makes correcting that problem very difficult. The implementation of device function %sleep(), therefore, does not make any correction, and will produce an error message.

Now the fields in struct Parameter can be characterized. It may be helpful to present a table of multiples for Directive instance lengths, since these are the values used in the illustrations:

Nr of Instances   Total Length
0                 0x0000
1                 0x000D
2                 0x001A
3                 0x0027
4                 0x0034
5                 0x0041
6                 0x004E
7                 0x005B

To illustrate the use of the field Parameter.offset1, consider this schematized compiled function sequence (the Parameter.offset1 field is the first two bytes of the LongHeader line and the two bytes following the op_code on each Level line):

Function sequence: 
%image(%add(%subtract(%line_space(),%font_number()),%remainder(%line_space(),%font_number())))
%image(): 10 15 
LongHeader:  FF FF 0D 00 FF FF 00 00  00 00 00 00
Level 1:     0E 00 00 1A 00 41 00 00  00 00 00 00 00
  Level 2:      0F 0D 00 27 00 34 00 00  00 00 00 00 00  
    Level 3:       34 1A 00 FF FF FF FF 00  00 00 00 00 00
    Level 3:       28 1A 00 FF FF FF FF 00  00 00 00 00 00
  Level 2:      11 0D 00 4E 00 5B 00 00  00 00 00 00 00 
    Level 3:       34 41 00 FF FF FF FF 00  00 00 00 00 00
    Level 3:       28 41 00 FF FF FF FF 00  00 00 00 00 00

This suggests that the field Parameter.offset1, when it is not "0xFFFF" and is added to the start of the first Directive following the parameter block, locates the first byte of the Directive instance representing the function of which the current Directive instance represents a parameter.

These values of Parameter.offset1 appear to have special meanings:

  • 0xFFFF indicates that this is a ShortHeader or LongHeader instance;
  • 0x0000 indicates that this is a parameter of the Type I device function standing at the head of the function sequence.

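To illustrate the use of the field Parameter.offset2, consider this schematized compiled function sequence (the Parameter.offset2 field is the third and fourth bytes of the LongHeader line and the third and fourth bytes following the op_code on each Level line):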

Function sequence: 
%image(%add(%subtract(%line_space(),%font_number()),%remainder(%line_space(),%font_number())))
%image(): 10 15 
LongHeader:  FF FF 0D 00 FF FF 00 00  00 00 00 00
Level 1:     0E 00 00 1A 00 41 00 00  00 00 00 00 00
  Level 2:      0F 0D 00 27 00 34 00 00  00 00 00 00 00  
    Level 3:       34 1A 00 FF FF FF FF 00  00 00 00 00 00
    Level 3:       28 1A 00 FF FF FF FF 00  00 00 00 00 00
  Level 2:      11 0D 00 4E 00 5B 00 00  00 00 00 00 00 
    Level 3:       34 41 00 FF FF FF FF 00  00 00 00 00 00
    Level 3:       28 41 00 FF FF FF FF 00  00 00 00 00 00

This suggests that the field Parameter.offset2, when it is not "0xFFFF" and is added to the start of the parameter block, locates the first byte of the Directive instance representing the first parameter of the function represented by the current Directive instance.

The value of Parameter.offset2 in a ShortHeader instance is "0x0009", and the value found in a LongHeader instance is "0x000D". Each is one more than the length of the corresponding struct as defined above (8 and 12 bytes, respectively), which would be consistent with these offsets being measured from the field byte_code, as FunctionList.offset is, so that each value locates the first byte after the header. These are the only two values observed in either type of header. As noted above, device function %sleep(), with a non-literal parameter, is encoded using a ShortHeader but with a value of "0x000D" for Parameter.offset2, which is, of course, wrong and may explain why device function %sleep() hangs wgml 4.0.

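To illustrate the use of the field Parameter.offset3, consider this schematized compiled function sequence (the Parameter.offset3 field is the fifth and sixth bytes of the LongHeader line and the fifth and sixth bytes following the op_code on each Level line):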

Function sequence: 
%image(%add(%subtract(%line_space(),%font_number()),%remainder(%line_space(),%font_number())))
%image(): 10 15 
LongHeader:  FF FF 0D 00 FF FF 00 00  00 00 00 00
Level 1:     0E 00 00 1A 00 41 00 00  00 00 00 00 00
  Level 2:      0F 0D 00 27 00 34 00 00  00 00 00 00 00  
    Level 3:       34 1A 00 FF FF FF FF 00  00 00 00 00 00
    Level 3:       28 1A 00 FF FF FF FF 00  00 00 00 00 00
  Level 2:      11 0D 00 4E 00 5B 00 00  00 00 00 00 00 
    Level 3:       34 41 00 FF FF FF FF 00  00 00 00 00 00
    Level 3:       28 41 00 FF FF FF FF 00  00 00 00 00 00

This suggests that the field Parameter.offset3, when it is not "0xFFFF" and is added to the start of the parameter block, locates the first byte of the Directive instance representing the second parameter of the function represented by the current Directive instance.

The field Parameter.offset4 is always "0x0000". It was so named because it is tempting to think that it was intended to provide the offset to a third parameter, but is never used for that purpose because no device function, in its compiled form, has more than two parameters.
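
The offsets can be followed mechanically to rebuild the source form. The sketch below does so for the bytes of the illustration. It rests on an inference drawn from those bytes, not on a tested fact: the values are consistent with offsets measured from the byte_code byte (0x15) that precedes the header, so that byte_code plus LongHeader can be read like one more 13-byte Directive. The names build_tree and render are hypothetical, and literal parameters (op_code 0x00 or 0x3C) are not handled.

```python
import struct

SAMPLE = bytes.fromhex(
    "15"                                       # byte_code of %image()
    "FF FF 0D 00 FF FF 00 00 00 00 00 00"      # LongHeader
    "0E 00 00 1A 00 41 00 00 00 00 00 00 00"   # %add()
    "0F 0D 00 27 00 34 00 00 00 00 00 00 00"   # %subtract()
    "34 1A 00 FF FF FF FF 00 00 00 00 00 00"   # %line_space()
    "28 1A 00 FF FF FF FF 00 00 00 00 00 00"   # %font_number()
    "11 0D 00 4E 00 5B 00 00 00 00 00 00 00"   # %remainder()
    "34 41 00 FF FF FF FF 00 00 00 00 00 00"   # %line_space()
    "28 41 00 FF FF FF FF 00 00 00 00 00 00"   # %font_number()
)
NAMES = {0x15: "image", 0x0E: "add", 0x0F: "subtract", 0x11: "remainder",
         0x34: "line_space", 0x28: "font_number"}
PARAMETER = struct.Struct("<4H")               # offset1 .. offset4

def build_tree(buf: bytes, base: int, at: int):
    """Return (code, children); buf[at] is a code byte followed by a Parameter."""
    _, offset2, offset3, _ = PARAMETER.unpack_from(buf, at + 1)
    children = [build_tree(buf, base, base + off)
                for off in (offset2, offset3) if off != 0xFFFF]
    return buf[at], children

def render(node) -> str:
    code, children = node
    return "%" + NAMES[code] + "(" + ",".join(render(c) for c in children) + ")"

print(render(build_tree(SAMPLE, 0, 0)))
```

The rendered text matches the function sequence shown at the top of each illustration, which gives some support to the choice of base.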

Device Function Code Bytes

This table lists the code bytes found for all known device functions:

Device Function    Code Byte
%add()             0E
%binary()          09
%binary1()         09
%binary2()         0A
%binary4()         0B
%cancel()          24
%clear3270()       1F
%clearPC()         1E
%date()            3B 
%decimal()         0C
%default_width()   27
%divide()          10
%dotab()           23
%endif()           1C
%enterfont()       06
%flushpage()       1D
%font_height()     31
%font_number()     28
%font_outname1()   37
%font_outname2()   38
%font_resident()   39
%font_space()      32
%getnumsymbol()    12
%getstrsymbol()    13
%hex()             0D
%ifeqn()           1A
%ifeqs()           18
%ifnen()           1B
%ifnes()           19
%image()           15
%line_height()     33
%line_space()      34
%lower()           14
%page_depth()      2A
%page_width()      2B
%pages()           35
%recordbreak()     01
%remainder()       11
%setsymbol()       17
%sleep()           26
%subtract()        0F
%tab_width()       29
%text()            16
%textpass()        20
%thickness()       30
%time()            3A
%ulineoff()        22
%ulineon()         21
%wait()            25
%wgml_header()     36
%x_address()       2C
%x_size()          2E
%y_address()       2D
%y_size()          2F

When the list is sorted by byte code, several ranges of values are missing:

  • 0x00 before %recordbreak();
  • 0x02 through 0x05 between %recordbreak() and %enterfont(); and
  • 0x07 and 0x08 between %enterfont() and %binary()

The highest value is "0x3B"; however, the values "0x00" and "0x3C" are used to designate literal parameters in the Directive struct discussed in the section on parameter blocks.
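
The gaps can be checked directly from the table. In this sketch the dictionary CODE_BYTES is simply the table above transcribed into Python; the variable names are hypothetical.

```python
CODE_BYTES = {
    "add": 0x0E, "binary": 0x09, "binary1": 0x09, "binary2": 0x0A,
    "binary4": 0x0B, "cancel": 0x24, "clear3270": 0x1F, "clearPC": 0x1E,
    "date": 0x3B, "decimal": 0x0C, "default_width": 0x27, "divide": 0x10,
    "dotab": 0x23, "endif": 0x1C, "enterfont": 0x06, "flushpage": 0x1D,
    "font_height": 0x31, "font_number": 0x28, "font_outname1": 0x37,
    "font_outname2": 0x38, "font_resident": 0x39, "font_space": 0x32,
    "getnumsymbol": 0x12, "getstrsymbol": 0x13, "hex": 0x0D, "ifeqn": 0x1A,
    "ifeqs": 0x18, "ifnen": 0x1B, "ifnes": 0x19, "image": 0x15,
    "line_height": 0x33, "line_space": 0x34, "lower": 0x14,
    "page_depth": 0x2A, "page_width": 0x2B, "pages": 0x35,
    "recordbreak": 0x01, "remainder": 0x11, "setsymbol": 0x17, "sleep": 0x26,
    "subtract": 0x0F, "tab_width": 0x29, "text": 0x16, "textpass": 0x20,
    "thickness": 0x30, "time": 0x3A, "ulineoff": 0x22, "ulineon": 0x21,
    "wait": 0x25, "wgml_header": 0x36, "x_address": 0x2C, "x_size": 0x2E,
    "y_address": 0x2D, "y_size": 0x2F,
}

used = set(CODE_BYTES.values())
highest = max(used)
# Every value from 0x00 up to the highest that no device function uses.
missing = [code for code in range(highest + 1) if code not in used]
print(f"highest: 0x{highest:02X}")
print("missing:", " ".join(f"0x{code:02X}" for code in missing))
```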

Projected Code for wgml

This section has been moved and reformatted because the output code has advanced far beyond the point reached in this section.
