The implementation of the Rubato system was broken into two halves which are essentially independent. The first half was the Rubato compiler, while the second half was a combined assembler, linker, interpreter and debugger (an executor). In addition, a player is available to perform music generated by the compiler-executor pair. The compiler translates source files written in the Rubato language into assembly files. Assembly files are essentially text files containing one Rubato machine instruction per line. In addition, some pseudo instructions were generated by the compiler which are intercepted by the assembler. The assembler will also allow labelling of instructions and data, and will attempt to resolve forward label references generated by the compiler. Similarly, the interpreter generates a player file containing a list of notes to be played during the music performance. This player file is read in by the player.
The compiler is called rc (which stands for Rubato Compiler) and the other half of the implementation is called rx (Rubato
The main reason for the separation of the compiler and executor was to isolate the Rubato language from the Rubato machine. As these are actually independent designs, they should be implemented independently. Furthermore, since the Rubato language uses the CMN model of music representation and does not have implicit performance rules built into the model of representation, it was decided that the performance rules should be encoded in the executor.(18)
The current version of the interpreter will not actually drive the music performance hardware directly. Instead, the details of actual music performance were left to yet another program called the player. Although the player is not technically part of the Rubato system proper, it is however essential for music performance. This separation is done so that the executor can be implemented as quickly and as painlessly as possible. Ideally, the player should be incorporated into the interpreter so that real-time performance can be achieved by the interpreter. Rather than attempting to design a player from scratch, an existing player was used and the interpreter was designed to generate files that can be read in by the player for music performance. The player chosen for this task was the Adagio player developed at Canegie-Mellon University as part of the CMU MIDI Toolkit.[3] The reasons Adagio was chosen were mainly due to:
This was the first part of the Rubato system to be implemented. A parser for the Rubato language was developed jointly with the design of the language from the EBNF grammar. The yacc(1) parser generator in Unix Version 8 was used to generate an LALR(1) parser for the Rubato language. However, the lexical analyser was written by hand in C. Although the Unix lexical analyser generator lex(1) could have been used to generate a lexical analyser for the Rubato language as it is currently specified, when the language was in its initial stage, a hand-written lexical analyser was necessary due to an early condition that required the lexical analyser to be context sensitive and also emit more than one token for certain characters seen on input depending on context. Fortunately, when the language was finalised, most of these context-dependent irregularities were removed.
The compiler used the strategy of generating a parse tree for semantic analysis and code generation. As it currently stands, the compiler will attempt to build a parse tree for each global definition. It then walks through the parse tree performing semantic analysis. Some semantic transformations may be effected by either the parser or the semantic analyser to reduce relative expressions to normal expressions and to convert delaymul and delaydiv attributes to normal delay attributes. Identifier typing is done by the parser and flagged to the lexical analyser. Hence the lexical analyser is still to some degree context sensitive, but the context sensitivity is limited to identifier typing. This was done in order that the grammar of the language be less ambiguous and to reduce the number of conflicts reported by the parser generator.
One drawback of the parse tree approach is the handling of nested object definitions with the same name. Currently, the compiler has a bug that disallows nested definitions to reuse an identifier. Also, building a complete parse tree for each definition takes up a lot of memory, so the compiler can run out of memory if the definition is too large.
Code generation was relatively straightforward, partly because the design of the Rubato machine was optimized for easy code generation by the compiler. A decision was taken to make the compiler generate assembly code rather than true machine code. Generating assembly code relieves the compiler from maintaining an image of the code and data stores in memory and keep track of code and data locations and addresses. By generating labels as a substitute for store locations, the compiler is no longer responsible for keeping track of forward jumps or references, leaving it to the assembler or linker to fix up forward label referencing. Also, generating assembly code allows the compiler output to be human readable and this is an aid to debugging the compiler.
The compiler makes only a single pass through the source file, hence it is relatively quick. It also makes only a single traversal of the parse tree in most instances in order to improve efficiency. Memory for the parse tree is allocated dynamically.
Identifiers are kept on a hash table of singly linked lists. This is the symbol table. As the level of nesting of definitions get deeper, new entries are inserted into the hash table into the head of the linked list. This makes the insertion of symbols into the symbol table a cheap operation. In addition, during symbol table searches, the closer the level of definition of the identifier is to the current level of definition, the faster the identifier can be retrieved from the symbol table. Global identifiers end up at the tail of the linked lists, hence searching for global identifiers is a slow operation. All symbols in the current block of nesting may be deleted by a single function call which is relatively cheap since all symbols belonging to the current block will always be at the head of each linked list in the symbol table.
Keywords are not kept on the symbol table. Instead, they are stored in a fixed array. Keyword searching is accomplished using a binary search algorithm.
The following optimizations are performed by the compiler:
Currently, the grammar for the parser generator does not include error productions, so no error recovery is attempted should a syntax error occur in the source file. This is not nearly as bad as it sounds, as the compiler is reasonably fast as to not preclude the time honoured strategy of one-error fixes followed by a recompile. Furthermore, the parser generator version available on Unix Version 8 gives excellent diagnostics should a syntax error occurs. It attempts to print out the current state in a production rule within the grammar as well as the offending token.
The implementation of the compiler is portable across two different system environments. Currently, the rc compiler runs on the IBM PC/XT on MS-DOS as well as the DEC VAX 11/780 on Unix Version 8.
The executor, in comparison to the compiler, was much easier to implement than the compiler, even though it had to combine the functionality of an assembler, a linker, an interpreter and a debugger.
The assembler only makes a single pass through the assembly files specified on the command line. It assembles instructions and data directly onto the interpreter's code and data stores.(19) Forward label references are handled by keeping a linked list of locations in the code store that has to be fixed once the label is defined. The assembler maintains a symbol table which is a hash table of linked lists. The assembler symbol table is very similar to the compiler's symbol table except that the symbol table has no concept of scope or nesting. Also, the mnemonics for the assembly language instructions as well as the pseudo instructions are also kept on the symbol table for quick searches. The function that deletes all labels in the symbol table is careful enough not to delete symbols which are really assembler mnemonics and public labels.
Pseudo instructions are recognized by the assembler and acted upon. The pseudo instructions recognized by the assembler are:
const n
data n
udata n
public l
extern l
public)
The linker is an extremely simple linker. All it does is resolve external and public labels between assembly files. It is really an extension of the assembler.
The interpreter is the most complex part of the executor. The code and
data stores are simply arrays holding code instructions and data values.
In the IBM PC/XT implementation, these arrays were declared as far
arrays(20) so that the compiler will allocate the arrays in a different
segment. Hence the implementation will allow up to
16K of code and 16K of data
using up a total of 128K of bytes in the host system, even though the IBM
PC/XT has only a 16-bit addressing capability. On the DEC VAX 11/780, these
arrays were simply declared normally as the VAX has 32-bit addressing.
The main body of the interpreter is a execution loop that fetches an instruction from the code store, obtains the address of the function corresponding to that instruction via a jump table and then executes the instruction. If more than one VIEU is currently active, each VIEU will execute one instruction before passing control to the next VIEU in a round-robin fashion. All active VIEUs are kept on a doubly-linked circular list. If a VIEU is unable to execute an instruction (because it is waiting for child VIEUs to die) control passes on to the next VIEU in the circular list. If no VIEU can execute an instruction, then a deadlock situation occurs and the interpreter will terminate execution and enter the debugger. VIEUs are allocated dynamically, so the maximum number of active VIEUS that can execute concurrently is bounded by the amount of memory available to the interpreter.
The interpreter maintains the following data structures, all allocated dynamically:
The activation record. This holds information pertaining to the activation of a procedure or function or subroutine:
The environment record. This holds all the local definitions in the current block:
Register set data structure. There is one register for each note attribute.
The debugger is a late addition to the executor. It basically emulates the facilities provided by debuggers available in most programming environments. It debugs at the machine level. The following commands are recognized by the debugger:
Each note generated by the interpreter into the output file has the following format, on one line of the player file:
ttimeppitchudurationlloudnessvchannelzpatch
The time of the note is the onset time, i.e. the time the note should start sounding given by the number of centiseconds into the piece. The pitch of the note is the pitch number - 12. The duration of the note is the length of time in which the note will sound, in centiseconds. The loudness of the note corresponds to the note velocity. The patch and channel of the note corresponds with the equivalent attribute in the Rubato language.
Next: Chapter 8: FINALE: A ROOM WITH A VIEU
Previous: Chapter 7: THE IMPLEMENTATION OF THE RUBATO SYSTEM
Back to: Table of Contents