We chose to implement our algorithm in C++, although most other procedural languages would have also sufficed. If the reader wishes to implement the algorithm or perhaps even improve upon what we provide, this chapter provides details of our implementation. This chapter also serves as the documentation for our program.
Sample Input and Output
The following illustrates the required input format.
a bc c da b be d dd e e @ 100000 a 10000The homomorphism is specified with one production per line. Each line is of the form ax (spaces and tabs are ignored) where Phi(a)=x. The @ symbol signifies the end of Phi's specification. Finally n, a, and i follow, delimited by white space.
Here we have the coresponding output.
*************The Alphabet: abcde *************The Homomorphism: e => e d => dd b => be c => da a => bc Vital: abcde Mortal: Recursive: abcde Expanding: d Exponential: acd Find: Find called with n=100000,a=a,i=10000. bb Case 3 -1<99998: 49999 24999 12499 6249 9374 10936 10155 9764 9959 10057 10008 9983 9995 10001 9998 9999 (poly) n=9999,a=b,i=10000. ee Case 2 Basic Tree Descent called with n=0,a=e,i=1. Answer, character number 10000 of Phi^100000(a): 'e'The output begins by printing the alphabet, the homomorphism, and the sets of vital letters, mortal letters, and so on. When Find is called, it reports the values of n, a, and i, and then starts descending the derivation tree. In this case, the derivation path starts at 'a', and proceeds to 'b' and then encounters another 'b'; a repeated letter is found. We see that our algorithm enters Case 3 and does a binary search for t = 9999. Since the value of x is polynomial, the algorithm starts again with n=9999, a='b' and i=10000. This time we enter Case 2, BasicTreeDescent is called, and the result 'e' is reported.
A set of symbols is needed to represent things such as the letters accessible
from 'a', or the set of mortal letters with respect to Phi.
The SymbolSet class stores this data as an array of sixty-two integers
which correspond to the occurrences of each of the upper and lower
case letters and ten numerals.
The SymbolSet class has methods to add and delete letters to a set, to
test whether a set contains a specified letter, to test whether two
sets are equal or disjoint,
to supply the Parikh vector or a string corresponding to a given set,
and to supply the cardinality of the set itself.
With the intention of eventually replacing the data type used to store
integers with a data type more capable of holding larger numbers,
a separate module is used to define the LargeNumber data type.
Initially, LargeNumbers were simply defined as doubles and,
as we will see below, replaced with
a data type capable of storing arbitrarily large numbers.
The code dealing with matrices and vectors is found in the
MatrixUtil module. While there are many publicly available
methods to implement matrices, we also require that our matrices
have LargeNumbers as elements.
Naturally, we use an array of LargeNumbers to store the data.
In addition to the standard operations one might expect in a Matrix class
(such as specifying the value of an element, or multiplying two matrices),
we include the procedures Power and SuperPower as special methods.
Recall that we require Power and SuperPower to restrict each incidence matrix to only the letters accessible from a given word. Hence, we also include another method Clean to remove the rows and columns corresponding to these unaccessible letters. At first, Clean merely set the elements of each unnecessary row and column to zero, but too much time was wasted adding and multiplying zeros together. The method we finally arrived at stores two different values for the size of the array: one value to describe the number of elements allocated in memory, the other to specify the how many rows and columns are actually being used. Thus, the procedure Clean entails copying the useful elements of the matrix to the top left corner, and then reducing the number of rows and columns begin used.
Our most important data structure is our homomorphism.
Paired with an alphabet, a homomorphism becomes a D0L-scheme and hence the
DOLScheme is the most important class our algorithm deals with.
The DOLScheme class has methods which allow us to create its incidence
matrix and to calculate the various sets of symbols
such as the polynomially growing letters.
More importantly, however, the procedures Length, BasicTreeDescent,
GenericLength, and Find are also part of the DOLScheme class.
The DOLScheme stores the alphabet Sigma as a SymbolSet
and the homomorphism (implemented separately as the class Homom)
is a linked list of character-string pairs.
As mentioned above, the DOLScheme module contains a substantial amount of our algorithm. The details of our algorithm have been explained and defined in the technical report mentioned above and our program is mostly a straightforward implementation of the given pseudocode. We say `mostly' since the pseudocode does not provide tedious details such as those showing how to manage the allocation and deallocation of memory, to restrict a matrix to the letters accessible from a given word, or to implement a binary search. Since such details are not terribly enlightening, we omit them here.
In BasicTreeDescent it is not necessary to store any details
of the derivation tree. In Find, however, we need to keep track of
the derivation path and the branches not taken at level j, that is,
aj, xj, and yj.
The class DerivTree is written for such a purpose and contains several
useful methods.
For instance, we have methods to set and read the values of
aj, xj, and yj,
to find whether a repeated letter has been
encountered yet, and to return the set of letters contained in
x and y (useful for determining whether x
or y is exponentially growing, for example).
Again, we simply use two arrays of SymbolSets and one array
of letters to store this data.
While the Interface module and the Main module do not define any
data structures or classes, they do provide a means
to call the algorithm with a variety of inputs and parameters.
The Interface module allows the user to
input homomorphisms and arbitrarily large integers.
The Main module determines whether the user should be prompted for
input or not, whether BasicTreeDescent or Find is called, and
whether tracing should be turned on or off.
Once all the necessary data has been input, then the ith letter of
Phin(a) is calculated.
Since both the Interface module and the Main module
are not vital to our algorithm,
they can be replaced by any other program that wishes to call Find.
The requirements for our package are fairly simple. In addition to being able to assign the value of one integer a to another integer b, we required the ability to compute a+b, a-b, a*b, a mod b, floor(a-b), and to determine whether a<b, a=b, a>b. Since most multiple precision packages already have this capability, I chose to use the Berkeley MP package which some Unix systems provide (in our case, SunOS version 4.1).
There are many other multiple precision packages available if the reader wishes to use some other package. To do so, the LargeNumber module need only incorporate the new multiple precision package to provide the following functions.
move(a,b) : moves the value of a to b
madd(a,b,c) : puts the value a+b into c
msub(a,b,c) : puts the value a-b into c
mult(a,b,c) : puts the value a*b into c
mdiv(a,b,c,d) : puts the value floor(a/b) into c
and the value a mod b into d
mcmp(a,b) : returns 0 if a=b, 1 if a>b,
and -1 if a<b
mprint(a) : simply outputs the value of a in decimal
itom(i) : returns a pointer to a new variable initialized to i
mfree(a) : frees the memory allocated for the variable a.
An unfortunate circumstance of Berkeley MP occurs when calculating expressions such as j<- tq+r. Such calculations must be broken up into the two statements: mult(t,q,j); madd(j,r,j);. which render the code a little less readable. In retrospect, the capabilities of C++ could have been used to define operators between a newly defined multiple-precision class --- as opposed to just the data type it is now --- which would have allowed for more complex expressions.