In the implementation of the ParaWeb JPRS, we modify the Java interpreter to allow threads to be instantiated on any remote machine that is running the modified interpreter. The Java interpreters on each machine coordinate with each other to maintain the illusion of a global shared address space for the application they are running, in much the same way as distributed shared memory systems [9][6][1] maintain the illusion of shared memory. A crucial element in this approach is that the definition of the Java Virtual Machine remains unchanged, as does the byte-code of existing Java programs, so that no new libraries or annotations need be added to existing Java programs. Thus, any compiled Java application utilizing threads may be executed in parallel, by distributing those threads to different machines. Clearly, one would only want to distribute compute intensive threads, but determining a priori which threads are compute intensive is a non-trivial task.
Our implementation of the Java parallel runtime system relies on the existing Java mechanisms for concurrency and synchronization. In Java, a new thread is created upon the instantiation of an object that extends the Thread class. Threads may share any object to which they have a reference. Synchronization is achieved using a monitor-like facility. The keyword synchronized placed in front of a method definition implies that any thread executing that method must gain exclusive access rights prior to executing the method. Within a synchronized method, a thread may call wait, to temporarily halt execution of the thread and allowing another thread to execute a synchronized method in that class. The original thread resumes execution only when another thread calls signal.
Figure 2 shows an example of a parallel matrix multiplication using Java threads. A series of threads are created, one per row of the result matrix. Each of the threads is passed a reference to the two input matrices and the result matrix, and each thread in turn calls a method to multiply the row it has been assigned (the constructor, which accepts the parameters, is called when the class is instantiated, and the run method in the class is executed when the start method is called on that class).
Figure 2: Example Matrix Multiply code
In the JPRS, upon the invocation of a thread, the modified interpreter on the local machine (the client) contacts one of the other machines also running the modified interpreter (the server). The client must then upload the class that is to be executed, along with any classes that may be referenced by this class, to the server. To do this, upon instantiation of a thread on a remote machine, replicas of all classes are made on the remote machine. Then, consistency mechanisms ensure that the replicas on the different machines remain consistent during the course of the parallel computation.
Consistency enforcement occurs at the synchronization points in the program, using the notion of release consistency used in many DSM systems (Munin and Treadmarks, for example [6][1]). Release consistency assumes that all synchronization operations are classified as either acquire or release operations. The underlying coherence protocol relies on these acquire and release operations being visible to the system, and on the correct use of these acquire and release operations. The mechanisms used to provide consistency in ParaWeb make use of Java's built-in synchronization facilities. Each of the Java synchronization points is mapped onto the acquire and release operations of the release consistency model. That is, for the purpose of consistency, calling a synchronized method in Java is equivalent to an acquire operation, and exiting a synchronization method is equivalent to a release operation. Whenever a thread calls wait inside a synchronized method, that wait operation is treated like a release operation followed by an acquire. A signal operation is treated like a release operation.
The prototype implementation of ParaWeb is being designed with an update-based protocol, in which all replicas of a class are updated whenever a synchronization point equivalent to a release operation is invoked. Future enhancements to this protocol are planned which use more sophisticated coherence protocols.