Last updated 12-15-04
In this paper we will discuss a method for programming and implementing a PVM-style application across a network of machines varying in architecture and OS. We will also propose a framework for writing programs that take advantage of this technology, as well as discuss how to port existing multi-threaded applications to multi-process PVM-style programs.
In the past there have been several approaches to the distributed program idea. One has been the "renderfarm" method, in which the application is given the ability to communicate with processes on other machines through a given protocol. However, this normally requires the creation of a protocol and a fair amount of knowledge of TCP/IP. Each program using this method must have its own way of finding the nodes, distributing the data, and running code on each of the nodes (i.e. there is no standard way of going about it).
To get rid of some of the complications of this last method, PVM was created. PVM (and the Beowulf clusters that run it) is based on a uniform API that allows multiple processes to communicate with each other in a more transparent way. However, there is no mechanism for communicating between machines of varying architectures, especially when these machines differ in the way they order the bytes in a word or double word, known as little-endian and big-endian machines. There needs to be one uniform API that lets multiple processes on multiple architectures communicate with each other in a transparent way.
Granted, it is probably possible to modify and port the Linux PVM libraries to other architectures such as PPC, SPARC, MIPS, XScale, and IA-64. However, these systems would still need to be running Linux for the API to communicate properly with the OS. Some companies and organizations would love to get the benefit of PVM but are unwilling to switch to Linux and are happy with their current OS. Because of this, we must look into a new method of PVM for a heterogeneous network.
This paper will try to solve these problems by designing a Heterogeneous Parallel Virtual Machine (hereafter known as HPVM). We will also discuss how to implement a distributed disk drive that will allow all of the machines to access the disks of the other machines. The main goal is for a user to be able to create a single super machine from a collection of smaller machines.
The above figure demonstrates the way that HPVM will be implemented. On the Server Machine, the Parent Application tells the HPVM Server to create several processes on the remote machines and to allow them to access some specific blocks of memory (known as shares). The HPVM Server then contacts the HPVM Client, which receives the block of machine code to execute. At this point, the HPVM Client runs the Child Kernel that was downloaded from the Server Machine. The Child Kernel contacts the HPVM Client as it needs the memory shares from the Server Machine. These memory shares are then downloaded by the HPVM Client and handed to the Child Kernel.
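As a rough sketch of the exchange described above, the following C fragment shows one possible set of message types and a message header that the HPVM Server and HPVM Client could pass between them. The type names and fields here are illustrative assumptions only, not a finished wire protocol:

/* Hypothetical HPVM messages -- a sketch only, not part of the actual design. */
enum hpvm_msg_type {
    HPVM_MSG_SPAWN_KERNEL,  /* server -> client: kernel image to execute           */
    HPVM_MSG_GET_SHARE,     /* client -> server: a kernel requested a memory share */
    HPVM_MSG_SHARE_DATA,    /* server -> client: contents of the requested share   */
    HPVM_MSG_MERGE_SHARE,   /* client -> server: merge modified share data back    */
    HPVM_MSG_TASK_DONE      /* client -> server: a kernel has finished             */
};

struct hpvm_msg_header {
    long type;     /* one of enum hpvm_msg_type, sent in an agreed byte order */
    long task_id;  /* which spawned task this message refers to               */
    long length;   /* number of payload bytes that follow this header         */
};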
Whenever the Parent Application is compiled, the code for the Child Kernels is given to a script that compiles the code for each architecture that the code will be executed on. These kernels are then organized in a tree sorted first by processor architecture (x86, x86-64, MIPS32, MIPS64, etc.), then by OS, and then by optimization. If no optimization for that specific chip is found, there must be a generic kernel included under all OS nodes. This will allow for optimizations on processors where the kernels would notice an increase in speed.
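For example, a kernel tree built for the sample program later in this paper might be laid out roughly as follows (the layout and names are an assumption for illustration only):

square.hkt
    x86
        linux
            generic
            pentium4 (optimized)
        windows
            generic
    x86-64
        linux
            generic
    MIPS32
        linux
            generic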
Macros will be defined in the API for converting data to and from big-endian and little-endian byte ordering when it is sent between different machines.
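As a sketch of what such macros could look like (the names HPVM_SWAP32, HPVM_TO_LE32, and HPVM_BIG_ENDIAN are assumptions for illustration, not part of the actual API):

/* Hypothetical endian-conversion macros -- illustrative only. */
#define HPVM_SWAP32(x) ((((x) & 0x000000FFUL) << 24) | \
                        (((x) & 0x0000FF00UL) << 8)  | \
                        (((x) & 0x00FF0000UL) >> 8)  | \
                        (((x) & 0xFF000000UL) >> 24))

/* On a big endian host these swap; on a little endian host they are no-ops. */
#ifdef HPVM_BIG_ENDIAN
#define HPVM_TO_LE32(x)   HPVM_SWAP32(x)
#define HPVM_FROM_LE32(x) HPVM_SWAP32(x)
#else
#define HPVM_TO_LE32(x)   (x)
#define HPVM_FROM_LE32(x) (x)
#endif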
On a more specific level, here is a sample Parent Application:
// HPVM sample program.
// Calculate the square root of numbers from 0 to 1,000,000
// ----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include "hpvm.h"

#define MAX_NUMBER 1000000
#define CHUNK_NUMBER 10000
#define START_NUMBER_ARG 0
#define END_NUMBER_ARG 1

// Called once per task; returns the argument block for that task.
void *spawn_callback(long id, long *size) {
    long *arg;

    arg = malloc(sizeof(long) * 2);
    arg[START_NUMBER_ARG] = id * CHUNK_NUMBER;
    arg[END_NUMBER_ARG] = (id + 1) * CHUNK_NUMBER;
    *size = sizeof(long) * 2;
    return (arg);
}

int main() {
    float *endshare;
    FILE *fp;
    long x;

    endshare = malloc(sizeof(float) * MAX_NUMBER);
    hpvm_share("output", endshare, sizeof(float) * MAX_NUMBER);
    hpvm_spawn("square.hkt", MAX_NUMBER / CHUNK_NUMBER, spawn_callback);
    hpvm_unshare("output");

    fp = fopen("output.dat", "w");
    for (x = 0; x < MAX_NUMBER; x++) {
        fprintf(fp, "%ld = %f\n", x, endshare[x]);
    }
    fclose(fp);
    return (0);
}
The Kernel is as follows:
// Square root Kernel
// -------------------
#include "math.h"
#include "hpvm.h"
#define START_NUMBER_ARG 0
#define END_NUMBER_ARG 1
int kernel_main (long id, long size, void args){
long x, y, i;
float *share;
x = (long)args(START_NUMBER_ARG);
y = (long)args(END_NUMBER_ARG);
share = hpvm_get_share("output");
for (i = x; i < y; i++) {
share[i] = sqrt(i);
}
hpvm_merge_share("output", share, x, y);
return (HPVM_OK);
}
The Parent Application allocates memory for the share and tells hpvm to share it across the network. It then spawns a large number of processes and directs hpvm to use square.hkt as the kernel tree. Upon completion of all of the tasks, control is returned to the program, which then unshares the share and saves it to disk.
When the tasks are created, hpvm calls the callback for each task that it creates. It is up to the Parent Application to give hpvm a block of data to hand to each task. Hpvm passes a unique sequential id number (starting at 0), so if the Parent Application spawns 100 tasks, the callback function is called with an id of 0 for the first task, 1 for the second, and so on up to 99.
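For the sample above, with MAX_NUMBER / CHUNK_NUMBER = 100 tasks, the calls would look roughly like this:

spawn_callback(0, &size)   -> args = {0, 10000}
spawn_callback(1, &size)   -> args = {10000, 20000}
...
spawn_callback(99, &size)  -> args = {990000, 1000000}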
The kernel file is mostly self-explanatory. It first pulls its arguments out of the data block handed to it. Then it loads a share and calculates the output numbers. After that, it merges the local share with the hpvm server's share.
Here are the prototypes of the functions in the hpvm server:

hpvm_share(char *sharename, void *share, long sharesize);
    Creates a share for use by the kernels.
    char *sharename - An ASCIIZ string used to specify the name of the share.
    void *share     - The data to be shared.
    long sharesize  - The size of the share in bytes.

hpvm_spawn(char *kerneltree, long nofprocs, void *(*callproc)(long id, long *size));
    Spawns the tasks.
    char *kerneltree  - An ASCIIZ string naming the file that contains the kernel tree.
    long nofprocs     - The number of processes to spawn.
    void *(*callproc) - The callback used to provide process-specific information to hpvm.

hpvm_unshare(char *sharename);
    Destroys a share (but not the data associated with it).
    char *sharename - The name of the share to destroy.
Here are the prototypes for the functions in the hpvm client:

void *hpvm_get_share(char *sharename);
    Gets the share named sharename and returns a void * to it.

int hpvm_merge_share(char *sharename, void *share, long start, long end);
    Merges the given data back into the share named sharename, starting at position start and ending at end.
SMP machines will start only one client process, which will inform the server that the machine has more than one processor. The server will then send one kernel for all of the processors and a separate data block for each processor. Shares, however, will be shared by all of the processors, so if a kernel running on one processor modifies a share, the modified data may be read by a kernel running on a second processor.
In order to connect to each machine, the Server must have access to a file called hpvm_hosts.cfg. This is a text file that looks as follows:
192.168.0.4 localhost
192.168.0.2 bigboy
192.168.0.3 laptop
192.168.0.10 palm
192.168.0.4 hpvm_host
The first column specifies the IP address (DNS is not supported at this time) and the second column is the friendly name for the host. This friendly name will be used in text messages and logs. There must be at least one hpvm_host entry in every hosts file. This is the IP address that will be sent to the clients when they are configured, and will allow them to communicate back to the server.
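A minimal sketch of how the server might read this file is shown below; the structure and function names are assumptions for illustration only, not part of the actual design:

#include <stdio.h>

#define HPVM_MAX_HOSTS 64

struct hpvm_host {
    char ip[16];    /* dotted-quad IP address, e.g. "192.168.0.2" */
    char name[64];  /* friendly name used in messages and logs    */
};

/* Reads hpvm_hosts.cfg into hosts[]; returns the number of entries read, or -1. */
static int hpvm_read_hosts(const char *path, struct hpvm_host *hosts, int max)
{
    FILE *fp = fopen(path, "r");
    int n = 0;

    if (fp == NULL)
        return -1;
    while (n < max && fscanf(fp, "%15s %63s", hosts[n].ip, hosts[n].name) == 2)
        n++;
    fclose(fp);
    return n;
}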
Note: HPVM is not secure by any means. It should be protected behind a firewall if connected to any LAN other than its own. Any system on the network can easily jump in and take control of the entire cluster.
This is the end of the current design notes. More should be added later.