
..:: Protocol for Staging Files to Remote Peers ::..

The file staging process is the second of the two phases of the application
launch. In the first phase, the MPI class handles the -n and -r parameters
of the p2pmpirun command (number of processes and replicas). The MPD
decides which peers the processes should be matched to, and it instructs its
local RS to book these peers. If the reservation is successful, the launch
proceeds to the second phase (file staging).
In this second phase, the MPI class handles the -l parameter, which is called
the 'transfer file'. The transfer file is a simple text file containing
the names of the files to be transferred. MPI sends the content of the
transfer file, as well as the reserved peers, to its local FT. The local FT
stages the files (which can be input data and programs) to the remote FTs.
When the remote FTs have received the files, they launch the program, which
attempts to synchronize at MPI_Init() with the other remote processes.


1. Transfer Specification 

Each line of the transfer file contains a number of blank-separated string tokens.
All string tokens except the CACHE_KEYWORD make up the absolute pathname, e.g.
/home/john/my file with blanks.txt
is a valid filename.
If the last token matches the CACHE_KEYWORD (default 'cache'), the user is
assumed to request that the file be stored persistently on
each remote peer. For example,
/home/john/my file with blanks.txt  cache
is a request to cache the file.
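
The parsing rule above can be sketched in Java as follows. This is only an
illustration under stated assumptions: the class TransferLine and its parse()
method are hypothetical names, not taken from the FT code, and the sketch
rejoins the tokens with a single blank, so runs of several blanks inside a
pathname are not preserved.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original FT code.
final class TransferLine {
    // Default keyword as given in the text.
    static final String CACHE_KEYWORD = "cache";

    final String path;     // pathname rebuilt from the tokens
    final boolean cached;  // true if the line ended with CACHE_KEYWORD

    TransferLine(String path, boolean cached) {
        this.path = path;
        this.cached = cached;
    }

    // Split one line of the transfer file into pathname and cache flag.
    static TransferLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        int n = tokens.length;
        // Assumption: the keyword only counts as a cache request when at
        // least one other token precedes it.
        boolean cached = n > 1 && tokens[n - 1].equals(CACHE_KEYWORD);
        if (cached) {
            n--; // drop the keyword from the pathname tokens
        }
        String path = String.join(" ", Arrays.copyOfRange(tokens, 0, n));
        return new TransferLine(path, cached);
    }
}
```

With this sketch, parsing the second example line above would yield the
pathname /home/john/my file with blanks.txt with the cache flag set.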


2. Transfer Protocol

There are two methods to transfer the files between FT_0 and another FT_i.

1- in a single message
2- as a separate network stream

Method 1 is reserved for small transfers, i.e. when the total size of all files
to transmit does not exceed XFER_PROTOCOL_SWITCH_LIMIT (default 20 MB).
In this method, a single message is sent, containing the filenames and
the data of all files.

Method 2 is used if the total size exceeds XFER_PROTOCOL_SWITCH_LIMIT.
In this method, FT_0 sends a set of vectors that allow FT_i to determine,
for each file:
  - its name
  - its size
  - its md5sum 
  - whether it should be cached
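
As an illustration of how the md5sum entry listed above might be produced, the
following Java sketch computes the MD5 digest of a file as a hexadecimal
string. The class FileDigest and the method md5Hex() are hypothetical names;
the text does not say how the FT actually computes or encodes its md5sums.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the original FT code.
final class FileDigest {
    // Return the MD5 digest of the file as a lowercase hex string.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            // Feed the file to the digest in chunks to avoid loading it whole.
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

The file size for the same entry could be obtained with Files.size(file).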

Upon reception of this information, FT_i builds a vector of booleans: for
each file, FT_i checks in the diskCache whether the file is already present
(using md5sums).
If it is already present, the vector element is set to 'true' (notToBeSent).
FT_i opens a socket, puts the vector of booleans and the socket port in a
message, and sends it back to FT_0. FT_0 can then start sending the files that
still need to be transferred to the given port.
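
The construction of the boolean vector could look like the Java sketch below.
It assumes the diskCache can be queried as a set of md5sums, and it only
consults the cache for files flagged as cacheable. The class SkipVector and
its method are hypothetical names used for illustration.

```java
import java.util.Set;

// Hypothetical helper, not part of the original FT code.
final class SkipVector {
    // For each file, true means "already in the cache, do not send it".
    // cacheIndex stands in for the diskCache lookup (an assumption).
    static boolean[] notToBeSent(boolean[] cached, String[] md5sums,
                                 Set<String> cacheIndex) {
        boolean[] skip = new boolean[md5sums.length];
        for (int i = 0; i < md5sums.length; i++) {
            // Only files the user asked to cache can be served from the cache.
            skip[i] = cached[i] && cacheIndex.contains(md5sums[i]);
        }
        return skip;
    }
}
```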



3. Detailed Protocol

The above description is detailed in the pseudocode below.

FT_0
====
filenames {f1,f2,f3,...}
cached    {true,false,true,...}
md5sums   {m1,m2,m3,...}
sizes     {s1,s2,s3,...}


if ( Sum_i s_i > XFER_PROTOCOL_SWITCH_LIMIT )
then
    // Method 2: send only the per-file vectors, then wait for a port
    send FTMessage(cmd=FT_REQUESTPORT, filenames[], cached[], md5sums[], sizes[]) to FT_1,FT_2, ....
else
    // Method 1: filenames and data of all files in a single message
    send FTMessage(cmd=FT_TRANSFER, contents[f1][f2][f3]) to FT_1,FT_2, ....
endif

// reply from an FT_i, carrying replyURI and notToBeSent[]
if (FTMessage.cmd==FT_ASSIGNEDPORT)
then
    sock := new Socket( replyURI )
    for each file number i (0 to n-1)
        if (!notToBeSent[i])
            writeFileToSocket(filenames[i], sizes[i], sock.getOutputStream())
        endif
    endfor
endif
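
The sending loop of FT_0 could be sketched in Java as below. The sketch writes
to a generic OutputStream, which in the protocol would be the output stream of
the socket opened on replyURI. The class StagingSender and the method
sendMissing() are hypothetical names, and the sketch does not reproduce the
actual writeFileToSocket() implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of the original FT code.
final class StagingSender {
    // Write every file whose notToBeSent flag is false to the stream,
    // back to back, and return the number of bytes written.
    static long sendMissing(List<Path> files, boolean[] notToBeSent,
                            OutputStream out) throws IOException {
        long total = 0;
        for (int i = 0; i < files.size(); i++) {
            if (!notToBeSent[i]) {
                // Files.copy streams the whole file and returns its length.
                total += Files.copy(files.get(i), out);
            }
        }
        out.flush();
        return total;
    }
}
```

Because the files are written back to back with no delimiter, the receiver
must rely on the sizes vector to know where each file ends.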




FT_i (i>0)
==========

  if (FTMessage.cmd==FT_REQUESTPORT)
  then
      sock := new SocketServer()
      portToTransferData := sock.getPortNumber()
      for each file number i (0 to n-1)
          if (cached[i] and isInCache(md5sums[i]))
              notToBeSent[i] := true
          else
              notToBeSent[i] := false
          endif
      endfor
      cacheFit := true if Sum_i s_i <= CACHE_SIZE
      send FTMessage(cmd=FT_ASSIGNEDPORT, portToTransferData, notToBeSent[])

      conn := sock.accept()
      for each file number i (0 to n-1)
          if (cacheFit)
              if (!notToBeSent[i])
                  if (cached[i])
                      write to cache LRU (conn.getInputStream(), sizes[i])
                      cacheIsChanged := true
                  else
                      write to temp dir (conn.getInputStream(), sizes[i])
                  endif
              endif
          endif
      endfor
      create links in temp dir to access files in cache
      send MPDMessage(cmd=DONE,myRank,appDir,jarDep,cacheIsChanged) to local MPD
  endif
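
On the receiving side, each file has to be cut out of the incoming stream
using its announced size. A minimal Java sketch of that step is given below,
assuming the files arrive back to back on the stream of the accepted
connection. The class StagingReceiver and the method readFile() are
hypothetical names, and the cache (LRU) handling is left out.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original FT code.
final class StagingReceiver {
    // Read exactly 'size' bytes from the stream and store them in 'target'.
    static void readFile(InputStream in, long size, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            byte[] buf = new byte[8192];
            long remaining = size;
            while (remaining > 0) {
                // Never read past the end of the current file.
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n == -1) {
                    throw new EOFException("stream ended with " + remaining + " bytes missing");
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
    }
}
```

Calling readFile() once per file that was actually sent, in index order and
with the matching sizes[i], consumes the stream without overlap.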
               
