Architecture (and library!) Independent Parallel Code

The code developed and made available through the links below comes from a project supported in part by NSF ITR grant IIS-0324816 and NSF MRI grant EIA-9977508. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

We have been developing parallel code for a variety of scientific problems on the PC Cluster. The parallel code is architecture independent in the sense that it requires only recompilation to run on a variety of parallel platforms, as long as these platforms support the parallel programming libraries the code requires. The programming principles used to develop the code have been influenced by the Bulk-Synchronous Parallel (BSP) model. The code itself is (hopefully) both architecture independent and library independent: the same piece of code (for most of the available code) needs only recompilation to run under the two libraries supported by the PC Cluster, MPI through LAM-MPI v6.5.1 and BSPlib through BSPlib v1.4.
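For readers unfamiliar with the BSP model, the following is a minimal sketch (my own illustration, not part of the distributed code) of the superstep structure the model imposes: local computation, one-sided communication, and a barrier synchronization. It is written against the BSPlib primitives; each process passes a token to its right neighbor in a single superstep.

    #include <stdio.h>
    #include "bsp.h"

    int main(int argc, char *argv[])
    {
        bsp_begin(bsp_nprocs());       /* start the SPMD section on all processes */

        int me    = bsp_pid();
        int right = (me + 1) % bsp_nprocs();
        int token = me, incoming = -1;

        bsp_push_reg(&incoming, sizeof incoming);  /* expose the target area    */
        bsp_sync();                                /* registration takes effect */

        /* superstep: local computation, then one-sided communication ...       */
        bsp_put(right, &token, &incoming, 0, sizeof token);
        bsp_sync();                    /* ... which completes at the barrier     */

        printf("process %d received token %d\n", me, incoming);
        bsp_end();
        return 0;
    }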

Library independence is possible because the code was developed under the BSP programming paradigm using generic parallel programming interfaces that are library independent and supported by both LAM-MPI and BSPlib. Depending on whether the code is compiled under LAM-MPI or BSPlib, the generic programming interfaces are translated into one-sided direct memory communication functions utilizing either the MPI-2 extensions of LAM-MPI (e.g. MPI_Put, MPI_Get) or the corresponding facilities of BSPlib (e.g. bsp_put, bsp_get, bsp_hpput, bsp_hpget).
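As a rough illustration of this translation (a sketch under my own naming assumptions: generic_put, USE_MPI, g_win, and g_dst are hypothetical and not the project's actual interface), a generic one-sided put can be mapped onto either library at compile time:

    #ifdef USE_MPI
    #include <mpi.h>
    extern MPI_Win g_win;   /* window assumed to span the target buffer */

    void generic_put(int pid, void *src, int offset, int nbytes)
    {
        /* MPI-2 one-sided write into the window exposed by process pid */
        MPI_Put(src, nbytes, MPI_BYTE, pid, (MPI_Aint)offset,
                nbytes, MPI_BYTE, g_win);
    }
    #else
    #include "bsp.h"
    extern void *g_dst;     /* buffer assumed registered via bsp_push_reg */

    void generic_put(int pid, void *src, int offset, int nbytes)
    {
        /* BSPlib buffered put, completed at the next bsp_sync() */
        bsp_put(pid, src, g_dst, offset, nbytes);
    }
    #endif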

NOTE: With minor modifications, our code can be compiled under Critical Software's WMPI and run on Windows 2000 clusters. So far the only piece of code tested under WMPI/Windows 2000 using Visual C++ 6.0 is as03v2.tar (the communication network performance assessment toolset).


Software that is being made available

  1. Templates for sequential and multi-core sorting tsmf.html These are the operations Rsort, Dsort, Rsort-m and Dsort-m described in the paper "Parallel computing techniques for sequential and multi-core sorting". (August 2011)
  2. Sequential Sorting Framework ssf.html In relation to the paper "A parallelism-motivated sequential sorting framework". (Nov 2009)
  3. Binomial Tree-based option price valuations binomial.html (an illustrative sketch of the underlying recursion appears after this list)
  4. Communication network performance assessment assess.html
  5. Parallel Dense Matrix Multiplication matmul.html
  6. Parallel and Sequential Radix-Sort prdx.html
  7. Broadcasting and Parallel Prefix Operations brdppf.html
  8. Trinomial Tree-based option price valuations trinomial.html
  9. Parallel option price valuations with the explicit finite difference method efdm.html (Apr 2009)
  10. Sequential Option Price Valuations whose code was subsequently parallelized and described separately seqfin.html
  11. Increasing the efficiency of existing (sequential) sorting algorithms by using randomized wrappers. imrs.html
  12. Comprehensive Parallel Option Price Valuations Package (includes updated variants of algorithms included in packages 1,5,6). fin.html
  13. Probabilistic integer sorting pris.html
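To give a feel for what the binomial valuation packages above compute, here is a minimal sequential sketch (my own illustration, not the distributed code) of backward induction on a binomial tree for a European call under the standard Cox-Ross-Rubinstein parameters; a parallel version would distribute the per-level work across processes, but the recursion itself is the same.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Value a European call on a recombining binomial tree with n steps.
     * S: spot, K: strike, r: risk-free rate, sigma: volatility, T: maturity. */
    double binomial_call(double S, double K, double r, double sigma,
                         double T, int n)
    {
        double dt   = T / n;
        double u    = exp(sigma * sqrt(dt));        /* up factor   */
        double d    = 1.0 / u;                      /* down factor */
        double p    = (exp(r * dt) - d) / (u - d);  /* risk-neutral probability */
        double disc = exp(-r * dt);
        double *v   = malloc((n + 1) * sizeof *v);
        int i, j;

        /* payoffs at the final level of the tree */
        for (i = 0; i <= n; i++) {
            double ST = S * pow(u, i) * pow(d, n - i);
            v[i] = ST > K ? ST - K : 0.0;
        }
        /* fold the tree back one level per step */
        for (j = n - 1; j >= 0; j--)
            for (i = 0; i <= j; i++)
                v[i] = disc * (p * v[i + 1] + (1.0 - p) * v[i]);

        double price = v[0];
        free(v);
        return price;
    }

    int main(void)
    {
        printf("%.4f\n", binomial_call(100.0, 100.0, 0.05, 0.2, 1.0, 1000));
        return 0;
    }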

Other Software that will become available

Dense matrix elimination methods: LU decomposition under scattered and wrapped-around matrix distributions, Gauss-Jordan elimination under the same distributions, and possibly Cholesky factorization.

Sorting algorithm implementations: a simple case (the p*p < n case) of the Gerbessiotis-Valiant randomized oversampling-based algorithm; a simple case of the Gerbessiotis-Siniolakis deterministic regular oversampling algorithm (a non-trivial extension of the Shi-Schaeffer algorithm that uses oversampling and parallel sample sorting to achieve finer, programmer-determined bucket imbalance); and a simple case of an extension of the Gerbessiotis-Siniolakis ideas into a new randomized algorithm whose practical (rather than theoretical) performance improves upon the simple-case Gerbessiotis-Valiant algorithm.
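The common thread in these sorting algorithms is splitter selection by oversampling: each of the p processes contributes s sample keys (drawn at random in the randomized variants, regularly spaced in the deterministic one), the p*s samples are sorted, and every s-th sample becomes a splitter, so bucket imbalance shrinks as the oversampling factor s grows. Below is a sequential sketch under my own assumptions (pick_splitters and its sampling policy are illustrative, not the forthcoming code):

    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* keys: the n input keys; p: number of buckets; s: oversampling factor.
     * Writes the p-1 chosen splitters into spl[].                          */
    void pick_splitters(const int *keys, int n, int p, int s, int *spl)
    {
        int m = p * s, i;
        int *sample = malloc(m * sizeof *sample);

        for (i = 0; i < m; i++)
            sample[i] = keys[rand() % n];   /* random sample, with replacement */
        qsort(sample, m, sizeof *sample, cmp_int);
        for (i = 1; i < p; i++)
            spl[i - 1] = sample[i * s];     /* keep every s-th sample */
        free(sample);
    }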


Software Support to build a PC cluster

In order to use the programming libraries LAM-MPI or BSPlib and start writing parallel programs, a cluster of PC workstations needs to be set up so that code can be distributed through the facilities of both libraries and inter-workstation communication can take place.

I have developed a set of csh scripts that allow the easy set-up of a collection of Linux-based PCs into a PC cluster that can be made to work under both LAM-MPI and BSPlib. The scripts require root privileges to run, and take a few seconds to execute. They have been tested under the default Red Hat 7.1 and 7.2 distributions. A tar file containing these scripts, along with an example installation, is being made available.

Note, however, that documentation is limited, if not non-existent, and no support is provided. If you decide to download the scripts, you use them AT YOUR OWN RISK, and neither I nor NJIT accepts any responsibility for the use (or misuse) of the supplied code.

PC Cluster Configuration Scripts (tar format) My best advice is to read the scripts and modify them as needed for your own setup.

Last Update : Aug 31, 2011