Integrating BOINC with Sun Grid Engine
Recently, I set up a small High-Performance Computing (HPC) cluster for Iowa State’s solar car team. The cluster was used primarily for running Computational Fluid Dynamics (CFD) and MATLAB code. However, whenever the team was in the build phase rather than actively designing the next car, the cluster sat idle most of the time, so we decided to donate the spare compute time to scientific research.
BOINC seemed like the best way to do that. In case you’re not familiar with it, BOINC stands for Berkeley Open Infrastructure for Network Computing. It lets people donate their personal computers’ spare computational capacity to various scientific research projects, and it is developed at the University of California, Berkeley. The fact that it was designed for home computers turned out to be a source of problems on a cluster.
The first challenge was simply installing BOINC. To avoid as many configuration changes as possible, we prefer to install applications to our network file share rather than locally on each compute node. In addition, all of our compute nodes are headless, and BOINC usually ships with a GUI manager. That ruled out simply using the package manager, and the prebuilt binaries available were severely out of date, which left compiling from source.
    # change to the network file share
    cd /share/apps
    # clone the repo
    git clone https://github.com/BOINC/boinc
    # move into the repo
    cd boinc
    # run the setup script
    ./_autosetup
    # run configure with the file share path and disable the GUI BOINC Manager
    ./configure --prefix="/share/apps/boinc" --disable-manager
    # build with parallel make
    make -j
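Before moving on, it can help to confirm that the build actually produced the two binaries the rest of this post relies on. The following is a minimal sketch, assuming an in-tree build under `/share/apps/boinc` with the binaries in its `client/` subdirectory (the location the launch script later uses). The `check_build` helper and the `BOINC_SRC` variable are names made up for this example.

```shell
#!/bin/bash
# Sketch: verify that the client binaries exist after `make`.
# BOINC_SRC is an assumed variable; adjust it to wherever you cloned the repo.
BOINC_SRC=${BOINC_SRC:-/share/apps/boinc}

# check_build DIR: succeed only if both client binaries are present and executable.
check_build() {
    local src=$1 missing=0 bin
    for bin in client/boinc_client client/boinccmd; do
        if [ -x "$src/$bin" ]; then
            echo "ok: $src/$bin"
        else
            echo "missing: $src/$bin" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_build "$BOINC_SRC" || echo "BOINC build looks incomplete in $BOINC_SRC" >&2
```

If both lines report `ok`, running `client/boinc_client --version` from a compute node is a quick way to confirm the binary also works on the headless nodes.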
To use BOINC, you need to create an account with each project you want to contribute to and attach your hosts to it. Doing this by hand is extremely tedious, so I highly recommend using an account manager such as BOINC Account Manager (BAM). Once you have selected some projects, you can also check the option “Attach new host by default?” so that new hosts are attached automatically.
The only issue I’ve had with BAM is that my work profile was not added to new hosts by default.
The next challenge was that BOINC is designed to run on a single computer and is not built around the Message Passing Interface (MPI) like most applications that run on HPC clusters. To get around this, we created a separate queue in our job scheduler, Son of Grid Engine (SGE), an open-source fork of Sun Grid Engine, with a single slot per compute node, so that a BOINC job scheduled on a node has the entire node reserved for it. Make sure that the user account you will use for BOINC has access to this queue and to the compute nodes.
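One way to create such a queue is to copy the configuration of an existing queue and change only its name and slot count. The sketch below assumes the existing queue is called `all.q` and the new one `boinc.q`; both names, and the `boinc_queue_conf` helper, are placeholders for this example. It also assumes the `qname` and `slots` entries each sit on a single line of the `qconf -sq` output, which may not hold if your template queue has per-host slot overrides.

```shell
#!/bin/bash
# Sketch: derive a one-slot-per-node BOINC queue from an existing queue.
# TEMPLATE_QUEUE and BOINC_QUEUE are assumed names; adjust them for your cluster.
TEMPLATE_QUEUE=${TEMPLATE_QUEUE:-all.q}
BOINC_QUEUE=${BOINC_QUEUE:-boinc.q}

# boinc_queue_conf NAME: read a queue configuration on stdin and write it back
# out with the queue renamed to NAME and the slot count set to 1.
boinc_queue_conf() {
    local name=$1
    sed -E -e "s/^qname[[:space:]].*/qname                 $name/" \
           -e "s/^slots[[:space:]].*/slots                 1/"
}

if command -v qconf >/dev/null 2>&1; then
    conf=$(mktemp)
    # Dump the template queue, rewrite it, and add it as a new queue.
    qconf -sq "$TEMPLATE_QUEUE" | boinc_queue_conf "$BOINC_QUEUE" > "$conf"
    qconf -Aq "$conf"
    rm -f "$conf"
else
    echo "qconf not found; run this on an SGE admin host" >&2
fi
```

Review the result with `qconf -sq boinc.q` afterwards, since every other setting (host list, access lists, and so on) is inherited from the template queue.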
One important SGE setting ensures that other jobs have higher priority. In every other queue you have, add the BOINC-specific queue as a subordinate with a slot count of 1. That way, any other job that comes into those queues takes priority over the BOINC job.
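The subordinate setting can be applied to all queues in one go. This is a minimal sketch, assuming the BOINC queue is named `boinc.q` and that `qconf -aattr` is an acceptable way to add an entry to each queue's `subordinate_list` on your version of SGE; check `qconf(1)` and `queue_conf(5)` before running it. The `other_queues` helper is a name invented for this example.

```shell
#!/bin/bash
# Sketch: make the BOINC queue a subordinate of every other queue.
# BOINC_QUEUE is an assumed name; adjust it for your cluster.
BOINC_QUEUE=${BOINC_QUEUE:-boinc.q}

# other_queues NAME: read queue names on stdin, print all of them except NAME.
other_queues() {
    grep -vxF -- "$1"
}

if command -v qconf >/dev/null 2>&1; then
    qconf -sql | other_queues "$BOINC_QUEUE" | while read -r queue; do
        # "boinc.q=1": suspend boinc.q on a host once 1 slot of $queue is in use there.
        qconf -aattr queue subordinate_list "$BOINC_QUEUE=1" "$queue"
    done
else
    echo "qconf not found; run this on an SGE admin host" >&2
fi
```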
Next, we need to set up how BOINC will be launched.
SGE array jobs make it easy to launch a job across multiple hosts. A small Bash script can parse how many slots are available in the BOINC queue and submit an array job with that many tasks.
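A minimal sketch of such a submission script follows. It assumes the queue is named `boinc.q`, that `qstat -g c` prints its usual cluster queue summary with an `AVAIL` column, and that the per-host launch script lives at `/share/apps/boinc/launch_boinc.sh`; the queue name, the script path, and the `available_slots` helper are all placeholders for this example.

```shell
#!/bin/bash
# Sketch: submit one BOINC array task per available slot in the BOINC queue.
# BOINC_QUEUE and LAUNCH_SCRIPT are assumed names; adjust them for your cluster.
BOINC_QUEUE=${BOINC_QUEUE:-boinc.q}
LAUNCH_SCRIPT=${LAUNCH_SCRIPT:-/share/apps/boinc/launch_boinc.sh}

# available_slots QUEUE: read `qstat -g c` output on stdin and print the AVAIL
# value for QUEUE, or 0 if the queue is not listed.
available_slots() {
    local queue=$1
    awk -v q="$queue" '
        # The header reads "CLUSTER QUEUE ...", so the first column name spans
        # two fields; data columns are therefore shifted left by one.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "AVAIL") col = i - 1; next }
        $1 == q && col { print $col; found = 1 }
        END { if (!found) print 0 }
    '
}

if command -v qstat >/dev/null 2>&1 && command -v qsub >/dev/null 2>&1; then
    slots=$(qstat -g c | available_slots "$BOINC_QUEUE")
    if [ "$slots" -gt 0 ]; then
        echo "Submitting $slots BOINC tasks to $BOINC_QUEUE"
        qsub -q "$BOINC_QUEUE" -t "1-$slots" -N boinc "$LAUNCH_SCRIPT"
    else
        echo "No free slots in $BOINC_QUEUE"
    fi
else
    echo "qstat/qsub not found; run this on an SGE submit host" >&2
fi
```

Because the queue has one slot per node, each array task should land on a different host.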
Launching the client is trickier. BOINC doesn’t like sharing a directory with other running instances, so each host needs its own directory to work out of. We also want to attach each instance to our account manager. Lastly, we need to launch the client in interactive mode so that SGE considers the job to be running. If you let BOINC launch in the background, as it wants to by default, SGE will think the job has completed and mark the node as available.
    #!/bin/bash

    USERNAME=yourusername
    PASSWORD=yourpassword
    HOST=$(hostname -s)

    CLIENT_BASE=/share/apps/boinc/client
    DATA_BASE=/home/boinc/boinc_data

    # each host gets its own data directory
    DATA_DIR="$DATA_BASE/$HOST"

    # stop any existing instances
    pkill boinc_client

    echo "Creating data directory $DATA_DIR"
    mkdir -p "$DATA_DIR"

    echo "Clearing lockfile"
    rm -f "$DATA_DIR/lockfile"

    # start the client
    echo "Starting BOINC client"
    "$CLIENT_BASE/boinc_client" --daemon --dir "$DATA_DIR"

    # wait for client to start
    echo "Waiting for BOINC client to start"
    sleep 5

    cd "$DATA_DIR"
    # make sure account is joined
    echo "Attaching account manager"
    "$CLIENT_BASE/boinccmd" --join_acct_mgr http://bam.boincstats.com "$USERNAME" "$PASSWORD" || true
    # sync
    echo "Syncing account manager"
    "$CLIENT_BASE/boinccmd" --acct_mgr sync

    # stop client from daemon mode and restart on interactive
    echo "Stopping BOINC client"
    "$CLIENT_BASE/boinccmd" --quit

    # give the client time to stop
    sleep 60
    pkill boinc_client

    echo "Restarting BOINC client in interactive mode"
    "$CLIENT_BASE/boinc_client" --dir "$DATA_DIR"
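The fixed `sleep 5` after starting the daemon is a guess at how long the client needs. One possible refinement, not part of the original setup, is to poll until the client answers a `boinccmd` request. The sketch below uses a small helper, `wait_until`, whose name is invented for this example, and assumes `boinccmd --get_cc_status` succeeds only once the client is reachable.

```shell
#!/bin/bash
# Sketch: retry a command once per second until it succeeds or a timeout expires.

# wait_until SECONDS CMD [ARGS...]: return 0 as soon as CMD succeeds,
# or 1 if it is still failing after roughly SECONDS seconds.
wait_until() {
    local timeout=$1 waited=0
    shift
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use inside the launch script, after `cd "$DATA_DIR"`:
#   wait_until 30 "$CLIENT_BASE/boinccmd" --get_cc_status \
#       || { echo "BOINC client did not start" >&2; exit 1; }
```

The same helper could stand in for the `sleep 60` before the restart, for example by waiting until `pgrep boinc_client` stops finding a process.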
That’s it! You should now be able to run your array job submission script, which will start a BOINC process on each available host. If any new jobs come in, the BOINC jobs will be paused until the entire host is available again.
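To see the pausing in action, you can summarize the BOINC tasks by state. This is a rough sketch, assuming the jobs run as a user named `boinc` and that the job state is the fifth column of the default `qstat` listing; on the clusters I am aware of, tasks suspended through a subordinate queue show up with state `S`, but treat that as an assumption. The `count_by_state` helper is a name made up for this example.

```shell
#!/bin/bash
# Sketch: count the BOINC user's jobs by SGE state (r = running, S = suspended, ...).
# BOINC_USER is an assumed account name; adjust it for your cluster.
BOINC_USER=${BOINC_USER:-boinc}

# count_by_state: read default `qstat` output on stdin, skip the two header
# lines, and print "<state> <count>" for each state seen.
count_by_state() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

if command -v qstat >/dev/null 2>&1; then
    qstat -u "$BOINC_USER" | count_by_state
else
    echo "qstat not found; run this on an SGE submit host" >&2
fi
```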