My current programming work involves machine learning experiments for computational linguistics, which most often means running HTCondor DAGMan workflows to provide concurrency for complex compute jobs. Naturally what I would like is a way to develop such programs that is seamless from my desktop (for code, test, and small jobs) to the cloud. HTCondor works fine on desktops, grids, and clouds, but the installation, configuration, and administration for each of those environments is different, fiddly, and often time consuming. Therefore I’ve taken a stab at packaging HTCondor in a Docker container and configuring it for Kubernetes to run on Google Compute Engine. The current setup is a proof of concept at this stage, but I wanted to share the details so others who are interested can give it a whirl.
Getting all this stuff set up belies the ease of use I have in mind: eventually it should be no more involved than installing Docker and signing up for a Google Cloud account. But such is life on the bleeding edge.
To get started you’ll need the Google Cloud SDK, with your project configured via gcloud config set project <your-project-id>, and a Kubernetes release built with build/release.sh. Set an environment variable KUBE_HOME to the directory where you’ve installed Kubernetes for use in the next steps.
The Dockerfile and Kubernetes configuration files are on GitHub at https://github.com/jimwhite/condor-kubernetes.
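Fetching the repository is a standard clone (the directory name is just the repo’s default):

```shell
# Clone the Dockerfile and Kubernetes configuration files.
git clone https://github.com/jimwhite/condor-kubernetes.git
cd condor-kubernetes
```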
There is a script start-condor-kubernetes.sh
that does all these steps at once, but I recommend doing them one at a time so you can see whether each succeeds. I’ve seen occasions where the cAdvisor monitoring service doesn’t start properly, but you can ignore that if you don’t need to use it (and it can be loaded separately following the instructions in $KUBE_HOME/examples/monitoring
). The default settings in $KUBE_HOME/cluster/gce/config-default.sh are NUM_MINIONS=4 (plus one master), with n1-standard-1 size instances in zone us-central1-b.
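Bringing up the cluster uses the standard Kubernetes GCE scripts; a minimal sketch, assuming the defaults above:

```shell
# Start a Kubernetes cluster on Google Compute Engine
# (4 minions plus 1 master by default, per config-default.sh).
$KUBE_HOME/cluster/kube-up.sh
```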
There is a trusted build of the Dockerfile I use for HTCondor in the Docker Hub Registry at https://registry.hub.docker.com/u/jimwhite/condor-kubernetes/. To use a different repository you would need to modify the image
setting in the condor-manager.json
and condor-executor-controller.json
files.
Spin up the Condor manager pod. It is currently configured by the userfiles/start-condor.sh
script to be a manager, execute, and submit host. The execute option is pretty much just for testing though. The script determines whether it is the manager by checking for the CONDORMANAGER_SERVICE_HOST
environment variable configured by Kubernetes. Using these Docker-style variables means this container scheme for Condor would also work with plain Docker container linking.
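With the early Kubernetes CLI the pod is created from its JSON spec; a sketch using kubecfg.sh (the exact syntax varies a bit across those early releases):

```shell
# Create the Condor manager pod from its spec.
$KUBE_HOME/cluster/kubecfg.sh -c condor-manager.json create pods
```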
Start the Condor manager service. The executor pods will be routed to the manager via this service by using the CONDORMANAGER_SERVICE_HOST
(and ..._PORT
) environment variables.
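Creating the service follows the same pattern; the service spec filename here is an assumption, so use whatever name appears in the repository:

```shell
# Create the service that routes the executors to the manager.
$KUBE_HOME/cluster/kubecfg.sh -c condor-manager-service.json create services
```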
Start the executor pod replica controller. The executors currently have submit enabled too, but for applications that don’t need that capability it can be omitted to save on network connections (whose open file descriptor space on the collector can be a problem for larger clusters).
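The replica controller is created the same way from the spec file named above:

```shell
# Create the replication controller for the executor pods.
$KUBE_HOME/cluster/kubecfg.sh -c condor-executor-controller.json create replicationControllers
```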
You should see 3 pods for Condor (one manager and two executors), initially in ‘Pending’ state and then ‘Running’ after a few minutes.
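Listing the pods is one kubecfg.sh call; run it a few times while the images download:

```shell
# List pods and watch for them to go from 'Pending' to 'Running'.
$KUBE_HOME/cluster/kubecfg.sh list pods
```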
Find the id of a minion that is running Condor from the list of pods. You can then ssh
to it thusly:
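A sketch of the SSH step; the instance name here is an example, so substitute the minion id from the pod listing and your configured zone:

```shell
# SSH to the GCE instance (minion) hosting a Condor pod.
gcloud compute ssh kubernetes-minion-1 --zone us-central1-b
```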
Then use docker ps
to find the id of the container that is running Condor and use docker exec
to run a shell in it:
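On the minion that looks roughly like this (docker exec requires Docker 1.3 or later; the container id is whatever docker ps reports):

```shell
# Find the id of the container running Condor...
sudo docker ps
# ...then open an interactive shell inside it.
sudo docker exec -it <container-id> /bin/bash
```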
And condor_status
should show nodes for the manager and the two executor pods:
The Condor executor pod replica count is currently set to 2 and can be changed using the resize
command (see $KUBE_HOME/examples/update-demo
for another example of updating the replica count).
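A sketch of the resize call; the controller id comes from condor-executor-controller.json, and the name used here is only an example:

```shell
# Change the executor replica count from 2 to 4.
$KUBE_HOME/cluster/kubecfg.sh resize condor-executor-controller 4
```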
If you list the pods then you’ll see some executors are ‘<unassigned>’ because the number of minions is too low. Auto-scaling for Kubernetes on Google Cloud is currently in the works and is key to making this a generally useful appliance.
Be sure to turn the lights off when you’re done…
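Tearing everything down is the mirror image of starting it:

```shell
# Shut down the cluster so the instances stop accruing charges.
$KUBE_HOME/cluster/kube-down.sh
```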
There remain several things to do to make this a truly easy-to-use and seamless research tool. A key reason for using Docker here, besides leveraging cloud services, is being able to containerize the application being run. While Docker images as Condor job executables are a natural progression (as is Docker volume-based data management), the model we have here is just as good for many purposes since we simply bundle Condor and the application together in a single image. The way I plan to use that is to add a submit-only pod which submits the workflow and deals with data management. Of course for bigger data workflows a distributed data management tool such as GlusterFS or HDFS would be used (both of which have already been Dockerized).
Here’s a brief demonstration video:
This demonstration is on Mac OS X but the process is similar for Windows or Linux. The first step is installing Docker; the Docker site has installation guides for all platforms, including Mac OS X and Windows (the last two include videos). With that done, you only need a few simple commands to start the ParsingShell or other BLLIP parser Python scripts. So open a Terminal window and enter:
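Putting together the options explained below, the command is along these lines (whether the shell must be given explicitly depends on the image’s default command):

```shell
# Run the BLLIP parser image interactively, publishing the VNC port
# and starting a shell inside the container.
docker run -it --rm -p 5901:5901 jimwhite/bllip-parser-python /bin/bash
```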
The -it
means “interactive + terminal” mode, the --rm
means “remove this container after this command exits”, the -p 5901:5901
makes TCP port 5901 accessible (we’ll be using this to access graphics from inside the container using VNC), and jimwhite/bllip-parser-python
names the Docker image I’ve put on the Docker Hub Registry. Because the image contains most of Ubuntu it is about 1.2 GB and so may take a few minutes to download the first time.
Docker will run the image in a container, and since the initial command here is another shell you’ll see a Bash prompt, though now it’s a Linux shell inside the container.
At this point if you want to use the command line (CLI) then you could enter the Python commands for the parser wrapper just as they’re shown on the PyPi web page. To use the GUI though we need to start a VNC server by entering ./runvnc.sh
like this:
That prompts for a password to secure the VNC server (which won’t actually be used in this setup and can be anything of at least six characters) and then starts it.
Now you need to open a VNC client, which on Mac is easy since there is one built in. For Windows you’ll need to install additional software like TightVNC. One way to use the built-in Mac OS X VNC client is to open a new Terminal window (command-N) and use the open
command. I show here checking for the Boot2Docker VM IP address and then opening the VNC client:
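A sketch of those two steps; the IP address shown is Boot2Docker’s usual default, so substitute whatever boot2docker ip reports:

```shell
# Find the Boot2Docker VM's IP address...
boot2docker ip
# ...then open the built-in Mac OS X VNC client on port 5901.
open vnc://192.168.59.103:5901
```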
Or you can enter that URL in Safari’s location bar:
That will ask for confirmation:
Then show the initial Ubuntu desktop with an LXTerminal window opened for you:
Use that terminal to enter the Python commands for the parsing shell. Firefox is available inside the container (its icon is in the lower left) because you can’t copy and paste between the container and the host in this setup.
When you’re done, just return to the Docker container’s prompt and exit
: