Single-Node Condor and Pegasus on Ubuntu 12.04
[STEP 1: Install Condor]
Download the latest version of HTCondor (native package) for Ubuntu 12.04 from the following URL. The package I downloaded is condor-8.1.6-247684-ubuntu_12.04_amd64.deb; the filename may differ as new versions are released.
http://research.cs.wisc.edu/htcondor/downloads/
Install Condor using the following commands:
$ sudo dpkg -i condor-8.1.6-247684-ubuntu_12.04_amd64.deb
$ sudo apt-get update
$ sudo apt-get install -f
$ sudo apt-get install chkconfig
$ sudo chkconfig condor on
$ sudo service condor start
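If you need to repeat these steps on several machines, the commands above can be wrapped in a small script. This is a minimal sketch, not part of the original instructions: `install_condor` and `run` are hypothetical helper names, and setting `DRY_RUN=1` prints each command instead of running it, so you can review the steps first.

```shell
#!/usr/bin/env bash
# Minimal sketch: the Condor install steps above wrapped in a function.
# install_condor and run are hypothetical helpers, not part of HTCondor.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_condor() {
  if [ $# -ne 1 ]; then
    echo "usage: install_condor path/to/condor.deb" >&2
    return 2
  fi
  # dpkg may stop on missing dependencies; apt-get install -f resolves them below.
  run sudo dpkg -i "$1" || echo "dpkg reported errors; continuing with apt-get install -f" >&2
  run sudo apt-get update || return 1
  run sudo apt-get install -f || return 1
  run sudo apt-get install chkconfig || return 1
  run sudo chkconfig condor on || return 1   # start Condor at boot
  run sudo service condor start
}

# Usage:
#   DRY_RUN=1 install_condor condor-8.1.6-247684-ubuntu_12.04_amd64.deb   # preview only
#   install_condor condor-8.1.6-247684-ubuntu_12.04_amd64.deb             # really install
```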
Condor should now be up and running, and it will start automatically when the system boots. Check the status of Condor with the following commands:
$ condor_status
Name               OpSys  Arch   State     Activity LoadAv  Mem ActvtyTime
slot1@ip-10-0-5-11 LINUX  X86_64 Unclaimed Benchmar  0.060 1862 0+00:00:04
slot2@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:00:05
slot3@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:00:06
slot4@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:00:07
             Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX     4     0       0         4       0          0        0
       Total     4     0       0         4       0          0        0
$ condor_q
-- Submitter: ip-10-0-5-114.ec2.internal : : ip-10-0-5-114.ec2.internal
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
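To check the slot counts from a script rather than by eye, you can parse the summary rows at the bottom of the condor_status output. This is a minimal sketch that assumes the column order shown above (Total, Owner, Claimed, Unclaimed, ...). `slot_summary` is a hypothetical helper name. It prints the total and unclaimed slot counts from the `Total` row. Because the header row also starts with `Total`, the sketch tells the two rows apart by requiring a number in the second field.

```shell
#!/usr/bin/env bash
# Minimal sketch: print "<total> <unclaimed>" from condor_status output on stdin.
# Assumes the summary layout shown above; slot_summary is a hypothetical helper.
slot_summary() {
  # Header row: "Total Owner Claimed Unclaimed ...";
  # data row:   "Total <n> <owner> <claimed> <unclaimed> ...",
  # so only the data row has a number in field 2.
  awk '$1 == "Total" && $2 ~ /^[0-9]+$/ { print $2, $5 }'
}

# Usage: for the output above this prints "4 4".
#   condor_status | slot_summary
```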
[STEP 2: Install Pegasus]
Pegasus needs Java (1.6 or higher) and Python (2.4 or higher). Ubuntu 12.04 comes with Python 2.7 but not Java, so we need to install Java first. Pegasus also needs Globus for grid support, which is optional; we will take care of Globus later.
$ sudo apt-get install openjdk-7-jdk
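After installing Java, you can check the version requirements above (Java 1.6 or higher, Python 2.4 or higher) with a script. This is a minimal sketch that uses GNU `sort -V` for version ordering, which is available in Ubuntu's coreutils. `version_ge` and `check_prereqs` are hypothetical helper names. The version strings come from the usual `python -V` and `java -version` banners.

```shell
#!/usr/bin/env bash
# Minimal sketch: verify the minimum versions Pegasus needs.
# version_ge and check_prereqs are hypothetical helpers.

# Succeeds when version $1 >= version $2 (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_prereqs() {
  local py java
  command -v python >/dev/null 2>&1 || { echo "python not found" >&2; return 1; }
  command -v java >/dev/null 2>&1 || { echo "java not found" >&2; return 1; }
  # Python 2 prints "Python 2.7.x" on stderr, hence 2>&1.
  py=$(python -V 2>&1 | awk '{ print $2; exit }')
  # java -version prints a line like: java version "1.7.0_..." (on stderr).
  java=$(java -version 2>&1 | awk -F '"' '/version/ { print $2; exit }')
  version_ge "$py" 2.4 || { echo "need Python >= 2.4, found '$py'" >&2; return 1; }
  version_ge "$java" 1.6 || { echo "need Java >= 1.6, found '$java'" >&2; return 1; }
  echo "OK: Python $py, Java $java"
}

# Usage:
#   check_prereqs
```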
Next, configure the Pegasus repository and install Pegasus. First, import the repository's signing key:
$ gpg --keyserver pgp.mit.edu --recv-keys 81C2A4AC
$ gpg -a --export 81C2A4AC | sudo apt-key add -
Add the following line to /etc/apt/sources.list:
deb http://download.pegasus.isi.edu/wms/download/debian wheezy main
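If a setup script appends this line blindly, every re-run adds another copy. This minimal sketch appends a line only when that exact line is not already present. `add_line_once` is a hypothetical helper name. Writing to /etc/apt/sources.list requires root, as shown in the usage comment.

```shell
#!/usr/bin/env bash
# Minimal sketch: append a line to a file only if that exact line is missing.
# add_line_once is a hypothetical helper name.
add_line_once() {
  local file="$1" line="$2"
  # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    return 0
  fi
  # Start on a new line if the file does not end with a newline.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# Usage (as root, since /etc/apt/sources.list is owned by root):
#   sudo bash -c "$(declare -f add_line_once); add_line_once /etc/apt/sources.list 'deb http://download.pegasus.isi.edu/wms/download/debian wheezy main'"
```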
Update the repository and install Pegasus:
$ sudo apt-get update
$ sudo apt-get install pegasus
Now we should have Pegasus installed on the system. Check the installation with the following command. If you see similar output, congratulations!
$ pegasus-status
(no matching jobs found in Condor Q)
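Besides pegasus-status, you can confirm that all the Condor and Pegasus commands used in this tutorial are on your PATH. This is a minimal sketch; `check_tools` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Minimal sketch: report any of the given commands that are not on PATH.
# Returns non-zero if at least one is missing. check_tools is hypothetical.
check_tools() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools condor_status condor_q pegasus-status pegasus-remove java python && echo "all found"
```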
Pegasus comes with some examples; we will use them to test the installation further.
$ cd ~
$ cp -r /usr/share/pegasus/examples .
$ cd examples/hello-world
$ ls
dax-generator.py  hello.sh  pegasusrc  submit  world.sh
Run the hello-world example:
$ ./submit
2014.06.24 10:34:00.455 UTC: Submitting job(s).
2014.06.24 10:34:00.460 UTC: 1 job(s) submitted to cluster 1.
2014.06.24 10:34:00.465 UTC:
2014.06.24 10:34:00.471 UTC: -----------------------------------------------------------------------
2014.06.24 10:34:00.476 UTC: File for submitting this DAG to Condor : hello_world-0.dag.condor.sub
2014.06.24 10:34:00.481 UTC: Log of DAGMan debugging messages : hello_world-0.dag.dagman.out
2014.06.24 10:34:00.487 UTC: Log of Condor library output : hello_world-0.dag.lib.out
2014.06.24 10:34:00.492 UTC: Log of Condor library error messages : hello_world-0.dag.lib.err
2014.06.24 10:34:00.497 UTC: Log of the life of condor_dagman itself : hello_world-0.dag.dagman.log
2014.06.24 10:34:00.503 UTC:
2014.06.24 10:34:00.508 UTC: -----------------------------------------------------------------------
2014.06.24 10:34:00.513 UTC:
2014.06.24 10:34:00.519 UTC: Your workflow has been started and is running in the base directory:
2014.06.24 10:34:00.524 UTC:
2014.06.24 10:34:00.530 UTC: /home/ubuntu/examples/hello-world/work/ubuntu/pegasus/hello_world/20140624T103359+0000
2014.06.24 10:34:00.535 UTC:
2014.06.24 10:34:00.540 UTC: *** To monitor the workflow you can run ***
2014.06.24 10:34:00.546 UTC:
2014.06.24 10:34:00.551 UTC: pegasus-status -l /home/ubuntu/examples/hello-world/work/ubuntu/pegasus/hello_world/20140624T103359+0000
2014.06.24 10:34:00.556 UTC:
2014.06.24 10:34:00.562 UTC: *** To remove your workflow run ***
2014.06.24 10:34:00.567 UTC:
2014.06.24 10:34:00.572 UTC: pegasus-remove /home/ubuntu/examples/hello-world/work/ubuntu/pegasus/hello_world/20140624T103359+0000
2014.06.24 10:34:00.578 UTC:
2014.06.24 10:34:01.024 UTC: Time taken to execute is 1.109 seconds
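Both pegasus-status -l and pegasus-remove need the run directory that ./submit prints. Instead of copying it by hand, you can extract it from the saved submit output. This is a minimal sketch that assumes the output format shown above, in which one line contains `pegasus-status -l <dir>`. `extract_run_dir` and `submit.log` are hypothetical names.

```shell
#!/usr/bin/env bash
# Minimal sketch: read ./submit output on stdin and print the run directory
# taken from the "pegasus-status -l <dir>" line (last match wins).
# extract_run_dir is a hypothetical helper name.
extract_run_dir() {
  awk '/pegasus-status -l / { dir = $NF } END { if (dir != "") print dir }'
}

# Usage (2>&1 captures the messages whichever stream they are written to):
#   ./submit 2>&1 | tee submit.log
#   run_dir=$(extract_run_dir < submit.log)
#   pegasus-status -l "$run_dir"
```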
Check the status of the Pegasus jobs and the Condor queue using the pegasus-status and condor_q commands:
$ pegasus-status
STAT  IN_STATE  JOB
Run      01:05  hello_world-0
Summary: 1 Condor job total (R:1)
$ condor_q
-- Submitter: ip-10-0-5-114.ec2.internal : : ip-10-0-5-114.ec2.internal
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   ubuntu          6/24 10:34   0+00:01:28 R  0   0.0  pegasus-dagman -f
1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
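To wait for the workflow to finish from a script, you can poll the summary line at the bottom of the condor_q output. This is a minimal sketch that assumes the summary format shown above (`N jobs; ... M running, ...`); other HTCondor versions may print a different summary. `queue_summary` and `wait_for_empty_queue` are hypothetical helper names, and the 30-second polling interval is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Minimal sketch: print "<total> <running>" from the condor_q summary line,
# e.g. "1 jobs; 0 completed, 0 removed, 0 idle, 1 running, ..." -> "1 1".
# queue_summary and wait_for_empty_queue are hypothetical helpers.
queue_summary() {
  awk '/ jobs;/ {
    running = 0
    for (i = 2; i <= NF; i++)
      if ($i ~ /^running,?$/) running = $(i - 1)
    print $1, running
  }'
}

# Poll condor_q every 30 seconds until the queue is empty.
wait_for_empty_queue() {
  local summary total running
  while :; do
    summary=$(condor_q | queue_summary)
    if [ -z "$summary" ]; then
      echo "could not find the condor_q summary line" >&2
      return 1
    fi
    total=${summary%% *}
    running=${summary##* }
    [ "$total" -eq 0 ] && return 0
    echo "$total job(s) in the queue, $running running"
    sleep 30
  done
}
```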
Multi-Node Condor and Pegasus on Ubuntu 12.04
[STEP 1: Install Condor]
As in the previous tutorial, download the latest version of HTCondor (native package) for Ubuntu 12.04 from the following URL, and install it on every node (both the Master Node and the Worker Node). The package I downloaded is condor-8.1.6-247684-ubuntu_12.04_amd64.deb; the filename may differ as new versions are released.
http://research.cs.wisc.edu/htcondor/downloads/
Install Condor using the following commands:
$ sudo dpkg -i condor-8.1.6-247684-ubuntu_12.04_amd64.deb
$ sudo apt-get update
$ sudo apt-get install -f
$ sudo apt-get install chkconfig
$ sudo chkconfig condor on
$ sudo service condor start
Condor should now be up and running, and it will start automatically when the system boots. Check the status of Condor with the following commands:
$ condor_status
Name               OpSys  Arch   State     Activity LoadAv  Mem ActvtyTime
slot1@ip-10-0-5-11 LINUX  X86_64 Unclaimed Benchmar  0.060 1862 0+00:00:04
slot2@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:00:05
slot3@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:00:06
slot4@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:00:07
             Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX     4     0       0         4       0          0        0
       Total     4     0       0         4       0          0        0
$ condor_q
-- Submitter: ip-10-0-5-114.ec2.internal : : ip-10-0-5-114.ec2.internal
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
[STEP 2: Configure the Condor Master Node]
Use a text editor to open /etc/condor/condor_config, and add the following line to the end of the file:
ALLOW_WRITE = *
Then restart Condor with the following command:
$ sudo service condor restart
Also, find the IP address of the Master Node with the following command; you will need it to configure the Worker Node.
[STEP 3: Configure the Condor Worker Node]
Now configure the Worker Node. Use a text editor to open /etc/condor/condor_config.local and find the following line:
CONDOR_HOST = $(FULL_HOSTNAME)
Replace $(FULL_HOSTNAME) with the IP address of the Master Node. For example, if the Master Node's IP address is 192.168.1.1, the line should read:
CONDOR_HOST = 192.168.1.1
Then restart Condor using the following command:
$ sudo service condor restart
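Editing the file by hand works for a single Worker Node. With several nodes, scripting the edit avoids typos. This is a minimal sketch that assumes the `KEY = value` line format shown above. The hypothetical helper `set_condor_param` replaces an existing `KEY = ...` line in place, or appends one if the key is missing. Because it edits in place, it matches the CONDOR_HOST change described above.

```shell
#!/usr/bin/env bash
# Minimal sketch: set "KEY = value" in a Condor config file.
# Replaces any existing "KEY = ..." line (leading spaces/tabs allowed);
# appends the setting if the key is not present.
# set_condor_param is a hypothetical helper, not an HTCondor tool.
set_condor_param() {
  local file="$1" key="$2" value="$3" tmp rc
  tmp=$(mktemp) || return 1
  # [ \t] instead of [[:space:]] so this also works with older mawk.
  awk -v k="$key" -v v="$value" '
    $0 ~ ("^[ \t]*" k "[ \t]*=") { print k " = " v; found = 1; next }
    { print }
    END { if (!found) print k " = " v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage on the Worker Node (run the function as root, then restart Condor):
#   sudo bash -c "$(declare -f set_condor_param); set_condor_param /etc/condor/condor_config.local CONDOR_HOST 192.168.1.1"
#   sudo service condor restart
```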
Running condor_status on either the Master Node or the Worker Node now lists both nodes. In the following example, both nodes are c3.xlarge instances. Each c3.xlarge instance has 4 vCPUs, so the cluster shows 8 slots.
$ condor_status
Name               OpSys  Arch   State     Activity LoadAv  Mem ActvtyTime
slot1@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:04:36
slot2@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:05:05
slot3@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:05:06
slot4@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:05:07
slot1@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.040 1862 0+00:04:36
slot2@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:05:05
slot3@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:05:06
slot4@ip-10-0-5-11 LINUX  X86_64 Unclaimed Idle      0.000 1862 0+00:05:07
             Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX     8     0       0         8       0          0        0
       Total     8     0       0         8       0          0        0
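In the table above, both nodes appear as ip-10-0-5-11 because condor_status shortens the slot names. The default view therefore does not show at a glance that two distinct machines have joined. One way to check is to print the Machine attribute of every slot with the condor_status `-format` option and count the unique names. This is a minimal sketch; `count_machines` is a hypothetical helper name, and the hostnames you see depend on your setup.

```shell
#!/usr/bin/env bash
# Minimal sketch: count distinct machine names read from stdin, one per line.
# condor_status reports one ad per slot, so each name repeats once per slot.
# count_machines is a hypothetical helper name.
count_machines() {
  sort -u | grep -c .
}

# Usage on either node; for the two-node cluster above it should print 2:
#   condor_status -format '%s\n' Machine | count_machines
```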