
XSEDE 2013 BigJob Tutorial

shantenujha edited this page Jul 22, 2013 · 60 revisions

Introduction

We assume you know the basics of what a pilot-job system is and how (and why) you might want to use it. Please refer to the following link for a brief overview.

Pilot-Job Overview

Preparation

This tutorial uses the TACC virtual machine (VM) repex1 to submit jobs remotely to the XSEDE machine Stampede. Please log in to repex1, as we did at the beginning of this session.

$ ssh <username>@repex1.tacc.utexas.edu

Installation

Next, you need to install BigJob in your user account on repex1. Since BigJob, just like saga-python, is written in Python, you can use virtualenv to create a local installation:

$ virtualenv $HOME/bigjobenv
$ . $HOME/bigjobenv/bin/activate

The BigJob package that we are using is called saga-bigjob and can be installed via pip:

$ pip install saga-bigjob

Please validate the installation by typing:

$ python -c "import pilot; print pilot.version"
0.4.134-162-g242cdb3-saga-python
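If you would rather check the installation from a script than with the one-liner above, the sketch below tests whether a module can be found in the active environment without importing it. It uses only the standard library (`importlib.util.find_spec`); the helper name `module_available` is our own, and the sketch is written for Python 3, whereas the tutorial's own commands use Python 2 syntax.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found in this environment."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # 'pilot' is the module provided by the saga-bigjob package
    print("pilot installed:", module_available("pilot"))
```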

A note on BigJob architecture:

[Figure: BigJob architecture]

Since BigJob supports distributed execution, it requires a central point of communication to manage pilots, tasks, and their associated data files across resources. This is provided by the "Distributed Coordination Service," a central database. BigJob uses Redis for this purpose.

For this tutorial, we will use a Redis server maintained by the SAGA Project team on a virtual machine at Indiana University. We have set environment variables in your home directory on repex1 that contain the 'secret' password for this Redis server. After the tutorial, you can refer back to this page to learn how to set up your own Redis server.


Code Example 1: Simple Ensemble

The simplest usage of a pilot-job system is to submit multiple identical tasks collectively, i.e., as one big job! Such usage arises, for example, when performing a parameter sweep or running a set of ensemble simulations.

This example runs NUMBER_JOBS concurrent '/bin/echo' tasks on TACC's Stampede cluster, with NUMBER_JOBS set to 32. A 32-core pilot-job is started and 32 single-core tasks are submitted to it. The example also shows basic error handling with 'try/except' and a coordinated shutdown (removing the pilot from Stampede's queue) in a 'finally' clause (line 74 of the script) once all tasks have finished running.
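The value of the 'finally' clause is that the pilot is released whether the script succeeds or fails, so a crashed script does not leave a 32-core reservation sitting in Stampede's queue. The sketch below shows that control flow in isolation. `FakePilot` is a hypothetical stand-in that only records calls; its method names (`submit_compute_unit`, `cancel`) mirror the tutorial script's style but are assumptions for illustration, not BigJob's documented API.

```python
class FakePilot:
    """Hypothetical stand-in for a BigJob pilot; it only records calls."""

    def __init__(self):
        self.cancelled = False
        self.submitted = []

    def submit_compute_unit(self, desc):
        if desc.get("executable") is None:
            raise ValueError("missing executable")
        self.submitted.append(desc)
        return len(self.submitted) - 1

    def cancel(self):
        # in a real run this is where the pilot would leave the batch queue
        self.cancelled = True


def run_ensemble(pilot, descriptions):
    """Submit all task descriptions; always release the pilot, even on error."""
    try:
        for desc in descriptions:
            pilot.submit_compute_unit(desc)
        return True
    except Exception as ex:
        print("AN ERROR OCCURRED: %s" % ex)
        return False
    finally:
        # runs on success and on failure
        pilot.cancel()


fake = FakePilot()
descriptions = [{"executable": "/bin/echo", "arguments": ["task %d" % i]} for i in range(32)]
run_ensemble(fake, descriptions)
print(len(fake.submitted), fake.cancelled)  # 32 True
```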

Preparation

  1. Take a look at the full example code on GitHub.

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as simple-ensemble.py.

Execution

Execute the Python script:

python simple-ensemble.py

The output will look something like this:

* Submitted task '0' with id 'cu-262ee4a2-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
* Submitted task '1' with id 'cu-26464cbe-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
[...]
* Submitted task '31' with id 'cu-2905ac74-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
Waiting for tasks to finish...
Terminating BigJob...

Discussion

Let's analyze the script that we just ran in order to understand the important components of a BigJob script.

To use BigJob, we import the pilot Python module at the beginning of the script:

import pilot

Also, it's important to call your attention to two things at the beginning of the script:

# The coordination server
COORD       = "redis://%s@gw68.quarry.iu.teragrid.org:6379" % REDIS_PWD
...
# The number of jobs you want to run
NUMBER_JOBS = 32

These values are not BigJob features as such, but note that they tell BigJob to use the Redis server at Indiana University and set the number of jobs we want to run. Although you submit just a single BigJob in this example, that BigJob reserves enough space to run 32 jobs.
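Rather than hard-coding the Redis password in the script, you can read it from the environment. A minimal sketch, assuming the password is exported in a variable named `REDIS_PWD` (the variable name defined in your tutorial account may differ); the host and port are taken from the `COORD` line above, and the helper name `coordination_url` is our own.

```python
import os


def coordination_url(password, host, port=6379):
    """Build a Redis coordination URL of the form redis://<password>@<host>:<port>."""
    return "redis://%s@%s:%d" % (password, host, port)


# Assumption: the password is exported as REDIS_PWD; check your own environment.
REDIS_PWD = os.environ.get("REDIS_PWD", "")
COORD = coordination_url(REDIS_PWD, "gw68.quarry.iu.teragrid.org")
```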

There are two main components to a BigJob script: the Pilot Compute Description and the Compute Unit Description. The Pilot Compute Description tells BigJob which resource to send your job to and describes the Pilot-Job itself: how many cores you want to reserve, which queue to submit to, which allocation to charge, etc. Let's look at the Pilot Compute Description from the script we just ran:

        pilot_description = pilot.PilotComputeDescription()
        pilot_description.service_url = "slurm+ssh://%s" % HOSTNAME
        pilot_description.queue = QUEUE
        pilot_description.number_of_processes = 32
        pilot_description.working_directory = WORKDIR
        pilot_description.walltime = 10

Notice that we tell BigJob to submit through the SLURM batch system, over SSH (slurm+ssh://), on HOSTNAME (where HOSTNAME is stampede.tacc.utexas.edu). This means that even though we are logged in to repex1.tacc.utexas.edu, the pilot is submitted from repex1 to the supercomputer Stampede, and your tasks actually run on Stampede's compute nodes. (Advanced note: you could also submit from your own laptop, if you had SSH keys configured for password-less login to Stampede. Read more here.)

This illustrates a key strength of SAGA and SAGA-BigJob: submit locally, execute globally, with as few changes as possible.

We also have to set QUEUE in the script to the appropriate queue on Stampede, in this case normal. In addition, we ask for 32 cores (number_of_processes), specify the job's wall-clock limit in minutes (walltime), and tell BigJob which directory to store our files in (working_directory). Note that working_directory is a directory on Stampede, NOT on repex1.

For a complete list of Pilot Compute Description parameters, please click here.

The Compute Unit Description, on the other hand, describes the details of your application kernel (executable), including what inputs the executable might require and where/how to save the stdout and stderr files. Here's the Compute Unit Description from our script:

            task_desc = pilot.ComputeUnitDescription()
            task_desc.executable = '/bin/echo'
            task_desc.arguments = ['I am task number $TASK_NO', ]
            task_desc.environment = {'TASK_NO': i}
            task_desc.number_of_processes = 1
            task_desc.output = 'stdout.txt'
            task_desc.error = 'stderr.txt'

Note that we are using the executable /bin/echo, which simply prints its arguments: for instance, /bin/echo dog prints dog. Likewise, /bin/echo $HOME prints the path of your home directory (e.g., /home/tutorial-04), provided that environment variable is set. In this Compute Unit Description we call /bin/echo 'I am task number $TASK_NO'. The environment variable $TASK_NO is defined on the next line to be the index of the loop that encloses this Compute Unit Description (recall for i in range(NUMBER_JOBS):). Here, number_of_processes is the number of cores allocated to a single task; we only need one core, so it is set to 1. Finally, we name the files that capture the task's stdout and stderr. If your executable writes its own output files, they will also end up in the task's working directory.
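Why does `$TASK_NO` turn into a number? Variable expansion is performed by a shell, not by `/bin/echo` itself. The sketch below reproduces the effect locally with Python 3's `subprocess` module by running the same `echo` under `/bin/sh` with `TASK_NO` set in the environment. That the BigJob agent launches tasks through a shell is our assumption, inferred from the task output shown later on this page. Note that environment values must be strings, so the loop index is converted with `str()`.

```python
import subprocess


def run_echo_task(task_no):
    """Run echo through /bin/sh so that $TASK_NO is expanded from the environment."""
    result = subprocess.run(
        ["/bin/sh", "-c", 'echo "I am task number $TASK_NO"'],
        env={"TASK_NO": str(task_no)},  # environment values must be strings
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


for i in range(3):
    print(run_echo_task(i), end="")
```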

For a complete list of Compute Unit Description parameters, please click here.

Where is My Output?

Recall that we specified the working directory of our script as follows:

WORKDIR     = "/home1/02554/sagatut/XSEDETutorial/%s/example1" % USER_NAME

We can ssh to Stampede (ssh sagatut@stampede.tacc.utexas.edu) and cd into this directory in order to see the output from our script. For example, for tutorial account tutorial-00, you would do the following from repex1:

     (python)tutorial-00@repex1:~$ ssh sagatut@stampede.tacc.utexas.edu

     ... 
     Last login: Fri Jul 19 02:07:23 2013 from repex1.tacc.utexas.edu
     ------------------------------------------------------------------------------
                   Welcome to the Stampede Supercomputer
     ...

     login2$ cd XSEDETutorial/tutorial-00/example1/
     login2$ ls
     bj-dc138466-f041-11e2-a1fa-005056a13723                   
     stdout-bj-dc138466-f041-11e2-a1fa-005056a13723-agent.txt
     stderr-bj-dc138466-f041-11e2-a1fa-005056a13723-agent.txt

Do not be alarmed by the random-looking strings! They are unique identifiers that BigJob uses so that BigJobs running at the same time do not overwrite each other's files. The two agent files in this directory are mainly useful for debugging: while the stderr.txt file you defined in the Compute Unit Description captures errors from your application itself, the agent files capture output and errors specific to BigJob. If you suspect something is wrong with your BigJob script, look for a hint in the stderr-<bj-id>-agent.txt file in the BigJob working directory.
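When several BigJob runs share one working directory, a short script can pick out the agent error logs that are not empty. A sketch, assuming it runs on Stampede (or on a copy of the directory) and that the agent files follow the `stderr-<bj-id>-agent.txt` naming shown in the listing above; the function name is our own.

```python
import glob
import os


def nonempty_agent_stderr(bj_workdir):
    """Return {filename: contents} for every non-empty stderr-*-agent.txt file."""
    found = {}
    pattern = os.path.join(bj_workdir, "stderr-*-agent.txt")
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            text = f.read()
        if text.strip():  # skip files that are empty or whitespace only
            found[os.path.basename(path)] = text
    return found


if __name__ == "__main__":
    workdir = os.path.expanduser("~/XSEDETutorial/tutorial-00/example1")
    for name, text in nonempty_agent_stderr(workdir).items():
        print("== %s ==" % name)
        print(text)
```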

     login2$ cd bj-dc138466-f041-11e2-a1fa-005056a13723/
     login2$ ls
     login2$ ls -r
     sj-e24786ac-f041-11e2-a1fa-005056a13723    
     ...
     sj-e08a17da-f041-11e2-a1fa-005056a13723

You should see one subjob directory (sj-<id>) for each of the 32 tasks we just ran. You can verify that there are 32 of them by piping ls into the Unix command wc (word count); the first number in its output is the line count, i.e., the number of directories:

     login2$ ls | wc
      32      32    1280 

Choose one of these directories and cd into it.

     login2$ cd sj-e24786ac-f041-11e2-a1fa-005056a13723/
     login2$ ls
     stderr.txt  stdout.txt

In this case, your stderr file should be empty, because the executable /bin/echo is available on Stampede and we passed it valid arguments. Let's see what our stdout file says:

     login2$ cat stdout.txt 
     I am task number 31

You can cd into any of these subjob directories and check that each stdout.txt contains the number the task was given when it was submitted to the Pilot Job.
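Opening 32 directories by hand gets tedious. The sketch below, meant to be run on Stampede from inside the `bj-<id>` directory, reads every `sj-*/stdout.txt` and reports which task numbers produced no output. It assumes the output format `I am task number N` from this example and `NUMBER_JOBS = 32`; the helper names are our own.

```python
import glob
import os
import re

TASK_LINE = re.compile(r"I am task number (\d+)")


def collect_task_numbers(bj_dir):
    """Read sj-*/stdout.txt under a BigJob directory and return the task numbers found."""
    numbers = set()
    for path in glob.glob(os.path.join(bj_dir, "sj-*", "stdout.txt")):
        with open(path) as f:
            match = TASK_LINE.search(f.read())
        if match:
            numbers.add(int(match.group(1)))
    return numbers


def missing_tasks(bj_dir, number_jobs=32):
    """Return the sorted task numbers that left no recognizable output."""
    return sorted(set(range(number_jobs)) - collect_task_numbers(bj_dir))


if __name__ == "__main__":
    print("missing tasks:", missing_tasks("."))
```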


Code Example 2: Adding Data Transfer

Now that we understand the basics of submitting simple executables via BigJob, let's extend the previous example with file transfer. Once the 32 tasks have finished executing, we use SAGA-Python to transfer the individual output files back to the local machine; in this case, from Stampede (where the executable runs) back to repex1 (where we submitted our job from).

Preparation

  1. Take a look at the full example code on GitHub.

    wget https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/02_bigjob-simple-ensemble-datatransfer.py

  2. On repex1, create a new file in your home directory, copy & paste the code into it and save it, e.g., as simple-ensemble-datatransfer.py.

Execution

Execute the Python script:

python simple-ensemble-datatransfer.py

The output will look something like this:

* Submitted task '0' with id 'cu-9bfd334c-e996-11e2-8e8b-14109fd519a1' to stampede.tacc.utexas.edu
* Submitted task '1' with id 'cu-9c169a1c-e996-11e2-8e8b-14109fd519a1' to stampede.tacc.utexas.edu
[...]
* Submitted task '31' with id 'cu-2905ac74-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
Waiting for tasks to finish...
* Output for 'cu-9bfd334c-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-9bfd334c-e996-11e2-8e8b-14109fd519a1.txt'
* Output for 'cu-9c169a1c-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-9c169a1c-e996-11e2-8e8b-14109fd519a1.txt'
[...]
* Output for 'cu-a0bc8202-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-a0bc8202-e996-11e2-8e8b-14109fd519a1.txt'
Terminating BigJob...

Discussion

This time, we do not have to log in to Stampede to see our output. We can stay on repex1 and open the files named ex-2-stdout-*; each one contains the output of a single task (e.g., I am task number 11).

What code did we add? Recall the file transfer example from the saga-python tutorial:

        # all compute units have finished. now we can use saga-python
        # to transfer back the output files...
        d = saga.filesystem.Directory("sftp://%s/" % (HOSTNAME))
        for task in tasks:
            local_filename = "ex-2-stdout-%s.txt" % (task.get_id())
            d.copy("%s/stdout.txt" % (task.get_local_working_directory()), "file://localhost/%s/%s" % (os.getcwd(), local_filename))
            print "* Output for '%s' copied to: './%s'" % (task.get_id(), local_filename)
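The loop above needs only two things per task: its ID and its working directory. To experiment with the naming logic without Stampede or SAGA, here is a local stand-in that uses `shutil.copy` in place of `saga.filesystem.Directory.copy`. It is a sketch for understanding, not a replacement for the SFTP transfer; the function name and the dictionary input are our own.

```python
import os
import shutil


def copy_back_outputs(task_dirs, dest_dir, prefix="ex-2-stdout"):
    """Copy <workdir>/stdout.txt of each task to dest_dir/<prefix>-<task id>.txt.

    task_dirs maps a task id to that task's working directory. Both sides are
    local paths here; in the tutorial the source is on Stampede and SAGA copies it.
    """
    copied = []
    for task_id, workdir in sorted(task_dirs.items()):
        local_filename = "%s-%s.txt" % (prefix, task_id)
        target = os.path.join(dest_dir, local_filename)
        shutil.copy(os.path.join(workdir, "stdout.txt"), target)
        print("* Output for '%s' copied to: './%s'" % (task_id, local_filename))
        copied.append(target)
    return copied
```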

Note: copying the output back does not remove it from Stampede. If you still want to view your output files on Stampede itself, you can ssh into Stampede and access your data as follows:

    login1$ cd XSEDETutorial/tutorial-27/example2
    login1$ ls
    bj-cfbb398c-f084-11e2-b8e1-005056a13723
    stderr-bj-cfbb398c-f084-11e2-b8e1-005056a13723-agent.txt
    stdout-bj-cfbb398c-f084-11e2-b8e1-005056a13723-agent.txt
    login1$ cd bj-cfbb398c-f084-11e2-b8e1-005056a13723/
    login1$ ls
    sj-d1faeaf8-f084-11e2-b8e1-005056a13723  
    ...
    sj-d3bce85a-f084-11e2-b8e1-005056a13723
    login1$ ls | wc
     32      32    1280

Code Example 3: Chained Ensemble

This tutorial example introduces task synchronization. It submits a set of 32 '/bin/echo' tasks (task set A). For every successfully completed task, we submit a '/bin/cat' task from task set B to the same Pilot-Job. Tasks from set A can be seen as producers and tasks from set B as consumers, since B-tasks read ('consume') the output files of A-tasks.

Preparation

  1. Take a look at the full example code on GitHub.

    wget https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/03_bigjob_chained_ensemble.py

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as chained_ensemble.py.

Execution

Execute the Python script:

python chained_ensemble.py

The output will look something like this:

* Submitted 'A' task '0' with id 'cu-27ab3846-e9a9-11e2-88eb-14109fd519a1'
* Submitted 'A' task '1' with id 'cu-27c2cca4-e9a9-11e2-88eb-14109fd519a1'
[...]
One 'A' task cu-27ab3846-e9a9-11e2-88eb-14109fd519a1 finished. Launching a 'B' task.
* Submitted 'B' task '31' with id 'cu-352139c6-e9a9-11e2-88eb-14109fd519a1'
[...]
* Output for 'cu-352139c6-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-352139c6-e9a9-11e2-88eb-14109fd519a1.txt'
* Output for 'cu-353e2946-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-353e2946-e9a9-11e2-88eb-14109fd519a1.txt'
[...]
* Output for 'cu-399a8ea8-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-399a8ea8-e9a9-11e2-88eb-14109fd519a1.txt'
Terminating BigJob...

Discussion

Let's take a moment to review what is going on here by looking at the Compute Unit Descriptions:

Our task set A is the same as it was in example 1, with one difference: we now name the output files A-stdout.txt and A-stderr.txt. The focus here is on task set B:

        while len(task_set_A) > 0:
            # iterate over a copy so that finished A-tasks can be removed safely
            for a_task in list(task_set_A):
                if a_task.get_state() == "Done":
                    print "One 'A' task %s finished. Launching a 'B' task." % (a_task.get_id())
                    task_desc = pilot.ComputeUnitDescription()
                    task_desc.executable = '/bin/cat'
                    # read the finished A-task's stdout from its working directory
                    task_desc.arguments = ["%s/A-stdout.txt" % a_task.get_local_working_directory()]
                    task_desc.number_of_processes = 1
                    task_desc.output = 'B-stdout.txt'
                    task_desc.error  = 'B-stderr.txt'
                    task = pilotjob.submit_compute_unit(task_desc)
                    # drop the A-task so that the while loop eventually terminates
                    task_set_A.remove(a_task)

Notice that we initially run 32 single-core tasks (set A) in our 32-core Pilot Job. This essentially makes the Pilot Job 'full'. As soon as one of these A-tasks completes, a core becomes free, and we use it to run a task from set B. The executable in set B is different from that of set A: /bin/cat. Each B-task reads the output of one A-task and writes it to its own stdout file. When might this be useful? Whenever you have a second executable (or another run of the same executable) that depends on the output data of an initial run (here, the output of set A).
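To see the control flow of the polling loop in isolation, here is a self-contained simulation. `FakeTask` is a hypothetical stand-in whose `get_state()` returns `"Done"` after a few polls, mimicking the state string used above, and the `/work/sj-N` paths are made up. Each finished A-task triggers exactly one B description and is then dropped from the pending list, which is what lets the `while` loop terminate. Note that B-tasks are created in completion order, not submission order. A real script should also sleep briefly between polls instead of spinning.

```python
class FakeTask:
    """Hypothetical stand-in for a BigJob compute unit.

    get_state() reports "Running" for a fixed number of polls, then "Done".
    """

    def __init__(self, task_no, polls_until_done):
        self.task_no = task_no
        self.remaining = polls_until_done

    def get_id(self):
        return "cu-%d" % self.task_no

    def get_local_working_directory(self):
        return "/work/sj-%d" % self.task_no  # hypothetical path

    def get_state(self):
        if self.remaining > 0:
            self.remaining -= 1
            return "Running"
        return "Done"


def chain(task_set_A):
    """Return one '/bin/cat' description per A-task, in completion order."""
    pending = list(task_set_A)
    task_set_B = []
    while pending:
        for a_task in list(pending):  # copy: we remove items while iterating
            if a_task.get_state() == "Done":
                task_set_B.append({
                    "executable": "/bin/cat",
                    "arguments": ["%s/A-stdout.txt" % a_task.get_local_working_directory()],
                })
                pending.remove(a_task)
    return task_set_B


tasks = [FakeTask(i, 3 - i % 3) for i in range(6)]
for desc in chain(tasks):
    print(desc["arguments"][0])
```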

Where is my Output?

Note that we have only transferred the output of the B-tasks back to repex1. This was simply a stylistic decision, since the output of the A-tasks and B-tasks is the same in this case. In real workflows, set A may only produce intermediate data for set B that does not need to be copied back to the machine you ran your BigJob script from. If you want the data from task set A, you can still get it either by logging in to Stampede and looking in the appropriate directory, or by implementing a second file transfer to bring the A-task output back to repex1.

In fact, we recommend that, after the tutorial session, you implement this second file transfer to test your understanding.

If you open the ex2-stdout-* files that were copied back (see the sample output above), you will see the output of the B-tasks, which is just the content they read and 'forwarded' from the A-task outputs.


Code Example 4: Coupled Ensemble

This tutorial example shows another form of task set synchronization. It implements a simple workflow that submits two sets of tasks (set A and set B) and waits until both have completed before submitting a third set (set C). Both A- and B-tasks are 'producers'; C-tasks are 'consumers' that concatenate the output of one A-task and one B-task.

Preparation

  1. Take a look at the full example code on GitHub.

    wget https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/04_bigjob_coupled_ensembles.py

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as coupled_ensembles.py.

Execution

Execute the Python script:

python coupled_ensembles.py

The output will look something like this:

* Submitted 'A' task '0' with id 'cu-833b3762-e9ac-11e2-b250-14109fd519a1'
* Submitted 'A' task '1' with id 'cu-8352c0f8-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'A' task '31' with id 'cu-86137aee-e9ac-11e2-b250-14109fd519a1'
* Submitted 'B' task '0' with id 'cu-862ad342-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'B' task '31' with id 'cu-88fe4c2a-e9ac-11e2-b250-14109fd519a1'
Waiting for 'A' and 'B' tasks to complete...
* Submitted 'C' task '0' with id 'cu-ffb024ce-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'C' task '31' with id 'cu-0281b708-e9ad-11e2-b250-14109fd519a1'
Waiting for 'C' tasks to complete...
* Output for 'cu-ffb024ce-e9ac-11e2-b250-14109fd519a1' copied to: './ex4-stdout-cu-ffb024ce-e9ac-11e2-b250-14109fd519a1.txt'
[...]
* Output for 'cu-0281b708-e9ad-11e2-b250-14109fd519a1' copied to: './ex4-stdout-cu-0281b708-e9ad-11e2-b250-14109fd519a1.txt'

Terminating BigJob...

Discussion

Let's take a look at the executables of set A and set B, and then set C.

Set A:

            task_desc.executable = '/bin/echo'
            task_desc.arguments = ['I am an $TASK_SET task with id $TASK_NO', ]
            task_desc.environment = {'TASK_SET': 'A', 'TASK_NO': i}

We expect the output of such a task to look like: "I am an A task with id 7"

Set B:

            task_desc.executable = '/bin/echo'
            task_desc.arguments = ['I am a $TASK_SET task with id $TASK_NO']
            task_desc.environment = {'TASK_SET': 'B', 'TASK_NO': i}

We expect the output of a task in B to look like: "I am a B task with id 7"

Both sets are submitted up front and run until they are completed; no C-task is submitted until all A- and B-tasks have finished. Note that we still only asked for a 32-core Pilot Job, so the 32 A-tasks initially fill it, and the B-tasks wait inside the Pilot Job, starting as cores become free. Only once both the A- and B-tasks have completed do the C-tasks begin.
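To make the core accounting concrete, here is a toy model of how a fixed-size pilot hands out cores: every task needs one core and takes the same time, and a waiting task starts as soon as any core is free, in first-come, first-served order. Both the equal durations and the FIFO order are simplifying assumptions, not a description of BigJob's agent; with real, varying runtimes, B-tasks start one by one as individual A-tasks finish rather than in one neat wave.

```python
import heapq


def start_times(num_tasks, pilot_cores, duration):
    """Start time of each single-core task when tasks share a fixed pool of cores.

    Tasks start in submission order; each waits for the earliest free core.
    """
    free_at = [0] * pilot_cores  # time at which each core becomes free
    heapq.heapify(free_at)
    starts = []
    for _ in range(num_tasks):
        t = heapq.heappop(free_at)  # earliest available core
        starts.append(t)
        heapq.heappush(free_at, t + duration)
    return starts


# 32 A-tasks followed by 32 B-tasks on a 32-core pilot, 5 time units each
s = start_times(64, 32, 5)
print(s[:32] == [0] * 32, s[32:] == [5] * 32)  # True True
```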

Set C:

            a_task_output = "%s/A-stdout.txt" \
                % task_set_A[i].get_local_working_directory()
            b_task_output = "%s/B-stdout.txt" \
                % task_set_B[i].get_local_working_directory()

            task_desc = pilot.ComputeUnitDescription()
            task_desc.executable = '/bin/cat'
            task_desc.arguments = [a_task_output, b_task_output]

Note that a_task_output and b_task_output hold the paths of a particular A-task's stdout file and the corresponding B-task's stdout file, not their contents. We pass both paths to /bin/cat, so the stdout of the C-task contains the contents of the stdout file from set A followed by the stdout file from set B.
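To check what a C-task should print before running it on Stampede, you can mimic `/bin/cat` locally. The sketch writes two stand-in files holding the sample A and B outputs from this example into a temporary directory (the paths are hypothetical) and concatenates them in the same order as `task_desc.arguments`.

```python
import os
import tempfile


def concatenate(paths):
    """Return the contents of the given files joined in order, as /bin/cat would print them."""
    parts = []
    for path in paths:
        with open(path) as f:
            parts.append(f.read())
    return "".join(parts)


workdir = tempfile.mkdtemp()
a_task_output = os.path.join(workdir, "A-stdout.txt")
b_task_output = os.path.join(workdir, "B-stdout.txt")
with open(a_task_output, "w") as f:
    f.write("I am an A task with id 7\n")
with open(b_task_output, "w") as f:
    f.write("I am a B task with id 7\n")
print(concatenate([a_task_output, b_task_output]), end="")
```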

For convenience, this script copies the output of set C back to repex1 so that we do not have to log in to Stampede again. Your data still persists on Stampede; we are simply copying it to repex1. If you open the ex4-stdout-* files, you will see the output of the C-tasks, which is the concatenated output of the A- and B-tasks. For example:

I am an A task with id 7
I am a B task with id 7

Code Example 5: Mandelbrot

In this example, we split up the calculation of a Mandelbrot set into several tiles.

We have already covered the basic topics required to write and run your own BigJob scripts. All of the concepts covered in the Mandelbrot example have been reviewed earlier in this tutorial. We included this example in order to show you the difference between using saga-python to serially submit a number of individual jobs to a queuing system vs. using BigJob to submit one job to the queuing system and then execute a number of tasks within that job.

This shows why BigJob is useful: we submit just one job to the queuing system, and that job requests the resources needed for all of the tasks. Once the job becomes active, the tasks are executed in a distributed fashion on the reserved resources. We then use saga-python's data transfer capabilities to copy the individual tile images back to repex1 and stitch them into the complete image.

[Figure: Mandelbrot tiles]

Preparation

  1. Install the Python Image Library (PIL):

    pip install PIL
  2. Download the Mandelbrot application kernel and the 'bootstrap' script:

    curl --insecure -Os https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/mandelbrot.sh
    
    curl --insecure -Os https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/mandelbrot.py
  3. Take a look at the full example code on GitHub.

    wget https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/05_bigjob_mandelbrot.py

  4. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as bigjob_mandelbrot.py.

Execution

Execute the Python script:

python bigjob_mandelbrot.py

The output will look something like this:

* Submitted task 'cu-6f26b08c-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
* Submitted task 'cu-6f3cff54-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
[...]
* Submitted task 'cu-706628ec-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
Waiting for tasks to finish...
* Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial-00/example5/tile_x0_y0.gif back to /home/tutorial-00
* Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial-00/example5/tile_x0_y1.gif back to /home/tutorial-00
[...]
* Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial-00/example5/tile_x3_y3.gif back to /home/tutorial-00
* Stitching together the whole fractal: mandelbrot_full.gif
Terminating BigJob...

Discussion

The tiles have been copied from Stampede to your home directory on repex1 and stitched together into mandelbrot_full.gif, but you won't be able to view the image there unless you are using X11 forwarding. To view it, copy the image to your laptop (e.g., via sftp or scp) and open it with an image viewer. You should see the full 8192x8192 Mandelbrot fractal.

Let's illustrate how this differs from the saga-python example by inspecting the code in both cases. First, the saga-python version:

        for x in range(0, tilesx):
            for y in range(0, tilesy):

                # describe a single Mandelbrot job. we're using the
                # directory created above as the job's working directory
                outputfile = 'tile_x%s_y%s.gif' % (x, y)
                jd = saga.job.Description()
                #jd.queue             = "development"
                jd.wall_time_limit   = 10
                jd.total_cpu_count   = 1
                jd.working_directory = workdir.get_url().path
                jd.executable        = 'sh'
                jd.arguments         = ['mandelbrot.sh', imgx, imgy,
                                        (imgx/tilesx*x), (imgx/tilesx*(x+1)),
                                        (imgy/tilesy*y), (imgy/tilesy*(y+1)),
                                        outputfile]
                # create the job from the description
                # above, launch it and add it to the list of jobs
                job = jobservice.create_job(jd)
                job.run()
                jobs.append(job)
                print ' * Submitted %s. Output will be written to: %s' % (job.id, outputfile)

        # wait for all jobs to finish
        while len(jobs) > 0:
            # iterate over a copy so that finished jobs can be removed safely
            for job in list(jobs):
                jobstate = job.get_state()
                print ' * Job %s status: %s' % (job.id, jobstate)
                if jobstate in [saga.job.DONE, saga.job.FAILED]:
                    jobs.remove(job)

Above, in the saga-python example, we submit a separate job for each tile we want to calculate. This sends many jobs to Stampede's queuing system, and we must wait for each one to be queued, move to the Running state, and then complete. This can get frustrating if some jobs stay queued for a long time while others run fairly quickly.

In the BigJob example, we still loop over the X and Y tiles, but this time we submit each task to our Pilot Job instead of directly to the queuing system. Only the Pilot Job waits in the queue, and as soon as it becomes active, it starts running tasks on the resources it has reserved.

        for x in range(0, TILESX):
            for y in range(0, TILESY):
                # describe a single Mandelbrot job. we're using the
                # directory created above as the job's working directory
                task_desc = pilot.ComputeUnitDescription()
                task_desc.executable = '/bin/sh'
                task_desc.arguments = ["/%s/mandelbrot.sh" % WORKDIR, IMGX, IMGY,
                                       (IMGX/TILESX*x), (IMGX/TILESX*(x+1)),
                                       (IMGY/TILESY*y), (IMGY/TILESY*(y+1)),
                                       '%s/tile_x%s_y%s.gif' % (WORKDIR, x, y)]

                task_desc.wall_time_limit = 10
                task_desc.number_of_processes = 1

                task = pilotjob.submit_compute_unit(task_desc)
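Both versions compute the same pixel ranges for each tile. The sketch below isolates that arithmetic. The 8192x8192 image size comes from the discussion above and the 4x4 grid matches the tile_x0_y0 through tile_x3_y3 file names in the sample output, but treat both as assumptions about the script's IMGX, IMGY, TILESX and TILESY. It uses `//` so that Python 3 produces the same integer coordinates as `/` does in the Python 2 scripts. If IMGX is not divisible by TILESX, the rightmost pixels are not covered by any tile.

```python
def tile_bounds(imgx, imgy, tilesx, tilesy):
    """Yield (x, y, x_start, x_end, y_start, y_end, filename) for every tile."""
    for x in range(tilesx):
        for y in range(tilesy):
            # // keeps integer pixel coordinates under Python 3, matching
            # the integer division that / performs in the Python 2 scripts
            yield (x, y,
                   imgx // tilesx * x, imgx // tilesx * (x + 1),
                   imgy // tilesy * y, imgy // tilesy * (y + 1),
                   "tile_x%s_y%s.gif" % (x, y))


tiles = list(tile_bounds(8192, 8192, 4, 4))
print(len(tiles))  # 16
print(tiles[6])    # (1, 2, 2048, 4096, 4096, 6144, 'tile_x1_y2.gif')
```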

Where can you get help from here?

Subscribe to the bigjob-users Google Group.

THANK YOU!