Distributed Analysis Environment: DIANE & Job definition and management tool: GANGA
From EUAGwiki
Lectured by
Jingya You
Academia SINICA Grid Computing
Taiwan
mailto:jingya.you@twgrid.org
Running DIANE
Here is an example of running the crash test application with DIANE.
The test program is available locally once DIANE has been installed.
Currently, crash.py is located at /opt/diane/install/2.0-beta18/python/diane_test_applications/crash.py
The run file for the master is /tmp/diane-demo.py
- Start a DIANE MASTER
[UPM01@ui~]$ diane-run /tmp/diane-demo.py
- Start a DIANE WORKER
[UPM01@ui~]$ diane-worker-start
[root@ui ~]# diane-worker-start
2009-08-05 08:29:10,200 DIANE.setup INFO: This is DIANE version "2.0-beta18"
2009-08-05 08:29:10,239 DIANE.WorkerAgent INFO: workdir: /root
2009-08-05 08:29:10,241 DIANE.config INFO: ==============================
2009-08-05 08:29:10,241 DIANE.config INFO: initial configuration
2009-08-05 08:29:10,241 DIANE.config INFO: ------------------------------
2009-08-05 08:29:10,241 DIANE.config INFO: config.RunMaster.IDLE_WORKER_TIMEOUT = 600
2009-08-05 08:29:10,242 DIANE.config INFO: config.RunMaster.LOST_WORKER_TIMEOUT = 60
2009-08-05 08:29:10,242 DIANE.config INFO: config.RunMaster.CONTROL_DELAY = 1
2009-08-05 08:29:10,242 DIANE.config INFO: config.WorkerAgent.BOOTSTRAP_CONTACT_TIMEOUT = 30
2009-08-05 08:29:10,242 DIANE.config INFO: config.WorkerAgent.APPLICATION_SHELL =
2009-08-05 08:29:10,243 DIANE.config INFO: config.WorkerAgent.PULL_REQUEST_DELAY = 0.2
2009-08-05 08:29:10,243 DIANE.config INFO: config.WorkerAgent.HEARTBEAT_TIMEOUT = 30
2009-08-05 08:29:10,243 DIANE.config INFO: config.WorkerAgent.BOOTSTRAP_CONTACT_REPEAT = 10
2009-08-05 08:29:10,243 DIANE.config INFO: config.WorkerAgent.HEARTBEAT_DELAY = 10
2009-08-05 08:29:10,243 DIANE.config INFO: config.main.cache = ~/dianedir/cache
2009-08-05 08:29:10,244 DIANE.config INFO: ==============================
2009-08-05 08:29:10,257 DIANE.FileTransfer.Client INFO: uploading vcard.txt->00001/vcard.txt
2009-08-05 08:29:10,261 DIANE.FileTransfer.Client INFO: upload OK
2009-08-05 08:29:10,262 DIANE.FileTransfer.Client INFO: downloading _application.tgz<-_application.tgz
2009-08-05 08:29:10,264 DIANE.FileTransfer.Client INFO: download OK
diane_test_applications/
diane_test_applications/__init__.pyc
diane_test_applications/crash.pyc
diane_test_applications/crash.py
diane_test_applications/ExecutableApplication/
diane_test_applications/ExecutableApplication/executable.py
diane_test_applications/ExecutableApplication/__init__.py
diane_test_applications/ExecutableApplication/test/
diane_test_applications/ExecutableApplication/test/executable_test.run
diane_test_applications/ExecutableApplication/test/byebye
diane_test_applications/ExecutableApplication/test/hello
diane_test_applications/sample1/
diane_test_applications/sample1/__init__.py
diane_test_applications/sample1/sample1.py
diane_test_applications/sample1/run.py
diane_test_applications/idle.py
diane_test_applications/test_config.py
diane_test_applications/__init__.py
2009-08-05 08:29:10,287 DIANE.application INFO: application boot and run data received
2009-08-05 08:29:10,287 DIANE.application INFO: workerClassName = 'CrashWorker'
2009-08-05 08:29:10,288 DIANE.application INFO: name = 'diane_test_applications.crash'
2009-08-05 08:29:10,288 DIANE.application INFO: worker_class_name = None
2009-08-05 08:29:10,288 DIANE.application INFO: darname = '_application.tgz'
2009-08-05 08:29:10,288 DIANE.application INFO: runid = 'root@ui.euag.org:/root/diane/runs/0001/'
2009-08-05 08:29:10,289 DIANE.application INFO: application_name = None
2009-08-05 08:29:10,289 DIANE.application INFO: config = {'RunMaster': <diane.config.ConfigSection instance at 0xb7bbe46c>, 'WorkerAgent': <diane.config.ConfigSection instance at 0xb7bbe7ac>, 'main': <diane.config.ConfigSection instance at 0xb7bbec2c>}
2009-08-05 08:29:10,289 DIANE.config INFO: ==============================
2009-08-05 08:29:10,289 DIANE.config INFO: updated configuration
2009-08-05 08:29:10,289 DIANE.config INFO: ------------------------------
2009-08-05 08:29:10,290 DIANE.config INFO: config.RunMaster.IDLE_WORKER_TIMEOUT = 600
2009-08-05 08:29:10,290 DIANE.config INFO: config.RunMaster.LOST_WORKER_TIMEOUT = 60
2009-08-05 08:29:10,290 DIANE.config INFO: config.RunMaster.CONTROL_DELAY = 1
2009-08-05 08:29:10,290 DIANE.config INFO: config.WorkerAgent.BOOTSTRAP_CONTACT_TIMEOUT = 30
2009-08-05 08:29:10,290 DIANE.config INFO: config.WorkerAgent.APPLICATION_SHELL =
2009-08-05 08:29:10,291 DIANE.config INFO: config.WorkerAgent.PULL_REQUEST_DELAY = 0.2
2009-08-05 08:29:10,291 DIANE.config INFO: config.WorkerAgent.HEARTBEAT_TIMEOUT = 30
2009-08-05 08:29:10,291 DIANE.config INFO: config.WorkerAgent.BOOTSTRAP_CONTACT_REPEAT = 10
2009-08-05 08:29:10,291 DIANE.config INFO: config.WorkerAgent.HEARTBEAT_DELAY = 100
2009-08-05 08:29:10,291 DIANE.config INFO: config.main.cache = ~/dianedir/cache
2009-08-05 08:29:10,292 DIANE.config INFO: ==============================
2009-08-05 08:29:10,293 DIANE.CrashApplication INFO: *** initialize: worker id=1
2009-08-05 08:29:10,495 DIANE.CrashApplication INFO: *** do_work: worker id=1 tid=1
2009-08-05 08:29:10,495 DIANE.CrashApplication INFO: *** executing command: 'time.sleep(10)'
2009-08-05 08:29:20,696 DIANE.CrashApplication INFO: *** do_work: worker id=1 tid=2
2009-08-05 08:29:20,696 DIANE.CrashApplication INFO: *** executing command: 'time.sleep(10)'
2009-08-05 08:29:30,896 DIANE.CrashApplication INFO: *** do_work: worker id=1 tid=3
2009-08-05 08:29:30,897 DIANE.CrashApplication INFO: *** executing command: 'time.sleep(10)'
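The worker log above prints the configuration twice: once as the "initial configuration" and once as the "updated configuration" after the application data has been received. The following is a minimal Python 3 sketch that compares the two blocks. It is not part of DIANE: the helper names parse_config_blocks and diff_config are hypothetical, and the parsing assumes the log layout shown above.

```python
import re

# Matches option lines such as:
#   ... DIANE.config INFO: config.WorkerAgent.HEARTBEAT_DELAY = 10
CONFIG_LINE = re.compile(r"DIANE\.config INFO: (config\.[\w.]+) =\s*(.*)$")
# Matches the block titles "initial configuration" and "updated configuration"
TITLE_LINE = re.compile(r"DIANE\.config INFO: (\w+) configuration\s*$")


def parse_config_blocks(log_text):
    """Return {block title: {option name: value}} for each config block."""
    blocks = {}
    current = None
    for line in log_text.splitlines():
        title = TITLE_LINE.search(line)
        if title:
            current = blocks.setdefault(title.group(1), {})
            continue
        option = CONFIG_LINE.search(line)
        if option and current is not None:
            current[option.group(1)] = option.group(2).strip()
    return blocks


def diff_config(before, after):
    """Return {option: (old, new)} for every option whose value changed."""
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }


# Shortened excerpt of the worker log shown above.
SAMPLE_LOG = """\
2009-08-05 08:29:10,241 DIANE.config INFO: initial configuration
2009-08-05 08:29:10,242 DIANE.config INFO: config.WorkerAgent.APPLICATION_SHELL =
2009-08-05 08:29:10,243 DIANE.config INFO: config.WorkerAgent.HEARTBEAT_DELAY = 10
2009-08-05 08:29:10,289 DIANE.config INFO: updated configuration
2009-08-05 08:29:10,290 DIANE.config INFO: config.WorkerAgent.APPLICATION_SHELL =
2009-08-05 08:29:10,291 DIANE.config INFO: config.WorkerAgent.HEARTBEAT_DELAY = 100
"""

blocks = parse_config_blocks(SAMPLE_LOG)
changes = diff_config(blocks["initial"], blocks["updated"])
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

In the log above, the only option that differs between the two blocks is config.WorkerAgent.HEARTBEAT_DELAY, which changes from 10 to 100.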
Running Ganga
Ganga is an easy-to-use Grid job submission framework used in several activities across different application domains. It supports large user communities (as in High-Energy Physics) and steers large productions (such as the H5N1 bird flu drug searches or the International Telecommunication Union digital broadcasting frequency definition).
Ganga supports multiple execution back ends (Grid, Batch, Local) and infrastructures (EGEE Production Grid, GILDA Testbed). Ganga should be installed on a machine enabled to submit jobs to the corresponding back end (in the case of grid submission, we call this machine a Grid User Interface). To use a local batch system (e.g. LSF, PBS) the batch system commands should be available on the same machine.
- Ganga Installation
Skip this step, because Ganga is already installed on the system.
bash> wget http://cern.ch/ganga/download/ganga-install
bash> python ganga-install --prefix /opt --extern=GangaAtlas,GangaNG,GangaPanda,GangaGUI,GangaPlotter,GangaLHCb 5.3.1
bash> export PATH=/opt/Ganga/install/5.3.1/bin:$PATH
Before Launching Ganga
Skip this step, because Ganga has already been configured on your system.
- Generate .gangarc
[UPM04@ui ~]$ ganga -g
*** Welcome to Ganga ***
Version: Ganga-5-3-1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.
This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.
Copied current config file to /home/UPM04/.gangarc.00
Using flavour
Created standard config file /home/UPM04/.gangarc
- View .gangarc and make sure the settings are correct
[UPM04@ui ~]$ vi .gangarc
# LCG/gLite/EGEE configuration parameters
[LCG]
EDG_ENABLE = True
EDG_SETUP = /opt/glite/etc/profile.d/grid-env.sh
GLITE_ENABLE = True
GLITE_SETUP = /opt/glite/etc/profile.d/grid-env.sh
VirtualOrganisation = gilda

# default attribute values for GridProxy objects
[defaults_GridProxy]
voms = gilda
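Instead of checking the settings by eye, they can also be checked with a short script. The following is a minimal Python 3 sketch that uses the standard configparser module. It assumes that the relevant part of .gangarc is plain INI syntax, as in the excerpt above; Ganga's own configuration reader may treat some constructs differently. The helper name check_settings and the EXPECTED table are hypothetical.

```python
import configparser

# Sample content mirroring the .gangarc excerpt above.
SAMPLE_GANGARC = """\
# LCG/gLite/EGEE configuration parameters
[LCG]
EDG_ENABLE = True
EDG_SETUP = /opt/glite/etc/profile.d/grid-env.sh
GLITE_ENABLE = True
GLITE_SETUP = /opt/glite/etc/profile.d/grid-env.sh
VirtualOrganisation = gilda

# default attribute values for GridProxy objects
[defaults_GridProxy]
voms = gilda
"""

# Values this tutorial expects (taken from the excerpt above).
EXPECTED = {
    ("LCG", "GLITE_ENABLE"): "True",
    ("LCG", "VirtualOrganisation"): "gilda",
    ("defaults_GridProxy", "voms"): "gilda",
}


def check_settings(text, expected):
    """Return a list of (section, option, found, wanted) mismatches."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    problems = []
    for (section, option), wanted in expected.items():
        # fallback=None also covers a missing section or option
        found = parser.get(section, option, fallback=None)
        if found != wanted:
            problems.append((section, option, found, wanted))
    return problems


problems = check_settings(SAMPLE_GANGARC, EXPECTED)
print("settings OK" if not problems else problems)
```

To check a real file, pass the content of ~/.gangarc to check_settings instead of the sample text.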
Launching the Ganga CLIP
[UPM04@ui ~]$ ganga
Ganga will create a VOMS proxy for you if a valid one is not available yet.
Cannot find file or dir: /home/UPM04/.glite/vomses
Enter GRID pass phrase:
Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=Kuala Lumpur/CN=KUALALUMPUR04
Creating temporary proxy ............................... Done
Contacting voms.ct.infn.it:15001 [/C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it] "gilda" Done
Creating proxy ....................................................................... Done
Your proxy is valid until Sat Aug 1 10:18:21 2009
*** Welcome to Ganga ***
Version: Ganga-5-3-1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.
This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.
Ganga.GPIDev.Lib.JobRegistry : INFO Found 1 jobs in "jobs", completed in 0 seconds
Ganga.GPIDev.Lib.JobRegistry : INFO Found 0 jobs in "templates", completed in 0 seconds
In [1]:
First Ganga job: running an arbitrary shell script
Running a "HelloWorld" job is straightforward in Ganga. You can try it by typing the following command in the Ganga CLIP.
In [1]:Job().submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 1
Ganga.GPIDev.Adapters : INFO submitting job 1 to Local backend
Ganga.GPIDev.Lib.Job : INFO job 1 status changed to "submitted"
Out[1]: 1
In [2]:jobs
Out[2]: Job slice: jobs (2 jobs)
--------------
# fqid  status     name  subjobs  application  backend  backend.actualCE
#    0  completed                 Executable   LCG      ce.euag.org:2119/jobmanager-lcgpbs-gilda
#    1  completed                 Executable   Local    ui.euag.org
Ganga.GPIDev.Lib.Job : INFO job 1 status changed to "running"
Ganga.GPIDev.Lib.Job : INFO job 1 status changed to "completed"
Next, we go one step further and run an arbitrary user script with Ganga's built-in Executable application.
Preparing your shell script
Since any shell command can be called within the Ganga CLIP, one can start creating a shell script like this:
In [4]:!vi myscript.sh
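The following example takes one argument from the command line and prints the hostname, the CPU model and the total memory of the machine on which the script is executed.
#!/bin/sh
# Greet the name given as the first argument.
echo "Hello ${1} !"
# Print the host name; fall back to the hostname command if $HOSTNAME is unset.
echo "${HOSTNAME:-$(hostname)}"
# Print the CPU model and the total memory.
grep 'model name' /proc/cpuinfo
grep 'MemTotal' /proc/meminfo
echo "Run on $(date)"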
When you have finished editing the script, save it and close the editor; you will be back in the Ganga CLIP.
Then make the script executable:
In [5]:!chmod +x myscript.sh
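The Executable application is not tied to shell scripts. For comparison, here is a minimal Python 3 sketch that gathers the same information as myscript.sh. It assumes a Python 3 interpreter is available on the machine that executes the job and that /proc/cpuinfo and /proc/meminfo exist there (as on Linux). The helper names matching_lines, read_text and report are hypothetical.

```python
#!/usr/bin/env python3
import socket
import sys
import time


def matching_lines(text, keyword):
    """Return the lines of `text` that contain `keyword` (like grep)."""
    return [line for line in text.splitlines() if keyword in line]


def read_text(path):
    """Read a file, returning an empty string if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


def report(name):
    """Build the same report lines that myscript.sh prints."""
    lines = ["Hello %s !" % name, socket.gethostname()]
    lines += matching_lines(read_text("/proc/cpuinfo"), "model name")
    lines += matching_lines(read_text("/proc/meminfo"), "MemTotal")
    lines.append("Run on " + time.strftime("%a %b %d %H:%M:%S %Z %Y"))
    return lines


if __name__ == "__main__":
    # The first command-line argument plays the role of ${1} in the shell script.
    who = sys.argv[1] if len(sys.argv) > 1 else ""
    print("\n".join(report(who)))
```

Like the shell script, such a file has to be made executable before it is used as the job's exe.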
Running the shell script on the local machine in interactive mode
Type the following commands in the Ganga CLIP to launch your first Ganga job, which runs a user-specified shell script interactively:
In [6]:j = Job()
In [7]:j.application = Executable()
In [8]:j.application.exe = File('myscript.sh')
In [9]:j.application.args = ['KUALA LUMPUR']
In [10]:j.backend=Local()
In [11]:j.submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 5
Ganga.GPIDev.Adapters : INFO submitting job 5 to Local backend
Ganga.GPIDev.Lib.Job : INFO job 5 status changed to "submitted"
Out[11]: 1
In [12]:j
Out[12]: Job (
status = 'completed' ,
name = ,
inputdir = '/home/UPM04/gangadir/workspace/UPM04/LocalAMGA/5/input/' ,
outputdir = '/home/UPM04/gangadir/workspace/UPM04/LocalAMGA/5/output/' ,
outputsandbox = [] ,
id = 5 ,
info = JobInfo (
submit_counter = 1
) ,
inputdata = None ,
merger = None ,
inputsandbox = [] ,
application = Executable (
exe = File(name='/home/UPM04/myscript.sh',subdir='.') ,
env = {} ,
args = ['KUALA LUMPUR']
) ,
outputdata = None ,
splitter = None ,
subjobs = 'Job slice: jobs(5).subjobs (0 jobs)
' ,
backend = Local (
actualCE = 'ui.euag.org' ,
workdir = '/tmp/tmpCyYKZJ' ,
nice = 0 ,
id = 9125 ,
exitcode = 0
)
)
Ganga.GPIDev.Lib.Job : INFO job 5 status changed to "running"
Ganga.GPIDev.Lib.Job : INFO job 5 status changed to "completed"
In [13]:jobs
Out[13]: Job slice: jobs (6 jobs)
--------------
# fqid  status     name  subjobs  application  backend  backend.actualCE
#    0  completed                 Executable   LCG      ce.euag.org:2119/jobmanager-lcgpbs-gilda
#    1  completed                 Executable   Local    ui.euag.org
#    2  failed                    Executable   Local    ui.euag.org
#    3  running                   Executable   LCG      ce.euag.org:2119/jobmanager-lcgpbs-gilda
#    4  completed                 Executable   Local    ui.euag.org
#    5  completed                 Executable   Local    ui.euag.org
Checking out the final outputs
Once a job is in the completed state, you can check its output in the following ways:
In [14]:j.peek();
total 8
-rw-rw-r-- 1 UPM04 UPM04   0 Jul 31 11:42 __syslog__
-rw-rw-r-- 1 UPM04 UPM04 156 Jul 31 11:42 stdout
-rw-rw-r-- 1 UPM04 UPM04   0 Jul 31 11:42 stderr
-rw-rw-r-- 1 UPM04 UPM04  85 Jul 31 11:42 __jobstatus__
In [15]: j.peek('stdout', 'cat')
or
In [15]: cat $j.outputdir/stdout
Hello KUALA LUMPUR !
ui.euag.org
model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
MemTotal: 514372 kB
Run on Fri Jul 31 11:42:47 UTC 2009
Running the shell script on your local machine in batch mode (background)
Try it yourself: copy the previous job, submit the copy, and then inspect its output.
In [16]: j1 = j.copy()
In [17]: j1.backend = Local()
In [18]: j1.submit()
In [19]: jobs
In [20]: j1.peek('stdout', 'cat')
or
In [21]: cat $j1.outputdir/stdout
Running the shell script on gLite
We have tested our script locally through Ganga. Now we want to run it on a production grid environment, gLite. In Ganga, all we have to do is assign the gLite backend to the job. In the following example, the new job is created by copying the previous one, which saves us from repeating the earlier setup.
In [22]:j2 = j.copy()
In [23]:j2.backend = LCG(middleware='GLITE')
In [24]:j2.application.args = ['Europe']
In [25]:j2.submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 6
Ganga.GPIDev.Adapters : INFO submitting job 6 to LCG backend
Ganga.GPIDev.Lib.Job : INFO job 6 status changed to "submitted"
Out[25]: 1
In [26]:jobs
Out[26]: Job slice: jobs (7 jobs)
--------------
# fqid  status     name  subjobs  application  backend  backend.actualCE
#    0  completed                 Executable   LCG      ce.euag.org:2119/jobmanager-lcgpbs-gilda
#    1  completed                 Executable   Local    ui.euag.org
#    2  failed                    Executable   Local    ui.euag.org
#    3  completed                 Executable   LCG      ce.euag.org:2119/jobmanager-lcgpbs-gilda
#    4  completed                 Executable   Local    ui.euag.org
#    5  completed                 Executable   Local    ui.euag.org
#    6  submitted                 Executable   LCG      grid010.ct.infn.it:2119/jobmanager-lcgpbs-gil
For jobs submitted to gLite, the job's logging info can be queried from the gLite Logging & Bookkeeping system within Ganga.
In [27]:cat $j2.backend.loginfo(verbosity=1)
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job : https://wms.euag.org:9000/F4iSehb7NJJPIThgdCfyBw
--- Event: RegJob - Source = NetworkServer - Timestamp = Fri Jul 31 12:02:26 2009 UTC
--- Event: RegJob - Source = NetworkServer - Timestamp = Fri Jul 31 12:02:26 2009 UTC
--- Event: Accepted - Source = NetworkServer - Timestamp = Fri Jul 31 12:02:27 2009 UTC
--- Event: EnQueued - Result = START - Source = NetworkServer - Timestamp = Fri Jul 31 12:02:28 2009 UTC
--- Event: EnQueued - Result = OK - Source = NetworkServer - Timestamp = Fri Jul 31 12:02:29 2009 UTC
--- Event: DeQueued - Source = WorkloadManager - Timestamp = Fri Jul 31 12:02:29 2009 UTC
--- Event: Match - Dest id = grid010.ct.infn.it:2119/jobmanager-lcgpbs-gilda - Source = WorkloadManager - Timestamp = Fri Jul 31 12:02:29 2009 UTC
--- Event: EnQueued - Result = START - Source = WorkloadManager - Timestamp = Fri Jul 31 12:02:30 2009 UTC
--- Event: EnQueued - Result = OK - Source = WorkloadManager - Timestamp = Fri Jul 31 12:02:30 2009 UTC
--- Event: DeQueued - Source = JobController - Timestamp = Fri Jul 31 12:02:31 2009 UTC
--- Event: Transfer - Destination = LogMonitor - Result = START - Source = JobController - Timestamp = Fri Jul 31 12:02:31 2009 UTC
--- Event: Transfer - Destination = LogMonitor - Result = OK - Source = JobController - Timestamp = Fri Jul 31 12:02:31 2009 UTC
--- Event: Accepted - Source = LogMonitor - Timestamp = Fri Jul 31 12:02:32 2009 UTC
**********************************************************************
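The logging info above is a sequence of events, each with a source and a timestamp. The following is a minimal Python 3 sketch that turns such text into a list of events, for example to see how long the job took to pass through the workload management system or which computing element it was matched to. It assumes the "--- Event: ... - key = value" layout shown above; the helper name parse_events is hypothetical and not part of Ganga or gLite.

```python
import re
from datetime import datetime

# Shortened excerpt of the logging info shown above.
SAMPLE_LOGINFO = """
--- Event: RegJob - Source = NetworkServer - Timestamp = Fri Jul 31 12:02:26 2009 UTC
--- Event: Match - Dest id = grid010.ct.infn.it:2119/jobmanager-lcgpbs-gilda - Source = WorkloadManager - Timestamp = Fri Jul 31 12:02:29 2009 UTC
--- Event: Accepted - Source = LogMonitor - Timestamp = Fri Jul 31 12:02:32 2009 UTC
**********************************************************************
"""

TIME_FORMAT = "%a %b %d %H:%M:%S %Y UTC"
# A field looks like "Event: RegJob" or "Source = NetworkServer".
FIELD = re.compile(r"(.+?)\s*[:=]\s*(.*)")


def parse_events(text):
    """Split the logging info into one dict per event."""
    events = []
    for chunk in text.split("---"):
        # Collapse line breaks and drop the trailing row of asterisks.
        chunk = " ".join(chunk.split()).rstrip("* ")
        if not chunk.startswith("Event:"):
            continue  # header text before the first event
        event = {}
        for field in chunk.split(" - "):
            match = FIELD.match(field)
            if match:
                event[match.group(1)] = match.group(2)
        event["Timestamp"] = datetime.strptime(event["Timestamp"], TIME_FORMAT)
        events.append(event)
    return events


events = parse_events(SAMPLE_LOGINFO)
elapsed = events[-1]["Timestamp"] - events[0]["Timestamp"]
matched = [e["Dest id"] for e in events if e["Event"] == "Match"]
print("events:", len(events))
print("seconds from first to last event:", int(elapsed.total_seconds()))
print("matched CE:", matched)
```

For the full log above, the first event is stamped 12:02:26 and the last 12:02:32, so the job passed from registration to acceptance by the LogMonitor in 6 seconds.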
List the final outputs
In [28]:j2.peek();
total 32
-rw-rw-r-- 1 UPM04 UPM04 16922 Jul 31 12:56 __jobloginfo__.log
-rw-rw-r-- 1 UPM04 UPM04   158 Jul 31 13:28 stdout.gz
-rw-rw-r-- 1 UPM04 UPM04    29 Jul 31 13:28 stderr.gz
-rw-rw-r-- 1 UPM04 UPM04  2021 Jul 31 13:28 __jobscript__.log
Exit Ganga
In [29]: <<ctrl-D>>
Do you really want to exit ([y]/n)? y
Check out the final output
[UPM04@ui ~]$ gunzip ./gangadir/workspace/UPM04/LocalAMGA/6/output/stdout.gz
[UPM04@ui ~]$ cat gangadir/workspace/UPM04/LocalAMGA/6/output/stdout
Hello Europe !
wnb.euag.org
model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
MemTotal: 514372 kB
Run on Fri Jul 31 13:07:27 UTC 2009
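Note that gunzip replaces stdout.gz with the unpacked file. If you prefer to leave the job's output directory untouched, the compressed file can also be read directly. The following is a minimal Python 3 sketch using the standard gzip module; the helper name read_gzipped_text is hypothetical, and the demonstration writes a temporary stand-in file rather than touching a real job directory.

```python
import gzip
import os
import tempfile


def read_gzipped_text(path):
    """Return the text of a gzip-compressed file without unpacking it on disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.read()


# Demonstration with a temporary stand-in for .../output/stdout.gz
with tempfile.TemporaryDirectory() as workdir:
    sample_path = os.path.join(workdir, "stdout.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
        handle.write("Hello Europe !\nwnb.euag.org\n")
    text = read_gzipped_text(sample_path)
    print(text, end="")
```

For the job above, the path to pass would be gangadir/workspace/UPM04/LocalAMGA/6/output/stdout.gz, relative to the home directory.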
Inputsandbox & Outputsandbox
- Inputsandbox
- You can specify your input files and put them into the inputsandbox
- Outputsandbox
- You can specify which file should be copied from the worker node to your machine
- For example:
- j.outputsandbox = ['*.txt']
- This will copy all the .txt files into “j.outputdir”
- test.c – reads the file “input_for_test”, adds up the numbers in pairs, and appends one line per sum to “output_for_test”
[UPM04@ui ~]$ vi test.c
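#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int count = 0;
    int sum = 0;
    char str[1000];
    FILE *fin, *fout;

    /* Open the input for reading and the output for appending. */
    fin = fopen("input_for_test", "r");
    if (fin == NULL) {
        perror("input_for_test");
        return 1;
    }
    fout = fopen("output_for_test", "a");
    if (fout == NULL) {
        perror("output_for_test");
        fclose(fin);
        return 1;
    }

    /* Read the numbers two at a time and write one sum per output line. */
    while (fscanf(fin, "%999s", str) == 1) {
        sum = atoi(str);
        /* The second number of a pair may be missing at the end of the file. */
        if (fscanf(fin, "%999s", str) == 1) {
            sum += atoi(str);
        }
        fprintf(fout, "Line %2d: %d\n", count, sum);
        count++;
    }

    fclose(fin);
    fclose(fout);
    return 0;
}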
[UPM04@ui ~]$ gcc -o test test.c
[UPM04@ui ~]$ vi input_for_test
2 3
33 44
321 456
1234 5678
[UPM04@ui ~]$ ganga
In [1]:jj = Job()
In [2]:jj.application = Executable()
In [3]:jj.application.exe = File('~/test')
In [4]:jj.application.args = []
In [5]:jj.inputsandbox = ['input_for_test']
In [6]:jj.outputsandbox = ['output_for_test']
In [7]:jj.backend = LCG(middleware='GLITE')
In [8]:print jj
Job (
status = 'new' ,
name = ,
inputdir = '/home/UPM04/gangadir/workspace/UPM04/LocalAMGA/3/input/' ,
outputdir = '/home/UPM04/gangadir/workspace/UPM04/LocalAMGA/3/output/' ,
outputsandbox = ['output_for_test'] ,
id = 3 ,
info = JobInfo (
submit_counter = 0
) ,
inputdata = None ,
merger = None ,
inputsandbox = [File(name='/home/UPM04/input_for_test',subdir='.')] ,
application = Executable (
exe = File(name='/home/UPM04/test',subdir='.') ,
env = {} ,
args = []
) ,
outputdata = None ,
splitter = None ,
subjobs = 'Job slice: jobs(3).subjobs (0 jobs)
' ,
backend = LCG (
status = None ,
actualCE = None ,
iocache = ,
middleware = 'GLITE' ,
CE = ,
perusable = False ,
reason = None ,
exitcode_lcg = None ,
id = None ,
jobtype = 'Normal' ,
exitcode = None ,
requirements = LCGRequirements (
nodenumber = 1 ,
ipconnectivity = False ,
cputime = None ,
other = [] ,
memory = None ,
software = [] ,
walltime = None
)
)
)
In [9]:jj.submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 3
Ganga.GPIDev.Adapters : INFO submitting job 3 to LCG backend
Ganga.GPIDev.Lib.Job : INFO job 3 status changed to "submitted"
Out[9]: 1
In [10]:jobs
Out[10]: Job slice: jobs (4 jobs)
--------------
# fqid  status      name  subjobs  application  backend      backend.actualCE
#    0  completed                  Executable   Local        glite-tutor.ct.infn.it
#    1  completed                  Executable   Interactive  glite-tutor.ct.infn.it
#    2  completed                  Executable   LCG          gn0.hpcc.sztaki.hu:2119/jobmanager-lcgpbs-gil
#    3  completing                 Executable   LCG          grid010.ct.infn.it:2119/jobmanager-lcgpbs-inf
Ganga.GPIDev.Lib.Job : INFO job 3 status changed to "running"
In [11]:jj.peek()
total 40K
-rw-r--r-- 1 UPM04 users  141 Oct 18 17:31 output_for_test
-rw-rw-r-- 1 UPM04 users   27 Oct 18 17:36 stdout.gz
-rw-rw-r-- 1 UPM04 users   27 Oct 18 17:36 stderr.gz
-rw-rw-r-- 1 UPM04 users  223 Oct 18 17:36 _output_sandbox.tgz
-rw-rw-r-- 1 UPM04 users 1.7K Oct 18 17:36 __jobscript__.log
In [12]:cat $jj.outputdir/output_for_test
Line  0: 5
Line  1: 77
Line  2: 777
Line  3: 6912
In [13]: Do you really want to exit ([y]/n)? y
[UPM04@ui ~]$
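To cross-check the content of output_for_test without going through the grid, the pairwise sums can be reproduced locally. The following is a minimal Python 3 sketch of the same calculation that test.c performs. The helper names pair_sums and format_lines are hypothetical. Unlike atoi, int() raises an error on non-numeric input instead of returning 0.

```python
def pair_sums(text):
    """Sum whitespace-separated integers two at a time, as test.c does."""
    numbers = [int(token) for token in text.split()]
    # A trailing unpaired number forms a sum on its own, matching the C loop.
    return [sum(numbers[i:i + 2]) for i in range(0, len(numbers), 2)]


def format_lines(sums):
    """Format each sum with the same "Line %2d: %d" layout used by test.c."""
    return ["Line %2d: %d" % (index, value) for index, value in enumerate(sums)]


# Same numbers as in input_for_test above.
sample_input = "2 3 33 44 321 456 1234 5678"
for line in format_lines(pair_sums(sample_input)):
    print(line)
```

The sums are 5, 77, 777 and 6912, which agrees with the content of output_for_test retrieved through the output sandbox.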


