Advanced Jobs

From EUAGwiki

About this page

This page is intended as a practical exercise on Advanced Jobs in gLite. The following topics are covered:

  1. Sandboxes
  2. Parametric Jobs
  3. Collection Jobs
  4. DAG Jobs


Sandboxes

Using data from SE

copyDataFromSE.sh

inputFromSE.jdl


Storing output on SE

CopyAndRegisterOutput.sh

simpleExeFile.sh

storeOutputOnSE.jdl

Parametric Jobs

String Arguments

For this exercise you need to write a simple bash (Bourne Again SHell) script.

The bash script will use the "case" syntax:

case "$variable" in
'value')
command
;;
*) #THIS IS THE DEFAULT CASE
echo "Not a valid option"
esac


The script, called echo_node_properties.sh, will take one string argument, which will be the type of information we want to retrieve from the Worker Node (WN). For this exercise, we want to know:

  • The hostname (/bin/hostname)
  • The hardware (/sbin/lspci)
  • The overall running processes (/bin/ps aux)

So, as an example, these three pieces of information can be labelled: hostname, hardware, processes.

Our bash script will have the structure:

#!/bin/bash

#ARGUMENT CHECK: $# is the number of arguments (the script name, $0, is not counted); $1, $2, ... are the arguments
if [ $# -ne 1 ] ; then
  echo "Usage: ${0} [hostname, hardware, processes]"
  exit 1
fi

#CHOICE OF COMMAND
case "$1" in
'hostname')
/bin/hostname
;;
<PUT YOUR OTHER COMMANDS> 
*) #DEFAULT CASE
echo "must be: <set your own variable list>"
esac

If you encounter problems writing such a script, have a look at the following example (but please try it yourself first!): echo_node_properties.sh


Integer Arguments

A parametric job can also take integer parameters. To try this feature, we will write a C program.

Monte Carlo integration is a simple technique in which random points (x,y) are generated inside a rectangle containing the function, and each point is compared with the function to be integrated: it counts as a hit if it lies below the curve.

The value of the integral is then:

  • A*[Number of points below the function]/[Number of total points]

where A is the area of the rectangle in which the points are generated.
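Written out, with the rectangle [x_min, x_max] × [y_min, y_max] playing the role of the LOWER/UPPER limits used in the program, N points generated and k of them falling below the curve, the estimate and its statistical error are given below. The error formula assumes y_min = 0 and 0 ≤ f(x) ≤ y_max over the interval, so that each point is an independent hit with probability p = I/A (a binomial count):

```latex
A = (x_{\max}-x_{\min})\,(y_{\max}-y_{\min}), \qquad
\hat{I} = A\,\frac{k}{N}, \qquad
\sigma_{\hat{I}} = A\,\sqrt{\frac{p\,(1-p)}{N}}, \quad p = \frac{I}{A}
```

For the example function f(x) = x on the unit square, A = 1 and I = 1/2, so p = 1/2 and the error is 0.5/√N: about 0.16 for N = 10 and 0.05 for N = 100. The error only shrinks as 1/√N.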

The C program should therefore take one integer argument, the number of points to generate. The program structure will be something like:


The declaration of the function to be integrated:

double FUNC(double x){
  double result = x;   /* f(x) = x */
  return result;
}


In our main, we take the first argument as the number of integration points and define the X,Y upper/lower limits

 int n = atoi(argv[1]);
 double LOWER_X_LIMIT = 0;
 double UPPER_X_LIMIT = 1;
 double LOWER_Y_LIMIT = 0;
 double UPPER_Y_LIMIT = 1;


The real integration cycle, where the (x,y) random pair is created and compared to f(x):

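 for(i=0;i<n;i++){
    /* uniform random point in [LOWER_X_LIMIT,UPPER_X_LIMIT] x [LOWER_Y_LIMIT,UPPER_Y_LIMIT] */
    try_x = LOWER_X_LIMIT + (UPPER_X_LIMIT - LOWER_X_LIMIT) * ((double)rand())/((double)RAND_MAX);
    try_y = LOWER_Y_LIMIT + (UPPER_Y_LIMIT - LOWER_Y_LIMIT) * ((double)rand())/((double)RAND_MAX);
    /* count the point as a hit if it lies below the curve */
    if( try_y < FUNC( try_x )) sum += 1;
 }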


The computation of the integral:

 sum /= (double) n;
 sum *= (UPPER_X_LIMIT-LOWER_X_LIMIT)*(UPPER_Y_LIMIT - LOWER_Y_LIMIT);
 printf("%.2f\n",sum);


The program can be compiled on the UI as:

[leonardo@ui ~]$ gcc -o integral1 integral1.c

and the resulting executable can be run as ./integral1 100, where 100 is the number of points.

After testing the program locally:

  • Create a parametric JDL with integer parameters (please keep the number of points below 100, this is only a test!)
  • Use, say, 3 values for the parameter
  • Submit it and check that it works

If you have problems with the executable, have a look at integral1.c or download the executable:

# wget http://moby.mib.infn.it/~sala/EAGSS09/EXE/integral1

Collection Jobs

A collection job is a special kind of job that groups separate jobs and treats them as a single unit. To explore what it can do, we'll take the parametric string example and modify it. Create a JDL for a collection job whose nodes run the following executables:

  • /bin/hostname
  • /sbin/lspci
  • /bin/ps (with "aux" as argument)

The output should be very similar to that of the aforementioned exercise.


DAG Jobs

The MC integration program explained above depends on the sequence of generated random numbers. This sequence is generated starting from a random seed, the number given as input to the random number generator. Changing the random seed changes the random sequence and consequently the result of the numerical integration; the results for different random seeds converge to the same value as N grows large.

To estimate the uncertainty due to the random seed, the seed can be varied, for example by taking the current Unix date in seconds. It is enough to add a few lines to the integral1.c program:

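char line[80];
FILE *fr;
long seed = 1;   /* default seed used by rand() if srand() is never called */
if( argc==3 ){
  fr = fopen(argv[2], "r");  /* open the seed file for reading */
  if( fr == NULL ){
    perror(argv[2]);
    return 1;
  }
  /* read one line (up to 80 chars) at a time until EOF; the last line holding a number sets the seed */
  while( fgets(line, sizeof line, fr) != NULL ){
    sscanf(line, "%ld", &seed);
  }
  fclose(fr);
  srand((unsigned int) seed);
}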


so that the executable accepts a second argument: the name of a file containing the seed. Compile the new program (integral2) and try it locally; if you are not able to modify the program, refer to integral2.c and compile it:

# gcc -o integral2 integral2.c

or download the executable:

# wget http://moby.mib.infn.it/~sala/EAGSS09/EXE/integral2

The DAG job will consist of 3 layers:

  1. "Father layer": a single node which outputs the random seed (the date)
  2. "Sons layer": two nodes which take the output of the father (the random seed) and use it to compute the integral of the same function with a different number of points N (e.g. 10 and 100)
  3. "Final layer": a script which merges the sons' outputs

So the Sons need the file in the Father's OutputSandbox as input, while the Final layer needs the files in both the Father's and the Sons' OutputSandboxes.


The first layer can be done simply using the /bin/date command:

# date +%s

which prints the number of seconds elapsed since 1970-01-01 00:00:00 UTC (the Unix epoch).

The final layer can be a bash script (see integral_cfr.sh) which simply echoes the output files retrieved from the sons.



JDL solutions
