Use Parrot for attaching existing programs to remote I/O systems

From EUAGwiki

Parrot

Lectured by

Giuseppe LA ROCCA
Italian National Institute of Nuclear Physics
Italy
mailto:giuseppe.larocca@ct.infn.it

About this page

The goal of this wiki page is to provide some hints and examples on how to use Parrot, a tool that lets ordinary applications access files stored on remote storage systems. It does not require any special privileges, and it can be applied to any program without rewriting, relinking, or reinstalling it. Among the features Parrot offers, users can start a shell under Parrot and, as unprivileged users, mount any number of remote file systems into its local namespace.

It supports various remote protocols, such as HTTP, FTP, GridFTP, iRODS, SRB, HDFS, RFIO, DCAP and Chirp; which of them are actually available depends on how Parrot was built. It also supports a number of authentication methods, including GSI, Kerberos, hostname and IP address.

Parrot is distributed as part of the Cooperative Computing Tools (cctools). The Cooperative Computing Tools are a collection of programs designed to assist users with the difficulties of building and managing complex, fault-prone distributed systems.

The components of the cctools are:

  • Parrot - A personal virtual file system;
  • Chirp - A distributed file and storage system;
  • Makeflow - A workflow engine similar to Make;
  • Work Queue - A flexible master-worker library;
  • Watchdog - A reliable process manager;
  • ftsh - A fault tolerant shell language;
  • FTP-Lite - A light weight FTP client library.


Parrot is Copyright (C) 2003-2004 Douglas Thain and Copyright (C) 2005- The University of Notre Dame. All rights reserved. This software is distributed under the GNU General Public License. See the file COPYING for details.

Download the Cooperative Computing Tools

Download the cctools package for your system architecture into your working directory.

A complete list of the stable versions of these tools is available here [1].

Uncompress the tarball in your working directory (e.g. /opt/) with the command:

$ tar zxvf cctools-current-i686-linux-2.6.tar.gz 
cctools-current-i686-linux-2.6/
cctools-current-i686-linux-2.6/include/
cctools-current-i686-linux-2.6/include/list.h
cctools-current-i686-linux-2.6/include/sha1.h
cctools-current-i686-linux-2.6/include/catalog_query.h
cctools-current-i686-linux-2.6/include/work_queue.h
cctools-current-i686-linux-2.6/include/hash_cache.h
cctools-current-i686-linux-2.6/include/itable.h
cctools-current-i686-linux-2.6/include/file_cache.h
cctools-current-i686-linux-2.6/include/batch_job.h
cctools-current-i686-linux-2.6/include/console_login.h
[cut ..]

After uncompressing, your directory should contain:

$ cd cctools-current-i686-linux-2.6
$ ls -al
drwxr-xr-x  2 larocca users  8192 Oct 10  2008 bin
-rwxr-xr-x  1 larocca users   788 Jan 16 10:56 doc
drwxr-xr-x  2 larocca users  8192 Oct 12 11:36 etc
drwxr-xr-x  2 larocca users  8192 Oct 12 11:44 include
-rw-r--r--  1 larocca users  8922 Jan 15 17:14 lib
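To avoid typing the full path to the cctools binaries every time, you can put their bin directory on your PATH. A minimal bash sketch, assuming the /opt install location used on this page:

```shell
#!/bin/bash
# Assumption: cctools was unpacked under /opt as in the example above;
# change CCTOOLS_HOME if you chose a different directory.
CCTOOLS_HOME=/opt/cctools-current-i686-linux-2.6
export PATH="$CCTOOLS_HOME/bin:$PATH"

# Warn (but do not fail) if parrot still cannot be found.
if ! command -v parrot >/dev/null 2>&1; then
    echo "warning: parrot not found in $CCTOOLS_HOME/bin" >&2
fi
```

These lines can be added to your ~/.bashrc, or sourced before a session, so that later commands can call parrot without its full path.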

If cctools has been installed successfully, you can run Parrot and check which features are enabled:

$ /opt/cctools-current-i686-linux-2.6/bin/parrot -h

Use: /opt/cctools-current-i686-linux-2.6/bin/parrot [options] <command> ...
Where options and environment variables are:
  -a <list>  Use these Chirp authentication methods.   (PARROT_CHIRP_AUTH)
  -A <file>  Use this file as a default ACL.          (PARROT_DEFAULT_ACL)
  -b <bytes> Set the I/O block size hint.              (PARROT_BLOCK_SIZE)
  -d <name>  Enable debugging for this sub-system.    (PARROT_DEBUG_FLAGS)
  -D         Disable small file optimizations.
  -F         Enable file snapshot caching for all protocols.
  -f         Disable following symlinks.
  -E <url>   Endpoint for gLite combined catalog ifc. (PARROT_GLITE_CCURL)
  -G <num>   Fake this gid; Real gid stays the same.          (PARROT_GID)
  -H         Disable use of helper library.
  -h         Show this screen.
  -K         Checksum files where available.
  -k         Do not checksum files.
  -l <path>  Path to ld.so to use.                      (PARROT_LDSO_PATH)
  -m <file>  Use this file as a mountlist.             (PARROT_MOUNT_FILE)
  -M/foo=/bar Mount (redirect) /foo to /bar.         (PARROT_MOUNT_STRING)
  -N <name>  Pretend that this is my hostname.          (PARROT_HOST_NAME)
  -o <file>  Send debugging messages to this file.     (PARROT_DEBUG_FILE)
  -O <bytes> Rotate debug files of this size.     (PARROT_DEBUG_FILE_SIZE)
  -p <hst:p> Use this proxy server for HTTP requests.         (HTTP_PROXY)
  -Q         Inhibit catalog queries to list /chirp.
  -R <cksum> Enforce this root filesystem checksum, where available.
  -s         Use streaming protocols without caching.(PARROT_FORCE_STREAM)
  -S         Enable whole session caching for all protocols.
  -t <dir>   Where to store temporary files.             (PARROT_TEMP_DIR)
  -T <time>  Maximum amount of time to retry failures.    (PARROT_TIMEOUT)
  -U <num>   Fake this unix uid; Real uid stays the same.     (PARROT_UID)
  -u <name>  Use this extended username.                 (PARROT_USERNAME)
  -v         Display version number.
  -w         Initial working directory.
  -W         Display table of system calls trapped.
  -Y         Force sYnchronous disk writes.            (PARROT_FORCE_SYNC)
  -Z         Enable automatic decompression on .gz files.

Known debugging sub-systems are: 
syscall notice channel process resolve libcall tcp dns auth local http ftp nest chirp 
landlord multi dcap rfio glite lfc gfal grow pstree alloc cache poll hdfs bxgrid remote 
debug login irods wq all time pid 

Enabled filesystems are: http grow ftp anonftp gsiftp chirp irods hdfs bxgrid
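The BLAST example below reads its databases over GridFTP, so gsiftp must appear in the "Enabled filesystems" line. A minimal bash sketch that performs this check from a script; it only parses the help text in the format shown above, and the function name parrot_fs_enabled is our own:

```shell
#!/bin/bash
# parrot_fs_enabled NAME: read the output of "parrot -h" on stdin and succeed
# only if NAME appears in its "Enabled filesystems are:" line.
parrot_fs_enabled() {
    local wanted=$1 line fs
    line=$(grep 'Enabled filesystems are:') || return 1
    line=${line#*Enabled filesystems are:}
    for fs in $line; do          # word-split the space-separated list
        [ "$fs" = "$wanted" ] && return 0
    done
    return 1
}
```

For example, `parrot -h 2>&1 | parrot_fs_enabled gsiftp || echo "gsiftp support missing" >&2`. The 2>&1 is there because we do not assume which stream the help text is written to. Note that names such as dcap or rfio may appear among the debugging sub-systems without being enabled filesystems; the function only looks at the "Enabled filesystems" line.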

A real use case: BLAST & Parrot

Let's suppose we have stored the NCBI databases on the Grid and we want to run BLAST for a given query sequence. Before starting, we need to create a VOMS proxy certificate with the command:

$ voms-proxy-init --voms euasia
Cannot find file or dir: /home/larocca/.glite/vomses
Enter GRID pass phrase:
Your identity: /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca
Creating temporary proxy ......................................................... Done
Contacting  voms.grid.sinica.edu.tw:15015 [/C=TW/O=AS/OU=GRID/CN=voms.grid.sinica.edu.tw] "euasia" Done
Creating proxy ................................................... Done
Your proxy is valid until Mon Jan 18 21:46:41 2010
$ voms-proxy-info --all
subject   : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca/CN=proxy
issuer    : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca
identity  : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca
type      : proxy
strength  : 1024 bits
path      : /tmp/x509up_u560
timeleft  : 11:59:57
=== VO euasia extension information ===
VO        : euasia
subject   : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca
issuer    : /C=TW/O=AS/OU=GRID/CN=voms.grid.sinica.edu.tw
attribute : /euasia/Role=NULL/Capability=NULL
timeleft  : 11:59:57
uri       : voms.grid.sinica.edu.tw:15015
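Parrot authenticates to GridFTP with this proxy, so it should stay valid for the whole BLAST run. A minimal bash sketch that turns the first "timeleft" field of the voms-proxy-info --all output (in the HH:MM:SS layout shown above) into seconds. The function name proxy_seconds_left is hypothetical:

```shell
#!/bin/bash
# proxy_seconds_left: read "voms-proxy-info --all" output on stdin and print
# the remaining lifetime of the proxy (its first "timeleft : HH:MM:SS" line)
# in seconds.
proxy_seconds_left() {
    local t h m s
    t=$(awk -F' : ' '/^timeleft/ { gsub(/ /, "", $2); print $2; exit }')
    [ -n "$t" ] || return 1
    IFS=: read -r h m s <<< "$t"
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}
```

For example, `left=$(voms-proxy-info --all | proxy_seconds_left)` followed by `[ "${left:-0}" -ge 7200 ] || voms-proxy-init --voms euasia` renews the proxy when less than two hours remain. The two-hour threshold is arbitrary.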

Assume the NCBI databases have been successfully registered in the LFC file catalog (e.g. under /grid/euasia/blast_dbs):

$ lfc-ls /grid/euasia/blast_dbs
Homo_sapiens.NCBI36.apr.dna.chromosome.nhr
Homo_sapiens.NCBI36.apr.dna.chromosome.nin
Homo_sapiens.NCBI36.apr.dna.chromosome.nsd
Homo_sapiens.NCBI36.apr.dna.chromosome.nsi
Homo_sapiens.NCBI36.apr.dna.chromosome.nsq
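Before building the mount rules, it can be useful to check that every component file of the database is registered. A minimal bash sketch, assuming the five extensions listed above are the ones you want to mount. BLAST may not need all of them for every search. The names DB_EXTENSIONS and missing_db_files are our own:

```shell
#!/bin/bash
# Extensions of the database files listed above. Assumption: these five are
# the ones to be mounted; adjust the list for other databases.
DB_EXTENSIONS="nhr nin nsd nsi nsq"

# missing_db_files BASENAME: read a file listing (e.g. from lfc-ls) on stdin
# and print every BASENAME.<ext> that does not appear in it.
missing_db_files() {
    local base=$1 listing ext
    listing=$(cat)
    for ext in $DB_EXTENSIONS; do
        if ! grep -qxF -- "$base.$ext" <<< "$listing"; then
            echo "$base.$ext"
        fi
    done
}
```

For example, `lfc-ls /grid/euasia/blast_dbs | missing_db_files Homo_sapiens.NCBI36.apr.dna.chromosome` prints nothing when all five files are registered.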

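Each file is referenced by a storage URL (SURL), which lcg-lr returns; lcg-gt then translates the SURL into a gsiftp transport URL (TURL) for the physical replica: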

$ lcg-lr --vo euasia lfn:/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nhr
srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filefe4defc1-eeca-4310-b4b0-942f19466f5c

$ lcg-gt srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filefe4defc1-eeca-4310-b4b0-942f19466f5c gsiftp
gsiftp://se01.knowledgegrid.net.my/se01.knowledgegrid.net.my:/path1/euasia/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1.231543.0
bfecb259-2bcb-4c85-b16a-a328ce1cc66f


$ lcg-lr --vo euasia lfn:/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nin
srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1

$ lcg-gt srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1 gsiftp
gsiftp://se01.knowledgegrid.net.my/se01.knowledgegrid.net.my:/path1/euasia/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1.231543.0
78a3d4d7-d6fc-4c00-8194-9c6ac3dbe419


$ lcg-lr --vo euasia lfn:/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsd
srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filea3f2d4c9-0719-43af-9933-f402cd9acce3

$ lcg-gt srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filea3f2d4c9-0719-43af-9933-f402cd9acce3 gsiftp
gsiftp://se01.knowledgegrid.net.my/se01.knowledgegrid.net.my:/storage/euasia/2010-01-19/filea3f2d4c9-0719-43af-9933-f402cd9acce3.231544.0
4ffe7397-a848-4524-bcf4-25c8f48ee1f9


$ lcg-lr --vo euasia lfn:/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsi
srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filebb309693-73c4-4d47-a963-d390eab8fe4e

$ lcg-gt srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filebb309693-73c4-4d47-a963-d390eab8fe4e gsiftp
gsiftp://se01.knowledgegrid.net.my/se01.knowledgegrid.net.my:/glite-se/euasia/2010-01-19/filebb309693-73c4-4d47-a963-d390eab8fe4e.231545.0
ac257877-d026-40d9-8f3b-cfefc7cacb63


$ lcg-lr --vo euasia lfn:/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsq
srm://rhino.lsr.nectec.or.th/dpm/lsr.nectec.or.th/home/euasia/generated/2010-01-20/file48a87069-7ac4-449f-9361-7ed703d00973

$ lcg-gt srm://rhino.lsr.nectec.or.th/dpm/lsr.nectec.or.th/home/euasia/generated/2010-01-20/file48a87069-7ac4-449f-9361-7ed703d00973 gsiftp
gsiftp://rhino.lsr.nectec.or.th/rhino.lsr.nectec.or.th:/dpmdisk/euasia/2010-01-20/file48a87069-7ac4-449f-9361-7ed703d00973.208252.0
e8a4941f-c6ea-40e3-919a-0fbd5ee7327a

To make it easier for users to access the NCBI databases, Parrot can create a custom namespace for any program: all file name activity passes through the Parrot name resolver, which transforms each file name according to a set of rules that you specify.

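In our use case, we define the paths to be mounted in a mountlist file, where each entry maps a logical path (the LFN) to the location of its replica in Parrot's /gsiftp namespace: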

$ cat mountinglist_Homo_sapiens 
/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nhr /gsiftp/se01.knowledgegrid.net.my:/path1/euasia/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1.231543.0

/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nin /gsiftp/se01.knowledgegrid.net.my:/path1/euasia/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1.231543.0

/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsd /gsiftp/se01.knowledgegrid.net.my:/storage/euasia/2010-01-19/filea3f2d4c9-0719-43af-9933-f402cd9acce3.231544.0

/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsi /gsiftp/se01.knowledgegrid.net.my:/glite-se/euasia/2010-01-19/filebb309693-73c4-4d47-a963-d390eab8fe4e.231545.0

/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsq /gsiftp/rhino.lsr.nectec.or.th:/dpmdisk/euasia/2010-01-20/file48a87069-7ac4-449f-9361-7ed703d00973.208252.0
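Copying TURLs into the mountlist by hand is error-prone. A minimal bash sketch that builds the mountlist directly from the catalog, assuming lcg-lr and lcg-gt print the SURL and the TURL on their first output line, as in the transcripts above. It uses only the first replica of each file. The function names are hypothetical, and the TURL-to-path conversion simply reproduces the mapping used in the mountlist above:

```shell
#!/bin/bash
# Sketch: build a Parrot mountlist from the LFC catalog.
LFC_DIR=/grid/euasia/blast_dbs
VO=euasia

# gsiftp://HOST/HOST:/path/file -> /gsiftp/HOST:/path/file, i.e. the form used
# in the mountlist above (everything after the first host component).
turl_to_parrot_path() {
    local turl=$1 proto rest
    proto=${turl%%://*}
    rest=${turl#*://}    # HOST/HOST:/path/file
    rest=${rest#*/}      # HOST:/path/file
    printf '/%s/%s\n' "$proto" "$rest"
}

# Resolve one LFN to the gsiftp TURL of its first replica.
lfn_to_turl() {
    local surl
    surl=$(lcg-lr --vo "$VO" "lfn:$1" | head -n 1)
    [ -n "$surl" ] || return 1
    lcg-gt "$surl" gsiftp | head -n 1
}

# Print one "LOGICAL PHYSICAL" mountlist line per file name given.
generate_mountlist() {
    local name turl
    for name in "$@"; do
        turl=$(lfn_to_turl "$LFC_DIR/$name") || return 1
        [ -n "$turl" ] || return 1
        printf '%s %s\n' "$LFC_DIR/$name" "$(turl_to_parrot_path "$turl")"
    done
}
```

For example, `generate_mountlist $(lfc-ls "$LFC_DIR") > mountinglist_Homo_sapiens` writes one line per registered file. The unquoted substitution relies on the file names containing no spaces.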

At this point, we can instruct Parrot to mount these paths and run BLAST, without first downloading the databases into the current working directory:

$ cd /opt/cctools-current-i686-linux-2.6/bin/
$ ./parrot -m mountinglist_Homo_sapiens \
	 blast-2.2.14/bin/blastall \
	 -p blastn \
	 -d /grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome \
	 -i query.input \
	 -o outfile.blastn
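According to the help text above, the mountlist can also be given through the PARROT_MOUNT_FILE environment variable, and single rules can be passed inline as -M/foo=/bar. A minimal bash sketch that turns a mountlist into -M options. It assumes that -M may be repeated once per rule, which you should check against your Parrot version. The function name mountlist_to_opts is our own:

```shell
#!/bin/bash
# mountlist_to_opts: read a mountlist ("LOGICAL PHYSICAL" per line, blank
# lines ignored) on stdin and print one -M/LOGICAL=/PHYSICAL option per line.
mountlist_to_opts() {
    local logical physical rest
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while read -r logical physical rest || [ -n "$logical" ]; do
        [ -n "$logical" ] || continue
        printf '%s\n' "-M$logical=$physical"
    done
}
```

From the cctools bin directory, `mapfile -t opts < <(mountlist_to_opts < mountinglist_Homo_sapiens)` followed by `./parrot "${opts[@]}" blast-2.2.14/bin/blastall ...` would be the inline equivalent. `PARROT_MOUNT_FILE=mountinglist_Homo_sapiens ./parrot blast-2.2.14/bin/blastall ...` keeps the rules in the file.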

The outfile.blastn file created by BLAST looks like:

BLASTN 2.2.14 [May-07-2006]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.
Database: Homo_sapiens.NCBI36.apr.dna.chromosome 
           29 sequences; 3,652,008,333 total letters

Searching

Query= (31 letters)
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

10 dna:chromosome chromosome:NCBI36:10:1:135374737:1                   62   1e-08
2 dna:chromosome chromosome:NCBI36:2:1:242751149:1                     40   0.039
1 dna:chromosome chromosome:NCBI36:1:1:247199719:1                     40   0.039
12 dna:chromosome chromosome:NCBI36:12:1:132289534:1                   40   0.039
13 dna:chromosome chromosome:NCBI36:13:1:114127980:1                   38   0.15 
6 dna:chromosome chromosome:NCBI36:6:1:170896992:1                     36   0.61 
3 dna:chromosome chromosome:NCBI36:3:1:199446827:1                     36   0.61 
16 dna:chromosome chromosome:NCBI36:16:1:88822254:1                    36   0.61 
14 dna:chromosome chromosome:NCBI36:14:1:106360585:1                   36   0.61 
X dna:chromosome chromosome:NCBI36:X:1:154913754:1                     34   2.4  
9 dna:chromosome chromosome:NCBI36:9:1:140273252:1                     34   2.4  
8 dna:chromosome chromosome:NCBI36:8:1:146274826:1                     34   2.4 
7 dna:chromosome chromosome:NCBI36:7:1:158821424:1                     34   2.4  
4 dna:chromosome chromosome:NCBI36:4:1:191263063:1                     34   2.4  
22 dna:chromosome chromosome:NCBI36:22:1:49591432:1                    34   2.4  
20 dna:chromosome chromosome:NCBI36:20:1:62435964:1                    34   2.4  
19 dna:chromosome chromosome:NCBI36:19:1:63806651:1                    34   2.4  
15 dna:chromosome chromosome:NCBI36:15:1:100338915:1                   34   2.4  
Y dna:chromosome chromosome:NCBI36:Y:2709521:57443437:1                32   9.6  
5 dna:chromosome chromosome:NCBI36:5:1:180837866:1                     32   9.6  
18 dna:chromosome chromosome:NCBI36:18:1:76117153:1                    32   9.6  
17 dna:chromosome chromosome:NCBI36:17:1:78654742:1                    32   9.6  
11 dna:chromosome chromosome:NCBI36:11:1:134452384:1                   32   9.6  

>10 dna:chromosome chromosome:NCBI36:10:1:135374737:1
          Length = 135374737

 Score = 61.9 bits (31), Expect = 1e-08
 Identities = 31/31 (100%)
 Strand = Plus / Plus

Query: 1      ctaaaatgaaaggttttgggttttggccagc 31
              |||||||||||||||||||||||||||||||
Sbjct: 277986 ctaaaatgaaaggttttgggttttggccagc 278016

 Score = 38.2 bits (19), Expect = 0.15
 Identities = 19/19 (100%)
 Strand = Plus / Plus
                                  
Query: 11      aggttttgggttttggcca 29
               |||||||||||||||||||
Sbjct: 8656761 aggttttgggttttggcca 8656779

 Score = 36.2 bits (18), Expect = 0.61
 Identities = 21/22 (95%)
 Strand = Plus / Plus

Query: 4        aaatgaaaggttttgggttttg 25
                |||||||| |||||||||||||
Sbjct: 54303149 aaatgaaatgttttgggttttg 54303170

 Score = 34.2 bits (17), Expect = 2.4
 Identities = 20/21 (95%)
 Strand = Plus / Plus
                                     
Query: 10       aaggttttgggttttggccag 30
                ||||||||||| |||||||||
Sbjct: 17139118 aaggttttgggctttggccag 17139138

 Score = 34.2 bits (17), Expect = 2.4
 Identities = 20/21 (95%)
 Strand = Plus / Plus

[..cut..]

Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Sequences: 29
Number of Hits to DB: 1,479,010
Number of extensions: 84036
Number of successful extensions: 146
Number of sequences better than 10.0: 25
Number of HSP's gapped: 146
Number of HSP's successfully gapped: 146
Length of query: 31
Length of database: 3,652,008,333
Length adjustment: 18
Effective length of query: 13
Effective length of database: 3,652,007,811
Effective search space: 47476101543
Effective search space used: 47476101543
X1: 11 (21.8 bits)
X2: 15 (29.7 bits)
X3: 25 (49.6 bits)
S1: 16 (32.2 bits)
S2: 16 (32.2 bits)

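To use the results in a script, the summary table at the top of the report can be filtered by E-value. A minimal bash/awk sketch, assuming the BLASTN 2.2.14 text layout shown above (sequence id in the first column, E-value in the last). The function name blast_hits and its default threshold are our own choices:

```shell
#!/bin/bash
# blast_hits [MAX_EVALUE]: read a blastall text report on stdin and print
# "<sequence id> <E-value>" for each row of the "Sequences producing
# significant alignments" table whose E-value is <= MAX_EVALUE (default 0.001).
blast_hits() {
    local max=${1:-0.001}
    awk -v max="$max" '
        /^Sequences producing significant alignments:/ { in_table = 1; next }
        in_table && /^>/ { exit }                 # alignments start here
        in_table && NF >= 3 && ($NF + 0) <= (max + 0) { print $1, $NF }
    '
}
```

For the report above, `blast_hits < outfile.blastn` would keep only the chromosome 10 hit (E-value 1e-08).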
References

http://www.cse.nd.edu/~ccl/software/parrot/
http://www.cse.nd.edu/~ccl/software/manuals/parrot.html

Book Chapters & Journal Articles

http://www.cse.nd.edu/~dthain/papers/
