Use Parrot for attaching existing programs to remote I/O systems
From EUAGwiki
Parrot
Lectured by
Giuseppe LA ROCCA
Italian National Institute of Nuclear Physics
Italy
mailto:giuseppe.larocca@ct.infn.it
About this page
The goal of this wiki page is to provide some hints and examples on how to use Parrot, a tool that lets ordinary applications access files stored on remote storage systems. It does not require any special privileges, and it can be applied to any program without rewriting, relinking, or reinstalling it. Among the features Parrot introduces, users can start a shell under Parrot and use it to mount any number of remote file systems locally as an unprivileged user.
It supports various remote protocols such as: HTTP, FTP, GridFTP, iRODS, SRB, HDFS, RFIO, DCAP and Chirp. It also supports a number of authentication methods, including GSI, Kerberos, hostname and IP-address.
Parrot is distributed as part of the Cooperative Computing Tools (cctools). The Cooperative Computing Tools are a collection of programs designed to assist users with the difficulties of building and managing complex, fault-prone distributed systems.
The components of the cctools are:
- Parrot - A personal virtual file system;
- Chirp - A distributed file and storage system;
- Makeflow - A workflow engine similar to Make;
- Work Queue - A flexible master-worker library;
- Watchdog - A reliable process manager;
- ftsh - A fault tolerant shell language;
- FTP-Lite - A light weight FTP client library.
Parrot is Copyright (C) 2003-2004 Douglas Thain and Copyright (C) 2005-
The University of Notre Dame. All rights reserved.
This software is distributed under the GNU General Public License. See the file COPYING for details.
Download the Cooperative Computing Tools
Download the cctools package that matches your system architecture into your working directory.
A complete list of the stable versions of these tools is available here [1]
Uncompress the tarball in your working directory (e.g. /opt/) using the command:
$ tar zxvf cctools-current-i686-linux-2.6.tar.gz
cctools-current-i686-linux-2.6/
cctools-current-i686-linux-2.6/include/
cctools-current-i686-linux-2.6/include/list.h
cctools-current-i686-linux-2.6/include/sha1.h
cctools-current-i686-linux-2.6/include/catalog_query.h
cctools-current-i686-linux-2.6/include/work_queue.h
cctools-current-i686-linux-2.6/include/hash_cache.h
cctools-current-i686-linux-2.6/include/itable.h
cctools-current-i686-linux-2.6/include/file_cache.h
cctools-current-i686-linux-2.6/include/batch_job.h
cctools-current-i686-linux-2.6/include/console_login.h
[cut ..]
After uncompressing, the directory should contain:
$ cd cctools-current-i686-linux-2.6
$ ls -al
drwxr-xr-x 2 larocca users 8192 Oct 10 2008 bin
-rwxr-xr-x 1 larocca users 788 Jan 16 10:56 doc
drwxr-xr-x 2 larocca users 8192 Oct 12 11:36 etc
drwxr-xr-x 2 larocca users 8192 Oct 12 11:44 include
-rw-r--r-- 1 larocca users 8922 Jan 15 17:14 lib
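The commands on this page call parrot either by its full path or from inside the bin directory. As a convenience, a minimal sketch like the one below puts the cctools bin directory on PATH. The install location is the /opt example used above, and the helper name add_to_path is purely illustrative (it is not part of cctools).

```shell
# Assumed install location (the /opt example from this page); adjust as needed.
CCTOOLS_HOME=/opt/cctools-current-i686-linux-2.6

# add_to_path DIR: prepend DIR to PATH unless it is already listed.
# (Illustrative helper, not part of cctools.)
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

add_to_path "$CCTOOLS_HOME/bin"
echo "$PATH"
```

Placing these lines in a shell start-up file would make the setting persistent; whether that is desirable depends on the local environment.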
If cctools has been installed successfully, you can run Parrot and check which protocols are enabled:
$ /opt/cctools-current-i686-linux-2.6/bin/parrot -h
Use: /opt/cctools-current-i686-linux-2.6/bin/parrot [options] <command> ...
Where options and environment variables are:
  -a <list>    Use these Chirp authentication methods. (PARROT_CHIRP_AUTH)
  -A <file>    Use this file as a default ACL. (PARROT_DEFAULT_ACL)
  -b <bytes>   Set the I/O block size hint. (PARROT_BLOCK_SIZE)
  -d <name>    Enable debugging for this sub-system. (PARROT_DEBUG_FLAGS)
  -D           Disable small file optimizations.
  -F           Enable file snapshot caching for all protocols.
  -f           Disable following symlinks.
  -E <url>     Endpoint for gLite combined catalog ifc. (PARROT_GLITE_CCURL)
  -G <num>     Fake this gid; Real gid stays the same. (PARROT_GID)
  -H           Disable use of helper library.
  -h           Show this screen.
  -K           Checksum files where available.
  -k           Do not checksum files.
  -l <path>    Path to ld.so to use. (PARROT_LDSO_PATH)
  -m <file>    Use this file as a mountlist. (PARROT_MOUNT_FILE)
  -M/foo=/bar  Mount (redirect) /foo to /bar. (PARROT_MOUNT_STRING)
  -N <name>    Pretend that this is my hostname. (PARROT_HOST_NAME)
  -o <file>    Send debugging messages to this file. (PARROT_DEBUG_FILE)
  -O <bytes>   Rotate debug files of this size. (PARROT_DEBUG_FILE_SIZE)
  -p <hst:p>   Use this proxy server for HTTP requests. (HTTP_PROXY)
  -Q           Inhibit catalog queries to list /chirp.
  -R <cksum>   Enforce this root filesystem checksum, where available.
  -s           Use streaming protocols without caching.(PARROT_FORCE_STREAM)
  -S           Enable whole session caching for all protocols.
  -t <dir>     Where to store temporary files. (PARROT_TEMP_DIR)
  -T <time>    Maximum amount of time to retry failures. (PARROT_TIMEOUT)
  -U <num>     Fake this unix uid; Real uid stays the same. (PARROT_UID)
  -u <name>    Use this extended username. (PARROT_USERNAME)
  -v           Display version number.
  -w           Initial working directory.
  -W           Display table of system calls trapped.
  -Y           Force sYnchronous disk writes. (PARROT_FORCE_SYNC)
  -Z           Enable automatic decompression on .gz files.
Known debugging sub-systems are: syscall notice channel process resolve libcall tcp dns auth local http ftp nest chirp landlord multi dcap rfio glite lfc gfal grow pstree alloc cache poll hdfs bxgrid remote debug login irods wq all time pid
Enabled filesystems are: http grow ftp anonftp gsiftp chirp irods hdfs bxgrid
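Since the use case below relies on GridFTP, it can be handy to check for a protocol in the help text automatically. The following is only a sketch: parrot_supports is a hypothetical helper, and it assumes that the protocol list appears on the same line as the "Enabled filesystems are:" heading, as in the output above.

```shell
# parrot_supports PROTOCOL: read Parrot's help text on stdin and succeed if
# PROTOCOL is listed after "Enabled filesystems are:".
# (Hypothetical helper; assumes heading and list share one line.)
parrot_supports() {
    awk -v proto="$1" '
        /Enabled filesystems are:/ {
            sub(/.*Enabled filesystems are:/, "")   # keep only the list
            for (i = 1; i <= NF; i++) if ($i == proto) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Sample line copied from the help output above. On a real system, pipe in:
#   /opt/cctools-current-i686-linux-2.6/bin/parrot -h 2>&1
sample_help='Enabled filesystems are: http grow ftp anonftp gsiftp chirp irods hdfs bxgrid'

if printf '%s\n' "$sample_help" | parrot_supports gsiftp; then
    echo "gsiftp is enabled"
else
    echo "gsiftp is NOT enabled"
fi
```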
A real use case: BLAST & PARROT
Let's suppose we have installed the NCBI databases on the Grid and we want to run the BLAST software for a given target sequence. Before starting, we need to create a proxy certificate with the command:
$ voms-proxy-init --voms euasia
Cannot find file or dir: /home/larocca/.glite/vomses
Enter GRID pass phrase:
Your identity: /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca
Creating temporary proxy ......................................................... Done
Contacting voms.grid.sinica.edu.tw:15015 [/C=TW/O=AS/OU=GRID/CN=voms.grid.sinica.edu.tw] "euasia" Done
Creating proxy ................................................... Done
Your proxy is valid until Mon Jan 18 21:46:41 2010
$ voms-proxy-info --all
subject   : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca/CN=proxy
issuer    : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca
identity  : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca
type      : proxy
strength  : 1024 bits
path      : /tmp/x509up_u560
timeleft  : 11:59:57
=== VO euasia extension information ===
VO        : euasia
subject   : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca
issuer    : /C=TW/O=AS/OU=GRID/CN=voms.grid.sinica.edu.tw
attribute : /euasia/Role=NULL/Capability=NULL
timeleft  : 11:59:57
uri       : voms.grid.sinica.edu.tw:15015
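A long BLAST run needs a proxy that outlives it, so it may be worth checking the remaining lifetime first. The sketch below converts a "timeleft" value such as the one shown above into seconds and compares it with a minimum. Both helper names are hypothetical, and the one-hour threshold is an arbitrary example.

```shell
# hms_to_seconds HH:MM:SS: print the duration in seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# proxy_has_time TIMELEFT MIN_SECONDS: succeed if TIMELEFT is long enough.
# (Hypothetical helper, not part of the VOMS clients.)
proxy_has_time() {
    [ "$(hms_to_seconds "$1")" -ge "$2" ]
}

# "timeleft" value copied from the voms-proxy-info output above.
# Assumption: on a real system it could be read with something like
#   voms-proxy-info --all | awk '$1 == "timeleft" { print $3; exit }'
timeleft="11:59:57"

if proxy_has_time "$timeleft" 3600; then
    echo "proxy valid for at least one hour"
else
    echo "proxy about to expire: run voms-proxy-init again"
fi
```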
Assuming NCBI databases have been successfully registered in the LFC File Catalog (e.g.: /grid/euasia/blast_dbs) ...
$ lfc-ls /grid/euasia/blast_dbs
Homo_sapiens.NCBI36.apr.dna.chromosome.nhr
Homo_sapiens.NCBI36.apr.dna.chromosome.nin
Homo_sapiens.NCBI36.apr.dna.chromosome.nsd
Homo_sapiens.NCBI36.apr.dna.chromosome.nsi
Homo_sapiens.NCBI36.apr.dna.chromosome.nsq
and referenced by the following SURLs (with their gsiftp transfer URLs) ...
$ lcg-lr --vo euasia lfn:/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nhr
srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filefe4defc1-eeca-4310-b4b0-942f19466f5c
$ lcg-gt srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filefe4defc1-eeca-4310-b4b0-942f19466f5c gsiftp
gsiftp://se01.knowledgegrid.net.my/se01.knowledgegrid.net.my:/path1/euasia/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1.231543.0
bfecb259-2bcb-4c85-b16a-a328ce1cc66f
$ lcg-lr --vo euasia lfn:/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nin
srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1
$ lcg-gt srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1 gsiftp
gsiftp://se01.knowledgegrid.net.my/se01.knowledgegrid.net.my:/path1/euasia/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1.231543.0
78a3d4d7-d6fc-4c00-8194-9c6ac3dbe419
$ lcg-lr --vo euasia lfn:/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsd
srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filea3f2d4c9-0719-43af-9933-f402cd9acce3
$ lcg-gt srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filea3f2d4c9-0719-43af-9933-f402cd9acce3 gsiftp
gsiftp://se01.knowledgegrid.net.my/se01.knowledgegrid.net.my:/storage/euasia/2010-01-19/filea3f2d4c9-0719-43af-9933-f402cd9acce3.231544.0
4ffe7397-a848-4524-bcf4-25c8f48ee1f9
$ lcg-lr --vo euasia lfn:/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsi
srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filebb309693-73c4-4d47-a963-d390eab8fe4e
$ lcg-gt srm://se01.knowledgegrid.net.my/dpm/knowledgegrid.net.my/home/euasia/generated/2010-01-19/filebb309693-73c4-4d47-a963-d390eab8fe4e gsiftp
gsiftp://se01.knowledgegrid.net.my/se01.knowledgegrid.net.my:/glite-se/euasia/2010-01-19/filebb309693-73c4-4d47-a963-d390eab8fe4e.231545.0
ac257877-d026-40d9-8f3b-cfefc7cacb63
$ lcg-lr --vo euasia lfn:/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsq
srm://rhino.lsr.nectec.or.th/dpm/lsr.nectec.or.th/home/euasia/generated/2010-01-20/file48a87069-7ac4-449f-9361-7ed703d00973
$ lcg-gt srm://rhino.lsr.nectec.or.th/dpm/lsr.nectec.or.th/home/euasia/generated/2010-01-20/file48a87069-7ac4-449f-9361-7ed703d00973 gsiftp
gsiftp://rhino.lsr.nectec.or.th/rhino.lsr.nectec.or.th:/dpmdisk/euasia/2010-01-20/file48a87069-7ac4-449f-9361-7ed703d00973.208252.0
e8a4941f-c6ea-40e3-919a-0fbd5ee7327a
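The mount list used in the next step refers to these files with Parrot-style paths of the form /gsiftp/host:/path rather than with the gsiftp:// URLs returned by lcg-gt. As a sketch of that rewriting, inferred from the examples on this page, the hypothetical helper below drops the leading "gsiftp://host/" part and prefixes the remainder with /gsiftp/.

```shell
# turl_to_parrot_path TURL: rewrite gsiftp://<host>/<host>:/<path>
# into /gsiftp/<host>:/<path>.
# (Illustrative helper, not part of Parrot or the lcg tools.)
turl_to_parrot_path() {
    printf '%s\n' "$1" | sed 's|^gsiftp://[^/]*/|/gsiftp/|'
}

# TURL copied from the lcg-gt output for the .nsq file above.
turl_to_parrot_path "gsiftp://rhino.lsr.nectec.or.th/rhino.lsr.nectec.or.th:/dpmdisk/euasia/2010-01-20/file48a87069-7ac4-449f-9361-7ed703d00973.208252.0"
```

The printed path should match the right-hand column of the corresponding mount list entry.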
To make it easier for users to access the NCBI databases, Parrot allows you to create a custom namespace for any program. All file name activity passes through the Parrot name resolver, which can transform any given file name according to a series of rules that you specify.
In our use case, we can define the list of paths to be mounted as follows:
$ cat mountinglist_Homo_sapiens
/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nhr /gsiftp/se01.knowledgegrid.net.my:/path1/euasia/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1.231543.0
/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nin /gsiftp/se01.knowledgegrid.net.my:/path1/euasia/2010-01-19/file2e2ed636-2529-469d-9ba1-3e6d0411b5c1.231543.0
/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsd /gsiftp/se01.knowledgegrid.net.my:/storage/euasia/2010-01-19/filea3f2d4c9-0719-43af-9933-f402cd9acce3.231544.0
/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsi /gsiftp/se01.knowledgegrid.net.my:/glite-se/euasia/2010-01-19/filebb309693-73c4-4d47-a963-d390eab8fe4e.231545.0
/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsq /gsiftp/rhino.lsr.nectec.or.th:/dpmdisk/euasia/2010-01-20/file48a87069-7ac4-449f-9361-7ed703d00973.208252.0
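To illustrate what such a list expresses, the sketch below looks up a logical name in a two-column mount list and prints the path it is redirected to. It only imitates the idea with exact matches; resolve_name is a hypothetical helper and says nothing about how Parrot's own resolver is implemented.

```shell
# resolve_name MOUNTLIST NAME: print the target mapped to NAME, or NAME itself
# when no entry matches. Exact matches only; an illustration of the mapping,
# not Parrot's actual name resolver.
resolve_name() {
    awk -v name="$2" '
        $1 == name { print $2; found = 1; exit }
        END { if (!found) print name }
    ' "$1"
}

# Temporary one-entry list, using the .nsq entry from the mount list above.
list=$(mktemp)
cat > "$list" <<'EOF'
/grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsq /gsiftp/rhino.lsr.nectec.or.th:/dpmdisk/euasia/2010-01-20/file48a87069-7ac4-449f-9361-7ed703d00973.208252.0
EOF

resolve_name "$list" /grid/euasia/blast_dbs/Homo_sapiens.NCBI36.apr.dna.chromosome.nsq
resolve_name "$list" /etc/hosts      # no entry: the name is returned unchanged
rm -f "$list"
```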
At this point, we can instruct Parrot to mount these paths and run BLAST without first downloading the databases to the current working directory.
$ cd /opt/cctools-current-i686-linux-2.6/bin/
$ parrot -m mountinglist_Homo_sapiens \
    blast-2.2.14/bin/blastall \
    -p blastn \
    -d Homo_sapiens.NCBI36.apr.dna.chromosome \
    -i query.input \
    -o outfile.blastn
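A typo in the mount list can be hard to spot once BLAST is running, so a small pre-flight check may help. The sketch below assumes the two-column layout used on this page (Parrot itself may accept other forms); check_mountlist and run_under_parrot are hypothetical names, and only the -m option shown in the help output is used.

```shell
# check_mountlist FILE: succeed only if FILE is non-empty and every non-blank
# line holds exactly two absolute paths (the layout used on this page).
check_mountlist() {
    [ -s "$1" ] || return 1
    awk 'NF == 0 { next }
         NF != 2 || $1 !~ /^\// || $2 !~ /^\// { bad = 1 }
         END { exit bad ? 1 : 0 }' "$1"
}

# run_under_parrot FILE COMMAND [ARGS...]: refuse to start Parrot when the
# mount list looks malformed, otherwise run COMMAND under "parrot -m FILE".
run_under_parrot() {
    list=$1; shift
    check_mountlist "$list" || { echo "bad mount list: $list" >&2; return 1; }
    parrot -m "$list" "$@"
}

# Example call (not executed here):
#   run_under_parrot mountinglist_Homo_sapiens blast-2.2.14/bin/blastall \
#       -p blastn -d Homo_sapiens.NCBI36.apr.dna.chromosome \
#       -i query.input -o outfile.blastn
```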
The outfile.blastn file created by BLAST looks like:
BLASTN 2.2.14 [May-07-2006]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Database: Homo_sapiens.NCBI36.apr.dna.chromosome
29 sequences; 3,652,008,333 total letters
Searching
Query= (31 letters)
Score E
Sequences producing significant alignments: (bits) Value
10 dna:chromosome chromosome:NCBI36:10:1:135374737:1 62 1e-08
2 dna:chromosome chromosome:NCBI36:2:1:242751149:1 40 0.039
1 dna:chromosome chromosome:NCBI36:1:1:247199719:1 40 0.039
12 dna:chromosome chromosome:NCBI36:12:1:132289534:1 40 0.039
13 dna:chromosome chromosome:NCBI36:13:1:114127980:1 38 0.15
6 dna:chromosome chromosome:NCBI36:6:1:170896992:1 36 0.61
3 dna:chromosome chromosome:NCBI36:3:1:199446827:1 36 0.61
16 dna:chromosome chromosome:NCBI36:16:1:88822254:1 36 0.61
14 dna:chromosome chromosome:NCBI36:14:1:106360585:1 36 0.61
X dna:chromosome chromosome:NCBI36:X:1:154913754:1 34 2.4
9 dna:chromosome chromosome:NCBI36:9:1:140273252:1 34 2.4
8 dna:chromosome chromosome:NCBI36:8:1:146274826:1 34 2.4
7 dna:chromosome chromosome:NCBI36:7:1:158821424:1 34 2.4
4 dna:chromosome chromosome:NCBI36:4:1:191263063:1 34 2.4
22 dna:chromosome chromosome:NCBI36:22:1:49591432:1 34 2.4
20 dna:chromosome chromosome:NCBI36:20:1:62435964:1 34 2.4
19 dna:chromosome chromosome:NCBI36:19:1:63806651:1 34 2.4
15 dna:chromosome chromosome:NCBI36:15:1:100338915:1 34 2.4
Y dna:chromosome chromosome:NCBI36:Y:2709521:57443437:1 32 9.6
5 dna:chromosome chromosome:NCBI36:5:1:180837866:1 32 9.6
18 dna:chromosome chromosome:NCBI36:18:1:76117153:1 32 9.6
17 dna:chromosome chromosome:NCBI36:17:1:78654742:1 32 9.6
11 dna:chromosome chromosome:NCBI36:11:1:134452384:1 32 9.6
>10 dna:chromosome chromosome:NCBI36:10:1:135374737:1
Length = 135374737
Score = 61.9 bits (31), Expect = 1e-08
Identities = 31/31 (100%)
Strand = Plus / Plus
Query: 1 ctaaaatgaaaggttttgggttttggccagc 31
|||||||||||||||||||||||||||||||
Sbjct: 277986 ctaaaatgaaaggttttgggttttggccagc 278016
Score = 38.2 bits (19), Expect = 0.15
Identities = 19/19 (100%)
Strand = Plus / Plus
Query: 11 aggttttgggttttggcca 29
|||||||||||||||||||
Sbjct: 8656761 aggttttgggttttggcca 8656779
Score = 36.2 bits (18), Expect = 0.61
Identities = 21/22 (95%)
Strand = Plus / Plus
Query: 4 aaatgaaaggttttgggttttg 25
|||||||| |||||||||||||
Sbjct: 54303149 aaatgaaatgttttgggttttg 54303170
Score = 34.2 bits (17), Expect = 2.4
Identities = 20/21 (95%)
Strand = Plus / Plus
Query: 10 aaggttttgggttttggccag 30
||||||||||| |||||||||
Sbjct: 17139118 aaggttttgggctttggccag 17139138
Score = 34.2 bits (17), Expect = 2.4
Identities = 20/21 (95%)
Strand = Plus / Plus
[..cut..]
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Sequences: 29
Number of Hits to DB: 1,479,010
Number of extensions: 84036
Number of successful extensions: 146
Number of sequences better than 10.0: 25
Number of HSP's gapped: 146
Number of HSP's successfully gapped: 146
Length of query: 31
Length of database: 3,652,008,333
Length adjustment: 18
Effective length of query: 13
Effective length of database: 3,652,007,811
Effective search space: 47476101543
Effective search space used: 47476101543
X1: 11 (21.8 bits)
X2: 15 (29.7 bits)
X3: 25 (49.6 bits)
S1: 16 (32.2 bits)
S2: 16 (32.2 bits)
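For a quick look at the strongest hits, the summary table of such a report can be filtered by E value. The sketch below is one possible approach: blast_hits is a hypothetical helper, and it assumes the five-column summary lines shown above (sequence name, "dna:chromosome", description, score, E value).

```shell
# blast_hits MAX_EVALUE: read a blastall text report on stdin and print
# "sequence score e-value" for summary lines whose E value is <= MAX_EVALUE.
# (Hypothetical helper; relies on the summary layout shown on this page.)
blast_hits() {
    awk -v max="$1" '
        NF == 5 && $2 == "dna:chromosome" && $5 + 0 <= max + 0 {
            print $1, $4, $5
        }
    '
}

# Three summary lines copied from the report above.
blast_hits 0.1 <<'EOF'
10 dna:chromosome chromosome:NCBI36:10:1:135374737:1 62 1e-08
2 dna:chromosome chromosome:NCBI36:2:1:242751149:1 40 0.039
13 dna:chromosome chromosome:NCBI36:13:1:114127980:1 38 0.15
EOF
```

On a complete report the same function could be applied with `blast_hits 0.1 < outfile.blastn`.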
References
http://www.cse.nd.edu/~ccl/software/parrot/
http://www.cse.nd.edu/~ccl/software/manuals/parrot.html
Book Chapter & Journal Articles
http://www.cse.nd.edu/~dthain/papers/
