iRODS User Group Meeting 2009


CC-IN2P3
Lyon, France
February 2 - February 5, 2009

Presentations and Minutes

Monday, February 2

8:45 AM Introduction [slides]
Jean-Yves Nief, CC-IN2P3
9:00 AM iRODS Status [slides]
Reagan Moore, RENCI
9:45 AM Break
10:15 AM CC-IN2P3 Director Speech
Dominique Boutigny, CC-IN2P3
10:30 AM ARCS Data Fabric [slides]
Pauline Mak, ARCS

The Australian Research Collaboration Service (ARCS) has implemented the Data Fabric using SRB. We use Shibboleth authentication together with the Short Lived Certificate Service (SLCS) to create GSI credentials. The SRB server has also been modified to create accounts automatically for users from trusted Identity Providers (IdP). We have also built usage scripts that keep a history of the federation’s usage.

With the release of iRODS 2.0, the data fabric will now be migrated to iRODS. Initial evaluation of the migration process is now complete. We have been working with iRODS developers to enable automatic account creation in iRODS. As part of the migration process, we are moving our usage scripts to iRODS. I will discuss our experience with the migration process.

ARCS is also developing software for accessing the data fabric: Hermes and Davis. Hermes is a Java application designed to work with commons virtual file systems (commons-vfs), for which new providers have been added for SRB and iRODS. Davis is a Java servlet that uses the open standard WebDAV protocol. WebDAV client software is already part of many operating systems, which simplifies access to SRB and iRODS.

11:00 AM ASPiS: integrating iRODS with Shibboleth and provenance engines [slides]
Eric Liao

Natively, iRODS addresses authentication by means of locally managed user lists or X509 certificates. The ASPiS project is integrating iRODS with Shibboleth, to enable authentication to be devolved onto the user’s home institution, and to support attribute/role-based authorisation decisions within the iRODS context. The project is also integrating iRODS with external provenance engines (PASOA and KARMA), to capture provenance data for scientific processes. Both integration approaches will make use of the rule system as far as possible.

11:30 AM Using iRODS with the EnginFrame grid portal into the GRIDA3 project [slides]
Francesco Locunto, Marco Piras

GRIDA3 (Shared Resources Manager for Environmental Data Analysis and Applications) is an interdisciplinary project funded by the Italian Research Ministry.

Primarily designed for providing a holistic description of environmental problems, the project will result in an advanced problem-solving tool for the integration, through a computing portal, of human know-how, simulation software, instrumentation and resources for data communication, storage, visualization and computation.

A central part of the project is the creation of a Data Grid spanning multiple sites across federated domains, public and private: for the creation of this Data Grid we chose to use the iRODS technology.

This talk will describe how we integrated iRODS with the EnginFrame Grid Portal, so that applications can integrate iRODS easily and users can access the Data Grid resources in an intuitive and user-friendly way.

In particular, we will demonstrate one of the GRIDA3 applications, showing how it can seamlessly use resources from the iRODS data grid both as input and output, and how users can access these resources from a web browser, upload new resources and so on.

12:00 PM SRB usage in BioEmergences [slides]
Dominique de Waleffe

BioEmergences is an EC-funded project where biologists, mathematicians, engineers and computer scientists define, implement and systematize the production of symbolic and precise reconstructions of cell evolution, starting from in vivo microscope image captures of embryos. The computational means for the project are provided by the different partners' own computing facilities for initial runs of the algorithms. Afterwards, the programs are moved to run under the control of a workflow management application for systematic application to multiple datasets.

The talk will briefly present the project and the workflow application. Then we will explain the different uses of SRB in the context of this project. Finally, we will conclude with a discussion on the potential migration to iRODS.

12:30 PM Lunch
2:00 PM Using Data Grids for Long Term Digital Preservation [slides]
Adil Hasan

In this talk we describe an FP7 integrated project focussed on long-term digital preservation called SHAMAN.

We focus on the issues concerned with using data grids in long-term preservation.

2:30 PM The Storage Abstraction Service of the SPAR project [slides]
Thomas Ledoux

The National Library of France (BnF) is building its distributed archiving and preservation system (SPAR) in order to preserve in the long term all the digital information collected or created.

This system is based on the OAIS standard but, in order to be independent of the underlying hardware infrastructure, a Storage Abstraction Service (SAS) is used. The SAS exposes its capabilities by way of storage units, which represent some hardware designed to satisfy a given class of service, as well as records, which abstract the possible copies.

iRODS has been chosen to implement this service. In particular, a storage unit is seen as a particular resource associated with a set of irules to comply with the said class of service.

In the presentation, we will show how such elements are defined and how the multiple operations needed for long-term preservation at the storage level can be achieved through the use of iRODS.

3:00 PM The Adonis research data preservation project for digital humanities in France [slides]
Thomas Kachelhoffer

The aim of this project is to give French researchers in the humanities a distributed data working space. On top of this, the project will provide long-term data preservation, handle digital object concepts, and provide data processing facilities and a basic workflow mechanism. This activity is also connected to European projects such as DARIAH. For now, it will be based on two major software packages: iRODS for data manipulation at the file level and fedora-commons (version 3) at the digital object level. Two major French computing centers will be involved: CINES for long-term preservation and CC-IN2P3 for data access. At this time, around twelve digital resource centers, distributed across France, have been identified to provide digital data and high-level data management for the overall Digital Humanities community.

3:30 PM iRODS as future data grid backend for TextGrid? [slides]
Wolfgang Pempe

TextGrid, which is part of the D-Grid initiative (http://www.d-grid.de), is the first project in the humanities in Germany creating a community grid for the collaborative editing, annotation, analysis, and publication of specialist text resources. The architecture of TextGrid enhances a Globus-based grid infrastructure with a specific middleware layer and an open, WebService-based service layer of specialised functionalities for textual processing. The current TextGrid data grid infrastructure is implemented by using the applicable components of Globus Toolkit 4.

For the next phase of the TextGrid project, we are considering a redesign of the existing storage infrastructure that takes advantage of both Fedora, with its Digital Object Model, and the flexible, rule-based concept of iRODS.

4:00 PM Break
4:30 PM Collaborative data life-cycle management for petascale astronomy projects [slides]
Arun Jagatheesan

In this brief talk, we look at the need for Collaborative Data Life-cycle Management (CDLM), the advantages of CDLM and the concepts in iRODS that enable a large-scale CDLM infrastructure. We will highlight two astronomy projects as our use cases, namely LSST and ALMA. Both of these projects will have to manage several petabytes of data for several years from multiple independent agencies (or countries). iRODS was used by LSST in the Supercomputing 2008 HPC Storage Challenge. The objective of this talk is to introduce this problem and form a community of users who want to engineer solutions for similar large-scale data management problems.

5:00 PM Enabling a robust VOSpace based on iRODS [slides]
André Schaaff

VOSpace is the International Virtual Observatory Alliance interface to distributed storage. It is the visible side of the storage system. To make a VOSpace usable in real life, we need an efficient storage mechanism. After a few experiments we focused on iRODS, a new data grid software system developed by the SDSC Storage Resource Broker team and collaborators. Our first aim was to create a storage area for Aladin, but also for the new CDS Portal, which is under development. As a first step we developed an Aladin plugin giving access to the iRODS implementation (through the Jargon Java API), and as a second step the VOSpace interface was added on top of iRODS. We have developed a VOSpace Explorer in Java to access and manage the files. The common file operations are supported. If a VO tool supports drag and drop, it can also interact with the explorer that way.

iRODS is easy to implement and provides a good solution to ensure the robustness of a VOSpace. The installation is simple and can be done without much manpower. It is possible to start with a small configuration and to follow the evolution of the needs.

A PLASTIC-compliant tool like VOSpace Explorer is useful to provide simple access to the stored files for VO applications.

As the main conclusion of this work, we think that iRODS is a very good solution for the implementation of a robust VOSpace, for many reasons: it is open source, easy to use, flexible (definition of micro-services), follows the evolution of the architecture, etc.

5:30 PM SRB service at STFC and the road to iRODS [slides]
Roger Downing, Kevin O'Neill

STFC (Science and Technology Facilities Council, UK) is a long-time user of SRB in varied projects for internal and external customers alike. In this talk we describe how SRB has been used to meet the requirements of our main customers, and how the challenges associated with running a production service in this environment have been addressed.

We also outline our plans for evaluation and migration to iRODS, noting which features we need to be present in order to migrate existing services without provoking a redesign of the infrastructure beneath.

Tuesday, February 3

9:00 AM Tutorial: Introduction [tutorial slides] [reference manual]
Reagan Moore, RENCI
10:30 AM Break
11:00 AM Hands-On Demo: iRODS Installation [slides]
Reagan Moore, RENCI
12:30 PM Lunch
2:30 PM Tutorial: Introduction to Rules [slides]
Michael Wan, DICE
4:00 PM Break
4:30 PM Tutorial: Assessment Criteria [table]
Reagan Moore, RENCI

Wednesday, February 4

Tutorial
Tutorials on iRODS covering different topics, depending upon participant needs. These topics can include: iRODS installation, logical name spaces, iCAT metadata catalog, RDA interface to remote databases, data transfer (parallel I/O versus RBUDP), iRODS clients (Unix, web browser, Jargon), FUSE client interface and performance, storage system drivers, structured information resource interface (tar files, mounted collection), rules (default rules), microservices (default set), writing a micro-service from start to finish, HDF5 microservices, web access microservices, XML/XSLT microservices, SRB to iRODS migration support, authentication (challenge-response versus GSI versus Shibboleth), planned development.

9:00 AM Tutorial: Advanced features, data transfer modes, structured file implementation, mounted collections, Fuse interface, etc. [slides]
Michael Wan, DICE
10:30 AM Break
11:00 AM Tutorial: Using Remote Database Access (RDA) Interface
12:30 PM Lunch
2:30 PM Tutorial: Writing and debugging a Microservice [slides]
Michael Wan, DICE
4:00 PM Break
4:30 PM Tutorial: Writing and debugging a Microservice [example code]
Michael Wan, DICE

Thursday, February 5

9:00 AM Parallel Session: Astrophysics [minutes]
9:00 AM Parallel Session: Preservation Environments [minutes]
10:30 AM Break
11:00 AM Parallel Sessions: Preservation Environments and Medical Records
12:30 PM Lunch

Common session on GUIs, APIs, security, etc.
In this session, we will talk about topics of common interest where developments have already been made or are foreseen. These topics include GUIs (Hermes, VBrowser, JUX, Windows explorer, web browsers), APIs (Java, Perl, Python, PHP), security (Shibboleth integration, GSI), management policies, specialized microservices (web services, XML, image processing), migration from SRB to iRODS.

2:30 PM Davis and Hermes [minutes]
Pauline Mak
2:45 PM VBrowser [slides]
Tristan Glatard
3:00 PM JUX [slides]
Pascal Calvat
3:15 PM Discussion on APIs, SRB to iRODS migration, etc.