iRODS User Group Meeting 2020


Virtual
Hosted by University of Arizona
June 9 - June 12, 2020

Original Agenda (pdf)

Group Photo

Videos

Conference Videos hosted on YouTube

Articles

iRODS UGM 2020 Proceedings (PDF)

  • S3:TNG – iRODS S3 Resource Plugin with Direct Streaming
    Justin James, Kory Draughn, Jason Coposky, Terrell Russell – iRODS Consortium
  • Integration of iRODS with IBM Spectrum Archive Enterprise Edition – A flexible tiered storage archiving solution
    Nils Haustein – IBM European Storage Competence Center
    Mauro Tridici – Euro-Mediterranean Center on Climate Change (CMCC)
  • iRODS Client: NFSRODS 1.0
    Kory Draughn, Terrell Russell, Alek Mieczkowski, Jason Coposky – iRODS Consortium
    Mike Conway – NIEHS / NIH
  • iRODS Client: AWS Lambda Function for S3 1.0
    Terrell Russell – iRODS Consortium

Presentations (June 9)



iRODS UGM 2020 Keynote - A Conversation With Your Data Platform [slides] [video]
Nirav Merchant - CyVerse / University of Arizona

Consortium Update [slides] [video]
Jason Coposky - iRODS Consortium

Technology Update [slides] [video]
Terrell Russell - iRODS Consortium
Kory Draughn - iRODS Consortium
Alan King - iRODS Consortium
Daniel Moore - iRODS Consortium
Jaspreet Gill - iRODS Consortium

Yoda and the iRODS Python Rule Engine Plugin [slides] [video]
Lazlo Westerhof - Utrecht University ITS/RDM
Chris Smeele - Utrecht University ITS/RDM
At UGM 2018, we presented Yoda, a system for reliable, long-term storage and archiving of large amounts of research data during all stages of a study. It enables researchers to describe, deposit, and publish research data in compliance with the FAIR principles.

Yoda deploys iRODS as its core component, customized with more than 10,000 lines of iRODS rules. With the release of the iRODS Python rule engine plugin, we sought to make use of the benefits it provides in the areas of reusability, ease of development, and availability of existing libraries.

To accomplish this, we rewrote most of our rules and developed several generic wrappers and reusable utilities to make the work easier (a flavor of which is sketched below). This is the story of our approach to developing Python rules and the challenges we faced along the way.
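
As a flavor of what this looks like, here is a minimal sketch of a rule for the iRODS Python rule engine plugin. The helper, the choice of dynamic PEP, and the AVU name are our own illustrations, not Yoda's actual code.

    # core.py - a reusable helper plus a dynamic PEP, as a minimal sketch.
    import json  # ordinary Python libraries become available to rule logic

    def set_avu(callback, obj_path, attr, value):
        # Reusable wrapper around the metadata microservice.
        callback.msiModAVUMetadata("-d", obj_path, "set", attr, value, "")

    def pep_api_data_obj_put_post(rule_args, callback, rei):
        # Fired after a data object has been uploaded.
        obj_path = str(rule_args[2].objPath)
        set_avu(callback, obj_path, "org_status", json.dumps({"state": "new"}))
        callback.writeLine("serverLog", "annotated {0}".format(obj_path))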

Using JSON Schemas as metadata templates in iRODS [slides] [video]
Venustiano Soancatl Aguilar - University of Groningen
In this talk, we discuss the potential of JSON schemas as metadata templates. One of the main advantages of JSON schemas is that they can be represented as strings. This is very convenient, as strings can be stored in iRODS as AVUs, in an Elasticsearch database, or in any other external database. Additionally, JSON schemas are supported by programming languages such as Python and Java. This support makes it relatively straightforward to validate both the schemas themselves and the JSON metadata against the schemas. Assuming that JSON schema templates are stored elsewhere and can be accessed from iRODS, we have implemented irules to associate metadata templates with iRODS objects, ingest metadata validated against templates, and display template AVUs and inherited AVUs via the command line interface. Finally, we discuss future plans regarding managing templates in our iRODS system.
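
To give a flavor of this validation, the minimal Python sketch below uses the jsonschema package; the schema and metadata values are invented for illustration.

    # A template travels as a string (e.g., retrieved from an AVU or an
    # external database); validate the schema itself, then the metadata.
    import json
    from jsonschema import validate, Draft7Validator

    schema_str = """{
      "type": "object",
      "properties": {"title": {"type": "string"},
                     "year":  {"type": "integer"}},
      "required": ["title"]
    }"""

    schema = json.loads(schema_str)
    Draft7Validator.check_schema(schema)  # is the template itself well-formed?
    validate(instance={"title": "Sea ice thickness", "year": 2020},
             schema=schema)               # raises ValidationError if invalid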

S3:TNG - iRODS S3 Resource Plugin with Direct Streaming [slides] [paper] [video]
Justin James - iRODS Consortium
Kory Draughn - iRODS Consortium
Jason Coposky - iRODS Consortium
Terrell Russell - iRODS Consortium
The iRODS S3 storage resource plugin has become very important to the iRODS ecosystem. Many production systems are now spanning local disk, local or remote object stores, and tape. Last year's release of the cacheless S3 plugin enjoyed immediate uptake.

This year's update shares the design and engineering underway for the iRODS S3 plugin to provide direct streaming into and out of S3-compatible storage. This rewrite uses the new iRODS IOStreams library and in-memory buffering to make efficient multi-part transfers.
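
The plugin's streaming is implemented in C++ on top of the iRODS IOStreams library; the Python sketch below only illustrates the underlying S3 multipart flow using boto3, with a hypothetical bucket, key, and local file.

    # Buffer each part in memory, stream it to S3, then commit the upload.
    import boto3

    s3 = boto3.client("s3")
    bucket, key = "example-bucket", "example/object"

    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    with open("local_file.bin", "rb") as f:
        part_number = 1
        while True:
            chunk = f.read(8 * 1024 * 1024)  # 8 MiB in-memory buffer
            if not chunk:
                break
            resp = s3.upload_part(Bucket=bucket, Key=key,
                                  PartNumber=part_number,
                                  UploadId=upload["UploadId"], Body=chunk)
            parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
            part_number += 1

    s3.complete_multipart_upload(Bucket=bucket, Key=key,
                                 UploadId=upload["UploadId"],
                                 MultipartUpload={"Parts": parts})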

Parallel data migration between GPFS filesystems via the iRODS rule engine [slides] [video]
Ilari Korhonen - KTH Royal Institute of Technology
At the PDC Center for High Performance Computing at KTH, in collaboration with our colleagues at the supercomputing center of Linköping University, we operate the iRODS-based section of the national research data storage for Sweden. We have a heterogeneous, asymmetric data grid based on iRODS with several underlying storage solutions and technologies. At PDC, the performance tier of the system runs on top of GPFS filesystems.

Our GPFS cluster, alongside the filesystems it hosts, is due for an upgrade, since we would like to deploy the newest generation of GPFS (5.0.x) for its space efficiency and several other enhancements. We prepared for this by initially building the cluster with two physical filesystems to accommodate the envisioned upgrade and (online) data migrations. After the cluster has been upgraded to the latest software, we will upgrade (reformat) the on-disk GPFS filesystems one at a time, migrating the data online between the filesystems via iRODS. All data will remain accessible at all times, and users will continue uninterrupted, never realizing their data objects have been migrated underneath.

The first step of this operation is now complete: one of the two physical filesystems has been drained of iRODS resources, which were migrated to its pair. This was done via the asynchronous and parallel rule execution of iRODS 4.2.x, with a set of custom rules developed with the iRODS Consortium. This gained us more parallelism from the system: not only the parallel read/write performance of GPFS with iRODS parallel streams, but also checksumming of the migrated data objects in parallel, with jobs launched from the iRODS delay execution queue. The successes and challenges of this process will be presented.
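
The migration rules themselves run server-side, but the pattern (replicating and checksumming many objects in parallel jobs) can be sketched client-side with python-irodsclient. Host, path, and resource names below are hypothetical, and session thread-safety details are glossed over.

    # Replicate each object to the new filesystem's resource, then checksum
    # it, fanning the work out over a thread pool.
    from concurrent.futures import ThreadPoolExecutor
    from irods.session import iRODSSession

    def migrate(session, logical_path):
        session.data_objects.replicate(logical_path, resource="gpfs_new")
        return logical_path, session.data_objects.chksum(logical_path)

    with iRODSSession(host="irods.example.org", port=1247, user="rods",
                      password="...", zone="tempZone") as session:
        paths = ["/tempZone/home/rods/file{0}".format(i) for i in range(100)]
        with ThreadPoolExecutor(max_workers=8) as pool:
            for path, checksum in pool.map(lambda p: migrate(session, p), paths):
                print(path, checksum)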

SPONSOR MESSAGE [slides] [video]
Jon Toor - Cloudian

Policy-Encapsulated Objects [slides] [video]
Arcot Rajasekar - University of North Carolina at Chapel Hill
As data objects increasingly move across distributed and remote storage, they lose the policies that were applied and used where they were generated, created, or administered. When a file is moved from local storage to remote storage, metadata about ownership, creation, modification, audit trail, security, and permissions is not transferred. Moreover, all links to the original are lost, no lineage is captured or maintained, and as the data object gets copied and moved across multiple storage systems and modified at various stages, the information about all these actions is not captured and is lost forever. Even policy-based data management systems such as iRODS, which instrument policy enforcement points within the data management infrastructure and apply policies as computer-actionable rules, have no control once the object is copied out of their domain. We propose the concept of a policy-encapsulated object (PEO) that encodes the policies governing the life cycle of a data object as part of the data payload and serves as a gatekeeper for the data. To make the system self-contained, we also propose the inclusion of an execution infrastructure (similar to the iRODS rule engine) which will run on top of any operating system and capture all lineage and administrative policies.

By including policies to verify the trustworthiness of the execution infrastructure within the PEO, a trusted environment can be implemented. Each PEO can verify that it is in a trusted environment while controlling manipulation of the associated data set. By providing mechanisms for a data object to be aware of its environment, PEOs enable controlled operations including redaction, integrity checking, derived product generation, data caching, and access control. A PEO can characterize all provenance information needed to instantiate a derived data product, including governing policies and the required trusted environment. There is a strong link between trusted environments and containers used for reproducible computing. We discuss various issues related to policy-encapsulated objects.

Integration of iRODS with IBM Spectrum Archive Enterprise Edition - A flexible tiered storage archiving solution [slides] [paper] [video]
Nils Haustein - IBM European Storage Competence Center
Mauro Tridici - Euro-Mediterranean Center on Climate Change (CMCC)
Tape storage is suitable for storing large volumes of data over long periods of time at lower cost, but the access time to data on tape is significantly higher than to data on disk.

Tiered storage file systems with tape storage are a blessing and a curse. The blessing is that the user can see all files, regardless of whether they are stored on disk or tape. The curse starts when the user opens a file that is stored on tape, because the recall takes one or more minutes. The user is not aware that the file is on tape, because standard file systems do not indicate whether a file is on disk or on tape. It gets even worse when many users simultaneously open several files that are on tape. This causes even longer waiting times, because transparent recalls are not tape optimized.

Moreover, since it is not possible to set file system quota on tape storage, storage capacity may grow out of control.

To address these challenges, the recall requests coming from multiple users can be queued and recalled periodically in a tape-optimized manner, whereby the files are sorted by tape ID and location on tape. The combination of iRODS with IBM Spectrum Archive Enterprise Edition can accommodate this.

We demonstrate how to prevent transparent recalls when migrated files are accessed through iRODS and instead perform tape-optimized recalls with IBM Spectrum Archive. Furthermore, we demonstrate configuring quotas in iRODS that apply to the entire file system comprising disk and tape storage tiers. And we demonstrate harvesting metadata from file content and using it for subsequent searches.
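
A toy Python sketch of the tape-optimized ordering described above; file names, tape IDs, and positions are invented, and the actual recalls are issued by IBM Spectrum Archive.

    # Group queued recall requests by tape, then sort by position on tape,
    # so each cartridge is traversed once instead of seeking per request.
    from collections import defaultdict

    queue = [("a.nc", "TAPE03", 812), ("b.nc", "TAPE01", 40),
             ("c.nc", "TAPE03", 17), ("d.nc", "TAPE01", 530)]

    by_tape = defaultdict(list)
    for name, tape, pos in queue:
        by_tape[tape].append((pos, name))

    for tape in sorted(by_tape):
        batch = [name for _, name in sorted(by_tape[tape])]
        print("recall from", tape, ":", batch)  # hand the batch to the recall tool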

Presentations (June 10)



SmartFarm Data Management [slides] [video]
Kieran Murphy - Agriculture Victoria
Agriculture Victoria's research group is geographically dispersed, and managing research data from its research 'SmartFarms' requires many manual steps. Data management challenges increase with the large datasets generated by new sensing technologies. This requires the development of standardised, automated, online, authenticated, and verifiable processes for uploading data for storage and analytics on computing facilities.

Working with iRODS, Agriculture Victoria are piloting new data management workflows for 'SmartFarm' data. This talk will discuss lessons from small-, medium-, and high-volume agriculture SmartFarm use cases using edge computing and collaborative data infrastructure, and the flow-on development of capability for AVR researchers.

Data management in autonomous driving projects [slides] [video]
Marcin Stolarek - Aptiv
Radosław Rowicki - Aptiv
Kacper Abramczyk - Aptiv
Mateusz Rejkowicz - Aptiv
Aptiv deployed iRODS in production around a year and a half ago, together with the start of the development phase of one of its big autonomous driving projects. The tool was selected after a few POC installations. The major advantage of iRODS we recognized at the time was the number of side projects and plugins available.

In such industrial projects, it's quite common to have multiple partners working on different parts of a workflow. Tracking data status, migrating data between partners and within the engineering groups responsible for data collection and for manual and automated analysis, is a fairly complicated task.

From a technical perspective, our deployment is based on two DNS round-robin groups of iRODS resource servers; both groups use a Lustre filesystem as the storage backend. During testing, we reached ~90 Gbps, which was our estimated data collection rate. Besides the HPC filesystems, the resource daemons are also configured to use AWS S3 buckets connected over AWS Direct Connect (30 Gbps).

I'll explain our configuration with the DNS round-robin trick, share our current struggles related to automatic registration of files from the Lustre filesystem and the audit rule engine, and cover the difficulties we had in the early (adoption) stages as well as current issues.

CyVerse Discovery Environment: Extensible Data Science workbench and data-centric collaboration platform powered by iRODS [slides] [video]
Sarah Roberts - CyVerse / University of Arizona
Sriram Srinivasan - CyVerse / University of Arizona
Nirav Merchant - CyVerse / University of Arizona
Tina Lee - CyVerse / University of Arizona
The Discovery Environment, a web-based Data Science workbench that supports data management, analysis and collaboration tasks for diverse communities of users from astronomers to zoologists, is actively utilized by thousands of scientists worldwide. In this presentation we highlight how we have leveraged iRODS alongside other frameworks like Kubernetes, NodeJS, React, and Asynchronous Tasks to meet researchers' growing demands for reproducible, extensible, collaborative and scalable analysis environments. We also provide an overview of the Terrain API, which provides developers with programmatic access to extend and adopt the Discovery Environment's underlying cyberinfrastructure. Finally, we touch upon our Visual and Interactive Computing Environment (VICE), our newest service, which allows researchers to use Jupyter Notebooks, RStudio, RShiny, and other custom web-based, interactive data analysis and visualization tools. VICE provides secure out-of-the-box, single sign-on access to all container (Docker)-based applications and can manage CPU- and GPU-based analyses with configurable resource allocation per task.

iRODS and Federated Identity authentication: current limitations and perspective [slides] [video]
Claudio Cacciari - SURF
Stefan Wolfsheimer - SURF
Hylke Koers - SURF
Arthur Newton - SURF
Tasneem Rahaman-Khan - SURF
Matthew Saum - SURF
Gerben Venekamp - SURF
iRODS does not natively support authentication protocols for federated identity management, such as SAML or OpenID Connect (OIDC). Additional security measures, like two-factor authentication (2FA), are not supported either. There are some third-party plugins or modules that support a limited subset of those features, but a comprehensive and flexible solution is missing. In this presentation we would like to outline use cases and explain the limits of the current implementation. Consider a web application against which a user authenticates using OIDC. The application is connected to iRODS to upload data on behalf of the user. We want the interaction between iRODS and the web application to be transparent to the user. The existing plugin (auth_plugin_openid) is not suitable because it requires an explicit authentication from the user. We could make the web application pass the OAuth2 access token to iRODS and validate it through a Pluggable Authentication Module (PAM) extension acting as an OIDC client, as sketched below. Since the token expires after a while, it would need to be refreshed on the iRODS side using a refresh token. The current implementation does not support this workflow, especially dealing with two tokens.
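
A minimal sketch of the token-validation step such a PAM extension would perform, using the standard OAuth2 token introspection endpoint (RFC 7662); the URL and client credentials are placeholders.

    # Ask the identity provider whether an access token is still valid.
    import requests

    INTROSPECT_URL = "https://idp.example.org/oauth2/introspect"

    def token_is_active(access_token):
        resp = requests.post(INTROSPECT_URL,
                             data={"token": access_token},
                             auth=("irods-client-id", "client-secret"))
        resp.raise_for_status()
        return resp.json().get("active", False)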

A comprehensive solution would overcome those and other limitations. At the same time, it would simplify life for users and administrators. For example, when an iRODS instance supports multiple authentication protocols and the client is a single entry point shared among multiple users, like a WebDAV endpoint based on Davrods, the administrator is forced to expose a different endpoint for each authentication protocol because the protocol is defined client-side. Enabling the server to support a fall-through mechanism would allow the client to simply pass the credentials without the need to pick one of the protocols in advance.

SURF has started to develop a proof of concept that aims to achieve that solution by extending the current iRODS PAM support so that it can deal with an arbitrary exchange of tokens and challenges, delegating the implementation of the specific federated identity protocols to dedicated PAM modules. In parallel, the iRODS Consortium promoted the design of a more general implementation through discussion in the Authentication Working Group. The group has adopted the idea of supporting a flexible conversation between client and server, but rather than implementing it on the PAM side, it decided to extend the iRODS API to support different authentication methods through plugins.

This presentation describes the main scenarios related to the support of federated identity management in iRODS and the possible solutions.

The Past, Present and Future of iRODS at the Texas Advanced Computing Center [slides] [video]
Chris Jordan - The University of Texas at Austin
The Texas Advanced Computing Center has operated iRODS services for over 10 years, both for shared support of general purpose research data management, and as a dedicated service supporting specialized cyberinfrastructure projects. We will provide a brief history of iRODS at TACC, and give an overview of the current uses of iRODS and iRODS-based cyberinfrastructure at TACC. Projects utilizing iRODS at TACC have data collections ranging from a few terabytes to a few petabytes, and span the gamut from CT scanning through genome sequencing and archival of digital artworks; we will briefly discuss how TACC utilizes iRODS to support this wide variety of use cases, and how we plan to deploy iRODS in the future to support the continued growth of research data in both size and complexity.

iRODS_CSharp [slides] [video]
Reink Fidder - Utrecht University
Jelle Teeuwissen - Utrecht University
Best Student Technology Award Winner
We are two computer science students at Utrecht University, currently in the second year of our bachelor's degree.

We are both partaking in the honours programme, which is also the reason we created this client library.

As part of our honours requirements, we worked as part of the Care2Report (C2R) research team (https://sites.google.com/view/care2report). This research is aimed at creating a program that can transcribe and summarize medical consultations, so that doctors don't have to spend a lot of time writing consultation reports and can spend more time actually consulting.

Utrecht University uses a system called YODA for cloud storage, a portal that uses iRODS as a backend. For our assignment, we needed a way to upload logs from the C2R system to YODA, and since the program is written mainly in C#, we decided to create a client that could establish a connection to the YODA backend and transfer files.

Since many researchers at Utrecht University use C# for their programming, we figured this would be a problem that would be encountered more often, so we thought it was a good idea to create a solution that wasn't just a way to solve our problem, but could also be used by others. And so, we started building a general iRODS C# client library.

The finished product is a client library which performs all the basic tasks that an iRODS client library should be able to perform, such as collection operations (create/remove/rename), data object operations (create/download/upload), metadata operations and a variety of queries.

The repository can be found at https://github.com/UtrechtUniversity/irods-Csharp

As for the impact on individuals, society, science, and systems & technology: all areas are affected in roughly the same way. Anybody who wishes to access iRODS from their C# code will no longer need to find a way to use some other client library, but can use the native C# client library. This decreases the amount of work needed and increases performance and ease of use.

In conclusion, the client library we have created can be viewed simply as a tool to make iRODS more accessible. Even though the main motivation for creating it was our own project, we hope there will be others that can use the functionalities we have created, or perhaps even improve on it.

Using iRODS to build a research data management service in Flanders [slides] [video]
Ingrid Barcena Roig - KU Leuven
This presentation will discuss how iRODS is being used by the Flemish Supercomputing Centre (VSC) to implement a new research data management service tightly coupled with the VSC High Performance Computing infrastructure. The current status of the project as well as future plans will be presented.

Until 2018, the Tier-1 supercomputing infrastructure in Flanders was mainly targeted at users with heavy computational needs (typical HPC/HTC workloads). Although this platform in its current form is already very successful, the focus on compute no longer meets the needs of many researchers. More and more users have computational work that makes intensive use of large data sets, and migrating this data to and from the compute infrastructure whenever it is to be used for a calculation is very inefficient at that scale.

Therefore, VSC decided in 2018 to start a new service focused on research data management. The new Tier-1 Data service aims to allow users to store research data during the active phase of the research data life cycle, that is, data that is being collected and analysed and has not yet been published. This service is restricted to data of research projects that are using the VSC Tier-1 Compute infrastructure.

This Tier-1 Data service is based on iRODS, and its primary goal is to offer users a platform to easily manage research data and help them apply the FAIR principles to their research data from the very beginning of their projects. This should make it easier to transfer research data at the end of a project to institutional or domain-specific repositories for publication and preservation and, when applicable, ensure the data are made publicly available (open access). The platform should also help researchers run their scientific workflows more efficiently by providing tools to automate data collection and data quality control and to stage data to and from the Tier-1 Compute system.

The platform has recently started its pilot phase, during which a small number of research groups will be invited to build their research workflows using the new data service. The selected pilot projects come from different scientific domains (climate change studies, humanities and arts, biological research, life science, plasma astrophysics, …) and have a strong collaborative nature, spanning research groups at several Flemish universities; the new data service should facilitate the way they create, manage, share, and reuse research data.

Application of iRODS to NIEHS Data Management [slides] [video]
Mike Conway - NIEHS / NIH
Deep Patel - NIEHS / NIH
This will be a survey of current NIEHS data management strategy, in two parts. First will be an overview of data management challenges at NIEHS and the context in which we are employing iRODS, including developments in data governance policy, data sharing policy, knowledge management, LIMS (Laboratory Information Management Systems), standard workflow languages and pipelines, and cloud migration.

The second part will be a review of technology developments, including collaborative development of Metadata Templates, work on web interfaces, standard pluggable search integration, indexing, and developments in the GA4GH Cloud Work Stream.

It is anticipated that several releases of various code libraries will also be announced.

iRODS Client: NFSRODS 1.0 [slides] [paper] [video]
Kory Draughn - iRODS Consortium
Terrell Russell - iRODS Consortium
An update from last year's preview, this 1.0 release now provides multi-user support for NFSv4 ACLs by handling calls from nfs4_setfacl and nfs4_getfacl. It also supports sssd for easier AD/LDAP integration and secure connections to iRODS via SSL. Coupled with the Hard Links iRODS Rule Engine Plugin, NFSRODS 1.0 can provide a direct NFSv4.1 mount point to users in enterprise environments.

Presentations (June 11)



SPONSOR MESSAGE [video]
Jay Aikat - RENCI

iRODS Rule Engine Plugin: Hard Links 4.2.8.0 [slides] [video]
Kory Draughn - iRODS Consortium
Terrell Russell - iRODS Consortium
This new C++ rule engine plugin provides an iRODS system the ability to convey hard links to its users. An iRODS system stores a hard link when replicas of two different iRODS data objects with different logical paths share a common physical path on the same host. When this occurs, metadata is added to both logical data objects for bookkeeping. This talk will explain the original use cases for hard links in iRODS and introduce Conway Diagrams to help visualize the various corner cases.
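
A minimal sketch of inspecting that bookkeeping metadata with python-irodsclient; the attribute prefix in the filter is our assumption rather than the plugin's documented name, and the connection details are placeholders.

    # List the hard-link bookkeeping AVUs on a data object.
    from irods.session import iRODSSession

    with iRODSSession(host="irods.example.org", port=1247, user="rods",
                      password="...", zone="tempZone") as session:
        obj = session.data_objects.get("/tempZone/home/rods/linked_file")
        for avu in obj.metadata.items():
            if avu.name.startswith("irods::hard_link"):  # assumed prefix
                print(avu.name, avu.value, avu.units)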

Creating an iRODS zone with Terraform [slides] [video]
Brett Hartley - Wellcome Sanger Institute
A year ago, Sanger had two types of zones: production and development.

The development zones were originally for testing and development of both server and client components. Over time, these zones became more and more necessary for client-side testing, and server-side testing became increasingly limited. For the most part this was fine, because we didn't really need to do server-side testing of potentially breaking changes, since we hadn't upgraded in a while (most zones were on 4.1.12 at the time).

The decision was made that we should upgrade both the iRODS version and the operating system version to 4.2.7 and Ubuntu 18.04. This meant upgrading iRODS on over 100 machines, with minimal disruption to the services and the >9PB they serve. Part of any good upgrade process is testing on a suitable test infrastructure. Our objective was to produce an Infrastructure as Code template to create an iRODS zone, so that zones could be created whenever needed, in our OpenStack environment, freeing up resources when they were no longer required.

The end product has been used extensively in our upgrade testing and has proven to be a useful tool for other miscellaneous testing. Being able to stand up a new zone in minutes rather than days has also added two more types of zone: testing zones, which we spin up to test specific parts of iRODS (e.g., to produce simple reproducers for otherwise hard-to-find bugs), and demonstration zones, which have features that we are looking to add to development and production zones in the future.

Building a national Research Data Management (RDM) infrastructure with iRODS in the Netherlands [slides] [video]
Saskia van Eeuwijk - SURF
Hylke Koers - SURF
In the Netherlands, many universities are looking at iRODS to support their researchers, as they recognize the powerful potential of the tool in two areas: support for secure cooperation, and support over the entire research data life cycle. Unfortunately, support teams at universities are hesitant to introduce the tool for two reasons:
  • iRODS in itself is more suitable for IT power users
  • Supporting iRODS within a university requires specific knowledge.
SURF, a national organization providing IT support and infrastructure for universities, stepped in and is now working closely together with six universities towards a national RDM infrastructure based on iRODS.

SURF offers a hosted iRODS environment for all participating universities, creating possibilities for researchers without the need for universities to invest upfront. SURF also unburdens the universities by offering a hosted, supported environment. YODA, open source software created by Utrecht University (UU) on top of iRODS, is being used to also attract users with high demands for user-friendliness, thanks to a web interface designed to guide researchers through many steps of the data life cycle, from the ingestion of data to its publication. SURF, together with UU, offers support for the combined environments. The service is currently in a pre-production state, and the participating universities are already joining in the development of YODA and iRODS.

In the next two years, we hope to prove to the participating universities specifically, but also to the other universities in the Netherlands, that iRODS and YODA are useful RDM tools for a lot of researchers. In early 2022 we plan to expand the service and the cooperation. We hope that by then we can truly state that iRODS and YODA are an important part of the RDM infrastructure in the Netherlands.

In our presentation we want to focus on describing a case study of the use of iRODS, not for a specific research group, but for an entire nation working together on this iRODS-based infrastructure to enhance the support of its researchers.

iRODS at Bristol Myers Squibb: Status and Prospects. Leveraging iRODS for scientific applications in Amazon AWS Cloud [slides] [video]
Mohammad Shaikh - Bristol Myers Squibb
Oleg Moiseyenko - Bristol Myers Squibb
The iRODS practice at Bristol Myers Squibb is growing as we continue to use it as the primary system of record across several different scientific projects in multiple cloud environments. This presentation shares the latest updates on how Bristol Myers Squibb is leveraging iRODS to manage and enrich various datasets in the Amazon AWS Cloud. We will cover typical data flows and architectural patterns, as well as interesting approaches for how we manage AWS Lambda functions to update an iRODS Catalog with events that occur in one or more S3 buckets.

Keeping Pace with Science: the CyVerse Data Store in 2020 and the Future [slides] [video]
Tony Edgin - CyVerse / University of Arizona
Edwin Skidmore - CyVerse / University of Arizona
This talk will describe the current features of the CyVerse Data Store and plans for its evolution. Since its inception in 2010, the Data Store has leveraged the power and versatility of iRODS, continually extending the functionality of CyVerse's cyberinfrastructure. These features include project-specific storage, offsite replication, third-party service and application integrations, several data access methods, event stream publishing for indexing, and optimizations for accessing large sets of small files. Current efforts to enhance the Data Store include project-specific THREDDS Data Servers, S3 integration to allow bidirectional data flow between third-party storage and compute, and integration with CyVerse's Continuous Analysis platform, an event-driven container-native execution platform.

iRODS Logical Quotas Policy Plugin [slides] [video]
Jonathon Anderson - University of Colorado Research Computing
Kory Draughn - iRODS Consortium
Terrell Russell - iRODS Consortium
University of Colorado Research Computing uses iRODS to provision space in its PetaLibrary/archive research data storage service. This storage is implemented as top-level collections and is sold at a $/TB/year rate. In our experience on other platforms, implementing storage allocations with user and/or group quotas leads to confusion, particularly when individual users have access to multiple discrete storage allocations, as ownership metadata falls out of sync with the logical spatial hierarchy of the file system. To provide more logical quotas atop the iRODS collection hierarchy, the iRODS logical quotas policy plugin tracks the logical size of a collection (calculated as the total size of all data objects nested within it) as collection-level metadata that is consulted before and updated after I/O. This allows us to place a logical size limit on a collection, more closely matching our end users' expectations of how storage allocations should behave. This talk covers our deployment experience and details of the plugin implementation.
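
A toy Python model of that bookkeeping; the attribute names are illustrative stand-ins for the plugin's collection-level AVUs.

    # Compare a collection's tracked total against its configured maximum
    # before admitting a new write.
    MAX_ATTR = "quota::maximum_size_in_bytes"
    TOTAL_ATTR = "quota::total_size_in_bytes"

    def write_allowed(collection_avus, incoming_bytes):
        maximum = int(collection_avus[MAX_ATTR])
        total = int(collection_avus[TOTAL_ATTR])
        return total + incoming_bytes <= maximum

    avus = {MAX_ATTR: str(1024**4), TOTAL_ATTR: "897581056000"}  # 1 TiB cap
    print(write_allowed(avus, 50 * 1024**3))  # True: a 50 GiB put still fits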

iRODS Policy Composition: Principles and Practice [slides] [video]
Jason Coposky - iRODS Consortium
Terrell Russell - iRODS Consortium
Historically, a single static policy enforcement point, such as acPostProcForPut, was the sole location for all policy implementation. With the addition of a continuation code to the rule engine plugin framework, we may now configure multiple policies to be invoked for any given policy enforcement point. This allows for a separation of concerns and clean policy implementation. Policy developers now have the ability to separate the "when" (the policy enforcement points) from the "what" (the policy itself). How the policy is invoked becomes a matter of configuration rather than implementation.

Given this new approach, multiple policies can be configured together, or composed, without the need to touch the code. For example, the Storage Tiering capability is effectively a collection of several basic policies: Replication, Verification, Retention, and Violating Object Discovery. All of these policies are configured via metadata annotating root resources and, taken as a whole, provide a flexible system for automated data movement.
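
A toy Python model of the continuation mechanism; the real framework lives in C++ inside the iRODS server, and apart from the continuation code's name everything here is illustrative.

    # Each policy acts at the enforcement point and returns the continuation
    # code so the framework keeps invoking the next configured policy.
    CONTINUE = "RULE_ENGINE_CONTINUE"

    def replication_policy(event):
        print("replicating", event["path"])
        return CONTINUE

    def verification_policy(event):
        print("verifying", event["path"])
        return CONTINUE

    # The "when" is configuration: an ordered list bound to one PEP.
    PEP_POST_PROC_FOR_PUT = [replication_policy, verification_policy]

    def fire(policies, event):
        for policy in policies:
            if policy(event) != CONTINUE:
                break  # a policy consumed the event; stop the chain

    fire(PEP_POST_PROC_FOR_PUT, {"path": "/tempZone/home/rods/data.csv"})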

iRODS Client: AWS Lambda Function for S3 1.0 [slides] [paper] [video]
Terrell Russell - iRODS Consortium
Under development for less than six months, this new AWS Lambda function updates an iRODS Catalog with events occurring in one or more S3 buckets. Files created, renamed, or deleted in S3 appear quickly in iRODS.

The following AWS configurations are supported with the 1.0 release:
  • S3 -> Lambda -> iRODS
  • S3 -> SNS -> Lambda -> iRODS
  • S3 -> SQS -> Lambda -> iRODS
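
A minimal sketch of what such a handler does, written with python-irodsclient; the hostnames, logical paths, and resource name are placeholders, and this is not the published function's actual code.

    # Translate S3 bucket events into iRODS catalog updates.
    import urllib.parse
    from irods.session import iRODSSession

    def lambda_handler(event, context):
        with iRODSSession(host="irods.example.org", port=1247, user="rods",
                          password="...", zone="tempZone") as session:
            for record in event["Records"]:
                bucket = record["s3"]["bucket"]["name"]
                key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
                logical = "/tempZone/home/s3/{0}/{1}".format(bucket, key)
                if record["eventName"].startswith("ObjectCreated"):
                    # register the object in place; no data is copied
                    session.data_objects.register(
                        "/{0}/{1}".format(bucket, key), logical,
                        rescName="s3Resc")
                elif record["eventName"].startswith("ObjectRemoved"):
                    session.data_objects.unregister(logical)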

Lightning Talk - iRODS / Globus Partnership Announcement [slides] [video]
Vas Vasiliadis - Globus
Jason Coposky - iRODS Consortium

Lightning Talk - Development Plan for iRODS Kubernetes Storage Driver [slides] [video]
Illyoung Choi - CyVerse / University of Arizona

Lightning Talk - A Demo of irods/irods_demo [video]
Alan King - iRODS Consortium

Lightning Talk - Upgrading iRODS from 4.1.12 to 4.2.7: Re-live the thrills and spills of an iRODS Administrator! [video]
John Constable - Wellcome Sanger Institute

Lightning Talk - Ansible Modules for iRODS using python-irodsclient [video]
John Xu - CyVerse / University of Arizona

Lightning Talk - More Transport, Please! [slides] [video]
Kory Draughn - iRODS Consortium

Lightning Talk - irods-fish [video]
Tony Edgin - CyVerse / University of Arizona

Lightning Talk - Using iRODS as an entry point to VITAM for long-term data preservation [slides] [video]
Samuel Viscapi - CINES

Lightning Talk - CyVerse Continuous Analysis: Even a cave man can do it! [video]
Calvin McLean - CyVerse / University of Arizona

Lightning Talk - The delay server rewrite: A tour of query_processor [video]
Alan King - iRODS Consortium

Closing Remarks - Call to Action [video]
Nirav Merchant - CyVerse / University of Arizona