iRODS User Group Meeting 2023


Chapel Hill, NC
Hosted by iRODS Consortium
June 13 - June 16, 2023

Original Agenda (pdf)

Group Photo

Videos

Conference Videos hosted on YouTube

Articles

iRODS UGM 2023 Proceedings (PDF)

  • iRODS HTTP API
    Kory Draughn, Terrell Russell - iRODS Consortium
  • Towards rich and standardized metadata in iRODS
    Mariana Montes and Paul Borgermans - KU Leuven
  • iRODS S3 API: Presenting iRODS as S3
    Terrell Russell, Violet White - iRODS Consortium
  • GenQuery2: A more standardized, powerful parser for the iRODS namespace
    Kory Draughn, Terrell Russell - iRODS Consortium

Presentations (June 14)


iRODS UGM 2023 Keynote - Data Science at RENCI [slides] [video]
Ashok Krishnamurthy - Renaissance Computing Institute (RENCI)

iRODS Consortium Update [slides] [video]
Terrell Russell - iRODS Consortium

iRODS Technology Update [slides] [video]
Kory Draughn - iRODS Consortium

iBridges: A comprehensive way of interfacing with iRODS [slides] [video]
Christine Staiger - Utrecht University
Tim van Daalen - Wageningen University
John Mc Farland - University of Groningen
iRODS is a rich middleware that provides the means to facilitate research data management. It implements all the necessary concepts, such as resources, metadata, permissions, and rules. However, many of these concepts are still new to researchers. As a result, researchers and their support staff struggle with the current interfaces and tools, which require them to 1) learn about those concepts and 2) familiarise themselves with the different APIs and command-line interfaces. This steep learning curve for researchers and research supporters slows down the adoption of iRODS. To ease the use of iRODS, we present iBridges.

iBridges is a standalone desktop application, written in Python, that provides users of Windows, Linux, and macOS with a graphical user interface (GUI) for interacting with iRODS servers. The tool is agnostic to any rules or policies on the server. Out of the box, iBridges supports three main functions: browsing and manipulating data objects, uploading and downloading data, and searching data collections.

Research data management is still an evolving field for which new tools are constantly being developed. To make it easy to create workflows that employ other services and to integrate them with data managed in iRODS, we kept the code as modular and simple as possible. This also makes it possible to add new features as the understanding and development of research data management progress.

We demonstrate how such an integration works for an electronic lab notebook and an audio transcription tool.

10 years at CyVerse: Some iRODS Administration Practices [slides] [video]
Tony Edgin - CyVerse / University of Arizona
iRODS is extremely flexible in its configuration. Furthermore, it is not an opinionated system. This makes it powerful, but it also makes it difficult to manage. During this talk I will present a few practices and lessons I've learned over the last ten years maintaining CyVerse's iRODS grid. These practices will include how to decommission a storage server with no or minimal downtime, how to asynchronously replicate data to an off-site resource server, and how to transfer large sets of small files more quickly. Afterwards, I will offer to organize an interest group that would meet periodically to discuss best practices for iRODS administration.

ManGO: A web portal and framework built on top of iRODS for active research data management [slides] [video]
Paul Borgermans, Mariana Montes, and Ingrid Barcena Roig - KU Leuven
At the University of Leuven, Belgium, we are building the infrastructure and software layers to leverage iRODS as a major building block in active research data management. This involves various workflows and the processing of data and metadata during the lifetime of a research project. One of the important components is a modular and adaptable web portal built using the iRODS Python client. Given the wide range of use cases, the web framework employs some classical architectural patterns to decouple specialised, domain-specific needs from the core system. It also has features that make it behave like a content management system, including a (view) template override system that makes the representation of collections and data objects depend on, for example, specific metadata or collection structure. Metadata is a prime focus and steers many aspects of this framework, in line with its core use for research data, and considerable effort was also put into a user-friendly metadata schema management system. In this talk, we will present the current status as well as near-future plans.

iRODS Object Store on Galaxy Server: Application of iRODS to a Real Time, Multi-user System [slides] [video]
Kaivan Kamali, Nate Coraor, John Chilton, Marius van den Beek, and Anton Nekrutenko - Penn State University
Galaxy (https://galaxyproject.org) is an open-source platform for data analysis that enables users to 1) use tools from various domains through its graphical web interface, 2) run code in interactive environments such as Jupyter or RStudio, 3) manage data by sharing and publishing results, workflows, and visualizations, and 4) ensure reproducibility by capturing the information necessary to repeat data analyses.

To store data, Galaxy uses the ObjectStore as its data virtualization layer, which abstracts Galaxy's domain logic from the underlying data persistence technology. Currently, Galaxy mainly uses a Disk ObjectStore for data persistence. To extend Galaxy's data persistence capabilities, we previously extended Galaxy's ObjectStore to support iRODS. In this work, we discuss the steps in deploying the iRODS Object Store on the USA-based Galaxy server (usegalaxy.org) and the challenges we faced. To the best of our knowledge, after CyVerse (https://cyverse.org/about), this is one of the few applications of iRODS to a real-time, multi-user system.

iRODS HTTP API [slides] [paper] [video]
Kory Draughn and Terrell Russell - iRODS Consortium
The iRODS Protocol has remained relatively static for more than 20 years. This is a testament to its original planning, but also means any redesign would carry a heavy upgrade and migration cost. Additionally, the protocol is novel to most developers and differs in implementation across client programming languages which hurts both approachability and adoption. This talk covers the design, implementation, and early performance results of a new HTTP API for interacting with iRODS.

An update on Yoda: Using iRODS to manage data throughout your research [slides] [video]
Lazlo Westerhof - Utrecht University
The landscape of research data management can present challenges to researchers seeking to manage, share, and publish their work. Since 2014, Utrecht University has been addressing these challenges through the development of Yoda, a research data management system designed to help researchers securely deposit, describe, share, publish, and preserve large amounts of research data in compliance with the FAIR principles during all phases of a research project.

Yoda was previously presented at the iRODS user group meeting in 2018. Since then, it has undergone significant development regarding workflows, metadata editing, asynchronous processes, plugins, and APIs. Yoda has been continuously improved and has been deployed in our institutes for several years. It is currently used by more than 10,000 researchers and students and manages over 3 petabytes of data. Additionally, Yoda is publicly available as open-source software with a permissive license.

This session will explore the evolution and progression of Yoda and its continued integration with the iRODS platform over the past five years. We will discuss Yoda design principles, new features and how they are implemented in iRODS.

The iRODS CLI we deserve [slides] [video]
Derek Dong, Kory Draughn, and Terrell Russell - iRODS Consortium
The current iRODS iCommands are the culmination of many years of effort, but they are beginning to show their age, especially in terms of design and extensibility. We aim to create a brand-new CLI that focuses on using modern libraries (iRODS or otherwise) and modern C++, on being extensible and modular, and on providing a single binary rather than ~50. This talk will cover the current plans and progress toward this effort.

GoCommands: A cross-platform command-line client for iRODS [slides] [video]
Illyoung Choi, Edwin Skidmore, and Nirav Merchant - CyVerse / University of Arizona
The diversity of scientific computing platforms has increased significantly, ranging from small devices like the Raspberry Pi to large computing clusters. However, accessing iRODS data on these varied platforms remains a common but challenging requirement. The official command-line tool for iRODS, the iCommands, is limited to a few platforms, such as CentOS 7 and Ubuntu 18/20. As a result, users on other platforms, such as macOS, Windows, and Raspberry Pi OS, have no straightforward, performant means of accessing iRODS.

GoCommands is another command-line tool for iRODS, designed to address the portability issues of the iCommands. Because it is written in the Go programming language, building executables for diverse platforms is straightforward. The tool is a single executable that does not require installing any dependencies. Pre-built binaries for macOS, Linux (any distribution), and Windows are already available, regardless of CPU architecture. In addition, the tool does not require elevated privileges to install or run. This makes it possible for users on nearly any platform to access iRODS.

One noteworthy new command introduced in GoCommands is 'bput', which enables efficient uploading of many small files. GoCommands also reimplements the 'put', 'get', and 'sync' commands of the iCommands. By default, GoCommands transfers data in parallel, which greatly improves the performance of accessing iRODS from various platforms. In the CyVerse Discovery Environment, GoCommands achieved 127 MB/s for upload and 134 MB/s for download when accessing the CyVerse Data Store.

Additionally, GoCommands can work with the iCommands configuration file. We will provide a demo of how to install, configure, and use GoCommands.

GoCommands is currently deployed in several research projects for managing data. In this talk, we will present how the MagAO-X astronomy project and the Open Forest Observatory project manage data using GoCommands.

We expect GoCommands to allow researchers using diverse computational platforms to readily manage their data.

Presentations (June 15)


Authentication in iRODS 4.3: Investigating OAuth2 and OpenID Connect (OIDC) [slides] [video]
Martin Flores - iRODS Consortium
This talk will provide an overview and demonstration of exploratory work with OAuth 2.0, OpenID Connect, and the new iRODS HTTP API. A successful proof of concept will show the community how iRODS integrations with other authentication services may be best handled in the future. Feedback and insights are welcome.

RSpace + iRODS: Update: Plans and Opportunities [slides] [video]
Rory Macneil - Research Space
Terrell Russell - iRODS Consortium
This presentation will include an overview and brief history of the RSpace + iRODS integration, a description of the next phase of development to expose metadata in RSpace to iRODS, and an outline of the vision of RSpace + iRODS as a unifying element in Research Commons and other research infrastructures. The last part will use the Digital Research Alliance of Canada's Research Commons and EOSC's EUDAT Collaborative Data Infrastructure as examples.

Towards rich and standardized metadata in iRODS [slides] [paper] [video]
Mariana Montes and Paul Borgermans - KU Leuven
Metadata is a crucial feature for managing and finding data in iRODS, especially when it is used systematically. However, manual entry of metadata is prone to errors, from typos to inconsistencies in case, spelling, and format. To tackle this issue, we have developed a "metadata schema" management tool with which an iRODS user can design a form for the systematic application of a specific metadata schema. This form consists of a collection of fields of different types, ranging from scalar input fields through multiple-choice fields to composite fields. When a user adds metadata using this schema, they get a form with validation, which can check the format or the allowed values. The iRODS attribute names of metadata inserted via a schema follow a namespacing pattern that includes the schema identifier as a prefix. In addition, composite fields (nested schemas) make it possible to build a hierarchical structure, such as the name and contact information of a person. In this case, the components are namespaced with a combination of the schema name and the composite field name.
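The namespacing described above can be sketched in Python. This is a minimal illustration under stated assumptions, not ManGO's implementation: the `.` separator and the helper names `namespaced_attribute` and `flatten` are hypothetical, and the abstract does not say how multiple-choice values are stored (the sketch assumes one attribute-value pair per selected value).

```python
from typing import Dict, List, Tuple

# Hypothetical separator: the abstract states only that the schema identifier
# is used as a prefix, not which character joins the parts of the name.
SEP = "."


def namespaced_attribute(schema_id: str, *field_path: str) -> str:
    """Build an attribute name from the schema prefix and the (nested) field names."""
    if not schema_id or not field_path:
        raise ValueError("need a schema identifier and at least one field name")
    return SEP.join((schema_id, *field_path))


def flatten(schema_id: str, values: Dict[str, object],
            prefix: Tuple[str, ...] = ()) -> List[Tuple[str, str]]:
    """Turn a (possibly nested) form submission into (attribute, value) pairs."""
    pairs: List[Tuple[str, str]] = []
    for name, value in values.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            # Composite field: recurse so the nested field names join the namespace.
            pairs.extend(flatten(schema_id, value, path))
        elif isinstance(value, list):
            # Multiple-choice field (assumed): one pair per selected value.
            attr = namespaced_attribute(schema_id, *path)
            pairs.extend((attr, str(v)) for v in value)
        else:
            pairs.append((namespaced_attribute(schema_id, *path), str(value)))
    return pairs


print(flatten("book", {
    "title": "Data Stewardship",
    "keywords": ["irods", "fair"],
    "author": {"name": "Ada", "email": "ada@example.org"},
}))
```

For the sample form, the attributes produced are `book.title`, `book.keywords` (once per selected choice), `book.author.name`, and `book.author.email`; each pair could then be attached to a collection or data object as an iRODS AVU.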

The metadata schema itself is stored in JSON format, which can also be used to import and export its contents. Moreover, a lifecycle was designed so that only stable schemas can be used for metadata annotation while the schema can still evolve into new versions. Concretely, a schema can have multiple versions, including a draft that can be edited, a "published" version to be used for annotation, and any "archived" versions, which can no longer be used.
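The schema lifecycle can likewise be modelled as a small state machine. The sketch below goes beyond what the abstract states by assuming that a schema has at most one draft and one published version at a time, and that publishing a new draft archives the previously published version; the class and method names and the JSON layout are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SchemaVersion:
    number: int
    status: str  # "draft", "published" or "archived"
    fields: Dict[str, dict]


@dataclass
class Schema:
    name: str
    versions: List[SchemaVersion] = field(default_factory=list)

    def _find(self, status: str) -> Optional[SchemaVersion]:
        # Assumption: at most one draft and one published version at a time.
        return next((v for v in self.versions if v.status == status), None)

    def new_draft(self, fields: Dict[str, dict]) -> SchemaVersion:
        if self._find("draft") is not None:
            raise ValueError("a draft already exists; edit it instead")
        draft = SchemaVersion(len(self.versions) + 1, "draft", fields)
        self.versions.append(draft)
        return draft

    def edit(self, fields: Dict[str, dict]) -> None:
        # Only the draft is mutable; published and archived versions are frozen.
        draft = self._find("draft")
        if draft is None:
            raise ValueError("only a draft version can be edited")
        draft.fields = fields

    def publish(self) -> SchemaVersion:
        draft = self._find("draft")
        if draft is None:
            raise ValueError("there is no draft to publish")
        current = self._find("published")
        if current is not None:
            current.status = "archived"  # assumed: publishing retires the old version
        draft.status = "published"
        return draft

    def for_annotation(self) -> SchemaVersion:
        # Only the published version may be used to annotate data.
        published = self._find("published")
        if published is None:
            raise ValueError("this schema has no published version")
        return published

    def export_json(self, number: int) -> str:
        v = next(v for v in self.versions if v.number == number)
        return json.dumps({"schema": self.name, "version": v.number,
                           "status": v.status, "fields": v.fields})
```

Keeping exactly one editable draft separate from the frozen published version is what lets annotations rely on a stable schema while a new version is being prepared.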

Via this tool, we expect users to be able to design metadata schemas of varying complexity for their whole team to apply systematically. This will increase the uniformity, and thus the usability, of metadata, and can be used to enforce the inclusion of metadata on certain collections or data objects.

Integrating iRODS with Project Eureka and Open OnDemand [slides] [video]
Boyd Wilson and David Reynolds - Omnibond
This talk will discuss how we are integrating iRODS into our next evolution of CloudyCluster called Project Eureka. Part of Eureka is a project-based interface in Open OnDemand which will include a storage management UI built to work directly with iRODS. We will show a demonstration of the work in progress and request feedback from the community.

rirods: An R client for iRODS [slides] [video]
Martin Schobben - Vienna University of Technology
Mariana Montes - KU Leuven
Christine Staiger - Utrecht University
Terrell Russell - iRODS Consortium
In this talk, we present a new client for iRODS: the R package rirods. In contrast to its predecessor (Chytracek et al. 2015), this package is written in pure R (rather than C++) and transfers data over HTTP by communicating with the iRODS REST API.

R is a very popular language in data science, and we expect that many R users who are not familiar with Python or the command line will benefit from interacting with iRODS through this package. We will showcase the package's main functionality and our plans for the future. Crucially, we offer equivalents of the iCommands 'iput', 'iget', 'ils', 'imeta', and a few others, as well as rirods-specific functions that allow the user to stream data directly between memory and iRODS without staging files locally.

iRODS S3 API: Presenting iRODS as S3 [slides] [paper] [video]
Terrell Russell and Violet White - iRODS Consortium
S3 has taken over the storage world for a number of good reasons. Many software libraries, tools, and applications now read and write the S3 protocol directly. This talk describes a new iRODS client API that presents the iRODS namespace as S3. It will discuss the requirements, the design, the initial implementation, and future work.

Using iRODS Rules to Automate Trash Management Policy [slides] [video]
Urvika Gola - CyVerse / University of Arizona
Best Student Technology Award Winner
Effective trash removal policies are essential for data storage and management. This presentation will share a solution that harnesses microservices and dynamic policy enforcement points (PEPs) in iRODS, implemented as rule logic, for efficient, automated trash management of both data objects and collections. Data can be moved to trash through a variety of methods, and dynamic PEPs offer the flexibility to make informed policy decisions for each approach. By implementing all four varieties of dynamic PEP (pre, post, except, and finally), we ensure that our trash management system handles the various data movement techniques efficiently. We will provide an in-depth exploration of the dynamic PEPs and microservices used in our solution, elaborating on their implementation, their advantages, and the challenges we have overcome. By the end of this presentation, we aim to give attendees insight into the power of iRODS microservices and dynamic PEPs, so that they can leverage this knowledge to streamline their own data management.
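To illustrate how the four dynamic PEP varieties relate to one another, the following Python sketch models them as hooks fired around an operation. It is a conceptual simplification, not iRODS rule code and not the speaker's implementation: `run_with_peps` and the hook names are hypothetical, although the `_pre`/`_post`/`_except`/`_finally` suffixes mirror how iRODS names its dynamic PEPs. In this model, `pre` runs first, `post` only after success, `except` only on failure, and `finally` in every case.

```python
from typing import Callable, Dict, List

Hook = Callable[[], None]


def run_with_peps(op_name: str, operation: Hook,
                  hooks: Dict[str, Hook], log: List[str]) -> None:
    """Conceptual model: fire optional hooks named <op_name>_<variety> around an operation."""

    def fire(variety: str) -> None:
        name = f"{op_name}_{variety}"
        hook = hooks.get(name)
        if hook is not None:
            log.append(name)  # record which hook fired, in order
            hook()

    try:
        fire("pre")       # before the operation; a failure here skips it
        operation()
    except Exception:
        fire("except")    # the pre hook or the operation failed
        raise
    else:
        fire("post")      # the operation succeeded
    finally:
        fire("finally")   # always runs, on success or failure


# Hypothetical "unlink" operation with a no-op hook for every variety.
hooks = {f"unlink_{v}": (lambda: None) for v in ("pre", "post", "except", "finally")}
log: List[str] = []
run_with_peps("unlink", lambda: None, hooks, log)
print(log)
```

In a trash policy built on this pattern, the `except` hook is a natural place to handle a move to trash that did not complete, while the `finally` hook suits bookkeeping that must happen regardless of the outcome.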

Updates on iRODS Data Repository Service Adapter [slides] [video]
Mike Conway and Deep Patel - NIEHS / NIH
The GA4GH Data Repository Service (DRS) standard is part of a family of standards for distributed, federated data analysis. Using standard workflow languages such as WDL, CWL, and Nextflow, these standards allow workflows to dispatch containerized tasks to run at appropriate locations, including across cloud providers and on-prem compute environments. The DRS standard provides an abstraction over distributed data sources, allowing these workflow tasks to authorize data access and access underlying data sets.

A DRS implementation over iRODS allows the iRODS data grid to expose data to this federated analysis ecosystem. The Federated Analysis System Project (FASP) components represent a formalization of the iRODS 'compute to data' pattern for the important genomics and health community.

GenQuery2: A more standardized, powerful parser for the iRODS namespace [slides] [paper] [video]
Kory Draughn and Terrell Russell - iRODS Consortium
The iRODS GenQuery interface has long defined the way users and administrators can search the iRODS namespace, its storage systems, users, and metadata, while honoring the iRODS permission model. The next generation of GenQuery, GenQuery2, is now available for experimentation. However, there is still a lot of work to do. This talk will cover its expanded syntax and capabilities and what is to come.

Lightning Talk - What to tell about RDM to whom? [slides] [video]
Ander Astudillo - SURF

Lightning Talk - Yoda and RSpace deployment at the Leibniz Institute on Aging [slides] [video]
Rory Macneil - Research Space
Lazlo Westerhof - Utrecht University

Lightning Talk - Teaching old dog new tricks: Fun with iRODS at CyVerse 2023 edition [video]
Nirav Merchant - CyVerse / University of Arizona

Lightning Talk - Beyond Data Management with Globus [slides] [video]
Vas Vasiliadis - Globus

Lightning Talk - Azure as native storage plugin (Like S3) [slides] [video]
Jan Graaf - Netherlands Cancer Institute (NKI)

Lightning Talk - What would we like from a Rust iRODS client library? [video]
Phillip Davis - iRODS Consortium

Lightning Talk - Let's bring light to dark data with iRODS: Come write a new proposal with CyVerse [video]
Nirav Merchant - CyVerse / University of Arizona

Lightning Talk - Dataverse integration dashboard: pulling data from iRODS [slides] [video]
Ingrid Barcena Roig - KU Leuven