Documentation


Overviews and Summaries


Training Materials


Blog Posts


Technical Documentation


Communication with the iRODS Community


Common Citations (iRODS 4.x)

Hao Xu, Ben Keller, Antoine de Torcy, Jason Coposky (2016) QueryArrow: Bidirectional Integration of Multiple Metadata Sources. 8th iRODS User Group Meeting, University of North Carolina at Chapel Hill. June 2016. (PDF)

Reagan W. Moore, Hao Xu, Mike Conway, Arcot Rajasekar, Jon Crabtree, Helen Tibbo (2016) Trustworthy Policies for Distributed Repositories. 133pp. (publisher)

Hao Xu, Jason Coposky, Ben Keller, Terrell Russell (2015) Pluggable Rule Engine Architecture. 7th iRODS User Group Meeting, University of North Carolina at Chapel Hill. June 2015. (PDF)

Hao Xu, Jason Coposky, Dan Bedard, Jewel H. Ward, Terrell Russell, Arcot Rajasekar, Reagan Moore, Ben Keller, Zoey Greer (2015) A Method for the Systematic Generation of Audit Logs in a Digital Preservation Environment and Its Experimental Implementation In a Production Ready System. 12th International Conference on Digital Preservation, University of North Carolina at Chapel Hill. November 2-6, 2015. (PDF) (direct link)

Terrell Russell, Jason Coposky, Harry Johnson, Ray Idaszak, Charles Schmitt (2013) iRODS Composable Resources. 5th iRODS User Group Meeting, University of North Carolina at Chapel Hill. June 2013. (PDF)

Reagan Moore, Arcot Rajasekar, Hao Xu (2015) DataNet Federation Consortium Preservation Policy Toolkit. 12th International Conference on Digital Preservation, University of North Carolina at Chapel Hill. November 2-6, 2015. (PDF) (direct link)

Arcot Rajasekar, Terrell Russell, Jason Coposky, Antoine de Torcy, Hao Xu, Michael Wan, Reagan W. Moore, Wayne Schroeder, Sheau-Yen Chen, Mike Conway, Jewel H. Ward (2015) The integrated Rule-Oriented Data System (iRODS 4.0) Microservice Workbook. 248pp. (PDF) (amazon)

Presentations

  • iRODS User Group Meeting 2016: June 2016

    The iRODS Consortium in 2016 [slides] [video]
    Jason Coposky, iRODS Consortium

    iRODS 4.2 Overview [slides] [video]
    Terrell Russell, iRODS Consortium

    Auditing with the Pluggable Rule Engine [slides] [video]
    Terrell Russell, iRODS Consortium
    Abstract
    This short talk demonstrates applications of the pluggable rule engine in auditing.

    A Geo-Distributed Active Archive Tier [slides] [video]
    Earle Philhower, III, Western Digital
    Abstract
    iRODS makes it incredibly easy to preserve and share data generated by researchers, but as data volumes increase, the cost of maintaining all that data on primary storage becomes prohibitive. We present an advanced architecture that enables long-term data retention and high availability in iRODS using a two-tiered design composed of primary storage and a geographically distributed, object store-based, HGST Active Archive System for easy active archival. This model employs an existing high-performance (expensive) primary storage system, coupled with an affordable, ultra-high capacity HGST Active Archive System back-end. Thanks to the iRODS abstraction layer and rules engine, this data tiering is automated and completely transparent to end users. We will discuss the solution architecture, provide a brief description of active archives in general and the HGST Active Archive System specifically, including its synchronous geographic replication capabilities, and present performance statistics for the available archived data.

    Testing Object Storage Systems with iRODS at Bayer [slides] [video]
    Othmar Weber, Bayer Business Systems

    Advancing the Life Cycle of iRODS for Data [slides] [video]
    David Sallack, Panasas

    Having it Both Ways: Bringing Data to Computation & Computation to Data with iRODS [slides] [video]
    Nirav Merchant, University of Arizona

    Integrating HUBzero and iRODS [slides] [video]
    Rajesh Kalyanam, Purdue University
    Abstract
    Geospatial data is now increasingly used with tools in diverse fields such as agronomy, hydrology and sociology to gain a better understanding of scientific data. Funded by the NSF DIBBS program, the GABBS project seeks to create reusable building blocks aiding researchers in adding geospatial data processing, visualization and curation to their tools. GABBS leverages the HUBzero cyberinfrastructure platform and iRODS to build a web-based collaborative research platform with enhanced geospatial capabilities. HUBzero is unique in its availability of a rapid tool development kit that simplifies web-enabling existing tools. Its support for dataset DOI association enables citable tool results. In short, it provides a seamless path from data collection, to simulation and publication and can benefit from iRODS data management at each step. Scientific tools often require and generate metadata with their outputs. Given the structured nature of geospatial data, automatic metadata capture is vital in avoiding repetitive work. iRODS microservices enable this automation of data processing, metadata capture and indexing for searchability. They also allow for similar offline ingestion of external research data. The iRODS Fuse filesystem mounts directly onto the hub, enabling tools to refer to local file paths, simplifying development. We will discuss this work of integrating iRODS with HUBzero in the GABBS project and share our experience and lessons learned with the iRODS user community.

    iRODS Data Integration with CloudyCluster Cloud-Based HPC [slides]
    Boyd Wilson, Omnibond
    Abstract
    This talk, with interactive Q&A, is presented in anticipation of integrating iRODS with CloudyCluster to add simplified data management to CloudyCluster’s easy, self-service, on-demand, public, cloud-based HPC provisioning. An overview of CloudyCluster will be provided with goals of the pending integration. We will also seek feedback from the community to help direct the integration. The end goal is to provide advanced computational and data management resources to the long tail of science and those without easy access to computational resources. This presentation will focus on our efforts to develop a comprehensively secure cyberinfrastructure including iRODS, addressing issues from the datacenter level through to iRODS auditing, to provide a perspective on the effort required and the areas of most concern when developing secure infrastructure.
    The challenges of using iRODS to support a broad community with data privacy levels ranging from HIPAA to open access will be discussed, and techniques for data segregation and auditing will be presented, to address a range of potential use cases. We will also present on the policies and rules used to support data management generally and HIPAA specifically in distributed iRODS installations.

    Getting R to talk to iRODS
    Bernhard Sonderegger, Nestlé Institute of Health Sciences
    Abstract
    The R language is an environment with a large and highly active user community in the field of data science. At NIHS we have developed the R-irods package, which allows user-friendly access to iRODS data objects and metadata from the R language. Information is passed to the R functions as native R objects (e.g. data frames) to facilitate integration with existing R code and to allow data access using standard R constructs.
    To maximize performance and maintain a simple architecture, the implementation relies heavily on the icommands C++ code wrapped using Rcpp bindings. The R-irods package has been engineered to have semantics equivalent to the icommands and can easily be used as a basis for further customization. At the NIHS we have created an ontology-aware package on top of R-irods to ensure consistent metadata annotations and to facilitate query construction.

    Davrods, an Apache WebDAV Interface to iRODS [slides] [video]
    Chris Smeele & Ton Smeele, Utrecht University
    Abstract
    Utrecht University has developed a WebDAV-compliant interface to iRODS 4.1 to facilitate drag-and-drop movement of data in and out of iRODS 4 using an operating system’s native interface. The presentation highlights the solution’s design principles and the resulting architecture.
    Davrods builds on Apache’s mod_dav capabilities. We will share benchmark data that we have collected and conclude with a demonstration.

    iRODS 4.3 [slides]
    Terrell Russell, iRODS Consortium

    Bidirectional Integration of Multiple Metadata Sources [slides] [PDF]
    Hao Xu, DICE Group
    Abstract
    We describe a generalized query language that allows us to integrate multiple types of data sources. The language provides both query and update, and customizable data policy. We demonstrate its use in iRODS, with applications such as graph-based metadata, indexing, and metadata access. We also show it can be proven that a relational database and a graph database provide the same behavior.

    DFC architecture & An iRODS Client for Mobile Devices [slides]
    Jonathan Crabtree, Odum Institute
    Mike Conway, DICE Group
    Matthew Krause, DICE Group
    Abstract
    In a collaboration between CyVerse, DataNet Federation Consortium, and Odum Institute developers, the CyVerse Discovery Environment has been ported as a general infrastructure to support data-driven research. This is the first step towards a broader community effort to standardize and adopt these tools, extending iRODS as a full-service data management and computation environment.
    The session will include a review of recent extensions to Jargon to power Virtual Collections, Metadata Templates, and other facilities that are powering DataNet Federation interfaces.

    NFS-RODS: A Tool for Accessing iRODS Repositories via the NFS Protocol [slides] [video]
    Danilo Oliveira, UFPE
    Abstract
    Data center and data evolution has been dramatic in the last few years with the advent of cloud computing and the massive increase of data due to the Internet of Everything. The Integrated Rule-Oriented Data System (iRODS) helps in this changing world by virtualizing data storage resources regardless of the location where the data is stored.
    This paper presents a tool implemented for accessing iRODS repositories through the NFS protocol. The tool integrates NFS with the iRODS server so that common operating system commands can be run on a remote iRODS repository via the NFS protocol.

    MetaLnx: An Administrative and Metadata UI for iRODS [slides] [video]
    Stephen Worth, EMC

    Academic Workflow for Research Repositories [slides]
    Randy Splinter, DDN
    Abstract
    Traditionally, the sharing and retention of research data has been a contentious issue. Sharing data over WANs has been limited by the available storage technologies. NAS solutions, while excellent for sharing data over a LAN, have never had the same success over WANs. The successful implementation of object storage solutions has opened the door to sharing data over WAN links. Coupling that ability to share objects over a WAN with middleware like iRODS gives the research community more stringent controls over the data, including:
    • Better control of ACLs, including:
      • Implementing data retention policies to meet regulatory requirements
      • Loss of IP due to faulty loss
    • Virtualization of multiple storage silos under a single namespace
    • Extensive metadata tags and searching of those tags
    • Extensible rules engine to implement functionality such as:
      • HSM-style functionality between storage devices
      • Data migration based upon set criteria
    Some of the advantages of this approach include:
    • Ease of administration: once rules are tested and in place, the system can be managed with a minimum of administrative overhead
    • Automating workflows to guarantee consistency and reproducibility in the science that is produced
    • Ease of auditing for both usage and back charging and for maintaining adequate data security compliance
    • Using storage platforms like DDN WOS, remote replication becomes simple and provides a straightforward way.

    Application of iRODS Metadata Manager for Cancer Genome Analysis Workflow [slides]
    Lech Nieroda, University of Cologne
    Abstract
    NGS is an increasingly cost-efficient and reliable method to provide whole genomes or exomes in a relatively short time.
    The massive amounts of resulting data pose challenges during various stages of its lifecycle: organizing and storing input data, high-throughput processing and analysis in an HPC cluster, and effective reviewing and secure sharing of the results. Traditional file systems quickly meet their limits when content-based metadata handling is required.
    As a computing center that has been driving NGS workflows for many years, we are constantly looking for solutions to optimize these workflows to maximize output and quality. We have decided to use iRODS, a comprehensive data management system that allows customized metadata attributes, fine-grained protection rules, and a query system to quickly organize and review the results.
    In this paper we describe our design and experiences with the integration of iRODS with an automated pipeline, which was developed as part of our participation in the BMBF-funded project SMOOSE to optimize workflows relevant to cancer studies for clinical use. The workflow was taken from the Department of Translational Genomics of the University of Cologne. The focus of the workflow lies in sequencing and analysis of cancer genomes with the goal of identifying novel and potentially clinically relevant alterations. The gained insights can lead to personalized therapy with higher efficacy and reduced toxicity.

    Status and Prospects of Kanki: An Open Source Cross-Platform Native iRODS Client Application [slides] [video]
    Ilari Korhonen, University of Jyväskylä
    Abstract
    The current state of development of project Kanki is discussed, and some prospects for its future development are laid out. Kanki is an open source cross-platform native iRODS client application which was introduced to the iRODS community at the 7th Annual iRODS Users Group Meeting in 2015. Afterwards, project Kanki was released as open source under a 3-clause BSD license in September 2015. Since then 9 releases have been made, of which the latest 6 have been available, in addition to the source code, as pre-built binary packages for x86-64 CentOS Linux 6/7 and OS X 10.10+. The Kanki build environment at the University of Jyväskylä runs on Jenkins continuous integration for both previously mentioned platforms, and the Linux builds are currently being executed in disposable containers instantiated from pre-built Docker images. This provides an excellent framework for (regression) testing of the client suite. The immediate goals of development to be discussed are: stability, testing, ease of install and use, and a complete iRODS basic feature set for graphical icommands alternatives. The prospects for more advanced future developments to be discussed are: a fully extensible modular metadata editor with pluggable attribute editor widgets, a fully extensible modular search user interface with pluggable condition widgets, and data grid analytics and visualization with VTK integration.

    DAM Secure File System [slides] [video]
    Paul Evans, Daystrom Technology Group

    iRODS Feature Requests and Discussion [PDF]
    Reagan Moore, DICE Group

    Previous years' presentations will be available soon...

Additional Presentations


Papers and White Papers