Kenan Center, 300 Kenan Center Drive
Chapel Hill, North Carolina, USA
June 7 - June 9, 2016
iRODS 4.2 introduces a new rule engine plugin interface. This interface makes it possible to build rule engines that support iRODS rules written in various languages. This paper introduces an audit plugin that emits a single AMQP message for every policy enforcement point within the iRODS server. We illustrate both the breadth and depth of these messages as well as some introductory analytics. This plugin may prove useful for anything from instrumenting a production iRODS installation to debugging confusing, emergent, distributed rule engine behavior.
iRODS makes it incredibly easy to preserve and share data generated by researchers, but as data volumes increase, the costs of maintaining all that data on primary storage become prohibitive. We present an advanced architecture that enables long-term data retention and high availability in iRODS using a two-tiered design composed of primary storage and a geographically distributed, object store-based HGST Active Archive System for easy active archival. This model employs an existing high-performance (expensive) primary storage system, coupled with an affordable, ultra-high-capacity HGST Active Archive System back end. Thanks to the iRODS abstraction layer and rule engine, this data tiering is automated and completely transparent to end users. We will discuss the solution architecture, provide a brief description of active archives in general and the HGST Active Archive System specifically, including its synchronous geographic replication capabilities, and present performance statistics for the available archived data.
Geospatial data is increasingly used with tools in diverse fields such as agronomy, hydrology, and sociology to gain a better understanding of scientific data. Funded by the NSF DIBBS program, the GABBS project seeks to create reusable building blocks that help researchers add geospatial data processing, visualization, and curation to their tools. GABBS leverages the HUBzero cyberinfrastructure platform and iRODS to build a web-based collaborative research platform with enhanced geospatial capabilities. HUBzero is unique in offering a rapid tool development kit that simplifies web-enabling existing tools. Its support for dataset DOI association enables citable tool results. In short, it provides a seamless path from data collection to simulation and publication, and can benefit from iRODS data management at each step. Scientific tools often require and generate metadata with their outputs. Given the structured nature of geospatial data, automatic metadata capture is vital in avoiding repetitive work. iRODS microservices enable this automation of data processing, metadata capture, and indexing for searchability. They also allow for similar offline ingestion of external research data. The iRODS FUSE filesystem mounts directly onto the hub, enabling tools to refer to local file paths and simplifying development. We will discuss this work of integrating iRODS with HUBzero in the GABBS project and share our experience and lessons learned with the iRODS user community.
This talk, with interactive Q&A, is presented in anticipation of integrating iRODS with CloudyCluster to add simplified data management to CloudyCluster's easy, self-service, on-demand, public, cloud-based HPC provisioning. An overview of CloudyCluster will be provided, along with the goals of the pending integration. We will also seek feedback from the community to help direct the integration. The end goal is to provide advanced computational and data management resources to the long tail of science and those without easy access to computational resources.
This presentation will focus on our efforts to develop a comprehensively secure cyberinfrastructure including iRODS, addressing issues from the datacenter level through to iRODS auditing, to provide a perspective on the effort required and the areas of most concern when developing secure infrastructure.
The challenges of using iRODS to support a broad community with data privacy levels ranging from HIPAA to open access will be discussed, and techniques for data segregation and auditing will be presented to address a range of potential use cases. We will also present the policies and rules used to support data management generally and HIPAA specifically in distributed iRODS installations.
The R language is an environment with a large and highly active user community in the field of data science. At NIHS we have developed the R-irods package, which allows user-friendly access to iRODS data objects and metadata from the R language. Information is passed to the R functions as native R objects (e.g. data frames) to facilitate integration with existing R code and to allow data access using standard R constructs.
To maximize performance and maintain a simple architecture, the implementation relies heavily on the icommands C++ code, wrapped using Rcpp bindings.
The R-irods package has been engineered to have semantics equivalent to the icommands and can easily be used as a basis for further customization. At NIHS we have created an ontology-aware package on top of R-irods to ensure consistent metadata annotations and to facilitate query construction.
Utrecht University has developed Davrods, a WebDAV-compliant interface to iRODS 4.1, to facilitate drag-and-drop movement of data into and out of iRODS 4 using an operating system's native interface. The presentation highlights the solution's design principles and the resulting architecture. Davrods builds on Apache's mod_dav capabilities. We will share benchmark data that we have collected and conclude with a demonstration.
In a collaboration between CyVerse, DataNet Federation Consortium, and Odum Institute developers, the CyVerse Discovery Environment has been ported as a general infrastructure to support data-driven research. This is the first step towards a broader community effort to standardize and adopt these tools, extending iRODS as a full service data management and computation environment.
The session will include a review of recent extensions to Jargon to power Virtual Collections, Metadata Templates, and other facilities that are powering DataNet Federation interfaces.
The evolution of data centers and data has been dramatic in the last few years, with the advent of cloud computing and the massive increase in data due to the Internet of Everything. The Integrated Rule-Oriented Data System (iRODS) helps in this changing world by virtualizing data storage resources regardless of the location where the data is stored.
This paper presents a tool implemented for accessing iRODS repositories through the NFS protocol. The tool integrates NFS with the iRODS server so that common operating system commands can be used on a remote iRODS repository via the NFS protocol.
Traditionally, the sharing and retention of research data has been a contentious issue. Sharing data over WANs has been limited by the available storage technologies. NAS solutions, while excellent for sharing data over a LAN, have never had the same success over WANs. The successful implementation of object storage solutions has opened the door to sharing data over WAN links.
Coupling that ability to share objects over a WAN with middleware like iRODS gives the research community more stringent controls over the data, including:
NGS is an increasingly cost-efficient and reliable method to provide whole genomes or exomes in a relatively short time.
The massive amounts of resulting data pose challenges during various stages of their lifecycle: organizing and storing input data, high-throughput processing and analysis in an HPC cluster, and effective reviewing and secure sharing of the results.
Traditional file systems quickly meet their limits when content based metadata handling is required.
As a computing center that has been driving NGS workflows for many years, we are constantly looking for solutions to optimize these workflows to maximize output and quality. We have decided to use iRODS, a comprehensive data management system that allows customized metadata attributes, fine-grained protection rules, and a query system to quickly organize and review the results.
In this paper we describe our design and experiences with the integration of iRODS with an automated pipeline, which was developed as part of our participation in the BMBF-funded project SMOOSE to optimize relevant workflows from cancer studies for clinical use. The workflow was taken from the Department of Translational Genomics at the University of Cologne. The focus of the workflow lies in the sequencing and analysis of cancer genomes, with the goal of identifying novel and potentially clinically relevant alterations. The insights gained can lead to personalized therapy with higher efficacy and reduced toxicity.
8:00 – 9:00 AM: Registration & Breakfast
9:00 AM – 5:00 PM: Training
5:30 – 6:30 PM: Members Meet & Greet, Kenan Center Terrace
5:30 – 6:00 PM: Omnibond Birds of a Feather, Room 204
This hands-on workshop taught how to plan and deploy an iRODS 4.2 installation and explored storage resource composition, metadata operations, and rule development using graphical and command line interfaces.
In-depth experience with iRODS 4.2. The iRODS development team at RENCI guided students through advanced topics such as using multiple rule engine plugins, PAM authentication, federation, and configuration for high availability.
Prerequisites (4.1.9, installed via ftp on EC2 via slipofpaper).