Thanks to the pluggable rule engine framework in iRODS 4.2, rules can now be written in languages other than the native iRODS rule language. In particular, iRODS 4.2 ships with a Python rule engine plugin, enabling rules to be written in Python.
There are some notable differences between rules in the native rule
engine and in the Python rule engine. It is important to note that the
native iRODS rule language is a domain-specific language, which has
benefits and detriments. One benefit is that syntactic sugar is provided
for many common tasks: for example, consider Line 15 in the native rule
engine example below. foreach(*GenQOut) iterates over the rows of the
returned SQL result, while doing the same thing in Python requires
manual iteration over the “serialized” (that is, turned into a Python
dictionary) genQueryOut object, as in Lines 18-21 in the Python rule
engine example below. Another “benefit” is that the native iRODS rule
engine is very safe, in that it does not have the ability to modify the
filesystem other than through iRODS. With Python, you have the ability
to run arbitrary Python code as an iRODS rule, which could potentially
cause problems by modifying the filesystem directly. With great power
comes great responsibility.
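To make the manual iteration concrete, here is a minimal sketch of a helper that walks a serialized genQueryOut dictionary. It assumes the layout used by the Python example on this page (a 'rowCnt' entry plus keys of the form value_<row>_<column>). The helper name iter_rows and the sample data are hypothetical and are not part of the plugin.

```python
def iter_rows(out_dict, num_columns):
    """Yield each result row as a list of column values (all strings)."""
    for row in range(int(out_dict['rowCnt'])):
        yield [out_dict['value_%d_%d' % (row, col)]
               for col in range(num_columns)]


# Hypothetical serialized result for a 'DATA_NAME, DATA_SIZE' query.
sample = {
    'rowCnt': '2',
    'value_0_0': 'a.txt', 'value_0_1': '10',
    'value_1_0': 'b.txt', 'value_1_1': '32',
}

# Column 1 holds DATA_SIZE, so summing it gives the total size.
total_size = sum(int(row[1]) for row in iter_rows(sample, 2))
```

A generator keeps the key-building logic in one place, which is roughly what foreach provides for free in the native rule language.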
NOTE: The delay(){} and remote(){} microservice calls do not work in
the Python rule engine. If you need to use delayed or remote execution,
you will need to use delayExec() or remoteExec(), respectively.
ALSO NOTE: The remoteExec() microservice requires you to pass in code
written for the default rule engine ON THE REMOTE SERVER. This may mean,
for example, that even though your rule is written in Python, the code
you pass into remoteExec() must be written in the iRODS native rule
language, or vice versa.
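As a rough sketch of what such calls might look like from a Python rule, the two wrappers below pass rule code as a plain string. The argument orders shown (delay condition, code, recovery for delayExec; host, hints, code, recovery for remoteExec) are assumptions based on the native rule language forms and should be checked against your iRODS version. The wrapper names and the host name in the usage note are hypothetical.

```python
def schedule_delayed(callback, rule_code, delay='1m'):
    """Queue rule_code for delayed execution.

    Assumed argument order: (delay condition, rule code, recovery code).
    """
    condition = '<PLUSET>' + delay + '</PLUSET>'
    return callback.delayExec(condition, rule_code, '')


def run_on_remote(callback, host, remote_rule_code):
    """Run remote_rule_code on another server.

    Assumed argument order: (host, hints, rule code, recovery code).
    remote_rule_code must be written for the default rule engine on
    the remote server, which may not be Python.
    """
    return callback.remoteExec(host, '', remote_rule_code, '')
```

Inside a rule this might be called as run_on_remote(callback, 'resource.example.org', "writeLine('serverLog', 'hello')"), where the code string is in the native rule language because that is assumed to be the remote server's default engine.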
myTestRule {
    #Input parameters are:
    #  Structure holding the query
    #Output parameter is:
    #  Structure holding the query result
    #Output from running the example is:
    #  List of the number of files and size of files in collection /tempZone/home/rods/large-coll
    *ContInxOld = 1;
    *Count = 0;
    *Size = 0;
    msiMakeGenQuery("DATA_ID, DATA_SIZE",*Condition,*GenQInp);
    msiExecGenQuery(*GenQInp, *GenQOut);
    msiGetContInxFromGenQueryOut(*GenQOut,*ContInxNew);
    while(*ContInxOld > 0) {
        foreach(*GenQOut) {
            msiGetValByKey(*GenQOut, "DATA_SIZE", *Fsize);
            *Size = *Size + double(*Fsize);
            *Count = *Count + 1;
        }
        *ContInxOld = *ContInxNew;
        if(*ContInxOld > 0) {msiGetMoreRows(*GenQInp,*GenQOut,*ContInxNew);}
    }
    writeLine("stdout","Number of files in *Coll is *Count and total size is *Size");
}
INPUT *Coll = "/tempZone/home/rods/large-coll", *Condition="COLL_NAME like *Coll"
OUTPUT ruleExecOut
def testRule(rule_args, callback):
    condition = rule_args['*Condition'][1:-1]
    continue_index_old = 1
    size = 0
    count = 0
    inDict = {}
    inDict[PYTHON_MSPARAM_TYPE] = PYTHON_GENQUERYINP_MS_T
    retVal = callback.msiMakeGenQuery('DATA_NAME, DATA_SIZE', condition, inDict)
    inDict = retVal[PYTHON_RE_RET_OUTPUT][2]
    inDict[PYTHON_MSPARAM_TYPE] = PYTHON_GENQUERYINP_MS_T
    outDict = {}
    outDict[PYTHON_MSPARAM_TYPE] = PYTHON_GENQUERYOUT_MS_T
    retVal = callback.msiExecGenQuery(inDict, outDict)
    outDict = retVal[PYTHON_RE_RET_OUTPUT][1]
    dummy = {}
    dummy[PYTHON_MSPARAM_TYPE] = PYTHON_INT_MS_T
    while continue_index_old > 0:
        for row in range(0, int(outDict['rowCnt'])):
            keyStr = 'value_' + str(row) + '_1'
            size = size + int(outDict[keyStr])
            count = count + 1
        continue_index_old = int(outDict['continueInx'])
        if continue_index_old > 0:
            outDict[PYTHON_MSPARAM_TYPE] = PYTHON_GENQUERYOUT_MS_T
            retVal = callback.msiGetMoreRows(inDict, outDict, dummy)
            outDict = retVal[PYTHON_RE_RET_OUTPUT][1]
    # There is no *Coll input here, so recover the collection path from the quoted part of the condition
    coll = condition.split("'")[1]
    callback.writeLine('stdout', 'Number of files in ' + coll + ' is ' + str(count) + ' and total size is ' + str(size))
INPUT *Condition="COLL_NAME like '/tempZone/home/rods/large-coll'"
OUTPUT ruleExecOut
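In both rule engines, the preceding rules report the number of files in
a collection and their total size. The Python version differs from the
native one in the following ways: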
You can’t reference INPUT variables within other INPUT variables
For example, in the native rule language, the *Condition variable is
defined as “COLL_NAME like *Coll”. In the Python rule engine, the
*Coll variable is omitted and *Condition is defined as “COLL_NAME
like ‘/tempZone/home/rods/large-coll’”. This is because the Python rule
engine receives the variables as strings, and doesn’t know how to look
inside the strings to do the substitution.
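If you do want one INPUT variable to refer to another, the substitution can be done by hand inside the rule. The sketch below is one way to do that, assuming every INPUT variable arrives in rule_args as a double-quoted string. The helper name expand_input_vars and the sample values are hypothetical.

```python
def expand_input_vars(rule_args):
    """Return a copy of rule_args with *Name references replaced.

    Each occurrence of another variable's name inside a value is replaced
    by that variable's value with its surrounding double quotes removed.
    This is a single pass: nested references are not expanded.
    """
    # Longest names first so that *Collection is not clobbered by *Coll.
    names = sorted(rule_args, key=len, reverse=True)
    expanded = {}
    for name in rule_args:
        value = rule_args[name]
        for other in names:
            if other != name:
                value = value.replace(other, rule_args[other].strip('"'))
        expanded[name] = value
    return expanded


# Hypothetical rule_args as the Python rule engine might deliver them.
sample_args = {
    '*Coll': '"/tempZone/home/rods/large-coll"',
    '*Condition': '"COLL_NAME like \'*Coll\'"',
}
expanded = expand_input_vars(sample_args)
```

After this call, expanded['*Condition'] contains the full collection path in place of *Coll, still wrapped in its own double quotes.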
The Python rule function takes two arguments, rule_args and callback
The rule_args dictionary contains the INPUT and OUTPUT variable lists.
The callback object enables the calling of other iRODS rules and
microservices from within the Python rule engine.
The INPUT and OUTPUT variables are accessible from the rule_args dictionary object
Variables arrive wrapped in double quotes
This is why Line 2 is condition = rule_args['*Condition'][1:-1]
instead of just condition = rule_args['*Condition']. The [1:-1]
removes the first and last characters from the string, removing the
double quotes.
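A plain [1:-1] slice removes the first and last characters whether or not they are quotes. A slightly more defensive variant, sketched here with a hypothetical helper name, only strips them when both are present:

```python
def strip_quotes(value):
    """Remove one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value
```

Inside a rule this could replace the slice, for example condition = strip_quotes(rule_args['*Condition']), so that an unquoted value is passed through unchanged.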
The Python language does not allow variables to be passed by reference into rules and/or microservices
Even so, you must provide a dummy variable of the “appropriate type”
when calling a rule or microservice
For example, see Lines 6-8 of the Python rule language implementation.
Normally, msiMakeGenQuery takes a genQueryInp passed by reference as the
third argument, which is how it returns the populated genQueryInp to the
caller. Since Python doesn’t have a native genQueryInp type, you can
pass in a dict with the PYTHON_MSPARAM_TYPE element set to
PYTHON_GENQUERYINP_MS_T. The Python rule engine plugin looks for an
object with this key in the dictionary and creates a genQueryInp object.
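The placeholder dictionaries in the example all follow the same pattern, so they can be built by a small helper. This is only a sketch: the helper name typed_placeholder is hypothetical, and the constant values below are stand-ins so that the snippet runs outside iRODS. Inside a real rule the plugin supplies these names and the stand-in assignments should be left out.

```python
# Stand-in values (assumptions) so the sketch runs outside the plugin.
PYTHON_MSPARAM_TYPE = 'PYTHON_MSPARAM_TYPE'
PYTHON_GENQUERYINP_MS_T = 'PYTHON_GENQUERYINP_MS_T'
PYTHON_GENQUERYOUT_MS_T = 'PYTHON_GENQUERYOUT_MS_T'
PYTHON_INT_MS_T = 'PYTHON_INT_MS_T'


def typed_placeholder(ms_type):
    """Build the dummy dict that tells the plugin which type to create."""
    return {PYTHON_MSPARAM_TYPE: ms_type}


in_dict = typed_placeholder(PYTHON_GENQUERYINP_MS_T)
out_dict = typed_placeholder(PYTHON_GENQUERYOUT_MS_T)
dummy = typed_placeholder(PYTHON_INT_MS_T)
```

Each call returns a fresh dictionary, so the placeholders can be passed to different microservices without sharing state.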
“Return by reference” variables are available in the returned value
Rule/microservice calls in the Python rule language return a dictionary
containing the iRODS error status (PYTHON_RE_RET_STATUS), the iRODS
error code (PYTHON_RE_RET_CODE), and the rule/microservice parameters
(a list, PYTHON_RE_RET_OUTPUT). For example, to get the returned
genQueryInp object from the call to msiMakeGenQuery, on Line 9, we
retrieve retVal[PYTHON_RE_RET_OUTPUT][2], since the inDict was the
third parameter of that microservice call.
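Pulling an output parameter out of the returned dictionary can be wrapped in a helper that also checks for failure. The sketch below assumes that a negative value under PYTHON_RE_RET_CODE signals an error, following the usual iRODS convention. The helper name output_param is hypothetical, and the constant values are stand-ins so that the snippet runs outside iRODS; inside a real rule the plugin supplies these names.

```python
# Stand-in values (assumptions) so the sketch runs outside the plugin.
PYTHON_RE_RET_STATUS = 'status'
PYTHON_RE_RET_CODE = 'code'
PYTHON_RE_RET_OUTPUT = 'arguments'


def output_param(ret_val, position):
    """Return the parameter at zero-based position from a call result.

    Raises RuntimeError if the call reported a negative error code.
    """
    code = int(ret_val[PYTHON_RE_RET_CODE])
    if code < 0:
        raise RuntimeError('call failed with iRODS error code %d' % code)
    return ret_val[PYTHON_RE_RET_OUTPUT][position]
```

With this helper, Line 9 of the Python example could be written as inDict = output_param(retVal, 2), since inDict was the third parameter of the msiMakeGenQuery call.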