Paul Pound edited this page Sep 16, 2013 · 2 revisions

4.1 System structure

4.1.1 Original System Diagram

Original System Diagram

4.1.2 Complete/Proposed System Diagram

Complete System Diagram

4.1.3 Description of the proposed System:

  1. The JMS listener processes the JMS messages that Fedora sends and pulls the t2flow document from the content model of the affected object.
  2. The listener sends the t2flow document to Taverna with an HTTP POST, then sends the object's PID and, where applicable, the DSID to Taverna as inputs.
  3. Taverna executes the given workflow by making SOAP calls to the microservices. Objects or entire files can also be passed back and forth to external services for processing.
  4. The services use the Tuque API to communicate with Fedora.
  5. Content reader and writer services were created to allow Taverna Server to read and write directly to Fedora. They are useful if your workflow passes Fedora content to other external services and then stores the results.

4.1.4 Logging

Logging is the system's method of tracking key events that occur during the processing of data. Logging.php contains all of the methods used for opening and closing the log file, as well as methods to set the write path and to do the writing.

Currently the activity logs are stored on the server in a location defined in the root config.xml. This serves as the location for the majority of the listener and services logs. This log file must be writable by both the Apache user and the user that starts the listener. The logs contain status messages reporting successful or unsuccessful system calls performed during the transfer and manipulation of data by the various microservices.

There are two other log files that the listeners write to: application.log and error.log. Both are hardcoded to be written to the directory the listeners are started from.

The error.log file contains fatal errors, and the application.log file records the amount of RAM the listeners are using (checking the application log can help you determine whether the listener is still running).

The Apache error_log can also be useful when troubleshooting issues.
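Because the main log location has to be writable by two different users, a quick check run once as each user can save some debugging. The sketch below is a hypothetical helper and is not part of Logging.php; the function name and the example path are assumptions.

```php
<?php
// Hypothetical helper (not part of Logging.php): reports whether the current
// user can write to the given log file, or create it if it does not exist yet.
function log_is_writable($log_file) {
  if (file_exists($log_file)) {
    return is_writable($log_file);
  }
  // The file does not exist yet, so the directory must allow creating it.
  return is_writable(dirname($log_file));
}

// Example path only; use the location defined in your root config.xml.
$log_file = '/var/log/php_listeners/listener.log';

// Work out who is running the script, falling back to the script owner
// when the posix extension is unavailable.
$info = function_exists('posix_geteuid') ? posix_getpwuid(posix_geteuid()) : false;
$user = $info ? $info['name'] : get_current_user();

echo $user . (log_is_writable($log_file) ? ' can' : ' cannot') . " write to $log_file\n";
```

Running the script once as the web server user and once as the user that starts the listener should show whether both can write to the log.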

4.2 Guide to defining a new Service

4.2.1 Adding a new local microservice

Define the Service wrapper
  • In a new PHP file create a new class that extends the IslandoraService class. The RoblibServices.php file is an example of such a file.

  • In the constructor of this class, call the parent constructor, call the connect function, and define your __dispatch_map array.

    • The __dispatch_map is where you define the input/output parameters for the WSDL.
     $this->__dispatch_map['pdf'] = array(
       'in' => array(
         'pid' => 'string',
         'dsid' => 'string',
         'outputdsid' => 'string',
         'label' => 'string'
       ),
       'out' => array('exit_status' => 'int')
     );
    

This will generate a WSDL with the inputs and outputs that you specify in this map. Note: the parameters specified here are the ones you will define the microservice functions with (below).

  • Create a function for each service defined in the __dispatch_map. In this function create an array to pass to the parent's service function. Using the service function is not absolutely required but it can save some coding and keeps logging and error handling consistent.

     function pdf($pid, $dsid = 'OBJ', $outputdsid = "PDF", $label = 'PDF') {
       $params = array('class' => 'Pdf', 'function' => 'toPdf');
       return $this->service($pid, $dsid, $outputdsid, $label, $params);
     }

The array must contain the name of the class, the function to call in this class and any other parameters needed by this function.
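Putting the pieces of the wrapper together might look roughly like the sketch below. The IslandoraService class defined here is only a stand-in so the example runs on its own: its method bodies are placeholders, and the signatures are assumptions based on the calls shown above. The wrapper class name MyNewServiceClass is illustrative. In a real installation you would include the real base class instead of the stand-in.

```php
<?php
// Stand-in for the real IslandoraService base class so this sketch runs by
// itself. The method bodies are placeholders, not the real implementation.
class IslandoraService {
  public $__dispatch_map = array();

  public function __construct() {}

  public function connect() {}

  public function service($pid, $dsid, $outputdsid, $label, $params) {
    // Placeholder: only report what would be dispatched.
    echo "{$params['class']}::{$params['function']} on $pid/$dsid -> $outputdsid\n";
    return 0;
  }
}

// Illustrative wrapper class following the steps described above.
class MyNewServiceClass extends IslandoraService {
  public function __construct() {
    parent::__construct();
    $this->connect();
    // One entry per service; this drives the generated WSDL.
    $this->__dispatch_map['pdf'] = array(
      'in' => array(
        'pid' => 'string',
        'dsid' => 'string',
        'outputdsid' => 'string',
        'label' => 'string',
      ),
      'out' => array('exit_status' => 'int'),
    );
  }

  // One function per __dispatch_map entry, with matching parameters.
  public function pdf($pid, $dsid = 'OBJ', $outputdsid = 'PDF', $label = 'PDF') {
    $params = array('class' => 'Pdf', 'function' => 'toPdf');
    return $this->service($pid, $dsid, $outputdsid, $label, $params);
  }
}

$service = new MyNewServiceClass();
$status = $service->pdf('demo:1');
```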

Implement the service

  • In another new PHP file create a class that extends the Derivatives class.

  • In this class you will create methods that create the derivative files and add the derivatives as datastreams to Fedora objects. By extending the Derivative class you can take advantage of the Derivative class's connection to Fedora and its log functions.

    function toPdf($dsid = 'PDF', $label = 'PDF') {
     $this->log->lwrite('Starting processing', 'PROCESS_DATASTREAM', $this->pid, $dsid);
     // Assume failure until the conversion has run.
     $return = MS_SYSTEM_EXCEPTION;
     // Initialised up front so the final error branch can always report it.
     $pdf_output = array();
     if (file_exists($this->temp_file)) {
       $output_file = $this->temp_file . '_pdf.pdf';
       // Escape the paths so unusual file names cannot break the shell command.
       $command = 'convert ' . escapeshellarg($this->temp_file) . ' ' . escapeshellarg($output_file) . ' 2>&1';
       exec($command, $pdf_output, $return);
       if (file_exists($output_file)) {
         $log_message = "$dsid derivative created using convert with command - $command || SUCCESS";
         $return = $this->add_derivative($dsid, $label, $output_file, 'application/pdf', $log_message);
       }
       else {
         $this->log->lwrite("Could not find the file '$output_file' for output: " . implode(', ', $pdf_output) . "\nReturn value: $return", 'FAIL_DATASTREAM', $this->pid, $dsid, NULL, 'ERROR');
       }
     }
     else {
       $this->log->lwrite("Could not create the $dsid derivative! could not find file $this->temp_file " . $return . ' ' . implode(', ', $pdf_output), 'FAIL_DATASTREAM', $this->pid, $dsid, NULL, 'ERROR');
     }
     return $return;
    }
    
  • Your methods should return 0 for success and a negative number for errors. This will allow you to use looping etc. when designing a workflow in Taverna Workbench.

  • Put your new php file in the listeners include directory.

  • Update the includes in your service wrapper php file (the file that includes the class that extends IslandoraService) so that it includes this file (you can use the RoblibServices.php as a guide).

Update the config.xml file

You will need to make the SOAP server aware of your new service(s).

  • Put your service wrapper PHP file in the same directory as the soap_serv.php file.

  • Update the config.xml to add your service(s) to the list of services.

    <config>
     <!-- The path to the microservices main directory.
     The microservices should not be in a web-accessible location as there
     are config files with sensitive information.
     -->
         <path>/opt/php_listeners</path>
     <!-- 
     a list of classes for the soap server to load, see RoblibServices.php for an example
     The service element should be a classname and this class should be implemented 
     in a file with the same name ie. classname.php.  The new file must be put in 
     the same directory as this file.
     -->
         <services>
            <service>RoblibServices</service>
            <service>MyNewServiceClass</service>
         </services>
    </config>
    

These functions will be called whenever a Fedora object is modified: the listener posts the workflow to Taverna using REST, and Taverna then calls the services using SOAP.
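A quick way to catch a misspelled class name is to read the service list back out of config.xml and check that each classname.php file is present. The sketch below does this with SimpleXML; the function name is an assumption, and the inline XML simply mirrors the example config above.

```php
<?php
// Sketch: return the class names listed under <services> in a config.xml string.
function configured_services($config_xml) {
  $config = simplexml_load_string($config_xml);
  if ($config === false) {
    return array();
  }
  $services = array();
  foreach ($config->services->service as $service) {
    $services[] = trim((string) $service);
  }
  return $services;
}

// Mirrors the example config above; in practice read the real file instead.
$config_xml = <<<'XML'
<config>
  <path>/opt/php_listeners</path>
  <services>
    <service>RoblibServices</service>
    <service>MyNewServiceClass</service>
  </services>
</config>
XML;

// Each service class is expected in classname.php next to config.xml.
foreach (configured_services($config_xml) as $class) {
  $file = __DIR__ . "/$class.php";
  echo $class . ': ' . (file_exists($file) ? 'found' : "missing $file") . "\n";
}
```

To check a real installation, replace the inline string with file_get_contents() on your config.xml and run the script from the directory that holds soap_serv.php.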

4.2.2 Taverna Workbench - Import a new service (WSDL)

Now that you have set up the dispatch map for your new microservice and added its function call so that the proper microservice will be executed, you are ready to import this service into Taverna Workbench.

  • Open Taverna Workbench.

  • In the Design workspace, inside the service panel, click ‘Import new services’.

  • Select ‘WSDL service...’.

  • Point the URL to the permanent location of your SOAP server (in our case, http://ipaddress/soap_serv.php?wsdl). This will generate a list of the services described in your __dispatch_map plus any other services configured in config.xml.

  • Drag the service into your workflow workspace, add the necessary inputs and outputs, and then save the workflow as ‘$DSID.t2flow’ for easy reference.

  • Save this t2flow file in the appropriate content model. You now have the ability to use your newly fashioned, web-facing microservice.
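Before importing, it can be useful to confirm which operations the WSDL actually exposes. The following sketch lists the operation names found in a WSDL document using PHP's DOM extension. The sample WSDL is a stripped-down, made-up fragment (the thumbnail operation is hypothetical); a real WSDL generated by soap_serv.php will contain much more.

```php
<?php
// Sketch: list the operation names declared in a WSDL 1.1 document.
function list_wsdl_operations($wsdl_xml) {
  $doc = new DOMDocument();
  if (!@$doc->loadXML($wsdl_xml)) {
    return array();
  }
  $xpath = new DOMXPath($doc);
  $xpath->registerNamespace('wsdl', 'http://schemas.xmlsoap.org/wsdl/');
  $names = array();
  foreach ($xpath->query('//wsdl:portType/wsdl:operation') as $operation) {
    $names[] = $operation->getAttribute('name');
  }
  return $names;
}

// Made-up minimal WSDL fragment, for illustration only.
$sample_wsdl = <<<'XML'
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Sample">
  <portType name="SamplePort">
    <operation name="pdf"/>
    <operation name="thumbnail"/>
  </portType>
</definitions>
XML;

echo implode("\n", list_wsdl_operations($sample_wsdl)), "\n";
```

Against a running server, the WSDL text could be fetched from the same URL you give Taverna Workbench and passed to the function in place of the sample.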

4.2.3 External Microservices

For external services, you will need to either find trusted hosts of WSDL files to send your data to, or import them directly through Taverna Workbench by searching for them. This functionality is aided by our content reader and writer microservices, which handle sending data to and receiving data from the target service and write the derived output back to Fedora.

Importing external WSDLs is the same process as described above, as the WSDLs are web-facing (if you have the proper URL). Most (or at least many) publicly available web services use XML to communicate. That is, they take XML documents as parameters and return XML documents containing results. Taverna has local services for XML manipulation, but they're not very well documented. The one that we've used with some success is "Transform XML with parameters", which takes a source string (XML) and an XSL string used to transform it.

XSL is a standard way of transforming XML. You can google it if you need to learn some in-depth things, but you're basically looking at something like this:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:srv="URL of service definition">
    <!-- this might be necessary sometimes, try it if things break -->
    <xsl:output omit-xml-declaration="yes" />
    <xsl:template match="/srv:Name of first element of response">
        <xsl:value-of select="srv:path/srv:to/srv:element" />
    </xsl:template>
</xsl:stylesheet>
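It is often easier to test a stylesheet locally before pasting it into a Taverna workflow. The sketch below applies a stylesheet of the shape shown above to a sample response using PHP's XSLTProcessor, which requires the xsl extension. The service namespace, element names, and response document are all made up for illustration, and the output method is set to text so that only the extracted value is returned.

```php
<?php
// Made-up service response, for illustration only.
$response = <<<'XML'
<srv:LookupResponse xmlns:srv="http://example.org/service">
  <srv:result><srv:title>Example title</srv:title></srv:result>
</srv:LookupResponse>
XML;

// Stylesheet of the same shape as the template above, filled in for the sample.
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:srv="http://example.org/service">
  <xsl:output method="text" />
  <xsl:template match="/srv:LookupResponse">
    <xsl:value-of select="srv:result/srv:title" />
  </xsl:template>
</xsl:stylesheet>
XSL;

// Apply an XSL string to an XML string and return the result as a string.
function transform_xml($xml, $xsl) {
  $source = new DOMDocument();
  $source->loadXML($xml);
  $style = new DOMDocument();
  $style->loadXML($xsl);
  $processor = new XSLTProcessor();
  $processor->importStylesheet($style);
  return $processor->transformToXml($source);
}

echo transform_xml($response, $stylesheet), "\n";
```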

4.3 Communicating with Taverna

Taverna Server 2.4.1 supports both REST and SOAP APIs; you may use either API to access the service and any of the workflow runs hosted by the service. This simple guide just discusses the REST API.

  1. The client starts by creating a workflow run. This is done by POSTing a T2flow document to the service at the address http://ipordomain:8080/taverna/rest/runs with the content type application/vnd.taverna.t2flow+xml.
  2. The result of the POST is an HTTP 201 Created that gives the location of the created run (in a Location header), http://ipordomain:8080/taverna/rest/runs/UUID (where UUID is a unique string that identifies the particular run; this is also the name of the run that you would use in the SOAP interface). Note that the run is not yet actually doing anything.
  3. Next, you need to set up the inputs to the workflow ports. To set the input port, FOO, to have the value BAR, you would PUT a message like this to the URI http://ipordomain:8080/taverna/rest/runs/UUID/input/input/FOO
<t2sr:runInput xmlns:t2sr="http://ns.taverna.org.uk/2010/xml/server/rest/">
<t2sr:value>BAR</t2sr:value>
</t2sr:runInput>
  4. Now you can start the workflow running. This is done by using a PUT to set http://ipordomain:8080/taverna/rest/runs/UUID/status to the plain text value Operating.
  5. Now you need to poll, waiting for the workflow to finish. To discover the state of a run, you can (at any time) do a GET on http://ipordomain:8080/taverna/rest/runs/UUID/status; when the workflow has finished executing, this will return Finished instead of Operating (or Initialized, the starting state).
  6. Every workflow run has an expiry time, after which it will be destroyed and all resources (i.e., local files) associated with it cleaned up. By default in this release, this is 20 minutes after initial creation. To see when a particular run is scheduled to be disposed of, do a GET on http://ipordomain:8080/taverna/rest/runs/UUID/expiry; you may set the time when the run is disposed of by PUTting a new time to that same URI. Note that this includes not just the time when the workflow is executing, but also when the input files are being created beforehand and when the results are being downloaded afterwards; you are advised to make your clients regularly advance the expiry time while the run is in use.
  7. The outputs from the workflow are files created in the out subdirectory of the run's working directory. The contents of the subdirectory can be read by doing a GET on http://ipordomain:8080/taverna/rest/runs/UUID/wd/out which will return an XML document describing the contents of the directory, with links to each of the files within it. Doing a GET on those links will retrieve the actual created files (as uninterpreted binary data).
  8. Thus, if a single output FOO.OUT was produced from the workflow, it would be written to the file that can be retrieved from http://ipordomain:8080/taverna/rest/runs/UUID/wd/out/FOO.OUT and the result of the GET on http://ipordomain:8080/taverna/rest/runs/UUID/wd/out would look something like this:
<t2sr:directoryContents xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:t2sr="http://ns.taverna.org.uk/2010/xml/server/rest/"
xmlns:t2s="http://ns.taverna.org.uk/2010/xml/server/">
<t2s:file xlink:href="http://ipordomain:8080/taverna/rest/runs/UUID/wd/out/FOO.OUT"
t2sr:name="FOO.OUT">out/FOO.OUT</t2s:file>
</t2sr:directoryContents>
  9. The standard output and standard error from the T2 Command Line Executor subprocess can be read via properties of the special I/O listener. To do that, do a GET on http://ipordomain:8080/taverna/rest/runs/UUID/listeners/io/properties/stdout (or .../stderr). Once the subprocess has finished executing, the I/O listener will provide a third property containing the exit code of the subprocess, called exitcode.
  10. Note that the supported set of listeners and properties will be subject to change in future versions of the server, and should not be relied upon.
  11. Once you have finished, destroy the run by doing a DELETE on http://ipordomain:8080/taverna/rest/runs/UUID. Once you have done that, none of the resources associated with the run (including both input and output files) will exist any more. If the run is still executing, this will also cause it to be stopped. All operations described above have equivalents in the SOAP service interface.
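The request sequence above can be sketched in PHP as follows. This is only an outline: it builds each request as a plain array and prints it rather than sending anything. The function names, the array layout, the input port names pid and dsid, and the application/xml content type for input values are assumptions made for this sketch.

```php
<?php
// Body for setting one input port, built with DOM so the value is XML-escaped.
function taverna_input_xml($value) {
  $ns = 'http://ns.taverna.org.uk/2010/xml/server/rest/';
  $doc = new DOMDocument('1.0', 'UTF-8');
  $root = $doc->appendChild($doc->createElementNS($ns, 't2sr:runInput'));
  $node = $root->appendChild($doc->createElementNS($ns, 't2sr:value'));
  $node->appendChild($doc->createTextNode($value));
  return $doc->saveXML($root);
}

// Creating the run: the Location header of the reply gives the run URL.
function taverna_create_request($base_url, $t2flow) {
  return array('POST', "$base_url/runs", 'application/vnd.taverna.t2flow+xml', $t2flow);
}

// Remaining requests, in order, once the run URL is known.
function taverna_run_requests($run_url, array $inputs) {
  $requests = array();
  foreach ($inputs as $port => $value) {
    // The content type used here is an assumption of this sketch.
    $requests[] = array('PUT', "$run_url/input/input/$port", 'application/xml', taverna_input_xml($value));
  }
  $requests[] = array('PUT', "$run_url/status", 'text/plain', 'Operating');
  $requests[] = array('GET', "$run_url/status", NULL, NULL);  // poll until Finished
  $requests[] = array('GET', "$run_url/wd/out", NULL, NULL);  // list the output files
  $requests[] = array('DELETE', $run_url, NULL, NULL);        // destroy the run
  return $requests;
}

$run_url = 'http://ipordomain:8080/taverna/rest/runs/UUID';
$inputs = array('pid' => 'demo:1', 'dsid' => 'OBJ');
foreach (taverna_run_requests($run_url, $inputs) as $request) {
  list($method, $url, $type, $body) = $request;
  echo "$method $url" . ($type ? " [$type]" : '') . "\n";
}
```

Each array could then be handed to an HTTP client of your choice. Keeping the request building separate from the sending makes the sequence easy to inspect without a running Taverna Server.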

4.4 Guide to the JMS listener

The Islandora PHP listener listens for messages from Fedora. Fedora sends a message every time a Fedora object is ingested or modified. When the listener receives a message, it inspects the modified object's content model for workflows to run. If it finds a workflow, it posts it to Taverna and follows the steps listed in 4.3 above.
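As a rough illustration of the first step, the sketch below pulls the API method, PID, and DSID out of a message body. It assumes the Atom-style message format used by Fedora 3 messaging; the function name and the sample message are made up, and the real listener may read the message differently.

```php
<?php
// Sketch: extract the method name, PID and DSID from a Fedora message body.
// Assumes an Atom entry where <title> holds the API method, <summary> holds
// the PID, and a <category> with scheme fedora-types:dsID holds the DSID.
function parse_fedora_message($body) {
  $entry = simplexml_load_string($body);
  if ($entry === false) {
    return NULL;
  }
  $info = array(
    'method' => (string) $entry->title,
    'pid' => (string) $entry->summary,
    'dsid' => NULL,
  );
  foreach ($entry->category as $category) {
    if ((string) $category['scheme'] === 'fedora-types:dsID') {
      $info['dsid'] = (string) $category['term'];
    }
  }
  return $info;
}

// Made-up sample message, for illustration only.
$message = <<<'XML'
<entry xmlns="http://www.w3.org/2005/Atom">
  <title type="text">modifyDatastreamByValue</title>
  <category term="demo:1" scheme="fedora-types:pid" label="xsd:string"/>
  <category term="OBJ" scheme="fedora-types:dsID" label="xsd:string"/>
  <summary type="text">demo:1</summary>
</entry>
XML;

print_r(parse_fedora_message($message));
```

With the PID and DSID in hand, the listener can look up the object's content model, find the matching t2flow document, and pass both values to Taverna as workflow inputs.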

The API for the Taverna sender can be found here: https://github.com/roblib/php_listeners/blob/taverna-1.x/tavernaSender.php
