lulibrary/puree

Purée

Metadata extraction from the Pure Research Information System.

Installation

Add this line to your application's Gemfile:

gem 'puree'

And then execute:

$ bundle

Or install it yourself as:

$ gem install puree

Configuration

# For Extractor and REST modules.
config = {
  url:      'https://YOUR_HOST/ws/api/YOUR_API_VERSION',
  username: 'YOUR_USERNAME',
  password: 'YOUR_PASSWORD',
  api_key:  'YOUR_API_KEY'
}
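
One way to keep credentials out of source control is to build this hash from environment variables. The following is a minimal sketch under that assumption, not something Purée requires: the `PURE_*` variable names and the `build_config` helper are hypothetical, and it assumes all four settings are needed.

```ruby
# Hypothetical helper: build the Purée config hash from environment variables.
# The PURE_* names are assumptions for illustration, not defined by Purée.
REQUIRED_SETTINGS = {
  url:      'PURE_URL',
  username: 'PURE_USERNAME',
  password: 'PURE_PASSWORD',
  api_key:  'PURE_API_KEY'
}.freeze

# Accepts any hash-like object so it can be exercised without touching ENV.
def build_config(env = ENV)
  # Collect the names of settings that are absent or blank.
  missing = REQUIRED_SETTINGS.values.reject { |name| env[name] && !env[name].empty? }
  raise ArgumentError, "Missing settings: #{missing.join(', ')}" unless missing.empty?

  # Same keys as the config hash above, values read from the environment.
  REQUIRED_SETTINGS.transform_values { |name| env[name] }
end
```

Calling `build_config` with no arguments reads from `ENV`; the result can be passed wherever `config` is used below.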

Purée is tested using known data within a Pure installation.

Purée version    Pure API version
< 2.0            < 59
>= 2.0, < 2.5    59, 510
>= 2.5, < 2.7    511, 512
2.7              513
2.8              514
2.9              515, 516, 517, 518, 519, 520, 521, 522, 523, 524
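
The table can also be checked programmatically. The sketch below transcribes it using `Gem::Version` and `Gem::Requirement` from RubyGems; the `COMPATIBILITY` constant and the `pure_api_versions_for` helper are hypothetical names, and it assumes a row such as "2.7" covers every 2.7.x release.

```ruby
require 'rubygems' # provides Gem::Version and Gem::Requirement

# Compatibility table transcribed from the README.
# Assumption: a bare "2.7" row means any 2.7.x release.
# The first row's "< 59" is kept as a label rather than expanded.
COMPATIBILITY = [
  [Gem::Requirement.new('< 2.0'),           ['< 59']],
  [Gem::Requirement.new('>= 2.0', '< 2.5'), %w[59 510]],
  [Gem::Requirement.new('>= 2.5', '< 2.7'), %w[511 512]],
  [Gem::Requirement.new('~> 2.7.0'),        %w[513]],
  [Gem::Requirement.new('~> 2.8.0'),        %w[514]],
  [Gem::Requirement.new('~> 2.9.0'),        (515..524).map(&:to_s)]
].freeze

# Hypothetical helper: list the Pure API versions for a Purée version string.
# Returns an empty array when the version is not covered by the table.
def pure_api_versions_for(puree_version)
  version = Gem::Version.new(puree_version)
  row = COMPATIBILITY.find { |requirement, _apis| requirement.satisfied_by?(version) }
  row ? row.last : []
end

pure_api_versions_for('2.3')
#=> ["59", "510"]
```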

Extractor module

# Configure an extractor for a resource
extractor = Puree::Extractor::Dataset.new config
# Find out how many records are available
extractor.count
#=> 1000
# Fetch a random record
extractor.random
#=> #<Puree::Model::Dataset:0x00c0ffee>
# Fetch the metadata for a record with a particular identifier
dataset = extractor.find 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
#=> #<Puree::Model::Dataset:0x00c0ffee>
# Access specific metadata e.g. an internal person's name
dataset.persons_internal[0].name
#=> #<Puree::Model::PersonName:0x00c0ffee @first="Foo", @last="Bar">
# Select a formatting style for a person's name
dataset.persons_internal[0].name.last_initial
#=> "Bar, F."
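
To illustrate what the `last_initial` style produces, here is a minimal sketch using a stand-in struct. `SketchPersonName` is hypothetical and is not `Puree::Model::PersonName`; it only reproduces the output format shown above.

```ruby
# Stand-in for a person name. This is NOT Puree::Model::PersonName,
# only an illustration of the "Last, F." formatting style.
SketchPersonName = Struct.new(:first, :last) do
  # Family name, then the first letter of the given name as an initial.
  def last_initial
    "#{last}, #{first[0]}."
  end
end

SketchPersonName.new('Foo', 'Bar').last_initial
#=> "Bar, F."
```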

XMLExtractor module

Get Ruby objects from Pure XML.

Single record

xml = '<project> ... </project>'
# Configure an XML extractor
xml_extractor = Puree::XMLExtractor::Project.new xml
# Get a single piece of metadata
xml_extractor.title
#=> "An interesting project title"
# Get all the metadata together
xml_extractor.model
#=> #<Puree::Model::Project:0x00c0ffee>

Homogeneous record collection

xml = '<result>
        <dataSet> ... </dataSet>
        <dataSet> ... </dataSet>
        ...
      </result>'
# Get an array of datasets
Puree::XMLExtractor::Collection.datasets xml
#=> [#<Puree::Model::Dataset:0x00c0ffee>, ...]

Heterogeneous record collection

xml = '<result>
        <contributionToJournal> ... </contributionToJournal>
        <contributionToConference> ... </contributionToConference>
        ...
      </result>'
# Get a hash of research outputs
Puree::XMLExtractor::Collection.research_outputs xml
#=> {
#     journal_articles: [#<Puree::Model::JournalArticle:0x00c0ffee>, ...],
#     conference_papers: [#<Puree::Model::ConferencePaper:0x00c0ffee>, ...],
#     theses: [#<Puree::Model::Thesis:0x00c0ffee>, ...],
#     other: [#<Puree::Model::ResearchOutput:0x00c0ffee>, ...]
#   }
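
Because the result is a plain Ruby hash keyed by output type, it can be summarised with standard `Hash` methods. The sketch below uses stand-in data in the same shape (strings instead of model objects); the helper names `count_by_type` and `types_present` are hypothetical.

```ruby
# Stand-in data shaped like the hash shown above.
# Strings take the place of Puree::Model objects.
sample_outputs = {
  journal_articles:  ['article A', 'article B'],
  conference_papers: ['paper A'],
  theses:            [],
  other:             ['output A']
}

# Hypothetical helper: number of records per output type.
def count_by_type(research_outputs)
  research_outputs.transform_values(&:size)
end

# Hypothetical helper: output types that have at least one record.
def types_present(research_outputs)
  research_outputs.reject { |_type, records| records.empty? }.keys
end

count_by_type(sample_outputs)
#=> {:journal_articles=>2, :conference_papers=>1, :theses=>0, :other=>1}
types_present(sample_outputs)
#=> [:journal_articles, :conference_papers, :other]
```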

REST module

Query the Pure REST API.

Client

# Configure a client
client = Puree::REST::Client.new config
# Find a person
client.persons.find id: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
#=> #<HTTP::Response:0x00c0ffee>
# Find a person, limit the metadata to ORCID and employee start date
client.persons.find id: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx',
                    params: {fields: ['orcid', 'employeeStartDate']}
#=> #<HTTP::Response:0x00c0ffee>
# Find five people, response body as JSON
client.persons.all params: {size: 5}, accept: :json
#=> #<HTTP::Response:0x00c0ffee>
# Find three active academics
params = {
  size: 3,
  employmentTypeUri: ['/dk/atira/pure/person/employmenttypes/academic'],
  employmentStatus: 'ACTIVE'
}
client.persons.all_complex params: params
#=> #<HTTP::Response:0x00c0ffee>
# Find research outputs for a person
client.persons.research_outputs id: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
#=> #<HTTP::Response:0x00c0ffee>
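
When more records exist than fit in one request, the `params` hash can be generated per page. The following sketch assumes the Pure API accepts an `offset` parameter alongside `size`; that parameter is an assumption not shown above, and `page_params` is a hypothetical helper.

```ruby
# Hypothetical helper: build one params hash per page of results.
# Assumption: the Pure API accepts `offset` together with `size`.
def page_params(total, size)
  raise ArgumentError, 'size must be positive' unless size.positive?

  # Offsets 0, size, 2 * size, ... while still below total.
  (0...total).step(size).map { |offset| { size: size, offset: offset } }
end

page_params(12, 5)
#=> [{:size=>5, :offset=>0}, {:size=>5, :offset=>5}, {:size=>5, :offset=>10}]

# Possible use with a configured client:
#   page_params(total, 100).each do |params|
#     response = client.persons.all params: params
#   end
```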

Resource

# Configure a resource
persons = Puree::REST::Person.new config
# Find a person
persons.find id: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
#=> #<HTTP::Response:0x00c0ffee>

REST module with XMLExtractor module

Query the Pure REST API and get Ruby objects from Pure XML.

# Configure a client
client = Puree::REST::Client.new config
# Find projects for a person
response = client.persons.projects id: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
# Extract metadata from XML
Puree::XMLExtractor::Collection.projects response.to_s
#=> [#<Puree::Model::Project:0x00c0ffee>, ...]
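
The two steps above can be wrapped in a small helper. This is a sketch only: `projects_for_person` is a hypothetical name, and the client and collection are passed in as arguments so the helper can be exercised with stand-ins instead of a live Pure server.

```ruby
# Hypothetical helper combining the two calls shown above.
# `client` is expected to behave like a Puree::REST::Client and
# `collection` like Puree::XMLExtractor::Collection.
def projects_for_person(client, collection, person_id)
  # Query the REST API for the person's projects.
  response = client.persons.projects id: person_id
  # Hand the response body to the extractor, which returns model objects.
  collection.projects response.to_s
end

# Possible use with Purée loaded and a configured client:
#   projects_for_person client,
#                       Puree::XMLExtractor::Collection,
#                       'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
```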