Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace STCN reader #53

Open
wants to merge 32 commits into
base: develop
Choose a base branch
from
Open

Replace STCN reader #53

wants to merge 32 commits into from

Conversation

tijmenbaarda
Copy link
Contributor

In this pull request, I replace the existing STCN reader with another one that uses STCN's JSON API instead of their linked data service. In addition, I added more fields as per Jeroen's request and added a separate reader for STCN's persons database.

Because of Jeroen's request to add more fields to the STCN reader, which were partly not available in the RDF version of the STCN API, I looked into STCN's alternative JSON API. This API has a few advantages:

  • It gives access to the native data and is exactly aligned to the public web interface
  • It gives richer results, including some fields that we needed but that were not available before
  • Its search API seems to be (much) faster for search than the SPARQL endpoint
  • It uses the same query language as in the public web interface

Except for bibliographical items, STCN also features two types of biographical items: one for authors and other contributors and one for publishers/printers. I decided to make separate readers for authors (leaving aside publishers/printers for now), because the data types are completely different and it does not make sense to show both of them in a single table (there are no shared fields), but in principle it would be possible because they share the same search API.

The record ontology now needs an update, but I will make a separate PR for that.

Copy link

@jgonggrijp jgonggrijp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your decision to switch to the JSON API, though I'm also somewhat disappointed with STCN that they don't seem to have given their RDF API a first-class treatment.

I have a bunch of nitpicks on code details, but nothing major. As a general remark, I suggest learning to use map and filter as a habit; they help to reduce code complexity and to outfactor small functions that can be tested separately.

Other than that, I think your code is well-factored at the macro scale. Well done on the CERL outfactor.

edpop_explorer/cerl.py Show resolved Hide resolved
edpop_explorer/edpopxshell.py Show resolved Hide resolved
edpop_explorer/edpopxshell.py Outdated Show resolved Hide resolved
edpop_explorer/cerl.py Show resolved Hide resolved
edpop_explorer/cerl.py Outdated Show resolved Hide resolved
edpop_explorer/readers/stcn.py Outdated Show resolved Hide resolved
Comment on lines +143 to +148
if place is None:
return None
else:
field = LocationField(place)
field.location_type = LocationField.LOCALITY
return field

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all of these methods, you do a lot of "if the value is None return None, otherwise return a new class instance based on the value". You could probably outfactor that as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see no way of doing that without outfactoring the parts after else: to different functions, which I think is more complicated. Is there another way of outfactoring?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that those parts would have to be new functions, but you could end up with the same number of functions in the end by generating the rest of the function with a metafunction.

However, there is an easier alternative which I now notice you already used in the _get_dating method. Just replace

if value is None:
    return None
# rest of function

by

if value is not None:
    # rest of function

which works because every function returns None by default. Saves a line per method.

edpop_explorer/readers/stcn.py Show resolved Hide resolved
edpop_explorer/readers/stcn.py Outdated Show resolved Hide resolved
edpop_explorer/readers/stcn.py Show resolved Hide resolved
@tijmenbaarda
Copy link
Contributor Author

Thanks a lot for the thorough review! I have addressed most issues; where I did not I have made comments above.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants