
Commit

Merge pull request #520 from ckan/drop-support-old-versions
Drop support old versions
amercader authored Mar 15, 2023
2 parents 89a98d7 + a168881 commit c68eedf
Showing 38 changed files with 123 additions and 502 deletions.
12 changes: 3 additions & 9 deletions .github/workflows/test.yml
@@ -4,8 +4,8 @@ jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version: '3.8'
- name: Install requirements
@@ -19,7 +19,7 @@ jobs:
needs: lint
strategy:
matrix:
ckan-version: ["2.10", 2.9, 2.9-py2, 2.8, 2.7]
ckan-version: ["2.10", 2.9]
fail-fast: false

name: CKAN ${{ matrix.ckan-version }}
@@ -55,14 +55,8 @@ jobs:
# Replace default path to CKAN core config file with the one on the container
sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test.ini
- name: Setup extension (CKAN >= 2.9)
if: ${{ matrix.ckan-version != '2.7' && matrix.ckan-version != '2.8' }}
run: |
ckan -c test.ini db init
ckan -c test.ini harvester initdb
- name: Setup extension (CKAN < 2.9)
if: ${{ matrix.ckan-version == '2.7' || matrix.ckan-version == '2.8' }}
run: |
paster --plugin=ckan db init -c test.ini
paster --plugin=ckanext-harvest harvester initdb -c test.ini
- name: Run tests
run: pytest --ckan-ini=test.ini --cov=ckanext.harvest --disable-warnings ckanext/harvest/tests
105 changes: 10 additions & 95 deletions README.rst
@@ -94,14 +94,8 @@ Configuration

Run the following command to create the necessary tables in the database (ensuring the pyenv is activated):

ON CKAN >= 2.9::

(pyenv) $ ckan --config=/etc/ckan/default/ckan.ini harvester initdb

ON CKAN <= 2.8::

(pyenv) $ paster --plugin=ckanext-harvest harvester initdb --config=/etc/ckan/default/production.ini

Finally, restart CKAN to have the changes take effect::

sudo service apache2 restart
@@ -213,7 +207,7 @@ IF you want to set a timeout for harvest jobs, you can add this configuration op

ckan.harvest.timeout = 1440

The timeout value is in minutes, so 1440 represents 24 hours.
Any jobs which are timed out will create an error message for the user to see.

If you don't specify this setting, the default will be False and there will be no timeout on harvest jobs.
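The timeout rule described in this hunk can be sketched in a few lines. This is an illustration only, not the extension's actual code: the function name ``job_timed_out`` and its arguments are hypothetical, and the only facts taken from the README are that the value is in minutes and that a missing setting (default ``False``) disables the timeout.

```python
from datetime import datetime, timedelta


def job_timed_out(created, now, timeout_minutes):
    """Return True when a job created at `created` has exceeded the timeout.

    `timeout_minutes` mirrors the documented ckan.harvest.timeout value.
    A falsy value (the documented default, False) disables the check.
    This helper is hypothetical and only illustrates the arithmetic.
    """
    if not timeout_minutes:
        return False
    return now - created > timedelta(minutes=int(timeout_minutes))


# 1440 minutes is exactly 24 hours, as the README states.
created = datetime(2023, 3, 15, 0, 0)
print(job_timed_out(created, datetime(2023, 3, 16, 0, 1), 1440))
print(job_timed_out(created, datetime(2023, 3, 15, 23, 59), 1440))
```

With the setting unset, the same call returns ``False`` regardless of how old the job is.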
@@ -289,9 +283,9 @@ The following operations can be run from the command line as described underneat
import) without involving the web UI or the queue backends. This is
useful for testing a harvester without having to fire up
gather/fetch_consumer processes, as is done in production.

harvester run-test {source-id/name} force-import=guid1,guid2...
- In order to force an import of particular datasets, useful to
target a dataset for dev purposes or when forcing imports on other environments.

harvester gather-consumer
@@ -335,22 +329,17 @@ The following operations can be run from the command line as described underneat

The commands should be run with the pyenv activated and refer to your CKAN configuration file:

ON CKAN >= 2.9::

(pyenv) $ ckan --config=/etc/ckan/default/ckan.ini harvester --help

(pyenv) $ ckan --config=/etc/ckan/default/ckan.ini harvester sources

ON CKAN <= 2.8::

(pyenv) $ paster --plugin=ckanext-harvest harvester sources --config=/etc/ckan/default/production.ini
**Note that on CKAN >= 2.9 all commands with an underscore in their name changed.** They now use a hyphen instead of an underscore (e.g. ``gather_consumer`` changed to ``gather-consumer``).

Authorization
=============

Starting from CKAN 2.0, harvest sources behave exactly the same as datasets
Harvest sources behave exactly the same as datasets
(they are actually internally implemented as a dataset type). That means they
can be searched and faceted, and that the same authorization rules can be
applied to them. The default authorization settings are based on organizations.
@@ -700,10 +689,10 @@ harvester run-test
You can run a harvester simply using the ``run-test`` command. This is handy
for running a harvest with one command in the console and see all the output
in-line. It runs the gather, fetch and import stages all in the same process.
You must ensure that you have pip installed ``dev-requirements.txt``
in ``/home/ckan/ckan/lib/default/src/ckanext-harvest`` before using the
``run-test`` command.

This is useful for developing a harvester because you can insert break-points
in your harvester, and rerun a harvest without having to restart the
gather_consumer and fetch_consumer processes each time. In addition, because it
@@ -727,35 +716,17 @@ handles the gathering and another one that handles the fetching and importing.
To start the consumers run the following command (make sure you have your
python environment activated):

ON CKAN >= 2.9::

(pyenv) $ ckan --config=/etc/ckan/default/ckan.ini harvester gather-consumer

ON CKAN <= 2.8::

(pyenv) $ paster --plugin=ckanext-harvest harvester gather_consumer --config=/etc/ckan/default/production.ini

On another terminal, run the following command:

ON CKAN >= 2.9::

(pyenv) $ ckan --config=/etc/ckan/default/ckan.ini harvester fetch-consumer

ON CKAN <= 2.8::

(pyenv) $ paster --plugin=ckanext-harvest harvester fetch_consumer --config=/etc/ckan/default/production.ini

Finally, on a third console, run the following command to start any
pending harvesting jobs:

ON CKAN >= 2.9::

(pyenv) $ ckan --config=/etc/ckan/default/ckan.ini harvester run

ON CKAN <= 2.8::

(pyenv) $ paster --plugin=ckanext-harvest harvester run --config=/etc/ckan/default/production.ini

The ``run`` command not only starts any pending harvesting jobs, but also
flags those that are finished, allowing new jobs to be created on that particular
source and refreshing the source statistics. That means that you will need to run
@@ -771,14 +742,8 @@ circumstance, ensure that the gather & fetch consumers are running and have
nothing more to consume, and then run this abort command with the name or id of
the harvest source:

ON CKAN >= 2.9::

(pyenv) $ ckan --config=/etc/ckan/default/ckan.ini harvester job-abort {source-id/name}

ON CKAN <= 2.8::

(pyenv) $ paster --plugin=ckanext-harvest harvester job_abort {source-id/name} --config=/etc/ckan/default/production.ini


Setting up the harvesters on a production server
================================================
@@ -855,42 +820,6 @@ following steps with the one you are using.
startsecs=10


ON CKAN <= 2.8::


; ===============================
; ckan harvester
; ===============================

[program:ckan_gather_consumer]

command=/usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester gather_consumer --config=/etc/ckan/default/production.ini

; user that owns virtual environment.
user=ckan

numprocs=1
stdout_logfile=/var/log/ckan/std/gather_consumer.log
stderr_logfile=/var/log/ckan/std/gather_consumer.log
autostart=true
autorestart=true
startsecs=10

[program:ckan_fetch_consumer]

command=/usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester fetch_consumer --config=/etc/ckan/default/production.ini

; user that owns virtual environment.
user=ckan

numprocs=1
stdout_logfile=/var/log/ckan/std/fetch_consumer.log
stderr_logfile=/var/log/ckan/std/fetch_consumer.log
autostart=true
autorestart=true
startsecs=10


There are a number of things that you will need to replace with your
specific installation settings (the example above shows paths from a
ckan instance installed via Debian packages):
@@ -952,16 +881,9 @@ following steps with the one you are using.
Paste this line into your crontab, again replacing the paths to paster and
the ini file with yours:

ON CKAN >= 2.9::

# m h dom mon dow command
*/15 * * * * /usr/lib/ckan/default/bin/ckan -c /etc/ckan/default/ckan.ini harvester run
ON CKAN <= 2.8::

# m h dom mon dow command
*/15 * * * * /usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester run --config=/etc/ckan/default/production.ini

This particular example will check for pending jobs every fifteen minutes.
You can of course modify this periodicity, this `Wikipedia page <http://en.wikipedia.org/wiki/Cron#CRON_expression>`_
has a good overview of the crontab syntax.
@@ -973,16 +895,9 @@ following steps with the one you are using.
Paste this line into your crontab, again replacing the paths to paster/ckan and
the ini file with yours:

ON CKAN >= 2.9::

# m h dom mon dow command
0 5 * * * /usr/lib/ckan/default/bin/ckan -c /etc/ckan/default/ckan.ini harvester clean-harvest-log

ON CKAN <= 2.8::

# m h dom mon dow command
0 5 * * * /usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester clean_harvest_log --config=/etc/ckan/default/production.ini

This particular example will perform clean-up each day at 05 AM.
You can tweak the value according to your needs.

@@ -992,17 +907,17 @@ Extensible actions
Recipients on harvest jobs notifications
----------------------------------------

:code:`harvest_get_notifications_recipients`: you can *chain* this action from another extension to change
the recipients for harvest jobs notifications.

.. code-block:: python
@toolkit.chained_action
def harvest_get_notifications_recipients(up_func, context, data_dict):
""" Harvester plugin notify by default about harvest jobs only to
admin users of the related organization.
Also allow to add custom recipients with this function.
Return a list of dicts with name and email like
{'name': 'John', 'email': 'john@source.com'} """
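The hunk cuts off before the function body, so the following is only a sketch of how a chained action of this shape could add a recipient. The ``@toolkit.chained_action`` decorator is left out so the snippet runs without CKAN installed, and ``fake_default_action`` plus the added address are hypothetical stand-ins for the extension's default action and a real recipient.

```python
def harvest_get_notifications_recipients(up_func, context, data_dict):
    """Call the original action, then append one extra recipient.

    In a real extension this function would carry the
    @toolkit.chained_action decorator shown in the README.
    """
    recipients = list(up_func(context, data_dict))
    # Hypothetical extra recipient, in the documented name/email shape.
    recipients.append({'name': 'Harvest Ops', 'email': 'ops@example.org'})
    return recipients


def fake_default_action(context, data_dict):
    """Stand-in for the default action (assumption, for illustration only)."""
    return [{'name': 'John', 'email': 'john@source.com'}]


result = harvest_get_notifications_recipients(fake_default_action, {}, {})
print(result)
```

The chained function receives the previous implementation as ``up_func``, so it can extend the default list rather than replace it.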
@@ -1021,7 +936,7 @@ Tests
You can run the tests like this::

cd ckanext-harvest
nosetests --reset-db --ckan --with-pylons=test-core.ini ckanext/harvest/tests
pytest --ckan-ini=test.ini ckanext/harvest/tests

Here are some common errors and solutions:

5 files renamed without changes.
21 changes: 1 addition & 20 deletions ckanext/harvest/harvesters/base.py
@@ -16,7 +16,7 @@

from ckan.logic.schema import default_create_package_schema
from ckan.lib.navl.validators import ignore_missing, ignore
from ckan.lib.munge import munge_title_to_name, substitute_ascii_equivalents
from ckan.lib.munge import munge_title_to_name, munge_tag

from ckanext.harvest.model import (HarvestObject, HarvestGatherError,
HarvestObjectError, HarvestJob)
@@ -25,25 +25,6 @@
from ckanext.harvest.interfaces import IHarvester
from ckanext.harvest.logic.schema import unicode_safe

if p.toolkit.check_ckan_version(min_version='2.3'):
from ckan.lib.munge import munge_tag
else:
# Fallback munge_tag for older ckan versions which don't have a decent
# munger
def _munge_to_length(string, min_length, max_length):
'''Pad/truncates a string'''
if len(string) < min_length:
string += '_' * (min_length - len(string))
if len(string) > max_length:
string = string[:max_length]
return string

def munge_tag(tag):
tag = substitute_ascii_equivalents(tag)
tag = tag.lower().strip()
tag = re.sub(r'[^a-zA-Z0-9\- ]', '', tag).replace(' ', '-')
tag = _munge_to_length(tag, model.MIN_TAG_LENGTH, model.MAX_TAG_LENGTH)
return tag

log = logging.getLogger(__name__)
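For readers comparing behaviour, the fallback deleted in this hunk can be approximated outside CKAN as follows. This is a simplified sketch, not ``ckan.lib.munge.munge_tag``: the ``substitute_ascii_equivalents`` step is omitted, the name ``munge_tag_fallback`` is hypothetical, and the two length constants are assumed stand-ins for ``model.MIN_TAG_LENGTH`` and ``model.MAX_TAG_LENGTH``.

```python
import re

# Assumed stand-ins for ckan.model.MIN_TAG_LENGTH / MAX_TAG_LENGTH.
MIN_TAG_LENGTH = 2
MAX_TAG_LENGTH = 100


def _munge_to_length(string, min_length, max_length):
    """Pad with underscores or truncate so the length fits the bounds."""
    if len(string) < min_length:
        string += '_' * (min_length - len(string))
    if len(string) > max_length:
        string = string[:max_length]
    return string


def munge_tag_fallback(tag):
    """Simplified copy of the removed fallback (no ASCII substitution)."""
    tag = tag.lower().strip()
    # Drop everything except letters, digits, hyphens and spaces,
    # then turn spaces into hyphens.
    tag = re.sub(r'[^a-zA-Z0-9\- ]', '', tag).replace(' ', '-')
    return _munge_to_length(tag, MIN_TAG_LENGTH, MAX_TAG_LENGTH)


print(munge_tag_fallback('Open Data!'))
```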

8 changes: 3 additions & 5 deletions ckanext/harvest/harvesters/ckanharvester.py
@@ -1,11 +1,10 @@
from __future__ import absolute_import
import six
import requests
from requests.exceptions import HTTPError, RequestException

import datetime

from six.moves.urllib.parse import urlencode
from urllib.parse import urlencode
from ckan import model
from ckan.logic import ValidationError, NotFound, get_action
from ckan.lib.helpers import json
@@ -119,8 +118,7 @@ def validate_config(self, config):
raise ValueError('default_groups must be a *list* of group'
' names/ids')
if config_obj['default_groups'] and \
not isinstance(config_obj['default_groups'][0],
six.string_types):
not isinstance(config_obj['default_groups'][0], str):
raise ValueError('default_groups must be a list of group '
'names/ids (i.e. strings)')

@@ -520,7 +518,7 @@ def get_extra(key, package_dict):
if existing_extra:
package_dict['extras'].remove(existing_extra)
# Look for replacement strings
if isinstance(value, six.string_types):
if isinstance(value, str):
value = value.format(
harvest_source_id=harvest_object.job.source.id,
harvest_source_url=harvest_object.job.source.url.strip('/'),
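The ``six.string_types`` to ``str`` substitution in this file is safe on Python 3, where ``str`` is the only text type. A standalone sketch of the ``default_groups`` check follows. The helper name ``validate_default_groups`` is hypothetical, and the logic is reduced from the hunk above rather than copied from the full ``validate_config`` method.

```python
def validate_default_groups(config_obj):
    """Check default_groups the way the hunk does, using plain str.

    Like the original, only the first element's type is inspected.
    """
    if 'default_groups' not in config_obj:
        return
    groups = config_obj['default_groups']
    if not isinstance(groups, list):
        raise ValueError('default_groups must be a *list* of group'
                         ' names/ids')
    if groups and not isinstance(groups[0], str):
        raise ValueError('default_groups must be a list of group '
                         'names/ids (i.e. strings)')


def is_valid(config_obj):
    """Small wrapper so the outcome can be printed or asserted."""
    try:
        validate_default_groups(config_obj)
    except ValueError:
        return False
    return True


print(is_valid({'default_groups': ['transport', 'health']}))
print(is_valid({'default_groups': [{'name': 'transport'}]}))
```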
14 changes: 2 additions & 12 deletions ckanext/harvest/helpers.py
@@ -57,7 +57,7 @@ def package_list_for_source(source_id):
query = logic.get_action('package_search')(context, search_dict)

base_url = h.url_for(
'{0}_read'.format(DATASET_TYPE_NAME),
'{0}.read'.format(DATASET_TYPE_NAME),
id=harvest_source['name']
)

@@ -124,7 +124,7 @@ def link_for_harvest_object(id=None, guid=None, text=None):
obj = logic.get_action('harvest_object_show')(context, {'id': guid, 'attr': 'guid'})
id = obj.id

url = h.url_for('harvest_object_show', id=id)
url = h.url_for('harvest.object_show', id=id)
text = text or guid or id
link = '<a href="{url}">{text}</a>'.format(url=url, text=text)

@@ -138,13 +138,3 @@ def harvest_source_extra_fields():
continue
fields[harvester.info()['name']] = list(harvester.extra_schema().keys())
return fields


def bootstrap_version():
if p.toolkit.check_ckan_version(max_version='2.7.99'):
return 'bs2'
else:
return (
'bs2' if
p.toolkit.config.get('ckan.base_public_folder') == 'public-bs2'
else 'bs3')