
In 1101 base workflow #21

Merged · 5 commits · Dec 19, 2024
88 changes: 43 additions & 45 deletions Pipfile.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions README.md
@@ -22,6 +22,7 @@ Description of the app
SENTRY_DSN=### If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
WORKSPACE=### Set to `dev` for local development, this will be set to `stage` and `prod` in those environments by Terraform.
AWS_REGION_NAME=### Default AWS region.
DSS_INPUT_QUEUE=### The DSS SQS input queue to which submission messages are sent.
```

### Optional
1 change: 1 addition & 0 deletions dsc/config.py
@@ -11,6 +11,7 @@ class Config:
"WORKSPACE",
"SENTRY_DSN",
"AWS_REGION_NAME",
"DSS_INPUT_QUEUE",
]

OPTIONAL_ENV_VARS: Iterable[str] = ["LOG_LEVEL"]
8 changes: 8 additions & 0 deletions dsc/exceptions.py
@@ -1,2 +1,10 @@
class InvalidDSpaceMetadataError(Exception):
pass


class InvalidSQSMessageError(Exception):
pass


class ItemMetadataMissingRequiredFieldError(Exception):
pass
30 changes: 30 additions & 0 deletions dsc/item_submission.py
Taking some of the other file reorganizing and renaming into consideration (and thanks, by the way, it's feeling good to navigate around!), I could envision this file getting a more high-level name like items.py. If other classes specific to items ever made sense to add, it would be a natural place for them.

I don't have evidence or even a pointable philosophy to support it, but when the class name mirrors the filename, it feels a bit off. While we may not need more "item"-type classes, it's tight coupling between file and class names.

Totally optional.

@@ -0,0 +1,30 @@
import json
import logging
from dataclasses import dataclass
from typing import Any

from dsc.utilities.aws.s3 import S3Client

logger = logging.getLogger(__name__)


@dataclass
class ItemSubmission:
Do you think ItemSubmission would benefit from some kind of validate() method? I'm unsure offhand what it would check, but I'm noticing now that we call generate_and_upload_dspace_metadata() in the for loop of the workflow, and it might be helpful to confirm things look good before we fire it off.

Contributor Author:

I did have a valid_dspace_metadata method in wiley-deposits but I may want to rework it since that had specific field names to validate while these workflows may be less constrained. But yes, there should be some validation!

Contributor:
Side note: if we find that the dataclass module is the right fit for this class, I propose calling the proposed validate() method from a __post_init__ method!
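As a hedged sketch of that suggestion (the validate() name and the specific check are assumptions for illustration, not code from this PR), the dataclass hook could look like:

```python
from dataclasses import dataclass
from typing import Any


class InvalidDSpaceMetadataError(Exception):
    pass


@dataclass
class ItemSubmission:
    """A class to store the required values for a DSpace submission."""

    dspace_metadata: dict[str, Any]
    bitstream_uris: list[str]
    metadata_s3_key: str
    metadata_uri: str = ""

    def __post_init__(self) -> None:
        # Validation runs automatically whenever an instance is created,
        # so bad metadata fails fast instead of mid-workflow.
        self.validate()

    def validate(self) -> None:
        # Hypothetical check: assume DSpace metadata carries a top-level
        # "metadata" key holding a list of field entries.
        if not isinstance(self.dspace_metadata.get("metadata"), list):
            raise InvalidDSpaceMetadataError(
                "dspace_metadata must contain a 'metadata' list"
            )
```

With __post_init__ in place, the workflow's for loop never sees a half-built submission; construction either succeeds or raises.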

"""A class to store the required values for a DSpace submission."""

dspace_metadata: dict[str, Any]
bitstream_uris: list[str]
metadata_s3_key: str
metadata_uri: str = ""
Comment on lines +17 to +18
Minor, but maybe worth considering renaming metadata_uri to metadata_s3_uri to match the pattern of metadata_s3_key.

Contributor Author (@ehanson8, Dec 18, 2024):
Updating to metadata_s3_uri and bitstream_s3_uris for clarity and consistency in the upcoming PR.


def upload_dspace_metadata(self, bucket: str) -> None:
"""Upload DSpace metadata to S3 using the specified bucket and keyname.

Args:
bucket: The S3 bucket for uploading the item metadata file.
"""
s3_client = S3Client()
s3_client.put_file(json.dumps(self.dspace_metadata), bucket, self.metadata_s3_key)
metadata_uri = f"s3://{bucket}/{self.metadata_s3_key}"
logger.info(f"Metadata uploaded to S3: {metadata_uri}")
self.metadata_uri = metadata_uri
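To show the upload flow end to end without AWS, here is a minimal sketch. The stub client and the injected s3_client parameter are assumptions for illustration; the PR's actual method instantiates the project's S3Client directly.

```python
import json
from dataclasses import dataclass
from typing import Any


class StubS3Client:
    """Stand-in for dsc's S3Client so the flow can run without AWS."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def put_file(self, content: str, bucket: str, key: str) -> None:
        # Record the upload in memory instead of calling S3.
        self.files[f"{bucket}/{key}"] = content


@dataclass
class ItemSubmission:
    dspace_metadata: dict[str, Any]
    bitstream_uris: list[str]
    metadata_s3_key: str
    metadata_uri: str = ""

    def upload_dspace_metadata(self, bucket: str, s3_client: StubS3Client) -> None:
        # Serialize the metadata, upload it, then record the resulting URI
        # on the instance, mirroring the method above.
        s3_client.put_file(json.dumps(self.dspace_metadata), bucket, self.metadata_s3_key)
        self.metadata_uri = f"s3://{bucket}/{self.metadata_s3_key}"
```

The key observable effect is the side channel: after the call, metadata_uri holds the s3:// URI that downstream workflow steps can hand to DSS.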
10 changes: 10 additions & 0 deletions dsc/workflows/__init__.py
@@ -0,0 +1,10 @@
"""dsc.workflows.

All primary functions used by CLI are importable from here.
"""

from dsc.workflows.base import BaseWorkflow

__all__ = [
"BaseWorkflow",
]