Add the ability to intercept File IO by specifying a wrapper class #155

Merged
merged 5 commits into from
Aug 29, 2024

andrew4699
Contributor

Description

This PR adds the ability to wrap File IO by specifying a wrapper factory. It ships with a no-op wrapper factory, which performs no wrapping. I did not implement any specific wrapper, since which interceptors (if any) belong in the repo can be a separate discussion. It should be uncontroversial, though, that the ability to intercept File IO is generally useful for use cases such as metrics.
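
For readers skimming the description, here is a minimal sketch of the wrapper-factory idea. It is not the code in this PR: `SimpleFileIO`, `SimpleFileIOWrapperFactory`, `InMemoryFileIO`, and `loadFileIO` are hypothetical stand-ins invented for illustration, and the real Iceberg `FileIO` interface has a different, larger surface.

```java
import java.util.HashMap;
import java.util.Map;

public class FileIOWrapperSketch {
    /** Hypothetical, simplified stand-in for a file IO abstraction (not Iceberg's FileIO). */
    interface SimpleFileIO {
        String read(String path);
        void write(String path, String contents);
        void deleteFile(String path);
    }

    /** Hypothetical hook: receives the IO that was created and returns the IO callers will use. */
    interface SimpleFileIOWrapperFactory {
        SimpleFileIO wrap(SimpleFileIO delegate);
    }

    /** No-op factory: hands back the delegate untouched, so behavior is unchanged. */
    static final class NoOpWrapperFactory implements SimpleFileIOWrapperFactory {
        @Override
        public SimpleFileIO wrap(SimpleFileIO delegate) {
            return delegate;
        }
    }

    /** Toy in-memory IO, used only so the sketch runs without touching real storage. */
    static final class InMemoryFileIO implements SimpleFileIO {
        private final Map<String, String> files = new HashMap<>();

        @Override
        public String read(String path) {
            return files.get(path);
        }

        @Override
        public void write(String path, String contents) {
            files.put(path, contents);
        }

        @Override
        public void deleteFile(String path) {
            files.remove(path);
        }
    }

    /** Assumed call site: whoever creates the IO passes it through the configured factory. */
    static SimpleFileIO loadFileIO(SimpleFileIOWrapperFactory factory) {
        return factory.wrap(new InMemoryFileIO());
    }

    public static void main(String[] args) {
        SimpleFileIO io = loadFileIO(new NoOpWrapperFactory());
        io.write("metadata.json", "{}");
        System.out.println(io.read("metadata.json"));
        io.deleteFile("metadata.json");
        System.out.println(io.read("metadata.json"));
    }
}
```

The point of routing every IO through one factory call is that an interceptor (metrics, logging, and so on) can be swapped in through configuration without changing any caller.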

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • I added a unit test testFileIOWrapper.
  • All existing tests work with the NoOp wrapper.
  • I ran through the quickstart and verified that I could perform all the same table operations.

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • If adding new functionality, I have discussed my implementation with the community using the linked GitHub issue
  • I have signed and submitted the ICLA and if needed, the CCLA. See Contributing for details.

@andrew4699 andrew4699 requested a review from a team as a code owner August 16, 2024 23:56
@andrew4699
Contributor Author

@collado-mike I changed it to just be FileIOFactory and added MetricRegistryAware to be consistent with the other xAware interfaces.

collado-mike
collado-mike previously approved these changes Aug 21, 2024
Contributor

@collado-mike collado-mike left a comment

Nice. One minor suggestion for an additional metric, plus a javadoc, and we'll be good.


@Override
public void deleteFile(InputFile file) {
  io.deleteFile(file);
Contributor

measure files deleted? Seems like a useful metric (especially if it spikes real high real fast 😅)

Contributor Author

Added some testing for measuring deleted files. Originally I left it out, as MeasuredFileIOFactory is just a test class, but I may as well add it as an example.

Contributor

oh! didn't realize this was a test class. It seems super useful to me. Maybe move it into src/main?

Contributor Author

This implementation isn't that great: it doesn't clean up IOs, and its metrics may not be very useful. It was only intended as a very simple way to confirm that File IO is wrapped properly.
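
The delete-count idea from this thread could be sketched roughly as follows. This is not the `MeasuredFileIOFactory` test class from the PR. `SimpleFileIO` and `CountingFileIO` are hypothetical names, and a plain `AtomicLong` stands in for whatever metrics registry a real implementation would use.

```java
import java.util.concurrent.atomic.AtomicLong;

public class DeleteCountingSketch {
    /** Hypothetical, simplified stand-in for a file IO abstraction (not Iceberg's FileIO). */
    interface SimpleFileIO {
        void deleteFile(String path);
    }

    /** Wrapper that forwards every delete to the delegate and counts the calls that return. */
    static final class CountingFileIO implements SimpleFileIO {
        private final SimpleFileIO delegate;
        private final AtomicLong deletedFiles;

        CountingFileIO(SimpleFileIO delegate, AtomicLong deletedFiles) {
            this.delegate = delegate;
            this.deletedFiles = deletedFiles;
        }

        @Override
        public void deleteFile(String path) {
            // Delegate first: if the underlying delete throws, the counter is not bumped.
            delegate.deleteFile(path);
            deletedFiles.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        AtomicLong deletedFiles = new AtomicLong();
        // The delegate here does nothing; a real one would remove the file from storage.
        SimpleFileIO io = new CountingFileIO(path -> { }, deletedFiles);

        io.deleteFile("data/file-1.parquet");
        io.deleteFile("data/file-2.parquet");

        System.out.println("files deleted: " + deletedFiles.get());
    }
}
```

As noted above, a production version would also need to handle cleanup of the wrapped IOs, which this sketch leaves out.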

@@ -60,6 +62,16 @@ public MetaStoreManagerFactory getMetaStoreManagerFactory() {
return metaStoreManagerFactory;
}

@JsonProperty("io")
Member

I think this is redundant.

Contributor Author

factoryType could be a top-level config, but I made it io => factoryType to encourage grouping io-related configs together. For example, if you choose to write a metrics-emitting implementation of FileIOFactory, you may want some further configs.

@andrew4699 andrew4699 force-pushed the aguterman-fileio-wrapper branch from 92c2539 to 55b34b8 Compare August 21, 2024 16:09
@andrew4699 andrew4699 force-pushed the aguterman-fileio-wrapper branch from 55b34b8 to f56f773 Compare August 26, 2024 16:39
@andrew4699 andrew4699 force-pushed the aguterman-fileio-wrapper branch 2 times, most recently from b863463 to 7b92ae1 Compare August 26, 2024 16:43
@andrew4699 andrew4699 requested a review from takidau as a code owner August 26, 2024 16:43
@andrew4699 andrew4699 force-pushed the aguterman-fileio-wrapper branch 2 times, most recently from baccf83 to 5ed91cc Compare August 26, 2024 17:05
@andrew4699 andrew4699 force-pushed the aguterman-fileio-wrapper branch from a6f6846 to 434e1e4 Compare August 26, 2024 22:57
@takidau takidau merged commit 07c8444 into apache:main Aug 29, 2024
3 checks passed
4 participants