
Revert to a single COPY operation per table instead of per chunk #308

Open
wemrysi opened this issue May 9, 2022 · 4 comments

Comments

@wemrysi
Contributor

wemrysi commented May 9, 2022

The per-chunk strategy appears to result in rather poor performance as data size increases. We'd like to revert to using a single COPY operation per table to reclaim some of that performance. Some of the previous reliability concerns can be ameliorated via source buffering.
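
For reference, a minimal sketch of what the single-COPY-per-table shape could look like, assuming fs2 with cats-effect 3 and the PostgreSQL JDBC driver's `CopyManager` (which does exist in pgjdbc); `copyTable`, `tableRef`, and `records` are hypothetical names, and `prefetchN` stands in for the source buffering mentioned above:

```scala
import cats.effect.{IO, Resource}
import cats.syntax.all._
import fs2.Stream
import org.postgresql.PGConnection

// One COPY for the whole table: open a single CopyIn, stream every chunk
// into it, and only end the operation once the source is exhausted.
def copyTable(conn: PGConnection, tableRef: String, records: Stream[IO, Byte]): IO[Long] = {
  val acquire = IO.blocking(
    conn.getCopyAPI.copyIn(s"COPY $tableRef FROM STDIN WITH (FORMAT csv)"))

  Resource.make(acquire)(in => IO.blocking(if (in.isActive) in.cancelCopy())).use { in =>
    records
      .prefetchN(4) // buffer upstream chunks so a slow source doesn't stall the COPY
      .chunks
      .evalMap(c => IO.blocking { val bs = c.toArray; in.writeToCopy(bs, 0, bs.length) })
      .compile
      .drain *> IO.blocking(in.endCopy()) // endCopy reports the row count
  }
}
```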

If we still run into reliability issues due to timeouts or long-running transactions, a possible solution would be to define a maximum duration between writes to the COPY stream. If the threshold is reached, we commit the current operation and begin anew on the next chunk from upstream. This should avoid timeouts for slow sources while preserving performance where possible.
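
To make the threshold idea concrete, here's a rough sketch, not a definitive design: it checks the gap on each incoming chunk and, if the previous write was too long ago, ends the in-flight COPY and begins anew on that chunk (`newCopy` and `maxQuiet` are hypothetical, and the transaction commit itself is elided). A real implementation would presumably want a concurrent timer so the commit happens as soon as the threshold elapses rather than on the next chunk's arrival:

```scala
import cats.effect.{IO, Ref}
import cats.syntax.all._
import fs2.Stream
import org.postgresql.copy.CopyIn
import scala.concurrent.duration._

def copyWithThreshold(
    newCopy: IO[CopyIn],       // begins a fresh COPY FROM STDIN operation
    maxQuiet: FiniteDuration,  // max allowed gap between writes
    records: Stream[IO, Byte]): IO[Unit] =
  (newCopy, IO.monotonic).flatMapN { (copy0, t0) =>
    Ref.of[IO, (CopyIn, FiniteDuration)]((copy0, t0)).flatMap { state =>
      records.chunks.evalMap { c =>
        for {
          now <- IO.monotonic
          st  <- state.get
          // if the source was quiet past the threshold, finish the current
          // COPY and start a new one for this chunk
          in  <- if (now - st._2 > maxQuiet)
                   IO.blocking(st._1.endCopy()) *> newCopy
                 else IO.pure(st._1)
          bs   = c.toArray
          _   <- IO.blocking(in.writeToCopy(bs, 0, bs.length))
          _   <- state.set((in, now))
        } yield ()
      }.compile.drain *> state.get.flatMap(st => IO.blocking(st._1.endCopy()).void)
    }
  }
```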

@wemrysi
Contributor Author

wemrysi commented May 9, 2022

<1>

@wemrysi
Contributor Author

wemrysi commented May 9, 2022

Still debugging some issues with restructuring the COPY using flow sinks. May need to revert to something like the pre-flow implementation if the problem persists.

@jsantos17
Contributor

jsantos17 commented May 9, 2022

Perhaps rechunking the stream into larger chunks would help reduce the number of COPYs? It might help enough to counteract the performance penalty of rechunking itself.

@wemrysi
Contributor Author

wemrysi commented May 10, 2022

> Perhaps rechunking the stream into larger chunks would help reduce the number of COPYs? It might help enough to counteract the performance penalty of rechunking itself.

Hm, yeah, that might be enough; good idea. We're seeing 3MiB chunks on the problematic instance now, so maybe we should try rechunking to 32MiB and see if that helps.
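
If it helps, fs2's built-in `chunkN` would be one way to do the coalescing; a minimal sketch, assuming a byte stream and taking the 3MiB/32MiB figures from above:

```scala
import cats.effect.IO
import fs2.Stream

val targetChunkBytes = 32 * 1024 * 1024 // ~32MiB, up from the ~3MiB we see now

// Coalesce small upstream chunks into ~32MiB chunks so that, under the
// per-chunk strategy, each COPY covers far more rows.
def rechunk(records: Stream[IO, Byte]): Stream[IO, Byte] =
  records.chunkN(targetChunkBytes, allowFewer = true).flatMap(Stream.chunk)
```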
