
Revert to a single COPY operation per table instead of per chunk #308

Open
wemrysi opened this issue May 9, 2022 · 4 comments

Comments

@wemrysi
Contributor

wemrysi commented May 9, 2022

The per-chunk strategy appears to result in rather poor performance as data size increases. We'd like to revert to using a single COPY operation per table to reclaim some of that performance. Some of the previous reliability concerns can be ameliorated via source buffering.
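
For reference, a minimal sketch of what the single-COPY-per-table shape could look like, assuming fs2 with cats-effect 3 and the PostgreSQL JDBC driver's `CopyManager` (which does exist in pgjdbc); `copyTable`, `tableRef`, and `records` are hypothetical names, and `prefetchN` stands in for the source buffering mentioned above:

```scala
import cats.effect.{IO, Resource}
import cats.syntax.all._
import fs2.Stream
import org.postgresql.PGConnection

// One COPY for the whole table: open a single CopyIn, stream every chunk
// into it, and only end the operation once the source is exhausted.
def copyTable(conn: PGConnection, tableRef: String, records: Stream[IO, Byte]): IO[Long] = {
  val acquire = IO.blocking(
    conn.getCopyAPI.copyIn(s"COPY $tableRef FROM STDIN WITH (FORMAT csv)"))

  Resource.make(acquire)(in => IO.blocking(if (in.isActive) in.cancelCopy())).use { in =>
    records
      .prefetchN(4) // buffer upstream chunks so a slow source doesn't stall the COPY
      .chunks
      .evalMap(c => IO.blocking { val bs = c.toArray; in.writeToCopy(bs, 0, bs.length) })
      .compile
      .drain *> IO.blocking(in.endCopy()) // endCopy reports the row count
  }
}
```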

If we still run into reliability issues due to timeouts or long-running transactions, a possible solution would be to define a maximum duration between writes to the COPY stream. If the threshold is reached, we commit the current operation and begin anew on the next chunk from upstream. This should avoid timeouts for slow sources while preserving performance where possible.
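
To make the threshold idea concrete, here's a rough sketch, not a definitive design: it checks the gap on each incoming chunk and, if the previous write was too long ago, ends the in-flight COPY and begins anew on that chunk (`newCopy` and `maxQuiet` are hypothetical, and the transaction commit itself is elided). A real implementation would presumably want a concurrent timer so the commit happens as soon as the threshold elapses rather than on the next chunk's arrival:

```scala
import cats.effect.{IO, Ref}
import cats.syntax.all._
import fs2.Stream
import org.postgresql.copy.CopyIn
import scala.concurrent.duration._

def copyWithThreshold(
    newCopy: IO[CopyIn],       // begins a fresh COPY FROM STDIN operation
    maxQuiet: FiniteDuration,  // max allowed gap between writes
    records: Stream[IO, Byte]): IO[Unit] =
  (newCopy, IO.monotonic).flatMapN { (copy0, t0) =>
    Ref.of[IO, (CopyIn, FiniteDuration)]((copy0, t0)).flatMap { state =>
      records.chunks.evalMap { c =>
        for {
          now <- IO.monotonic
          st  <- state.get
          // if the source was quiet past the threshold, finish the current
          // COPY and start a new one for this chunk
          in  <- if (now - st._2 > maxQuiet)
                   IO.blocking(st._1.endCopy()) *> newCopy
                 else IO.pure(st._1)
          bs   = c.toArray
          _   <- IO.blocking(in.writeToCopy(bs, 0, bs.length))
          _   <- state.set((in, now))
        } yield ()
      }.compile.drain *> state.get.flatMap(st => IO.blocking(st._1.endCopy()).void)
    }
  }
```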

@wemrysi
Contributor Author

wemrysi commented May 9, 2022

<1>

@wemrysi
Contributor Author

wemrysi commented May 9, 2022

Still debugging some issues with restructuring the COPY using flow sinks. May need to revert to something like the pre-flow implementation if the problem persists.

@jsantos17
Contributor

jsantos17 commented May 9, 2022

Perhaps rechunking the stream into larger chunks would help reduce the number of COPYs? It might help enough to counteract the performance penalty of rechunking itself.

@wemrysi
Contributor Author

wemrysi commented May 10, 2022

> Perhaps rechunking the stream into larger chunks would help reduce the number of COPYs? It might help enough to counteract the performance penalty of rechunking itself.

Hm, yeah, that might be enough; good idea. We're seeing 3MiB chunks on the problematic instance now, so maybe we should try rechunking to 32MiB and see if that helps.
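
If it helps, fs2's built-in `chunkN` would be one way to do the coalescing; a minimal sketch, assuming a byte stream and taking the 3MiB/32MiB figures from above:

```scala
import cats.effect.IO
import fs2.Stream

val targetChunkBytes = 32 * 1024 * 1024 // ~32MiB, up from the ~3MiB we see now

// Coalesce small upstream chunks into ~32MiB chunks so that, under the
// per-chunk strategy, each COPY covers far more rows.
def rechunk(records: Stream[IO, Byte]): Stream[IO, Byte] =
  records.chunkN(targetChunkBytes, allowFewer = true).flatMap(Stream.chunk)
```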
