Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tap with fastsync #8

Open
wants to merge 57 commits into
base: release
Choose a base branch
from
Open

Update tap with fastsync #8

wants to merge 57 commits into from

Conversation

brose7230
Copy link

No description provided.


columns.extend(['_SDC_EXTRACTED_AT','_SDC_DELETED_AT','_SDC_BATCHED_AT'])

query_df = df = pd.DataFrame(columns=columns) #TODO: delete?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need Pandas here? This can slow down performance and may not be needed if the goal is to just create a csv.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this works as is right now and will require rewriting/testing this, are you cool with leaving it as pandas and I can create another ticket in the R&R epic for updating this method? If its best to get it switched to the most efficient asap, then i can make time this week to update this last part, but if its not pressing then i can do the ticket and get back to it.

tap_mssql/sync_strategies/full_table.py Outdated Show resolved Hide resolved
raise Exception('Length must be at least 1!')

if 0 < length < 8:
LOGGER.info('Length is too small! consider 8 or more characters')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but this could be upgraded to a Warn

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

import datetime
import glob

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of these imports are being used here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

select_sql = "SELECT {} FROM {}.{}".format(
",".join(escaped_columns), escaped_db, escaped_table
)
if fastsync:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consider moving this logic to a new folder called fast_sync? Then in full_table.py the conditional formatting can be used to decide whether to use generate_select_sql or the fast_sync_generate_select_sql.

This isn't essential, just an idea to keep fast sync separate from other methods already being used in other replication types (like incremental)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah good call. I planned on having a condition within full_table.sync_table() that would call one or the other, so that was something i meant to do but forgot to make note of. This is updated

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, i think i read 'folder' as 'function' lol


if catalog_entry.tap_stream_id == "dbo-InputMetadata":
revert_ouput_converter(open_conn, prev_converter)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete blank line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed




def generate_random_string(length: int = 8) -> str:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be another function that makes sense to move to a file in a new fast_sync directory.

@@ -0,0 +1,172 @@
"""Functions that write chunked gzipped files."""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make sense to move this file up one level out of sync_strategies to tap_mssql.

Copy link
Author

@brose7230 brose7230 Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i used that earlier on during the fastsync work and it is no longer needed so i trashed it in the latest push

@@ -2,9 +2,18 @@
# pylint: disable=duplicate-code,too-many-locals,simplifiable-if-expression

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a Fast Sync section to the readme.md? There are a few required config changes like - "fastsync_batch_rows" that should be called out.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -2,11 +2,19 @@
# pylint: disable=too-many-arguments,duplicate-code,too-many-locals

import copy
import csv

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a few changes available in master aren't included here. The change to logical.py that updates the state even if no records changed is not here. See line 301 of logical.py in master - this isn't anywhere in this PR.

@brose7230
Copy link
Author

@eric-roll i didnt want to miss anything from master so i just created a new branch with all these updates: #11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants