-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update tap with fastsync #8
base: release
Are you sure you want to change the base?
Conversation
|
||
columns.extend(['_SDC_EXTRACTED_AT','_SDC_DELETED_AT','_SDC_BATCHED_AT']) | ||
|
||
query_df = df = pd.DataFrame(columns=columns) #TODO: delete? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need Pandas here? This can slow down performance and may not be needed if the goal is to just create a csv.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this works as is right now and will require rewriting/testing this, are you cool with leaving it as pandas and I can create another ticket in the R&R epic for updating this method? If its best to get it switched to the most efficient asap, then i can make time this week to update this last part, but if its not pressing then i can do the ticket and get back to it.
raise Exception('Length must be at least 1!') | ||
|
||
if 0 < length < 8: | ||
LOGGER.info('Length is too small! consider 8 or more characters') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but this could be upgraded to a Warn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
tap_mssql/sync_strategies/common.py
Outdated
import datetime | ||
import glob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think any of these imports are being used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
tap_mssql/sync_strategies/common.py
Outdated
select_sql = "SELECT {} FROM {}.{}".format( | ||
",".join(escaped_columns), escaped_db, escaped_table | ||
) | ||
if fastsync: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we consider moving this logic to a new folder called fast_sync
? Then in full_table.py
the conditional formatting can be used to decide whether to use generate_select_sql
or the fast_sync_generate_select_sql
.
This isn't essential, just an idea to keep fast sync separate from other methods already being used in other replication types (like incremental)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah good call. I planned on having a condition within full_table.sync_table() that would call one or the other, so that was something i meant to do but forgot to make note of. This is updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, i think i read 'folder' as 'function' lol
|
||
if catalog_entry.tap_stream_id == "dbo-InputMetadata": | ||
revert_ouput_converter(open_conn, prev_converter) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete blank line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
|
||
|
||
|
||
def generate_random_string(length: int = 8) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be another function that makes sense to move to a file in a new fast_sync
directory.
@@ -0,0 +1,172 @@ | |||
"""Functions that write chunked gzipped files.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may make sense to move this file up one level out of sync_strategies
to tap_mssql
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i used that earlier on during the fastsync work and it is no longer needed so i trashed it in the latest push
@@ -2,9 +2,18 @@ | |||
# pylint: disable=duplicate-code,too-many-locals,simplifiable-if-expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a Fast Sync section to the readme.md? There are a few required config changes like - "fastsync_batch_rows" that should be called out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
tap_mssql/sync_strategies/common.py
Outdated
@@ -2,11 +2,19 @@ | |||
# pylint: disable=too-many-arguments,duplicate-code,too-many-locals | |||
|
|||
import copy | |||
import csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a few changes available in master aren't included here. The change to logical.py
that updates the state even if no records changed is not here. See line 301 of logical.py
in master - this isn't anywhere in this PR.
@eric-roll i didnt want to miss anything from master so i just created a new branch with all these updates: #11 |
No description provided.