
Faster/better addition of dataset entries #769

Merged: bennybp merged 2 commits into main from fix_ds_entry on Oct 13, 2023

@bennybp commented Oct 13, 2023

Description

Improve the efficiency of adding large numbers of entries to a dataset

  • Fix expensive SQL queries: adding a single entry no longer retrieves all existing entries, and large numbers of entries are added in batches
  • Improve usability of the add_entries functions in the dataset models
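The idea behind the first bullet can be illustrated with a minimal sketch of batched insertion. This is not the code merged in this PR: the table `dataset_entry`, its columns, and the function `add_entries_batched` are hypothetical, and the sketch uses Python's standard-library `sqlite3` module only so that it runs on its own. For each batch it looks up only the names in that batch (rather than every entry already in the dataset), skips names that already exist, and inserts the remaining rows with one `executemany` call.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple

# Hypothetical entry shape: (entry name, serialized entry data).
Entry = Tuple[str, str]


def chunked(items: Sequence[Entry], size: int) -> Iterator[Sequence[Entry]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def add_entries_batched(
    conn: sqlite3.Connection,
    dataset_id: int,
    entries: Sequence[Entry],
    batch_size: int = 500,
) -> Tuple[int, int]:
    """Insert entries in batches, skipping names already in the dataset.

    Hypothetical helper. Returns (number added, number skipped). Only the
    names in the current batch are looked up, so the cost of each batch does
    not grow with the number of entries already stored.
    """
    added = 0
    skipped = 0
    for batch in chunked(entries, batch_size):
        names = [name for name, _ in batch]
        # One "?" per name plus one for dataset_id; batch_size=500 keeps the
        # query under SQLite's historical 999-parameter limit.
        placeholders = ",".join("?" * len(names))
        rows = conn.execute(
            "SELECT name FROM dataset_entry "
            f"WHERE dataset_id = ? AND name IN ({placeholders})",
            [dataset_id, *names],
        ).fetchall()
        existing = {row[0] for row in rows}

        new_rows: List[Tuple[int, str, str]] = []
        for name, data in batch:
            if name in existing:
                skipped += 1  # already stored, or repeated within this batch
                continue
            existing.add(name)
            new_rows.append((dataset_id, name, data))

        conn.executemany(
            "INSERT INTO dataset_entry (dataset_id, name, data) VALUES (?, ?, ?)",
            new_rows,
        )
        added += len(new_rows)
    conn.commit()
    return added, skipped


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset_entry ("
        "dataset_id INTEGER, name TEXT, data TEXT, PRIMARY KEY (dataset_id, name))"
    )
    entries = [(f"mol_{i}", "{}") for i in range(1200)]
    print(add_entries_batched(conn, 1, entries))  # (1200, 0)
    print(add_entries_batched(conn, 1, entries[:10] + [("mol_new", "{}")]))  # (1, 10)
```

Duplicates across batches are also caught, because rows inserted by an earlier batch are visible to later lookups on the same connection before the final commit. The batch size of 500 is an arbitrary choice; its main job here is to bound the number of bound parameters per query.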

For the first change, local tests of adding 10,000 entries one-by-one to a singlepoint dataset show a 75% reduction in time, with a consistent (rather than steadily increasing) insertion time per entry.
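A rough estimate suggests why one-by-one insertion used to slow down steadily. Assuming, as the first bullet implies, that each single-entry add previously read back every entry already in the dataset, adding N entries one at a time reads on the order of

```math
\sum_{k=0}^{N-1} k = \frac{N(N-1)}{2} \ \text{rows}, \qquad N = 10{,}000 \;\Rightarrow\; \frac{10{,}000 \times 9{,}999}{2} = 49{,}995{,}000 \ \text{rows}.
```

Under that assumption the per-entry cost grows linearly with the dataset size and the total work grows quadratically; looking up only the entries being added removes that growth, which is consistent with the flat insertion time reported above.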

See the comment in openmm/spice-dataset#82 for the initial motivation.

Changelog description

Improve the efficiency of adding large numbers of entries to a dataset

Status

  • Code base linted
  • Ready to go

codecov bot commented Oct 13, 2023

Codecov Report

Merging #769 (6085e73) into main (d1252ac) will decrease coverage by 0.02%.
The diff coverage is 45.00%.


bennybp merged commit b032eb0 into main on Oct 13, 2023 (16 checks passed).
bennybp deleted the fix_ds_entry branch on Oct 13, 2023 at 18:55.
bennybp changed the title from "[WIP] Faster/better addition of dataset entries" to "Faster/better addition of dataset entries" on Oct 18, 2023.