Skip to content
This repository has been archived by the owner on Nov 7, 2019. It is now read-only.

Collection tables #16

Merged
merged 6 commits into from
Aug 5, 2017
Merged

Collection tables #16

merged 6 commits into from
Aug 5, 2017

Conversation

b5
Copy link
Member

@b5 b5 commented Jul 21, 2017

This PR allows collections to scale, and makes collections do cooler stuff. It relies on datatogether/sql_datastore#2 to do that cooler stuff.

In the interest of getting the Collections feature out the door, I hacked in a very-terrible implementation of collections where they were stored as JSON on the collections table. This is bad. Won't scale. Will be slow. Badbadnotgood.

So here we refactor away the bad bits, replacing them with paginated collection items, allowing collections to scale up to 10s / 100s of thousands of entries if needed.

Cool. "Scale". Whatever. The more interesting part is that collection items are now composed of entries of Urls, which we can do more interesting things with. For example, if I add a url to my collection, and you add the same url to your collection, they're now formally linked. If Sentry crawls that link and updates the status of that URL, it updates for both of our collections. I can make notes about that url in my collection, and you can do the same in yours, and those notes won't get crossed, but we can create views that show both of our notes on that urls. If I add metadata about that Url, your collection can see it. If that url is in a primer, we can surface that too.

Things are (hopefully) about to get fuego once we surface this stuff in the frontend.

id, url,created,updated,creator_key_id,
name,email,event_name,agency_name,
agency_id,subagency_id,org_id,suborg_id,subprimer_id,
ftp,database,interactive,many_files,
comments
from uncrawlables
where url = $1;`
FROM uncrawlables
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why uncrawlables?

Copy link
Member

@dcwalk dcwalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.
Once again, in no way comprehensive or technical --

  • wondering at the choice of uncrawlables for terminology?
  • I don't always understand why code bits are commented out (will they go away eventually? be reintroduced?)
  • should some TODOs inline make their way to issues?

@b5
Copy link
Member Author

b5 commented Jul 24, 2017

Thx for review!

  • Consensus is forming that our "uncrawlables" phrasing is a problem (@ebarry would also like to drop the uncrawlables phrase), and there are others that are worth considering. This a carry over from our Data Rescue work, and I'm totally down to change it. I'm hoping the next PR on this repo will be refactoring naming conventions, but I want to first write a glossary of terms (on a technical level) and then start a discussion from there about changing naming conventions moving forward.
  • Code bits are often commented out b/c they represent work that I'm not fully willing to part with yet, or require further consideration. They will absolutely go away with rounds of refactoring, just not to that level of maturity yet :)
  • Oh my, if someone did a Find all on TODO and started to generate issues from that I'd be forever grateful, especially if those issues referenced the name of the file that the TODO is in & included a copy of the comment. If that doesn't happen, it's no worries, during rounds of refactoring I try to cut into the TODO list

@dcwalk
Copy link
Member

dcwalk commented Aug 3, 2017

Cool, I've created #17 and #18 from your comments. Good to go from me (for what that is worth)

@b5
Copy link
Member Author

b5 commented Aug 5, 2017

Thx team, merging!

@b5 b5 merged commit 0e3d00d into master Aug 5, 2017
@b5 b5 deleted the collection_tables branch October 25, 2017 18:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants