DPL-823 Import PBMC pool plates into Sequencescape #3857

Closed
11 tasks
KatyTaylor opened this issue Jul 10, 2023 · 13 comments · Fixed by #4050 or #4017

@KatyTaylor
Contributor

KatyTaylor commented Jul 10, 2023

User story
As a member of the SSR team, who has received a plate from faculty containing thawed, pooled PBMCs, I would like to be able to import them into Sequencescape along with all the required data.

Who are the primary contacts for this story
Lesley, Abby, Katy, Liz H

Who is the nominated tester for UAT
Abby, Liz H

Acceptance criteria
To be considered successful the solution must allow:

  • I can upload the plate as a "LRC PBMC Pools Input" plate in Sequencescape
  • Record pools, with the following information:
    • The samples that make up the pool
  • For each sample, record:
    • Study --> currently has to be the same across the whole manifest
    • Donor id
    • Supplier sample name (Vacutainer barcode)
    • Which well it's in
    • Unique tag_depth index (assign automatically on upload, not provided by user), to avoid tag clash and enable compound sample creation when sequencing request is made later on
  • Consider whether to insert rows into the stock resource table in the MLWH or not.
  • Update documentation - it currently has instructions from DPL-809 (linked below) - add similar instructions regarding this import.
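
A minimal sketch of the automatic tag_depth assignment described in the criteria above. It is written in Python purely for illustration and is not Sequencescape code. The row keys (`well`, `donor_id`), starting the index at 1, and skipping rows with a blank donor id are all assumptions, not decisions recorded in this story.

```python
from collections import defaultdict


def assign_tag_depths(rows):
    """Give each filled-in manifest row a tag_depth that is unique within its well.

    `rows` is a list of dicts with the hypothetical keys 'well' and 'donor_id'.
    Rows with a blank donor_id are treated as unused and skipped (assumption).
    """
    next_depth = defaultdict(lambda: 1)  # next free tag_depth for each well
    assigned = []
    for row in rows:
        if not (row.get("donor_id") or "").strip():
            continue  # unfilled row, ignore it
        well = row["well"]
        assigned.append({**row, "tag_depth": next_depth[well]})
        next_depth[well] += 1
    return assigned


rows = [
    {"well": "A1", "donor_id": "donor_1"},
    {"well": "A1", "donor_id": "donor_2"},
    {"well": "A1", "donor_id": ""},  # left blank by the user
    {"well": "B1", "donor_id": "donor_3"},
]
result = assign_tag_depths(rows)
```

The index only needs to be unique within a well, so the counter restarts for each well rather than running across the whole plate.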

References
This story has a non-blocking relationship with:

Additional information
This will use the normal manifest process - SSRs generate the manifest & barcode labels, send them to faculty, who use the barcodes on their plates and send the manifest back filled in.

This is new functionality for manifests - existing manifests that import pools (e.g. multiplexed library manifests) do not include the individual sample information.

@KatyTaylor KatyTaylor added scRNA Size: M Medium - medium effort & risk scRNA MVP - cell banking labels Jul 10, 2023
@KatyTaylor KatyTaylor changed the title from "DPL-823 Import frozen PBMC tubes into Sequencescape" to "DPL-823 Import frozen PBMC pool tubes into Sequencescape" Sep 18, 2023
@KatyTaylor KatyTaylor changed the title from "DPL-823 Import frozen PBMC pool tubes into Sequencescape" to "DPL-823 Import PBMC pool plates into Sequencescape" Sep 18, 2023
@KatyTaylor
Contributor Author

Just checked whether plates that go down the existing 5' 10X pipeline have pools in wells, where each individual sample is listed in the LIMS - no, they have one sample per well as far as the LIMS knows (queries below).

SELECT l.id, l.name, l.created_at, r.map_id, COUNT(a.id) AS num_aliquots
FROM labware l
JOIN plate_purposes pp ON l.plate_purpose_id = pp.id
JOIN receptacles r ON l.id = r.labware_id
JOIN aliquots a ON r.id = a.receptacle_id
WHERE pp.name IN ('LBC Stock', 'LBC Cherrypick', 'LBC Aggregate', 'LBC 5p GEX Dil') AND l.created_at > '2023-01-01 00:00:00'
GROUP BY r.id
-- HAVING COUNT(a.id) > 1
ORDER BY l.id DESC, r.map_id;

@Lesley84

  1. Who will upload the sample manifest? BioResource team? SSRs? Will need to be picked up by SSRs and Bioresource, but likely to be SSRs. I understand Krista is setting up a meeting where this could be discussed.

  2. Is there any additional data that should be recorded against the plate / samples, except for that mentioned above? Cell counting is a consideration; however, this is likely to be standardised to a set figure that faculty will supply for immediate loading on the 10x chip.

  3. Will these plates always contain samples that are new to Sequencescape? Or might some previously banked cells from the cell extraction pipeline end up on one of these plates, for instance? The likelihood of this is low, but it may be an edge case we need to address in the longer term. Can we pick this up for discussion on Monday?

  4. How will these plates be barcoded? SSRs generate barcodes and send them to faculty? Faculty generate their own barcodes? Assumption would be SSRs, but is a discussion to pick up with :-)

  5. Lesley confirmed on Slack that we need to record the individual sample information - can I ask why this is? I think when multiplexed library manifests are imported, the pool is represented as just one sample. Apologies, I take this back: just the information for the pool will be required.

@KatyTaylor
Contributor Author

KatyTaylor commented Oct 16, 2023

Removed this from description as it's no longer correct if we're not recording individual sample information. Pasting here in case useful in future:

  • I can record the following against it:
    • Study
    • For each of the samples:
      • Collected By (site)
      • Donor id
      • Which well it's in
    • Retention Instruction (This field is now mandatory for all samples to comply with Sanger sample retention policy.)

Former outstanding questions:

  • Think we don't import untagged pools anywhere in other manifests (check this with SSRs?), so this will be bigger than a normal sample manifest story.
  • Might need to use the tag_depth field to differentiate the samples in a pool (created to avoid tag clash detection for Cardinal), as these are untagged samples being pooled together.

@KatyTaylor
Contributor Author

KatyTaylor commented Oct 17, 2023

Some notes from the meeting yesterday:
There's still uncertainty about whether we should import individual donor information here, or just record information at the pool level.

  • By the time the data gets to the iseq_flowcell table in the MLWH, NPG will probably want just a single sample representing the pool, because they can't cope with multiple rows representing different samples with the same tags - this discussion was had for Cardinal and we used "compound samples" to overcome this. Should check this with NPG.
  • The LIMS may need to know which donors make up the pools, in order to prevent multiple samples from the same donor at different time points being pooled together for sequencing.
  • It would be neater if the data looked the same regardless of which entry point the samples came in at (fresh blood, pooled PBMCs, or cDNA)
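
To illustrate the second point, here is a minimal Python sketch of a donor clash check on a single pool. It is not Sequencescape code. The function name and the (supplier sample name, donor id) pair layout are hypothetical.

```python
from collections import Counter


def donor_clashes(pool):
    """Return the donor ids that appear more than once in a pool, sorted.

    `pool` is a list of (supplier_sample_name, donor_id) pairs; both names
    are illustrative rather than real Sequencescape fields.
    """
    counts = Counter(donor_id for _, donor_id in pool)
    return sorted(donor for donor, n in counts.items() if n > 1)


pool = [
    ("vac_001", "donor_A"),
    ("vac_002", "donor_B"),
    ("vac_003", "donor_A"),  # same donor as vac_001, e.g. a different time point
]
clashes = donor_clashes(pool)
```

A check like this is only possible if the LIMS holds the individual donor ids for each pool, which is the argument for importing them.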

It looks to me at the moment like the best solution is to use compound samples from the pools plate stage onwards - so each pool would be represented by a single compound sample, but there would be underlying data that recorded which donor samples it was made up of. This would mean:
a) For samples coming through SeqOps from PBMC isolation, converting the samples to "compound samples" at the point of pooling (sanger/limber#1390).
b) For samples coming from faculty ready-pooled (this story), including individual donor information on the manifest and importing them as "compound samples".

However, it needs to be discussed with NPG and with HumGen informatics.

Update:
NPG confirmed they would like to see one row in the iseq_flowcell table per donor pool, as in the Cardinal project.
Have emailed Vivek but not heard back yet.

@KatyTaylor
Contributor Author

Confirmed with Danni, these count as 'derivative samples', so we don't need to include a 'retention instruction' as a column of the manifest. All derivative samples are destroyed after 2 years, unless the sample custodian has requested long term storage / to be returned.

@KatyTaylor KatyTaylor added Size: L Large - large effort & risk and removed Size: M Medium - medium effort & risk labels Dec 12, 2023
@KatyTaylor
Contributor Author

Q1. Study is currently specified at the whole manifest level. Is it OK here to mandate that all samples in a manifest will fall under the same study?

Q2. Should supplier name be included in the manifest? For the samples coming into SeqOps as blood, we are setting the supplier name to the Blood Vac tube barcode, so if multiple aliquots are taken from the same Blood Vac tube, we know they came from the same donor sample.
We are already including donor id, which is different - multiple samples from the same donor would have the same donor id (but different supplier names / Blood Vac tube barcodes).

@KatyTaylor
Contributor Author

Some ideas as to how to implement this:

  1. Generate sheet with ~20 rows per well to fill in.
    1. Ignore rows not filled in (get them to delete autogenerated fields, or check whether a specific field (donor_id) is filled in).
    2. Can continue to generate sanger sample ids upfront, just waste some
    3. New ‘well’ field, that allows multiple rows per well, and adds tag_depth?
  2. Comma-separated list of donor ids, one row per well.
    1. Get rid of ‘collected by’ field - is it needed?
    2. Need some validation on that list, as easy to make mistakes.
    3. Generate (all or additional) sanger sample ids on upload instead of upfront, because otherwise you’d just have one per pool/well.
  3. Have only one sample per well; store list of donor ids in a field for clash detection.
    1. Would be inconsistent with pools coming through from SeqOps (unless we did the same with them).
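
For option 2, the validation on the comma-separated list could look something like the Python sketch below. It is illustrative only and not Sequencescape code. The function name and the particular checks (empty entries, repeated donor ids) are assumptions about which mistakes are likely, not agreed requirements.

```python
def parse_donor_ids(cell):
    """Split a comma-separated donor id cell and report likely mistakes.

    Returns (donor_ids, errors). The checks here are examples only.
    """
    parts = [part.strip() for part in cell.split(",")]
    errors = []
    if any(part == "" for part in parts):
        errors.append("empty entry (stray or trailing comma?)")
    donor_ids = [part for part in parts if part]

    seen = set()
    duplicates = []
    for donor_id in donor_ids:
        if donor_id in seen and donor_id not in duplicates:
            duplicates.append(donor_id)
        seen.add(donor_id)
    if duplicates:
        errors.append("duplicate donor ids: " + ", ".join(duplicates))
    return donor_ids, errors


ids, errors = parse_donor_ids("donor_1, donor_2,,donor_1")
```

In this example the cell contains both a stray comma and a repeated donor id, so two errors are reported.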

Mock ups (empty and filled in) for option 1 (multiple rows per well):
Screenshot 2024-02-07 at 15 07 58

Screenshot 2024-02-07 at 15 08 14

Mock ups (empty and filled in) for option 2 (single row per well):
Screenshot 2024-02-07 at 15 08 23

Screenshot 2024-02-07 at 15 08 33

@KatyTaylor KatyTaylor added the On Hold On hold label Feb 21, 2024
@KatyTaylor
Contributor Author

KatyTaylor commented Feb 21, 2024

Putting 'on hold' and back in 'to do' until I've had a conversation with R&D. It might not be required (or could be greatly simplified at least), as donor id 'clash' detection (#3940) might not be required.

@KatyTaylor KatyTaylor removed the On Hold On hold label Feb 27, 2024
@KatyTaylor
Contributor Author

From Abby on working group meeting 22/02/2024:

it was agreed that samples can be pooled together at sequencing where donors are the same because at this point they will be tagged. We just need to have the donor clash prevention step for pooling before chip loading as at this point there will be no barcoding

There was a question of whether we should still import sample-level information at this stage, to keep the data consistent with that coming down the other route (pooled in SeqOps).
Decided with Andrew & Abdullah yesterday (26/02/2024) to keep it simple and only import the information we need - i.e. modify this story to just import pool-level information.
Emailed Vivek to check with him if this is OK from the data analysis point of view.

@KatyTaylor
Contributor Author

Modifying the story description / acceptance criteria (explained in comment above), so archiving the old version here:

User story
As a member of the SSR team, who has received a plate from faculty containing thawed, pooled PBMCs, I would like to be able to import them into Sequencescape along with all the required data.

Who are the primary contacts for this story
Lesley, Abby, Katy, Liz H

Who is the nominated tester for UAT
Abby, Liz H

Acceptance criteria
To be considered successful the solution must allow:

  • I can upload the plate as a "LRC PBMC Pools Input" plate in Sequencescape
  • Record pools, with the following information:
    • The samples that make up the pool
  • For each sample, record:
    • Study --> currently has to be the same across the whole manifest
    • Collected By (site)
    • Donor id
    • Which well it's in
    • Unique tag_depth index (assign automatically on upload, not provided by user), to avoid tag clash and enable compound sample creation when sequencing request is made later on
  • Update documentation - it currently has instructions from DPL-809 (linked below) - add similar instructions regarding this import.

References
This story has a non-blocking relationship with:

Additional information
This will use the normal manifest process - SSRs generate the manifest & barcode labels, send them to faculty, who use the barcodes on their plates and send the manifest back filled in.

This is new functionality for manifests - existing manifests that import pools (e.g. multiplexed library manifests) do not include the individual sample information.

@KatyTaylor
Contributor Author

I had half implemented the story before this requirements change. Old code archived in DPL-823-Import-PBMC-pool-plates-into-Sequencescape branch. It allows download of a sample manifest with multiple rows per well, but does not yet allow upload of it.

@KatyTaylor KatyTaylor added the On Hold On hold label Mar 4, 2024
@KatyTaylor
Contributor Author

KatyTaylor commented Mar 4, 2024

Check the data gets into the MLWH OK - stock resource table & samples table, then compound samples table when the samples get to the sequencing stage.

See if stock_resource table allows multiple rows for a barcode and coordinate:

SELECT Count(*)
FROM stock_resource
GROUP BY labware_human_barcode, labware_coordinate
HAVING Count(*) > 1;
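
A self-contained way to try out this kind of check, sketched in Python with an in-memory SQLite database. The real MLWH is a separate database and its stock_resource table has more columns, so the cut-down table and the example barcodes here are made up. This variant also selects the grouping columns, so the affected barcode and coordinate are visible in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical cut-down table: only the two columns the query groups by.
conn.execute(
    "CREATE TABLE stock_resource ("
    "labware_human_barcode TEXT, labware_coordinate TEXT)"
)
conn.executemany(
    "INSERT INTO stock_resource VALUES (?, ?)",
    [("PLATE-1", "A1"), ("PLATE-1", "A1"), ("PLATE-1", "B1")],
)

# Same GROUP BY / HAVING shape as the query above, plus the grouping columns.
duplicates = conn.execute(
    """
    SELECT labware_human_barcode, labware_coordinate, COUNT(*) AS num_rows
    FROM stock_resource
    GROUP BY labware_human_barcode, labware_coordinate
    HAVING COUNT(*) > 1
    """
).fetchall()
conn.close()
```

Any rows returned show that multiple stock_resource rows already exist for one well. An empty result does not prove they are disallowed, only that none exist yet.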

@KatyTaylor
Contributor Author

We decided in discussion with HumGen Informatics to still import the sample-level information within the pools. Notes here - https://docs.google.com/document/d/1y1s9v4324qDVhuLmxnlYihOZOPTpGxVq49EQPugTuYM/edit?usp=sharing

I have reinstated the old acceptance criteria in the story description.
