LLMs and MIMIC #1593

aegis301 · 2023-07-20T08:00:49Z

Hello,

I'm currently researching possible applications of LLMs in intensive care. MIMIC, and especially MIMIC-Note, are of high interest to me in this field. However, I was wondering whether any official guidelines are in place yet for working with these models. I assume that I can't just send MIMIC data over an API (e.g. via LangChain) to feed it into a running model like ChatGPT. Training and running a model locally should, however, be within the scope of possibilities, correct? Data safety and reasonable care are high priorities in our research, and we want to make sure we're operating within the applicable legal guidelines.

Thanks so much in advance for the answer.

Answered by alistairewj

alistairewj · 2023-07-20T14:44:14Z
Maintainer

Thank you for being thoughtful about this!

It is clear that sending data over an API is a violation of the DUA.
It's also clear that by storing data on Google Cloud Platform, we are to some extent trusting GCP to house the data, albeit within our complete control (we can delete it, and the data are encrypted such that GCP staff cannot read it).

For LLMs, we are carrying on with the principles behind that second point. So, as of now (July 2023), our stance is:

Sending data to an external API endpoint (Claude, OpenAI, etc.): Not permitted.
Deploying a model within your controlled cloud environment and using it (e.g. Azure OpenAI): Permitted.
Running an LLM locally on your own hardware/access controlled local hardware: Permitted.

For the second option, you often have to request that human review of the data be omitted. It's easy to do, but it's an extra step. If you are asked for a reason, there are two: the data are highly sensitive, and you have not been granted the right to share them with humans for review.

In general, we try to keep data on HIPAA-compliant cloud services to be safe, but this is not a strict requirement.
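As a quick reference, the July 2023 stance above can be encoded as a small lookup table. This is a hypothetical sketch: the names `Deployment`, `STANCE_JULY_2023`, and `check_deployment` are illustrative only, this is not an official PhysioNet tool, and it captures only the stance stated in this thread, so the current DUA always takes precedence.

```python
from enum import Enum


class Deployment(Enum):
    """Hypothetical labels for where an LLM runs relative to the credentialed MIMIC data."""

    EXTERNAL_API = "external API endpoint (e.g. Claude, OpenAI)"
    CONTROLLED_CLOUD = "model deployed in your own controlled cloud environment (e.g. Azure OpenAI)"
    LOCAL_HARDWARE = "LLM on your own or access-controlled local hardware"


# Maintainer stance as of July 2023, transcribed from this thread.
# Hypothetical helper, not an official PhysioNet tool; always check the current DUA.
STANCE_JULY_2023 = {
    Deployment.EXTERNAL_API: (
        False,
        "Not permitted: sending data over an external API violates the DUA.",
    ),
    Deployment.CONTROLLED_CLOUD: (
        True,
        "Permitted; you may need to request that human review of the data be omitted.",
    ),
    Deployment.LOCAL_HARDWARE: (True, "Permitted."),
}


def check_deployment(target: Deployment) -> tuple[bool, str]:
    """Return (permitted, note) for a deployment option under the July 2023 stance."""
    return STANCE_JULY_2023[target]


if __name__ == "__main__":
    # Print the note for each deployment option.
    for option in Deployment:
        permitted, note = check_deployment(option)
        print(f"{option.name} (permitted={permitted}): {note}")
```

Keeping the stance in a single table makes it easy to update if the policy changes, without touching the lookup logic.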

3 replies

aegis301 Jul 20, 2023
Author

Thank you so much for your answer and your work! We will happily comply with your guidelines and hope others will use this thread as a guidepost as well.

aegis301 Aug 26, 2023
Author

Sorry, @alistairewj, but I have another specific question about this: can the MIMIC demo dataset be used with APIs such as the ChatGPT API or other hosted LLMs? We are trying to build something with Pandas AI, and so far there is no version of it that runs locally. But of course we want to operate within your regulations! Thanks for working with us on this.

alistairewj Aug 28, 2023
Maintainer

You can share the demo dataset with the APIs. It is shared under a different license which permits that use.

LevanBokeria · 2024-09-27T10:19:31Z

Thank you for the clarifying answers! @alistairewj, a follow-up on this:

We would like to use an externally managed HPC (in the UK) to house the data and to develop models on it there. Encrypting the data would ensure that the HPC admins have no access to it; however, it would make data analysis and model development nearly impossible. Are there any other ways we could work with MIMIC data on HPCs while adhering to the DUA? For example, would it be sufficient to sign an agreement with the HPC provider that outlines our roles and defines access limitations, so that the HPC admins are legally bound not to access the data?

Would you have any pointers to any groups in the UK who have successfully worked with MIMIC data on HPCs?

Thank you so much!

2 replies

LevanBokeria Sep 27, 2024

To clarify, the HPC in question is the Baskerville HPC: https://docs.baskerville.ac.uk/

tompollard Sep 27, 2024
Maintainer

@LevanBokeria please drop us a message at contact@physionet.org and we can try to come up with a solution together.
