Synthetic data generation using existing data #11

Open
jvhgit opened this issue Dec 16, 2024 · 3 comments

@jvhgit

jvhgit commented Dec 16, 2024

Hi,

I was wondering about the following.
Is it possible to use existing data, such as private code repositories or private documentation, as input for the synthetic data generation?
I am looking for a scalable way to turn unstructured information from my field/company into a structured chat format without needing direct user feedback (such as the thumbs up/down feedback you would get in an application). I think this could give a model a better starting point for company-specific use cases.

Kind regards,

@jvhgit changed the title from "Synthetic data generation" to "Synthetic data generation using existing data" on Dec 16, 2024
@davidberenstein1957
Member

@jvhgit, this is certainly possible, but it defeats the purpose of the tool in its current state. In the short term, we do plan on adding support for ingesting custom and private data from the Hub for RAG (#10) and evaluation (#9) purposes. Feel free to add more thoughts and user scenarios that might help us shape the tool to make it as useful as possible to you and the community.

@Israel-Laguan

@davidberenstein1957 what about ingesting a JSON or CSV file (maybe you could define the schema, or we could select it from a dropdown, etc.)? The idea of using existing data is sound in these scenarios.
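
To make the JSON/CSV idea concrete, here is a minimal sketch of what a user-defined schema could look like. It is not part of the tool: the `SCHEMA` mapping, the column names `question` and `answer`, and the helper functions are all hypothetical, and the output shape (a `messages` list of `role`/`content` entries) is assumed as a common chat format.

```python
import csv
import io
import json

# Hypothetical schema: maps a chat role to the column/key that holds its text.
SCHEMA = {"user": "question", "assistant": "answer"}


def rows_to_chat(rows, schema):
    """Convert dict-like rows into chat-format records using a role -> column mapping."""
    records = []
    for row in rows:
        messages = []
        for role, column in schema.items():
            # Treat missing or empty cells as absent content.
            value = (row.get(column) or "").strip()
            if value:
                messages.append({"role": role, "content": value})
        # Keep only rows where every mapped column had content.
        if len(messages) == len(schema):
            records.append({"messages": messages})
    return records


def load_csv(text, schema):
    """Parse CSV text with a header row and convert it to chat records."""
    return rows_to_chat(csv.DictReader(io.StringIO(text)), schema)


def load_json(text, schema):
    """Parse a JSON array of objects and convert it to chat records."""
    return rows_to_chat(json.loads(text), schema)


SAMPLE_CSV = (
    "question,answer\n"
    "How do I reset my password?,Use the self-service portal.\n"
    "Incomplete row,\n"
)

if __name__ == "__main__":
    for record in load_csv(SAMPLE_CSV, SCHEMA):
        print(json.dumps(record))
```

Rows with an empty mapped column are skipped rather than emitted as partial conversations; a dropdown in a UI could populate the same role-to-column mapping instead of a hand-written dictionary.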

@davidberenstein1957
Member

Hi @Israel-Laguan,
