Automatically Generating URLs to Public API Documents & Reducing Hallucination (Invalid URLs) #8

Open
in-c0 opened this issue Jan 16, 2025 · 0 comments
Issue

The current method involves manually prompting ChatGPT with the following:

"Generate the URLs pointing to the appropriate page of popular public APIs based on this table"
(Refer to api-docs-urls.csv for the table format)

I then repeatedly ask for "10 more" results and merge the resulting tables. However, this approach often produces hallucinated (invalid) URLs or otherwise incorrect data, which makes the generated table less useful and adds overhead for validation and correction.
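For reference, the manual merge step could be scripted roughly like this. It is only a sketch: the column names `name` and `docs_url` are assumptions (the real header lives in api-docs-urls.csv), and `merge_batches` is a hypothetical helper, not something in the repo.

```python
import csv
import io


def merge_batches(batches, key="name"):
    """Merge several CSV text batches into one list of rows.

    `key` is an assumed column name. The first row seen for each key wins;
    keys are compared case-insensitively with surrounding spaces ignored.
    """
    merged = {}
    for text in batches:
        for row in csv.DictReader(io.StringIO(text)):
            normalized = row[key].strip().lower()
            merged.setdefault(normalized, row)
    return list(merged.values())


# Two hypothetical "10 more" batches that overlap on one API name.
batch_1 = "name,docs_url\nExample API,https://example.com/docs\n"
batch_2 = (
    "name,docs_url\n"
    "example api,https://example.com/other\n"
    "Sample API,https://example.org/docs\n"
)
rows = merge_batches([batch_1, batch_2])
```

This only removes duplicate names; it does nothing about invalid URLs, which is the actual problem described above.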

🤔

Proposed Solution

1 - One agent generates and maintains a list of API "names" only. Hopefully this leads to fewer hallucinations.

2 - One agent tries to find all candidate URLs for each API's documentation page, based on the API names from step 1.

3 - One agent validates the URLs (HTTP HEAD requests) so that only live and reachable URLs make it into the final table.

4 - One agent connects these candidate URLs to their "best matches" (e.g. Privacy, TOS, etc). Chances are that several APIs exist under a single known name (e.g. OpenAI API -> OpenAI GPT-4 API or OpenAI Embedding API); in that case, we might have to generate new columns for the sub-APIs.
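As a rough illustration of step 3, here is a minimal sketch of the validation stage using only the Python standard library. The function names (`is_well_formed`, `head_status`, `filter_live`) and the input shape (API name mapped to a list of candidate URLs) are assumptions for illustration, not an existing design. The `probe` parameter is injectable so the filter can be exercised without network access.

```python
import http.client
import urllib.error
import urllib.request
from urllib.parse import urlparse


def is_well_formed(url):
    """Cheap structural check before spending a network request."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def head_status(url, timeout=5.0):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so this is the final response status.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered with a 4xx/5xx status
    except (OSError, ValueError, http.client.HTTPException):
        return None  # DNS failure, timeout, refused connection, bad URL


def filter_live(candidates, probe=head_status):
    """Keep only well-formed URLs whose probe returns a 2xx status.

    `candidates` maps an API name to its list of candidate URLs (assumed shape).
    """
    live = {}
    for name, urls in candidates.items():
        kept = []
        for url in urls:
            if not is_well_formed(url):
                continue
            status = probe(url)
            if status is not None and 200 <= status < 300:
                kept.append(url)
        live[name] = kept
    return live
```

One caveat: some servers reject HEAD (for example with 405) even though the page exists, so a real version would probably need a GET fallback before discarding a URL.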

The pipeline is still vague; I'll try to update it as I begin working on it sometime soon!
