Issue
The current method involves manually prompting ChatGPT with the following:
"Generate the URLs pointing to the appropriate page of popular public APIs based on this table"
(Refer to api-docs-urls.csv for the table format)
Then I repeatedly request "10 more" results and merge the tables. However, this approach often produces hallucinated (invalid) URLs or otherwise incorrect data, which reduces the utility of the generated table and adds overhead for validation and correction.
🤔
Proposed Solution
1 - One agent generates and maintains a list of API "names" - working from a fixed list of names should lead to fewer hallucinations
2 - One agent finds all candidate URLs for each API's documentation pages, based on the API names from step 1.
3 - One agent validates the candidate URLs (via HEAD requests) so that only live, reachable URLs make it into the final table - see the sketch after this list
4 - One agent matches each candidate URL to its "best fit" column (e.g. Privacy, TOS, etc.). Chances are that several APIs exist under a single known name (e.g. OpenAI API -> OpenAI GPT-4 API or OpenAI Embedding API); in that case, we might have to generate new columns for the sub-APIs.
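A minimal sketch of steps 3 and 4 in Python, assuming candidate URLs arrive as a plain list. The function names, the keyword table, and the fallback-to-GET behavior are all assumptions for illustration; the real step 4 would presumably be an agent, with this keyword heuristic as a cheap first pass:

```python
import requests

# Hypothetical column keywords; order matters, since "api" is a broad match.
CATEGORY_KEYWORDS = {
    "Privacy": ("privacy",),
    "TOS": ("terms", "tos", "legal"),
    "Docs": ("docs", "documentation", "reference", "api"),
}

def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Step 3: keep only live URLs. HEAD is cheap, but some servers
    reject it (405), so fall back to a streamed GET in that case."""
    try:
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        if resp.status_code == 405:
            resp = requests.get(url, timeout=timeout, stream=True)
        return resp.status_code < 400
    except requests.RequestException:
        return False

def classify_url(url: str) -> str:
    """Step 4 (keyword stand-in for an agent): map a candidate URL
    to its best-fitting column."""
    lowered = url.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in lowered for kw in keywords):
            return category
    return "Unknown"

# Hypothetical candidates from step 2.
candidates = [
    "https://platform.openai.com/docs",
    "https://openai.com/policies/privacy-policy",
]
for url in candidates:
    if is_reachable(url):
        print(url, "->", classify_url(url))
```

This only filters dead links and does a first-pass match; deciding between multiple plausible matches (or splitting one name into sub-APIs) is where the step 4 agent would take over.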
The pipeline is still vague; I'll update it as I start working on it sometime soon!