
Wooah3400/terraform-genai-knowledge-base

 
 


Generative AI Knowledge Base

Description

Tagline

Fine-tune an LLM to answer questions from your documents.

Detailed

This solution shows how to extract question-and-answer pairs from documents using Generative AI. It provides an end-to-end demonstration of QA extraction and of fine-tuning a large language model (LLM) on Vertex AI. Along the way, it uses Document AI Optical Character Recognition (OCR), Firestore, Vector Search, Vertex AI Studio, and Cloud Functions.

Architecture

Knowledge Base using Generative AI

  • Uploading a new document triggers the webhook Cloud Function.
  • Document AI extracts the text from the document file.
  • The document text is indexed in Vector Search.
  • A Vertex AI large language model generates questions and answers from the document text.
  • The question-and-answer pairs are saved to Firestore.
  • A fine-tuning dataset is generated from the Firestore database.
  • After human validation, a fine-tuned large language model is deployed and saved in the Model Registry.

Prerequisites

Documentation

Deployment Duration

Configuration: 2 mins
Deployment: 6 mins

Cost

Cost Details

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| disable_services_on_destroy | Whether project services will be disabled when the resources are destroyed. | bool | false | no |
| documentai_location | Document AI location, see https://cloud.google.com/document-ai/docs/regions | string | "us" | no |
| firestore_location | Firestore location, see https://firebase.google.com/docs/firestore/locations | string | "nam5" | no |
| labels | A set of key/value label pairs to assign to the resources deployed by this blueprint. | map(string) | {} | no |
| project_id | The Google Cloud project ID to deploy to | string | n/a | yes |
| region | The Google Cloud region to deploy to | string | "us-central1" | no |
| unique_names | Whether to use unique names for resources | bool | false | no |
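As a minimal sketch, a root configuration could call this module as shown below. It assumes the module is sourced directly from the GitHub repository named at the top of this page. The module label `genai_knowledge_base`, the project ID, and the label values are placeholders, not values from this README. Only `project_id` is required. The other arguments are shown with their documented defaults, except `unique_names` and `labels`, which are set to illustrative values.

```hcl
# Hypothetical root module; replace the placeholder project ID before applying.
module "genai_knowledge_base" {
  # Assumption: sourcing the module straight from GitHub; pin a ref in real use.
  source = "github.com/Wooah3400/terraform-genai-knowledge-base"

  project_id = "my-project-id" # required (placeholder value)

  # Optional inputs, shown with their documented defaults.
  region                      = "us-central1"
  documentai_location         = "us"
  firestore_location          = "nam5"
  disable_services_on_destroy = false

  # Non-default values chosen only for illustration.
  unique_names = true
  labels = {
    env = "demo"
  }
}
```

Omitting any optional argument falls back to the default listed in the table.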

Outputs

| Name | Description |
|------|-------------|
| bucket_docs_name | The name of the docs bucket created |
| bucket_main_name | The name of the main bucket created |
| docs_index_endpoint_id | The ID of the docs index endpoint |
| docs_index_id | The ID of the docs index |
| documentai_processor_id | The full Document AI processor path ID |
| firestore_database_name | The name of the Firestore database created |
| neos_tutorial_url | The URL to launch the in-console tutorial for the Generative AI Knowledge Base solution |
| predictions_notebook_url | The URL to open the notebook for model predictions in Colab |
| unique_id | The unique ID for this deployment |
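A root configuration can re-export some of these outputs so they are printed after `terraform apply`. This sketch assumes the module is instantiated under the hypothetical label `genai_knowledge_base`. The root output names are also illustrative.

```hcl
# Re-export selected module outputs from the root configuration.
output "tutorial_url" {
  description = "URL of the in-console tutorial"
  value       = module.genai_knowledge_base.neos_tutorial_url
}

output "predictions_notebook_url" {
  description = "URL of the Colab notebook for model predictions"
  value       = module.genai_knowledge_base.predictions_notebook_url
}

output "docs_bucket_name" {
  description = "Name of the docs bucket created by the module"
  value       = module.genai_knowledge_base.bucket_docs_name
}
```

A single value can then be read with, for example, `terraform output -raw tutorial_url`.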

Requirements

These sections describe requirements for using this module.

Software

The following dependencies must be available:

Service Account

A service account with the following roles must be used to provision the resources of this module:

  • Storage Admin: roles/storage.admin

The Project Factory module and the IAM module may be used in combination to provision a service account with the necessary roles applied.
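As a hand-written alternative to those modules, the following minimal sketch creates a provisioning service account and grants it the Storage Admin role listed above. The project ID and the account ID `genai-kb-provisioner` are placeholders.

```hcl
# Placeholder project ID; replace before applying.
locals {
  project_id = "my-project-id"
}

# Service account that will run Terraform for this module (hypothetical ID).
resource "google_service_account" "provisioner" {
  project      = local.project_id
  account_id   = "genai-kb-provisioner"
  display_name = "Generative AI Knowledge Base provisioner"
}

# Grant the required role to that account at the project level.
resource "google_project_iam_member" "provisioner_storage_admin" {
  project = local.project_id
  role    = "roles/storage.admin"
  member  = "serviceAccount:${google_service_account.provisioner.email}"
}
```

`google_project_iam_member` adds this one binding without taking over the rest of the project's IAM policy.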

APIs

A project with the following APIs enabled must be used to host the resources of this module:

  • Google Cloud Storage JSON API: storage-api.googleapis.com

The Project Factory module can be used to provision a project with the necessary APIs enabled.
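Without Project Factory, the API can also be enabled directly with the Google provider's `google_project_service` resource. Here is a minimal sketch with a placeholder project ID:

```hcl
# Enable the Cloud Storage JSON API in the target project.
resource "google_project_service" "storage_api" {
  project = "my-project-id" # placeholder
  service = "storage-api.googleapis.com"

  # Leave the API enabled when this resource is destroyed.
  disable_on_destroy = false
}
```

Setting `disable_on_destroy = false` keeps the API enabled on destroy, similar to this module's `disable_services_on_destroy` input at its default of `false`.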

Contributing

Refer to the contribution guidelines for information on contributing to this module.

Security Disclosures

Please see our security disclosure process.
