ℹ️Finetuning BERT for Question Answering in Finance

Open model in Hugging Face 🤗

Introduction

This project aims to develop a model capable of answering questions based on documents while enhancing content comprehension and generating new context to support the answers. This is particularly useful in educational tools or support systems where deeper understanding and additional context are required.

Objective

Develop a question answering model that can generate new content to support the answers.
Use the SQuAD (Stanford Question Answering Dataset) for training and testing.
Implement the model using HuggingFace datasets and transformers.

Dataset Description

The dataset used in this project is the Stanford Question Answering Dataset (SQuAD). It provides a robust foundation for training question answering systems and can be augmented with generative tasks. Additional data focused on financial information is used to further finetune the model for the financial domain.

Steps and Tasks Performed

1. Download and Prepare the Dataset

Download the SQuAD dataset from HuggingFace datasets.
Split the dataset into training and testing sets.

2. Text Preprocessing

Perform text preprocessing on the training data using an auto tokenizer.
Explore the tokenized output to return the start and end positions of the answer from the context.

3. Hyperparameter Tuning

Use the hyperopt library to find optimal parameters for number of epochs, batch size, learning rate, etc.

4. Model Fine-Tuning

Fine-tune the distilbert-base-uncased model on the SQuAD dataset using TensorFlow.
Save the model and results to the HuggingFace Hub using callbacks.
Use MLflow to log parameters and metrics for experiment tracking.
Utilize TensorBoard callback to log accuracy and loss graphs.

5. Model Evaluation

Reprocess the validation data for evaluation.
Make predictions on the validation data.
Post-process predictions (which are in the form of probabilities) to extract the answer.

6. Evaluation Metrics

Format the actual answers and predicted answers.
Evaluate using the SQuAD evaluation metric.

7. Model Inference

Use the saved model from the HuggingFace Hub to make inferences.
Test with random data using the Pipeline API for quick inferencing.

8. Financial Data Integration

Create financial data from the company's financial reports in the same format as the SQuAD data for training and testing.
Fine-tune the model again on this financial data so that the model can answer questions frequently asked in the financial domain.

Folder Structure

data/: Contains the dataset files.
notebooks/: Jupyter notebooks for data analysis, text preprocessing, and model training.
docs/: Output files including mlruns, visualizations, and plots.

Installation and Usage

Clone the Repository:

git clone https://github.com/MariahFerns/QuestionAnswering-BERT-Finetuned-Finance.git
cd QuestionAnswering-BERT-Finetuned-Finance

Install the required libraries:
```
pip install -r requirements.txt
```
Run Jupyter Notebooks: Navigate to the notebooks/ folder and open the notebooks to explore data analysis and model development.

Results and Findings

The fine-tuned distilbert-base-uncased model showed promising results in question answering tasks.
Evaluation metrics indicated a good level of accuracy ~ 67% in the model's predictions.
The integration of financial data allowed the model to effectively answer domain-specific financial questions with accuracy.

Conclusion

This project successfully developed a model capable of answering questions specifically on financial data. The use of the SQuAD dataset and HuggingFace tools proved effective in training and evaluating the model. This system can be particularly useful in financial systems like banks where quick responses are required by the senior leadership.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
data		data
docs/mlruns		docs/mlruns
notebooks		notebooks
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ℹ️Finetuning BERT for Question Answering in Finance

Open model in Hugging Face 🤗

Introduction

Objective

Dataset Description

Steps and Tasks Performed

1. Download and Prepare the Dataset

2. Text Preprocessing

3. Hyperparameter Tuning

4. Model Fine-Tuning

5. Model Evaluation

6. Evaluation Metrics

7. Model Inference

8. Financial Data Integration

Folder Structure

Installation and Usage

Results and Findings

Conclusion

About

Packages

Languages

License

MariahFerns/QuestionAnswering-BERT-Finetuned-Finance

Folders and files

Latest commit

History

Repository files navigation

ℹ️Finetuning BERT for Question Answering in Finance

Open model in Hugging Face 🤗

Introduction

Objective

Dataset Description

Steps and Tasks Performed

1. Download and Prepare the Dataset

2. Text Preprocessing

3. Hyperparameter Tuning

4. Model Fine-Tuning

5. Model Evaluation

6. Evaluation Metrics

7. Model Inference

8. Financial Data Integration

Folder Structure

Installation and Usage

Results and Findings

Conclusion

About

Topics

Resources

License

Stars

Watchers

Forks

Packages 0

Languages

Packages