
A neural network architecture that automatically generates captions from images.


echatzidaki/Image_Captioning


Image-Captioning

Instructions

  1. Clone the COCO API repository (https://github.com/cocodataset/cocoapi):
git clone https://github.com/cocodataset/cocoapi.git
  2. Set up the COCO API (also described in the COCO API readme):
cd cocoapi/PythonAPI
make
cd ..
  3. Download the specific data listed below from http://cocodataset.org/#download:
  • Under Annotations, download:

    • 2014 Train/Val annotations [241MB] (extract captions_train2014.json and captions_val2014.json, and place them at cocoapi/annotations/captions_train2014.json and cocoapi/annotations/captions_val2014.json, respectively)
    • 2014 Testing Image info [1MB] (extract image_info_test2014.json and place it at cocoapi/annotations/image_info_test2014.json)
  • Under Images, download:

    • 2014 Train images [83K/13GB] (extract the train2014 folder and place it at cocoapi/images/train2014/)
    • 2014 Val images [41K/6GB] (extract the val2014 folder and place it at cocoapi/images/val2014/)
    • 2014 Test images [41K/6GB] (extract the test2014 folder and place it at cocoapi/images/test2014/)
  4. The project is structured as a series of Jupyter notebooks designed to be completed in sequential order (0_Dataset.ipynb, 1_Preliminaries.ipynb, 2_Training.ipynb, 3_Inference.ipynb).
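Before opening the notebooks, it can help to confirm that the downloads landed where the setup steps expect them. The following is a minimal sketch using only the Python standard library; the helper name `missing_coco_paths` is hypothetical and is not part of this repository or the COCO API.

```python
from pathlib import Path

# Paths relative to the cocoapi checkout, as listed in the download instructions.
EXPECTED_PATHS = [
    "annotations/captions_train2014.json",
    "annotations/captions_val2014.json",
    "annotations/image_info_test2014.json",
    "images/train2014",
    "images/val2014",
    "images/test2014",
]


def missing_coco_paths(cocoapi_root):
    """Return the expected files/folders that do not exist under cocoapi_root.

    Hypothetical helper: it only checks existence, not file contents.
    """
    root = Path(cocoapi_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling `missing_coco_paths("cocoapi")` from the directory that contains the cocoapi checkout returns an empty list when every expected file and folder is in place.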

Notebooks

Notebook 0: Dataset

  1. Initialize the COCO API
  2. Plot a Sample Image
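Notebook 0 presumably does this through the COCO API (`pycocotools.coco.COCO`, whose `getAnnIds` and `loadAnns` methods look up the captions attached to an image). As a dependency-free illustration of the same idea, the sketch below reads a captions file such as cocoapi/annotations/captions_train2014.json directly and picks one image together with its captions. `load_captions` and `sample_image` are hypothetical helper names, not functions from this repository.

```python
import json
import random


def load_captions(annotation_file):
    """Group COCO caption annotations by image.

    Returns a dict mapping image_id -> {"file_name": str, "captions": [str, ...]}.
    """
    with open(annotation_file, encoding="utf-8") as f:
        data = json.load(f)

    # One entry per image listed in the file, with an empty caption list.
    by_image = {img["id"]: {"file_name": img["file_name"], "captions": []}
                for img in data["images"]}
    # Each annotation carries one caption for one image.
    for ann in data["annotations"]:
        by_image[ann["image_id"]]["captions"].append(ann["caption"].strip())
    return by_image


def sample_image(by_image, seed=None):
    """Pick one image at random; return (file_name, captions)."""
    rng = random.Random(seed)
    image_id = rng.choice(sorted(by_image))
    entry = by_image[image_id]
    return entry["file_name"], entry["captions"]
```

To plot the sample, the returned `file_name` can be joined with cocoapi/images/train2014/, loaded with `PIL.Image.open`, and displayed with `matplotlib.pyplot.imshow`.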

Notebook 1: Preliminaries

  1. Explore the Data Loader
  2. Use the Data Loader to Obtain Batches
  3. Experiment with the CNN Encoder
  4. Implement the RNN Decoder
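The data loader explored in Notebook 1 is not reproduced here. The sketch below assumes a common design for caption loaders: a vocabulary built from word counts with a frequency threshold, special `<start>`, `<end>`, `<unk>` and `<pad>` tokens, and batches drawn from captions that all have the same length so they stack without padding. All names, the regex tokenizer (standing in for a library tokenizer such as NLTK's), and the default threshold are assumptions; the CNN encoder and RNN decoder are not covered.

```python
import random
import re
from collections import Counter

# Special tokens; these names are assumptions, not confirmed by this repository.
START, END, UNK, PAD = "<start>", "<end>", "<unk>", "<pad>"


def tokenize(caption):
    """Lowercase and split into word/punctuation tokens (a simple stand-in for NLTK)."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", caption.lower())


class Vocabulary:
    """Maps words to integer ids, keeping words seen at least `threshold` times."""

    def __init__(self, captions, threshold=5):
        counts = Counter(tok for c in captions for tok in tokenize(c))
        # Special tokens take the first ids; frequent words follow in sorted order.
        self.itos = [PAD, START, END, UNK]
        self.itos += sorted(w for w, n in counts.items() if n >= threshold)
        self.stoi = {w: i for i, w in enumerate(self.itos)}

    def __len__(self):
        return len(self.itos)

    def encode(self, caption):
        """Caption string -> list of ids wrapped in <start> ... <end>."""
        unk = self.stoi[UNK]
        ids = [self.stoi.get(tok, unk) for tok in tokenize(caption)]
        return [self.stoi[START]] + ids + [self.stoi[END]]


def same_length_batch(encoded, batch_size, rng=random):
    """Pick a length (weighted by how often it occurs), then sample batch_size
    caption indices of that length, with replacement."""
    lengths = [len(e) for e in encoded]
    target = rng.choice(lengths)
    candidates = [i for i, n in enumerate(lengths) if n == target]
    return [rng.choice(candidates) for _ in range(batch_size)]
```

Drawing every caption in a batch from one length trades some randomness for the ability to feed the decoder a dense batch without padding or masking.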

Notebook 2: Training

  1. Training Setup
  2. Train the Model
  3. Validate the Model
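Training a caption decoder is commonly monitored with the per-token cross-entropy loss and its exponential, the perplexity. Assuming the usual teacher-forcing setup, where the decoder receives ground-truth word t and is scored on predicting word t+1, the arithmetic looks like this in plain Python; in practice a framework loss such as PyTorch's `nn.CrossEntropyLoss` computes the same quantity over a batch. The function names are hypothetical, and whether Notebook 2 reports perplexity is not confirmed here.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def caption_loss_and_perplexity(step_logits, targets):
    """Mean per-token cross-entropy over one caption, and its perplexity.

    step_logits[t] holds the decoder's vocabulary scores at time step t;
    targets[t] is the id of the word that should come next.
    """
    if len(step_logits) != len(targets):
        raise ValueError("need one target per time step")
    loss = sum(cross_entropy(l, y) for l, y in zip(step_logits, targets)) / len(targets)
    return loss, math.exp(loss)
```

A decoder that spreads its probability uniformly over a vocabulary of V words has perplexity V, so falling perplexity means the model is concentrating probability on the correct next word.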

Notebook 3: Inference

  1. Get Data Loader for Test Dataset
  2. Load Trained Models
  3. Finish the Sampler
  4. Clean up the Captions
  5. Generate Predictions
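The sampler in Notebook 3 presumably operates on tensors from the trained decoder. The framework-free sketch below shows the greedy strategy it is assumed to follow (take the highest-scoring word at each step and feed it back in until `<end>` or a length limit), together with a caption clean-up that drops special tokens. `step` is a hypothetical callable standing in for one decoder step, and greedy decoding itself is an assumption, since beam search is also common for captioning.

```python
import re


def greedy_sample(step, first_input, state=None, end_id=None, max_len=20):
    """Greedy decoding loop.

    step(inp, state) -> (scores, new_state) is a hypothetical decoder step;
    first_input is whatever the decoder consumes first (e.g. image features),
    and later inputs are the previously predicted word ids.
    """
    ids = []
    inp = first_input
    for _ in range(max_len):
        scores, state = step(inp, state)
        word = max(range(len(scores)), key=scores.__getitem__)  # argmax
        ids.append(word)
        if word == end_id:
            break
        inp = word
    return ids


def clean_sentence(ids, itos, start="<start>", end="<end>"):
    """Turn predicted ids into a caption string, dropping special tokens."""
    words = []
    for i in ids:
        w = itos[i]
        if w == end:
            break
        if w != start:
            words.append(w)
    # Attach punctuation to the preceding word: "a dog ." -> "a dog."
    return re.sub(r" ([.,!?])", r"\1", " ".join(words))
```

Greedy decoding is cheap and deterministic but can commit early to a poor word; beam search keeps several partial captions instead at a higher cost per image.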
