Experiments using Early & Late Fusion techniques on limited features and Dense connections.

Tackling smallNORB: Data-Efficient Modelling

  • The smallNORB dataset is a dataset for 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees, every 5 degrees), and 18 azimuths (0 to 340, every 20 degrees). The training set is composed of 5 instances of each category (instances 4, 6, 7, 8 and 9), and the test set of the remaining 5 instances (instances 0, 1, 2, 3, and 5).
  • Curated and open-sourced by Yann LeCun & Fu Jie Huang. Source

image

What makes this Dataset challenging?

  • Do not use the full dataset for the experiments; work with 25%, 50% and 75% subsets. Data efficiency is paramount.

  • The test set and train set are made up of different toys, so a model can win if and only if it learns what is common between the toys in the train and test sets, e.g. human figures have limbs and legs; what they wear doesn't add much to their classification.

  • The images are limited in features: they are binocular images, i.e. two images taken with a slight angular deviation, similar to human vision (stereo vision), and they have a small spatial resolution (96 x 96). A model can win only if it is able to leverage these scarce features.

  • The positional metadata is made of nominal categorical variables. It carries positional information, but how will a model learn from it and correlate it with the image data? Let's find out (one possible encoding is sketched after this list).
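
A minimal sketch of one way to encode these nominal categoricals as a fixed-length vector (illustrative only; the function name, index ranges and encoding are assumptions, not the repository's exact scheme):

```python
import numpy as np

def encode_metadata(lighting, elevation_idx, azimuth_idx):
    """One-hot encode the nominal positional metadata for one sample.

    lighting: 0-5, elevation_idx: 0-8 (30..70 degrees), azimuth_idx: 0-17 (0..340 degrees).
    Returns a fixed-length vector that can later be fused with image features.
    """
    vec = np.zeros(6 + 9 + 18, dtype=np.float32)
    vec[lighting] = 1.0                  # lighting condition
    vec[6 + elevation_idx] = 1.0         # camera elevation
    vec[6 + 9 + azimuth_idx] = 1.0       # azimuth
    return vec
```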

Salient Features Applied

  • We have 2 channels of grayscale images; to utilise transfer learning out of the box we either need 3 channels or have to modify the network architecture. Can we do better?

  • How about hand-picking features that might be helpful? The images have no texture in them, which is another hurdle. To tackle both problems we can construct a 3rd channel, but of what?

  • Image gradients, a.k.a. Sobel edges, are an ideal 3rd channel: they produce an image emphasising edges, which mimics texture here and lets us use pretrained models without tweaking their architecture (a sketch of this channel construction follows below).
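
A minimal sketch of the 3-channel construction, assuming OpenCV and NumPy (function and variable names are illustrative, not taken from the repository):

```python
import cv2
import numpy as np

def make_three_channel(left_img, right_img):
    """Stack the two stereo views with a Sobel-edge map as a third channel.

    left_img, right_img: 96x96 uint8 grayscale arrays (one smallNORB stereo pair).
    Returns a (96, 96, 3) uint8 image usable with standard pretrained backbones.
    """
    # Gradient magnitude of the left view; it emphasises object contours,
    # acting as the texture-like cue the raw images lack.
    gx = cv2.Sobel(left_img, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(left_img, cv2.CV_32F, 0, 1, ksize=3)
    edges = cv2.magnitude(gx, gy)
    edges = cv2.normalize(edges, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    return np.dstack([left_img, right_img, edges])
```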

  • Since we are dealing with only a subset of the data, we need to take maximum advantage of the limited features available so that our model has small bias while at the same time not overfitting (small variance). This is a trade-off game, but it reduces our search space to models with fewer parameters and better feature propagation, so the potential candidates are MobileNets, InceptionNets, ResNets and DenseNets.

  • InceptionNets don't work with images of resolution 96 x 96; the workaround is to upscale the spatial resolution.

  • MobileNets worked out of the box and gave test accuracies in the 95% zone.

  • ResNets worked similarly and gave good performance across the different data regimes (25%, 50%, 75%).

  • DenseNets performed the best and gave accuracies close to SOTA using 75% of the training data (75% of that 75% for training and 25% of it for validation). This makes sense: contrary to ResNets, which have additive skip connections, DenseNets concatenate features directly from previous layers, similar to Feature Pyramid Networks. That works here because information is scarce, so the dense connections alleviate the vanishing-gradient problem and strengthen feature propagation (the contrast is sketched below).

    image
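
A toy PyTorch illustration of the contrast drawn above: a residual block combines features by addition, while a dense step concatenates new features onto the old ones so later layers see earlier features directly (class names and sizes are illustrative, not the repository's code):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """ResNet-style block: features are combined by addition."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return x + self.conv(x)                        # additive skip connection

class DenseStep(nn.Module):
    """DenseNet-style step: new features are concatenated channel-wise."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, growth_rate, 3, padding=1)

    def forward(self, x):
        return torch.cat([x, self.conv(x)], dim=1)     # earlier features pass through untouched
```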

  • In these experiments, DenseNet121 was explored thoroughly and it was the most data-efficient model, but it also involves a few more tricks discussed below.

    image

  • Positional metadata: thinking beyond the images is important to push the last 1-2% of accuracy, because the positional metadata carries information about lighting, azimuth and camera angle that supplements the image data; it indirectly tells us about the shadow's position and shape, which is an important feature in this task.

  • To use the positional metadata, the final architecture has to be modified: after the images are convolved and reduced to a multidimensional feature vector, concatenate this image vector with the positional-metadata vector and retrain the network, with pretrained weights loaded into the layers prior to this modification (a sketch follows the figure below).

    Picture1
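
A minimal PyTorch sketch of this late-fusion modification, assuming torchvision's DenseNet121 and an already-encoded metadata vector (class, layer sizes and names are illustrative, not the repository's exact architecture):

```python
import torch
import torch.nn as nn
from torchvision import models

class FusionDenseNet(nn.Module):
    """DenseNet121 image features concatenated with the positional-metadata vector."""
    def __init__(self, meta_dim, num_classes=5):
        super().__init__()
        backbone = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
        self.features = backbone.features                  # pretrained convolutional trunk
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(                         # new head, retrained on image + metadata
            nn.Linear(1024 + meta_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_classes),
        )

    def forward(self, image, metadata):
        x = self.pool(torch.relu(self.features(image))).flatten(1)   # (B, 1024) image vector
        x = torch.cat([x, metadata], dim=1)                # fuse with metadata vector
        return self.head(x)
```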

  • One caveat: image augmentations such as horizontal & vertical flips, random brightness and color jitter no longer make sense, as they would leave the positional metadata inconsistent with the augmented image.

  • Another tweak was to get rid of the large amount of empty background: crop inwards and upscale back to the original dimensions. This boosts accuracy further, as local convolutions can focus more on the figurine rather than the background.

  • Upscaling to 224 x 224 enhances accuracy by a slight margin across all three data regimes (25%, 50%, 75%); a sketch combining the crop and upscale steps follows below.
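
A small torchvision sketch of the crop-and-upscale preprocessing (the crop size of 72 is an assumed margin, not the repository's exact value):

```python
from torchvision import transforms

# Crop away the uninformative border, then upscale so the pretrained
# backbone sees its expected 224 x 224 input size.
preprocess = transforms.Compose([
    transforms.ToPILImage(),             # accepts the (96, 96, 3) uint8 array built earlier
    transforms.CenterCrop(72),           # crop inwards, removing empty background
    transforms.Resize((224, 224)),       # upscale back for the pretrained backbone
    transforms.ToTensor(),
])
```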

Thus, after exhaustive experiments on private data, training on only 18.75%, 37.5% and 56.25% of the training data, I achieved accuracies of 97.65%, 98.20% and 98.757% respectively. In comparison, the global SOTA with 100% of the data is 98.77%. Source
