The artificial neural network (ANN) is well known as a universal approximator: given enough capacity, it can approximate essentially any continuous function. Exploiting this property, we can build a network that maps spatial positions (x, y, z) and camera rays to RGB pixel values; the rays are computed from the camera matrix, which combines the viewing directions θ (rotation about the y-axis) and ϕ (rotation about the x-axis) with the spatial position. Such a network, called a Neural Radiance Field (NeRF), can be used to solve the problem of novel view synthesis of a scene. The network is deliberately overfitted to a single scene so that it can generate an RGB image (and also a depth map) for any queried camera pose; the final pixel values are obtained by volume rendering, which weights the samples along each ray by their transmittance. Rendering such images from multiple angles then yields a 3D representation of the object. In this project, the bulldozer scene from the Tiny NeRF dataset is used.
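To make the idea above concrete, below is a minimal sketch of a NeRF-style model and the transmittance-based volume rendering step, written in PyTorch. It assumes the simplified Tiny NeRF setup (a position-only MLP with a small positional encoding); the layer widths, the encoding depth, and the helper names (`posenc`, `render_rays`) are illustrative assumptions, not the exact architecture used in the notebook.

```python
import torch
import torch.nn as nn

def posenc(x, L=6):
    # Positional encoding: map each coordinate to [sin(2^k x), cos(2^k x)]
    # so the MLP can represent high-frequency detail. L = 6 is an assumption.
    feats = [x]
    for k in range(L):
        feats += [torch.sin(2.0 ** k * x), torch.cos(2.0 ** k * x)]
    return torch.cat(feats, dim=-1)

class TinyNeRF(nn.Module):
    # Illustrative MLP: encoded (x, y, z) in, (R, G, B, sigma) out.
    def __init__(self, L=6, width=256):
        super().__init__()
        in_dim = 3 + 3 * 2 * L
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 4),            # RGB + volume density
        )

    def forward(self, pts):
        out = self.mlp(posenc(pts))
        rgb = torch.sigmoid(out[..., :3])   # colours constrained to [0, 1]
        sigma = torch.relu(out[..., 3])     # non-negative density
        return rgb, sigma

def render_rays(model, rays_o, rays_d, near=2.0, far=6.0, n_samples=64):
    # Sample points along each ray and alpha-composite them, weighting each
    # sample by its transmittance T_i = prod_{j<i} (1 - alpha_j).
    t = torch.linspace(near, far, n_samples, device=rays_o.device)     # (S,)
    pts = rays_o[..., None, :] + rays_d[..., None, :] * t[..., :, None]
    rgb, sigma = model(pts)                                            # (..., S, 3), (..., S)
    delta = t[..., 1:] - t[..., :-1]
    delta = torch.cat([delta, torch.full_like(delta[..., :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * delta)
    # Exclusive cumulative product gives the transmittance up to each sample.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[..., :-1]
    weights = alpha * trans                                            # per-sample contribution
    rgb_map = (weights[..., None] * rgb).sum(dim=-2)                   # rendered pixel colour
    depth_map = (weights * t).sum(dim=-1)                              # expected depth along the ray
    return rgb_map, depth_map
```

The full NeRF method additionally feeds the viewing direction into the colour head and uses hierarchical sampling; this sketch keeps only the parts needed to illustrate the mapping and the transmittance weighting described above.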
This notebook contains the implementation of this project.
Visualization of the training process.
The table below reports quantitative results of the implemented NeRF.
| Metric | Test Dataset |
| --- | --- |
| PSNR (dB) | 17.486 |
| SSIM | 0.381 |
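Values of this kind can be computed by comparing the rendered test images against the ground-truth images, for example along the lines below. The use of `skimage.metrics` and the averaging over the test set are assumptions about the evaluation code, not a transcript of it.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(rendered, ground_truth):
    """Average PSNR/SSIM over paired lists of (H, W, 3) float images in [0, 1]."""
    psnrs, ssims = [], []
    for pred, gt in zip(rendered, ground_truth):
        psnrs.append(peak_signal_noise_ratio(gt, pred, data_range=1.0))
        # channel_axis=-1 requires scikit-image >= 0.19; older versions use multichannel=True.
        ssims.append(structural_similarity(gt, pred, data_range=1.0, channel_axis=-1))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```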
The loss curves on the training set and the validation set.
The PSNR curves on the training set and the validation set.
The SSIM curves on the training set and the validation set.
This GIF shows the qualitative result of the NeRF.
The rendered view of the bulldozer from a camera at x = 0, y = 0, z = 3.5, with ϕ = −15° and θ sweeping from 0° to 360°.
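The camera poses for such an orbit are typically built from (θ, ϕ, radius). The sketch below shows one conventional way to construct the camera-to-world matrices for the sweep described above (radius 3.5, ϕ = −15°, θ from 0° to 360°); the helper name, the number of frames, and the matrix convention are assumptions, and any dataset-specific axis flip is omitted.

```python
import numpy as np

def pose_spherical(theta_deg, phi_deg, radius):
    """Camera-to-world matrix for a camera on a sphere of the given radius around the origin.

    theta_deg rotates about the y-axis and phi_deg about the x-axis,
    matching the convention described in the introduction (one common choice).
    """
    c2w = np.eye(4)
    c2w[2, 3] = radius                        # translate the camera along +z
    phi, theta = np.radians(phi_deg), np.radians(theta_deg)
    rot_x = np.array([[1, 0, 0, 0],
                      [0, np.cos(phi), -np.sin(phi), 0],
                      [0, np.sin(phi),  np.cos(phi), 0],
                      [0, 0, 0, 1]])
    rot_y = np.array([[ np.cos(theta), 0, np.sin(theta), 0],
                      [0, 1, 0, 0],
                      [-np.sin(theta), 0, np.cos(theta), 0],
                      [0, 0, 0, 1]])
    return rot_y @ rot_x @ c2w

# One pose per frame of the 360-degree orbit at phi = -15 deg, radius 3.5
# (120 frames is an assumed frame count for the GIF).
poses = [pose_spherical(theta, -15.0, 3.5)
         for theta in np.linspace(0.0, 360.0, 120, endpoint=False)]
```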