Neural Radiance Field

Part 1: Fit a Neural Field to a 2D Image

Model Architecture

The model is a neural field: a multilayer perceptron (MLP) that maps 2D pixel coordinates to RGB color values. The architecture is as follows:

  1. Positional Encoding: The input 2D coordinates are first transformed by a positional encoding function, which applies a series of sine and cosine transformations at increasing frequencies. This increases the dimensionality of the input and allows the model to capture high-frequency details (a sketch of the encoding and the network follows this architecture description).

  2. Neural Network Structure:

Input Layer: The network takes the positionally encoded coordinates as input.

Hidden Layers: The network consists of three hidden layers, each with a ReLU activation function. The number of neurons in each hidden layer is determined by the hidden_dim hyperparameter.

Output Layer: The final layer outputs three values, corresponding to the RGB color channels, and applies a Sigmoid activation function to ensure the outputs are in the range [0, 1].
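
A minimal sketch of this architecture is shown below, assuming a PyTorch implementation; the function and class names (positional_encoding, NeuralField2D) are illustrative, while hidden_dim and pos_encoding_L follow the hyperparameter names used in this report.

```python
import math

import torch
import torch.nn as nn

def positional_encoding(x, L):
    """Expand coordinates x of shape (N, 2) with sin/cos terms at L frequencies."""
    out = [x]
    for i in range(L):
        freq = (2.0 ** i) * math.pi
        out.append(torch.sin(freq * x))
        out.append(torch.cos(freq * x))
    return torch.cat(out, dim=-1)          # shape (N, 2 + 2 * 2 * L)

class NeuralField2D(nn.Module):
    """Three hidden layers with ReLU, sigmoid output for RGB in [0, 1]."""
    def __init__(self, hidden_dim=256, pos_encoding_L=10):
        super().__init__()
        self.L = pos_encoding_L
        in_dim = 2 + 2 * 2 * pos_encoding_L  # identity + sin/cos per frequency per coordinate
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, coords):
        return self.net(positional_encoding(coords, self.L))
```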

Hyperparameters

For the training process, the following hyperparameters were chosen:

Hidden Dimension (hidden_dim): This determines the number of neurons in each hidden layer. Various values were tested, including 64 and 256.

Positional Encoding Layers (pos_encoding_L): This parameter controls the number of frequency bands, i.e., the number of sine and cosine pairs applied to each input coordinate. Values of 5 and 10 were explored.

Learning Rate: A learning rate of 0.01 was used for the Adam optimizer.

Batch Size: A batch size of 10,000 coordinates was used for each training iteration.

Number of Iterations: The model was trained for 2000 iterations.
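
The sketch below shows how these hyperparameters might fit together in a training loop, reusing the NeuralField2D sketch above; the image tensor (shape (H, W, 3), values in [0, 1]) and the sample_batch helper are assumptions, not the original code.

```python
import torch

def sample_batch(image, batch_size=10_000):
    """Randomly pick pixels; return normalized (x, y) coordinates and their RGB values."""
    H, W, _ = image.shape
    ys = torch.randint(0, H, (batch_size,))
    xs = torch.randint(0, W, (batch_size,))
    coords = torch.stack([xs / (W - 1), ys / (H - 1)], dim=-1).float()
    return coords, image[ys, xs]

model = NeuralField2D(hidden_dim=256, pos_encoding_L=10)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)    # learning rate 0.01

for step in range(2000):                                     # 2000 iterations
    coords, target = sample_batch(image, batch_size=10_000)  # 10,000 coordinates per batch
    loss = torch.mean((model(coords) - target) ** 2)         # MSE reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```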

Figure 1.1: comparison

Training Process and Results

PSNR Curve: The Peak Signal-to-Noise Ratio (PSNR) was calculated at regular intervals during training to monitor the model's performance. The PSNR curve showed a steady increase, indicating that the model was learning to reconstruct the image more accurately over time.
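
For reference, PSNR for images with values in [0, 1] is computed from the mean squared error as PSNR = 10 * log10(1 / MSE); a minimal version of this metric is sketched below.

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for images with values in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```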

Visualization of Predicted Images: During training, the predicted images were visualized at various iterations. Initially, the predictions were noisy and lacked detail, but as training progressed, the images became clearer and more closely resembled the target image.

Figure 1.3: psnr_curve_Fox

Figure 1.2: training_progress_Fox

Additional Experiment

The optimization process was also run on a second image from the collection. A single set of hyperparameters was chosen for this experiment, and the PSNR curve was plotted to show the model's performance over time. The visualization of the training process showed improvements in image quality similar to those seen in the initial experiment.

Figure 1.3: psnr_curve_apple

Figure: training_progress_apple

Hyperparameter Tuning

A hyperparameter tuning process was conducted to identify the best combination of hidden_dim and pos_encoding_L. Multiple configurations were tested and compared on their final PSNR values, and the configuration that gave the best balance between model complexity and reconstruction quality was selected (a sketch of the sweep follows the comparison figure below).

Figure 1.4: hyperparameter_comparison
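
A simple way to run such a sweep is a grid search over the two hyperparameters, as sketched below; the train helper is hypothetical and is assumed to wrap the training loop above and return the final PSNR.

```python
results = {}
for hidden_dim in [64, 256]:
    for pos_encoding_L in [5, 10]:
        # train() is a hypothetical wrapper around the training loop sketched earlier
        results[(hidden_dim, pos_encoding_L)] = train(
            image, hidden_dim=hidden_dim, pos_encoding_L=pos_encoding_L
        )

best = max(results, key=results.get)
print(f"Best configuration: hidden_dim={best[0]}, pos_encoding_L={best[1]}")
```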

Part 2: Fit a Neural Radiance Field from Multi-view Images

Part 2.1: Create Rays from Cameras

  1. Camera to World Coordinate Conversion: Points in the camera coordinate frame are transformed into the world frame using the camera-to-world extrinsic matrix (c2w).

  2. Pixel to Camera Coordinate Conversion: Pixel coordinates are back-projected into the camera frame by inverting the intrinsic matrix K and scaling by the point's depth. A sketch of both conversions follows this list.
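
Both conversions can be sketched as follows, assuming c2w is a 4x4 camera-to-world extrinsic matrix and K is the 3x3 intrinsic matrix; the function names and conventions (row-vector points, pinhole model) are assumptions, not the original code.

```python
import torch

def camera_to_world(x_cam, c2w):
    """Map points of shape (N, 3) from camera to world coordinates via a 4x4 c2w matrix."""
    ones = torch.ones(x_cam.shape[0], 1)
    x_hom = torch.cat([x_cam, ones], dim=-1)       # homogeneous coordinates (N, 4)
    return (x_hom @ c2w.T)[:, :3]

def pixel_to_camera(K, uv, depth):
    """Back-project pixel coordinates (N, 2) at depths (N,) into the camera frame."""
    ones = torch.ones(uv.shape[0], 1)
    uv_hom = torch.cat([uv, ones], dim=-1)         # (u, v, 1)
    return (uv_hom @ torch.inverse(K).T) * depth[:, None]
```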

Part 2.2: Sampling

  1. Sampling Rays from Images: At each training step, a random subset of rays (with their corresponding ground-truth pixel colors) is drawn from the training images.

  2. Sampling Points along Rays: Each ray is discretized by sampling 3D points at depths between a near and a far bound; these points are then fed to the radiance field. A sketch of both sampling steps follows this list.
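
Both sampling steps are sketched below, assuming ray origins rays_o and directions rays_d of shape (N, 3) have already been built from the conversions in Part 2.1; the near/far bounds and sample counts are illustrative defaults, not values from the original code.

```python
import torch

def sample_rays(rays_o, rays_d, pixels, n_rays=1024):
    """Randomly select a batch of rays together with their ground-truth pixel colors."""
    idx = torch.randint(0, rays_o.shape[0], (n_rays,))
    return rays_o[idx], rays_d[idx], pixels[idx]

def sample_points_along_rays(rays_o, rays_d, near=2.0, far=6.0, n_samples=64, perturb=True):
    """Sample 3D points at depths t in [near, far] along each ray: p = o + t * d."""
    t = torch.linspace(near, far, n_samples).expand(rays_o.shape[0], n_samples)
    if perturb:
        # jitter samples within each interval to reduce banding artifacts during training
        t = t + torch.rand_like(t) * (far - near) / n_samples
    points = rays_o[:, None, :] + t[..., None] * rays_d[:, None, :]   # (N, n_samples, 3)
    return points, t
```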