Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal’s spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or SIREN, are ideally suited for representing complex natural signals and their derivatives. We analyze SIREN activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how SIRENs can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine SIREN with hypernetworks to learn priors over the space of SIREN functions.
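As a rough illustration of what a sine-activated MLP with a dedicated initialization can look like, here is a minimal NumPy sketch. The text above does not state the initialization bounds or the frequency scale; the values used here (first layer uniform in ±1/n_in, later layers uniform in ±sqrt(6/n_in)/omega_0, omega_0 = 30) are our recollection of the scheme described in the paper and should be checked against it. The function names `init_siren` and `siren_forward`, and the zero-initialized biases, are assumptions made for this sketch.

```python
import numpy as np

def init_siren(layer_sizes, omega_0=30.0, seed=0):
    """Return a list of (W, b) pairs for a sine-activated MLP.

    Assumed bounds (not stated in the text above):
      first layer: U(-1/n_in, 1/n_in)
      other layers: U(-sqrt(6/n_in)/omega_0, sqrt(6/n_in)/omega_0)
    Biases are set to zero here purely for simplicity.
    """
    rng = np.random.default_rng(seed)
    params = []
    pairs = zip(layer_sizes[:-1], layer_sizes[1:])
    for i, (n_in, n_out) in enumerate(pairs):
        if i == 0:
            bound = 1.0 / n_in
        else:
            bound = np.sqrt(6.0 / n_in) / omega_0
        W = rng.uniform(-bound, bound, size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def siren_forward(params, x, omega_0=30.0):
    """Apply sin(omega_0 * (h @ W + b)) in every layer except the last,
    which is kept linear so the output range is unconstrained."""
    h = x
    for W, b in params[:-1]:
        h = np.sin(omega_0 * (h @ W + b))
    W, b = params[-1]
    return h @ W + b

# Example: a small network mapping 2D coordinates to 3 output channels.
params = init_siren([2, 16, 16, 3])
coords = np.zeros((5, 2))
out = siren_forward(params, coords)  # shape (5, 3)
```

Because the derivative of a sine is a shifted sine, the derivative of such a network is itself a network of the same family, which is one intuition for why derivatives of the representation stay well behaved.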
The following results compare SIREN to a variety of network architectures. TanH, ReLU, Softplus, etc. each denote an MLP of equal size with the respective nonlinearity. We also compare to the recently proposed positional encoding combined with a ReLU nonlinearity, denoted ReLU P.E. SIREN outperforms all baselines by a significant margin, converges significantly faster, and is the only architecture that accurately represents the gradients of the signal, enabling its use to solve boundary value problems.
A Siren that maps 2D pixel coordinates to a color may be used to parameterize images. Here, we supervise Siren directly with ground-truth pixel values. Siren not only fits the image with a 10 dB higher PSNR and in significantly fewer iterations than all baseline architectures, but is also the only MLP that accurately represents the first- and second-order derivatives of the image.
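To make the image setup concrete, the sketch below builds the 2D coordinate inputs for an image and computes PSNR between a prediction and the ground truth. The helper names `pixel_grid` and `psnr`, the coordinate range [-1, 1], and the assumption that pixel values lie in [0, 1] are ours and are not specified in the text above.

```python
import numpy as np

def pixel_grid(height, width):
    """Return an (height*width, 2) array of (row, col) coordinates,
    normalized to [-1, 1] along each axis (an assumed convention)."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=-1)

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for signals in [0, max_value]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

# Example: a constant error of 0.1 on a [0, 1] image.
coords = pixel_grid(4, 3)                     # shape (12, 2)
target = np.zeros((4, 3))
pred = np.full((4, 3), 0.1)
score = psnr(pred, target)                    # mse = 0.01
```

Since PSNR is logarithmic in the mean squared error, a 10 dB improvement corresponds to a tenfold reduction in MSE.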
A Siren with a single, time-coordinate input and scalar output may parameterize audio signals. Siren is the only network architecture that succeeds in reproducing the audio signal, both for music and human voice.
A Siren with pixel coordinates together with a time coordinate can be used to parameterize a video. Here, Siren is directly supervised with the ground-truth pixel values, and parameterizes video significantly better than a ReLU MLP.
By supervising only the derivatives of Siren, we can solve Poisson's equation. Siren is again the only architecture that fits image, gradient, and Laplacian domains accurately and swiftly.
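One way to write derivative-only supervision is sketched below. The symbols are assumptions introduced for illustration: Φ is the network, f the target image, and Ω the image domain; the exact norm and weighting used in the paper may differ.

```latex
% Supervising the gradient of the network output:
\mathcal{L}_{\text{grad}} = \int_{\Omega} \left\lVert \nabla_{x}\Phi(x) - \nabla_{x} f(x) \right\rVert \, dx

% Supervising the Laplacian of the network output:
\mathcal{L}_{\text{lapl}} = \int_{\Omega} \left\lVert \Delta\Phi(x) - \Delta f(x) \right\rVert \, dx
```

In both cases the pixel values of f never enter the loss, so the image is recovered only up to the terms that the derivative operator discards (for example, a constant offset under gradient supervision).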
We can recover a signed distance function (SDF) from a point cloud and surface normals by solving the Eikonal equation, a first-order boundary value problem. SIREN can recover a room-scale scene given only its point cloud and surface normals, accurately reproducing fine detail, in less than an hour of training. In contrast to recent work on combining voxel grids with neural implicit representations, SIREN stores the full scene in the weights of a single, 5-layer neural network, with no 2D or 3D convolutions, and orders of magnitude fewer parameters. Note that these SDFs are not supervised with ground-truth SDF or occupancy values, but are instead the result of solving the above Eikonal boundary value problem. This is a significantly harder task, which requires supervision in the gradient domain (see paper). As a result, architectures whose gradients are not well-behaved perform worse than SIREN.
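The constraints involved can be illustrated on a function that is known to be an exact SDF. The sketch below uses the analytic SDF of a unit sphere in place of a trained network and central finite differences in place of automatic differentiation; both substitutions, and all function names, are assumptions made for illustration. It checks the three conditions usually associated with this problem: unit gradient norm everywhere, zero value on the surface, and gradient aligned with the surface normal.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Exact signed distance to a sphere centered at the origin."""
    return np.linalg.norm(points, axis=-1) - radius

def numerical_gradient(f, points, eps=1e-5):
    """Central finite differences; a stand-in for autodiff."""
    grads = np.zeros_like(points)
    for axis in range(points.shape[-1]):
        step = np.zeros(points.shape[-1])
        step[axis] = eps
        grads[:, axis] = (f(points + step) - f(points - step)) / (2 * eps)
    return grads

def eikonal_residual(f, points):
    """| ||grad f(x)|| - 1 |, which is zero for a true SDF."""
    g = numerical_gradient(f, points)
    return np.abs(np.linalg.norm(g, axis=-1) - 1.0)

def surface_terms(f, surface_points, normals):
    """Per-point |f(x)| and 1 - cos(angle between grad f and normal)."""
    g = numerical_gradient(f, surface_points)
    cos = np.sum(g * normals, axis=-1) / (
        np.linalg.norm(g, axis=-1) * np.linalg.norm(normals, axis=-1)
    )
    return np.abs(f(surface_points)), 1.0 - cos

rng = np.random.default_rng(0)
directions = rng.normal(size=(100, 3))
directions /= np.linalg.norm(directions, axis=-1, keepdims=True)

# Off-surface samples at radii in [0.5, 2.0], away from the singular origin.
free_points = directions * rng.uniform(0.5, 2.0, size=(100, 1))
residual = eikonal_residual(sphere_sdf, free_points)

# On-surface samples: for a unit sphere the outward normal equals the point.
sdf_term, normal_term = surface_terms(sphere_sdf, directions, directions)
```

For a network, these per-point terms would be averaged into a loss and minimized; a representation whose gradients are inaccurate cannot drive the residual to zero, which is consistent with the comparison described above.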
Here, we use Siren to solve the inhomogeneous Helmholtz equation. ReLU- and Tanh-based architectures fail entirely to converge to a solution.
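For reference, one common form of the inhomogeneous Helmholtz equation is sketched below. The notation is assumed for illustration and is not defined in the text above: Φ is the (complex-valued) wavefield, ω the angular frequency, m(x) = 1/c(x)² the squared slowness for wave speed c, and f the source term. Sign conventions vary between references.

```latex
\left( \Delta + m(x)\,\omega^{2} \right) \Phi(x) = -f(x)

% A residual loss that penalizes violation of the equation over the domain:
\mathcal{L} = \int_{\Omega} \left\lVert \left( \Delta + m(x)\,\omega^{2} \right) \Phi(x) + f(x) \right\rVert_{1} dx
```

Because the residual contains the Laplacian of the network output, the fit depends on accurate second-order derivatives of the representation.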
In the time domain, Siren succeeds in solving the wave equation, while a Tanh-based architecture fails to discover the correct solution.
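For reference, the constant-speed wave equation can be written as follows. The symbols are assumed for illustration: Φ(t, x) is the field, c the wave speed, and f an initial field; the initial conditions shown are one common choice and may not match the exact setup used here.

```latex
\frac{\partial^{2} \Phi}{\partial t^{2}} - c^{2}\,\Delta \Phi = 0,
\qquad
\Phi(0, x) = f(x),
\qquad
\frac{\partial \Phi}{\partial t}(0, x) = 0
```

Here the network takes time as an additional input coordinate, and the loss combines the equation residual with the initial conditions.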