(论文)[2023-SIG-Course] A Gentle Introduction to ReSTIR: Path Reuse in Real-time (4)

ReSTIR

07-Making ReSTIR fast

  • sampler optimization:算法上的优化
    • what neighbors it chooses to reuse and the choice of MIS weights
  • low-level optimization:实现上的优化
    • improving sample quality at a given performance, or both

Sampler optimization

  • 一种 RIS、ReSTIR 的视角:MIS 组合了多种 estimator
  • 但是 MIS 不一定会提高采样质量
  • 复用周围像素的时候,很可能会存在糟糕的 estimator
  • 于是我们需要找到方法拒绝不好的 estimator
  • 拒绝的时候不能从单个样本考虑:biased!
    • 或者需要从条件概率考虑,很麻烦
  • 启发式
    • ensuring surface normals, depths, and material properties do not vary significantly between reused pixels

Neighbor rejection

  • neighbor rejection
    • 可以被认为是一种 cut-off heuristic
  • 假定:如果在某个域上的重要性低于其他,说明他可能有问题

Contribution MIS weights

  • Generalized RIS 论文提出
  • \(M\)-sample GRIS
    • 重采样权重:$w_i=m_i(T_i(X_i))(T_i(X_i))W_i$
    • 重采样样本:\(Y_i=T_i(X_i)\)
    • 使用如下 UCW 则无偏

\[ W_Y={\color{red}\left[\frac{c_s(Y)}{m_s(Y)}\right]}\frac{1}{\hat{p}(Y)}\sum_{j=1}^Mw_j \]

\[ \mathop{\sum_{i=1}^M}_{y\in T_i(\operatorname{supp}X_i)}c_i(y)=1 \]

  • 加上了红色部分,这样的操作之后,不需要再满足 \(\sum m_i=1\) 的条件(\(m_i\) 的选择更加自由)
    • 求期望的时候,\(m_i\) 被消掉了:\(\mathbb{E}\left[f(Y_i)c_i(Y_i)\frac{1}{\hat{p}(Y)}\sum_{j=1}^Mw_j\right]\)
  • 此时我们可以简单地使用 \(m_i=\dfrac{1}{M}\),MIS 的计算复杂度从平方降到线性 \(O(M)\)
    • 虽然是无偏的,但是可能方差会变大
    • 但是这个分布想要趋近于 \(\hat{p}\)\(m_i\) 还是得满足原始的条件

Pairwise MIS weights

  • 论文:Correlations and Reuse for Fast and Accurate Physically Based Light Transport
  • 通常假设:不同的 estimator 好坏我们提前是不知道的,对于特定的某个 estimator,有些定义域上好,有些坏
  • pairwise:假设存在一个 canonical 的 estimator
    • 能够覆盖整个函数 \(f\) 的定义域
    • 质量较高
  • canonical:当前像素的 estimator
  • pairwise:每一种 estimator 都和 canonical 做一个 balance heuristic,然后归一化
    • 具体如下

\[ \begin{aligned} &m_{i}(x)=\frac{1}{M-1}\frac{p_{i}(x)}{p_{i}(x)+p_{c}(x)}&(i\neq c)\\ &m_{c}(x)=\frac{1}{M-1}\sum_{j\neq c}^{M}\frac{p_{c}(x)}{p_{j}(x)+p_{c}(x)}&\\ \end{aligned} \]

  • 效果上加大了 canonical 的权重
  • 但是过度加大了,假设都是 identity 分布,此时 \(m_c(x)=(M-1)m_i(x),i\ne c\)
  • 为了让大家都是 identity 的时候,\(m_c(x)=m_i(x)\),让 \(p_c(x)/(M-1)\)

\[ \begin{aligned} &m_{i}(x)=\frac{1}{M-1}\frac{p_{i}(x)}{p_{i}(x)+p_{c}(x)/(M-1)}&(i\neq c)\\ &m_{c}(x)=\frac{1}{M-1}\sum_{j\neq c}^{M}\frac{p_{c}(x)/(M-1)}{p_{j}(x)+p_{c}(x)/(M-1)}&\\ \end{aligned} \]

  • generialzed version:\(p\) 使用 \(\hat{p}\) 代替
  • defensive form:使用代理的 \(\hat{p}\) 可能会存在很大的 \(p_i(x)\) 值,需要保护 canonical estimator,给 \(m_c(x)+1\)

\[ \begin{aligned} &m_{i}(x)=\frac{1}{M}\frac{\hat{p}_{i}(x)}{\hat{p}_{i}(x)+\hat{p}_{c}(x)/(M-1)}&(i\neq c)\\ &m_{c}(x)=1+\frac{1}{M}\sum_{j\neq c}^{M}\frac{\hat{p}_{c}(x)/(M-1)}{\hat{p}_{j}(x)+\hat{p}_{c}(x)/(M-1)}&\\ \end{aligned} \]

  • 加上 confidence weights(weighted 版本,而不是显式的基于 \(M\) 的具体的样本数量)
  • non-defensive 版本:上面的就相当于 \(c_i=1\)

\[ \begin{aligned} &m_{i}(y) =\frac{c_{i}\hat{p}_{\leftarrow i}(y)}{\left(\sum_{k\neq c}^{M}c_{k}\right)\hat{p}_{\leftarrow i}(y)+c_{c}\hat{p}_{c}(y)} & (i\neq c) \\ &m_{c}(y) =\sum_{j\neq c}^M\left(\frac{c_j}{\sum_{k\neq c}^Mc_k}\right)\frac{c_c\hat{p}_c(y)}{\left(\sum_{k\neq c}^Mc_k\right)\hat{p}_{\leftarrow j}(y)+c_c\hat{p}_c(y)}&\\ \end{aligned} \]

  • defensive

\[ \begin{aligned} &m_{i}(y) =\frac{\sum_{k\ne c}^{M}c_{k}}{\sum_{k=1}^{M}c_{k}}\cdot\frac{c_{i}\hat{p}_{\leftarrow i}(y)}{\left(\sum_{k\neq c}^{M}c_{k}\right)\hat{p}_{\leftarrow i}(y)+c_{c}\hat{p}_{c}(y)} & (i\neq c) \\ &m_{c}(y) =\frac{c_{c}}{\sum_{k=1}^{M}c_{k}}+\sum_{j\neq c}^M\left(\frac{c_j}{\sum_{k=1}^Mc_k}\right)\frac{c_c\hat{p}_c(y)}{\left(\sum_{k\neq c}^Mc_k\right)\hat{p}_{\leftarrow j}(y)+c_c\hat{p}_c(y)}&\\ \end{aligned} \]

  • GRIS 发现
    • ReSTIR PT observes the \(O(M)\) pairwise MIS gives comparable convergence behavior as the \(O(M^2)\) balance heuristic
    • 于是将 defensive pairwise MIS 作为空间复用的默认选择

Biased MIS Weights

  • MIS 的计算开销来自于需要重新评估 \(p_i(X_i)\) 在新的 \(\Omega_j\) 上的 \(p_{j}(X_i)\)
    • 如果是光线的话,需要重新追踪
    • 时间上的复用,重新追光线需要保留上一帧的 BVH,这样不好
  • 如果 bias 比较小,我们可以使用一些近似方案
    • using the current frame BVH as a stand in for the prior frame BVH
    • assuming \(p_{j}(X_i)\)
    • recomputing \(p_{j}(X_i)\) using last frame’s data but assuming visibility does not change

\[ m_i(X_i)=\frac{p_i(X_i)}{p_i(X_i)+\boxed{p_j(X_i)}} \]

  • 如果使用有偏的 \(\tilde{p}_j(X_i)\) 代替方框区域(应该是单指计算 \(m_i(X_i)\) 的时候)
    • 偏大 \(\to\) \(m\) 偏小 \(\to\) darkening bias
    • 相同 \(\to\) no bias
    • 偏小 \(\to\) brightening bias
    • 对某些点 \(X_i\) 大,对某些小,整张图片会有些偏亮有些偏暗

Low-level optimization

  • 实现上的优化,需要考虑硬件

    • ****Minimize**** the per-pixel shadow ray count (targeting scenes with millions of lights)

    • Minimize the number of paths traced.

    • Maximize sample reuse; path samples are costly, so reuse each as much as possible to minimize cost per reuse.

    • Minimize correlation in final shading, so denoisers behave better.

    • Maximize parallelization and streaming reuse for GPU utilization (e.g., using weighted reservoir sampling)

    • Minimize size of intermediate buffers (e.g., reservoir size).

    • Minimize memory bandwidth.

    • Minimize execution divergence (ensuring maximal thread counts active in each GPU warp).

    • Minimize memory divergence (to avoid thrashing caches and minimizing memory access costs).

    • Minimize frame time. (ReSTIR benefits significantly from temporal reuse, so overall quality may improve by reducing the quality gained per-frame if you can instead reuse across frames much more quickly.)

    • Plus other traditional low-level optimization targets, e.g., minimizing register usage.

  • World Space ReSTIR

    • 对 ReSTIR 而言,内存占用小,光追数量只对 low-end 设备是瓶颈
    • World Space ReSTIR 存在 bias(Boissé 【2021】)

Sample tiling in ReSTIR DI

  • Highly randomized sampling is bad for caching!
  • 3M emissive triangles,耗时 25ms
    • picking random light candidates
    • 随机采样光源对缓存不友好,大家采样到不同的光源,cache 都 miss 了
  • 1080P 的场景,如果每一个像素都使用一个光源,最多使用 ~2M(\(1920*1080\approx2\text{M}\)) 个光源
  • 如何找出当前帧对应的光源
  • 每一帧采样 \(1/4\) 光源,这样 4 帧便能采样到所有光源,再加上 ReSTIR 复用,就能有好的效果
    • 加速效果还是不明显
  • 考虑 pixel tile
    • 16x16 的 tile,最多采样 26 个光源,将其从一个 1024/2048 个光源集合中一起采样,快了
  • 算法
    • 每一帧采样很多个光源子集 \(S\)(例如根据他们的 intensity 采样)
    • 每一个 pixel tile(\(8\times8/16\times16\)
      • 选择一个光源子集 \(S_i\)
      • 然后从 \(S_i\) 随机选取光源使用(均匀采样,直接 \(1/N\)
  • 通常而言,\(\vert{S}\vert=128,\vert{S_i}\vert=1024\) 就好用了
  • 光源比较少的时候,构建 tile 的开销(~0.1ms)可能占了大头
    • 缓存优势无了

Lighting with many analytic light types

  • 不同的光源类型(不同的采样代码)
    • spheres, quad, cylinders, triangles, environment maps, lines, points, spotlights, meshes
  • 不同像素执行不同的采样逻辑,则导致 execution divergence、cache thrashing
  • 在上面 tile 的基础上,让每一个 \(S_i\) 内部的光源种类相同
  • 这样在 per pixel 执行的时候,一个 pixel tile 内部的光源采样代码相同

Accelerating hybrid shift

  • 直接按照算法 7 执行,可能存在如下问题
    • 复杂执行逻辑的代码,存在前后相关联,可能导致 very high register usage
      • lowering the warp occupancy, and potentially causes register spilling to inflate memory cost.
    • divergence
  • 优化方案:shader time 优化
    • use smaller kernels instead of a big kernel
      • 不同 kernel:path tracing、BSDF re-evaluatation and visibility ray tests
    • Perform stream compaction to map threads only to non-empty ray tracing tasks
      • 很多路径样本不需要 random replay 去重连接
  • 同时会带来内存开销(需要保存 kernel 之间的中间结果)
    • 但是这个和节省的 shader time 相比还是比较少了

09-Advice for getting started

  • Start with a simple ground-truth Monte Carlo path tracer
  • Start simple, with basic RIS
    • 【2005-EGSR】Importance Resampling for Global Illumination
  • Think about rendering bias
    • 需要考虑 bias
  • Spatial reuse alone is easier to debug; combining with temporal reuse gives better quality.
  • Don’t try to get too clever too fast.
    • 循序渐进
  • Basic ReSTIR gives you probability distributions at a point
    • 而不是 voxel
  • Reuse visibility very carefully
    • 可见性可能会导致很多问题
  • ReSTIR accelerates in multiple ways.
  • Think a bit about ReSTIR as subsampling the integration domain