(论文)[2023-SIG-Course] A Gentle Introduction to ReSTIR: Path Reuse in Real-time (4)

Posted on 2024-10-02 Edited on 2025-03-10 Views:

ResTIR 加速

ReSTIR

07-Making ReSTIR fast

sampler optimization：算法上的优化
- what neighbors it chooses to reuse and the choice of MIS weights
low-level optimization：实现上的优化
- improving sample quality at a given performance, or both

Sampler optimization

一种 RIS、ReSTIR 的视角：MIS 组合了多种 estimator
但是 MIS 不一定会提高采样质量
复用周围像素的时候，很可能会存在糟糕的 estimator
于是我们需要找到方法拒绝不好的 estimator
拒绝的时候不能从单个样本考虑：biased！
- 或者需要从条件概率考虑，很麻烦
启发式
- ensuring surface normals, depths, and material properties do not vary significantly between reused pixels

Neighbor rejection

neighbor rejection
- 可以被认为是一种 cut-off heuristic
假定：如果在某个域上的重要性低于其他，说明他可能有问题

Contribution MIS weights

Generalized RIS 论文提出
$M$-sample GRIS
- 重采样权重：$w_i=m_i(T_i(X_i))(T_i(X_i))W_i$
- 重采样样本：$Y_i=T_i(X_i)$
- 使用如下 UCW 则无偏

\[ W_Y={\color{red}\left[\frac{c_s(Y)}{m_s(Y)}\right]}\frac{1}{\hat{p}(Y)}\sum_{j=1}^Mw_j \]

\[ \mathop{\sum_{i=1}^M}_{y\in T_i(\operatorname{supp}X_i)}c_i(y)=1 \]

加上了红色部分，这样的操作之后，不需要再满足 $\sum m_i=1$ 的条件（$m_i$ 的选择更加自由）
- 求期望的时候，$m_i$ 被消掉了：$\mathbb{E}\left[f(Y_i)c_i(Y_i)\frac{1}{\hat{p}(Y)}\sum_{j=1}^Mw_j\right]$
此时我们可以简单地使用 $m_i=\dfrac{1}{M}$，MIS 的计算复杂度从平方降到线性 $O(M)$
- 虽然是无偏的，但是可能方差会变大
- 但是这个分布想要趋近于 $\hat{p}$，$m_i$ 还是得满足原始的条件

Pairwise MIS weights

论文：Correlations and Reuse for Fast and Accurate Physically Based Light Transport
通常假设：不同的 estimator 好坏我们提前是不知道的，对于特定的某个 estimator，有些定义域上好，有些坏
pairwise：假设存在一个 canonical 的 estimator
- 能够覆盖整个函数 $f$ 的定义域
- 质量较高
canonical：当前像素的 estimator
pairwise：每一种 estimator 都和 canonical 做一个 balance heuristic，然后归一化
- 具体如下

\[ \begin{aligned} &m_{i}(x)=\frac{1}{M-1}\frac{p_{i}(x)}{p_{i}(x)+p_{c}(x)}&(i\neq c)\\ &m_{c}(x)=\frac{1}{M-1}\sum_{j\neq c}^{M}\frac{p_{c}(x)}{p_{j}(x)+p_{c}(x)}&\\ \end{aligned} \]

效果上加大了 canonical 的权重
但是过度加大了，假设都是 identity 分布，此时 $m_c(x)=(M-1)m_i(x),i\ne c$
为了让大家都是 identity 的时候，$m_c(x)=m_i(x)$，让 $p_c(x)/(M-1)$

\[ \begin{aligned} &m_{i}(x)=\frac{1}{M-1}\frac{p_{i}(x)}{p_{i}(x)+p_{c}(x)/(M-1)}&(i\neq c)\\ &m_{c}(x)=\frac{1}{M-1}\sum_{j\neq c}^{M}\frac{p_{c}(x)/(M-1)}{p_{j}(x)+p_{c}(x)/(M-1)}&\\ \end{aligned} \]

generialzed version：$p$ 使用 $\hat{p}$ 代替
defensive form：使用代理的 $\hat{p}$ 可能会存在很大的 $p_i(x)$ 值，需要保护 canonical estimator，给 $m_c(x)+1$

\[ \begin{aligned} &m_{i}(x)=\frac{1}{M}\frac{\hat{p}_{i}(x)}{\hat{p}_{i}(x)+\hat{p}_{c}(x)/(M-1)}&(i\neq c)\\ &m_{c}(x)=1+\frac{1}{M}\sum_{j\neq c}^{M}\frac{\hat{p}_{c}(x)/(M-1)}{\hat{p}_{j}(x)+\hat{p}_{c}(x)/(M-1)}&\\ \end{aligned} \]

加上 confidence weights（weighted 版本，而不是显式的基于 $M$ 的具体的样本数量）
non-defensive 版本：上面的就相当于 $c_i=1$

\[ \begin{aligned} &m_{i}(y) =\frac{c_{i}\hat{p}_{\leftarrow i}(y)}{\left(\sum_{k\neq c}^{M}c_{k}\right)\hat{p}_{\leftarrow i}(y)+c_{c}\hat{p}_{c}(y)} & (i\neq c) \\ &m_{c}(y) =\sum_{j\neq c}^M\left(\frac{c_j}{\sum_{k\neq c}^Mc_k}\right)\frac{c_c\hat{p}_c(y)}{\left(\sum_{k\neq c}^Mc_k\right)\hat{p}_{\leftarrow j}(y)+c_c\hat{p}_c(y)}&\\ \end{aligned} \]

defensive

\[ \begin{aligned} &m_{i}(y) =\frac{\sum_{k\ne c}^{M}c_{k}}{\sum_{k=1}^{M}c_{k}}\cdot\frac{c_{i}\hat{p}_{\leftarrow i}(y)}{\left(\sum_{k\neq c}^{M}c_{k}\right)\hat{p}_{\leftarrow i}(y)+c_{c}\hat{p}_{c}(y)} & (i\neq c) \\ &m_{c}(y) =\frac{c_{c}}{\sum_{k=1}^{M}c_{k}}+\sum_{j\neq c}^M\left(\frac{c_j}{\sum_{k=1}^Mc_k}\right)\frac{c_c\hat{p}_c(y)}{\left(\sum_{k\neq c}^Mc_k\right)\hat{p}_{\leftarrow j}(y)+c_c\hat{p}_c(y)}&\\ \end{aligned} \]

GRIS 发现
- ReSTIR PT observes the $O(M)$ pairwise MIS gives comparable convergence behavior as the $O(M^2)$ balance heuristic
- 于是将 defensive pairwise MIS 作为空间复用的默认选择

Biased MIS Weights

MIS 的计算开销来自于需要重新评估 $p_i(X_i)$ 在新的 $\Omega_j$ 上的 $p_{j}(X_i)$
- 如果是光线的话，需要重新追踪
- 时间上的复用，重新追光线需要保留上一帧的 BVH，这样不好
如果 bias 比较小，我们可以使用一些近似方案
- using the current frame BVH as a stand in for the prior frame BVH
- assuming $p_{j}(X_i)$
- recomputing $p_{j}(X_i)$ using last frame’s data but assuming visibility does not change

\[ m_i(X_i)=\frac{p_i(X_i)}{p_i(X_i)+\boxed{p_j(X_i)}} \]

如果使用有偏的 $\tilde{p}_j(X_i)$ 代替方框区域（应该是单指计算 $m_i(X_i)$ 的时候）
- 偏大 $\to$ $m$ 偏小 $\to$ darkening bias
- 相同 $\to$ no bias
- 偏小 $\to$ brightening bias
- 对某些点 $X_i$ 大，对某些小，整张图片会有些偏亮有些偏暗

Low-level optimization

实现上的优化，需要考虑硬件
- Minimize the per-pixel shadow ray count (targeting scenes with millions of lights)
- Minimize the number of paths traced.
- Maximize sample reuse; path samples are costly, so reuse each as much as possible to minimize cost per reuse.
- Minimize correlation in final shading, so denoisers behave better.
- Maximize parallelization and streaming reuse for GPU utilization (e.g., using weighted reservoir sampling)
- Minimize size of intermediate buffers (e.g., reservoir size).
- Minimize memory bandwidth.
- Minimize execution divergence (ensuring maximal thread counts active in each GPU warp).
- Minimize memory divergence (to avoid thrashing caches and minimizing memory access costs).
- Minimize frame time. (ReSTIR benefits significantly from temporal reuse, so overall quality may improve by reducing the quality gained per-frame if you can instead reuse across frames much more quickly.)
- Plus other traditional low-level optimization targets, e.g., minimizing register usage.
World Space ReSTIR
- 对 ReSTIR 而言，内存占用小，光追数量只对 low-end 设备是瓶颈
- World Space ReSTIR 存在 bias（Boissé 【2021】，with as-yet unpublished theory）

Sample tiling in ReSTIR DI

Highly randomized sampling is bad for caching!
3M emissive triangles，耗时 25ms
- picking random light candidates
- 随机采样光源对缓存不友好，大家采样到不同的光源，cache 都 miss 了
1080P 的场景，如果每一个像素都使用一个光源，最多使用 ~2M（$1920*1080\approx2\text{M}$）个光源
如何找出当前帧对应的光源
每一帧采样 $1/4$ 光源，这样 4 帧便能采样到所有光源，再加上 ReSTIR 复用，就能有好的效果
- 加速效果还是不明显
考虑 pixel tile
- 16x16 的 tile，最多采样 26 个光源，将其从一个 1024/2048 个光源集合中一起采样，快了
算法
- 每一帧采样很多个光源子集 $S$（例如根据他们的 intensity 采样）
- 每一个 pixel tile（$8\times8/16\times16$）
  - 选择一个光源子集 $S_i$
  - 然后从 $S_i$ 随机选取光源使用（均匀采样，直接 $1/N$）
通常而言，$\vert{S}\vert=128,\vert{S_i}\vert=1024$ 就好用了
光源比较少的时候，构建 tile 的开销（~0.1ms）可能占了大头
- 缓存优势无了

Lighting with many analytic light types

不同的光源类型（不同的采样代码）
- spheres, quad, cylinders, triangles, environment maps, lines, points, spotlights, meshes
不同像素执行不同的采样逻辑，则导致 execution divergence、cache thrashing
在上面 tile 的基础上，让每一个 $S_i$ 内部的光源种类相同
这样在 per pixel 执行的时候，一个 pixel tile 内部的光源采样代码相同

Accelerating hybrid shift

直接按照算法 7 执行，可能存在如下问题
- 复杂执行逻辑的代码，存在前后相关联，可能导致 very high register usage
  - lowering the warp occupancy, and potentially causes register spilling to inflate memory cost.
- divergence
优化方案：shader time 优化
- use smaller kernels instead of a big kernel
  - 不同 kernel：path tracing、BSDF re-evaluatation and visibility ray tests
- Perform stream compaction to map threads only to non-empty ray tracing tasks
  - 很多路径样本不需要 random replay 去重连接
同时会带来内存开销（需要保存 kernel 之间的中间结果）
- 但是这个和节省的 shader time 相比还是比较少了

09-Advice for getting started

Start with a simple ground-truth Monte Carlo path tracer
Start simple, with basic RIS
- 【2005-EGSR】Importance Resampling for Global Illumination
Think about rendering bias
- 需要考虑 bias
Spatial reuse alone is easier to debug; combining with temporal reuse gives better quality.
Don’t try to get too clever too fast.
- 循序渐进
Basic ReSTIR gives you probability distributions at a point
- 而不是 voxel
Reuse visibility very carefully
- 可见性可能会导致很多问题
ReSTIR accelerates in multiple ways.
Think a bit about ReSTIR as subsampling the integration domain