(Paper) [2023-SIG-Course] A Gentle Introduction to ReSTIR: Path Reuse in Real-time (4)
07-Making ReSTIR fast
- sampler optimization: algorithm-level optimization
- what neighbors it chooses to reuse and the choice of MIS weights
- low-level optimization: implementation-level optimization
- improving sample quality at a given performance, or both
Sampler optimization
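- One way to view RIS and ReSTIR: MIS combines multiple estimators
- But MIS does not necessarily improve sample quality
- When reusing neighboring pixels, some of the estimators are likely to be poor
- So we need a way to reject bad estimators
- The rejection decision cannot be based on an individual sample: that is biased!
- Otherwise it has to be handled through conditional probabilities, which is cumbersome
- Heuristic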
- ensuring surface normals, depths, and material properties do not vary significantly between reused pixels
Neighbor rejection
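- neighbor rejection
- Can be regarded as a cut-off heuristic
- Assumption: if a sample's importance in one domain is lower than in the others, that domain's estimator is probably problematic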
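The geometric similarity heuristic above can be sketched as a small predicate. This is a minimal illustration, not the course's implementation: the dictionary keys, the function name, and both thresholds are assumptions chosen for the example. The test looks only at per-pixel surface data and never at the sample being reused, so it avoids the per-sample rejection that the notes warn is biased.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def is_neighbor_acceptable(center, neighbor,
                           max_normal_angle_deg=25.0,
                           max_relative_depth_diff=0.1):
    """Return True if `neighbor` is similar enough to `center` to reuse from.

    Both arguments are dicts with hypothetical keys:
      "normal"   : unit-length surface normal (3-tuple)
      "depth"    : linear view-space depth (> 0)
      "material" : material id
    The default thresholds are illustrative assumptions.
    """
    # Normals must not deviate by more than the angular threshold.
    cos_limit = math.cos(math.radians(max_normal_angle_deg))
    if dot(center["normal"], neighbor["normal"]) < cos_limit:
        return False
    # Depths must agree up to a relative tolerance.
    depth_diff = abs(center["depth"] - neighbor["depth"])
    if depth_diff > max_relative_depth_diff * center["depth"]:
        return False
    # Material properties are compared here by id only.
    return center["material"] == neighbor["material"]
```

A rejected neighbor is simply skipped before any of its reservoir data is read.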
Contribution MIS weights
- Proposed in the Generalized RIS (GRIS) paper
- \(M\)-sample GRIS
- Resampling weight: $w_i=m_i(T_i(X_i))\,\hat{p}(T_i(X_i))\,W_i\left|\dfrac{\partial T_i}{\partial X_i}\right|$
- Resampled sample: \(Y_i=T_i(X_i)\)
- Using the following UCW keeps the estimator unbiased
\[ W_Y={\color{red}\left[\frac{c_s(Y)}{m_s(Y)}\right]}\frac{1}{\hat{p}(Y)}\sum_{j=1}^Mw_j \]
\[ \mathop{\sum_{i=1}^M}_{y\in T_i(\operatorname{supp}X_i)}c_i(y)=1 \]
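- The red factor is the addition; with it, the condition \(\sum m_i=1\) no longer has to hold (the choice of \(m_i\) is freer)
- Taking the expectation, \(m_i\) cancels: \(\mathbb{E}\left[f(Y_i)c_i(Y_i)\frac{1}{\hat{p}(Y)}\sum_{j=1}^Mw_j\right]\)
- We can then simply use \(m_i=\dfrac{1}{M}\), and the cost of computing the MIS weights drops from quadratic to linear, \(O(M)\)
- The result is still unbiased, but the variance may increase
- However, for the resampled distribution to approach \(\hat{p}\), \(m_i\) still has to satisfy the original conditions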
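The unbiasedness claim above can be checked exactly on a tiny discrete domain. The sketch below is an illustration under simplifying assumptions that are mine, not the course's: identity shifts (\(T_i(x)=x\), Jacobian 1), a finite domain, and hypothetical names such as `expected_estimate`. It enumerates every joint outcome with exact fractions and computes \(\mathbb{E}[f(Y)W_Y]\) for \(m_i=1/M\).

```python
from fractions import Fraction as F
from itertools import product


def expected_estimate(f, p_hat, pdfs, m, c):
    """Exact E[f(Y) * W_Y] for M-sample RIS with identity shifts.

    f, p_hat : lists indexed by domain point x
    pdfs     : pdfs[i][x] is the source pdf of candidate i
    m, c     : m(i, x) resampling MIS weight, c(i, x) contribution MIS weight
    """
    M = len(pdfs)
    total = F(0)
    for xs in product(range(len(f)), repeat=M):   # every joint outcome (X_1..X_M)
        prob = F(1)
        for i, x in enumerate(xs):
            prob *= pdfs[i][x]
        if prob == 0:
            continue
        # w_i = m_i(X_i) * p_hat(X_i) * W_i, with W_i = 1 / p_i(X_i)
        w = [m(i, x) * p_hat[x] / pdfs[i][x] for i, x in enumerate(xs)]
        w_sum = sum(w)
        if w_sum == 0:
            continue
        for s, y in enumerate(xs):                # selected index s, Y = X_s
            select_prob = w[s] / w_sum
            if select_prob == 0:
                continue
            W_Y = (c(s, y) / m(s, y)) * w_sum / p_hat[y]
            total += prob * select_prob * f[y] * W_Y
    return total


f = [F(1), F(3), F(2)]
pdfs = [[F(1, 2), F(1, 2), F(0)],
        [F(0), F(1, 4), F(3, 4)]]


def uniform_m(i, x):
    return F(1, len(pdfs))                        # m_i = 1/M, an O(M) choice


def balance_c(i, x):
    return pdfs[i][x] / sum(p[x] for p in pdfs)   # c_i sums to 1 over covering domains


unbiased = expected_estimate(f, f, pdfs, uniform_m, balance_c)   # equals sum(f) = 6
```

Swapping `balance_c` for a constant \(1/M\) violates the constraint on \(c_i\) wherever only one candidate's support covers a point, and the expectation then falls below \(\sum f\).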
Pairwise MIS weights
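- Paper: Correlations and Reuse for Fast and Accurate Physically Based Light Transport
- Usual assumption: we do not know in advance which estimators are good; a given estimator is good on some parts of the domain and bad on others
- pairwise: assume a canonical estimator exists that
- covers the entire domain of \(f\)
- has relatively high quality
- canonical: the current pixel's estimator
- pairwise: each estimator is balanced against the canonical one
- Specifically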
\[ \begin{aligned} &m_{i}(x)=\frac{1}{M-1}\frac{p_{i}(x)}{p_{i}(x)+p_{c}(x)}&(i\neq c)\\ &m_{c}(x)=\frac{1}{M-1}\sum_{j\neq c}^{M}\frac{p_{c}(x)}{p_{j}(x)+p_{c}(x)}&\\ \end{aligned} \]
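- In effect this increases the weight of the canonical estimator
- But it overshoots: if all distributions are identical, then \(m_c(x)=(M-1)m_i(x),i\ne c\)
- To get \(m_c(x)=m_i(x)\) when all distributions are identical, replace \(p_c(x)\) with \(p_c(x)/(M-1)\)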
\[ \begin{aligned} &m_{i}(x)=\frac{1}{M-1}\frac{p_{i}(x)}{p_{i}(x)+p_{c}(x)/(M-1)}&(i\neq c)\\ &m_{c}(x)=\frac{1}{M-1}\sum_{j\neq c}^{M}\frac{p_{c}(x)/(M-1)}{p_{j}(x)+p_{c}(x)/(M-1)}&\\ \end{aligned} \]
- generalized version: replace \(p\) with \(\hat{p}\)
- defensive form: with the proxy \(\hat{p}\), \(\hat{p}_i(x)\) can take very large values, so the canonical estimator needs protection: add 1 to the canonical sum before normalizing by \(M\), so that the weights still sum to 1
\[ \begin{aligned} &m_{i}(x)=\frac{1}{M}\frac{\hat{p}_{i}(x)}{\hat{p}_{i}(x)+\hat{p}_{c}(x)/(M-1)}&(i\neq c)\\ &m_{c}(x)=\frac{1}{M}\left[1+\sum_{j\neq c}^{M}\frac{\hat{p}_{c}(x)/(M-1)}{\hat{p}_{j}(x)+\hat{p}_{c}(x)/(M-1)}\right]&\\ \end{aligned} \]
- Adding confidence weights (a weighted version, rather than one based explicitly on the concrete sample count \(M\))
- non-defensive version: the formulas above correspond to \(c_i=1\)
\[ \begin{aligned} &m_{i}(y) =\frac{c_{i}\hat{p}_{\leftarrow i}(y)}{\left(\sum_{k\neq c}^{M}c_{k}\right)\hat{p}_{\leftarrow i}(y)+c_{c}\hat{p}_{c}(y)} & (i\neq c) \\ &m_{c}(y) =\sum_{j\neq c}^M\left(\frac{c_j}{\sum_{k\neq c}^Mc_k}\right)\frac{c_c\hat{p}_c(y)}{\left(\sum_{k\neq c}^Mc_k\right)\hat{p}_{\leftarrow j}(y)+c_c\hat{p}_c(y)}&\\ \end{aligned} \]
- defensive
\[ \begin{aligned} &m_{i}(y) =\frac{\sum_{k\ne c}^{M}c_{k}}{\sum_{k=1}^{M}c_{k}}\cdot\frac{c_{i}\hat{p}_{\leftarrow i}(y)}{\left(\sum_{k\neq c}^{M}c_{k}\right)\hat{p}_{\leftarrow i}(y)+c_{c}\hat{p}_{c}(y)} & (i\neq c) \\ &m_{c}(y) =\frac{c_{c}}{\sum_{k=1}^{M}c_{k}}+\sum_{j\neq c}^M\left(\frac{c_j}{\sum_{k=1}^Mc_k}\right)\frac{c_c\hat{p}_c(y)}{\left(\sum_{k\neq c}^Mc_k\right)\hat{p}_{\leftarrow j}(y)+c_c\hat{p}_c(y)}&\\ \end{aligned} \]
- Findings from GRIS
- ReSTIR PT observes that the \(O(M)\) pairwise MIS gives convergence behavior comparable to the \(O(M^2)\) balance heuristic
- It therefore uses defensive pairwise MIS as the default choice for spatial reuse
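The confidence-weighted formulas above translate into one pass over the non-canonical samples. The following is a sketch under assumptions rather than reference code: the function name, argument layout, and the handling of the all-zero case are mine. `p_hat_from[i]` stands for \(\hat{p}_{\leftarrow i}(y)\), and `p_hat_from[canonical]` for \(\hat{p}_c(y)\).

```python
def pairwise_mis_weights(p_hat_from, canonical, conf=None, defensive=True):
    """Pairwise MIS weights m_1..m_M at one point y, in O(M). Requires M >= 2.

    p_hat_from : target-function values, one per domain, evaluated at y
    canonical  : index c of the canonical sample
    conf       : confidence weights c_i (all 1 if omitted)
    """
    M = len(p_hat_from)
    conf = [1.0] * M if conf is None else list(conf)
    others = [i for i in range(M) if i != canonical]
    conf_others = sum(conf[i] for i in others)
    conf_all = conf_others + conf[canonical]
    canon_term = conf[canonical] * p_hat_from[canonical]

    # Non-defensive form: each sample is balanced against the canonical one.
    m = [0.0] * M
    for i in others:
        denom = conf_others * p_hat_from[i] + canon_term
        if denom <= 0.0:
            continue  # both targets are zero here; such a sample gets zero weight
        m[i] = conf[i] * p_hat_from[i] / denom
        m[canonical] += (conf[i] / conf_others) * canon_term / denom

    if defensive:
        # Scale everything down and hand the freed share to the canonical sample.
        scale = conf_others / conf_all
        m = [scale * w for w in m]
        m[canonical] += conf[canonical] / conf_all
    return m
```

With identical target values and unit confidences, the non-defensive weights come out as \(1/M\) each, while the defensive form gives the canonical sample a larger share; in both cases the weights sum to 1 whenever the denominators are positive.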
Biased MIS Weights
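- The cost of MIS comes from re-evaluating each sample \(X_i\), originally generated with \(p_i(X_i)\), under the density \(p_{j}(X_i)\) of another domain \(\Omega_j\)
- For rays, this means retracing them
- For temporal reuse, retracing rays requires keeping the previous frame's BVH around, which is undesirable
- If the bias is small, we can use approximations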
- using the current frame BVH as a stand in for the prior frame BVH
- assuming \(p_{j}(X_i)\)
- recomputing \(p_{j}(X_i)\) using last frame’s data but assuming visibility does not change
\[ m_i(X_i)=\frac{p_i(X_i)}{p_i(X_i)+\boxed{p_j(X_i)}} \]
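- If a biased \(\tilde{p}_j(X_i)\) replaces the boxed term (presumably only when computing \(m_i(X_i)\))
- too large \(\to\) \(m\) too small \(\to\) darkening bias
- equal \(\to\) no bias
- too small \(\to\) brightening bias
- If it is too large for some points \(X_i\) and too small for others, parts of the image come out too bright and other parts too dark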
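The direction of the bias can be verified on a toy problem. The sketch below uses plain two-sample MIS on a three-point domain as a stand-in for the ReSTIR setting; the function name, the domain, and the choice to scale \(p_j\) by a constant factor are assumptions made for illustration. Only \(m_i\) uses the approximate density, matching the boxed term above.

```python
from fractions import Fraction as F


def expected_two_sample_mis(f, p_i, p_j, approx_p_j):
    """Exact expectation of m_i f/p_i + m_j f/p_j on a finite domain.

    m_i uses `approx_p_j` in place of the true p_j; m_j is computed exactly.
    """
    total = F(0)
    for x in range(len(f)):
        if p_i[x] > 0:
            m_i = p_i[x] / (p_i[x] + approx_p_j[x])
            total += p_i[x] * m_i * f[x] / p_i[x]   # contribution of X_i ~ p_i
        if p_j[x] > 0:
            m_j = p_j[x] / (p_j[x] + p_i[x])
            total += p_j[x] * m_j * f[x] / p_j[x]   # contribution of X_j ~ p_j
    return total


f = [F(1), F(3), F(2)]                    # true integral: sum(f) = 6
p_i = [F(1, 2), F(1, 4), F(1, 4)]
p_j = [F(1, 4), F(1, 4), F(1, 2)]

exact = expected_two_sample_mis(f, p_i, p_j, p_j)
darker = expected_two_sample_mis(f, p_i, p_j, [2 * p for p in p_j])
brighter = expected_two_sample_mis(f, p_i, p_j, [p / 2 for p in p_j])
```

Here `exact` equals the true sum, `darker` falls below it, and `brighter` exceeds it, mirroring the three cases listed above.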
Low-level optimization
Minimize the per-pixel shadow ray count (targeting scenes with millions of lights).
Minimize the number of paths traced.
Maximize sample reuse; path samples are costly, so reuse each as much as possible to minimize cost per reuse.
Minimize correlation in final shading, so denoisers behave better.
Maximize parallelization and streaming reuse for GPU utilization (e.g., using weighted reservoir sampling)
Minimize size of intermediate buffers (e.g., reservoir size).
Minimize memory bandwidth.
Minimize execution divergence (ensuring maximal thread counts active in each GPU warp).
Minimize memory divergence (to avoid thrashing caches and minimizing memory access costs).
Minimize frame time. (ReSTIR benefits significantly from temporal reuse, so overall quality may improve by reducing the quality gained per-frame if you can instead reuse across frames much more quickly.)
Plus other traditional low-level optimization targets, e.g., minimizing register usage.
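As a concrete reference point for the streaming reuse mentioned in the list above, here is a minimal single-sample weighted reservoir. It is a sketch, not production code: the class and field names are assumptions, and the uniform random number is passed in explicitly so the behavior is easy to test.

```python
import random


class Reservoir:
    """Single-sample weighted reservoir: keeps one candidate from a stream."""

    def __init__(self):
        self.sample = None   # currently selected candidate
        self.w_sum = 0.0     # running sum of resampling weights
        self.count = 0       # number of candidates streamed in

    def update(self, candidate, weight, u):
        """Stream in one candidate; `u` is a uniform random number in [0, 1)."""
        self.count += 1
        self.w_sum += weight
        # Replace the stored sample with probability weight / w_sum.
        if weight > 0.0 and u < weight / self.w_sum:
            self.sample = candidate


rng = random.Random(0)
reservoir = Reservoir()
for candidate, weight in [("a", 1.0), ("b", 3.0), ("c", 0.0)]:
    reservoir.update(candidate, weight, rng.random())
```

Only the selected sample, the weight sum, and the count are stored, which is what keeps the per-pixel reservoir small regardless of how many candidates were streamed.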
World Space ReSTIR
- For ReSTIR, the memory footprint is small, and the number of traced rays is a bottleneck only on low-end devices
- World Space ReSTIR is biased (Boissé 2021)
Sample tiling in ReSTIR DI
- Highly randomized sampling is bad for caching!
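- 3M emissive triangles take 25 ms
- picking random light candidates
- Random light sampling is cache-unfriendly: every pixel samples a different light, so the accesses all miss the cache
- At 1080p, if every pixel uses one light, at most ~2M lights are used (\(1920\times1080\approx2\text{M}\))
- How to pick the lights for the current frame
- Sample \(1/4\) of the lights each frame, so that all lights are covered after 4 frames; combined with ReSTIR reuse, this already gives good results
- The speedup is still modest
- Consider pixel tiles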
- A 16x16 tile (256 pixels) samples at most 256 lights; drawing them together from one set of 1024/2048 lights is faster
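- Algorithm
- Each frame, sample many light subsets \(S\) (e.g., according to their intensity)
- For each pixel tile (\(8\times8\) or \(16\times16\))
- choose one light subset \(S_i\)
- then pick lights from \(S_i\) at random (uniform sampling, probability \(1/N\))
- In practice, \(\vert{S}\vert=128,\vert{S_i}\vert=1024\) works well
- When there are few lights, the cost of building the tiles (~0.1 ms) may dominate
- and the cache advantage disappears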
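The tiling algorithm above can be sketched on the CPU as follows. This is an illustration under assumptions, not the actual GPU implementation: the function names, the dictionary that maps a tile to its subset, and storing the source pdf next to each light index are all choices made for the example.

```python
import random


def presample_light_subsets(intensities, num_subsets, subset_size, rng):
    """Build light subsets, each drawn (with replacement) proportionally to intensity.

    Each entry is (light_index, source_pdf), so the pdf needed later is
    available without touching the full light list again.
    """
    total = sum(intensities)
    indices = range(len(intensities))
    subsets = []
    for _ in range(num_subsets):
        picks = rng.choices(indices, weights=intensities, k=subset_size)
        subsets.append([(i, intensities[i] / total) for i in picks])
    return subsets


def pick_light(subsets, tile_subset, px, py, tile_size, rng):
    """Pick a light for pixel (px, py): one subset per tile, uniform within it."""
    tile = (px // tile_size, py // tile_size)
    if tile not in tile_subset:
        # All pixels of a tile share this choice, so they touch the same memory.
        tile_subset[tile] = rng.randrange(len(subsets))
    subset = subsets[tile_subset[tile]]
    return subset[rng.randrange(len(subset))]


rng = random.Random(7)
intensities = [0.0, 1.0, 4.0, 5.0]
subsets = presample_light_subsets(intensities, num_subsets=8, subset_size=16, rng=rng)
tile_subset = {}
light_a = pick_light(subsets, tile_subset, 3, 5, 16, rng)
light_b = pick_light(subsets, tile_subset, 15, 0, 16, rng)   # same 16x16 tile as light_a
```

Because each subset entry is itself an intensity-proportional draw, picking uniformly inside the subset leaves the marginal pdf of a light proportional to its intensity, which is why the stored `source_pdf` can be used directly.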
Lighting with many analytic light types
- Different light types (each with its own sampling code)
- spheres, quad, cylinders, triangles, environment maps, lines, points, spotlights, meshes
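- Different pixels executing different sampling logic leads to execution divergence and cache thrashing
- Building on the tiling above, make all lights within each \(S_i\) the same type
- Then, during per-pixel execution, all pixels in a tile run the same light sampling code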
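One way to obtain single-type subsets is to bucket the lights by type before presampling. The sketch below is only an illustration: the `"type"` key, the function name, and the uniform pick within each type are assumptions, and how many subsets each type receives is left as a free parameter because the notes do not specify it.

```python
import random
from collections import defaultdict


def build_single_type_subsets(lights, subsets_per_type, subset_size, rng):
    """Return subsets whose lights all share one type.

    `lights` is a list of dicts with a hypothetical "type" key.
    """
    by_type = defaultdict(list)
    for index, light in enumerate(lights):
        by_type[light["type"]].append(index)

    subsets = []
    for light_type, members in sorted(by_type.items()):
        for _ in range(subsets_per_type):
            # Uniform pick within the type; an intensity-weighted pick also fits here.
            picks = [members[rng.randrange(len(members))] for _ in range(subset_size)]
            subsets.append({"type": light_type, "lights": picks})
    return subsets


lights = [{"type": "sphere"}, {"type": "quad"}, {"type": "sphere"}, {"type": "triangle"}]
typed_subsets = build_single_type_subsets(lights, subsets_per_type=2, subset_size=4,
                                          rng=random.Random(1))
```

A tile that selects one of these subsets then only ever runs the sampling routine for that subset's type.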
Accelerating hybrid shift
- Executing Algorithm 7 directly may run into the following problems
- Code with complex control flow and long-lived dependencies can cause very high register usage
- lowering the warp occupancy, and potentially causing register spilling that inflates memory cost
- divergence
- Solution: reduce shader time
- use smaller kernels instead of a big kernel
- separate kernels: path tracing, BSDF re-evaluation, and visibility ray tests
- this adds memory overhead (intermediate results between kernels must be stored)
- but that overhead is small compared with the shader time saved
- Perform stream compaction to map threads only to non-empty ray tracing tasks
- many path samples do not need random replay to reconnect
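Stream compaction, as mentioned above, gathers the indices of the work items that actually need a ray so that a later kernel can launch one thread per item. The following sequential sketch shows the prefix-sum formulation; the function name is an assumption, and on a GPU the prefix sum and scatter would run in parallel.

```python
from itertools import accumulate


def compact_active(needs_ray):
    """Return the indices i with needs_ray[i] true, using a prefix sum."""
    flags = [1 if flag else 0 for flag in needs_ray]
    inclusive = list(accumulate(flags))           # inclusive prefix sum of the flags
    total = inclusive[-1] if inclusive else 0
    compacted = [0] * total
    for i, flag in enumerate(flags):
        if flag:
            # The exclusive prefix sum (inclusive - 1 for set flags) is the output slot.
            compacted[inclusive[i] - 1] = i
    return compacted


tasks = compact_active([False, True, True, False, True])
```

The compacted list is one of the intermediate buffers that the split-kernel design has to keep in memory between passes.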
09-Advice for getting started
- Start with a simple ground-truth Monte Carlo path tracer
- Start simple, with basic RIS
- [2005-EGSR] Importance Resampling for Global Illumination
- Think about rendering bias
- Bias has to be taken into account
- Spatial reuse alone is easier to debug; combining with temporal reuse gives better quality.
- Don’t try to get too clever too fast.
- Proceed step by step
- Basic ReSTIR gives you probability distributions at a
- rather than at a voxel
- Reuse visibility very carefully
- Visibility can cause many problems
- ReSTIR accelerates in multiple ways.
- Think a bit about ReSTIR as subsampling the integration domain