Comparison for VRAM/RAM Usage Capabilities

Here is the comparison table between V2 and V3.

Since the beginning of Octane, the integration kernels had one CUDA thread calculate one complete sample. We changed this for various reasons, the main one being that the integration kernels had become really huge and impossible to optimize. Also, OSL and OpenCL are pretty much impossible to implement this way.

To solve the problem, we split the big task of calculating a sample into smaller steps, which are then processed one by one by the CUDA threads. As a result, there are a lot more kernel calls happening than in the past. There are two major consequences of this new approach:

+ Octane needs to keep information for every sample that is calculated in parallel between kernel calls, which requires additional GPU memory.
+ The CPU is stressed a bit more, since it has to do more work to issue many more kernel launches.

To give you some control over the kernel execution, we added two options to the direct lighting / path tracing / info channel kernel nodes:

+ "Parallel samples" controls how many samples we calculate in parallel. If you set it to a small value, Octane requires less memory to store the sample state, but most likely renders a bit slower. If you set it to a high value, more graphics memory is needed, but rendering becomes faster. The change in performance depends on the scene, the GPU architecture and the number of shader processors the GPU has.
+ "tile samples" controls the number of samples per pixel Octane renders until it takes the result and stores it in the film buffer. A higher number means that results arrive less often at the film buffer, but it reduces the CPU overhead during rendering and, as a consequence, can improve performance, too.
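
The trade-offs of the split kernels can be illustrated with a small toy cost model in Python. This is only a sketch and not how Octane is implemented: the step names, the 64-byte state size and the function `render_cost` are all made up for illustration, and the numbers are per pixel. It just shows the direction of each effect: more parallel samples means more state memory but fewer kernel launches, and more tile samples means fewer film buffer updates.

```python
import math

# Hypothetical steps that one sample is split into (names are assumptions,
# not Octane's real kernels). Each step is one kernel launch per batch.
STEPS = ("generate_ray", "intersect", "shade", "accumulate")


def render_cost(samples_per_pixel, parallel_samples, tile_samples, state_bytes=64):
    """Toy per-pixel cost estimate for a renderer with split kernels.

    state_bytes is an assumed size of the state kept for one in-flight
    sample between kernel calls.
    """
    # Samples are processed in batches of `parallel_samples`.
    batches = math.ceil(samples_per_pixel / parallel_samples)
    return {
        # Every batch runs each step as a separate kernel launch (CPU work).
        "kernel_launches": batches * len(STEPS),
        # State must be stored for every sample that is in flight (GPU memory).
        "state_bytes": parallel_samples * state_bytes,
        # Results are written to the film buffer once per `tile_samples`.
        "film_buffer_updates": math.ceil(samples_per_pixel / tile_samples),
    }


if __name__ == "__main__":
    for parallel, tile in ((8, 16), (16, 32)):
        print(parallel, tile, render_cost(1024, parallel, tile))
```

Doubling both settings in this model halves the kernel launches and film buffer updates while doubling the sample state, which matches the behaviour described in the text. Real performance still depends on the scene and the GPU.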