Particle Bind Solver Settings Rollout

tyFlow’s particle bind solver is what solves all inter-particle bindings (aka constraints or joints) within tyFlow (excluding PhysX bindings). At its core, a binding is just a relationship between two particles, and solving many bindings in succession is what gives rise to intricate behaviors seen in materials like dirt, wet sand, cloth, ropes, etc. Proper tuning of the bind solver is important when simulating these complex systems.

Solver Settings

  • Steps: controls the number of substeps per simulation time step to solve all active bindings.

The total number of evaluations per binding per frame can be calculated as (bind steps) × (simulation time steps per frame). The higher the total number of evaluations, the more accurate the solver results will be. For granular simulations, a simulation time step of either “1/4 Frame” or “1/8 Frame” with bind solver steps of 5-10 is often adequate. For high-resolution cloth simulations, bind solver steps may need to be much higher in order to maintain cloth stiffness.
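As a quick worked example of the arithmetic above, the sketch below multiplies the bind solver steps by the number of simulation time steps in one frame. The function name and its parameters are illustrative only and are not part of tyFlow.

```python
from fractions import Fraction

def bind_evaluations_per_frame(bind_steps, time_step):
    """Total evaluations per binding per frame.

    time_step is the simulation time step as a fraction of a frame,
    e.g. Fraction(1, 4) for a "1/4 Frame" time step.
    """
    time_steps_per_frame = 1 / Fraction(time_step)
    return int(bind_steps * time_steps_per_frame)

print(bind_evaluations_per_frame(10, Fraction(1, 4)))  # 40
print(bind_evaluations_per_frame(5, Fraction(1, 8)))   # 40
```

Note that both configurations give the same total number of evaluations, even though they divide the work differently between time steps and bind steps.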

  • Stepped force integration: controls whether particle velocities are smoothly added to the bind solver per step, or added only once prior to all bind solver steps. Keeping this enabled has a very minor performance impact but can reduce high velocity artifacts in the resulting simulation.

  • Strict determinism: with this setting turned on, successive runs of a simulation should return identical results. With this setting off, there is no guarantee that bindings will be evaluated in the same order or that race conditions between multiple threads will be prevented, and so results across multiple simulations may vary. Turning this setting on can have a detrimental performance impact, so it’s recommended to keep it off, unless you need the simulation to produce identical results across successive runs.

If you plan on rendering across multiple computers, “strict determinism” must be enabled or else the frames returned by different machines will not be in sync. An alternative to rendering with determinism on is to instead cache out your particles locally and then render a tyCache/PRT/etc loader instead of the tyFlow object itself.

If you choose to render your tyFlow with “strict determinism” on across multiple machines, make sure all machines have consistent OpenCL support. If you have OpenCL acceleration enabled but not all machines that you are using support OpenCL, mixing CPU/GPU solvers can impact determinism even with “strict determinism” enabled.

It is generally a better practice to render a cache of your flow instead of your tyFlow object itself, when rendering across multiple machines. Rendering a cache ensures that hardware differences between computers have no impact on the consistency of the final output.

  • Partition bindings: controls whether bindings will be split into non-overlapping groups, before being solved in a multithreaded manner. Turning this setting off can decrease simulation time, at the cost of increased error accumulation over time. This setting has no effect when OpenCL acceleration is enabled, because OpenCL acceleration requires partitions.

  • Deterministic partitioning: controls whether the partitioning process must avoid threaded race conditions. This setting can usually be disabled for granular flows, but should usually be enabled for cloth/soft-body flows to avoid jittering artifacts.
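To illustrate what "non-overlapping groups" means for partitioned bindings, here is a minimal sketch of a generic greedy scheme: bindings are placed into groups so that no two bindings in the same group share a particle, which means each group could be solved in parallel without two threads writing to the same particle. This is a common textbook approach, not tyFlow's actual implementation, and the function name and data layout are assumptions.

```python
def partition_bindings(bindings):
    """Greedily split (a, b) particle-index pairs into groups in which
    no particle index appears more than once."""
    groups = []  # list of lists of bindings
    used = []    # per-group set of particle indices already touched
    for a, b in bindings:
        for group, particles in zip(groups, used):
            if a not in particles and b not in particles:
                group.append((a, b))
                particles.update((a, b))
                break
        else:
            # No existing group can take this binding: start a new one.
            groups.append([(a, b)])
            used.append({a, b})
    return groups

# A chain of 5 particles (like a short rope): 0-1-2-3-4
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
for i, group in enumerate(partition_bindings(chain)):
    print(i, group)
```

Because this sketch visits bindings in a fixed order on a single thread, it always produces the same groups for the same input. A multithreaded partitioner would need extra care to keep that property.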

  • OpenCL acceleration: if an OpenCL 2.0-compatible GPU device is found on the system, this option will be available. OpenCL acceleration can increase simulation performance, depending on the power of the available GPU. When enabled, all bindings will be solved on the GPU instead of the CPU.

Enabling OpenCL acceleration does not guarantee a performance boost. There is a fair amount of overhead involved in transferring data to and from the GPU during the simulation, which can offset the actual speed boost the GPU offers during its calculation phase. While an overall increase in performance should be expected for very high-end GPUs on systems with few CPU cores, a system with many CPU cores and a low-end GPU may not see much of a performance boost with OpenCL at all. Results will vary across hardware, and users must experiment to determine if OpenCL acceleration is right for them. It is not a magic bullet solution.

CUDA Cloth Collision Solver

These settings are global settings for the CUDA cloth collision solver, which operates on all cloth meshes that have CUDA collisions enabled.

  • Repulse steps: the total number of repulsion steps that the collision solver will take in order to solve thickness/friction aspects of colliding cloth meshes.

Increasing repulsion steps can help the collision algorithm maintain more rigid cloth thickness, and also help stabilize cloth in situations where many pieces of cloth are layered on top of each other. However, the returns from increasing this value diminish fairly quickly unless the simulation involves lots of densely layered cloth. For cloth without a lot of layering/folding, it is often better to increase overall simulation steps than to increase this value too high. Capping this value around 3 to 5 for some extra simulation stability is usually sufficient. For densely layered cloth (dozens/hundreds of cloth layers), values of 20-50+ may be necessary. Keep in mind that unlike IZ steps, repulsion steps have no early termination condition. Thus, increasing the number of repulsion steps will linearly increase the time it takes for the simulation to compute.

  • Repulse mult: the multiplier applied to the overall repulsion strength.

A higher repulsion strength multiplier can help to separate cloth triangles in close proximity with each other faster, but the higher the strength the more elastic (bouncy) the result. It is best to keep this value fairly low.

  • Impulse steps: the maximum number of impulse steps that the collision solver will take in order to sequentially solve intersections between colliding cloth meshes.

Collision impulses help to prevent cloth intersections, but they do not guarantee a collision-free state because one impulse acting on two triangles may actually cause a new intersection to form elsewhere. However, because impulses can be processed faster than Impact Zones, having a few initial impulses generated first helps take pressure off the IZ solver. A small number of impulse steps followed by a large number of IZ steps is the best way to ensure all collisions will be solved after repulsions are processed.

  • IZ steps: the maximum number of inelastic Impact Zone steps that the collision solver will take each step of the simulation. All collisions may be resolved in fewer steps, so this setting is purely a limiter which can prevent the solver from taking too long in certain situations.

The Impact Zone solver attempts to resolve all simultaneous collisions in one fell swoop, as opposed to the Impulse solver, which attempts to resolve collisions sequentially. The benefit of the Impact Zone solver is that it can resolve very complex collision configurations that the Impulse solver cannot; however, the IZ solver is much slower than the Impulse solver. For this reason, several Impulse steps should be evaluated before the IZ solver is triggered. It is also important to keep IZ steps quite high, in order to catch all collisions, because the IZ solver is a failsafe for both the Repulsion and Impulse solvers.

If the inelastic Impact Zone solver requires more substeps than the specified maximum in order to converge for a particular frame, a warning will be printed to the MAXScript Listener. In some situations with extremely complex intersection configurations, the solver may never converge. Please see the FAQ for more info.
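The way the repulsion, impulse and IZ step counts interact can be sketched as a simple control flow: repulsions always run for the full count, while impulse and IZ steps act as upper limits. Everything below is hypothetical, including the method names and the toy solver, and the early exit during the impulse phase is an assumption based on that setting being described as a maximum.

```python
def run_collision_phases(solver, repulse_steps, impulse_steps, iz_steps):
    """Hypothetical control flow. Returns True if no collisions remain."""
    # Repulsion: always runs the full step count (no early termination).
    for _ in range(repulse_steps):
        solver.repulse()
    # Impulses: cheaper sequential fixes, up to a maximum step count.
    for _ in range(impulse_steps):
        if solver.count_collisions() == 0:
            return True
        solver.apply_impulses()
    # Impact Zones: slower failsafe that stops once nothing collides.
    for _ in range(iz_steps):
        if solver.count_collisions() == 0:
            return True
        solver.solve_impact_zones()
    # False here corresponds to the "did not converge" warning case.
    return solver.count_collisions() == 0

class ToySolver:
    """Stand-in solver: each impulse pass fixes one collision and each
    IZ pass fixes all remaining collisions."""
    def __init__(self, collisions):
        self.collisions = collisions
        self.calls = []
    def repulse(self):
        self.calls.append("repulse")
    def count_collisions(self):
        return self.collisions
    def apply_impulses(self):
        self.calls.append("impulse")
        self.collisions = max(0, self.collisions - 1)
    def solve_impact_zones(self):
        self.calls.append("iz")
        self.collisions = 0

toy = ToySolver(collisions=5)
print(run_collision_phases(toy, repulse_steps=3, impulse_steps=2, iz_steps=10))
print(toy.calls)
```

In the toy run, only one of the ten permitted IZ steps is actually used, which is why a high IZ step limit does not necessarily cost anything when collisions are resolved early.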

  • IZ thresh: the number of simultaneous collisions that must be present in a single collision-solver Impact Zone in order for general IZ processing to be offloaded to CUDA.

The core of the collision algorithm tracks simultaneous collisions in a cloth-collision adjacency graph called an Impact Zone. When the number of collisions in a graph of adjacent vertices exceeds this value, the processing algorithms for the zone will be executed on the GPU (using CUDA) rather than the CPU. If the IZ thresh value is set too low, the overhead of transferring data to/from the GPU can exceed the time required to process the vertices in parallel on the CPU. You should not need to change this value unless your CPU has very few cores (in which case, you may need to lower the value to offload more work to the GPU). A value that is too large means all IZ processing will happen on the CPU. A value that is too small means all IZ processing will happen on the GPU.

  • CG thresh: the number of simultaneous collisions that must be present in a single collision-solver Impact Zone in order for the Conjugate Gradient portion of the IZ processing algorithm to be offloaded to CUDA.

The CG thresh value determines how many rows/columns must exist in an IZ matrix in order for the Conjugate Gradient portion of the IZ process algorithm to be offloaded to the GPU. Due to the overhead required to transfer matrix data to the GPU, performing the CG method on the GPU will only become faster when a matrix is sufficiently large. You should not need to change this value unless your CPU has very few cores (in which case, you may need to lower the value to offload more work to the GPU). A value that is too large means all CG processing will happen on the CPU. A value that is too small means all CG processing will happen on the GPU.
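Both thresholds follow the same pattern: work is offloaded to the GPU only when a zone is larger than the threshold. The sketch below shows how extreme threshold values push all zones to one device. The function name, zone sizes and threshold values are made up for illustration and do not reflect tyFlow's defaults.

```python
def choose_device(zone_collisions, thresh):
    """Return "gpu" when a zone has more collisions than the threshold,
    otherwise "cpu"."""
    return "gpu" if zone_collisions > thresh else "cpu"

zone_sizes = [12, 300, 4500]  # hypothetical collision counts per zone
for thresh in (0, 1000, 10**9):
    print(thresh, [choose_device(n, thresh) for n in zone_sizes])
```

With a threshold of 0 every zone goes to the GPU, with a very large threshold every zone stays on the CPU, and an intermediate value sends only the largest zone to the GPU.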

  • Max IZ Size: controls the maximum number of impacts or nodes that are allowed in each Impact Zone. Lower values may result in more IZs being generated (leading to solver inaccuracies). Higher values may result in much slower performance.

While the solver is running, it will fragment IZs with more impacts/nodes than this value into multiple IZs. The more impacts/nodes per IZ, the greater the accuracy of the collision solver. However, too many impacts/nodes in a single zone can greatly reduce the performance of the solver. Therefore, finding a balance between accuracy and IZ size is important. Usually this value does not need to be changed. Turning on “Print CCCS details” in the debugging rollout will print information about IZ size in the MAXScript Listener during each iteration of the solver.
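As a rough picture of what fragmenting an oversized Impact Zone involves, the sketch below splits a list of impacts into consecutive chunks no larger than a maximum size. How tyFlow actually decides where to split a zone is not described here, so this naive chunking and the function name are assumptions made purely for illustration.

```python
def fragment_zone(impacts, max_iz_size):
    """Split a list of impacts into consecutive chunks containing at
    most max_iz_size impacts each."""
    if max_iz_size < 1:
        raise ValueError("max_iz_size must be at least 1")
    return [impacts[i:i + max_iz_size]
            for i in range(0, len(impacts), max_iz_size)]

zone = list(range(10))  # ten hypothetical impact IDs
for max_size in (4, 10):
    print(max_size, fragment_zone(zone, max_size))
```

A smaller maximum produces more zones that are each solved without knowledge of the others, which is where the loss of accuracy comes from.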

  • Greedy VRAM Usage: when enabled, the CUDA solver will not clear VRAM after each time step, which allows it to run faster because VRAM allocations do not need to be fully re-initialized each time step. On GPUs with small amounts of VRAM (less than 6GB), this may need to be disabled for high-resolution cloth simulations.

  • Limit repulsion VRAM: by default, repulsions are generated for every matching vert-face or edge-edge pair of two proximate triangles. On very high resolution cloth meshes, this can require a lot of VRAM. By enabling this setting, you can limit the repulsion solver to only return the nearest vert-face or edge-edge pair of proximate faces, which can greatly reduce VRAM usage during the repulsion phase of the CCCS, at the cost of some accuracy.

  • 2D solver: enables the 2D mode of the CUDA collision solver.

By default, all collisions will be processed in 3D. However, in many situations you may want them to be processed on a flat plane (example: flat splines colliding with each other). Simply creating the splines on a flat plane is not enough to guarantee proper 2D collisions because rounding errors along the “up” axis perpendicular to the desired plane (ex: the Z axis) can prevent all collisions from being detected properly. By enabling the 2D solver, you can ensure that all 2D collisions will be detected on a particular plane, because the solver will switch to a special 2D mode that completely ignores the specified up axis.

  • 2D Up Axis: the desired “up” axis of the 2D plane, which will be ignored by the solver. For example, specifying “Z” as the up axis means all input cloth data will be flattened on the X/Y plane.
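The effect of ignoring the up axis can be illustrated with a small sketch that drops one coordinate from each point. Two points that differ only by a tiny rounding error along the up axis become identical once flattened. The helper below is hypothetical and only mirrors the idea, not the solver's internals.

```python
AXES = {"X": 0, "Y": 1, "Z": 2}

def flatten(points, up_axis="Z"):
    """Drop the up-axis component of each (x, y, z) point."""
    drop = AXES[up_axis]
    return [tuple(c for i, c in enumerate(p) if i != drop) for p in points]

# Two points that should coincide, but carry small Z rounding errors.
pts = [(1.0, 2.0, 1e-7), (1.0, 2.0, -3e-7)]
flat = flatten(pts, "Z")
print(flat)
print(flat[0] == flat[1])  # True
```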

Collision Compensation

During each simulation step, collisions are always processed after bindings. No solid geometry collisions are processed while the bind solver evaluates bindings each bind solver step. Because of this, it’s possible for the bind solver to pull particles straight through colliders, only for the subsequent collision step to fix those intersections afterwards. However, even though those intersections are eventually fixed, the rest of the bindings remain unaware that such a collision ever took place, and this can cause visual artifacts within the overall bind network. To compensate for this, particle masses can be artificially adjusted when collisions are detected on the previous simulation substep. Collided particles can be given a heavier mass, so that they won’t be pulled as forcefully by their connected particles. Once previously-collided particles are determined to have no more collisions, their mass will return to normal. The combination of these effects can help reduce visual artifacts in binding networks (like cloth).

  • Mass multiplier: The multiplier applied to the (inverse) mass of particles that collided on the previous simulation step. The smaller the value, the less influence surrounding bindings will have on a collided particle.

  • Interpolation: The interpolation speed used to transition particle masses between the collision compensation value and their original value. Keeping this value low can help prevent jittering artifacts caused by the masses of collided particles switching between the compensation value and their original value too quickly.
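The two settings above can be pictured with a small sketch in which a particle's effective inverse mass moves a fraction of the way toward a target value on each step. The linear blend, the function name and the numbers used are assumptions for illustration only. The exact formula used by the solver is not documented here.

```python
def update_inverse_mass(current, base, collided, mass_mult, interp):
    """Move the effective inverse mass toward its target.

    The target is base * mass_mult while the particle is colliding,
    and base once it no longer collides. interp (0..1) controls how
    quickly the value approaches the target each step.
    """
    target = base * mass_mult if collided else base
    return current + (target - current) * interp

inv_mass = 1.0
# Two steps with a collision, then two steps without.
for collided in (True, True, False, False):
    inv_mass = update_inverse_mass(inv_mass, base=1.0, collided=collided,
                                   mass_mult=0.1, interp=0.5)
    print(collided, inv_mass)
```

A smaller interp value spreads the transition over more steps, which is the behavior that helps avoid abrupt switching between the two mass values.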

Particle Sleeping

By enabling particle sleeping, you can force low-velocity particles to come to a standstill when they would otherwise keep moving over time. This can prevent unwanted motion in particles and forcibly bring jittering particles to rest.

  • Velocity thresh: particles whose velocity magnitude is below this threshold will be considered candidates for sleep.

  • Min duration: candidate particles whose velocity magnitude remains under the velocity threshold for this duration of time will be put to sleep.

  • Wake thresh: sleeping particles whose velocity exceeds this value at the end of a time step will be awoken.

  • Energy transfer: the amount of neighbor-particle energy that can contribute to waking a particle.

  • Relative to time step: Multiplies threshold velocities by the time step.

Because velocities are integrated each time step, wake/sleep thresholds may be too large by default if your time step is less than 1. For example, if your gravity strength is -1.0 and your sleep threshold is 0.5, particles will not fall asleep when your time step is 1 frame. However, if your time step is 1/2 frame, particles will fall asleep because at each substep their velocity is increased by 0.5 instead of 1.0 (which matches the sleep threshold). If “relative to time step” is enabled, the wake/sleep thresholds will be multiplied by the time step delta, and so in this example the effective threshold would actually be 0.25 (0.5 × 1/2) per step.
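The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below compares the per-step velocity change from gravity against the sleep threshold, with and without the threshold being scaled by the time step. The function and variable names are illustrative only.

```python
def effective_threshold(threshold, time_step, relative_to_time_step):
    """Scale a wake/sleep threshold by the time step when requested."""
    return threshold * time_step if relative_to_time_step else threshold

gravity_strength = 1.0   # magnitude of the gravity strength in the example
sleep_threshold = 0.5

for time_step in (1.0, 0.5):          # 1 frame and 1/2 frame
    velocity_gain = gravity_strength * time_step
    for relative in (False, True):
        thresh = effective_threshold(sleep_threshold, time_step, relative)
        print(time_step, relative, velocity_gain, thresh)
```

At a 1/2 frame time step the per-step velocity gain is 0.5, which equals the unscaled threshold of 0.5 but is well above the scaled threshold of 0.25.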

Particle sleeping has no performance impact. Its impact is purely visual. Sleeping particles will still be evaluated by the solver – the difference is that if they are considered asleep at the end of a time step, they will be returned to their previous location (effectively rendering them motionless).
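The sleeping behavior described in this section can be sketched as a small per-particle state update: a timer accumulates while the particle stays below the velocity threshold, the particle falls asleep once the minimum duration is reached, and a sleeping particle is returned to its previous position at the end of the step. This is a simplified illustration with hypothetical names. It omits the energy transfer setting, and the exact comparisons tyFlow uses are assumptions.

```python
def new_sleep_state():
    return {"asleep": False, "time_below": 0.0}

def end_of_step(state, prev_pos, new_pos, speed, dt,
                velocity_thresh, min_duration, wake_thresh):
    """Update the sleep state and return the position to keep."""
    if state["asleep"]:
        if speed > wake_thresh:
            # Fast enough to wake up again.
            state["asleep"] = False
            state["time_below"] = 0.0
    elif speed < velocity_thresh:
        # Candidate for sleep: accumulate time spent below the threshold.
        state["time_below"] += dt
        if state["time_below"] >= min_duration:
            state["asleep"] = True
    else:
        state["time_below"] = 0.0
    # Sleeping particles are still simulated, then put back in place.
    return prev_pos if state["asleep"] else new_pos

state = new_sleep_state()
for speed in (0.1, 0.1, 0.1, 2.0):
    pos = end_of_step(state, prev_pos=(0, 0, 0), new_pos=(0, 0, 1),
                      speed=speed, dt=1.0, velocity_thresh=0.5,
                      min_duration=2.0, wake_thresh=1.0)
    print(speed, state["asleep"], pos)
```

Note that the new position is computed for every particle regardless of its sleep state, which is consistent with sleeping having no performance benefit.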