Particle Bind Solver Settings Rollout

tyFlow’s particle bind solver is what solves all inter-particle bindings (aka constraints or joints) within tyFlow (excluding PhysX bindings). At its core, a binding is just a relationship between two particles, and solving many bindings in succession is what gives rise to intricate behaviors seen in materials like dirt, wet sand, cloth, ropes, etc. Proper tuning of the bind solver is important when simulating these complex systems.


Solver Settings

  • Steps: controls the number of substeps per simulation time step to solve all active bindings.

The total number of evaluations per binding per frame can be calculated as (bind solver steps) x (simulation time steps per frame). The higher the total number of evaluations, the more accurate the solver results will be. For granular simulations, a simulation time step of either “1/4 Frame” or “1/8 Frame” with 5-10 bind solver steps is often adequate. For high-resolution cloth simulations, bind solver steps may need to be much higher in order to maintain cloth stiffness.
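
As a quick illustration of that arithmetic, here is a minimal sketch (plain Python with placeholder values; not tyFlow code):

    # A "1/4 Frame" time step means 4 simulation time steps per frame.
    time_steps_per_frame = 4
    bind_solver_steps = 8        # the "Steps" value in this rollout
    evaluations_per_binding_per_frame = bind_solver_steps * time_steps_per_frame
    print(evaluations_per_binding_per_frame)   # 32 evaluations of each binding per frame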

  • Stepped force integration: controls whether particle velocities are smoothly added to the bind solver per step, or added only once prior to all bind solver steps. Keeping this enabled has a very minor performance impact but can reduce high-velocity artifacts in the resulting simulation.
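
A loose sketch of the distinction (illustrative only; the Particle class and solve_bindings stub are placeholders, not tyFlow internals):

    from dataclasses import dataclass

    @dataclass
    class Particle:
        velocity: float = 0.0

    def solve_bindings(p):
        pass  # stand-in for one bind solver step acting on the particle

    def integrate_all_at_once(p, dv, steps):
        p.velocity += dv               # full velocity change added before any bind step
        for _ in range(steps):
            solve_bindings(p)

    def integrate_stepped(p, dv, steps):
        for _ in range(steps):
            p.velocity += dv / steps   # velocity change spread evenly across the bind steps
            solve_bindings(p)          # each step sees a smaller, smoother delta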

  • Strict determinism: with this setting turned on, successive runs of a simulation should return identical results. With this setting off, there is no guarantee that bindings will be evaluated in the same order or that race conditions between multiple threads will be prevented, and so results across multiple simulations may vary. Turning this setting on can have a detrimental performance impact, so it’s recommended to keep it off, unless you need the simulation to produce identical results across successive runs.

If you plan on rendering across multiple computers, “strict determinism” must be enabled or else the frames returned by different machines will not be in sync. An alternative to rendering with determinism on is to cache out your particles locally and then render a tyCache/PRT/etc loader instead of the tyFlow object itself.

If you choose to render your tyFlow with “strict determinism” on across multiple machines, make sure all machines have consistent OpenCL support. If you have OpenCL acceleration enabled but not all machines that you are using support OpenCL, mixing CPU/GPU solvers can impact determinism even with “strict determinism” enabled.

It is generally a better practice to render a cache of your flow instead of your tyFlow object itself, when rendering across multiple machines. Rendering a cache ensures that hardware differences between computers have no impact on the consistency of the final output.

  • Partition bindings: controls whether bindings will be split into non-overlapping groups before being solved in a multithreaded manner. Turning this setting off can decrease simulation time, at the cost of increased error accumulation over time. This setting has no effect when OpenCL acceleration is enabled, because OpenCL acceleration requires partitions.

  • Deterministic partitioning: controls whether the partitioning process must avoid threaded race conditions. This setting can usually be disabled for granular flows, but should usually be enabled for cloth/soft-body flows to avoid jittering artifacts.
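
A minimal sketch of the kind of non-overlapping grouping described above (a greedy illustration of the idea only, not tyFlow's actual partitioning code):

    # Place each binding (a pair of particle indices) into the first group where neither
    # of its particles already appears, so every group can be solved in parallel without
    # two threads writing to the same particle.
    def partition_bindings(bindings):
        groups, used = [], []
        for a, b in bindings:
            for group, particles in zip(groups, used):
                if a not in particles and b not in particles:
                    group.append((a, b))
                    particles.update((a, b))
                    break
            else:
                groups.append([(a, b)])
                used.append({a, b})
        return groups

    print(partition_bindings([(0, 1), (1, 2), (2, 3), (0, 3)]))
    # [[(0, 1), (2, 3)], [(1, 2), (0, 3)]]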

  • OpenCL acceleration: if an OpenCL 2.0-compatible GPU device is found on the system, this option will be available. OpenCL acceleration can increase simulation performance, depending on the power of the available GPU. When enabled, all bindings will be solved on the GPU instead of the CPU.

Enabling OpenCL acceleration does not guarantee a performance boost. There is a fair amount of overhead involved in transferring data to and from the GPU during the simulation, which can offset the speed boost the GPU offers during its calculation phase. While an overall increase in performance should be expected for very high-end GPUs on systems with few CPU cores, a system with many CPU cores and a low-end GPU may not see much of a performance boost with OpenCL at all. Results will vary across hardware and users must experiment to determine if OpenCL acceleration is right for them. It is not a magic bullet solution.

CUDA Cloth Collision Solver

These settings are global settings for the CUDA cloth collision solver, which operates on all cloth meshes that have CUDA collisions enabled.

  • Repulse steps: the total number of repulsion steps that the collision solver will take in order to solve thickness/friction aspects of colliding cloth meshes.

Increasing repulsion steps can help the collision algorithm maintain more rigid cloth thickness, and also help stabilize cloth in situations where many pieces of cloth are layered on top of each other. However, the returns from increasing this value diminish fairly quickly unless the simulation involves lots of densely layered cloth. For cloth without a lot of layering/folding, it is often better to decrease the overall simulation time step than to increase this value too high. Capping this value at around 3-5 for some extra simulation stability is usually sufficient. For densely layered cloth (dozens/hundreds of cloth layers), values of 20-50+ may be necessary. Keep in mind that unlike IZ steps, repulsion steps have no early termination condition. Thus, increasing the number of repulsion steps will linearly increase the time it takes for the simulation to compute.

  • Repulse mult: the multiplier applied to the overall repulsion strength.

A higher repulsion strength multiplier can help to separate cloth triangles in close proximity with each other faster, but the higher the strength the more elastic (bouncy) the result. It is best to keep this value fairly low.

  • Impulse steps: the maximum number of impulse steps that the collision solver will take in order to sequentially solve intersections between colliding cloth meshes.

Collision impulses help to prevent cloth intersections, but they do not guarantee a collision-free state because one impulse acting on two triangles may actually cause a new intersection to form elsewhere. However, because impulses can be processed faster than Impact Zones, having a few initial impulses generated first helps take pressure off the IZ solver. A small number of impulse steps followed by a large number of IZ steps is the best way to ensure all collisions will be solved after repulsions are processed.

  • Impulse mult: the multiplier applied to the overall impulse strength.

Impulse mult values greater than 1 can sometimes help offset the numerical relaxation that results from the way impulses are processed in parallel. If you are solely relying on the impulse solver for intersection prevention but find it requires a lot of iterations in order to produce a correct result, try increasing the impulse mult value to 1.2-1.3.

  • IZ steps: the maximum number of inelastic Impact Zone steps that the collision solver will take each step of the simulation. All collisions may be resolved in fewer steps, so this setting is purely a limiter which can prevent the solver from taking too long in certain situations.

The Impact Zone solver attempts to resolve all simultaneous collisions in one fell swoop, as opposed to the Impulse solver, which attempts to resolve collisions sequentially. The benefit of the Impact Zone solver is that it can resolve very complex collision configurations that the Impulse solver cannot; however, the IZ solver is much slower than the Impulse solver. For this reason, several Impulse steps should be evaluated before the IZ solver is triggered. It is also important to keep IZ steps quite high, in order to catch all collisions, because the IZ solver is a failsafe for both the Repulsion and Impulse solvers.

If the inelastic Impact Zone solver requires more substeps than the specified maximum in order to converge for a particular frame, a warning will be printed to the MAXScript Listener. In some situations with extremely complex intersection configurations, the solver may never converge. Please see the FAQ for more info.
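
Taken together, the repulsion, impulse, and Impact Zone phases behave roughly like the outline below (a hedged sketch of the ordering and termination behavior described above; the function names are placeholders, not actual solver code):

    def apply_repulsions(cloth): pass        # placeholder for one repulsion step
    def apply_impulses(cloth): pass          # placeholder for one impulse step
    def solve_impact_zones(cloth): pass      # placeholder for one IZ iteration
    def has_collisions(cloth): return False  # placeholder collision test

    def collision_pass(cloth, repulse_steps, impulse_steps, iz_steps):
        for _ in range(repulse_steps):       # no early termination: cost grows linearly with steps
            apply_repulsions(cloth)
        for _ in range(impulse_steps):       # cheap, but may leave some intersections behind
            apply_impulses(cloth)
        for _ in range(iz_steps):            # failsafe: slow, but handles complex configurations
            if not has_collisions(cloth):
                return                       # converged in fewer than the maximum IZ steps
            solve_impact_zones(cloth)
        if has_collisions(cloth):
            print("warning: IZ solver did not converge within the specified maximum")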

  • IZ thresh: the number of simultaneous collisions that must be present in a single collision-solver Impact Zone in order for general IZ processing to be offloaded to CUDA.

The core of the collision algorithm tracks simultaneous collisions in a cloth-collision adjacency graph called an Impact Zone. When the number of collisions in a graph of adjacent vertices exceeds this value, the processing algorithms for the zone will be executed on the GPU (using CUDA) rather than the CPU. If the IZ thresh value is set too low, the overhead of transferring data to/from the GPU can exceed the time it would take to simply process the vertices in parallel on the CPU. You should not need to change this value unless your CPU has very few cores (in which case, you may need to lower the value to offload more work to the GPU). A value that is too large means all IZ processing will happen on the CPU. A value that is too small means all IZ processing will happen on the GPU.

  • CG thresh: the number of simultaneous collisions that must be present in a single collision-solver Impact Zone in order for the Conjugate Gradient portion of the IZ processing algorithm to be offloaded to CUDA.

The CG thresh value determines how many rows/columns must exist in an IZ matrix in order for the Conjugate Gradient portion of the IZ processing algorithm to be offloaded to the GPU. Due to the overhead required to transfer matrix data to the GPU, performing the CG method on the GPU only becomes faster when a matrix is sufficiently large. You should not need to change this value unless your CPU has very few cores (in which case, you may need to lower the value to offload more work to the GPU). A value that is too large means all CG processing will happen on the CPU. A value that is too small means all CG processing will happen on the GPU.
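
Conceptually, the two thresholds gate GPU offloading in the same way (an assumed, simplified decision sketch; the names are placeholders, not tyFlow internals):

    # Larger zones/matrices amortize the CPU<->GPU transfer overhead; smaller ones stay on the CPU.
    def process_impact_zone(zone_collision_count, matrix_size, iz_thresh, cg_thresh):
        iz_device = "GPU (CUDA)" if zone_collision_count >= iz_thresh else "CPU"
        cg_device = "GPU (CUDA)" if matrix_size >= cg_thresh else "CPU"
        return iz_device, cg_device

    print(process_impact_zone(zone_collision_count=500, matrix_size=120,
                              iz_thresh=256, cg_thresh=256))
    # ('GPU (CUDA)', 'CPU')  -> big zone offloaded, small CG solve kept on the CPU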

  • Max IZ Size: controls the maximum number of impacts or nodes that are allowed in each Impact Zone. Lower values may result in more IZs being generated (leading to solver inaccuracies). Higher values may result in much slower performance.

While the solver is running, it will fragment IZs with more impacts/nodes than this value into multiple IZs. The more impacts/nodes per IZ, the greater the accuracy of the collision solver. However, too many impacts/nodes in a single zone can greatly reduce the performance of the solver. Therefore, finding a balance between accuracy and IZ size is important. Usually this value does not need to be changed. Turning on “Print CCCS details” in the debugging rollout will print information about IZ size in the MAXScript Listener during each iteration of the solver.
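
A toy sketch of the fragmentation idea (an assumption about the general behavior; the real zone splitting is driven by the collision adjacency graph, not a flat list):

    # Split an oversized zone's impacts into chunks no larger than max_iz_size.
    def fragment_zone(impacts, max_iz_size):
        return [impacts[i:i + max_iz_size] for i in range(0, len(impacts), max_iz_size)]

    print(fragment_zone(list(range(7)), max_iz_size=3))
    # [[0, 1, 2], [3, 4, 5], [6]]  -> more zones, each cheaper to solve but less globally accurate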

  • Stuck IZ jitter: controls the amount of jitter to apply to particles whose IZ is stuck due to numerical precision issues.

Occasionally an IZ can get “stuck” during a solve. This usually happens when the solution to a particular impact generates a new collision whose solution re-causes the prior impact within a very tiny numerical tolerance, ad infinitum. This manifests as repeated IZ solver iterations whose total impact count doesn’t change over time (turning on CCCS printouts from the Debugging rollout can reveal when these situations occur). Without any artificial jitter applied to impact particles, the CCCS will be caught in a loop until the max number of IZ solver iterations is reached. This can greatly increase simulation time despite the fact that the extra time doesn’t improve the solution. By increasing the “stuck IZ jitter” setting, any time a stuck IZ is detected, all of the offending particles will be artificially jittered in order to hopefully break past the precision issues causing the infinite solver loop. The jittering may introduce new impacts which will need to be solved in the next IZ solver iteration, but if the goal of breaking out of the loop is achieved then the total processing time will still be lowered. Keeping this value small (but not too small) will help prevent stuck IZs. A value that is 10% of cloth thickness is a good starting place.

  • Stuck IZ limit: controls the number of times the solver will attempt to unstick IZs with jitter before breaking out of the solver loop.

In situations where all remaining IZs are stuck (as explained above) and repeated attempts to unstick them with jitter fail to resolve the remaining collisions, the stuck IZ limit can be used to manually break out of the solver loop early. This can improve simulation time in situations where time is wasted repeatedly attempting to solve collisions which can never be solved even with jitter applied (due to extreme numerical precision issues). For example, if you’ve set the max IZ iterations to 200, and the last remaining IZ becomes stuck at iteration 25, it’s possible for the solver to get stuck in a loop from iteration 26 to 200 without being able to solve the last collision(s) even after repeated attempts to unstick them. In that example, by setting the stuck IZ limit to 3, the solver would break out of the loop after the 3rd failed attempt to unstick the IZ…which would reduce the amount of total simulation time by the amount of time required to perform the 100+ failed IZ iterations.
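
The behavior described by the last two settings amounts to logic along these lines (a simplified, assumed sketch; the function and its inputs are illustrative, not tyFlow internals):

    def unstick_policy(impact_counts, stuck_iz_limit):
        # impact_counts: the total impact count reported by each successive IZ solver iteration
        failed_unsticks, prev = 0, None
        for step, count in enumerate(impact_counts):
            if count == 0:
                return "converged at iteration %d" % step
            if count == prev:                     # impact count unchanged: the zone is "stuck"
                failed_unsticks += 1
                if failed_unsticks > stuck_iz_limit:
                    return "stuck IZ limit hit at iteration %d, breaking out early" % step
                # here the solver would jitter the offending particles by the "stuck IZ jitter" amount
            prev = count
        return "ran out of iterations"

    print(unstick_policy([9, 6, 4, 4, 4, 4, 4, 4], stuck_iz_limit=3))
    # stuck IZ limit hit at iteration 6, breaking out early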

  • VRAM Usage Type:

Greedy VRAM Usage: the CUDA solver will not clear VRAM after each time step, which allows it to run faster because then VRAM allocations do not need to be fully re-initialized each time step.

Conservative VRAM Usage: the CUDA solver will clear VRAM after each time step, which may slow down the overall simulation due to re-allocations being required each frame, but frees up the VRAM for other processes to use after the solver completes its task.

  • Limit repulsion VRAM: by default, repulsions are generated for every matching vert-face or edge-edge pair of two proximate triangles. On very high resolution cloth meshes, this can require a lot of VRAM. By enabling this setting, you can limit the repulsion solver to only return the nearest vert-face or edge-edge pair of proximate faces, which can greatly reduce VRAM usage during the repulsion phase of the CCCS, at the cost of some accuracy.

If you are getting CUDA allocation errors (usually displayed as error 700), try switching to “conservative VRAM usage” mode (then save the scene and restart 3ds Max, since a restart is required in order to re-initialize CUDA after a crash).

  • 2D solver: enables the 2D mode of the CUDA collision solver.

By default, all collisions will be processed in 3D. However, in many situations you may want them to be processed on a flat plane (example: flat splines colliding with each other). Simply creating the splines on a flat plane is not enough to guarantee proper 2D collisions because rounding errors along the “up” axis perpendicular to the desired plane (ex: the Z axis) can prevent all collisions from being detected properly. By enabling the 2D solver, you can ensure that all 2D collisions will be detected on a particular plane, because the solver will switch to a special 2D mode that completely ignores the specified up axis.

  • 2D Up Axis: the desired “up” axis of the 2D plane, which will be ignored by the solver. For example, specifying “Z” as the up axis means all input cloth data will be flattened on the X/Y plane.
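
As a simple illustration of what ignoring the up axis means (an assumed conceptual sketch, not the solver's actual code):

    # With "Z" as the up axis, positions are treated as if flattened onto the X/Y plane,
    # so tiny rounding errors along Z can no longer hide collisions.
    def flatten_for_2d_solver(position, up_axis="Z"):
        x, y, z = position
        if up_axis == "X":
            return (0.0, y, z)
        if up_axis == "Y":
            return (x, 0.0, z)
        return (x, y, 0.0)   # default: ignore Z

    print(flatten_for_2d_solver((1.0, 2.0, 0.0003)))   # (1.0, 2.0, 0.0) -> stray Z offset discarded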

Collision Compensation

During each simulation step, collisions are always processed after bindings. No solid geometry collisions are processed while the bind solver evaluates bindings each bind solver step. Because of this, it’s possible for the bind solver to pull particles straight through colliders, only for the subsequent collision step to fix those intersections afterwards. However, even though those intersections are eventually fixed, the rest of the bindings remain unaware that such a collision ever took place, and this can cause visual artifacts within the overall bind network. To compensate for this, particle masses can be artificially adjusted when collisions are detected on the previous simulation substep. Collided particles can be given a heavier mass, so that they won’t be pulled as forcefully by their connected particles. Once previously-collided particles are determined to have no more collisions, their mass will return to normal. The combination of these effects can help reduce visual artifacts in binding networks (like cloth).

  • Mass multiplier: The multiplier applied to the (inverse) mass of particles that collided on the previous simulation step. The smaller the value, the less influence surrounding bindings will have on a collided particle.

  • Interpolation: The interpolation speed used to transition particle masses between the collision compensation value and their original value. Keeping this value low can help prevent jittering artifacts caused by the masses of collided particles switching between the compensation value and their original value too quickly.
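
A rough sketch of how these two settings interact (an assumption about the general behavior; the function and its constants are placeholders):

    # Collided particles get a smaller inverse mass (i.e. behave heavier) so bindings pull on
    # them less; once they stop colliding, their inverse mass eases back to the original value.
    def compensated_inverse_mass(inv_mass, original_inv_mass, collided, mass_multiplier, interpolation):
        target = original_inv_mass * mass_multiplier if collided else original_inv_mass
        return inv_mass + (target - inv_mass) * interpolation   # low interpolation = slow, jitter-free transition

    inv_mass = 1.0
    for collided in [True, True, True, False, False, False]:
        inv_mass = compensated_inverse_mass(inv_mass, 1.0, collided, mass_multiplier=0.1, interpolation=0.25)
        print(round(inv_mass, 3))
    # eases down toward 0.1 while collisions persist, then eases back up toward 1.0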

Particle Sleeping

By enabling particle sleeping, you can force low-velocity particles to come to a standstill when they would otherwise keep moving over time. This can prevent unwanted motion in particles and forcibly bring jittering particles to rest.

  • Velocity thresh: particles whose velocity magnitude is below this threshold will be considered candidates for sleep.

  • Min duration: candidate particles whose velocity magnitude remains under the velocity threshold for this duration of time will be put to sleep.

  • Wake thresh: sleeping particles whose velocity exceeds this value at the end of a time step will be awoken.

  • Energy transfer: the amount of neighbor-particle energy that can contribute to waking a particle.

  • Relative to time step: Multiplies threshold velocities by the time step.

Because velocities are integrated each time step, wake/sleep thresholds may be too large by default if your time step is less than 1. For example, if your gravity strength is -1.0 and your sleep threshold is 0.5, particles will not fall asleep when your time step is 1 frame. However, if your time step is 1/2 frame, particles will fall asleep because at each substep their velocity is increased by 0.5 instead of 1.0 (which matches the sleep threshold). If “relative to time step” is enabled, the wake/sleep thresholds will be multiplied by the time step delta, and so in this example the effective threshold would actually be 0.25 (0.5 * 1/2) per step.
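
The worked example above translates to something like this (a minimal sketch of the scaling only; the values and variable names are illustrative):

    # Per-substep velocity change scales with the time step, so a fixed sleep threshold that
    # works at 1 frame per step can be too easy to dip under at smaller steps. "Relative to
    # time step" scales the threshold by the same factor to keep the comparison consistent.
    gravity_per_frame = 1.0
    sleep_thresh = 0.5
    for time_step in (1.0, 0.5):                       # 1 frame vs 1/2 frame per step
        dv = gravity_per_frame * time_step             # velocity gained each substep
        effective_thresh = sleep_thresh * time_step    # with "relative to time step" enabled
        print(time_step, dv, effective_thresh)
    # 1.0 1.0 0.5   -> dv stays above the threshold either way
    # 0.5 0.5 0.25  -> without scaling, dv (0.5) would match the raw threshold (0.5)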

Particle sleeping has no performance impact. Its impact is purely visual. Sleeping particles will still be evaluated by the solver – the difference is that if they are considered asleep at the end of a time step, they will be returned to their previous location (effectively rendering them motionless).