Generate Tab

Presets

The [Presets >>] button allows you to save/load tyDiffusion XML preset files, as well as export a ComfyUI workspace containing all of your generation settings. You can also restore default tyDiffusion settings from this menu.

tyDiffusion XML preset files saved in the [tyDiffusion root]\Presets folder will be displayed in the preset menu dropdown. Preset files saved elsewhere will need to be loaded manually.

Style

The style selector allows you to transparently apply style keywords to positive/negative prompts. Default styles are saved in the [tyDiffusion root]\Styles folder. Users should not modify the default styles file, as tyDiffusion may overwrite their changes. Users can instead add their own style presets by placing additional style CSV files in that folder, using the following internal table format: category,name,prompt,negative_prompt (one entry per line, with prompt values contained in double-quotes). For example, if a user creates a file named “user.csv” in the Styles folder with the following content:

category,name,prompt,negative_prompt
Custom Style,Test,"pink fluffy {prompt} riding a bicycle, highly detailed",""

…that would create a style called “Test” in the “Custom Style” category of the styles selector menu, which would transparently change a user prompt like “dog” to “pink fluffy dog riding a bicycle, highly detailed” (the {prompt} keyword in a style will be replaced by the user’s current prompt text during generation).
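
As a rough illustration of how this substitution works, the following Python sketch applies a style row from the example CSV above to a user prompt (apply_style is a hypothetical helper shown for clarity, not part of tyDiffusion’s API):

    import csv

    def apply_style(style_prompt, style_negative, user_prompt, user_negative):
        # Replace the {prompt} token with the user's prompt, or append the style if absent.
        if "{prompt}" in style_prompt:
            positive = style_prompt.replace("{prompt}", user_prompt)
        else:
            positive = ", ".join(p for p in (user_prompt, style_prompt) if p)
        negative = ", ".join(p for p in (user_negative, style_negative) if p)
        return positive, negative

    with open("user.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["name"] == "Test":
                print(apply_style(row["prompt"], row["negative_prompt"], "dog", ""))
                # -> ('pink fluffy dog riding a bicycle, highly detailed', '')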

When a style is selected, the selector menu will also have an option to bake the current style, which will swap out the user’s current prompt for the styled result. This allows you to see exactly how a prompt will be changed by a particular style.

Styles are just prompt modifiers. For more control over the visual style of a generated image/animation, you can also use LoRAs.

Prompt

The prompt is a natural-language description of the types of things you would like tyDiffusion to generate. Prompts can contain descriptions, qualifiers, names, etc.

Negative prompt

The negative prompt is a natural-language description of the types of things you would like tyDiffusion to avoid generating in the result.

Prompt adherence is dependent on the quality of a given diffusion model. Some models have better prompt adherence than others. Due to the nature of the Stable Diffusion process, there is no guarantee that a generated image or animation will correctly adhere to the prompts used to generate it. Prompts may guide results but don’t provide perfect control over them.

Some models/LoRAs (typically those with a recommended CFG value of 1) will not utilize the negative prompt during image/animation generation, so you may not see any changes in the generated result after changing the negative prompt.


Basic Settings Tab

  • Model: the diffusion model (checkpoint) that will be used by tyDiffusion to generate a result.

Press the [🗘] button next to the model dropdown to refresh the model list.

  • VAE: the VAE used to decode latent data into a final image (note: most models contain an embedded VAE and a custom VAE does not need to be specified).

  • Resolution: the specified resolution for resulting images.

SD 1.5 and SD XL models have fairly low resolution limits. If you want to generate images/animations with a resolution outside of those limits, use an upscaler in the Upscale tab.

Clicking the [🡆] button next to the resolution dropdown will apply the selected resolution to 3ds Max’s render settings, allowing the viewport safe-frame rectangle to match the aspect ratio of images generated by tyDiffusion. When the aspect ratios match, no cropping/clipping will occur when displaying the tyDiffusion result in the viewport.

  • Mode: controls whether results will be generated from prompts alone (text to image) or from a combination of prompts and image denoise functions (image to image).

Image to image mode can be used to modify a source image (typically whatever is visible in the viewport) - by adjusting the denoise setting, you can control how much influence the Stable Diffusion process has over the result. The higher the denoise value, the more influence the AI will have. A denoise value of 0 means the result will be identical to the input (source) image.
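
As a rough sketch of how the denoise value is commonly interpreted in image-to-image pipelines (an assumption about the backend; exact rounding may differ), only the final portion of the step schedule is actually sampled:

    def img2img_schedule(steps, denoise):
        # The source image's latent is noised partway and refined from there;
        # only the last round(steps * denoise) steps are sampled.
        sampled_steps = round(steps * denoise)
        start_step = steps - sampled_steps
        return start_step, sampled_steps

    print(img2img_schedule(20, 0.0))  # (20, 0)  -> nothing sampled, result == source image
    print(img2img_schedule(20, 0.5))  # (10, 10) -> moderate changes to the source image
    print(img2img_schedule(20, 1.0))  # (0, 20)  -> source image almost entirely replaced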

  • Source: the source for the image used by the “image to image” diffusion method.

  • Sampler/Scheduler: the specific algorithms used by the Stable Diffusion process to transform input noise into an output image, in latent space.

Different samplers/schedulers can produce different results (for the same prompt) and there’s not necessarily a right or wrong sampler to use for a given scenario (although some are more commonly used than others).

  • Conserve VRAM: when enabled, ComfyUI will use a tiled VAE encoder/decoder to convert data to/from latent space, which is slower but will use less VRAM than the default method.

  • Steps: the number of steps the Stable Diffusion process will take, to generate a resulting image/animation. Higher steps usually increase image clarity/quality, with diminishing returns beyond a certain point.

  • CFG scale: controls how much guidance prompts should have over the Stable Diffusion process. Higher values usually increase sharpness/contrast up to a certain point. Values beyond that point may “cook” images, making them look too saturated.
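
For intuition, the sketch below shows the standard classifier-free guidance combination (assuming tyDiffusion’s backend follows the usual formulation); note that at a CFG scale of 1 the unconditional (negative-prompt) term cancels out, which is why some models ignore the negative prompt entirely:

    import numpy as np

    def cfg_combine(noise_uncond, noise_cond, cfg_scale):
        # Push the denoising prediction away from the unconditional (negative-prompt)
        # estimate and toward the conditional (positive-prompt) estimate.
        return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

    uncond, cond = np.zeros(4), np.ones(4)
    print(cfg_combine(uncond, cond, 1.0))  # [1. 1. 1. 1.] -> identical to the conditional estimate
    print(cfg_combine(uncond, cond, 7.0))  # [7. 7. 7. 7.] -> prompt guidance amplified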

  • Denoise: the amount of influence the AI should have over the result of the Stable Diffusion process when using “image to image” mode.

  • Auto reseed: when enabled, the seed will be changed to a random value after each successful generation.

  • Seed: the random noise seed used to initialize noise in latent space, prior to diffusion. A unique seed typically guarantees a unique result, even if all other parameters are the same.
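
The sketch below illustrates why a fixed seed reproduces the same starting noise (it assumes the standard SD latent layout of 4 channels at 1/8 image resolution; other models may differ):

    import torch

    def init_latent_noise(seed, width, height, batch=1):
        # Deterministically initialize the latent noise tensor from the seed.
        gen = torch.Generator().manual_seed(seed)
        return torch.randn(batch, 4, height // 8, width // 8, generator=gen)

    a = init_latent_noise(42, 512, 512)
    b = init_latent_noise(42, 512, 512)
    print(torch.equal(a, b))  # True -> same seed, same starting noise, same result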

When you select a natively-supported diffusion model from the model list, a list of “recommended settings” for the model will appear at the bottom of the Basic Settings tab. Users are encouraged to use these recommendations as a guide for tweaking their generation parameters for that specific model. Clicking the [🡅] button next to the recommended settings list will automatically apply those recommended settings to the tyDiffusion UI.


ControlNets Tab

ControlNets provide users with more precise control over the overall layout, composition and style of generated images than simple prompts/LoRAs. Multiple ControlNets can be enabled at a time and used in combination with one another. tyDiffusion has a variety of hard-coded ControlNets and some general purpose ones as well:

  • Depth: the Depth ControlNet can take an input grayscale image representing the depth of pixels from the camera in z-space, and use it to constrain generated output.

The Depth ControlNet is great for matching the volume and shape of generated objects to those in the viewport.

  • Edges: the Edges ControlNet can take an input grayscale image of contours/outlines and use them to constrain generated output.

The Edges ControlNet is great for matching the shape of generated objects to those in the viewport.

  • IC-Light: the IC-Light ControlNet can take an input image and use it to re-light the generated output.

  • IP-Adapter: the IP-Adapter ControlNet can take input image(s) and use them to style the generated output.

The IP-Adapter ControlNet is great for performing style transfers between input images and generated output. When the IP-Adapter ControlNet is enabled at full strength, no text prompts may be necessary in order to generate outputs that closely match the style of IP-Adapter input images (in other words, when IP-Adapter is enabled, you can leave the text prompt boxes empty and still get meaningful outputs).

  • Pose: the Pose ControlNet can take an input image of a pose skeleton and use it to perform pose matching on humanoids/animals in generated output.

Use the “Biped to OpenPose” source modes in combination with OpenPose model(s) for the best pose accuracy when posing biped rigs in the scene.

  • QR Code Monster: the QR Code Monster ControlNet can take an input grayscale image and use it to separate and constrain background/foreground in generated output.

The QR Code Monster ControlNet is great for constraining the overall composition/layout of the result without affecting its style. It was originally designed to create readable QR Codes with Stable Diffusion, but can be used with any kind of grayscale image input.

  • Custom1/2/3: these ControlNets can use any available model to constrain the result with the specified input images.

You can use the Custom ControlNets to enable more than one instance of other ControlNets. For example, you could assign a QR Code Monster model to a Custom ControlNet, and use it in combination with the hard-coded QR Code Monster ControlNet, in order to use two different QR Code Monster ControlNets at the same time (perhaps with different weights or other parameters).


tyDiffusion ControlNets all share similar parameters, so a breakdown of all parameters in each individual ControlNet’s tab is not necessary - instead, only a single explanation of the shared parameters is included here.

  • Model: the ControlNet model that will be used by tyDiffusion to generate a result.

Press the [🗘] button next to the model dropdown to refresh the model list.

  • Source: the source of the input image that will be sent to the ControlNet model for processing.

  • Blur source: the amount of blur that will be applied to the source image, prior to processing by the ControlNet model.

  • Preprocessor: the preprocessor that will be applied to the source image, prior to processing by the ControlNet model.

Different ControlNets expect different types of inputs. For example, Depth ControlNets expect a grayscale depth buffer as input, while Pose ControlNets expect an image of a proper pose skeleton as input, etc. The hard-coded ControlNets each have different types of preprocessors which can be used to convert source images into the format expected by the ControlNet in question. Sometimes a preprocessor is not necessary - for example, if you’ve selected “Viewport depth” as the source image for a Depth ControlNet, a preprocessor is not required because the image is already a grayscale depth buffer. In those cases, you can specify “none” as the preprocessor.
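
As a rough illustration of the kind of conversion a preprocessor performs, the sketch below normalizes a raw z-buffer into the grayscale depth map a Depth ControlNet expects (the near-bright/far-dark convention and 8-bit output are assumptions; actual preprocessors may differ):

    import numpy as np

    def depth_to_grayscale(z_buffer):
        # Normalize arbitrary z-depth values into an 8-bit grayscale image,
        # with near pixels bright and far pixels dark.
        z = z_buffer.astype(np.float64)
        z = (z - z.min()) / max(z.max() - z.min(), 1e-8)
        return ((1.0 - z) * 255).astype(np.uint8)

    # Example: a tiny 2x2 depth buffer in scene units.
    print(depth_to_grayscale(np.array([[1.0, 2.0], [3.0, 4.0]])))
    # [[255 170]
    #  [ 85   0]]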

  • Weight: the amount of influence a ControlNet will have over the generated result.

  • Start: the place in the Stable Diffusion process where the ControlNet will begin to influence the result, relative to the total number of generation steps.

  • End: the place in the Stable Diffusion process where the ControlNet will stop influencing the result, relative to the total number of generation steps.

If you are generating an image using 20 steps and your ControlNet “start” value is set to 0.5, the ControlNet will only start influencing the result at step 10. Similarly, if you set the “end” value to 0.75, the ControlNet will stop influencing the result at step 15. In this way, you can use the start/end values to fine-tune the exact amount of influence ControlNets will have over every step of the generation process. Starting a ControlNet late will give the AI more control over the generated image’s composition. Ending a ControlNet early will give the AI more control over the generated image’s details.
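
The arithmetic behind the example above can be sketched as follows (the exact rounding used by the backend is an assumption):

    def controlnet_active_steps(steps, start, end):
        # Step indices (0-based) during which the ControlNet influences sampling.
        return list(range(int(steps * start), int(steps * end)))

    print(controlnet_active_steps(20, 0.5, 0.75))  # [10, 11, 12, 13, 14] -> active for steps 10 through 14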


LoRAs Tab

LoRAs can be used for various purposes during the image/animation generation process - some can be used to constrain output style, others to constrain levels of detail, etc. LoRAs are essentially just extra fine-tuned training data applied to a model.

When you select a LoRA from the [LoRA >>] menu, it will appear in the active LoRA list with a spinner to control its overall strength.

Press the [🗘] button next to the LoRA menu button to refresh the LoRA list.

  • Enable LoRAs: allows you to enable/disable all active LoRAs, without removing them from the active LoRA list.

Animation Tab

AnimateDiff Tab

AnimateDiff is a motion module for Stable Diffusion that can generate animations with temporal coherence. Currently, native support is only provided for SD 1.5 AnimateDiff modules. SD XL AnimateDiff models do exist, but they produce inferior results compared to SD 1.5 models.

AnimateDiff generates temporally-coherent frames in 16-frame contexts. In order to generate longer sequences, multiple contexts will be blended together using the specified context overlap value.
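
The following sketch shows how the number of contexts (and therefore processing time) grows with sequence length and context overlap (illustrative arithmetic; the actual context scheduler may differ slightly):

    import math

    CONTEXT_LENGTH = 16  # AnimateDiff's fixed context size

    def contexts_needed(total_frames, overlap):
        # How many overlapping 16-frame contexts are needed to cover the sequence.
        if total_frames <= CONTEXT_LENGTH:
            return 1
        stride = CONTEXT_LENGTH - overlap
        return 1 + math.ceil((total_frames - CONTEXT_LENGTH) / stride)

    print(contexts_needed(16, 4))  # 1
    print(contexts_needed(64, 4))  # 5
    print(contexts_needed(64, 8))  # 7 -> more overlap means more contexts, so longer processing times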

All of tyDiffusion’s upscalers are compatible with AnimateDiff - if you want to generate animations with higher-than-SD resolution, simply enable an upscaler (Hires fix is recommended) and it will automatically process all frames in output animations.

  • Model: specifies which motion module AnimateDiff should use.

AnimateDiff Motion Module v3 (and/or the TemporalDiff Motion Module) can produce more temporally smooth/coherent results than AnimateDiff Motion Module v2, but only AnimateDiff Motion Module v2 supports motion LoRAs. So there are tradeoffs to consider when picking a motion module to use with AnimateDiff and users are encouraged to experiment with all of them.

  • Context overlap: controls how many frames from each AnimateDiff context will be used to blend separate contexts together. Lower values result in more visible differences between contexts, but higher values will increase processing time because more contexts will need to be generated to provide enough overlap for the desired sequence length.

  • Noise: controls how noise in each context’s latent space will be initialized. Some noise modes result in more coherence between contexts, but users are encouraged to experiment with the various modes because they all have pros/cons and they don’t necessarily guarantee a certain type of result.

  • Beta schedule: controls the behavior of the noise reduction process during animation generation. Different modes can have different impacts on things like overall contrast and detail.

  • Output MP4: when enabled, animated sequences will be encoded into an MP4 file in the specified folder on disk.

  • Output PNG: when enabled, animated sequences will be exported as PNG files in the specified folder on disk.

  • Output preset: when enabled, the settings used to generate the animated sequence will be saved alongside the output MP4/PNG file(s) as a tyDiffusion XML preset file.

  • Base filename: the base filename used to save output MP4/PNGs.

Click the [?] button next to the filename textbox to see a list of supported filename symbols.

  • Version: the version number which can be added to the base filename with the version symbol.

  • Auto increment: when enabled, the version number will be incremented each time an animated sequence is generated.

  • Start frame: the start frame in the 3ds Max timeline to begin the animated sequence export.

  • End frame: when enabled, specifies the end frame in the 3ds Max timeline for the animated sequence. When disabled, animated sequences will be exactly 1 context length long (16 frames).

Sometimes it can be useful to generate shorter animated sequences while tweaking various settings, before exporting a final, full-length sequence. By toggling the “end frame” checkbox, you can quickly switch between short and full-length sequence export.

Due to the nature of the AnimateDiff algorithm, it is not possible to generate animated sequences fewer than 16 frames long.

  • Every nth frame: frames will be sampled from the 3ds Max timeline at a rate of every Nth frame, beginning at the specified start frame.

  • FPS: when enabled, allows users to control the FPS of output MP4 files. When disabled, the output FPS will be set to 3ds Max’s specified playback FPS.

  • Motion scale: a scale multiplier applied to the motion generated by the AnimateDiff module. Values too large or too small may generate many artifacts. A range of 0.8-1.2 is ideal.

  • Loop: when enabled, AnimateDiff will attempt to match the start frame with the end frame, creating a seamless loop.

  • Interpolation: controls whether frames will be interpolated in a post-process after AnimateDiff has generated them.

The FILM interpolator requires more VRAM but can often produce better/smoother results than the RIFE interpolator.

  • Interpolated frames: controls how many new frames to generate between original frames output by AnimateDiff.

The total number of frames output by the interpolator can be calculated with this formula: ((total frames - 1) * (1 + interpolated frames) + 1). For example, if AnimateDiff outputs 16 frames, and interpolated frames is set to 2, a total of 46 frames will be output by the interpolator ((16 - 1) * (1 + 2) + 1).
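
Expressed as a tiny helper (matching the formula above):

    def interpolated_total(source_frames, interpolated_frames):
        # Total number of frames output by the interpolator.
        return (source_frames - 1) * (1 + interpolated_frames) + 1

    print(interpolated_total(16, 2))  # 46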

Prompt Scheduler Tab

The prompt scheduler allows you to synchronize prompts with certain frames during the AnimateDiff generation process. By using the {scheduler} keyword, you can control where (within your main prompt) the scheduled prompts will appear. If the {scheduler} keyword is not included in your main prompt, scheduled prompts will be appended to the end of the main prompt (rather than inserted into it).
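
A minimal sketch of the insertion behavior described above (illustrative only; the exact delimiter used when appending is an assumption):

    def build_prompt(main_prompt, scheduled_prompt):
        # Insert the scheduled prompt at the {scheduler} keyword if present,
        # otherwise append it to the end of the main prompt.
        if "{scheduler}" in main_prompt:
            return main_prompt.replace("{scheduler}", scheduled_prompt)
        return f"{main_prompt}, {scheduled_prompt}"

    print(build_prompt("a dog {scheduler}, photorealistic", "running on a beach"))
    # a dog running on a beach, photorealistic
    print(build_prompt("a dog, photorealistic", "running on a beach"))
    # a dog, photorealistic, running on a beach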

By scheduling prompts you can (roughly) direct the result of the animation process, over time.


Upscale Tab

Hires Fix Tab

The Hires Fix upscaler performs an upscale function in latent space. This has the effect of establishing overall composition using the base sampler settings, while adding extra detail using the upscaler’s settings.

  • Upscaler: specifies which upscale model to use to scale the image in latent space.

  • Model mode: controls how prior nodes affecting the input diffusion model are piped into the upscale node within the ComfyUI node graph.

Simple pipe means the diffusion model will be piped directly into the upscale sampler. Partial pipe means the diffusion model will first be processed by any active LoRAs before being piped into the upscale sampler. Full pipe means all prior model-affecting nodes will be processed before the model is piped into the upscale sampler. By adjusting the model mode, you can achieve various effects in the upscaler. For example, by setting the model mode to “simple pipe”, you can bypass active LoRAs in the node graph and upscale an image without them - perhaps to prevent those LoRAs from influencing any detail enhancement that occurs.

  • Model: when enabled, allows you to override which diffusion model will be used in the upscale sampler.

  • Sampler: when enabled, allows you to override the sampler which will be used by the upscaler.

  • Scheduler: when enabled, allows you to override the scheduler which will be used by the upscaler.

  • Prompt: when enabled, allows you to specify a custom prompt that will only be used during upscale sampling.

  • Upscale factor: controls a resolution multiplier that will be used to determine the final upscaled resolution of an image.

  • Steps: when enabled, allows you to override the number of steps taken by the upscale sampler.

Upscale sampling usually requires fewer steps than the base diffusion sampler, so you can override the step count with a much lower value than used in the Basic Settings tab.

  • CFG scale: when enabled, allows you to override the CFG scale value used by the upscale sampler.

Overriding the CFG scale value in the upscaler is especially useful if you’re also overriding the diffusion model in the upscaler and that model requires a different CFG than the base diffusion model.

  • Denoise: controls how much influence the upscale sampler has over the content of the final upscaled image. A value of 0 means the upscale sampler will have no influence on the final image, and a value of 1 means the upscale sampler will have full influence over the final image. An ideal denoise value is usually between 0.4 and 0.6.

SD Upscale Tab

The SD Upscaler performs a simple upscale function on an image, with no further post-scale sampling in latent space.

  • Upscaler: specifies which upscale model to use to scale the image.

  • Upscale factor: controls a resolution multiplier that will be used to determine the final upscaled resolution of an image.

Ultimate SD Upscaler

The Ultimate SD Upscaler uses a tiled sampling function to upscale images. Unlike the Hires Fix upscaler (whose VRAM usage will increase if its upscale factor is increased), the Ultimate SD Upscaler is not limited by VRAM - increasing the upscale factor will only increase the time it takes to upscale an image, not the amount of VRAM used in the process. For this reason, it’s ideal for performing very large upscales (especially on GPUs with limited VRAM).
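
The reason VRAM stays flat while processing time grows can be sketched with some simple tile arithmetic (the 1024px tile size below is an illustrative default, not tyDiffusion’s actual setting):

    import math

    def tile_count(width, height, factor, tile_size=1024):
        # Number of fixed-size tiles sampled at the target resolution. VRAM is bounded
        # by the tile size, so larger factors only add tiles - i.e. time, not memory.
        out_w, out_h = round(width * factor), round(height * factor)
        return math.ceil(out_w / tile_size) * math.ceil(out_h / tile_size)

    print(tile_count(1024, 1024, 2))  # 4
    print(tile_count(1024, 1024, 4))  # 16 -> 4x the tiles, roughly 4x the time, same VRAM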

The majority of the Ultimate SD Upscaler’s settings are identical to those found in the Hires Fix upscaler, so explanations for the identical settings will not be listed here (instead, reference the Hires Fix documentation above for descriptions of each setting).

  • Fix seams: when enabled, the upscaler will perform extra passes along the seams between tiles in the upscaled image, to reduce tiling artifacts that come as a result of each tile being processed separately.

Detailers Tab

The detailers tab contains settings for adding detailers to an image, which allow you to mask and re-generate certain parts of an image at a higher resolution or with a different prompt.

  • Enable detailers: a global toggle for all detailers.

Detailer settings

  • Enabled: when enabled, the selected detailer will be used.

  • Mode: when painting mode is enabled, users can manually paint masks by clicking the “paint masks” button. When text mode is enabled, users can automatically generate masks based on text specified in the “search for” field.

Most detailer settings (model/VAE/sampler/etc) have the same effect as the corresponding settings in the Basic Settings tab. For brevity, descriptions of those settings will not be repeated here.

  • Search for: text specified here will be used by the segmentation model to generate masks for the input image. For example, entering “face” when the input image is a group of people will create masks around all detected faces within the group.

Mask parameters

  • Precision: a threshold value used to clamp output from the segmentation algorithm. Smaller values generate larger initial masks.

  • Erode: after the segmentation algorithm extracts a mask, it will be eroded (reduced in size) by this many pixels inward.

  • Dilate: after the erosion algorithm erodes a mask, it will be dilated (increased in size) by this many pixels outward.

By eroding and then dilating a mask, you can remove tiny artifacts while maintaining the overall size of the mask.
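
The sketch below demonstrates this erode-then-dilate cleanup using OpenCV (used here purely for illustration; it is not necessarily how tyDiffusion implements these operations):

    import cv2
    import numpy as np

    def clean_mask(mask, erode_px, dilate_px):
        # Erode first (removing specks smaller than erode_px), then dilate,
        # restoring the surviving regions to roughly their original size.
        kernel = np.ones((3, 3), np.uint8)
        mask = cv2.erode(mask, kernel, iterations=erode_px)
        return cv2.dilate(mask, kernel, iterations=dilate_px)

    mask = np.zeros((64, 64), np.uint8)
    mask[10:40, 10:40] = 255   # large mask region: survives the round trip
    mask[50, 50] = 255         # 1-pixel artifact: removed by the erosion pass
    cleaned = clean_mask(mask, erode_px=2, dilate_px=2)
    print(cleaned[50, 50], cleaned[25, 25])  # 0 255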

  • Blur: specifies the amount of blur to add to masked areas, prior to copying them back onto the original image. Larger values will reduce visible seams.

  • Pad: specifies the number of pixels to pad masks by, prior to extracting areas of the source image intersecting the masks.


  • Detailer prompt: when enabled, the specified prompt will be used to inpaint the masked areas. When disabled, the “search for” text will be used as the prompt.

External Tab

The external tab contains settings for external tools that utilize tyDiffusion.

tyDiffusionTexGen modifier

The tyDiffusionTexGen modifier can be used to generate projection textures on input geometry, and perform inpainting operations to combine multiple projections. Inpainting only occurs when two or more projectors are applied to a tyDiffusionTexGen modifier, as the inpainting process involves filling in textured areas on a mesh occluded by the position of prior projectors.

  • Expand mask: controls how many pixels to expand inpainting masks.

  • Blur mask: controls how much to blur inpainting masks.

The inpainting mask is a mask passed to tyDiffusion by a tyDiffusionTexGen modifier to control which areas of a mesh need to be filled in with textures, due to prior-projector occlusion. Increasing these values a modest amount will help to reduce the visibility of seams in resulting textures.

  • Use advanced inpainting model (XL only): when enabled, an advanced inpainting model (which converts XL diffusion models into XL inpainting models) will be used when inpainting with an XL diffusion model.

  • Enable inpainting overrides: when enabled, the tyDiffusionTexGen modifier will use the specified overrides to perform its inpainting functions when two or more projectors exist in its projector list.

Inpainting settings (model/VAE/sampler/etc) have the same effect as the corresponding settings in the Basic Settings tab. For brevity, descriptions of those settings will not be repeated here.


Misc Tab

  • ComfyUI source image mode: controls how source images will be sent to ComfyUI for processing. When “Embedded base64 strings” is selected, images will be baked into the JSON format sent to ComfyUI as Base64 strings and decoded in memory. When “Saved raw images” is selected, source images will be saved to disk and sent to ComfyUI as filenames. When “Automatic” is selected, source images will be saved to disk when animation is enabled, otherwise they will be baked as Base64 strings.

Base64 strings will greatly inflate the size of ComfyUI workflows exported from tyDiffusion, but they ensure that the workflows are fully portable - no image files other than the exported JSON file will be needed to load and run the workflow in ComfyUI.
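
A minimal sketch of why embedded images inflate the exported JSON (the “source_image” field name and filename are hypothetical; Base64 adds roughly 33% over the raw file size):

    import base64, json

    def embed_image(path):
        # Encode an image file as a Base64 string suitable for embedding in JSON.
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode("ascii")

    workflow = {"source_image": embed_image("viewport.png")}  # hypothetical field/filename
    print(len(json.dumps(workflow)))  # grows ~33% larger than the raw image file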

Some users have reported issues with Base64 string mode when exporting animations - it’s possible that the size of a JSON file inflated with a large number of Base64 strings can cause issues with ComfyUI’s API. For that reason, it’s recommended to set the source image mode to “Automatic” or “Saved raw images” when exporting animations.

  • Show live preview images: when enabled, tyDiffusion will display preview images in the viewport during generation.

  • Update frequency: controls how often to query ComfyUI for preview images during generation.

  • Save source images: when enabled, all source images required to generate an output from tyDiffusion will be saved to disk.

  • Save output images: when enabled, all images generated by tyDiffusion will be saved to disk.

  • Save ComfyUI workflow in PNG metadata: when enabled, all generated images that are saved to disk will include the corresponding ComfyUI workflow in their PNG metadata.

PNG files that contain ComfyUI metadata can be directly loaded into ComfyUI as a workflow.

  • Keep animation frames in VRAM: when enabled, all frames generated by tyDiffusion during the animation generation process will be loaded into VRAM for faster playback in the viewport.

  • Max frames to load from source videos: specifies the maximum number of frames to extract when processing source videos loaded into the tyDiffusion UI.


When a user modifies tyDiffusion settings in the UI, tyDiffusion will internally check for a variety of possible issues related to the current overall configuration of tyDiffusion parameters. If a problem is found, a small, yellow [⚠️] button will appear next to the generate button at the bottom of the UI. Clicking the [⚠️] button will display a popup message describing the potential problem(s) to the user.