Our Stable Video Diffusion Workflow

Input image and output video from SVD workflow.

You can use a variety of methods and workflows to create AI videos. One of them is the Stable Video Diffusion (SVD) model. The advantages of SVD are its speed and consistency; the disadvantages are the relatively few parameters that can be influenced during generation and the limit of 25 frames per generation.

Stable Video Diffusion Workflow for download.

Installation

While SVD can be used in various tools, the following descriptions refer to the use of SVD in ComfyUI.

Install ComfyUI

One of the most straightforward installation options for ComfyUI is Stability Matrix. We have already published an article on how to get started with Stability Matrix here. If you don't have Stability Matrix installed yet, simply follow that article and select ComfyUI as your package (instead of "Stable Diffusion WebUI"). We do not need to select a checkpoint model during installation.

If you already have Stability Matrix installed, you can also simply add ComfyUI by clicking the +Add Package button at the bottom of the "Packages" page.

Download SVD Model

The core of SVD is the model itself. There are several options to download the current SVD model (XT 1.1). We can either manually download it from the official model page on HuggingFace, which requires a HuggingFace account, or use other sources; for example, a version is also available on CivitAI.

If we use Stability Matrix, we can also find and download the model directly via the "Model Browser" page, so that it is automatically placed in the correct folder. We simply go to the "Model Browser" and type "SVD" in the search bar. If no search result appears, we need to make sure that "Model Type" is set to either "Checkpoint" or "All".

The result we are looking for is the "Stable Video Diffusion - SVD" model with the subtitle "img2vid-xt-1.1". If we click on the corresponding tile, the model page should come up, where we have to make sure to select "img2vid-xt-1.1" and within it "stableVideoDiffusion_img2vidXt11". Then we can click "Import".

If we download the model manually, we move it to the StabilityMatrix\Data\Models\StableDiffusion directory in a Stability Matrix setup or to the ComfyUI_windows_portable\ComfyUI\models\checkpoints directory in a normal ComfyUI installation.
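
If we want to double-check that the model ended up in a folder ComfyUI actually scans, a quick check along the following lines can help. This is only a minimal sketch: the base paths and the exact file name (here derived from the CivitAI listing above, extension assumed) must be adjusted to the actual installation.

```python
import os

# Minimal sketch: verify that the downloaded SVD checkpoint sits in a folder
# that ComfyUI scans. Base paths and file name are assumptions -- adjust
# them to your own installation.
candidate_dirs = [
    r"StabilityMatrix\Data\Models\StableDiffusion",          # Stability Matrix
    r"ComfyUI_windows_portable\ComfyUI\models\checkpoints",  # portable ComfyUI
]
model_file = "stableVideoDiffusion_img2vidXt11.safetensors"  # assumed file name

for folder in candidate_dirs:
    path = os.path.join(folder, model_file)
    print(f"{path}: {'found' if os.path.isfile(path) else 'missing'}")
```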

Install ComfyUI Manager

We can install all necessary extensions directly via Stability Matrix. However, it makes sense to start with the ComfyUI Manager, as it can detect all other necessary extensions.

For this, we go back to the "Packages" page and click on the puzzle icon within the ComfyUI tile. This should show a page with all "Available Extensions". In the search bar, we can simply type "Manager" and activate the checkbox behind "ComfyUI-Manager" by Dr.Lt.Data. An Install option should then appear at the bottom right; we click this button to install the ComfyUI Manager.

Once the installation is complete, we can start ComfyUI by clicking the Launch button.

SVD Workflow Download and Start

Once ComfyUI has been started via the Launch button, the terminal within Stability Matrix should open and all further steps should automatically run. When the startup processes are complete, the last line should read To see the GUI go to: http://127.... Either the default web browser should open automatically or we can click the Open Web UI button at the top to open ComfyUI in our browser.

We can now drag and drop the appropriate SVD workflow into the ComfyUI interface to open it. A special workflow for this article can be downloaded using the button below. A similar workflow can also be downloaded from the CivitAI article on SVD.

Install Missing Custom Nodes with ComfyUI Manager

If we drag the workflow as a JSON file into ComfyUI, a message will most likely appear stating that some "Custom Nodes" are missing and were not recognised. This is perfectly fine.

To install the missing nodes, we simply click on the Manager button in ComfyUI. This opens the "ComfyUI Manager Menu", in which we can now click on the "Install Missing Custom Nodes" button. The ComfyUI Manager should now recognise all missing nodes. We select the checkboxes of all nodes on the left side and then click Install on the right side of each node. Following this, the installation process will run. When the installation is complete, a Restart button will appear at the bottom.

Attention! If we restart ComfyUI from within the web interface, Stability Matrix does not recognise that ComfyUI is being restarted, and problems may occur in the Stability Matrix terminal. This is not a critical problem, but it is advisable to restart ComfyUI using the blue Restart button in Stability Matrix!

If after the restart the last line again reads To see the GUI go to: http://127..., we can simply refresh the ComfyUI page in the browser (e.g., with the shortcut ctrl/cmd + r). The workflow should now be displayed without an error message about missing Custom Nodes.

Using SVD

Prepare Input for SVD

It makes sense to first prepare our input image. If, for example, we want an output video that has a 16:9 aspect ratio, we should crop our input image accordingly to match the desired aspect ratio. It is generally helpful to use relatively high-resolution images as the input.
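
If we prefer to script this preparation step, a small Pillow sketch like the following can center-crop an image to a target aspect ratio. The file names are placeholders; this is just one way to do it outside of ComfyUI.

```python
from PIL import Image  # pip install pillow

def center_crop_to_aspect(image_path: str, out_path: str, aspect: float = 16 / 9) -> None:
    """Center-crop an image to the given aspect ratio (width / height)."""
    img = Image.open(image_path)
    w, h = img.size
    if w / h > aspect:
        # Image is too wide: trim the sides.
        new_w = int(h * aspect)
        left = (w - new_w) // 2
        img = img.crop((left, 0, left + new_w, h))
    else:
        # Image is too tall: trim top and bottom.
        new_h = int(w / aspect)
        top = (h - new_h) // 2
        img = img.crop((0, top, w, top + new_h))
    img.save(out_path)

center_crop_to_aspect("input.png", "input_16x9.png")
```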

Settings and Nodes

In the next step, we can turn to the workflow's options and settings.

Load Image Node

The workflow starts with the Load Image node, in which we select our input image.

Image Only Checkpoint Loader (img2vid model)

In this node, we only need to select the SVD XT 1.1 model. If we click on the name of the model, a dropdown menu with all available models should appear. If we do not find the SVD model here, we should check again that the model was installed as described above and is located in the correct folder.

VideoLinearCFGGuidance

This node is "collapsed", i.e., minimized, and simply sets the CFG scale to 1. We can see the contents of the node by right-clicking on it and selecting Collapse. In the same way, we can minimize the node again.

SVD_img2vid_Conditioning

In this node, there are some important points. First, the output size is set in pixels. It should be noted that SVD works best with resolutions where the long side is 1024 pixels. For 16:9, for example, this would be a resolution of 1024 x 576 pixels. If the aspect ratio here does not match the input material, the image will automatically be cropped accordingly.
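
To derive a matching resolution for other aspect ratios, a small helper like this sketch can be used. Rounding to multiples of 8 is our own precaution for diffusion-model-friendly dimensions, not a documented SVD requirement.

```python
def svd_resolution(aspect_w: int, aspect_h: int, long_side: int = 1024) -> tuple[int, int]:
    """Scale an aspect ratio so its long side is `long_side`,
    rounded to multiples of 8."""
    if aspect_w >= aspect_h:
        w, h = long_side, long_side * aspect_h / aspect_w
    else:
        w, h = long_side * aspect_w / aspect_h, long_side
    return int(round(w / 8) * 8), int(round(h / 8) * 8)

print(svd_resolution(16, 9))  # (1024, 576)
print(svd_resolution(9, 16))  # (576, 1024)
print(svd_resolution(4, 3))   # (1024, 768)
```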

Next, we have the parameter video_frames. Currently, SVD can only generate a maximum of 25 frames natively. The most important parameter overall is motion_bucket_id, with a default value of 128. The higher the value, the more movement appears in the output video; the lower the value, the less. Note that each input image responds differently: some images require high values, such as 128, while others work best with values like 4. It is therefore worth testing several values to find out which produces the desired result (as in the sketch below).
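
For systematic testing, ComfyUI's HTTP API can queue one generation per value. The following is a minimal sketch, assuming ComfyUI runs on the default address, the workflow was exported via "Save (API Format)", and "12" is the id of the SVD_img2vid_Conditioning node in that export. Both the file name and the node id are assumptions; check your own export.

```python
import copy
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188/prompt"
CONDITIONING_NODE_ID = "12"  # hypothetical node id -- look it up in your export

# Workflow exported via "Save (API Format)" in ComfyUI's dev mode.
with open("svd_workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

for bucket in (4, 32, 64, 128):
    wf = copy.deepcopy(workflow)
    wf[CONDITIONING_NODE_ID]["inputs"]["motion_bucket_id"] = bucket
    payload = json.dumps({"prompt": wf}).encode("utf-8")
    req = urllib.request.Request(COMFYUI_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)  # queues one generation per value
```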

Lastly, there are the values fps and augmentation_level. FPS stands for Frames Per Second and sets the frame rate. However, this only matters if we want to render the final video directly as a GIF or MP4, for example. It makes sense to leave this value at 6 and make final frame-rate adjustments later in external programs. In the current SVD model, the augmentation_level value must also remain at 0; in older versions of the model, it had to be adjusted, but since XT 1.1 this is no longer necessary.

KSampler

The KSampler node is where the frames are rendered. Accordingly, we can influence some parameters that steer the generation, although the default values can be kept in most cases.

The seed determines the initial noise from which the images are generated. If we keep all settings the same and start the generation with an identical seed, the same output is generated each time. Conversely, this means that if we are already quite close to the desired result with a certain motion bucket value, it can be helpful to run the workflow again: as long as the seed is not identical, a new animation will result.

Whether a new seed is automatically generated is determined by control_after_generate. If the value is set to randomize, a new seed is generated after each run; alternatively, we can set the value to fixed to keep the same seed.

With more steps, we can theoretically increase the quality of our frames, but in our tests the difference between 25 and 50 steps was negligible. Especially if you plan on upscaling later, any improvement from more steps will most likely be rendered irrelevant.

The cfg parameter, or CFG scale, does seem to have some impact. Values of 7 and higher break the images, and values of 1 and lower make the animation rather obscure. In general, higher values seem to improve consistency slightly, while lower values seem to produce more extreme movements.

In our tests, the default settings for sampler_name, scheduler, and denoise already proved best.

VAE Decode

The VAE Decode node simply finalises our generation. We do not need to change anything here.

FILM VFI

The FILM VFI node allows us to perform frame interpolation in ComfyUI. Frame interpolation takes two consecutive images within a video and estimates what an image between them might look like. This allows us to artificially extend a video. FILM stands for "Frame Interpolation for Large Motion" and is an algorithm developed by Google.

As ckpt_name (i.e., checkpoint model), we should use film_net_fp32.pt for our case. The model should be automatically downloaded with the node. clear_cache_after_n_frames defines, as the name suggests, after how many individual frames the cache is cleared. For computers with less RAM, it may be helpful to lower the value.

The multiplier parameter determines how many frames are added. A multiplier of 4, for example, nominally turns the 25 input images into 100 output images (in practice 97, since only the 24 gaps between consecutive frames are interpolated; see the sketch below). Especially with slow movements, multipliers of 4 to 8 can create realistic videos. If the video has very fast or strong movements, however, frame interpolation with lower values (e.g. 2) looks more realistic, or it may even make sense not to perform any frame interpolation at all.
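
The arithmetic behind this is simple: each of the gaps between consecutive frames receives the interpolated frames, so the output count is (n - 1) x multiplier + 1. A quick sanity check:

```python
def interpolated_frame_count(input_frames: int, multiplier: int) -> int:
    """Only the gaps between consecutive frames are interpolated,
    so the output is (n - 1) * multiplier + 1 frames, not n * multiplier."""
    return (input_frames - 1) * multiplier + 1

print(interpolated_frame_count(25, 4))  # 97
print(interpolated_frame_count(25, 8))  # 193
```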

By default, the FILM VFI node is set to Bypass in the example workflow (recognisable by the purple overlay), which means it is ignored. This is because it often takes several attempts to find the right settings, and performing frame interpolation each time, even if the result does not have the desired look, unnecessarily slows down the process; it therefore makes sense to perform the frame interpolation separately. If we want to undo the Bypass, we can right-click on the node and click Bypass again. In the same way, we can also turn off the Save Image node if we do not want to save all individual frames.

Load Image (Path)

The Load Image (Path) node is not connected by default, as we can perform the frame interpolation afterwards. The process is described in detail below.

Save Image

The Save Image node saves all individual frames of the animation. Here we only need to specify the path where we want to save our images. We can either specify an absolute path, such as C:\Users\Username\Desktop, or a relative path like FolderName/FileName; in a Stability Matrix installation, the files are then saved under StabilityMatrix-win-x64\Data\Packages\ComfyUI\output\FolderName\FileName. It is generally sensible to create a new folder for each generation so that the corresponding frames of the animation are always saved together. This makes it easier to interpolate the frames later or put them back together into an MP4 file, for example (see the sketch below). It is sufficient to specify FolderName/FileName as the output path even if the folder does not yet exist; the folder should be created automatically.
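
As an example of reassembling saved frames outside ComfyUI, the following sketch calls ffmpeg (which must be installed separately). The frame-numbering pattern is an assumption and must match the actual file names produced by the Save Image node.

```python
import subprocess

# Minimal sketch: stitch numbered PNG frames back into an MP4 with ffmpeg.
# Adjust the input pattern to your actual frame file names.
subprocess.run([
    "ffmpeg",
    "-framerate", "6",                        # SVD's native 6 fps; raise after interpolation
    "-i", "FolderName/FileName_%05d.png",     # assumed numbering pattern
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",
    "-crf", "19",
    "output.mp4",
], check=True)
```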

Video Combine

The Video Combine node combines the individual frames into a video. We can either use the node as a preview or save its output as the final result. frame_rate sets the frame rate; for a preview it is less important, but if we want to save the video directly from here, standard values are 24, 30, and 60. loop_count determines how often the video is looped: 0 is an endless loop and 1 is no loop. In filename_prefix, we can add a prefix to the file name when saving. In format, we can specify the output format; video/h264-mp4 with yuv420p as the pixel format and a crf of 19 are standard values for MP4 videos. If we turn on save_metadata, a PNG that includes the entire workflow will be saved in addition to the video. This is quite helpful, as we can recreate the workflow of a particular video by simply dragging and dropping the PNG into ComfyUI. The individual images saved via the Save Image node also include the workflow.

The pingpong value creates a loop that plays the video forward and then backward. With save_output, we can automatically save the video. If we set save_output to false, we can still save the video by right-clicking on the video preview and selecting Save preview.

Start Workflow

We start the workflow and the generation by pressing the Queue Prompt button in the ComfyUI menu.

Separate Frame Interpolation

We can use the Load Image (Path) node to perform the frame interpolation process after the initial generation. This assumes that we already have a separate folder where all the frames of the animation are saved. So, if we run the workflow once and have the Save Image node turned on, which saves the individual frames, we can select the output folder as the new input folder here.

We simply click on choose folder to upload and navigate to the corresponding folder. Alternatively, we can click on the folder name in the node and choose the folder that way.

Now we need to connect the small blue dot IMAGES of the Load Image (Path) node with the small blue dot frames in the FILM VFI node. This does not rerun the entire workflow but only the part from the Load Image (Path) node onwards. Now we can adjust the values of the other nodes as we see fit and based on the previous descriptions.

At the end, we should make sure that we specify a new folder in the Save Image node so that the new frames are saved together in a separate folder.

Now we can start the workflow and the generation again by pressing the Queue Prompt button in the ComfyUI menu.

Upscaling the Video

In this article, we describe how we batch-upscale the videos from SVD. Admittedly, the upscaling part of the workflow is by far the most time-consuming one, but it is worth the wait from our perspective.

Conclusion

We love the SVD XT 1.1 model and this workflow for their high consistency and speed. It is really easy and fast to get great results. At times it can be annoying that there are not more ways to control the output, but generally speaking, this workflow currently works much better than other, more complex animation workflows.