Our Stable Video Diffusion Workflow
You can use a variety of methods and workflows to create AI videos. One of these is the Stable Video Diffusion (SVD) model. The advantages of SVD are its speed and consistency; the disadvantages are the relatively few parameters that can be influenced during generation and the limit of 25 frames per generation.

Installation
While SVD can be used in various tools, the following descriptions refer to the use of SVD in ComfyUI.
Install ComfyUI
One of the most straightforward installation options for ComfyUI is Stability Matrix. We have already published an article on how to get started with Stability Matrix here. If you don't have Stability Matrix installed yet, simply follow that article and select ComfyUI as your package (instead of "Stable Diffusion WebUI"). We do not need to select a checkpoint model during installation.
If you already have Stability Matrix installed, you can also simply add ComfyUI by clicking the +Add Package button at the bottom of the "Packages" page.
Download SVD Model
The core of SVD is the model itself. There are several options to download the current SVD model (XT 1.1). We can either manually download the model from the official model page on HuggingFace, which requires a HuggingFace account, or use other sources; for example, a version is also available on CivitAI.
If we use Stability Matrix, we can also find and download the model directly via the "Model Browser" page, so that it is automatically placed in the correct folder. We simply go to the "Model Browser" and type "SVD" in the search bar. If no search result appears, we need to make sure that "Model Type" is set to either "Checkpoint" or "All".
The result we are looking for is the "Stable Video Diffusion - SVD" model with the subtitle "img2vid-xt-1.1". If we click on the corresponding tile, the model page should come up, where we have to make sure to select "img2vid-xt-1.1" and within it "stableVideoDiffusion_img2vidXt11". Then we can click "Import".
If we download the model manually, we can move it to the StabilityMatrix\Data\Models\StableDiffusion directory in a Stability Matrix installation or to the ComfyUI_windows_portable\ComfyUI\models\checkpoints directory in a normal ComfyUI installation.
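If we prefer a scripted download, a minimal sketch using the huggingface_hub Python package could look like the following. The repository and file names are our assumptions based on the official model page and may change; the gated repository also requires being logged in to HuggingFace.

```python
# Minimal sketch: download the SVD XT 1.1 checkpoint from HuggingFace.
# Requires `pip install huggingface_hub` and a logged-in account
# (`huggingface-cli login`), as the repository is gated. The repo and
# file names are assumptions based on the official model page.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="stabilityai/stable-video-diffusion-img2vid-xt-1-1",
    filename="svd_xt_1_1.safetensors",
    local_dir=r"StabilityMatrix\Data\Models\StableDiffusion",
)
print(f"Model saved to: {path}")
```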
Install ComfyUI Manager
We can install all necessary extensions directly via Stability Matrix. However, it makes sense to start with the ComfyUI Manager, as it can detect all other necessary extensions.
For this, we go back to the "Packages" page and click on the puzzle icon within the ComfyUI tile. This should show a page with all "Available Extensions". In the search bar, we can simply type "Manager" and activate the checkbox behind the "ComfyUI-Manager" by Dr.Lt.Data. An "Install" option should now appear at the bottom right; we click this button to install the ComfyUI Manager.
Once the installation is complete, we can start ComfyUI by clicking the Launch button.
SVD Workflow Download and Start
Once ComfyUI has been started via the Launch button, the terminal within Stability Matrix should open and all further steps should run automatically. When the startup processes are complete, the last line should read "To see the GUI go to: http://127...". Either the default web browser opens automatically, or we can click the Open Web UI button at the top to open ComfyUI in our browser.
We can now drag and drop the appropriate SVD workflow into the ComfyUI interface to open it. A special workflow for this article can be downloaded using the button below. A similar workflow can also be downloaded from the CivitAI article on SVD.
Install Missing Custom Nodes with ComfyUI Manager
If we drag the workflow as a JSON file into ComfyUI, it is very likely that a message will appear stating that there are missing "Custom Nodes" that were not recognised. This is perfectly fine.
To install the missing nodes, we simply click on the Manager button in ComfyUI. This opens the "ComfyUI Manager Menu", in which we can now click on the "Install Missing Custom Nodes" button. The ComfyUI Manager should now recognise all missing nodes. We select the checkboxes of all nodes on the left side and then click Install on the right side of each node. The installation process will then run, and when it is complete, a Restart button will appear at the bottom.
Attention! If we restart ComfyUI within the web interface, Stability Matrix does not recognise that ComfyUI is being restarted, and there may be problems in the terminal in Stability Matrix. This is not a critical problem, but it is advisable to restart ComfyUI using the blue Restart button in Stability Matrix!
If after the restart the last line again reads "To see the GUI go to: http://127...", we can refresh ComfyUI in the browser by simply refreshing the page (e.g., using the shortcut Ctrl/Cmd + R). Now the workflow should be displayed without an error message about missing Custom Nodes.
Using SVD
Prepare Input for SVD
It makes sense to first prepare our input image. If, for example, we want an output video that has a 16:9 aspect ratio, we should crop our input image accordingly to match the desired aspect ratio. It is generally helpful to use relatively high-resolution images as the input.
Settings and Nodes
In the next step, we can devote ourselves to the corresponding options and settings of the workflow.
Load Image Node
The workflow starts with the Load Image node, in which we select our input image.
Image Only Checkpoint Loader (img2vid model)
In this node, we only need to select the SVD XT 1.1 model. If we click on the name of the model, a dropdown menu with all available models should appear. If we do not find the SVD model here, we should check again that the installation of the model took place as described above and that the model is in the correct folder.
VideoLinearCFGGuidance
This node is "collapsed", i.e., minimised, and simply sets the CFG scale to 1. We can see the contents of the node by right-clicking on it and selecting Collapse. In the same way, we can minimise the node again.
SVD_img2vid_Conditioning
In this node, there are some important points. First, the output size is set in pixels. It should be noted that SVD works best with resolutions where the long side is 1024 pixels. For 16:9, for example, this would be a resolution of 1024 x 576 pixels. If the aspect ratio here does not match the input material, the image will automatically be cropped accordingly.
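As a quick sanity check for other aspect ratios, the output size can be computed as in the sketch below. Snapping the short side to a multiple of 8 is our assumption, based on the usual latent grid of Stable Diffusion models; for 16:9 this yields exactly 1024 x 576.

```python
# Sketch: compute an SVD output resolution with a long side of 1024.
# Snapping to multiples of 8 (the latent downscale factor) is an
# assumption; for 16:9 this gives the recommended 1024 x 576.
def svd_resolution(aspect_w: int, aspect_h: int, long_side: int = 1024):
    short = round(long_side * min(aspect_w, aspect_h) / max(aspect_w, aspect_h))
    short = (short // 8) * 8  # snap to the latent grid
    return (long_side, short) if aspect_w >= aspect_h else (short, long_side)

print(svd_resolution(16, 9))  # (1024, 576)
print(svd_resolution(9, 16))  # (576, 1024)
```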
Next, we have the parameter video_frames. Currently, SVD is only able to natively generate a maximum of 25 frames.
The most important parameter overall is motion_bucket_id. The default value is 128. The higher the value, the more movement is interpreted into the image; the lower the value, the less movement the output video has. It should be noted that each input image is interpreted differently: some images require high values, such as 128, while other images work best with values like 4. Accordingly, it is worth testing various values to find out which produce the desired result.
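To test several values without clicking through the interface each time, the workflow can also be queued programmatically via ComfyUI's HTTP API. The sketch below assumes the workflow has been exported via "Save (API Format)" as workflow_api.json and that we look up the id of the SVD_img2vid_Conditioning node in that file; the id "12" is purely a placeholder.

```python
# Sketch: sweep motion_bucket_id values via ComfyUI's HTTP API.
# Assumes ComfyUI is running at the default address and the workflow
# was exported with "Save (API Format)". The node id "12" is a
# placeholder; look up the real id in your own workflow_api.json.
import json
import urllib.request

with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

CONDITIONING_NODE_ID = "12"  # hypothetical id, check your JSON

for bucket in (4, 32, 64, 128):
    workflow[CONDITIONING_NODE_ID]["inputs"]["motion_bucket_id"] = bucket
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        print(bucket, response.read().decode("utf-8"))
```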
Lastly, there are the values fps and augmentation_level. FPS stands for Frames Per Second and sets the frame rate. However, this is initially only relevant if we want to render the final video directly as a GIF or MP4, for example. It makes sense to leave this value at 6 and make final adjustments to the frame rate in external programs later. In the current SVD model, moreover, the augmentation_level value must remain at 0. In older versions of the model, this value had to be adjusted, but this has been superseded since model XT 1.1.
KSampler
In the KSampler node, the images are rendered. Accordingly, we can adjust some parameters that direct the generation; however, the default values can be adopted in most cases.
The seed stands for the initial noise based on which the images are generated. If we keep all settings the same and start the generation with the identical seed, the same output is generated each time. Conversely, this means that if we are already quite close to what we want to achieve with a certain motion bucket value, it can be helpful to let the workflow run another time; as long as the seed is not identical, a new animation will result.
Whether a new seed is automatically generated is determined by control_after_generate. If the value is set to randomize, a new seed is automatically generated each time. Alternatively, we can set the value to fixed to keep the same seed.
With more steps, we can theoretically increase the quality of our frames, but in our tests, the difference between 25 and 50 steps was negligible; especially if you plan on upscaling later, any improvements due to more steps will most likely be rendered irrelevant.
The cfg parameter, or CFG scale, does seem to have some impact. Values of 7 and higher break the images, and values of 1 and lower make the animation rather obscure. However, higher values generally seem to improve consistency slightly, while lower values seem to produce more extreme movements.
The sampler_name, scheduler, and denoise values already seem best at the default settings in our tests.
VAE Decode
The VAE Decode node simply finalises our generation. We do not need to change anything here.
FILM VFI
The FILM VFI node allows us to perform frame interpolation in ComfyUI. Frame interpolation takes two images within a video and calculates or estimates what an image between these two images might look like. This allows us to artificially extend a video. FILM stands for "Frame Interpolation for Large Motion" and is an algorithm that was developed by Google.
As ckpt_name (i.e., the checkpoint model), we should use film_net_fp32.pt for our case. The model should be downloaded automatically with the node. clear_cache_after_n_frames defines, as the name suggests, after how many individual frames the cache is cleared. For computers with less RAM, it may be helpful to lower the value.
The multiplier parameter determines how many frames are added. A multiplier of 4, for example, nominally generates 100 output images from the 25 input images (in practice it is 97 images, since new frames are only inserted between consecutive input frames: (25 - 1) × 4 + 1 = 97). Especially with slow movements, multipliers of 4 to 8 can create realistic videos. If the video has very fast or strong movements, however, frame interpolations with lower values (e.g., 2) look more realistic, or it may even make sense not to perform any frame interpolation.
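The frame counts for other multipliers follow from the same arithmetic; a small sketch, assuming frames are only inserted between consecutive input frames:

```python
# Sketch: expected output frame count after interpolation, assuming
# new frames are only inserted between consecutive input frames.
def interpolated_frames(input_frames: int, multiplier: int) -> int:
    return (input_frames - 1) * multiplier + 1

for m in (2, 4, 8):
    print(m, interpolated_frames(25, m))  # 2 -> 49, 4 -> 97, 8 -> 193
```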
In general, the FILM VFI node is set to Bypass in the example workflow (recognisable by the purple overlay). This means that it is ignored by default. This is because it often takes several attempts to find the right settings; performing frame interpolation each time, even if the result does not have the desired look, unnecessarily slows down the process, so it makes sense to perform the frame interpolation separately. If we want to undo the Bypass, we can right-click on the node and click on Bypass again. This way, we can also turn off the Save Image node if we do not want to save all individual frames.
Load Image (Path)
The Load Image (Path) node is not connected by default, as we can perform the frame interpolation afterwards. The process is described in detail below.
Save Image
The Save Image node saves all individual frames of the animation. Here we only need to specify the path where we want to save our images. We can either specify an absolute path, such as C:\Users\Username\Desktop, or a relative path like FolderName/FileName. In a Stability Matrix installation, the files are then saved under StabilityMatrix-win-x64\Data\Packages\ComfyUI\output\FolderName\FileName. It is generally sensible to create a new folder for each generation so that the corresponding frames of the animation are always saved together. This makes it easier to interpolate the frames later or put them back together into an MP4 file, for example. It is sufficient to specify FolderName/FileName as the output path, even if the folder does not yet exist; the folder should be created automatically.
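One simple convention, sketched below purely as an illustration, is to derive the folder name from a timestamp so that every generation automatically gets its own folder:

```python
# Illustrative sketch: build a unique relative output path for the
# Save Image node, e.g. "svd_2024-05-01_14-30-00/frame".
import time

output_path = time.strftime("svd_%Y-%m-%d_%H-%M-%S") + "/frame"
print(output_path)
```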
Video Combine
The Video Combine node combines the individual frames into a video. We can either use the node as a preview or save its output as the final result.
frame_rate sets the frame rate. For a preview, the frame rate is less important. If we want to save the video directly from here, standard values are 24, 30, and 60.
loop_count determines how often the video is looped: 0 is an endless loop and 1 is no loop.
In filename_prefix, we can add a prefix to the file name when saving.
In format, we can specify the output format. The combination of video/h264-mp4, yuv420p, and 19 as crf are standard values for MP4 videos.
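If we prefer to assemble the saved frames into a video outside of ComfyUI, the same settings map directly onto an ffmpeg call. The sketch below assumes ffmpeg is installed and on the PATH; the frame-numbering pattern is also an assumption, so check how your Save Image node actually names the files and adjust the -i argument accordingly.

```python
# Sketch: combine saved frames into an MP4 with ffmpeg, mirroring the
# Video Combine defaults (h264, yuv420p, CRF 19). Assumes ffmpeg is on
# the PATH; the frame-name pattern "%05d" is an assumption -- check
# your actual file names and adjust "-i".
import subprocess

subprocess.run([
    "ffmpeg",
    "-framerate", "24",                     # output frame rate
    "-i", "FolderName/FileName_%05d_.png",  # assumed numbering pattern
    "-c:v", "libx264",                      # h264 codec
    "-pix_fmt", "yuv420p",
    "-crf", "19",
    "result.mp4",
], check=True)
```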
If we turn on save_metadata, a PNG will be saved in addition to the video, which includes the entire workflow. This is quite helpful, as we can recreate the workflow of a particular video by simply dragging and dropping the PNG into ComfyUI. The individual images saved via the Save Image node also include the workflow.
The pingpong value creates a loop that plays the video forward and then backward.
With save_output, we can automatically save the video. If we set save_output to false, we can still save the video by right-clicking on the video preview and selecting Save preview.
Start Workflow
We start the workflow and the generation by pressing the Queue Prompt button in the ComfyUI menu.
Separate Frame Interpolation
We can use the Load Image (Path) node to perform the frame interpolation process after the initial generation. This assumes that we already have a separate folder where all the frames of the animation are saved. So, if we run the workflow once with the Save Image node turned on, which saves the individual frames, we can select the output folder as the new input folder here.
We simply click on choose folder to upload and navigate to the corresponding folder. Alternatively, we can click on the folder name in the node and choose the folder that way.
Now we need to connect the small blue dot IMAGES of the Load Image (Path) node with the small blue dot frames in the FILM VFI node. This does not rerun the entire workflow but only the part from the Load Image (Path) node onwards. Now we can adjust the values of the other nodes as we see fit and based on the previous descriptions.
At the end, we should make sure that we specify a new folder in the Save Image node so that the new frames are saved together in a separate folder.
Now we can start the workflow and the generation again by pressing the Queue Prompt button in the ComfyUI menu.
Upscaling the Video
In this article, we describe how we batch-upscale the videos from SVD. Admittedly, the upscaling part of the workflow is by far the most time-consuming one, but it is worth the wait from our perspective.
Conclusion
We love the SVD XT 1.1 model and this workflow for their high consistency and speed. It's really easy and fast to get great results. Sometimes it can be annoying that there are so few ways to control the output, but generally speaking, it works much better than other, more complex animation workflows as of now.