In this tutorial, I learned how to generate a video using three frames on LTX 2.3.
For the tutorial, I generated a 15-second 1280x704 video in Cloud.
The video includes sounds of moving objects that match the actions on screen, as well as a melody. I couldn't remove the melody; it is different every time depending on the generation number, and sometimes there is no melody at all. I also couldn't upload my own background music for the video. The sounds of the girl walking and the door opening are correct.
The tutorial author has a prompt for 15 seconds of video generation using three frames:
She turns her head and gets up from her chair.
The girl leaves the room. She walks to the door, reaches out, and opens it in a smooth motion. She continues forward, into the outside environment with a natural transition of lighting.
She walks to a parked superbike and stops next to it, holding her helmet. Continuous frame-by-frame motion, no cuts, no abrupt transitions, smooth temporal coherence, stable identity, consistent clothing, natural pace, realistic movement, 24 frames/s.
But with this prompt the AI often misunderstands the instructions and produces various errors. They occur randomly, which can make the result unusable.
I decided to create the following scenario for the video:
The girl stood up, pushed back her chair, and walked out of the room to the right (from the video perspective).
Next angle: The door was opened by the girl from the inside. The girl exited, closed the door, and walked toward the motorcycle.
The camera follows the girl's movement to the motorcycle. The girl approaches the motorcycle, turns to the camera, grabs her helmet from the front of the motorcycle with both hands, and moves the helmet to the side, holding it with one hand.
From the moment she leaves the house, the camera shouldn't change angles, but follow the girl the entire time.
But the generations contain many errors.
The girl exits incorrectly, gets up from her chair incorrectly, turns around incorrectly, appears in the wrong place near the door, approaches the door from the other side, slips through the door, crouches and crawls to the door, or walks away from the motorcycle.
Sometimes two girls appear: one approaches the door while the other exits, or one leaves while the other exits at the same time.
The camera doesn't follow the girl as she walks toward the motorcycle. She grabs her helmet incorrectly and abruptly switches from a rear view to a front view, and the helmet appears out of nowhere.
There were glitches where the girl turned around without getting up from her chair, looked in different directions, and only then got up; or she left the chair, walked back, and a wall appeared. When she got up from the chair, she would sometimes turn the wrong way and walk in the wrong direction.
The girl would walk left, in the opposite direction, or take a long time to exit the house, and walls would begin to appear in the room, forming a new interior that wasn't in the prompt. She also tried to walk toward a wall where there was no door.
There were also glitches with the helmet at first: it would randomly appear in the girl's hand as she walked away from the door, or when she was close to the motorcycle.
Sometimes the girl's pants would change to a skirt.
A watch would also appear on the girl's wrist. I didn't notice the cause while I was generating the video: in the second frame the girl isn't wearing a watch, but in the third a steel bracelet is visible on her wrist, and it should have been removed from the image. As a result, the AI sometimes tries to add it and sometimes to remove it.
The errors in the video are inconsistent: sometimes it generates correctly, and other times there are glitches. But once I clearly described how the girl approaches the motorcycle, how the camera follows her, how she turns around, how she picks up the helmet, and how she moves it in the right direction, the glitches at the end of the video disappeared.
All the movements are physically correct, the girl doesn't have any extra body parts, and there are no other graphical defects.
But in this workflow, the girl's eyes are deliberately covered with black glasses to avoid any defects.
When generating various versions, I tested and changed the prompts, but there were a lot of errors in the generations. They depend on the complexity of the images and the quality of the animation prompt.
The main error is that the girl goes through the door incorrectly: she approaches the house from the exit side, tries to go in through the door, and only then begins to exit the house.
The workflow depends heavily on the video card: you need about 24 GB of video memory to run it on your own computer.
The workflow can generate the following sizes:
3840 x 2176 1920 x 1088
2560 x 1408 1280 x 704
2048 x 1152 1024 x 576
1920 x 1088 960 x 544
1280 x 704 640 x 352
1280 x 736 640 x 368
960 x 544 480 x 272
768 x 432 384 x 216
These dimensions show that only a horizontal video format is available, in different sizes with roughly the same aspect ratio (close to 16:9). For the 1280-pixel width, there are two options: 1280 x 704 and 1280 x 736.
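To check the aspect ratio claim, here is a minimal Python sketch that compares each listed size with 16:9 and confirms that the right-hand size in each row is half the left-hand one. The function names are my own, and pairing the trailing 768 x 432 with 384 x 216 is an assumption based on the pattern in the other rows. What the two columns mean in the workflow is not stated in the source, so the sketch only does the arithmetic.

```python
from fractions import Fraction

# Sizes copied from the list above as (left column, right column) pairs.
# Assumption: the trailing 768 x 432 and 384 x 216 form one row like the others.
SIZE_PAIRS = [
    ((3840, 2176), (1920, 1088)),
    ((2560, 1408), (1280, 704)),
    ((2048, 1152), (1024, 576)),
    ((1920, 1088), (960, 544)),
    ((1280, 704), (640, 352)),
    ((1280, 736), (640, 368)),
    ((960, 544), (480, 272)),
    ((768, 432), (384, 216)),
]


def aspect_ratio(width, height):
    """Return width / height as an exact fraction."""
    return Fraction(width, height)


def deviation_from_16_9(width, height):
    """Relative difference between this size's aspect ratio and 16:9."""
    target = Fraction(16, 9)
    return float(abs(aspect_ratio(width, height) - target) / target)


if __name__ == "__main__":
    for full, half in SIZE_PAIRS:
        # True when the right-hand size is exactly half the left-hand one.
        is_half = full == (2 * half[0], 2 * half[1])
        print(
            f"{full[0]} x {full[1]}: ratio {float(aspect_ratio(*full)):.3f}, "
            f"off 16:9 by {deviation_from_16_9(*full):.2%}, "
            f"right column is half: {is_half}"
        )
```

Only 2048 x 1152 and 768 x 432 are exactly 16:9; the other sizes are within about 2.3% of it, so the two 1280-wide options sit slightly on either side of 16:9.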
I haven't tested other sizes.
The second frame can appear at different points in the video; I haven't thoroughly tested how to specify the exact second at which it should appear. Depending on the amount of text in the prompt, the second frame may shift in time. For example, if there's no detailed description of the girl leaving the house at the beginning, the second frame appears sooner.
If I add many clarifications to the prompt before the girl leaves the house, or if the animation glitches before the second frame, the girl starts running at the end of the animation, because there isn't enough time left to complete all the actions in the prompt: the video length is fixed at 15 seconds.
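The time limit can be estimated with simple arithmetic. Below is a rough Python sketch, assuming 24 frames/s as written in the tutorial prompt and the fixed 15-second length. The helper names and the per-action durations are hypothetical estimates of mine, not values measured from the workflow, and the model's internal frame count may differ.

```python
FPS = 24           # taken from the tutorial prompt ("24 frames/s")
CLIP_SECONDS = 15  # fixed video length reported above


def total_frames(seconds=CLIP_SECONDS, fps=FPS):
    """Number of frames in a clip of the given length."""
    return seconds * fps


def frame_at(second, fps=FPS):
    """0-based frame index that corresponds to a moment in seconds."""
    return int(round(second * fps))


def remaining_seconds(action_durations, clip_seconds=CLIP_SECONDS):
    """Seconds left after the listed actions; negative means the plan overruns."""
    return clip_seconds - sum(action_durations)


# Hypothetical estimates in seconds for each part of the scenario.
plan = {
    "stand up and leave the room": 4,
    "open the door, exit, close it": 4,
    "walk to the motorcycle": 4,
    "turn and pick up the helmet": 3,
}

if __name__ == "__main__":
    print("frames in the clip:", total_frames())
    print("frame at the 7.5 s mark:", frame_at(7.5))
    print("seconds to spare:", remaining_seconds(plan.values()))
```

If the estimated actions add up to more than 15 seconds, the remainder is negative, which matches the situation where the girl has to run at the end to finish everything.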