Leonard Teo (teo.leonard@gmail.com)
ID: 9724761, Concordia University
INSE 6530, Fall 2011
Schodl et al. (2000) introduce a new medium called a “video texture” (not to be mistaken for a texture map that uses a video file as its source). Essentially, a video texture is a moving image that can be played back for an infinite duration without a noticeable jump in the footage. Schodl provides a novel approach to taking arbitrary video footage and converting it into usable video textures. It is the synthesis of video textures that this paper documents.
Schodl's approach involves taking an input video clip, analyzing it, then either rendering it directly to screen (random play) or sequencing video loops that can subsequently be played back on any device or embedded in other media such as video games. I implemented the video texture synthesis in C++ on Mac OS X, using the OpenCV framework. Finally, the textures were tested in a real-time OpenGL application using GLUT.

Some of the videos that were successfully converted into video textures
The first step of the implementation is analyzing the input video. We do this by computing the difference between all pairs of frames in the video sequence. Schodl recommends calculating the L2 distance (based on the sum of squared differences) between each pair of frames. We first convert all images to greyscale, as this gives us a single value at each pixel. We then sum the squared differences of corresponding pixels between the two frames and return the square root of the sum as the L2 distance.
For a video with N frames, we end up with an N×N matrix in which cell (i, j) contains the L2 distance between frames i and j. Because the L2 distance analysis can be computationally intensive, we save the results to a cache file that can be read on subsequent runs.
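The distance analysis can be sketched in a few lines. This is a minimal illustration in plain Python rather than the C++/OpenCV code used in the project; the function names `l2_distance` and `distance_matrix` are hypothetical, and frames are assumed to be greyscale images already flattened into equal-length lists of pixel values.

```python
import math

def l2_distance(frame_a, frame_b):
    """L2 distance between two greyscale frames given as flat lists of pixels."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)))

def distance_matrix(frames):
    """Build the symmetric N x N matrix of pairwise L2 distances."""
    n = len(frames)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The distance is symmetric, so each pair is computed only once.
        for j in range(i + 1, n):
            d = l2_distance(frames[i], frames[j])
            dist[i][j] = d
            dist[j][i] = d
    return dist

# Three tiny 2x2 "frames", flattened to four greyscale values each.
frames = [[0, 0, 0, 0], [3, 4, 0, 0], [0, 0, 0, 0]]
D = distance_matrix(frames)
```

Because the matrix is symmetric with a zero diagonal, only the upper triangle needs to be computed, which roughly halves the work before the result is cached.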

A visual representation of the L2 distances between each frame.
We now convert each of the distances into corresponding probabilities. Schodl provides the probability equation as:
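$$P_{ij} \propto \exp\left(-D_{i+1,j}/\sigma\right)$$
Where $P_{ij}$ is the probability of transitioning from frame $i$ to frame $j$, $D_{i+1,j}$ is the distance between frame $i+1$ and frame $j$, and $\sigma$ controls how strongly low-distance transitions are favoured: smaller values emphasize only the best transitions, while larger values allow more variety. The probabilities in each row are normalized so that they sum to 1.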

The probability matrix based on the L2 distances.
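As a rough sketch of the mapping from distances to probabilities, the snippet below applies the exponential to row i + 1 of the distance matrix and normalizes each row. The function name `transition_probabilities` and the choice `sigma=1.0` are assumptions for illustration only. The last frame has no successor, so in this sketch its row is simply left at zero.

```python
import math

def transition_probabilities(dist, sigma):
    """Map an N x N distance matrix to row-normalized transition probabilities."""
    n = len(dist)
    probs = [[0.0] * n for _ in range(n)]
    # The last frame has no successor i + 1, so its row stays all zeros.
    for i in range(n - 1):
        weights = [math.exp(-dist[i + 1][j] / sigma) for j in range(n)]
        total = sum(weights)
        probs[i] = [w / total for w in weights]
    return probs

D = [[0.0, 1.0, 4.0],
     [1.0, 0.0, 1.0],
     [4.0, 1.0, 0.0]]
P = transition_probabilities(D, sigma=1.0)
```

In this toy matrix the most likely transition out of each frame is its natural successor, because frame i + 1 has zero distance to itself.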
At this point, we can actually play back a video texture by playing a frame, then stochastically selecting the next frame based on the probabilities in our probability matrix. This is the basis of random play. The problem at this point is that our video does not preserve dynamics. If our input video contains repeating motion (e.g. a pendulum or swing), a frame on the forward swing can look nearly identical to a frame on the return swing, so a transition between them may suddenly reverse the direction of motion. We need to account for these dynamics.


For input videos with repeating pendulum-like motions,
we need to account for dynamics.
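Schodl provides a novel approach to preserving dynamics. Rather than using optical flow or some other technique that analyses the actual motion in the video, we simply apply a weighted diagonal kernel to our distance matrix, so that two frames only count as similar when their neighbouring frames are similar too. Schodl defines the filtered distance as:
$$D'_{ij} = \sum_{k=-m}^{m-1} w_k \, D_{i+k,\,j+k}$$
where $w_k$ are the kernel weights and $2m$ is the number of taps in the kernel.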

In effect, the weighted kernel simply distributes the distance values diagonally. If we visualize the distance matrix as an image, it blurs our image diagonally from the top left to bottom right.
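The diagonal filtering can be sketched as below. The function name `preserve_dynamics` is hypothetical, and the handling of the matrix border is an assumption made for this sketch: taps that fall outside the matrix are skipped and the remaining weights are renormalized, whereas another implementation might crop the matrix instead.

```python
def preserve_dynamics(dist, weights):
    """Filter a distance matrix along its diagonals.

    `weights` holds 2*m taps covering the offsets k = -m .. m-1.
    """
    n = len(dist)
    m = len(weights) // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            used = 0.0
            for t, w in enumerate(weights):
                k = t - m
                a, b = i + k, j + k
                # Assumption: skip taps outside the matrix and renormalize.
                if 0 <= a < n and 0 <= b < n:
                    acc += w * dist[a][b]
                    used += w
            out[i][j] = acc / used
    return out

# A single spike at (0, 1) gets smeared down the diagonal to (1, 2).
D = [[0.0, 4.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
filtered = preserve_dynamics(D, weights=[1.0, 1.0])
```

With the two-tap kernel used here, each cell becomes the average of itself and its upper-left diagonal neighbour, which is the diagonal blur described above.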


The effects of the diagonal weighted kernel on the distance matrix.
We then recalculate the probability matrix using the same equation as before.

When we recalculate the probabilities,
we find most of the unwanted transitions removed.
So far, we've only looked at the local cost of taking a given transition. To get the best transitions, however, we have to account for the future cost of taking that transition. One common case where this is useful is avoiding a dead end in a video. We do this by adding the anticipated future cost of a transition to its current cost. The cost of transitioning out of a dead-end frame is added to the cost of every transition that leads into it, lowering the probability of those transitions so that the video texture doesn't go there.
We do this using the following equation:
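$$D''_{ij} = \left(D'_{ij}\right)^{p} + \alpha \min_k D''_{jk}$$
Here $D'_{ij}$ is the distance matrix after the diagonal filter, $p$ controls the trade-off between taking several good transitions and a single poorer one, and $0 < \alpha < 1$ weights the future cost relative to the immediate cost.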
Essentially, we take the cost of the next minimum-cost transition and add it to the current transition. We run multiple passes over the matrix and stop once the largest change between two consecutive passes falls below a small threshold, at which point we treat the matrix as converged. We then take this new distance matrix and calculate the corresponding probabilities.
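A minimal sketch of this iteration is shown below. It assumes that `cost[i][j]` is the cost of jumping from frame i to frame j, and the function name `anticipate_future_cost` and its default parameters are illustrative assumptions, not values taken from the project.

```python
def anticipate_future_cost(cost, alpha=0.99, p=1.0, tol=1e-6, max_passes=10000):
    """Propagate the cheapest future transition cost back into each transition."""
    n = len(cost)
    base = [[c ** p for c in row] for row in cost]
    future = [row[:] for row in base]
    for _ in range(max_passes):
        # Cheapest transition available after landing on each frame j.
        row_min = [min(row) for row in future]
        updated = [[base[i][j] + alpha * row_min[j] for j in range(n)]
                   for i in range(n)]
        change = max(abs(updated[i][j] - future[i][j])
                     for i in range(n) for j in range(n))
        future = updated
        if change < tol:
            break  # treat the matrix as converged
    return future

# Frame 2 is a dead end: every transition out of it is expensive.
cost = [[1.0, 1.0, 1.0],
        [1.0, 1.0, 1.0],
        [10.0, 10.0, 10.0]]
F = anticipate_future_cost(cost, alpha=0.5, tol=1e-9)
```

In this example, jumping into frame 2 looks as cheap as any other transition locally, but after propagation its cost is clearly higher than a jump into frame 0 or 1, so the corresponding probability drops.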

The final anticipated future cost probability matrix.
If we run our video texture using random play based on the anticipated future cost probabilities calculated in the previous step, we can get good results. The problem with stochastically selecting frames is that it can still randomly select a bad frame to transition to. That is the double-edged nature of random number generation: you don't really know what you're going to get. To address this, we prune transitions. If the probability of a given transition falls below a certain threshold, we set the probability of that cell to zero, thereby eliminating it. We can set the threshold by choosing a probability just large enough to eliminate the terrible transitions. The downside of this technique is that some rows in the matrix may end up with no suitable transitions, so we have to keep lowering the threshold until it is just high enough to keep only the best transitions without leaving any row empty.
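The pruning step, including the lowering of the threshold, might look like the sketch below. The function name `prune_transitions` and the `shrink` factor are assumptions for illustration, and renormalizing the surviving probabilities is an extra step added here so that each row can still be sampled from directly.

```python
def prune_transitions(probs, threshold, shrink=0.5):
    """Zero out transitions below `threshold`, lowering it until no row is emptied."""
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must be between 0 and 1")
    # Rows that were already empty (e.g. the last frame) are left alone.
    has_any = [any(p > 0.0 for p in row) for row in probs]
    while True:
        pruned = [[p if p >= threshold else 0.0 for p in row] for row in probs]
        if all(any(p > 0.0 for p in row)
               for row, needed in zip(pruned, has_any) if needed):
            break
        threshold *= shrink  # too aggressive: some row lost every transition
    result = []
    for row in pruned:
        total = sum(row)
        result.append([p / total for p in row] if total > 0.0 else row)
    return result, threshold

P = [[0.60, 0.30, 0.10],
     [0.05, 0.05, 0.90],
     [0.34, 0.33, 0.33]]
pruned, used_threshold = prune_transitions(P, threshold=0.4)
```

Here the initial threshold of 0.4 would wipe out the third row entirely, so it is halved once to 0.2, which removes only the weakest transitions.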
Random play is relatively straightforward. We stochastically select the next frame based on the weighted probabilities. In our implementation, we can select which probability matrix to play from. Obviously the best results come from the anticipated future cost probability matrix.
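Random play itself reduces to a weighted random choice per frame. The sketch below uses Python's `random.choices` for the weighted selection. The function name `random_play` is hypothetical, and restarting at frame 0 when a row has no usable transition is an assumption of this sketch.

```python
import random

def random_play(probs, start, count, rng):
    """Return `count` frame indices chosen by walking the probability matrix."""
    n = len(probs)
    order = [start]
    current = start
    for _ in range(count - 1):
        row = probs[current]
        if sum(row) <= 0.0:
            # Assumption: restart at frame 0 if no transition is available.
            current = 0
        else:
            current = rng.choices(range(n), weights=row, k=1)[0]
        order.append(current)
    return order

# A deterministic matrix makes the walk easy to follow: 0 -> 1 -> 2 -> 0 ...
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
sequence = random_play(P, start=0, count=7, rng=random.Random(42))
```

Passing in a seeded `random.Random` instance keeps a playback run reproducible, which is handy when comparing the different probability matrices.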
When taking a transition, however, there are cases where the jump is jarring. We attempt to minimize this by cross-fading between the frames. Currently, our cross-fade procedure creates only a single interstitial frame between the two frames. On slower videos (15fps) this is fine and provides some relief. On faster videos (30fps) the effect is almost invisible, so the transition is still jarring. Schodl recommends creating multi-frame transitions and even morphing to minimize the transition jump.
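A cross fade is a per-pixel linear blend of the two frames. The sketch below is an illustration only: the function name `cross_fade` is hypothetical, frames are assumed to be flat lists of greyscale values, and the `steps` parameter shows how the single interstitial frame could be extended to a multi-frame transition.

```python
def cross_fade(frame_a, frame_b, steps=1):
    """Return `steps` interstitial frames blending from frame_a to frame_b."""
    frames = []
    for s in range(1, steps + 1):
        t = s / (steps + 1)  # blend factor, strictly between 0 and 1
        frames.append([round((1 - t) * a + t * b)
                       for a, b in zip(frame_a, frame_b)])
    return frames

single = cross_fade([0, 100, 200], [100, 100, 0])
triple = cross_fade([0], [100], steps=3)
```

With `steps=1` the result is the single half-and-half frame described above, and larger values spread the blend over more frames at the cost of a longer transition.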

An extreme example of a cross fade frame that
interpolates between two frames.
While random play is fairly straightforward to implement, its use is limited as it requires an executable program to play back the video texture. In most cases, we want to save the video texture into a looping video. This looping media can then be played back on most devices or be embedded into some other medium such as a video game.
As I found out, sequencing video loops is not trivial.
The technique Schodl describes is to take a sample of the best transitions in a video, then generate a dynamic programming table for finding optimal loops. In this table, the columns are each of the transitions (we regard each transition as a valid video loop). The rows represent the target number of frames we want our resulting video to be.
We sequence the video texture using the following procedure (note that our description is greatly simplified):
Step 1: Find the best transitions in the video. We do this by analyzing our distance matrix (after it has gone through the anticipated future cost calculations) and selecting the lowest-cost transitions, up to a number that the user provides. Initialize the dynamic programming table.
Step 2: For each cell in the table, examine all loops of shorter length in the same column and try to combine them with loops from columns that overlap with the column considered and add up to the row number.
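The computational complexity of this algorithm is $O(L^2 N^2)$, where $L$ is the maximum loop length in frames and $N$ is the number of transitions considered, so producing this table to get good results can take quite a while.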
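Only the first step, selecting the candidate transitions, is sketched here; the dynamic programming table itself is not shown. The sketch assumes that `cost[i][j]` is the cost of jumping from frame i back to frame j, so that a backward transition with j ≤ i forms a loop of i − j + 1 frames. The function name `best_primitive_loops` and the `min_length` parameter are hypothetical.

```python
def best_primitive_loops(cost, count, min_length=2):
    """Pick the `count` cheapest backward transitions as primitive loops.

    Each result is a tuple (cost, source_frame, target_frame, loop_length).
    """
    n = len(cost)
    candidates = []
    for i in range(n):
        for j in range(i + 1):  # backward transitions only: j <= i
            length = i - j + 1
            # Assumption: skip degenerate loops that would freeze on one frame.
            if length >= min_length:
                candidates.append((cost[i][j], i, j, length))
    candidates.sort()
    return candidates[:count]

cost = [[9.0, 9.0, 9.0, 9.0],
        [5.0, 9.0, 9.0, 9.0],
        [2.0, 7.0, 9.0, 9.0],
        [1.0, 3.0, 6.0, 9.0]]
loops = best_primitive_loops(cost, count=3)
```

Each selected transition then becomes one column of the table, and its loop length determines the first row in which that column can hold an entry.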
To schedule the video texture from a set of overlapping primitive loops, we use the following procedure. Note that our procedure differs slightly from Schodl's.
The above procedure is simpler to implement than Schodl's, which caters for each of the non-overlapping ranges.
Here are some results.
Candle flame – This video shows a single flickering candle flame. This provides a generally good basis for creating a video texture.
Leo – In this video, I swing back and forth and start waving my hand, providing a good test case for dead-end detection. While the random play texture sometimes chooses one of the dead-end frames, the generated video loop was actually very good, with no perceivable transition when looping.
Fireplace – Fireplace footage works very well for video textures with both random play and video loop sequencing. We found very few issues with creating infinitely looping textures of fires like this.
Flag – I found flags to be very difficult. On random play, it looked terrible. When sequencing video loops, we were able to get reasonable results, though the frame jump is still slightly visible.
Cat's Eyes – This footage is a close-up of a cat that looks around. I was actually shocked at how well this turned out. Although the transitions are visible, the technique was able to match up the closest positions for the cat and build a reasonably good video texture.
Puppy – This is slightly more ambitious than the cat, as the puppy moves around a lot. With a bit of tweaking I was able to get a usable video texture. The jump is noticeable, but I was surprised at how well the algorithm matched up the best transition.
South Park – I tested with two sequences from the cartoon series South Park. I found that they worked well because animators already reuse many frames to minimize the number of frames they have to draw. Combined with cross-fading, I was able to get some humorous results.
VTClock – The original test footage from Schodl (2000). This was used as the control sample for generating video textures. My implementation was able to generate a reasonably good texture from it and avoids the dead end, but an odd transition is noticeable.
In my final test, I integrated video textures in a real-time graphics engine using both random play and pre-rendered video textures.
While the thought of using infinitely looping video textures in a game engine might seem compelling, we must examine the justification for their use. First of all, it is unlikely that random play video textures would be considered in a commercial game. In commercial game development, computational cycles are extremely precious. As games increase in complexity, these cycles must be allocated to things that improve gameplay, such as AI, physics and animation. Random play consumes too many computational cycles per frame to produce a single texture, so the benefit does not outweigh the cost.
Pre-rendered video textures, however, can be used as looping video in a game engine. Commercial game engines such as Unity and Unreal already include support for importing a movie file and using that as a texture. The main question is under what circumstances in a game would you want to use a looping video texture?
For simulating computer or television screens, infinitely looping video might be useful, as shown in Half-Life 2 with the video of “big brother” constantly playing across all screens in a city. The paper by Schodl, however, might convince you that using video textures for elements such as fire and flags is a good idea. The problem here is that video textures are, in effect, flat billboards, and they don't look realistic in a 3D environment. In a 3D game environment, a better approach to fire would be to use a particle system. Similarly, a flag can be rendered with better results by using actual geometry and animating it (potentially with cloth simulation).
In this video, we load 4 pre-generated video loops at run-time.
In this video, we generate the video texture at each frame update using random play.
Overall, implementing video textures was challenging and rewarding. For me, it gave a good introduction to video processing and computer vision.
Schodl, A., Szeliski, R., Salesin, D., Essa, I. Video Textures. Proceedings of SIGGRAPH 2000, pages 489-498, July 2000.
Laganière, R. OpenCV 2 Computer Vision Application Programming Cookbook. 2011.
Video samples from Pond5.com