This page collects our hands-on experience with TLD: how to create an effective model, how to train it for actual detection, and how to evaluate the results on real-world videos. In our practice the source data is members' lifelogs, composed mostly of images and videos. Since TLD only works on videos (the algorithm inherently requires a video stream), models are created from a user's videos and then tested on that user's other videos and photos for evaluation. Later, we will survey similar work and well-known data sets for comparative evaluation against other algorithms.
Prepare the source data
The first thing to consider is the input data quality (video size, frame rate, compression quality, etc.). The speed of the TLD algorithm is mostly determined by the frame size of the video, so it takes some experimentation to find the size that best balances training time against model quality. A nice property of TLD is that it is scale-invariant (within some tolerance), so we first converted the input video to the following size.
ffmpeg -r 30 -i input.avi -sameq -an -s 480x260 output.avi
The above command resizes the input video to 480x260, removes its audio track, and keeps the same video quality. (For more on ffmpeg usage, please check out our ffmpeg practice page.)
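Before feeding the converted clip to TLD, it can help to sanity-check that the file actually opens with the expected resolution and frame rate. Below is a minimal sketch using the OpenCV 2.x-era API (TLD is built on OpenCV); the file name output.avi comes from the ffmpeg command above, and the rest is illustrative.

// check_video.cpp -- verify a converted clip opens with the expected
// size and frame rate before handing it to TLD.
// Build (assuming OpenCV is installed):
//   g++ check_video.cpp -o check_video `pkg-config --cflags --libs opencv`
#include <cstdio>
#include <opencv2/opencv.hpp>

int main(int argc, char** argv) {
    const char* path = (argc > 1) ? argv[1] : "output.avi";
    cv::VideoCapture cap(path);
    if (!cap.isOpened()) {
        std::fprintf(stderr, "Could not open %s\n", path);
        return 1;
    }
    // Query the properties ffmpeg was asked to produce (480x260 @ 30 fps).
    int w = (int)cap.get(CV_CAP_PROP_FRAME_WIDTH);
    int h = (int)cap.get(CV_CAP_PROP_FRAME_HEIGHT);
    double fps = cap.get(CV_CAP_PROP_FPS);
    std::printf("%s: %dx%d @ %.1f fps\n", path, w, h, fps);
    return (w == 480 && h == 260) ? 0 : 1;
}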
Select & Create the model
With our modified TLD Xcode version, a user can create multiple models for training and detection. The user specifies the name of a model at creation time, and can erase a model, overwrite it, or stop training at any time with key inputs, along the lines of the sketch below.
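The actual bindings live in our Xcode project; conceptually, though, the control loop is a simple keyboard dispatch on top of OpenCV's event polling. In the sketch below the ModelStore type, its methods, and the specific keys (c/e/o/t/q) are all assumptions for illustration, not the project's real API.

#include <cstdio>
#include <string>
#include <opencv2/opencv.hpp>

// Hypothetical stand-in for the project's model management; the real
// version would persist named TLD models rather than just print.
struct ModelStore {
    void create(const std::string& n)    { std::printf("create model %s\n", n.c_str()); }
    void erase(const std::string& n)     { std::printf("erase model %s\n", n.c_str()); }
    void overwrite(const std::string& n) { std::printf("overwrite model %s\n", n.c_str()); }
    void stopTraining()                  { std::printf("training stopped\n"); }
};

void runLoop(cv::VideoCapture& cap, ModelStore& models, const std::string& name) {
    cv::Mat frame;
    while (cap.read(frame)) {
        // ... run TLD tracking/detection/learning on `frame` here ...
        cv::imshow("TLD", frame);
        int key = cv::waitKey(1);               // poll keyboard between frames
        switch (key) {
            case 'c': models.create(name);    break;  // create a new model
            case 'e': models.erase(name);     break;  // erase the model
            case 'o': models.overwrite(name); break;  // overwrite the model
            case 't': models.stopTraining();  break;  // stop training, keep detecting
            case 'q': return;                         // quit
        }
    }
}

int main(int argc, char** argv) {
    cv::VideoCapture cap(argc > 1 ? argv[1] : "output.avi");
    if (!cap.isOpened()) return 1;
    ModelStore models;
    runLoop(cap, models, "demo-model");
    return 0;
}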