This patch attempts to solve long-standing audio-video synchronization issues by replacing our old QtMultimedia-based audio system with a new one based on the MLT Framework.
Up until now, Krita's animation system has used QtMultimedia's QMediaPlayer in an attempt to add audio to animation playback. This is an important feature for any workflow that involves syncing animation with dialogue or background music tracks. The QMediaPlayer supports loading a single audio file in a variety of formats, and gives basic high-level controls for playing, stopping, seeking, etc. In the context of Krita, this meant that our ability to handle audio playback was limited to something akin to controlling a media player; hitting play when we want playback to start, stop when we want it to stop, and seeking to the correct time in milliseconds.
These high-level controls, coupled with a design where the change in animation frames was responsible for driving audio player state changes, resulted in a lot of showstopping problems with audio-video synchronization as well as pops and crackles due to audio buffer xruns. In hindsight, QMediaPlayer was probably not the right tool for the job, so animation audio has not been very accurate or easy to work with.
Last year, we spent a little time between projects researching alternatives to QMediaPlayer, including QtMultimedia's lower-level interfaces, SDL, and others. But it wasn't until a few weeks ago that we jumped back into animation audio and started looking into what KDenLive uses: MLT (Media Lovin' Toolkit).
MLT was designed with audio-video synchronization in mind and, so far, seems to be a good fit for Krita. It does a lot of the stuff that we want out of the box (frame-by-frame control, push and pull models, configurable frame rate, buffer size and latency, support for many formats and drivers, etc.), and has a lot of flexibility to do basically anything that we can think of through the application of filters (like timewarp for speed adjustments) or creation of custom services.
The `KisPart` singleton now has a single instance of a new `KisPlaybackEngineBase` class, which is implemented by two new subclasses:

- `KisPlaybackEngineMLT` uses MLT to drive animation image changes while pulling/pushing synchronized audio.
- `KisPlaybackEngineQT` is a simple fallback engine which relies only on a Qt timer to drive animation playback, for cases where MLT is unavailable or unwanted.
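To make the split concrete, here is a minimal sketch of what such a shared engine interface could look like. The method and class names below (other than `KisPlaybackEngineBase` itself) are illustrative assumptions, not Krita's actual API:

```cpp
#include <cassert>

// Hypothetical sketch of the shared playback-engine interface;
// method names are illustrative, not Krita's actual API.
class KisPlaybackEngineBase {
public:
    virtual ~KisPlaybackEngineBase() = default;
    virtual void play() = 0;
    virtual void stop() = 0;
    virtual void seek(int frame) = 0;
    virtual bool supportsAudio() const = 0;
};

// Fallback-style engine: no audio, frame changes would be driven
// by a timer (the timer itself is elided here).
class TimerEngine : public KisPlaybackEngineBase {
public:
    void play() override { m_playing = true; }
    void stop() override { m_playing = false; }
    void seek(int frame) override { m_frame = frame; }
    bool supportsAudio() const override { return false; }

    int currentFrame() const { return m_frame; }
    bool isPlaying() const { return m_playing; }

private:
    bool m_playing = false;
    int m_frame = 0;
};
```

The point of the base class is that the rest of Krita only talks to this small surface, so swapping engines (MLT vs. timer-driven) doesn't ripple through the UI code.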
Because we now have a single playback engine per Krita instance, it also makes sense to store some per-canvas (typically per-document) data to restore when the active canvas changes. That data is stored in a new class called `KisCanvasAnimationState` within each `KisCanvas2` instance. For example, while each document's audio file is now loaded by `KisPlaybackEngineMLT` as an `Mlt::Producer`, the path of that file is saved in the `KisCanvasAnimationState`.
`KisPlaybackEngineMLT` switches image frames to the cadence of the audio for the sake of synchronization. Audio output is handled by a pair of `Mlt::Consumer`s. First, there is the `pullConsumer`, which drives regular playback, automatically pulling audio and calling back into Krita to issue visual frame changes. Then there's the `pushConsumer`, to which we can manually push frames of audio outside of playback (for example, while scrubbing on the timeline).
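The routing between the two consumers can be summarized with a tiny sketch (our own illustration of the rule described above, not Krita's code):

```cpp
#include <string>

// Illustrative routing rule between the two MLT consumers:
// regular playback is driven by the pull consumer, while seeks that
// happen outside of playback (e.g. scrubbing) push audio manually.
enum class PlaybackMode { Stopped, Playing };

std::string consumerFor(PlaybackMode mode, bool seekRequested) {
    if (mode == PlaybackMode::Playing)
        return "pullConsumer";  // MLT pulls audio, calls back for frames
    if (seekRequested)
        return "pushConsumer";  // we push a short audio window as preview
    return "none";              // stopped and idle: no audio to produce
}
```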
`KisPlaybackEngineMLT` makes use of a custom `mlt_producer` plugin written in C to implement some Krita-specific behavior, like looping over a given range of frames and handling playback speed changes. There's probably a lot more that can be done by customizing MLT services in this way.
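The looping behavior such a producer needs can be illustrated with a small frame-wrapping helper. This is a sketch under our own names, not the plugin's actual code:

```cpp
#include <cassert>

// Sketch of the range-confinement logic a looping producer needs:
// given a raw playback position, wrap it back into [start, end].
// Function and parameter names are illustrative, not the plugin's.
int confineToLoopRange(int position, int start, int end) {
    assert(start <= end);
    const int span = end - start + 1;
    if (position < start || position > end) {
        // Wrap the overshoot relative to the start of the loop range.
        int offset = (position - start) % span;
        if (offset < 0) offset += span;
        return start + offset;
    }
    return position;
}
```

For example, with a loop range of frames 10..20, a raw position of 21 wraps back to 10, and a position of 25 wraps to 14.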
Another small new class, `KisFrameDisplayProxy`, sits between the canvas and the image and is responsible for deciding whether to use the cached version of a frame (for speed) or to reproject/refresh the frame from the image (for accuracy). We found that much of the logic around whether to use the cache was poorly encapsulated, increasing the likelihood of caching bugs. We hope that putting most of the decision-making around caching into a dedicated class with a simple interface will make caching errors less likely.
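The kind of decision the proxy centralizes can be sketched like this; the struct fields and the rule itself are illustrative assumptions, not Krita's actual logic:

```cpp
// Illustrative sketch of the speed-vs-accuracy decision that a
// display proxy centralizes; fields and rule are assumptions.
struct FrameRequest {
    bool cacheValidForFrame;  // an up-to-date cached projection exists
};

enum class FrameSource { Cache, Reprojection };

FrameSource chooseFrameSource(const FrameRequest &req) {
    if (req.cacheValidForFrame)
        return FrameSource::Cache;     // fast path for playback
    return FrameSource::Reprojection;  // accurate, but slower
}
```

The value isn't in the rule itself (which is simple) but in having exactly one place where it lives, so callers can't each invent their own slightly different cache check.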
At 24fps (as an example), a single frame of audio is only 1/24th of a second long. So, to give a more useful preview of the audio while scrubbing, we currently push multiple frames' worth of audio with each call to `seek(frame)`, using the seek target frame plus a small number of frames after it. (We want to make this window configurable, but we should also try to find a good framerate-agnostic default.)
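One framerate-agnostic approach would be to size the preview window in wall-clock time and derive the frame count from the document's fps. A sketch of that idea, where the 150 ms default is purely an assumption for illustration:

```cpp
// Sketch: derive how many frames of audio to push per seek from a
// fixed wall-clock window, so the preview length doesn't depend on
// the document's framerate. The 150 ms default is an assumption.
int audioFramesForWindow(int fps, int windowMs = 150) {
    // Integer ceiling, so the window is never shorter than requested.
    return (fps * windowMs + 999) / 1000;
}
```

At 24fps this yields a 4-frame window, while at 60fps it yields 9 frames, so the audible preview stays roughly the same length either way.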
`KisPlaybackEngine` has a couple of different `SeekOption` flags that can be used to add extra, optional behavior when seeking frames. This gives us fine-grained control over seeking in different scenarios (for example, seeking during playback vs. while scrubbing vs. in response to some external signal): some require audio to be pushed, some want a full cache refresh, and others want nothing. Adding new optional or uncommon seeking behaviors should be relatively easy with this design.
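A combinable flag set of this kind might look like the following sketch; the specific flag names are illustrative assumptions based on the behaviors described above, not Krita's actual enum:

```cpp
// Sketch of combinable seek flags; names are illustrative, not
// Krita's actual SeekOption values.
enum SeekOption {
    SEEK_NONE       = 0,
    SEEK_PUSH_AUDIO = 1 << 0,  // push a short audio preview window
    SEEK_FINALIZE   = 1 << 1,  // do a full cache refresh at the target
};

inline SeekOption operator|(SeekOption a, SeekOption b) {
    return static_cast<SeekOption>(static_cast<int>(a) | static_cast<int>(b));
}

bool wantsAudio(SeekOption opts) {
    return (opts & SEEK_PUSH_AUDIO) != 0;
}
```

A scrub-seek would then pass `SEEK_PUSH_AUDIO`, a seek at the end of playback might pass `SEEK_FINALIZE`, and a purely programmatic seek could pass `SEEK_NONE`.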
For now, only `KisPlaybackEngineMLT` supports animation audio. The simple Qt-driven version, `KisPlaybackEngineQT`, was rewritten from scratch in recent months while deciding how best to handle the dependencies for this MR; we figured it wouldn't hurt to have a simpler fallback for people who want to build from source without MLT. While `KisPlaybackEngineQT` is pretty bare-bones right now, it probably wouldn't be too hard to add more functionality to it if that's something people want.
Known Issues and Remaining Tasks
Playback preview speed isn't currently hooked up and no longer works. We think that inserting something like a "timewarp" producer into the playback network will allow us to control preview playback speed again; that's how KDenLive does it. (This has now been addressed: we've added variable playback percentage speeds with appropriate sound scaling and playback changes, and synchronization remains consistent.)
Playback no longer loops within the selected frame span. MLT drives playback right now, but it doesn't know about the state of the selected frames on the Timeline Docker. We considered a couple of ways to fix this: (a) inserting a custom filter into the playback network that imposes limits on which frames can be played, or (b) imposing those limits from within Krita. (This has been solved behind the scenes by implementing a custom `Mlt::Producer` that confines playback to a specific range set from within Krita. This custom producer will need to be built and packaged with Krita. No change from the user's perspective.)
Framerate changes in the GUI currently don't work. It should be easy enough to plug this in and update all producers/consumers with the appropriate values. (This has now been addressed. No change from the user's perspective.)
Pushing audio "frames" while scrubbing can cause glitchy warbling sounds, likely caused by sample-level discontinuities in the audio due to random writes. It's a low-priority polish issue, but we would like to smooth this over by creating a custom MLT filter that zeroes out audio samples until the first zero crossing, which should keep scrubbing nice and clean.
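The zero-crossing idea could look roughly like this. It's our own sketch of the technique, not an actual MLT filter: mute the leading samples of each pushed window until the signal first changes sign, so playback never starts mid-waveform with an audible click.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch of the smoothing idea: zero samples at the start of a pushed
// audio window until the first zero crossing. Not the actual MLT filter.
void zeroUntilFirstCrossing(std::vector<int16_t> &samples) {
    if (samples.empty()) return;
    const bool startNegative = samples[0] < 0;
    size_t i = 0;
    // Advance until the sign flips or a true zero sample appears.
    while (i < samples.size() && samples[i] != 0 &&
           (samples[i] < 0) == startNegative) {
        ++i;
    }
    std::fill(samples.begin(), samples.begin() + i, int16_t(0));
}
```

For a window starting `{5, 3, -2, 7, ...}`, the first two samples would be muted and output would begin cleanly at the sign change.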
Find a better default value for the size of the audio window pushed by `KisPlaybackEngineMLT`, and make it configurable. User testing suggests that the current window is too big by default.