Skip to content

Fix mute icons appearing on inactive tasks on Pipewire

Nyan Pasu requested to merge nyanpasu/plasma-pa:fix-pipewire-taskbar into master

This replaces MapBase's ineffective mechanism for avoiding race conditions (m_pendingRemovals, last worked on by @sitter) with a cancellation token system, so callbacks pointing to deleted clients/streams/etc. don't write stale data to MapBase (reverting the deletion).

I hope it's correct and doesn't leak RAM. This is very fiddly, involving new/delete and unchecked downcasting from void* (because libpulse is a C API), and tricky logic (chosen by me to accommodate libpulse's asynchronity).

@bharadwaj-raju perhaps?

Explanation

Bug 438565 occurs when plasmashell thinks some sink inputs (audio playback streams) belong to PID 0 (which is the PID plasmashell associates with inactive apps). This happens because plasmashell and plasma-pa load the stream before loading the client (process) that created the stream, and when trying to find the PID of the client owning the stream (https://invent.kde.org/plasma/plasma-desktop/-/blob/master/applets/taskmanager/package/contents/ui/PulseAudio.qml#L79), plasmashell doesn't find a client and falls back to PID 0 (and caches it for the lifetime of the stream).

Why does plasma-pa load the stream before the client, when pipewire-pulse informs plasma-pa about the client before the stream? Because plasma-pa has a race condition which can delay handling (client/stream/etc.) insertions past later removals (removals are handled instantly, while insertions are only handled after a round-trip to pipewire-pulse to request more information). See my comment for details.

Successful tests so far

  • Firefox and Discord mute icons appear normally (like before).
  • Running Audacious in PulseAudio mode doesn't put mute icons on inactive apps rather than Audacious.
    • If I stub out CancellationToken::cancel(), the bug is easily reproducible.
    • I have a commit with my fix plus extra logging (not published yet, but I can push it if you want). This way I can run plasmashell in a terminal, launch Audacious, and see on my terminal when the bug would've happened if not for CancellationToken.
  • flatpak run org.audacityteam.Audacity does not result in Audacity's audio playback appearing on inactive tasks.
  • Starting plasmashell with running audio streams (Context::contextStateCallback()) works.
  • Opening apps and audio streams after starting plasmashell (Context::subscribeCallback()) works.
  • Restarting pipewire-pulse with plasmashell running (Context::reset(), MapBase::reset()) works (prints "org.kde.plasma.pulseaudio: context kaput" and picks up the new pipewire-pulse).
  • flatpak run org.audacityteam.Audacity & watch -n0.1 'systemctl --user restart pipewire pipewire-pulse' (insanity test) doesn't trip an assert. (Amusingly it crashed pavucontrol-qt once (backtrace), but I can't reproduce it.) Audacity doesn't play audio after, but I didn't expect it to work. And nothing plays audio until you terminate the loop, wait for the system to settle down, then run systemctl --user restart pipewire pipewire-pulse.
  • Similarly, flatpak run org.audacityteam.Audacity & while true; pipewire-pulse &; sleep 0.1; killall pipewire-pulse; end and variations.
  • I used my logging to verify that on regular PulseAudio (which doesn't reuse indexes), my modified MapBase container doesn't leak client indexes when I run flatpak run org.audacityteam.Audacity (which creates and destroys many of them in a short period of time) (Stream::client(), m_clientIndex 57 (0, 1, 2, 57) QPulseAudio::Client(0x55e2dadbefa0). I think MapBase doesn't leak sink inputs either (PulseAudio.qml, instantiator only contains streams from currently running apps). You can probably verify this by making sure there's no stale applications in the volume applet (IDK if it would catch leaked streams).

Remaining issues

  • Flatpak apps don't show a mute icon on either pulseaudio or pipewire-pulse. Under the current design, plasmashell relies on the PulseAudio client metadata's application.process.id to associate clients with taskbar Task. This is always 2 for Flatpak apps (since apps run under PID2 within a Flatpak PID namespace), but plasmashell sees a different PID. And it appears the streamsForAppName fallback doesn't work, but I didn't look into why (whether the function gets called and fails, or never gets called).
  • Running mpv in Konsole doesn't show a mute indicator on Konsole. This is because plasmashell checks for the stream client process's PID, and its parent's PID, but Konsole is the grandparent of mpv (konsole -> fish -> mpv).

Is there a future plan to move to cgroups to associate audio streams with apps? I don't know if this is possible, since pactl list clients | less doesn't seem to show cgroup information. Even if it worked, it has the flaw where if you run a GUI app under Konsole's cgroup, they share a cgroup and audio coming from either window would get shown under both taskbar items. Maybe a stream should only be shown on tasks with a matching cgroup, if none of the visible Tasks have a matching PID/app name.

(Note to self: read https://lwn.net/Articles/834329/.)

Code review

libpulse calls certain callbacks one or more times with eol=0 (followed by one call with eol=1), supplying indexes which may be invalidated before the callback runs. I changed these callbacks to take Job * rather than Context *. All of these are called both on startup (Context::contextStateCallback()) and in response to events (Context::subscribeCallback()). I hope I didn't mix Job * and Context * up.

  • sink_cb
  • source_cb
  • sink_input_callback
  • source_output_cb
  • client_cb
  • card_cb
  • module_info_list_cb

I didn't make ext_stream_restore_read_cb take Job *, because m_streamRestores is never removed from, so there is no race condition. Honestly I don't know if m_streamRestores should even be a MapBase at all.

I marked MapBase as a final class and marked protected members as private, since it currently has no subclasses (maybe it used to?). This is fine if it's a private API that consumers can't see or subclass, but will otherwise break downstream libraries or apps.

Was it a good idea to add my own CancellationToken and Job types? Do I need to document CancellationToken more (it's a copyable object wrapping a shared_ptr<bool>, so one copy can write cancelled = true and the other copies can query the current status)?

I don't know if CancellationToken Context::m_globalCancel is necessary. I added it so if you call Context::reset() midway through PulseAudio callbacks, the remaining callbacks get cancelled. I don't know if this can ever happen (reset() is called by ~Context(), and when Context::contextStateCallback(state) is called but !PA_CONTEXT_IS_GOOD(state)). IDK if the "insanity test" (repeatedly restarting the pulseaudio server) can trigger it. Is plasma-pa vulnerable to use-after-free where some callbacks call methods on ((Context *)data) after Context is destroyed? (I haven't noticed crashes on program shutdown yet...)

I'm not actually sure whether my approach is acceptable, or if it's too complex or unsafe for the functionality achieved. I really dislike how some void *data are this, while others are ephemeral Job objects, but don't see a good way around it. Does anyone have suggestions? I can't think of any ideas that don't require queueing removeEntry() off a server message callback where you ignore the actual message (which depends on libpulse and the server responding to messages in order, and may not handle resets as well as my MR), and don't require using different types for different void *data (like my MR).

Style

Is my usage of Q_ASSERT acceptable, or did I check too many or too few invariants? After I finished initially bug-fixing, I haven't hit Q_ASSERT a single time since then.

Do I need to use const more?

I introduced #include <memory> and std::shared_ptr<> into this code, while the previous code seems to not use stdlib containers but only Qt ones (like QMap<> and QSharedPointer<>). Is this okay or should I stick to Qt containers? Should I switch my new m_cancellation (or even the existing m_data) from QMap to QHash (I haven't tested if it works) or even std::unordered_map (honestly not really faster than QHash because of iterator stability and chaining)?

Future refactors

Should I reorder Context::contextStateCallback() above Context::subscribeCallback(), since it gets called earlier?

Should MapBase[QObject] be renamed to something like PulseMap[QObject]? It's no longer a base class for anything in this repo (unsure others).

I found MapBase<Type, PAInfo> confusing to learn, because it maps PAInfo::index to Type, but the parameters are in the opposite order I'd expect. I didn't swap the parameters here, because it makes the diff bigger and isn't functionally tied to this MR. Should I open another MR to swap the arguments?

BUG: 438565

Edited by Nyan Pasu

Merge request reports