
Handle SIGBUS for KSharedDataCache

Dāvis Mosāns requested to merge davism/kcoreaddons:sigbus into master

Currently, if you use a copy-on-write (CoW) filesystem such as Btrfs and the partition holding ~/.cache fills up, Plasma Desktop and most KDE apps will crash (some will restart automatically and keep crashing).

The crashes look like this (e.g. drkonqi, kwin_x11):

okt 29 06:52:08 Arch kwin_x11[92289]: Application::crashHandler() called with signal 7; recent crashes: 30
okt 29 06:52:08 Arch kwin_x11[92269]: 13 -- platform=xcb
okt 29 06:52:09 Arch plasmashell[92279]: 11 -- display=:0
okt 29 06:52:09 Arch kwin_x11[92269]: 11 -- display=:0
okt 29 06:52:09 Arch plasmashell[92279]: 20 -- appname=plasmashell
okt 29 06:52:09 Arch kwin_x11[92269]: 17 -- appname=kwin_x11
okt 29 06:52:09 Arch kwin_x11[92289]: 22 -- exe=/usr/bin/kwin_x11
okt 29 06:52:10 Arch plasmashell[92279]: 17 -- apppath=/usr/bin
okt 29 06:52:10 Arch kwin_x11[92289]: 13 -- platform=xcb
okt 29 06:52:10 Arch kwin_x11[92300]: Application::crashHandler() called with signal 7; recent crashes: 31
okt 29 06:52:10 Arch systemd[893]: plasma-plasmashell.service: Main process exited, code=killed, status=14/ALRM
okt 29 06:52:10 Arch systemd[893]: plasma-plasmashell.service: Failed with result 'signal'.
okt 29 06:52:10 Arch systemd[893]: Failed to start KDE Plasma Workspace.
okt 29 06:52:11 Arch systemd[893]: plasma-plasmashell.service: Scheduled restart job, restart counter is at 923.
okt 29 06:52:11 Arch systemd[893]: Stopped KDE Plasma Workspace.
okt 29 06:52:11 Arch systemd[893]: Starting KDE Plasma Workspace...
okt 29 06:52:11 Arch kwin_x11[92289]: 11 -- display=:0
okt 29 06:52:11 Arch kwin_x11[92289]: 17 -- appname=kwin_x11
okt 29 06:52:11 Arch kwin_x11[92300]: 22 -- exe=/usr/bin/kwin_x11
okt 29 06:52:12 Arch kwin_x11[92300]: 13 -- platform=xcb
okt 29 06:52:13 Arch kwin_x11[92300]: 11 -- display=:0
okt 29 06:52:13 Arch kwin_x11[92300]: 17 -- appname=kwin_x11
okt 29 06:52:37 Arch systemd[893]: plasma-kwin_x11.service: State 'stop-sigterm' timed out. Killing.
okt 29 06:52:37 Arch systemd[893]: plasma-kwin_x11.service: Killing process 92322 (kwin_x11) with signal SIGKILL.
okt 29 06:52:37 Arch systemd[893]: plasma-kwin_x11.service: Failed with result 'signal'.
okt 29 06:52:37 Arch systemd[893]: Failed to start KDE Window Manager.
okt 29 06:52:37 Arch systemd[893]: plasma-kwin_x11.service: Consumed 9.996s CPU time.
okt 29 06:52:37 Arch systemd[893]: plasma-kwin_x11.service: Scheduled restart job, restart counter is at 272.
[...]
okt 29 06:58:20 Arch kwin_x11[94438]: KCrash: crashing... crashRecursionCounter = 2
okt 29 06:58:20 Arch kwin_x11[94438]: KCrash: Application Name = kwin_x11 path = /usr/bin pid = 94438
okt 29 06:58:20 Arch kwin_x11[94438]: KCrash: Arguments: /usr/bin/kwin_x11 --crashes 10
okt 29 06:58:20 Arch plasmashell[94521]: 11 -- display=:0
okt 29 06:58:20 Arch kwin_x11[94504]: 9 -- signal=7
okt 29 06:58:20 Arch kwin_x11[94438]: KCrash: Attempting to start /usr/lib/drkonqi
okt 29 06:58:20 Arch plasmashell[94521]: 16 -- appname=drkonqi
okt 29 06:58:20 Arch kwin_x11[94504]: 10 -- pid=94504
okt 29 06:58:20 Arch plasmashell[94521]: 17 -- apppath=/usr/lib
okt 29 06:58:20 Arch kwin_x11[94504]: 18 -- appversion=5.26.1
okt 29 06:58:20 Arch kwin_x11[94504]: 17 -- programname=KWin
okt 29 06:58:20 Arch plasmashell[94521]: 9 -- signal=7
okt 29 06:58:20 Arch plasmashell[94521]: 10 -- pid=94521
okt 29 06:58:20 Arch kwin_x11[94504]: 31 -- bugaddress=submit@bugs.kde.org
okt 29 06:58:20 Arch plasmashell[94521]: 19 -- appversion=5.25.80
okt 29 06:58:20 Arch kwin_x11[94504]: 12 -- startupid=0
okt 29 06:58:20 Arch plasmashell[94521]: 34 -- programname=The KDE Crash Handler
okt 29 06:58:20 Arch kwin_x11[94504]: KCrash: crashing... crashRecursionCounter = 2
okt 29 06:58:20 Arch kwin_x11[94504]: KCrash: Application Name = kwin_x11 path = /usr/bin pid = 94504
okt 29 06:58:20 Arch kwin_x11[94504]: KCrash: Arguments: /usr/bin/kwin_x11 --crashes 11
okt 29 06:58:20 Arch kwin_x11[94504]: KCrash: Attempting to start /usr/lib/drkonqi
okt 29 06:58:20 Arch plasmashell[94521]: 31 -- bugaddress=submit@bugs.kde.org
okt 29 06:58:20 Arch kwin_x11[94531]: 21 -- exe=/usr/lib/drkonqi
okt 29 06:58:20 Arch plasmashell[94521]: 12 -- startupid=0
okt 29 06:58:20 Arch kwin_x11[94531]: 13 -- platform=xcb
okt 29 06:58:20 Arch plasmashell[94521]: KCrash: crashing... crashRecursionCounter = 2
okt 29 06:58:20 Arch plasmashell[94521]: KCrash: Application Name = drkonqi path = /usr/lib pid = 94521
okt 29 06:58:20 Arch plasmashell[94521]: KCrash: Arguments: /usr/lib/drkonqi --appname plasmashell --apppath /usr/bin --signal 7 --pid 94463 --appversion 5.26.1 --programname Plasma --bugaddress submit@bugs.kde.org --startupid 0
okt 29 06:58:20 Arch kwin_x11[94531]: 11 -- display=:0
okt 29 06:58:20 Arch systemd[1]: Started Process Core Dump (PID 94542/UID 0).
okt 29 06:58:20 Arch kwin_x11[94531]: 16 -- appname=drkonqi
okt 29 06:58:20 Arch kwin_x11[94531]: 17 -- apppath=/usr/lib
okt 29 06:58:20 Arch systemd-coredump[94544]: Removed old coredump core.kwin_x11.1000.f5c4b03274d0499c95894cd84ed629b5.92429.1667023025000000.zst.
okt 29 06:58:20 Arch kwin_x11[94531]: 9 -- signal=7
okt 29 06:58:20 Arch kwin_x11[94531]: 10 -- pid=94531
okt 29 06:58:21 Arch systemd-coredump[94544]: Process 94521 (drkonqi) of user 1000 dumped core.
[...]

#0  0x00007f373af3364c in ?? () from /usr/lib/libc.so.6
#1  0x00007f373aee3958 in raise () from /usr/lib/libc.so.6
#2  0x00007f373c62d9cc in KCrash::defaultCrashHandler (sig=7) at /usr/src/debug/kcrash/src/kcrash.cpp:618
#3  <signal handler called>
#4  0x00007f373af3424d in pthread_mutex_init () from /usr/lib/libc.so.6
#5  0x00007f373b8172b6 in pthreadLock::initialize (this=0x560430d60c30, processSharingSupported=@0x7ffc2fd9c8b0: false) at /usr/src/debug/kcoreaddons/src/lib/caching/kshareddatacache_p.h:199
#6  0x00007f373b81a619 in KSharedDataCache::Private::mapSharedMemory (this=0x560430d23510) at /usr/src/debug/kcoreaddons/src/lib/caching/kshareddatacache.cpp:1148
#7  0x00007f373b819a0a in KSharedDataCache::Private::Private (this=0x560430d23510, name=..., defaultCacheSize=10485760, expectedItemSize=0) at /usr/src/debug/kcoreaddons/src/lib/caching/kshareddatacache.cpp:974
#8  0x00007f373b814fcd in KSharedDataCache::KSharedDataCache (this=0x560430d60bf0, cacheName=..., defaultCacheSize=10485760,expectedItemSize=0) at /usr/src/debug/kcoreaddons/src/lib/caching/kshareddatacache.cpp:1362
#9  0x00007f373a60bac1 in KIconLoaderPrivate::init (this=0x560430d1de70, _appname=..., extraSearchPaths=...) at /usr/src/debug/kiconthemes/src/kiconloader.cpp:398
#10 0x00007f373a60afbd in KIconLoaderPrivate::KIconLoaderPrivate (this=0x560430d1de70, _appname=..., extraSearchPaths=..., qq=0x7f373a649f60 <(anonymous namespace)::Q_QGS_globalIconLoader::innerFunction()::holder>) at /usr/src/debug/kiconthemes/src/kiconloader.cpp:238
#11 0x00007f373a60b850 in KIconLoader::KIconLoader (this=0x7f373a649f60 <(anonymous namespace)::Q_QGS_globalIconLoader::innerFunction()::holder>, appname=..., extraSearchPaths=..., parent=0x0) at /usr/src/debug/kiconthemes/src/kiconloader.cpp:375
#12 0x00007f373a613fe2 in Holder::Holder (this=0x7f373a649f60 <(anonymous namespace)::Q_QGS_globalIconLoader::innerFunction()::holder>) at /usr/src/debug/kiconthemes/src/kiconloader.cpp:1656
#13 0x00007f373a61408c in (anonymous namespace)::Q_QGS_globalIconLoader::innerFunction () at /usr/src/debug/kiconthemes/src/kiconloader.cpp:1656
#14 0x00007f373a6149f4 in QGlobalStatic<KIconLoader, (anonymous namespace)::Q_QGS_globalIconLoader::innerFunction, (anonymous namespace)::Q_QGS_globalIconLoader::guard>::operator() (this=0x7f373a649f80 <globalIconLoader>) at /usr/include/qt/QtCore/qglobalstatic.h:138
#15 0x00007f373a6140db in KIconLoader::global () at /usr/src/debug/kiconthemes/src/kiconloader.cpp:1660
#16 0x00007f3733696371 in ?? () from /usr/lib/qt/plugins/platformthemes/KDEPlasmaPlatformTheme.so
#17 0x00007f373bacb5d0 in QIcon::fromTheme(QString const&) () from /usr/lib/libQt5Gui.so.5
#18 0x000056042f12c3fd in ?? ()
#19 0x00007f373aece290 in ?? () from /usr/lib/libc.so.6
#20 0x00007f373aece34a in __libc_start_main () from /usr/lib/libc.so.6
#21 0x000056042f12e425 in ?? ()

plasmashell

#0  0x00007fa7a582864c n/a (libc.so.6 + 0x8864c)
#1  0x00007fa7a57d8958 raise (libc.so.6 + 0x38958)
#2  0x00007fa7a7d77af5 KCrash::defaultCrashHandler(int) (libKF5Crash.so.5 + 0x6af5)
#3  0x00007fa7a57d8a00 n/a (libc.so.6 + 0x38a00)
#4  0x00007fa7a582924d pthread_mutex_init (libc.so.6 + 0x8924d)
#5  0x00007fa7a67fa2b6 pthreadLock::initialize(bool&) (libKF5CoreAddons.so.5 + 0x2b2b6)
#6  0x00007fa7a67fd619 KSharedDataCache::Private::mapSharedMemory() (libKF5CoreAddons.so.5 + 0x2e619)
#7  0x00007fa7a67fca0a KSharedDataCache::Private::Private(QString const&, unsigned int, unsigned int) (libKF5CoreAddons.so.5 + 0x2da0a)
#8  0x00007fa7a67f7fcd KSharedDataCache::KSharedDataCache(QString const&, unsigned int, unsigned int) (libKF5CoreAddons.so.5 + 0x28fcd)
#9  0x00007fa7a7fd0593 n/a (libKF5Plasma.so.5 + 0x70593)
#10 0x00007fa7a7fd082f Plasma::Theme::findInCache(QString const&, QPixmap&, unsigned int) (libKF5Plasma.so.5 + 0x7082f)
#11 0x00007fa7a7fc357d n/a (libKF5Plasma.so.5 + 0x6357d)
#12 0x00007fa7a7fc5061 n/a (libKF5Plasma.so.5 + 0x65061)
#13 0x00007fa7a7fc524e Plasma::FrameSvg::mask() const (libKF5Plasma.so.5 + 0x6524e)
#14 0x00007fa7a8234add n/a (libKF5PlasmaQuick.so.5 + 0x26add)
#15 0x00007fa7a742d651 QQmlObjectCreator::finalize(QQmlInstantiationInterrupt&) (libQt5Qml.so.5 + 0x2a4651)
#16 0x00007fa7a73d3bdf QQmlIncubatorPrivate::incubate(QQmlInstantiationInterrupt&) (libQt5Qml.so.5 + 0x24abdf)
#17 0x00007fa7a73d4586 QQmlEnginePrivate::incubate(QQmlIncubator&, QQmlContextData*) (libQt5Qml.so.5 + 0x24b586)
#18 0x00007fa7a73d4802 QQmlComponent::create(QQmlIncubator&, QQmlContext*, QQmlContext*) (libQt5Qml.so.5 + 0x24b802)
#19 0x00007fa7a7ec46fd KDeclarative::QmlObject::completeInitialization(QHash<QString, QVariant> const&) (libKF5Declarative.so.5 + 0xa6fd)
#20 0x00007fa7a7ec4941 n/a (libKF5Declarative.so.5 + 0xa941)
#21 0x00007fa7a81cf711 KQuickAddons::QuickViewSharedEngine::setSource(QUrl const&) (libKF5QuickAddons.so.5 + 0x11711)
#22 0x000055bef93033e5 n/a (plasmashell + 0x323e5)
#23 0x000055bef9319ed4 n/a (plasmashell + 0x48ed4)
#24 0x000055bef931bc67 n/a (plasmashell + 0x4ac67)
#25 0x00007fa7a5e7b381 n/a (libQt5Core.so.5 + 0x2bd381)
#26 0x00007fa7a81e6916 KActivities::Consumer::serviceStatusChanged(KActivities::Consumer::ServiceStatus) (libKF5Activities.so.5 + 0xc916)
#27 0x00007fa7a5e7b381 n/a (libQt5Core.so.5 + 0x2bd381)
#28 0x00007fa7a81f826a n/a (libKF5Activities.so.5 + 0x1e26a)
#29 0x00007fa7a81eaffc n/a (libKF5Activities.so.5 + 0x10ffc)
#30 0x00007fa7a5e7b530 n/a (libQt5Core.so.5 + 0x2bd530)
#31 0x00007fa7a67a6d74 QDBusPendingCallWatcher::finished(QDBusPendingCallWatcher*) (libQt5DBus.so.5 + 0x57d74)
#32 0x00007fa7a5e6e520 QObject::event(QEvent*) (libQt5Core.so.5 + 0x2b0520)
#33 0x00007fa7a6b8bb1c QApplicationPrivate::notify_helper(QObject*, QEvent*) (libQt5Widgets.so.5 + 0x178b1c)
#34 0x00007fa7a5e4ab88 QCoreApplication::notifyInternal2(QObject*, QEvent*) (libQt5Core.so.5 + 0x28cb88)
#35 0x00007fa7a5e4b693 QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (libQt5Core.so.5 + 0x28d693)
#36 0x00007fa7a5e91728 n/a (libQt5Core.so.5 + 0x2d3728)
#37 0x00007fa7a3d6481b g_main_context_dispatch (libglib-2.0.so.0 + 0x5581b)
#38 0x00007fa7a3dbaec9 n/a (libglib-2.0.so.0 + 0xabec9)
#39 0x00007fa7a3d630d2 g_main_context_iteration (libglib-2.0.so.0 + 0x540d2)
#40 0x00007fa7a5e9550c QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (libQt5Core.so.5 + 0x2d750c)
#41 0x00007fa7a5e4332c QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (libQt5Core.so.5 + 0x28532c)
#42 0x00007fa7a5e4de59 QCoreApplication::exec() (libQt5Core.so.5 + 0x28fe59)
#43 0x000055bef92f410f n/a (plasmashell + 0x2310f)
#44 0x00007fa7a57c3290 n/a (libc.so.6 + 0x23290)
#45 0x00007fa7a57c334a __libc_start_main (libc.so.6 + 0x2334a)
#46 0x000055bef92f44c5 n/a (plasmashell + 0x234c5)

This has annoyed me for years because it's very easy to accidentally fill up the disk, and I've encountered other cache-related issues as well (crashes and deadlocks; sometimes even rebooting the PC didn't help, only deleting the cache files did).

Only now have I really looked into it.

In this particular case the issue is that, because of CoW, writing even a single bit, such as locking (or even just unlocking) a mutex, needs enough free space to copy the whole block. If there isn't enough free space for that, the write fails and the kernel raises SIGBUS.

I actually don't know why we use mmap'ed files rather than memfd or shm, which wouldn't need to be persisted on disk. Anyway, it turns out it wasn't that hard to handle SIGBUS in these cases and recover gracefully by falling back to MAP_ANONYMOUS.
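To make the failure mode concrete, here is a minimal POSIX sketch of a write to a file-backed mapping raising SIGBUS, and of recovering from it with sigsetjmp/siglongjmp. It is not the code from this MR: guardedWrite and demoSigbusRecovery are hypothetical names, and shrinking the file with ftruncate is only a stand-in for the out-of-space CoW case, chosen because it removes the backing storage in a way that is easy to reproduce.

```cpp
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

// Jump target for the handler. A single global keeps the sketch short,
// but it also means this is not thread-safe.
static sigjmp_buf g_recoverPoint;

static void onSigbus(int)
{
    // Leave the handler and resume at sigsetjmp() with a non-zero result.
    siglongjmp(g_recoverPoint, 1);
}

// Hypothetical helper: writes one byte to addr.
// Returns true if the write went through, false if it raised SIGBUS.
bool guardedWrite(volatile char *addr, char value)
{
    struct sigaction sa = {};
    struct sigaction old = {};
    sa.sa_handler = onSigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    // Second argument 1: save the signal mask so SIGBUS is unblocked again
    // after jumping out of the handler.
    if (sigsetjmp(g_recoverPoint, 1) == 0) {
        *addr = value;
        sigaction(SIGBUS, &old, nullptr);
        return true;
    }
    sigaction(SIGBUS, &old, nullptr);
    return false;
}

// Maps a one-page temporary file, writes to it, shrinks the file to zero
// bytes and writes again. Returns true if the first write succeeded and the
// second one was stopped by SIGBUS and recovered.
bool demoSigbusRecovery()
{
    const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    char path[] = "/tmp/sigbus-sketch-XXXXXX";
    const int fd = mkstemp(path);
    if (fd < 0) {
        return false;
    }
    unlink(path); // the file lives on until fd is closed
    if (ftruncate(fd, static_cast<off_t>(page)) != 0) {
        close(fd);
        return false;
    }
    void *p = mmap(nullptr, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return false;
    }
    char *bytes = static_cast<char *>(p);
    const bool backedWriteOk = guardedWrite(bytes, 'x');
    const bool shrunk = (ftruncate(fd, 0) == 0); // the page loses its backing
    const bool unbackedWriteOk = guardedWrite(bytes, 'y');
    munmap(p, page);
    close(fd);
    return backedWriteOk && shrunk && !unbackedWriteOk;
}
```

Without the handler installed, the second write would kill the process with signal 7, which is the signal number seen in the logs above.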
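The fallback idea can be sketched as follows, assuming a plain POSIX environment. CacheMapping and mapFileOrAnonymous are hypothetical names used only for illustration; the actual mapping class in this MR may look quite different.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical result type for the sketch.
struct CacheMapping {
    void *addr = nullptr;    // start of the mapping, nullptr if everything failed
    std::size_t size = 0;    // length of the mapping in bytes
    bool fileBacked = false; // false means the anonymous fallback is in use
};

// Tries to map `size` bytes of the file behind `fd` as shared memory.
// If there is no usable file or the mapping fails, falls back to private
// anonymous memory, so the caller still gets a working (but unshared and
// non-persistent) cache.
CacheMapping mapFileOrAnonymous(int fd, std::size_t size)
{
    CacheMapping m;
    m.size = size;

    if (fd >= 0) {
        void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            m.addr = p;
            m.fileBacked = true;
            return m;
        }
    }

    void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED) {
        m.addr = p;
    }
    return m;
}
```

The trade-off is that an anonymous mapping is neither shared between processes nor kept across restarts, so the cache degrades to a per-process one instead of crashing.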

This MR contains:

  1. Implementation of KUnwind: a small wrapper around libunwind, plus a Windows implementation because libunwind doesn't provide one. It's needed to get the parent stack pointer for restoring the stack after longjmp (used in KSignalMonitor).
  2. Implementation of KSignaler: signals are a shared resource and there can be many users who want to use them, so this lets everyone use them without stepping on each other. I also added some notes on how the same approach can be used to handle other global events as well (e.g. xcb) that aren't related to any particular window.
  3. Implementation of KSignalMonitor: a handy class to handle/monitor signals in the same way as we use C++ exceptions. Basically this achieves cross-platform Structured Exception Handling (SEH).

Microsoft SEH is an MSVC-specific extension:

__try
{
  volatile int zero = 0; // a literal 1 / 0 is rejected at compile time
  auto r = 1 / zero;
}
__except(GetExceptionCode() == EXCEPTION_INT_DIVIDE_BY_ZERO ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
{
  throw DivByZero();
}

With KSignalMonitor:

#ifdef Q_OS_WIN
auto signal = EXCEPTION_INT_DIVIDE_BY_ZERO;
#else
auto signal = SIGFPE;
#endif

KSignalMonitor monitor(signal);
monitor.start([](KSignalMonitor::Signal signal) {
  throw DivByZero();
});

volatile int zero = 0; // a literal 1 / 0 is rejected or folded at compile time
auto r = 1 / zero;

  4. Made KSignalHandler use KSignaler.
  5. Extracted various parts of KSharedDataCache into separate files, as it was a huge file.
  6. Refactored KSharedDataCache::Private so that mmap handling lives in a separate class, KSDCMapping.
  7. Call ensureFileAllocated first, because file.resize only sets the file size; it doesn't check whether there's actually enough free space. Without this change, if the disk is full and you delete the cache, file.resize will extend the new file to the expected size but ensureFileAllocated will fail; on the next run file.size() >= size passes even though not enough disk space is allocated for the file, causing a failure later.
  8. Try to mlock the mapped region, because mmap doesn't guarantee that the pages are actually loaded; it can fail later, on first access.
  9. Monitor for SIGBUS and handle those cases by falling back to MAP_ANONYMOUS.
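
The ensureFileAllocated and mlock items in the list above boil down to an ordering of system calls. The sketch below shows that ordering under a few assumptions: posix_fallocate stands in for ensureFileAllocated, and PrepareResult and prepareCacheFile are hypothetical names that don't come from this MR.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical status record: which of the three steps succeeded.
struct PrepareResult {
    bool allocated = false; // blocks were reserved for the whole file
    bool mapped = false;    // the file was mapped
    bool locked = false;    // the pages were faulted in and pinned
};

// Prepares `size` bytes of the file behind `fd` for use as a cache and
// stores the mapping address in *outAddr (nullptr if mapping failed).
PrepareResult prepareCacheFile(int fd, std::size_t size, void **outAddr)
{
    PrepareResult r;
    *outAddr = nullptr;

    // 1. Reserve real disk blocks before trusting the file size. Merely
    //    resizing can produce a file whose size looks right although no
    //    space was allocated for it. posix_fallocate returns an error
    //    number directly (0 on success) instead of setting errno.
    if (posix_fallocate(fd, 0, static_cast<off_t>(size)) != 0) {
        return r;
    }
    r.allocated = true;

    // 2. Map the file only after the allocation succeeded.
    void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        return r;
    }
    r.mapped = true;
    *outAddr = p;

    // 3. mlock forces the pages to be faulted in now, so a problem shows up
    //    here as an error code rather than as a signal on first access.
    //    It can also fail just because of RLIMIT_MEMLOCK, so a caller may
    //    want to treat a failure as non-fatal.
    r.locked = (mlock(p, size) == 0);
    return r;
}
```

A caller would fall back to an anonymous mapping when allocated or mapped is false, and could decide separately what to do when only locked is false.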

To reproduce these issues:

$ dd if=/dev/zero of=/tmp/fs.img bs=120M count=1
$ mkfs.btrfs /tmp/fs.img
$ mkdir /tmp/full
$ sudo mount /tmp/fs.img /tmp/full
$ sudo chown $USER /tmp/full/
$ mkdir /tmp/full/.cache/
# To simulate 7th point
$ truncate -s 10547304 /tmp/full/.cache/icon-cache.kcache
# For 8th
# fallocate -l 10547304 /tmp/full/.cache/icon-cache.kcache
$ dd if=/dev/urandom of=/tmp/full/junk
$ HOME=/tmp/full konsole
# See the crash :P
$ rm -f /tmp/full/junk
$ HOME=/tmp/full konsole
# Without closing konsole
$ dd if=/dev/urandom of=/tmp/full/junk
# Now in konsole open menu items (eg. `Help -> About KDE`)
# And another crash :P
# To see real glory :)
$ dd if=/dev/urandom of=~/.cache/junk

With this MR the steps above no longer crash konsole, and it works very well. No matter what I tried, I couldn't get any more crashes related to KSharedDataCache. There are still other issues, but those are a separate story :D

Edited by Dāvis Mosāns
