
add coredumpd support

Harald Sitter requested to merge work/coredumpd-service-tango into master

This only does something when used with a new enough KCrash.

Coredumpd is a coredump handler that ships with systemd. When a process dumps its core, the core is sent to coredumpd, which records the crash in the systemd journal and stores the core on disk. This allows us to pick up the crash after the fact and file a bug report, for example when software crashes on session logout.

To facilitate bug reporting, KCrash writes the metadata we would ordinarily get through ARGV to disk as an INI file. Since we still want to support both modes of operation, this commit introduces a large amount of extra tooling specifically meant to connect coredumpd crashes to their metadata and to drkonqi argvs. All of this depends on systemd and generally works with version 245, but 248 is strongly recommended because of various refinements and bugfixes.
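
As a rough illustration of the ARGV-to-INI idea, the C++ sketch below writes and re-reads a tiny subset of INI. The `[KCrash]` group and the `appname`/`pid`/`signal` keys are assumptions for illustration only; the exact layout KCrash writes is not spelled out in this description.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory shape of a crash metadata file: group -> (key -> value).
using IniGroups = std::map<std::string, std::map<std::string, std::string>>;

// Write each group as a "[Group]" header followed by "key=value" lines.
std::string writeIni(const IniGroups &groups)
{
    std::ostringstream out;
    for (const auto &[group, entries] : groups) {
        out << '[' << group << "]\n";
        for (const auto &[key, value] : entries) {
            out << key << '=' << value << '\n';
        }
    }
    return out.str();
}

// Read back the subset written above (no quoting, escaping, or comments).
IniGroups readIni(const std::string &text)
{
    IniGroups groups;
    std::istringstream in(text);
    std::string line;
    std::string current;
    while (std::getline(in, line)) {
        if (line.empty()) {
            continue;
        }
        if (line.front() == '[' && line.back() == ']') {
            current = line.substr(1, line.size() - 2);
            groups[current]; // keep empty groups too
            continue;
        }
        const auto eq = line.find('=');
        if (eq != std::string::npos) {
            groups[current][line.substr(0, eq)] = line.substr(eq + 1);
        }
    }
    return groups;
}
```

A round trip such as `readIni(writeIni(meta)) == meta` is the property that matters here: whatever KCrash writes at crash time must be readable again by the tooling that later rebuilds drkonqi's arguments.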

Architecturally a coredumpd crash works like this:

KCrash

The app crashes and KCrash's signal handler runs. It writes the metadata to a file in ~/.cache, then re-raises the signal to trigger a core dump.
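
A minimal C++ sketch of that handler flow, under stated assumptions: the `kcrash-metadata/<pid>.ini` path and all function names are hypothetical, and the metadata directory is assumed to exist already. Because a signal handler may only call async-signal-safe functions, the path and file contents are prepared when the handler is installed, and the handler itself only uses `open`, `write`, `close`, `signal` and `raise`.

```cpp
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <signal.h>
#include <string>
#include <unistd.h>

// Buffers filled in at install time; the handler only reads them.
namespace {
char g_metadataPath[4096];
char g_metadataContent[4096];
std::size_t g_metadataLength = 0;
}

// Hypothetical location of the metadata file: <cache>/kcrash-metadata/<pid>.ini
std::string metadataPathFor(const std::string &cacheDir, pid_t pid)
{
    return cacheDir + "/kcrash-metadata/" + std::to_string(pid) + ".ini";
}

// Runs in signal context, so it sticks to async-signal-safe calls.
void crashHandler(int sig)
{
    const int fd = open(g_metadataPath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd >= 0) {
        const ssize_t written = write(fd, g_metadataContent, g_metadataLength);
        (void)written; // nothing sensible to do on failure while crashing
        close(fd);
    }
    // Re-raise with the default action restored; the process then terminates
    // with a core dump, which is what hands the crash over to coredumpd.
    signal(sig, SIG_DFL);
    raise(sig);
}

// Precompute everything, then install the handler for common crash signals.
bool installCrashHandler(const std::string &cacheDir, const std::string &iniContent)
{
    const std::string path = metadataPathFor(cacheDir, getpid());
    if (path.size() >= sizeof(g_metadataPath) || iniContent.size() > sizeof(g_metadataContent)) {
        return false;
    }
    std::memcpy(g_metadataPath, path.c_str(), path.size() + 1);
    std::memcpy(g_metadataContent, iniContent.data(), iniContent.size());
    g_metadataLength = iniContent.size();
    for (int sig : {SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGABRT}) {
        signal(sig, crashHandler);
    }
    return true;
}
```

The key design point is the split between install time (allocations, string formatting) and crash time (raw syscalls on preformatted buffers), since the heap may already be corrupted when the handler runs.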

coredumpd

The kernel invokes coredumpd, which captures the core and records the crash, with all the metadata it has available (proc maps, pid, time, etc.), to journald. It does this by invoking an instance of systemd-coredump@.service.
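
For coredumpd to see the crash at all, the kernel's `core_pattern` must pipe cores to systemd-coredump. systemd installs a pattern of the form `|/usr/lib/systemd/systemd-coredump ...`, though the exact path and arguments vary by distribution and version. The hypothetical helper below sketches how one might check that from C++; it is not part of the tooling this merge request describes.

```cpp
#include <fstream>
#include <string>

// Read the first line of the kernel's core_pattern setting.
std::string readCorePattern(const std::string &path = "/proc/sys/kernel/core_pattern")
{
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}

// A pattern starting with '|' makes the kernel run that program and stream
// the core to its stdin. Check whether that program is systemd-coredump.
bool pipesToSystemdCoredump(const std::string &pattern)
{
    if (pattern.empty() || pattern.front() != '|') {
        return false;
    }
    const auto end = pattern.find_first_of(" \t", 1);
    const std::string program = pattern.substr(1, end == std::string::npos ? std::string::npos : end - 1);
    const auto slash = program.rfind('/');
    const std::string name = slash == std::string::npos ? program : program.substr(slash + 1);
    return name == "systemd-coredump";
}
```

Typical use would be `pipesToSystemdCoredump(readCorePattern())`; if it returns false, the coredumpd path cannot work and only the classic KCrash mode applies.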

drkonqi-coredump-processor@.service

This is wanted by systemd-coredump@.service and instantiated with the same instance name as systemd-coredump@, which is what allows us to find the correct crash. The processor connects to journald and searches for (or waits for) the crash record of the matching systemd-coredump@ instance to appear in the journal. Once the crash record has been found, it opens a connection to a user-scope socket...
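
The matching step can be pictured as a filter over journal entries. This sketch assumes entries are already available as field maps and matches on the coredump message ID (`fc2e22bc6ee647b6b90729ab34a250b1`, systemd's `SD_MESSAGE_COREDUMP`) plus an `_SYSTEMD_UNIT` of `systemd-coredump@<instance>.service`; whether the real processor matches on exactly these fields is an assumption. A real implementation would more likely use libsystemd's sd-journal API (`sd_journal_add_match`, `sd_journal_wait`) than a vector.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// A journal entry reduced to its fields, e.g. {"COREDUMP_PID", "42"}.
using JournalEntry = std::map<std::string, std::string>;

// Message ID systemd-coredump uses for its crash records.
const std::string kCoredumpMessageId = "fc2e22bc6ee647b6b90729ab34a250b1";

// Find the crash record written by systemd-coredump@<instance>.service.
std::optional<JournalEntry> findCrashForInstance(const std::vector<JournalEntry> &entries,
                                                 const std::string &instance)
{
    const std::string unit = "systemd-coredump@" + instance + ".service";
    for (const auto &entry : entries) {
        const auto id = entry.find("MESSAGE_ID");
        const auto source = entry.find("_SYSTEMD_UNIT");
        if (id != entry.end() && id->second == kCoredumpMessageId
            && source != entry.end() && source->second == unit) {
            return entry;
        }
    }
    return std::nullopt; // not in the journal yet; the processor would keep waiting
}
```

Sharing the instance name is what makes this lookup unambiguous: several crashes may be in flight at once, but each processor instance only ever looks for the record of its own twin coredump instance.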

drkonqi-coredump-launcher.socket

This is a user-scope socket that exists purely for drkonqi-coredump-processor@.service to talk to. When a connection is opened, an instance of drkonqi-coredump-launcher@.service is spun up to handle the traffic.
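
Spawning one launcher instance per connection corresponds to a socket unit with `Accept=yes`; that setting is inferred from the `@` template described above, not quoted from the unit files. With socket activation, systemd passes the connected socket starting at file descriptor 3 and describes it through the `LISTEN_PID` and `LISTEN_FDS` environment variables. A real service would simply call `sd_listen_fds()`; the dependency-free sketch below reimplements that check for illustration.

```cpp
#include <stdexcept>
#include <string>
#include <sys/types.h>

// systemd passes activated sockets starting at this descriptor (SD_LISTEN_FDS_START).
constexpr int kListenFdsStart = 3;

// Same check sd_listen_fds() performs: the descriptors are only ours if
// LISTEN_PID names this process; LISTEN_FDS is how many start at fd 3.
int listenFdCount(const char *listenPid, const char *listenFds, pid_t self)
{
    if (listenPid == nullptr || listenFds == nullptr) {
        return 0;
    }
    try {
        if (std::stol(listenPid) != static_cast<long>(self)) {
            return 0;
        }
        const int count = std::stoi(listenFds);
        return count > 0 ? count : 0;
    } catch (const std::logic_error &) { // invalid_argument or out_of_range
        return 0;
    }
}
```

In the launcher one would call `listenFdCount(getenv("LISTEN_PID"), getenv("LISTEN_FDS"), getpid())` and, with `Accept=yes`, read the single connection from `kListenFdsStart`.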

drkonqi-coredump-launcher@.service

This is the actual launcher service. It is socket-activated from system scope: over the socket it receives the crash metadata streamed from the system-level processor, which eliminates the need to talk to journald again (the processor forwards the data it already looked up).
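
The wire format between processor and launcher is not specified here, so the following C++ sketch assumes a hypothetical one: one `FIELD=value` line per journal field until the connection closes. The field names used when trying it (`COREDUMP_PID`, `COREDUMP_EXE`, `COREDUMP_SIGNAL`) are ones systemd-coredump records in the journal.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical wire format: one "FIELD=value" pair per line, until EOF.
// Journal values may contain newlines, so a real protocol would need proper
// framing (length prefixes, JSON, ...) to carry those safely.
std::map<std::string, std::string> parseCrashStream(std::istream &in)
{
    std::map<std::string, std::string> fields;
    std::string line;
    while (std::getline(in, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos || eq == 0) {
            continue; // skip lines without a field name
        }
        fields[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return fields;
}
```

In the launcher the stream would come from the activated socket; standard C++ has no stream type over raw file descriptors, so the sketch accepts any `std::istream`.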

The launcher then glues the coredumpd metadata into the same file as the KCrash metadata, turning the .ini file into a comprehensive record of the crash.
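
Gluing the two sources together can be sketched as a map merge. Putting the journal fields under a separate, hypothetically named `[Journal]` group keeps them from clobbering KCrash keys; the group name and layout are assumptions, not the actual file format.

```cpp
#include <map>
#include <string>

using IniGroups = std::map<std::string, std::map<std::string, std::string>>;

// Keep the KCrash groups untouched and add the journal fields under their
// own group, so one file describes the whole crash.
IniGroups mergeCrashRecord(IniGroups kcrashMetadata,
                           const std::map<std::string, std::string> &journalFields,
                           const std::string &groupName = "Journal")
{
    auto &group = kcrashMetadata[groupName];
    for (const auto &[key, value] : journalFields) {
        group[key] = value;
    }
    return kcrashMetadata;
}
```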

Once the file is complete, it forks drkonqi with the same arguments as though KCrash had invoked it directly, so the user can file a crash report.
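
Reconstructing the command line could look like the sketch below. It assumes, hypothetically, that each key in the KCrash group maps one-to-one onto a `--key value` option of drkonqi; the real option set and mapping may differ. The second helper produces the null-terminated `char *` array that `execvp` expects in the forked child.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical mapping: every key of the KCrash group becomes "--key value";
// a key with an empty value becomes a bare flag.
std::vector<std::string> drkonqiArguments(const std::map<std::string, std::string> &kcrashGroup,
                                          const std::string &program = "drkonqi")
{
    std::vector<std::string> args{program};
    for (const auto &[key, value] : kcrashGroup) {
        args.push_back("--" + key);
        if (!value.empty()) {
            args.push_back(value);
        }
    }
    return args;
}

// execvp() wants a null-terminated array of char*; the strings must outlive it.
std::vector<char *> toArgv(std::vector<std::string> &args)
{
    std::vector<char *> argv;
    argv.reserve(args.size() + 1);
    for (auto &arg : args) {
        argv.push_back(arg.data());
    }
    argv.push_back(nullptr);
    return argv;
}
```

In the forked child this would be followed by `execvp(argv[0], argv.data())`, so drkonqi cannot tell whether KCrash or the launcher started it.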

Drkonqi

Drkonqi itself has grown a new CoredumpBackend analogous to the KCrash backend. Its main concern is preparing the core for tracing. Depending on the systemd version, that is either delegated to coredumpctl (the CLI for coredumpd) or partially done on our end. In either case coredumpctl is a runtime requirement, so that we do not have to concern ourselves with where coredumpd actually stores a core (it could be compressed, on disk, or in the journal).
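
The version-dependent split can be sketched as follows. It assumes the systemd version is read from the first line of `systemctl --version` (e.g. `systemd 248 (248.3-1)`) and picks between `coredumpctl debug`, which runs a debugger on the core itself, and `coredumpctl dump --output=FILE`, which extracts the core for us to trace. The exact arguments CoredumpBackend passes are not stated in this description, so treat the command vectors as illustrative.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Parse the major version from a line like "systemd 248 (248.3-1)".
std::optional<int> parseSystemdVersion(const std::string &firstLine)
{
    const std::string prefix = "systemd ";
    if (firstLine.rfind(prefix, 0) != 0) { // i.e. !starts_with(prefix)
        return std::nullopt;
    }
    try {
        return std::stoi(firstLine.substr(prefix.size()));
    } catch (const std::logic_error &) {
        return std::nullopt;
    }
}

// Hypothetical strategy choice: on 248+ let coredumpctl drive the debugger;
// otherwise extract the core to a file that we trace ourselves.
std::vector<std::string> coredumpctlCommand(int systemdVersion, const std::string &pid,
                                            const std::string &coreFile)
{
    if (systemdVersion >= 248) {
        return {"coredumpctl", "debug", "--debugger=gdb", pid};
    }
    return {"coredumpctl", "dump", "--output=" + coreFile, pid};
}
```

Either way the core is located by coredumpctl, which is why the backend never needs to know whether it was stored compressed, on disk, or in the journal.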

gdbrc now also supports the coredump backend by extending the command-line templates with core-based tracing (used by the coredump backend only). As a side effect, debuggers can now use a corefile template variable, which holds the path to the on-disk core file when the legacy (pre-248) coredumpd code path is used. coredumpd 248 and newer allows us to invoke gdb through coredumpctl directly, eliminating the need to faff about with core files manually on our end.
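
Template expansion of this kind can be sketched as a single left-to-right scan that replaces `%name` placeholders from a lookup table. Only `corefile` comes from the description; `execpath` and the `%` placeholder syntax are assumptions for illustration, and unknown placeholders are deliberately left untouched.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Expand "%name" placeholders from a variable table. Reading the longest run
// of identifier characters avoids prefix clashes such as %core vs %corefile.
std::string expandTemplate(const std::string &tmpl, const std::map<std::string, std::string> &vars)
{
    std::string out;
    std::size_t i = 0;
    while (i < tmpl.size()) {
        if (tmpl[i] != '%') {
            out += tmpl[i++];
            continue;
        }
        std::size_t j = i + 1;
        while (j < tmpl.size() && (std::isalnum(static_cast<unsigned char>(tmpl[j])) || tmpl[j] == '_')) {
            ++j;
        }
        const auto it = vars.find(tmpl.substr(i + 1, j - i - 1));
        if (it != vars.end()) {
            out += it->second;
        } else {
            out.append(tmpl, i, j - i); // keep "%unknown" verbatim
        }
        i = j;
    }
    return out;
}
```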

Everything else stays the same. As far as the UI is concerned, nothing changes between the KCrash backend and the coredump backend.

The presence of a metadata file currently doubles as a "this crash has not been dealt with" indicator. As such, metadata files are only cleaned up once the user interacts with drkonqi in some way to discard the dialog. This is meant to assist future development of "an application has crashed in the past" style behavior (e.g. when apps crashed on logout).
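
Treating file presence as state keeps the bookkeeping very small, as this C++ sketch with `std::filesystem` shows. The directory layout and the `.ini` extension are assumptions, and `writeMetadata` is a hypothetical helper that only exists so the sketch can be tried out; in the real flow KCrash and the launcher write these files.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Helper for trying the sketch; in the real flow KCrash/the launcher write these.
void writeMetadata(const fs::path &file, const std::string &content)
{
    std::ofstream out(file);
    out << content;
}

// Every *.ini still present is a crash nobody has dealt with yet.
std::vector<fs::path> pendingCrashes(const fs::path &metadataDir)
{
    std::vector<fs::path> pending;
    std::error_code ec;
    for (const auto &entry : fs::directory_iterator(metadataDir, ec)) {
        if (entry.is_regular_file(ec) && entry.path().extension() == ".ini") {
            pending.push_back(entry.path());
        }
    }
    std::sort(pending.begin(), pending.end());
    return pending;
}

// Discarding the dialog marks the crash as handled by deleting its file.
bool markHandled(const fs::path &metadataFile)
{
    std::error_code ec;
    return fs::remove(metadataFile, ec);
}
```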

drkonqi-coredump-cleanup.{service,timer}

This is a cleanup system for crashes that fall through the cracks and never get their metadata files cleaned up. It is largely a stop-gap measure because this commit does not deal with actually picking up crashes that happened at logout; that requires additional UI engineering first.
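
The cleanup pass can be sketched as an age-based sweep over the metadata directory that the timer would run periodically. The maximum age, directory, and function names are hypothetical; the description does not state what threshold the real cleanup uses. Files are collected first and removed afterwards, so the sweep does not depend on how the directory iterator behaves when entries disappear mid-scan.

```cpp
#include <chrono>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// A file is stale once it has not been modified for longer than maxAge.
bool isStale(fs::file_time_type modified, fs::file_time_type now, std::chrono::hours maxAge)
{
    return now - modified > maxAge;
}

// Delete stale *.ini metadata files; returns how many were removed.
int cleanupStaleMetadata(const fs::path &metadataDir, std::chrono::hours maxAge)
{
    const auto now = fs::file_time_type::clock::now();
    std::vector<fs::path> stale;
    std::error_code ec;
    for (const auto &entry : fs::directory_iterator(metadataDir, ec)) {
        if (!entry.is_regular_file(ec) || entry.path().extension() != ".ini") {
            continue;
        }
        const auto modified = fs::last_write_time(entry.path(), ec);
        if (!ec && isStale(modified, now, maxAge)) {
            stale.push_back(entry.path());
        }
    }
    // Remove only after iterating, so the directory is never mutated mid-scan.
    int removed = 0;
    for (const auto &path : stale) {
        if (fs::remove(path, ec)) {
            ++removed;
        }
    }
    return removed;
}
```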

Edited by Harald Sitter
