when I say "do code completion" I also mean actually writing some code like a. to trigger a partial invalidation
How much of the lag in re-parsing the file after a change is actually due to the code completion feature, and is there any way to speed up that process?
FWIW, I'm a fast but not a very good blind-typist so most of the time I watch my fingers and don't use the code completion feature at all (also because it happens too often that a new choice inserts itself right at the place of the one I selected, after I selected it).
@rjvbb PCH files used by the build system have nothing to do with the preambles used for kdevelop's language analysis or code completion
I know that, but I understood the question as being about additional forms of precompiling, which could easily generate the same kind of issues.
Igor Kushnir wrote on 20230309::10:13:45 re: "Re: KDevelop | Draft: Clean up Clang's temporary files on KDevelop start (!283 (merged))"
Does anyone know how KDevelop benefits from the precompiled preambles and what regressions are possible when they are missing?
All I can tell is that I typically deactivate precompiled header files in projects where I need KDevelop's parser. And also when I use ccache, btw, because it's too tricky to get the two to co-operate optimally.
Luigi Toscano wrote on 20230309::00:45:49 re: "Re: QtCurve | Process and install the translation files if requested. (6951b91d)"
Hi,
Do you think it would make sense to backport this to the 1.9 branch, which is still tracked as stable i18n branch?
I don't see why not. There's so little development going on with QtCurve that it's probably a very simple backport too.
René J.V. Bertin (6951b91d) at 13 Feb 17:27
Process and install the translation files if requested.
Igor Kushnir wrote on 20230206::17:17:29 re: "Re: KDevelop | Draft: Clean up Clang's temporary files on KDevelop start (!283 (merged))"
In order to prevent this directory proliferation, tests can use a common temporary directory /tmp/kdevelop-1000/tests, which is not removed on start
I haven't yet looked in detail but this suggests you do not just use the simple approach of unlinking the entire temp dir on start (or on a clean exit)?
Igor Kushnir wrote on 20230204::17:17:59 re: "Re: KDevelop | Draft: Clean up Clang's temporary files on KDevelop start (!283 (merged))"
There is no cross-platform way to get the real user name.
FWIW, I did mean the user name (login name), not the real name.
Igor Kushnir wrote on 20230203::08:42:38 re: "Re: KDevelop | Draft: Clean up Clang's temporary files on KDevelop start (!283 (merged))"
@rjvbb, how do you get the user ID?
I haven't really bothered with non-UNIX systems (I guess only MS Windows?) so I simply use getuid(). I think even MSWin has a POSIX compatibility library so I wouldn't be surprised if the function (or an equivalent) exists there too.
I also didn't research whether session IDs are unique, but I would keep the possibility in mind that people might figure out how to share session directories (beyond them being on a shared/collaborative volume) so it seems wise to use more than just the session ID.
User names are also supposed to be unique; maybe there's a cross-platform way to get at that information and use it instead of a meaningless UUID? Not that it's crucial but I like for this kind of path name to be human-readable (and have often cursed the fact KDevelop's session and cache directories aren't).
Besides, how common is the scenario where multiple users are logged in a system and run KDevelop simultaneously?
Probably not at all, and certainly not on MSWin.
Igor Kushnir wrote on 20230127::16:02:28 re: "Re: KDevelop | Draft: Clean up Clang's temporary files on KDevelop start (!283 (merged))"
Properly copying file permissions of the original temporary directory to the session-specific temporary directory would require detecting and applying the sticky bit.
Of course. Doh, for some reason the term "sticky" always has me thinking that it causes new entries to be created with the permissions of the parent...
No way to get or set the sticky bit using [Qt API]
KDevelop is not restricted to using Qt APIs only, is it? Also, on which directory would you want to set the sticky bit?
I imagine restricting the permissions to the owner can break processes that intentionally share temporary files with processes that run as other users.
I can't imagine that this "is still a thing", to be honest; there are cleaner ways to do this.
Perhaps KDevelop should use the permissions of the original temporary directory to create the session-specific temporary directory?
That would definitely get my vote.
Do you know why the permissions of KDevelop's temporary directory should allow only owner access? My system /tmp, which is used by all processes, is readable, writable and executable by everyone.
I don't think it is that crucial for KDevelop and the files it stores in that directory. Worst case scenario would be that someone else on your computer sees some snippet of code you're working on. It might be more strict for helper apps that also get to use that same directory: we can't foresee what kind of helpers KDevelop users might use and how sensitive information in the temp files those helpers create is. Named pipes, sockets and mmapped files in the temp directory should probably also better be hidden from others (maybe it's possible to use mmapped files as an attack vector??)
That rings a bell :))
I cannot remember if I delved into the QTemporaryDir implementation to see if it could be forked and adapted, but I do see this now in my own patch:
+ // make the directory exclusive to us.
+ QFile::setPermissions(tmpLocation, QFileDevice::ReadOwner | QFileDevice::WriteOwner | QFileDevice::ExeOwner);
Separating this step from creating the directory could create a race condition where someone might get naughty with the just-created directory in the split second that it still has its default permissions. There are probably ways to prevent that - setting exclusive permissions on the tmpdir's parent temporarily for instance. IIRC you can define the default permissions for new directory entries via an env. variable but that's probably not cross-platform enough.
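One way to avoid the window altogether on POSIX systems is to pass the restrictive mode to mkdir(2) itself, so the directory never exists with wider permissions. The following is only a sketch under that assumption, not what the patch or QTemporaryDir does; createPrivateDir is a made-up name. (The process umask can only remove bits from the requested mode, never add any.)

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <string>

// Hypothetical helper: create `path` as a directory only its owner can use.
// The mode is applied by mkdir(2) itself, so there is no interval in which
// the directory exists with the wider default permissions.
bool createPrivateDir(const std::string& path)
{
    if (::mkdir(path.c_str(), S_IRWXU) == 0) {
        return true;
    }
    if (errno != EEXIST) {
        return false;
    }
    // The directory is left over from a previous run: accept it only if it
    // is a real directory (not a symlink), owned by us, and closed to
    // group and others.
    struct stat st{};
    if (::lstat(path.c_str(), &st) != 0) {
        return false;
    }
    return S_ISDIR(st.st_mode)
        && st.st_uid == ::geteuid()
        && (st.st_mode & (S_IRWXG | S_IRWXO)) == 0;
}
```

The EEXIST branch matters for a directory name that is identical across restarts: a stale directory with the right owner and mode can be reused, anything else is refused.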
Igor Kushnir wrote on 20230115::19:08:32 re: "Re: KDevelop | Draft: Clean up Clang's temporary files on KDevelop start (!283 (merged))"
Someone has to check this for all files placed by KDevelop and its dependencies in the temporary directory.
I thought this was about clang's temp. files?!
Igor Kushnir wrote on 20230115::15:43:17 re: "Re: KDevelop | Draft: Clean up Clang's temporary files on KDevelop start (!283 (merged))"
I mean the issue is that the application/library that owns the files may rely on them remaining visible until it cleans them on exit.
That also means that the devs of said application/library can know exactly to what extent these files can be unlinked without unwanted side-effects.
Unix guarantees that all operations on the file content remain possible through the file descriptor or pointer you already have (including seeking in the file), as long as you don't close that descriptor.
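To make that concrete, here is a small POSIX-only sketch (the function name is made up for the illustration): it creates a temp file, unlinks it straight away, and then still writes, seeks and reads through the descriptor it kept.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdlib>
#include <string>

// Hypothetical demo: write `payload` to a file that has already been
// unlinked, seek back and read it again through the same descriptor.
// Returns what was read back, or an empty string on error.
std::string roundTripThroughUnlinkedFile(const std::string& payload)
{
    char path[] = "/tmp/unlink-demo-XXXXXX";
    int fd = ::mkstemp(path);
    if (fd < 0) {
        return {};
    }
    // Remove the directory entry; the inode stays alive while fd is open.
    if (::unlink(path) != 0) {
        ::close(fd);
        return {};
    }

    std::string result;
    const ssize_t written = ::write(fd, payload.data(), payload.size());
    if (written == static_cast<ssize_t>(payload.size())
        && ::lseek(fd, 0, SEEK_SET) == 0) {
        result.resize(payload.size());
        const ssize_t n = ::read(fd, &result[0], result.size());
        result.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
    }
    ::close(fd); // only now is the storage actually released
    return result;
}
```

The catch is the one already mentioned: this only works for code that keeps the descriptor open and never tries to reopen the file by name.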
R.
Igor Kushnir wrote on 20230115::15:16:04 re: "Re: KDevelop | Draft: Clean up Clang's temporary files on KDevelop start (!283 (merged))"
What risk?
That some files should remain visible in the filesystem, for whatever reason. Anything can happen if you remove random application/library-owned files while it is running.
Quickly: no files should remain visible if that method uses unlink(2), as I think it does, and handles errors from that system call appropriately.
The question is what happens if temp files are created after the tmpdir removal - in my implementation their creation should fail because the directory is no longer there (and that failure appears not to be a problem).
Igor Kushnir wrote on 20230115::10:26:27 re: "Re: KDevelop | Draft: Clean up Clang's temporary files on KDevelop start (!283 (merged))"
BTW, the tmpdir is called kdevelop-tmp-<uid>-<sessionUUID> so it's unique for each user and session but identical across restarts.
The user ID should be included in the path here before this is merged.
uid = user ID ;)
The risk of m_tmpDir->removeRecursively(); (that some files should remain visible) does not justify the benefit.
What risk?
Most temporary files created by KDevelop are tiny and can well stay on disk until the next time the session is opened.
Remember, this is not about KDevelop's temp. files. Individual pch files are rarely large, but there can be a lot of them. I think I contributed a project re-parse function (in the context menu when you right-click on a project in the side-bar); as I said, pch files created during that operation get unlinked with my KDevelop build, does this also happen for you? If not you might end up with hundreds of 40 MB files - and I'd call that using up a lot of temp space. (Of course that would happen each time KDevelop considers a full reparse necessary.)
My analysis of llvm-project/clang/lib/Frontend/PrecompiledPreamble.cpp shows that such unlinking isn't straightforward for the preamble files, e.g. because of PrecompiledPreamble::getSize(), which checks the file size at the preamble file path.
My guess would be that this information is cached somewhere at a system level, even after unlinking the file. Good question, I'll have to remember to look into that! (I'm a bit surprised also that clang would have to rely on the filesystem to know the size of a file it wrote itself but I suppose it can be the cheap way out.)
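For what it's worth, the plain POSIX behaviour can be checked with a few lines: after unlink() a lookup by path fails with ENOENT, while fstat() on a descriptor that is still open keeps reporting the size. The sketch below only demonstrates that distinction (probeSizeAfterUnlink and SizeProbe are names invented here); it says nothing about whether clang's getSize() could be switched to a descriptor-based query.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Result of the experiment below (hypothetical type for this sketch).
struct SizeProbe {
    bool pathLookupFailed;       // stat(path) gave ENOENT after unlink()
    long long sizeViaDescriptor; // st_size from fstat(fd), -1 on error
};

// Write `len` bytes to a temp file, unlink it, then ask for its size
// once by path and once through the still-open descriptor.
SizeProbe probeSizeAfterUnlink(const char* data, std::size_t len)
{
    SizeProbe result{false, -1};
    char path[] = "/tmp/size-probe-XXXXXX";
    int fd = ::mkstemp(path);
    if (fd < 0) {
        return result;
    }
    if (::write(fd, data, len) != static_cast<ssize_t>(len)) {
        ::unlink(path);
        ::close(fd);
        return result;
    }
    ::unlink(path);

    struct stat st{};
    // By path: the directory entry is gone, so this is expected to fail.
    result.pathLookupFailed = (::stat(path, &st) != 0 && errno == ENOENT);
    // By descriptor: the inode is still there and knows its size.
    if (::fstat(fd, &st) == 0) {
        result.sizeViaDescriptor = static_cast<long long>(st.st_size);
    }
    ::close(fd);
    return result;
}
```

So any code that re-reads the size from the path would see an error after an early unlink, which fits the concern quoted above.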
Very few users kill KDevelop
You mean that very few users are bitten by KDevelop's habit of hanging on exit? ;) Still happens often to me, to the point that I have another hack that raises a SIGHUP 1 min. after the final core cleanup step before handing off execution to the global destruction phase. My temp. dir is long gone by then, of course.
I'd rather avoid creating/modifying environment variables altogether to avoid potential issues. This would be a last-resort approach.
I just observed that clang already uses $TMPDIR in a relevant way. Checking an additional variable should be a cheap change - I'm assuming that preamble file names are generated in a single location and that the full file name itself is stored in a variable. Surely reading out such a variable properly (with a mutex of some sort) cannot be a bottleneck for clang's performance.
I am not prepared to spend so much time on implementing something that could be useful, but also could fail, have downsides and cause regressions.
Far from me to suggest it'd have to be you ;) It does seem like a nice student/SoC/similar project to analyse the question and draft an implementation. I seem to recall Milian once had the idea of having a centralised parser that's not session-specific; seems this would be a first step towards that. But it's a different topic so let's just leave it at that.
R.
I am not very interested in OOM prevention, because I have ample swap space. Swapping out is slow, so there is ample time to kill the buggy process that consumes too much memory.
The keyword here being "buggy". Do we want to kill (or quit) KDevelop if, say, by mistake we opened a file that causes RAM usage to peak? Knowing that the next time we open the session that file might be reopened - currently I often do restart a session if I notice it had been closed with ObjC/++ files open which I am not interested in at the moment.
Having ample swap space is great, but unless things have changed Linux isn't better at reclaiming swap (as reported by swapon -s) than e.g. Darwin is.
which gets unlinked on exit, btw
This is interesting. Could you elaborate?
I simply do m_tmpDir->removeRecursively(); in the CorePrivate dtor, before clearing the sessionController(). This is safe on Unix-like OSes because a file that is still open is not actually deleted; only its directory entry is removed. It's an old Unix trick to mark (open) files as volatile. You could do this with any opened file (that you don't want to be able to close and then reopen) immediately after creating it, in which case it'll get cleaned up even in case of a hard crash, but evidently you can't do it with the session-specific tmpdir itself.
BTW, the tmpdir is called kdevelop-tmp-<uid>-<sessionUUID>
so it's unique for each user and session but identical across restarts.
If the per-session temporary directory can be reliably removed when KDevelop is killed or crashes
Killing with a SIGTERM, SIGINT or SIGHUP is safe and a priori my implementation will remove the tmpdir reliably in that case. Crashes are different of course. The only (almost) fail-safe way I see to do cleanup in that case would be to delegate the action to a simple dedicated helper command which gets launched immediately after creating the tmpdir, waits until KDevelop has exited and then does a rm -rf of the specified directory. It's quite possible that that's also the most cross-platform solution.
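A rough POSIX sketch of such a watcher, with all names invented for the illustration: the IDE keeps the write end of a pipe open for its whole lifetime, and the watcher blocks on the read end. When the IDE goes away for any reason (clean exit, crash, SIGKILL) the kernel closes the write end, the watcher sees EOF and removes the directory. For brevity this forks without exec'ing a separate command, which a real implementation in a multi-threaded application would probably want to avoid.

```cpp
#include <ftw.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <string>

// nftw() callback: remove one entry. With FTW_DEPTH, directories are
// visited after their contents, so they are empty by then.
static int removeEntry(const char* path, const struct stat*, int, struct FTW*)
{
    return std::remove(path);
}

// Block until every write end of the pipe is closed, then remove `dir`
// recursively. Returns 0 on success.
int waitThenRemove(int readFd, const std::string& dir)
{
    char buf;
    for (;;) {
        const ssize_t n = ::read(readFd, &buf, 1);
        if (n == 0) break;                   // EOF: the writer is gone
        if (n < 0 && errno != EINTR) break;  // unexpected error: give up waiting
    }
    return ::nftw(dir.c_str(), removeEntry, 16, FTW_DEPTH | FTW_PHYS);
}

// Start the watcher. Returns the write end of the pipe, which the caller
// must simply keep open until it exits, or -1 on failure.
int spawnCleanupWatcher(const std::string& dir, pid_t* watcherPid = nullptr)
{
    int fds[2];
    if (::pipe(fds) != 0) {
        return -1;
    }
    const pid_t pid = ::fork();
    if (pid < 0) {
        ::close(fds[0]);
        ::close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        ::close(fds[1]); // the watcher must not hold the write end itself
        ::_exit(waitThenRemove(fds[0], dir) == 0 ? 0 : 1);
    }
    ::close(fds[0]);
    if (watcherPid) {
        *watcherPid = pid;
    }
    return fds[1];
}
```

Child processes that inherit the write end would keep the watcher waiting until they exit too, which may or may not be what one wants for helpers that use the same tmpdir.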
If files can be reliably removed as well, libclang could even remove the preamble files itself thus eliminating any need for the temporary directory option.
As I said above, this should be possible on any Unix variant as long as those files are opened and closed only once - this is also why the file removal function from libc is called unlink()... I'm not aware that you can achieve the same thing the same way on MSWin but I wouldn't be surprised if that OS has a way of opening/creating files where you tell it that it should be discarded whenever it is closed. And is this whole topic (or at least the LLVM patch) even completely relevant for MSWin?
If libclang can support TMPDIR it should also be able to support a dedicated variable without any additional problems related to multithreading.
This is not the case, because the dedicated variable has to be global and can be modified by any thread at any time. The environment variables are currently read each time a temporary directory location is requested. The value is never cached globally.
I wasn't clear enough, evidently I meant supporting a dedicated environmental variable, read out the same way as TMPDIR!
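Roughly what I have in mind, as a sketch only: the lookup reads the environment on every call, exactly like the TMPDIR handling described above, and just checks one more name first. KDEV_PREAMBLE_TMPDIR is a made-up variable name, not an existing libclang or LLVM setting, and preambleTempDir() is not an actual LLVM function.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical variable name, invented for this sketch.
constexpr const char* kPreambleDirVar = "KDEV_PREAMBLE_TMPDIR";

// Resolve the directory for preamble files. Nothing is cached: the
// environment is consulted each time the location is requested.
std::string preambleTempDir()
{
    for (const char* name : {kPreambleDirVar, "TMPDIR"}) {
        const char* value = std::getenv(name);
        if (value != nullptr && *value != '\0') {
            return value;
        }
    }
    return "/tmp"; // assumed last-resort default for the sketch
}
```

Since there is no global state on the reader's side, the threading situation is the same as for TMPDIR today: the only hazard is some other thread calling setenv() concurrently.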
How would this look if KDevelop ran its parser in a separate daemon process like I think Qt Creator does?
I don't even want to think about this possibility, because implementing it would most likely be too time-consuming to justify the effort.
I'd agree with that if it were only for cleaning up tempfiles. But I presume it would make the caching of precompiled information more efficient and thus almost inevitably also speed up parsing of files. Plus the IDE no longer goes down when libclang crashes, which is a BIG plus for me. Anyway, I was just asking. There has been a discussion about the amount of work it'd be. I wasn't completely convinced back then but also not motivated enough to figure out the API myself so I'll just leave it at this.
The .pch.tmp files are most likely stored by the same code as the .pch files.
They're undoubtedly renamed to or moved over the previous pch file of the same name. The latter thing probably never happens with pch files generated by/for KDevelop's parser but it seems likely that the code generating those files is also used for generating pre-compiled header files.
The ones created by KDevelop itself can be more or less easily found in the source code. Though build system, compiler and user-executable child processes can also create files in /tmp...
Exactly, the finding can be as easy as those files having specific names or being created through a dedicated function. The nice thing with a session-specific tmpdir is of course that a priori all the temp files related to the given session are created in there. And for those who want, it can just as easily be on a tmpfs volume.
R.
@rjvbb, thanks for your opinion and experience. My comment above doesn't recommend unconditionally storing these temporary files in RAM, only considers this possibility, so the aggressiveness of your reply is misplaced.
I wasn't intending to be aggressive, nor aware that I was.
I haven't been following this issue closely, I was just reacting to the possibility that KDevelop's memory usage might grow (even more). In fact, I may have missed quite a bit of KDevelop development because I've been assuming it was more or less halted since I never got any notifications about new branches after the 5.6.2 release... I'm also not exactly interested in doing any LLVM development (takes hours to build even on my fastest current machine), but I do wonder if keeping a pch format file in memory is the most efficient approach, especially if there are doubts:
Are the preamble files rarely referenced by libclang and thus preferable to swap out first?
I do see that they don't compress very well with cheap compressors like LZ4.
I experience RAM shortage myself sometimes with my meager 16 GB. Thankfully placing a swap file on an SSD prevents complete freezes during heavy swapping. I keep /tmp on tmpfs, because the RAM is rarely used up. According to the tmpfs Wikipedia page, GNU/Linux implements some out-of-memory preventions.
But Linux will also, by default, pretend that RAM allocations never fail, so I wonder how that affects the tmpfs OOM preventions!
Though I think swapping out anything from RAM to disk, including the preamble files, is unfortunate and best avoided.
Wouldn't it be more appropriate for this kind of resource to keep storing it as a file, and mmap it? If that's not what is already being done, of course.
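For reference, this is the kind of thing I mean, as a generic POSIX sketch (MappedFile, mapReadOnly and unmapFile are names made up here, and I have not checked what libclang actually does with its preamble files). The pages of a read-only file mapping are backed by the file itself, so under memory pressure the kernel can drop them and re-read them later instead of writing them to swap.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical view of a file mapped into memory.
struct MappedFile {
    const char* data = nullptr;
    std::size_t size = 0;
};

// Map the whole file read-only. Returns an empty MappedFile on failure
// or when the file has no content.
MappedFile mapReadOnly(const char* path)
{
    MappedFile mapped;
    const int fd = ::open(path, O_RDONLY);
    if (fd < 0) {
        return mapped;
    }
    struct stat st{};
    if (::fstat(fd, &st) == 0 && st.st_size > 0) {
        const std::size_t length = static_cast<std::size_t>(st.st_size);
        void* p = ::mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            mapped.data = static_cast<const char*>(p);
            mapped.size = length;
        }
    }
    ::close(fd); // the mapping stays valid after the descriptor is closed
    return mapped;
}

// Release the mapping and reset the view.
void unmapFile(MappedFile& mapped)
{
    if (mapped.data != nullptr) {
        ::munmap(const_cast<char*>(mapped.data), mapped.size);
    }
    mapped = MappedFile{};
}
```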
@rjvbb, if you prefer some of the approaches over others strongly enough to do some of the implementation work yourself,
I'm just not that masochistic ;) and not unhappy I've managed not to have my brains hooked to a computer all day.
FWIW, libclang already respects TMPDIR, and I have been using a patch for the past 3 years or so that makes KDevelop create its own subdirectory under $TMPDIR (and then change the env.var to reflect that specific path). IIRC that patch was considered problematic because it also affects all helper applications which recognise TMPDIR. While true that has never caused an actual problem for me and I have no issue personally with the idea that those helpers use the same session-specific tmpdir (which gets unlinked on exit, btw).
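In outline the patch amounts to something like the sketch below. This is plain POSIX rather than the Qt code from the actual patch, the function names are invented, and the /tmp fallback is an assumption; the directory name follows the kdevelop-tmp-<uid>-<sessionUUID> pattern.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cstdlib>
#include <string>

// Build <base>/kdevelop-tmp-<uid>-<sessionId>.
std::string sessionTempPath(const std::string& base, uid_t uid,
                            const std::string& sessionId)
{
    return base + "/kdevelop-tmp-" + std::to_string(uid) + "-" + sessionId;
}

// Create the session directory under the current $TMPDIR and point TMPDIR
// at it, so that libclang and any helper that honours TMPDIR put their
// temp files in there. Returns the new path, or an empty string on failure.
std::string enterSessionTempDir(const std::string& sessionId)
{
    const char* env = std::getenv("TMPDIR");
    const std::string base = (env != nullptr && *env != '\0') ? env : "/tmp";
    const std::string dir = sessionTempPath(base, ::getuid(), sessionId);

    // Owner-only from the start; an existing directory from a previous
    // run of the same session is reused as-is in this sketch.
    if (::mkdir(dir.c_str(), S_IRWXU) != 0 && errno != EEXIST) {
        return {};
    }
    if (::setenv("TMPDIR", dir.c_str(), 1) != 0) {
        return {};
    }
    return dir;
}
```

This has to run early, before any parser threads exist, because changing the environment is not safe while other threads may be reading it.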
If libclang can support TMPDIR it should also be able to support a dedicated variable without any additional problems related to multithreading.
Just out of curiosity: there is currently 1 pch file that gets created per parsed file opened in the editor (those are not cleaned up AFAICT) plus a number of pch files related to the number of threads used for the full-project background parser (which for some reason do get cleaned up or at least unlinked ... but that could be one of my own hacks). There's probably a lot of redundancy in those files, which isn't something one typically wants for an in-RAM representation. How would this look if KDevelop ran its parser in a separate daemon process like I think Qt Creator does?
A relevant question: has anyone ever seen some files other than preamble-*.pch stored in the temporary directory by libclang?
There are .pch.tmp files that appear when the parser is running but other than that: how would one tell them apart from files created by KDevelop itself?
R.
/tmp is on tmpfs, i.e. in RAM, by default on GNU/Linux (or is this distro-specific?)
This is definitely distro-specific IMHO. In my case /tmp sits on a dedicated ZFS dataset on a HDD, $TMPDIR idem but on a small work SSD.
, so for most users there would be no difference.
Doesn't tmpfs have some form of disk-based backup/swap strategy that is maybe a bit less generic than using actual swap memory?
However, a system can have insufficient RAM
Define sufficient? I don't do huge projects but there are times where even a small source file causes clang to create a few gigabytes of tempfile(s) - presumably thanks to the number of header files being loaded (and it gets even worse when you use ObjC/++). I may be a bit of a dinosaur here, but I still think that an IDE should be able to work with an amount of RAM that isn't at least 5x the minimum amount of RAM specified by the distro (or to be more specific: 8 GB should be amply enough).
But maybe the team doesn't see a problem making KDevelop an IDE that's "not for small projects/players"?
R.
PS: disk space will always be cheaper, more plentiful and easier to increase than RAM!