 21 Jan, 2021 1 commit


Script Kiddy authored

 18 Jan, 2021 1 commit


Script Kiddy authored

 14 Jan, 2021 1 commit


Script Kiddy authored

 13 Jan, 2021 1 commit


Script Kiddy authored

 12 Jan, 2021 1 commit


Script Kiddy authored

 11 Jan, 2021 2 commits



Script Kiddy authored

 05 Jan, 2021 1 commit


Milian Wolff authored
It will become a required dependency in the future, but isn't yet.

 25 Dec, 2020 1 commit


Denis Khalikov authored
BuildID-based standalone debug info might be ignored for some shared objects. See details at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86198

 14 Dec, 2020 1 commit


Milian Wolff authored
Thanks Nuno

 01 Dec, 2020 1 commit


Arran Cudbard-Bell authored
Looks like the KDE homebrew tap maintainers switched hosting about 5 months back and started depending on more upstream formulae. This fixes the tap command and updates the names of the dependencies.

 26 Nov, 2020 7 commits


Milian Wolff authored
Use a monotonic_buffer_resource to show what we can gain by using it for this purpose:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::slist':

        5.526,48 msec task-clock:u               # 1,000 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
         910.944      page-faults:u              # 0,165 M/sec
  20.395.572.533      cycles:u                   # 3,691 GHz                     (83,34%)
      58.593.916      stalled-cycles-frontend:u  # 0,29% frontend cycles idle    (83,34%)
  15.917.051.157      stalled-cycles-backend:u   # 78,04% backend cycles idle    (83,33%)
  22.396.075.369      instructions:u             # 1,10 insn per cycle
                                                 # 0,71 stalled cycles per insn  (83,34%)
   4.567.250.732      branches:u                 # 826,431 M/sec                 (83,34%)
      21.088.217      branch-misses:u            # 0,46% of all branches         (83,32%)

     5,527112576 seconds time elapsed
     4,724403000 seconds user
     0,780738000 seconds sys

 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::pmr::slist':

        4.096,87 msec task-clock:u               # 1,000 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
         910.958      page-faults:u              # 0,222 M/sec
  15.097.193.259      cycles:u                   # 3,685 GHz                     (83,30%)
      30.032.956      stalled-cycles-frontend:u  # 0,20% frontend cycles idle    (83,30%)
  12.890.862.885      stalled-cycles-backend:u   # 85,39% backend cycles idle    (83,32%)
   9.589.139.418      instructions:u             # 0,64 insn per cycle
                                                 # 1,34 stalled cycles per insn  (83,38%)
   1.683.224.838      branches:u                 # 410,857 M/sec                 (83,38%)
      19.650.364      branch-misses:u            # 1,17% of all branches         (83,31%)

     4,097475027 seconds time elapsed
     3,366684000 seconds user
     0,714428000 seconds sys
```

Memory consumption is pretty much unaffected, but that's obviously OK:

```
slist:     Maximum resident set size (kbytes): 3649904
pmr::list: Maximum resident set size (kbytes): 3649888
```

Milian Wolff authored

Milian Wolff authored
This has a significant impact on the vector-based trees, as we now need to do one full pass over the tree to set the parents. Now, the list-based trees are already significantly better:

```
for tag in QVector std::vector std::list boost::slist; do perf stat ~/projects/build/heaptrack/tests/benchmarks/bench_tree $tag; done
8, 40223304

 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree QVector':

        7.258,90 msec task-clock:u               # 1,000 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
       1.066.324      page-faults:u              # 0,147 M/sec
  28.597.338.274      cycles:u                   # 3,940 GHz                     (83,30%)
     106.681.727      stalled-cycles-frontend:u  # 0,37% frontend cycles idle    (83,33%)
  18.253.436.626      stalled-cycles-backend:u   # 63,83% backend cycles idle    (83,34%)
  35.021.338.418      instructions:u             # 1,22 insn per cycle
                                                 # 0,52 stalled cycles per insn  (83,35%)
   6.910.244.080      branches:u                 # 951,968 M/sec                 (83,34%)
      24.486.655      branch-misses:u            # 0,35% of all branches         (83,33%)

     7,259623426 seconds time elapsed
     6,407177000 seconds user
     0,823973000 seconds sys

8, 40223304

 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std::vector':

        6.559,15 msec task-clock:u               # 1,000 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
         913.372      page-faults:u              # 0,139 M/sec
  25.178.246.000      cycles:u                   # 3,839 GHz                     (83,31%)
      61.494.028      stalled-cycles-frontend:u  # 0,24% frontend cycles idle    (83,32%)
  18.412.909.524      stalled-cycles-backend:u   # 73,13% backend cycles idle    (83,35%)
  25.738.738.521      instructions:u             # 1,02 insn per cycle
                                                 # 0,72 stalled cycles per insn  (83,35%)
   5.149.522.578      branches:u                 # 785,090 M/sec                 (83,35%)
      23.133.623      branch-misses:u            # 0,45% of all branches         (83,32%)

     6,559888601 seconds time elapsed
     5,804386000 seconds user
     0,730870000 seconds sys

8, 40223304

 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std::list':

        6.328,67 msec task-clock:u               # 1,000 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
       1.068.075      page-faults:u              # 0,169 M/sec
  23.138.038.923      cycles:u                   # 3,656 GHz                     (83,31%)
      57.658.887      stalled-cycles-frontend:u  # 0,25% frontend cycles idle    (83,31%)
  18.076.221.406      stalled-cycles-backend:u   # 78,12% backend cycles idle    (83,31%)
  22.337.549.206      instructions:u             # 0,97 insn per cycle
                                                 # 0,81 stalled cycles per insn  (83,34%)
   4.534.944.282      branches:u                 # 716,572 M/sec                 (83,36%)
      21.234.052      branch-misses:u            # 0,47% of all branches         (83,35%)

     6,329363185 seconds time elapsed
     5,384786000 seconds user
     0,908987000 seconds sys

8, 40223304

 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::slist':

        5.377,76 msec task-clock:u               # 1,000 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
         910.954      page-faults:u              # 0,169 M/sec
  20.828.436.359      cycles:u                   # 3,873 GHz                     (83,32%)
      31.529.423      stalled-cycles-frontend:u  # 0,15% frontend cycles idle    (83,32%)
  16.674.025.296      stalled-cycles-backend:u   # 80,05% backend cycles idle    (83,32%)
  22.416.653.118      instructions:u             # 1,08 insn per cycle
                                                 # 0,74 stalled cycles per insn  (83,32%)
   4.560.100.713      branches:u                 # 847,956 M/sec                 (83,38%)
      21.152.807      branch-misses:u            # 0,46% of all branches         (83,34%)

     5,378376956 seconds time elapsed
     4,596761000 seconds user
     0,761155000 seconds sys
```

Milian Wolff authored
This one looks very promising: it is already at the performance of the current best container, std::vector, but will allow us to easily use a PMR container in the future:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::slist':

        5.455,84 msec task-clock:u               # 1,000 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
         910.951      page-faults:u              # 0,167 M/sec
  20.295.488.575      cycles:u                   # 3,720 GHz                     (83,32%)
      33.927.249      stalled-cycles-frontend:u  # 0,17% frontend cycles idle    (83,34%)
  16.038.938.738      stalled-cycles-backend:u   # 79,03% backend cycles idle    (83,34%)
  22.319.974.139      instructions:u             # 1,10 insn per cycle
                                                 # 0,72 stalled cycles per insn  (83,34%)
   4.564.927.240      branches:u                 # 836,705 M/sec                 (83,34%)
      21.123.736      branch-misses:u            # 0,46% of all branches         (83,32%)

     5,456638203 seconds time elapsed
     4,656348000 seconds user
     0,777383000 seconds sys
```

Memory:

```
Maximum resident set size (kbytes): 3649960
```

Milian Wolff authored
As-is, this lies somewhere in between std::vector and QVector:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std::list':

        6.227,71 msec task-clock:u               # 1,000 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
       1.068.071      page-faults:u              # 0,172 M/sec
  23.672.061.063      cycles:u                   # 3,801 GHz                     (83,33%)
     142.641.882      stalled-cycles-frontend:u  # 0,60% frontend cycles idle    (83,33%)
  17.758.938.284      stalled-cycles-backend:u   # 75,02% backend cycles idle    (83,33%)
  22.212.698.253      instructions:u             # 0,94 insn per cycle
                                                 # 0,80 stalled cycles per insn  (83,33%)
   4.492.589.760      branches:u                 # 721,387 M/sec                 (83,33%)
      21.193.689      branch-misses:u            # 0,47% of all branches         (83,33%)

     6,228371570 seconds time elapsed
     5,293329000 seconds user
     0,909763000 seconds sys
```

Peak RSS:

```
Maximum resident set size (kbytes): 4278300
```

Milian Wolff authored
This is, contrary to what I thought, significantly faster than using QVector here:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std':

        5.657,58 msec task-clock                 # 1,000 CPUs utilized
              30      context-switches           # 0,005 K/sec
               0      cpu-migrations             # 0,000 K/sec
         913.370      page-faults                # 0,161 M/sec
  24.925.687.516      cycles                     # 4,406 GHz                     (83,30%)
   1.071.926.434      stalled-cycles-frontend    # 4,30% frontend cycles idle    (83,35%)
  15.463.080.632      stalled-cycles-backend     # 62,04% backend cycles idle    (83,35%)
  29.942.331.671      instructions               # 1,20 insn per cycle
                                                 # 0,52 stalled cycles per insn  (83,35%)
   5.935.680.160      branches                   # 1049,154 M/sec                (83,35%)
      24.781.895      branch-misses              # 0,42% of all branches         (83,30%)

     5,658346811 seconds time elapsed
     4,769389000 seconds user
     0,867054000 seconds sys
```

Also, it uses less peak memory:

```
Maximum resident set size (kbytes): 3659356
```

Milian Wolff authored
This is basically the status quo used in heaptrack. Current timings:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree qt':

        6.389,58 msec task-clock                 # 1,000 CPUs utilized
              11      context-switches           # 0,002 K/sec
               0      cpu-migrations             # 0,000 K/sec
       1.066.317      page-faults                # 0,167 M/sec
  27.562.650.647      cycles                     # 4,314 GHz                     (83,33%)
   1.112.140.389      stalled-cycles-frontend    # 4,03% frontend cycles idle    (83,33%)
  15.580.172.700      stalled-cycles-backend     # 56,53% backend cycles idle    (83,33%)
  39.147.522.858      instructions               # 1,42 insn per cycle
                                                 # 0,40 stalled cycles per insn  (83,33%)
   7.680.291.744      branches                   # 1202,002 M/sec                (83,33%)
      25.106.869      branch-misses              # 0,33% of all branches         (83,34%)

     6,390189396 seconds time elapsed
     5,500982000 seconds user
     0,863794000 seconds sys
```

Peak RSS:

```
Maximum resident set size (kbytes): 4271420
```

 17 Nov, 2020 14 commits


Milian Wolff authored
This reverts commit fadba34c. It is actually slower to use std::timed_mutex here. Since the sleeping in the timer thread isn't too important, it's better to optimize the common case of writing data. With std::mutex:

```
heaptrack analyze "/home/milian/projects/build/heaptrack/heaptrack.threaded.21113.zst"

 Performance counter stats for 'heaptrack ./tests/manual/threaded' (5 runs):

        8.903,06 msec task-clock                 # 17,911 CPUs utilized           ( +- 1,29% )
         314.336      context-switches           # 0,035 M/sec                    ( +- 1,50% )
          40.313      cpu-migrations             # 0,005 M/sec                    ( +- 13,18% )
          49.464      page-faults                # 0,006 M/sec                    ( +- 0,53% )
  36.517.191.099      cycles                     # 4,102 GHz                      ( +- 1,42% )  (83,37%)
  24.890.585.818      stalled-cycles-frontend    # 68,16% frontend cycles idle    ( +- 1,60% )  (84,13%)
     926.640.652      stalled-cycles-backend     # 2,54% backend cycles idle      ( +- 2,28% )  (83,55%)
   9.228.118.514      instructions               # 0,25 insn per cycle
                                                 # 2,70 stalled cycles per insn   ( +- 1,97% )  (83,08%)
   1.951.426.273      branches                   # 219,186 M/sec                  ( +- 2,14% )  (83,48%)
      21.883.712      branch-misses              # 1,12% of all branches          ( +- 1,32% )  (82,39%)

     0,49707 +- 0,00592 seconds time elapsed  ( +- 1,19% )
```

With std::timed_mutex:

```
       10.452,71 msec task-clock                 # 18,617 CPUs utilized           ( +- 0,31% )
         384.406      context-switches           # 0,037 M/sec                    ( +- 0,55% )
          49.914      cpu-migrations             # 0,005 M/sec                    ( +- 13,92% )
          49.685      page-faults                # 0,005 M/sec                    ( +- 0,38% )
  42.967.520.168      cycles                     # 4,111 GHz                      ( +- 0,51% )  (82,96%)
  29.172.821.383      stalled-cycles-frontend    # 67,90% frontend cycles idle    ( +- 0,51% )  (83,39%)
   1.069.232.909      stalled-cycles-backend     # 2,49% backend cycles idle      ( +- 1,60% )  (83,91%)
  10.450.752.968      instructions               # 0,24 insn per cycle
                                                 # 2,79 stalled cycles per insn   ( +- 1,47% )  (83,18%)
   2.225.628.518      branches                   # 212,924 M/sec                  ( +- 1,50% )  (83,15%)
      24.390.495      branch-misses              # 1,10% of all branches          ( +- 0,75% )  (83,41%)

     0,56145 +- 0,00924 seconds time elapsed  ( +- 1,65% )
```

Milian Wolff authored

Milian Wolff authored

Milian Wolff authored
Before:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst' (5 runs):

        7.077,40 msec task-clock                 # 1,142 CPUs utilized            ( +- 0,35% )
             845      context-switches           # 0,119 K/sec                    ( +- 2,37% )
             245      cpu-migrations             # 0,035 K/sec                    ( +- 5,21% )
         139.577      page-faults                # 0,020 M/sec                    ( +- 0,27% )
  30.284.047.519      cycles                     # 4,279 GHz                      ( +- 0,22% )  (83,44%)
     750.013.909      stalled-cycles-frontend    # 2,48% frontend cycles idle     ( +- 1,05% )  (83,24%)
   6.383.953.436      stalled-cycles-backend     # 21,08% backend cycles idle     ( +- 1,17% )  (83,19%)
  57.113.121.471      instructions               # 1,89 insn per cycle
                                                 # 0,11 stalled cycles per insn   ( +- 0,05% )  (83,30%)
  11.452.771.446      branches                   # 1618,217 M/sec                 ( +- 0,02% )  (83,40%)
     209.271.140      branch-misses              # 1,83% of all branches          ( +- 0,17% )  (83,42%)

     6,1972 +- 0,0240 seconds time elapsed  ( +- 0,39% )
```

After:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':

        6.144,14 msec task-clock                 # 1,156 CPUs utilized
             611      context-switches           # 0,099 K/sec
             167      cpu-migrations             # 0,027 K/sec
         173.538      page-faults                # 0,028 M/sec
  26.075.118.232      cycles                     # 4,244 GHz                     (83,44%)
     542.631.671      stalled-cycles-frontend    # 2,08% frontend cycles idle    (83,40%)
   6.013.468.979      stalled-cycles-backend     # 23,06% backend cycles idle    (83,12%)
  50.576.695.591      instructions               # 1,94 insn per cycle
                                                 # 0,12 stalled cycles per insn  (83,32%)
  10.046.690.454      branches                   # 1635,166 M/sec                (83,35%)
     179.839.003      branch-misses              # 1,79% of all branches         (83,38%)

     5,316701146 seconds time elapsed
     5,790727000 seconds user
     0,338084000 seconds sys
```

Milian Wolff authored
Should be faster, but I don't really care how much in this utility.

Milian Wolff authored
Before:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst ./heaptrack.kdevelop.189071.zst':

       10.913,06 msec task-clock                 # 1,317 CPUs utilized
             948      context-switches           # 0,087 K/sec
             279      cpu-migrations             # 0,026 K/sec
         224.389      page-faults                # 0,021 M/sec
  46.901.681.727      cycles                     # 4,298 GHz                     (83,43%)
     704.668.201      stalled-cycles-frontend    # 1,50% frontend cycles idle    (83,33%)
  14.905.704.111      stalled-cycles-backend     # 31,78% backend cycles idle    (83,23%)
 108.365.697.169      instructions               # 2,31 insn per cycle
                                                 # 0,14 stalled cycles per insn  (83,30%)
  19.402.933.947      branches                   # 1777,956 M/sec                (83,35%)
     222.289.116      branch-misses              # 1,15% of all branches         (83,36%)

     8,285154704 seconds time elapsed
    10,577763000 seconds user
     0,313054000 seconds sys
```

After:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst ./heaptrack.kdevelop.189071.zst':

       10.372,46 msec task-clock                 # 1,321 CPUs utilized
             865      context-switches           # 0,083 K/sec
             229      cpu-migrations             # 0,022 K/sec
         222.593      page-faults                # 0,021 M/sec
  45.087.440.140      cycles                     # 4,347 GHz                     (83,38%)
     596.167.950      stalled-cycles-frontend    # 1,32% frontend cycles idle    (83,23%)
  14.191.678.571      stalled-cycles-backend     # 31,48% backend cycles idle    (83,22%)
 108.176.660.565      instructions               # 2,40 insn per cycle
                                                 # 0,13 stalled cycles per insn  (83,40%)
  19.375.195.741      branches                   # 1867,946 M/sec                (83,40%)
     223.596.938      branch-misses              # 1,15% of all branches         (83,37%)

     7,850580978 seconds time elapsed
    10,029832000 seconds user
     0,319581000 seconds sys
```

Milian Wolff authored
Before:

```
allocated vector: 73376
allocated input pointers: 320000000
freed input pointers: 224
begin actual benchmark: 224
pointers added: 320000000 (100% overhead)
pointers removed: 73600
trimmed: 73600

 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointerhash':

        9.160,03 msec task-clock                 # 1,000 CPUs utilized
              26      context-switches           # 0,003 K/sec
               0      cpu-migrations             # 0,000 K/sec
         250.563      page-faults                # 0,027 M/sec
  40.159.559.869      cycles                     # 4,384 GHz                     (83,33%)
     340.503.937      stalled-cycles-frontend    # 0,85% frontend cycles idle    (83,33%)
  36.881.695.646      stalled-cycles-backend     # 91,84% backend cycles idle    (83,33%)
  11.942.611.820      instructions               # 0,30 insn per cycle
                                                 # 3,09 stalled cycles per insn  (83,33%)
   2.468.955.121      branches                   # 269,536 M/sec                 (83,33%)
      10.938.236      branch-misses              # 0,44% of all branches         (83,35%)

     9,160886661 seconds time elapsed
     8,911365000 seconds user
     0,225871000 seconds sys
```

After:

```
allocated vector: 73376
allocated input pointers: 320000000
freed input pointers: 224
begin actual benchmark: 224
pointers added: 224 (7e-05% overhead)
pointers removed: 73600
trimmed: 73600

 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointerhash':

        3.281,53 msec task-clock                 # 1,000 CPUs utilized
               6      context-switches           # 0,002 K/sec
               0      cpu-migrations             # 0,000 K/sec
         490.237      page-faults                # 0,149 M/sec
  14.705.858.311      cycles                     # 4,481 GHz                     (83,27%)
   1.799.042.842      stalled-cycles-frontend    # 12,23% frontend cycles idle   (83,35%)
   8.573.087.340      stalled-cycles-backend     # 58,30% backend cycles idle    (83,36%)
  11.080.704.070      instructions               # 0,75 insn per cycle
                                                 # 0,77 stalled cycles per insn  (83,36%)
   2.223.987.372      branches                   # 677,728 M/sec                 (83,36%)
      36.702.866      branch-misses              # 1,65% of all branches         (83,30%)

     3,282066401 seconds time elapsed
     2,802599000 seconds user
     0,465220000 seconds sys
```

Note how bench_pointerhash is now seemingly faster and uses less memory than bench_pointermap! But blindly replacing one with the other shows much worse performance in heaptrack, so it seems the benchmarks aren't reflecting real-world usage currently.

Milian Wolff authored
Before:

```
allocated vector: 73376
allocated input pointers: 320000000
freed input pointers: 224
begin actual benchmark: 224
pointers added: 62208848 (19.4403% overhead)
pointers removed: 96448
trimmed: 96448

 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointermap':

        5.685,95 msec task-clock                 # 1,000 CPUs utilized
               9      context-switches           # 0,002 K/sec
               0      cpu-migrations             # 0,000 K/sec
         114.357      page-faults                # 0,020 M/sec
  25.672.059.553      cycles                     # 4,515 GHz                     (83,33%)
     416.801.612      stalled-cycles-frontend    # 1,62% frontend cycles idle    (83,33%)
  17.952.673.581      stalled-cycles-backend     # 69,93% backend cycles idle    (83,33%)
  12.248.196.258      instructions               # 0,48 insn per cycle
                                                 # 1,47 stalled cycles per insn  (83,33%)
   2.341.601.571      branches                   # 411,822 M/sec                 (83,34%)
      95.859.176      branch-misses              # 4,09% of all branches         (83,35%)

     5,686546406 seconds time elapsed
     5,510546000 seconds user
     0,146093000 seconds sys
```

After:

```
allocated vector: 73376
allocated input pointers: 320000000
freed input pointers: 224
begin actual benchmark: 224
pointers added: 60645536 (18.9517% overhead)
pointers removed: 96448
trimmed: 96448

 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointermap':

        4.906,87 msec task-clock                 # 1,000 CPUs utilized
               9      context-switches           # 0,002 K/sec
               0      cpu-migrations             # 0,000 K/sec
         115.556      page-faults                # 0,024 M/sec
  21.930.029.620      cycles                     # 4,469 GHz                     (83,31%)
     314.571.970      stalled-cycles-frontend    # 1,43% frontend cycles idle    (83,31%)
  14.492.419.080      stalled-cycles-backend     # 66,08% backend cycles idle    (83,31%)
  12.059.692.467      instructions               # 0,55 insn per cycle
                                                 # 1,20 stalled cycles per insn  (83,35%)
   2.287.570.344      branches                   # 466,197 M/sec                 (83,37%)
      95.750.624      branch-misses              # 4,19% of all branches         (83,36%)

     4,907463968 seconds time elapsed
     4,749223000 seconds user
     0,132838000 seconds sys
```

Milian Wolff authored
Upstream: https://github.com/Tessil/robin-map. Not using a git submodule, but we may rethink that in the future. This is a much faster hash map which we can leverage in multiple places in heaptrack. It is MIT licensed, so it should be fine to use here.

Milian Wolff authored
This requires boost 1.60, which was released roughly 5 years ago, so I hope this is fine for everyone. Eventually we can replace it all with std::pmr, once we depend on C++17. The impact is small but noticeable. Before:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst' (5 runs):

        7.339,25 msec task-clock                 # 1,153 CPUs utilized            ( +- 1,98% )
             860      context-switches           # 0,117 K/sec                    ( +- 9,56% )
             242      cpu-migrations             # 0,033 K/sec                    ( +- 8,87% )
         139.946      page-faults                # 0,019 M/sec                    ( +- 0,33% )
  31.016.368.103      cycles                     # 4,226 GHz                      ( +- 1,36% )  (83,45%)
     549.091.116      stalled-cycles-frontend    # 1,77% frontend cycles idle     ( +- 5,87% )  (83,25%)
   7.195.865.845      stalled-cycles-backend     # 23,20% backend cycles idle     ( +- 1,28% )  (83,19%)
  58.379.008.232      instructions               # 1,88 insn per cycle
                                                 # 0,12 stalled cycles per insn   ( +- 0,05% )  (83,31%)
  11.802.836.750      branches                   # 1608,180 M/sec                 ( +- 0,01% )  (83,40%)
     209.444.455      branch-misses              # 1,77% of all branches          ( +- 0,33% )  (83,41%)

     6,365 +- 0,142 seconds time elapsed  ( +- 2,23% )
```

After:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst' (5 runs):

        7.077,40 msec task-clock                 # 1,142 CPUs utilized            ( +- 0,35% )
             845      context-switches           # 0,119 K/sec                    ( +- 2,37% )
             245      cpu-migrations             # 0,035 K/sec                    ( +- 5,21% )
         139.577      page-faults                # 0,020 M/sec                    ( +- 0,27% )
  30.284.047.519      cycles                     # 4,279 GHz                      ( +- 0,22% )  (83,44%)
     750.013.909      stalled-cycles-frontend    # 2,48% frontend cycles idle     ( +- 1,05% )  (83,24%)
   6.383.953.436      stalled-cycles-backend     # 21,08% backend cycles idle     ( +- 1,17% )  (83,19%)
  57.113.121.471      instructions               # 1,89 insn per cycle
                                                 # 0,11 stalled cycles per insn   ( +- 0,05% )  (83,30%)
  11.452.771.446      branches                   # 1618,217 M/sec                 ( +- 0,02% )  (83,40%)
     209.271.140      branch-misses              # 1,83% of all branches          ( +- 0,17% )  (83,42%)

     6,1972 +- 0,0240 seconds time elapsed  ( +- 0,39% )
```

Milian Wolff authored
Will allow me to experiment with PMR and a monotonic resource. Performance-wise it's pretty much the same as QHash, now that we simplified Symbol.

Milian Wolff authored
Basically, this allows us to operate on cheap ids instead of the fat strings. This approach has multiple advantages:

- Symbol is now just 8 bytes instead of 32 bytes
- Accordingly, RowData is down to 56 bytes from 80 bytes
- We can now experiment with PMR allocators and put a monotonic buffer resource into the new ResultData

There are two downsides to this approach:

- On the one hand, we now sort our data differently. Before, the data was sorted by the actual string contents, whereas now it's sorted by the string index. That shouldn't be a problem though, as we want to sort by actual metrics in most areas anyway. The only area where this is noticeable is the flame graph.
- On the other hand, we now do some things repeatedly. Most notably, we don't cache the string basename anymore. But that's not overly noticeable either, as we usually only need it for a handful of rows at any time. If needed, we can even cache this in the future.

The benchmark also shows some significant wins with this new approach. Most notably, compare the total runtime of ~6.82s and peak RSS of 555MB before with ~6.17s and 490MB after.

Before:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':

        7.825,54 msec task-clock:u               # 1,148 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
         155.988      page-faults:u              # 0,020 M/sec
  32.848.963.956      cycles:u                   # 4,198 GHz                     (83,22%)
     434.588.005      stalled-cycles-frontend:u  # 1,32% frontend cycles idle    (83,18%)
   6.992.025.659      stalled-cycles-backend:u   # 21,29% backend cycles idle    (83,36%)
  63.255.414.218      instructions:u             # 1,93 insn per cycle
                                                 # 0,11 stalled cycles per insn  (83,39%)
  12.941.138.245      branches:u                 # 1653,705 M/sec                (83,43%)
     225.630.395      branch-misses:u            # 1,74% of all branches         (83,41%)

     6,817015591 seconds time elapsed
     7,613402000 seconds user
     0,189718000 seconds sys

Command being timed: "/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst"
User time (seconds): 7.48
System time (seconds): 0.19
Percent of CPU this job got: 114%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.70
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 554496
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 156906
Voluntary context switches: 827
Involuntary context switches: 65
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

After:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':

        7.244,75 msec task-clock:u               # 1,174 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
         139.184      page-faults:u              # 0,019 M/sec
  29.982.409.942      cycles:u                   # 4,139 GHz                     (83,44%)
     542.168.522      stalled-cycles-frontend:u  # 1,81% frontend cycles idle    (83,22%)
   6.438.890.717      stalled-cycles-backend:u   # 21,48% backend cycles idle    (83,17%)
  57.743.166.637      instructions:u             # 1,93 insn per cycle
                                                 # 0,11 stalled cycles per insn  (83,40%)
  11.758.342.073      branches:u                 # 1623,016 M/sec                (83,40%)
     221.034.296      branch-misses:u            # 1,88% of all branches         (83,37%)

     6,170274615 seconds time elapsed
     6,975850000 seconds user
     0,249394000 seconds sys

Command being timed: "/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst"
User time (seconds): 6.73
System time (seconds): 0.27
Percent of CPU this job got: 116%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 490132
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 140792
Voluntary context switches: 647
Involuntary context switches: 110
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

Milian Wolff authored

Script Kiddy authored

 16 Nov, 2020 8 commits


Milian Wolff authored
Calling erase in a loop leads to quadratic behavior, as can be seen by the impact of this patch. Before:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser heaptrack.kdevelop.189071.zst heaptrack.kdevelop.189071.zst':

       75.307,69 msec task-clock                 # 0,944 CPUs utilized
           5.104      context-switches           # 0,068 K/sec
             376      cpu-migrations             # 0,005 K/sec
         196.391      page-faults                # 0,003 M/sec
 307.733.655.134      cycles                     # 4,086 GHz                     (83,34%)
   1.516.564.276      stalled-cycles-frontend    # 0,49% frontend cycles idle    (83,30%)
 111.558.209.335      stalled-cycles-backend     # 36,25% backend cycles idle    (83,30%)
 333.691.330.970      instructions               # 1,08 insn per cycle
                                                 # 0,33 stalled cycles per insn  (83,33%)
  37.612.802.347      branches                   # 499,455 M/sec                 (83,35%)
     243.957.169      branch-misses              # 0,65% of all branches         (83,38%)

    79,799304157 seconds time elapsed
    74,538971000 seconds user
     0,344263000 seconds sys
```

After:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser heaptrack.kdevelop.189071.zst heaptrack.kdevelop.189071.zst':

       11.540,35 msec task-clock                 # 1,310 CPUs utilized
           1.173      context-switches           # 0,102 K/sec
             309      cpu-migrations             # 0,027 K/sec
         195.605      page-faults                # 0,017 M/sec
  48.974.438.032      cycles                     # 4,244 GHz                     (83,40%)
     749.567.505      stalled-cycles-frontend    # 1,53% frontend cycles idle    (83,42%)
  17.760.476.929      stalled-cycles-backend     # 36,26% backend cycles idle    (83,06%)
 114.694.556.598      instructions               # 2,34 insn per cycle
                                                 # 0,15 stalled cycles per insn  (83,32%)
  20.690.990.493      branches                   # 1792,926 M/sec                (83,39%)
     236.844.837      branch-misses              # 1,14% of all branches         (83,41%)

     8,808486636 seconds time elapsed
    11,187237000 seconds user
     0,322839000 seconds sys
```

Milian Wolff authored
Instead of looping through the allocations at every timestamp, looking up whether to include each one in the chart, do this once and then remember the allocation indices that should be used. Before:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':

        9.912,44 msec task-clock:u               # 1,114 CPUs utilized
               0      context-switches:u         # 0,000 K/sec
               0      cpu-migrations:u           # 0,000 K/sec
         154.526      page-faults:u              # 0,016 M/sec
  41.541.932.930      cycles:u                   # 4,191 GHz                     (83,42%)
     650.025.055      stalled-cycles-frontend:u  # 1,56% frontend cycles idle    (83,31%)
   8.109.154.392      stalled-cycles-backend:u   # 19,52% backend cycles idle    (83,08%)
  70.928.385.225      instructions:u             # 1,71 insn per cycle
                                                 # 0,11 stalled cycles per insn  (83,36%)
  14.890.145.672      branches:u                 # 1502,168 M/sec                (83,39%)
     303.284.007      branch-misses:u            # 2,04% of all branches         (83,44%)

     8,898062968 seconds time elapsed
     9,584299000 seconds user
     0,294486000 seconds sys
```

After:

```
 Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':

        7.959,38 msec task-clock                 # 1,143 CPUs utilized
             963      context-switches           # 0,121 K/sec
             241      cpu-migrations             # 0,030 K/sec
         155.984      page-faults                # 0,020 M/sec
  34.013.707.752      cycles                     # 4,273 GHz                     (83,42%)
     944.704.460      stalled-cycles-frontend    # 2,78% frontend cycles idle    (83,14%)
   6.987.086.160      stalled-cycles-backend     # 20,54% backend cycles idle    (83,18%)
  64.337.973.769      instructions               # 1,89 insn per cycle
                                                 # 0,11 stalled cycles per insn  (83,42%)
  13.140.848.740      branches                   # 1650,989 M/sec                (83,42%)
     225.817.303      branch-misses              # 1,72% of all branches         (83,42%)

     6,960656687 seconds time elapsed
     7,704212000 seconds user
     0,232872000 seconds sys
```

Milian Wolff authored

Milian Wolff authored

Milian Wolff authored

Milian Wolff authored

Milian Wolff authored
It has exactly the semantics that we need here, no need for special functions that replicate it.

Milian Wolff authored
Replace lookup+insert with just an insert, and check whether the size changed to track whether we actually overwrote an item. Sadly, QSet doesn't have the nice API of std::unordered_set here, but otherwise it's faster. Also use SymbolId in one more place. Sadly, the performance impact isn't really noticeable.
