- 21 Jan, 2021 1 commit
Script Kiddy authored
- 18 Jan, 2021 1 commit
Script Kiddy authored
- 14 Jan, 2021 1 commit
Script Kiddy authored
- 13 Jan, 2021 1 commit
Script Kiddy authored
- 12 Jan, 2021 1 commit
Script Kiddy authored
- 11 Jan, 2021 2 commits
Script Kiddy authored
- 05 Jan, 2021 1 commit
Milian Wolff authored
It will become a required dependency in the future, but isn't yet.
- 25 Dec, 2020 1 commit
Denis Khalikov authored
Build-ID based standalone debug info might be ignored for some shared objects. See details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86198
- 14 Dec, 2020 1 commit
Milian Wolff authored
Thanks Nuno
- 01 Dec, 2020 1 commit
Arran Cudbard-Bell authored
Looks like the KDE homebrew tap maintainers switched hosting about 5 months back and started depending on more upstream formulae. This fixes the tap command and updates the names of the dependencies.
- 26 Nov, 2020 7 commits
Milian Wolff authored
Use a monotonic_buffer_resource to show what we can gain by using it for this purpose:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::slist':

  5.526,48 msec task-clock:u          # 1,000 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  910.944 page-faults:u               # 0,165 M/sec
  20.395.572.533 cycles:u             # 3,691 GHz (83,34%)
  58.593.916 stalled-cycles-frontend:u   # 0,29% frontend cycles idle (83,34%)
  15.917.051.157 stalled-cycles-backend:u  # 78,04% backend cycles idle (83,33%)
  22.396.075.369 instructions:u       # 1,10 insn per cycle # 0,71 stalled cycles per insn (83,34%)
  4.567.250.732 branches:u            # 826,431 M/sec (83,34%)
  21.088.217 branch-misses:u          # 0,46% of all branches (83,32%)

  5,527112576 seconds time elapsed
  4,724403000 seconds user
  0,780738000 seconds sys

Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::pmr::slist':

  4.096,87 msec task-clock:u          # 1,000 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  910.958 page-faults:u               # 0,222 M/sec
  15.097.193.259 cycles:u             # 3,685 GHz (83,30%)
  30.032.956 stalled-cycles-frontend:u   # 0,20% frontend cycles idle (83,30%)
  12.890.862.885 stalled-cycles-backend:u  # 85,39% backend cycles idle (83,32%)
  9.589.139.418 instructions:u        # 0,64 insn per cycle # 1,34 stalled cycles per insn (83,38%)
  1.683.224.838 branches:u            # 410,857 M/sec (83,38%)
  19.650.364 branch-misses:u          # 1,17% of all branches (83,31%)

  4,097475027 seconds time elapsed
  3,366684000 seconds user
  0,714428000 seconds sys
```

Memory consumption is pretty much unaffected, but that's obviously OK:

```
slist:     Maximum resident set size (kbytes): 3649904
pmr::list: Maximum resident set size (kbytes): 3649888
```
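The speedup comes from replacing one heap allocation per list node with bump allocation out of a growing buffer that is released all at once. A minimal sketch of the idea using std::pmr from C++17 (the benchmark above uses boost's pmr containers; the function name here is illustrative):

```cpp
#include <list>
#include <memory_resource>

// Build and sum a list whose nodes are bump-allocated from a monotonic
// buffer: each node allocation is essentially a pointer increment, and
// all memory is released in one go when `buffer` goes out of scope.
int sumWithMonotonicBuffer(int n)
{
    std::pmr::monotonic_buffer_resource buffer;
    std::pmr::list<int> nodes(&buffer); // stands in for boost::pmr::slist
    for (int i = 1; i <= n; ++i)
        nodes.push_back(i);
    int sum = 0;
    for (int value : nodes)
        sum += value;
    return sum;
}
```

Note that a monotonic resource never reuses freed nodes, which is exactly the trade shown above: same peak RSS, much less allocator work.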
Milian Wolff authored
Milian Wolff authored
This has a significant impact on the vector-based trees, as we now need to do one full run over the tree to set the parents. Now the list-based trees are already significantly better:

```
for tag in QVector std::vector std::list boost::slist; do
    perf stat ~/projects/build/heaptrack/tests/benchmarks/bench_tree $tag
done

8, 40223304

Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree QVector':

  7.258,90 msec task-clock:u          # 1,000 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  1.066.324 page-faults:u             # 0,147 M/sec
  28.597.338.274 cycles:u             # 3,940 GHz (83,30%)
  106.681.727 stalled-cycles-frontend:u  # 0,37% frontend cycles idle (83,33%)
  18.253.436.626 stalled-cycles-backend:u  # 63,83% backend cycles idle (83,34%)
  35.021.338.418 instructions:u       # 1,22 insn per cycle # 0,52 stalled cycles per insn (83,35%)
  6.910.244.080 branches:u            # 951,968 M/sec (83,34%)
  24.486.655 branch-misses:u          # 0,35% of all branches (83,33%)

  7,259623426 seconds time elapsed
  6,407177000 seconds user
  0,823973000 seconds sys

8, 40223304

Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std::vector':

  6.559,15 msec task-clock:u          # 1,000 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  913.372 page-faults:u               # 0,139 M/sec
  25.178.246.000 cycles:u             # 3,839 GHz (83,31%)
  61.494.028 stalled-cycles-frontend:u   # 0,24% frontend cycles idle (83,32%)
  18.412.909.524 stalled-cycles-backend:u  # 73,13% backend cycles idle (83,35%)
  25.738.738.521 instructions:u       # 1,02 insn per cycle # 0,72 stalled cycles per insn (83,35%)
  5.149.522.578 branches:u            # 785,090 M/sec (83,35%)
  23.133.623 branch-misses:u          # 0,45% of all branches (83,32%)

  6,559888601 seconds time elapsed
  5,804386000 seconds user
  0,730870000 seconds sys

8, 40223304

Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std::list':

  6.328,67 msec task-clock:u          # 1,000 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  1.068.075 page-faults:u             # 0,169 M/sec
  23.138.038.923 cycles:u             # 3,656 GHz (83,31%)
  57.658.887 stalled-cycles-frontend:u   # 0,25% frontend cycles idle (83,31%)
  18.076.221.406 stalled-cycles-backend:u  # 78,12% backend cycles idle (83,31%)
  22.337.549.206 instructions:u       # 0,97 insn per cycle # 0,81 stalled cycles per insn (83,34%)
  4.534.944.282 branches:u            # 716,572 M/sec (83,36%)
  21.234.052 branch-misses:u          # 0,47% of all branches (83,35%)

  6,329363185 seconds time elapsed
  5,384786000 seconds user
  0,908987000 seconds sys

8, 40223304

Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::slist':

  5.377,76 msec task-clock:u          # 1,000 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  910.954 page-faults:u               # 0,169 M/sec
  20.828.436.359 cycles:u             # 3,873 GHz (83,32%)
  31.529.423 stalled-cycles-frontend:u   # 0,15% frontend cycles idle (83,32%)
  16.674.025.296 stalled-cycles-backend:u  # 80,05% backend cycles idle (83,32%)
  22.416.653.118 instructions:u       # 1,08 insn per cycle # 0,74 stalled cycles per insn (83,32%)
  4.560.100.713 branches:u            # 847,956 M/sec (83,38%)
  21.152.807 branch-misses:u          # 0,46% of all branches (83,34%)

  5,378376956 seconds time elapsed
  4,596761000 seconds user
  0,761155000 seconds sys
```
Milian Wolff authored
This one looks very promising: it is already at the performance of the current best container, std::vector, but will allow us to easily use a PMR container in the future:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::slist':

  5.455,84 msec task-clock:u          # 1,000 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  910.951 page-faults:u               # 0,167 M/sec
  20.295.488.575 cycles:u             # 3,720 GHz (83,32%)
  33.927.249 stalled-cycles-frontend:u   # 0,17% frontend cycles idle (83,34%)
  16.038.938.738 stalled-cycles-backend:u  # 79,03% backend cycles idle (83,34%)
  22.319.974.139 instructions:u       # 1,10 insn per cycle # 0,72 stalled cycles per insn (83,34%)
  4.564.927.240 branches:u            # 836,705 M/sec (83,34%)
  21.123.736 branch-misses:u          # 0,46% of all branches (83,32%)

  5,456638203 seconds time elapsed
  4,656348000 seconds user
  0,777383000 seconds sys
```

Memory:

```
Maximum resident set size (kbytes): 3649960
```
Milian Wolff authored
As-is, this lies somewhere in-between std::vector and QVector:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std::list':

  6.227,71 msec task-clock:u          # 1,000 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  1.068.071 page-faults:u             # 0,172 M/sec
  23.672.061.063 cycles:u             # 3,801 GHz (83,33%)
  142.641.882 stalled-cycles-frontend:u  # 0,60% frontend cycles idle (83,33%)
  17.758.938.284 stalled-cycles-backend:u  # 75,02% backend cycles idle (83,33%)
  22.212.698.253 instructions:u       # 0,94 insn per cycle # 0,80 stalled cycles per insn (83,33%)
  4.492.589.760 branches:u            # 721,387 M/sec (83,33%)
  21.193.689 branch-misses:u          # 0,47% of all branches (83,33%)

  6,228371570 seconds time elapsed
  5,293329000 seconds user
  0,909763000 seconds sys
```

Peak RSS:

```
Maximum resident set size (kbytes): 4278300
```
Milian Wolff authored
This is, contrary to what I thought, significantly faster than using QVector here:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std':

  5.657,58 msec task-clock            # 1,000 CPUs utilized
  30 context-switches                 # 0,005 K/sec
  0 cpu-migrations                    # 0,000 K/sec
  913.370 page-faults                 # 0,161 M/sec
  24.925.687.516 cycles               # 4,406 GHz (83,30%)
  1.071.926.434 stalled-cycles-frontend  # 4,30% frontend cycles idle (83,35%)
  15.463.080.632 stalled-cycles-backend  # 62,04% backend cycles idle (83,35%)
  29.942.331.671 instructions         # 1,20 insn per cycle # 0,52 stalled cycles per insn (83,35%)
  5.935.680.160 branches              # 1049,154 M/sec (83,35%)
  24.781.895 branch-misses            # 0,42% of all branches (83,30%)

  5,658346811 seconds time elapsed
  4,769389000 seconds user
  0,867054000 seconds sys
```

Also, it uses less peak memory:

```
Maximum resident set size (kbytes): 3659356
```
Milian Wolff authored
This is basically the status quo used in heaptrack. Current timings:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree qt':

  6.389,58 msec task-clock            # 1,000 CPUs utilized
  11 context-switches                 # 0,002 K/sec
  0 cpu-migrations                    # 0,000 K/sec
  1.066.317 page-faults               # 0,167 M/sec
  27.562.650.647 cycles               # 4,314 GHz (83,33%)
  1.112.140.389 stalled-cycles-frontend  # 4,03% frontend cycles idle (83,33%)
  15.580.172.700 stalled-cycles-backend  # 56,53% backend cycles idle (83,33%)
  39.147.522.858 instructions         # 1,42 insn per cycle # 0,40 stalled cycles per insn (83,33%)
  7.680.291.744 branches              # 1202,002 M/sec (83,33%)
  25.106.869 branch-misses            # 0,33% of all branches (83,34%)

  6,390189396 seconds time elapsed
  5,500982000 seconds user
  0,863794000 seconds sys
```

Peak RSS:

```
Maximum resident set size (kbytes): 4271420
```
- 17 Nov, 2020 14 commits
Milian Wolff authored
This reverts commit fadba34c. It is actually slower to use std::timed_mutex here. Since the sleeping in the timer thread isn't too important, it's better to optimize the common case of writing data.

With std::mutex:

```
heaptrack --analyze "/home/milian/projects/build/heaptrack/heaptrack.threaded.21113.zst"

Performance counter stats for 'heaptrack ./tests/manual/threaded' (5 runs):

  8.903,06 msec task-clock            # 17,911 CPUs utilized ( +- 1,29% )
  314.336 context-switches            # 0,035 M/sec ( +- 1,50% )
  40.313 cpu-migrations               # 0,005 M/sec ( +- 13,18% )
  49.464 page-faults                  # 0,006 M/sec ( +- 0,53% )
  36.517.191.099 cycles               # 4,102 GHz ( +- 1,42% ) (83,37%)
  24.890.585.818 stalled-cycles-frontend  # 68,16% frontend cycles idle ( +- 1,60% ) (84,13%)
  926.640.652 stalled-cycles-backend  # 2,54% backend cycles idle ( +- 2,28% ) (83,55%)
  9.228.118.514 instructions          # 0,25 insn per cycle # 2,70 stalled cycles per insn ( +- 1,97% ) (83,08%)
  1.951.426.273 branches              # 219,186 M/sec ( +- 2,14% ) (83,48%)
  21.883.712 branch-misses            # 1,12% of all branches ( +- 1,32% ) (82,39%)

  0,49707 +- 0,00592 seconds time elapsed ( +- 1,19% )
```

With std::timed_mutex:

```
  10.452,71 msec task-clock           # 18,617 CPUs utilized ( +- 0,31% )
  384.406 context-switches            # 0,037 M/sec ( +- 0,55% )
  49.914 cpu-migrations               # 0,005 M/sec ( +- 13,92% )
  49.685 page-faults                  # 0,005 M/sec ( +- 0,38% )
  42.967.520.168 cycles               # 4,111 GHz ( +- 0,51% ) (82,96%)
  29.172.821.383 stalled-cycles-frontend  # 67,90% frontend cycles idle ( +- 0,51% ) (83,39%)
  1.069.232.909 stalled-cycles-backend  # 2,49% backend cycles idle ( +- 1,60% ) (83,91%)
  10.450.752.968 instructions         # 0,24 insn per cycle # 2,79 stalled cycles per insn ( +- 1,47% ) (83,18%)
  2.225.628.518 branches              # 212,924 M/sec ( +- 1,50% ) (83,15%)
  24.390.495 branch-misses            # 1,10% of all branches ( +- 0,75% ) (83,41%)

  0,56145 +- 0,00924 seconds time elapsed ( +- 1,65% )
```
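The underlying tradeoff: std::timed_mutex lets a background thread fold its sleep into the lock wait via try_lock_for, but every lock/unlock on it costs more than on a plain std::mutex, which hurts the hot write path that runs for every tracked allocation. A hypothetical sketch of the two patterns (not heaptrack's actual code):

```cpp
#include <chrono>
#include <mutex>

std::mutex fastMutex;        // cheap lock/unlock: preferred for the hot write path
std::timed_mutex timedMutex; // supports try_lock_for, but each acquisition costs more

// Hot path, called for every event: a plain std::mutex wins here.
int writeData(int value)
{
    std::lock_guard<std::mutex> lock(fastMutex);
    return value; // stand-in for serializing the event under the lock
}

// Cold path (timer thread): with std::timed_mutex the periodic sleep can be
// combined with waiting for the lock; uncontended, this acquires immediately.
bool timerTick()
{
    using namespace std::chrono_literals;
    if (timedMutex.try_lock_for(1ms)) {
        timedMutex.unlock();
        return true;
    }
    return false;
}
```

Since the timer thread runs rarely, pessimizing it with a separate sleep plus a plain mutex lock is the cheaper overall design, which is what this revert restores.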
Milian Wolff authored
Milian Wolff authored
Milian Wolff authored
Before:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst' (5 runs):

  7.077,40 msec task-clock            # 1,142 CPUs utilized ( +- 0,35% )
  845 context-switches                # 0,119 K/sec ( +- 2,37% )
  245 cpu-migrations                  # 0,035 K/sec ( +- 5,21% )
  139.577 page-faults                 # 0,020 M/sec ( +- 0,27% )
  30.284.047.519 cycles               # 4,279 GHz ( +- 0,22% ) (83,44%)
  750.013.909 stalled-cycles-frontend # 2,48% frontend cycles idle ( +- 1,05% ) (83,24%)
  6.383.953.436 stalled-cycles-backend  # 21,08% backend cycles idle ( +- 1,17% ) (83,19%)
  57.113.121.471 instructions         # 1,89 insn per cycle # 0,11 stalled cycles per insn ( +- 0,05% ) (83,30%)
  11.452.771.446 branches             # 1618,217 M/sec ( +- 0,02% ) (83,40%)
  209.271.140 branch-misses           # 1,83% of all branches ( +- 0,17% ) (83,42%)

  6,1972 +- 0,0240 seconds time elapsed ( +- 0,39% )
```

After:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':

  6.144,14 msec task-clock            # 1,156 CPUs utilized
  611 context-switches                # 0,099 K/sec
  167 cpu-migrations                  # 0,027 K/sec
  173.538 page-faults                 # 0,028 M/sec
  26.075.118.232 cycles               # 4,244 GHz (83,44%)
  542.631.671 stalled-cycles-frontend # 2,08% frontend cycles idle (83,40%)
  6.013.468.979 stalled-cycles-backend  # 23,06% backend cycles idle (83,12%)
  50.576.695.591 instructions         # 1,94 insn per cycle # 0,12 stalled cycles per insn (83,32%)
  10.046.690.454 branches             # 1635,166 M/sec (83,35%)
  179.839.003 branch-misses           # 1,79% of all branches (83,38%)

  5,316701146 seconds time elapsed
  5,790727000 seconds user
  0,338084000 seconds sys
```
Milian Wolff authored
Should be faster, but I don't really care how much in this utility.
Milian Wolff authored
Before:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst ./heaptrack.kdevelop.189071.zst':

  10.913,06 msec task-clock           # 1,317 CPUs utilized
  948 context-switches                # 0,087 K/sec
  279 cpu-migrations                  # 0,026 K/sec
  224.389 page-faults                 # 0,021 M/sec
  46.901.681.727 cycles               # 4,298 GHz (83,43%)
  704.668.201 stalled-cycles-frontend # 1,50% frontend cycles idle (83,33%)
  14.905.704.111 stalled-cycles-backend  # 31,78% backend cycles idle (83,23%)
  108.365.697.169 instructions        # 2,31 insn per cycle # 0,14 stalled cycles per insn (83,30%)
  19.402.933.947 branches             # 1777,956 M/sec (83,35%)
  222.289.116 branch-misses           # 1,15% of all branches (83,36%)

  8,285154704 seconds time elapsed
  10,577763000 seconds user
  0,313054000 seconds sys
```

After:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst ./heaptrack.kdevelop.189071.zst':

  10.372,46 msec task-clock           # 1,321 CPUs utilized
  865 context-switches                # 0,083 K/sec
  229 cpu-migrations                  # 0,022 K/sec
  222.593 page-faults                 # 0,021 M/sec
  45.087.440.140 cycles               # 4,347 GHz (83,38%)
  596.167.950 stalled-cycles-frontend # 1,32% frontend cycles idle (83,23%)
  14.191.678.571 stalled-cycles-backend  # 31,48% backend cycles idle (83,22%)
  108.176.660.565 instructions        # 2,40 insn per cycle # 0,13 stalled cycles per insn (83,40%)
  19.375.195.741 branches             # 1867,946 M/sec (83,40%)
  223.596.938 branch-misses           # 1,15% of all branches (83,37%)

  7,850580978 seconds time elapsed
  10,029832000 seconds user
  0,319581000 seconds sys
```
Milian Wolff authored
Before:

```
allocated vector: 73376
allocated input pointers: 320000000
freed input pointers: 224
begin actual benchmark: 224
pointers added: 320000000 (100% overhead)
pointers removed: 73600
trimmed: 73600

Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointerhash':

  9.160,03 msec task-clock            # 1,000 CPUs utilized
  26 context-switches                 # 0,003 K/sec
  0 cpu-migrations                    # 0,000 K/sec
  250.563 page-faults                 # 0,027 M/sec
  40.159.559.869 cycles               # 4,384 GHz (83,33%)
  340.503.937 stalled-cycles-frontend # 0,85% frontend cycles idle (83,33%)
  36.881.695.646 stalled-cycles-backend  # 91,84% backend cycles idle (83,33%)
  11.942.611.820 instructions         # 0,30 insn per cycle # 3,09 stalled cycles per insn (83,33%)
  2.468.955.121 branches              # 269,536 M/sec (83,33%)
  10.938.236 branch-misses            # 0,44% of all branches (83,35%)

  9,160886661 seconds time elapsed
  8,911365000 seconds user
  0,225871000 seconds sys
```

After:

```
allocated vector: 73376
allocated input pointers: 320000000
freed input pointers: 224
begin actual benchmark: 224
pointers added: 224 (7e-05% overhead)
pointers removed: 73600
trimmed: 73600

Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointerhash':

  3.281,53 msec task-clock            # 1,000 CPUs utilized
  6 context-switches                  # 0,002 K/sec
  0 cpu-migrations                    # 0,000 K/sec
  490.237 page-faults                 # 0,149 M/sec
  14.705.858.311 cycles               # 4,481 GHz (83,27%)
  1.799.042.842 stalled-cycles-frontend  # 12,23% frontend cycles idle (83,35%)
  8.573.087.340 stalled-cycles-backend  # 58,30% backend cycles idle (83,36%)
  11.080.704.070 instructions         # 0,75 insn per cycle # 0,77 stalled cycles per insn (83,36%)
  2.223.987.372 branches              # 677,728 M/sec (83,36%)
  36.702.866 branch-misses            # 1,65% of all branches (83,30%)

  3,282066401 seconds time elapsed
  2,802599000 seconds user
  0,465220000 seconds sys
```

Note how bench_pointerhash is now seemingly faster and uses less memory than bench_pointermap! But blindly replacing one with the other shows much worse performance in heaptrack, so it seems the benchmarks don't reflect real-world usage currently.
Milian Wolff authored
Before:

```
allocated vector: 73376
allocated input pointers: 320000000
freed input pointers: 224
begin actual benchmark: 224
pointers added: 62208848 (19.4403% overhead)
pointers removed: 96448
trimmed: 96448

Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointermap':

  5.685,95 msec task-clock            # 1,000 CPUs utilized
  9 context-switches                  # 0,002 K/sec
  0 cpu-migrations                    # 0,000 K/sec
  114.357 page-faults                 # 0,020 M/sec
  25.672.059.553 cycles               # 4,515 GHz (83,33%)
  416.801.612 stalled-cycles-frontend # 1,62% frontend cycles idle (83,33%)
  17.952.673.581 stalled-cycles-backend  # 69,93% backend cycles idle (83,33%)
  12.248.196.258 instructions         # 0,48 insn per cycle # 1,47 stalled cycles per insn (83,33%)
  2.341.601.571 branches              # 411,822 M/sec (83,34%)
  95.859.176 branch-misses            # 4,09% of all branches (83,35%)

  5,686546406 seconds time elapsed
  5,510546000 seconds user
  0,146093000 seconds sys
```

After:

```
allocated vector: 73376
allocated input pointers: 320000000
freed input pointers: 224
begin actual benchmark: 224
pointers added: 60645536 (18.9517% overhead)
pointers removed: 96448
trimmed: 96448

Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointermap':

  4.906,87 msec task-clock            # 1,000 CPUs utilized
  9 context-switches                  # 0,002 K/sec
  0 cpu-migrations                    # 0,000 K/sec
  115.556 page-faults                 # 0,024 M/sec
  21.930.029.620 cycles               # 4,469 GHz (83,31%)
  314.571.970 stalled-cycles-frontend # 1,43% frontend cycles idle (83,31%)
  14.492.419.080 stalled-cycles-backend  # 66,08% backend cycles idle (83,31%)
  12.059.692.467 instructions         # 0,55 insn per cycle # 1,20 stalled cycles per insn (83,35%)
  2.287.570.344 branches              # 466,197 M/sec (83,37%)
  95.750.624 branch-misses            # 4,19% of all branches (83,36%)

  4,907463968 seconds time elapsed
  4,749223000 seconds user
  0,132838000 seconds sys
```
Milian Wolff authored
Upstream: https://github.com/Tessil/robin-map

We are not using a git submodule for now, but may rethink that in the future. This is a much faster hash map which we can leverage in multiple places in heaptrack. It is MIT licensed, so it should be fine to use here.
Milian Wolff authored
This requires boost 1.60, which was released roughly five years ago, so I hope this is fine for everyone. Eventually we can replace it all with std::pmr, once we depend on C++17. The impact is small but noticeable:

Before:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst' (5 runs):

  7.339,25 msec task-clock            # 1,153 CPUs utilized ( +- 1,98% )
  860 context-switches                # 0,117 K/sec ( +- 9,56% )
  242 cpu-migrations                  # 0,033 K/sec ( +- 8,87% )
  139.946 page-faults                 # 0,019 M/sec ( +- 0,33% )
  31.016.368.103 cycles               # 4,226 GHz ( +- 1,36% ) (83,45%)
  549.091.116 stalled-cycles-frontend # 1,77% frontend cycles idle ( +- 5,87% ) (83,25%)
  7.195.865.845 stalled-cycles-backend  # 23,20% backend cycles idle ( +- 1,28% ) (83,19%)
  58.379.008.232 instructions         # 1,88 insn per cycle # 0,12 stalled cycles per insn ( +- 0,05% ) (83,31%)
  11.802.836.750 branches             # 1608,180 M/sec ( +- 0,01% ) (83,40%)
  209.444.455 branch-misses           # 1,77% of all branches ( +- 0,33% ) (83,41%)

  6,365 +- 0,142 seconds time elapsed ( +- 2,23% )
```

After:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst' (5 runs):

  7.077,40 msec task-clock            # 1,142 CPUs utilized ( +- 0,35% )
  845 context-switches                # 0,119 K/sec ( +- 2,37% )
  245 cpu-migrations                  # 0,035 K/sec ( +- 5,21% )
  139.577 page-faults                 # 0,020 M/sec ( +- 0,27% )
  30.284.047.519 cycles               # 4,279 GHz ( +- 0,22% ) (83,44%)
  750.013.909 stalled-cycles-frontend # 2,48% frontend cycles idle ( +- 1,05% ) (83,24%)
  6.383.953.436 stalled-cycles-backend  # 21,08% backend cycles idle ( +- 1,17% ) (83,19%)
  57.113.121.471 instructions         # 1,89 insn per cycle # 0,11 stalled cycles per insn ( +- 0,05% ) (83,30%)
  11.452.771.446 branches             # 1618,217 M/sec ( +- 0,02% ) (83,40%)
  209.271.140 branch-misses           # 1,83% of all branches ( +- 0,17% ) (83,42%)

  6,1972 +- 0,0240 seconds time elapsed ( +- 0,39% )
```
Milian Wolff authored
Will allow me to experiment with PMR and a monotonic resource. Performance-wise it's pretty much the same as QHash now that we have simplified Symbol.
Milian Wolff authored
Basically this allows us to operate more on cheap ids instead of the fat strings. This approach has multiple advantages:

- Symbol is now just 8 bytes large instead of 32 bytes
- Accordingly, RowData is down to 56 bytes from 80 bytes
- We can now experiment with using PMR allocators and put a monotonic buffer resource into the new ResultData

There are two downsides to this approach:

- On the one hand, we are now sorting our data differently. I.e. before, the data was sorted by the actual string contents, whereas now it's sorted by the string index. That shouldn't be a problem though, as we want to sort by actual metrics in most areas anyway. The only area where this is noticeable is the flame graph.
- On the other hand, we are now doing some things repeatedly. Most notably, we don't cache the string basename anymore. But that's not overly noticeable either, as we usually only need it for a handful of rows at any time. If needed, we can even cache this in the future.

The benchmark also shows some significant wins with this new approach. Most notably, compare the total runtime of ~6.82s and peak RSS memory consumption of 555MB with ~6.17s and 490MB.

Before:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':

  7.825,54 msec task-clock:u          # 1,148 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  155.988 page-faults:u               # 0,020 M/sec
  32.848.963.956 cycles:u             # 4,198 GHz (83,22%)
  434.588.005 stalled-cycles-frontend:u  # 1,32% frontend cycles idle (83,18%)
  6.992.025.659 stalled-cycles-backend:u  # 21,29% backend cycles idle (83,36%)
  63.255.414.218 instructions:u       # 1,93 insn per cycle # 0,11 stalled cycles per insn (83,39%)
  12.941.138.245 branches:u           # 1653,705 M/sec (83,43%)
  225.630.395 branch-misses:u         # 1,74% of all branches (83,41%)

  6,817015591 seconds time elapsed
  7,613402000 seconds user
  0,189718000 seconds sys

Command being timed: "/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst"
User time (seconds): 7.48
System time (seconds): 0.19
Percent of CPU this job got: 114%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.70
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 554496
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 156906
Voluntary context switches: 827
Involuntary context switches: 65
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

After:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':

  7.244,75 msec task-clock:u          # 1,174 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  139.184 page-faults:u               # 0,019 M/sec
  29.982.409.942 cycles:u             # 4,139 GHz (83,44%)
  542.168.522 stalled-cycles-frontend:u  # 1,81% frontend cycles idle (83,22%)
  6.438.890.717 stalled-cycles-backend:u  # 21,48% backend cycles idle (83,17%)
  57.743.166.637 instructions:u       # 1,93 insn per cycle # 0,11 stalled cycles per insn (83,40%)
  11.758.342.073 branches:u           # 1623,016 M/sec (83,40%)
  221.034.296 branch-misses:u         # 1,88% of all branches (83,37%)

  6,170274615 seconds time elapsed
  6,975850000 seconds user
  0,249394000 seconds sys

Command being timed: "/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst"
User time (seconds): 6.73
System time (seconds): 0.27
Percent of CPU this job got: 116%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 490132
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 140792
Voluntary context switches: 647
Involuntary context switches: 110
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
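The shrink from 32-byte strings to 8-byte ids is classic string interning: store every distinct string once in a table and pass around its index. A minimal sketch of the idea (the type and member names are hypothetical, not heaptrack's actual Symbol/ResultData types):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// An interned string is represented by an 8-byte index instead of a fat string.
struct StringIndex
{
    uint64_t index = 0;
};

class StringTable
{
public:
    // Return the id for a string, interning it on first sight.
    StringIndex intern(const std::string& str)
    {
        auto it = m_ids.find(str);
        if (it != m_ids.end())
            return {it->second};
        const uint64_t id = m_strings.size();
        m_strings.push_back(str);
        m_ids.emplace(str, id);
        return {id};
    }

    // Resolve an id back to its string, e.g. for display purposes.
    const std::string& resolve(StringIndex idx) const
    {
        return m_strings[idx.index];
    }

private:
    std::vector<std::string> m_strings;
    std::unordered_map<std::string, uint64_t> m_ids;
};
```

Sorting by id then orders rows by first-seen position rather than lexicographically, which is exactly the flame-graph caveat mentioned above.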
Milian Wolff authored
Script Kiddy authored
- 16 Nov, 2020 8 commits
Milian Wolff authored
Calling erase in a loop leads to quadratic behavior, as can be seen by the impact of this patch.

Before:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser heaptrack.kdevelop.189071.zst heaptrack.kdevelop.189071.zst':

  75.307,69 msec task-clock           # 0,944 CPUs utilized
  5.104 context-switches              # 0,068 K/sec
  376 cpu-migrations                  # 0,005 K/sec
  196.391 page-faults                 # 0,003 M/sec
  307.733.655.134 cycles              # 4,086 GHz (83,34%)
  1.516.564.276 stalled-cycles-frontend  # 0,49% frontend cycles idle (83,30%)
  111.558.209.335 stalled-cycles-backend  # 36,25% backend cycles idle (83,30%)
  333.691.330.970 instructions        # 1,08 insn per cycle # 0,33 stalled cycles per insn (83,33%)
  37.612.802.347 branches             # 499,455 M/sec (83,35%)
  243.957.169 branch-misses           # 0,65% of all branches (83,38%)

  79,799304157 seconds time elapsed
  74,538971000 seconds user
  0,344263000 seconds sys
```

After:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser heaptrack.kdevelop.189071.zst heaptrack.kdevelop.189071.zst':

  11.540,35 msec task-clock           # 1,310 CPUs utilized
  1.173 context-switches              # 0,102 K/sec
  309 cpu-migrations                  # 0,027 K/sec
  195.605 page-faults                 # 0,017 M/sec
  48.974.438.032 cycles               # 4,244 GHz (83,40%)
  749.567.505 stalled-cycles-frontend # 1,53% frontend cycles idle (83,42%)
  17.760.476.929 stalled-cycles-backend  # 36,26% backend cycles idle (83,06%)
  114.694.556.598 instructions        # 2,34 insn per cycle # 0,15 stalled cycles per insn (83,32%)
  20.690.990.493 branches             # 1792,926 M/sec (83,39%)
  236.844.837 branch-misses           # 1,14% of all branches (83,41%)

  8,808486636 seconds time elapsed
  11,187237000 seconds user
  0,322839000 seconds sys
```
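The quadratic behavior arises because each vector::erase shifts the entire tail, so erasing k matches from n elements costs O(n·k). The standard fix is the erase-remove idiom, which compacts the survivors in a single pass. A sketch contrasting the two (function names are illustrative, not the patched heaptrack code):

```cpp
#include <algorithm>
#include <vector>

// Quadratic: every erase() shifts all following elements down by one.
void eraseEvensSlow(std::vector<int>& values)
{
    for (auto it = values.begin(); it != values.end();) {
        if (*it % 2 == 0)
            it = values.erase(it);
        else
            ++it;
    }
}

// Linear: remove_if moves the survivors to the front in one pass,
// then a single erase() drops the leftover tail.
void eraseEvensFast(std::vector<int>& values)
{
    values.erase(std::remove_if(values.begin(), values.end(),
                                [](int v) { return v % 2 == 0; }),
                 values.end());
}
```

Both produce the same result; only the complexity differs, which matches the ~9x speedup above.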
Milian Wolff authored
Instead of looping through the allocations at every timestamp, looking up whether to include each one in the chart, do this once and then remember the allocation indices that should be used.

Before:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':

  9.912,44 msec task-clock:u          # 1,114 CPUs utilized
  0 context-switches:u                # 0,000 K/sec
  0 cpu-migrations:u                  # 0,000 K/sec
  154.526 page-faults:u               # 0,016 M/sec
  41.541.932.930 cycles:u             # 4,191 GHz (83,42%)
  650.025.055 stalled-cycles-frontend:u  # 1,56% frontend cycles idle (83,31%)
  8.109.154.392 stalled-cycles-backend:u  # 19,52% backend cycles idle (83,08%)
  70.928.385.225 instructions:u       # 1,71 insn per cycle # 0,11 stalled cycles per insn (83,36%)
  14.890.145.672 branches:u           # 1502,168 M/sec (83,39%)
  303.284.007 branch-misses:u         # 2,04% of all branches (83,44%)

  8,898062968 seconds time elapsed
  9,584299000 seconds user
  0,294486000 seconds sys
```

After:

```
Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':

  7.959,38 msec task-clock            # 1,143 CPUs utilized
  963 context-switches                # 0,121 K/sec
  241 cpu-migrations                  # 0,030 K/sec
  155.984 page-faults                 # 0,020 M/sec
  34.013.707.752 cycles               # 4,273 GHz (83,42%)
  944.704.460 stalled-cycles-frontend # 2,78% frontend cycles idle (83,14%)
  6.987.086.160 stalled-cycles-backend  # 20,54% backend cycles idle (83,18%)
  64.337.973.769 instructions         # 1,89 insn per cycle # 0,11 stalled cycles per insn (83,42%)
  13.140.848.740 branches             # 1650,989 M/sec (83,42%)
  225.817.303 branch-misses           # 1,72% of all branches (83,42%)

  6,960656687 seconds time elapsed
  7,704212000 seconds user
  0,232872000 seconds sys
```
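The pattern: evaluate the chart filter once over all allocations and keep the matching indices; each timestamp then walks only that (usually much smaller) index list instead of re-filtering everything. A hypothetical sketch of the idea, with ints standing in for allocation records:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Evaluate the filter once and remember which allocation indices pass it.
std::vector<std::size_t> selectIndices(const std::vector<int>& allocations,
                                       const std::function<bool(int)>& includeInChart)
{
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < allocations.size(); ++i) {
        if (includeInChart(allocations[i]))
            selected.push_back(i);
    }
    return selected;
}

// Per timestamp, aggregate only the pre-selected allocations instead of
// running the filter over the full list again.
long sumSelected(const std::vector<int>& allocations,
                 const std::vector<std::size_t>& selected)
{
    long total = 0;
    for (std::size_t i : selected)
        total += allocations[i];
    return total;
}
```

With T timestamps and n allocations, this turns O(T·n) filter evaluations into O(n) evaluations plus O(T·k) cheap index lookups.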
Milian Wolff authored
Milian Wolff authored
Milian Wolff authored
Milian Wolff authored
Milian Wolff authored
It has exactly the semantics that we need here, no need for special functions that replicate it.
Milian Wolff authored
Replace lookup+insert with just an insert, and check whether the size changed to track whether we actually overwrote an item. Sadly, QSet doesn't have the nice API of std::unordered_set here, but it's still faster this way. Also use SymbolId in one more place. Sadly, the performance impact isn't really noticeable.