1. 21 Jan, 2021 1 commit
  2. 18 Jan, 2021 1 commit
  3. 14 Jan, 2021 1 commit
  4. 13 Jan, 2021 1 commit
  5. 12 Jan, 2021 1 commit
  6. 11 Jan, 2021 2 commits
  7. 05 Jan, 2021 1 commit
  8. 25 Dec, 2020 1 commit
  9. 14 Dec, 2020 1 commit
  10. 01 Dec, 2020 1 commit
    • Arran Cudbard-Bell's avatar
      Update homebrew dependencies · e46f454c
      Arran Cudbard-Bell authored
      Looks like the KDE homebrew tap maintainers switched hosting about 5 months back and started depending on more upstream formulae.  This fixes the tap command and updates the names of the dependencies.
      e46f454c
  11. 26 Nov, 2020 7 commits
    • Milian Wolff's avatar
      Expand benchmark to also cover boost::container::pmr::slist · 4c97231f
      Milian Wolff authored
      Use a monotonic_buffer_resource to show what we can gain by using
      it for this purpose:
      
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::slist':
      
                5.526,48 msec task-clock:u              #    1,000 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
                 910.944      page-faults:u             #    0,165 M/sec
          20.395.572.533      cycles:u                  #    3,691 GHz                      (83,34%)
              58.593.916      stalled-cycles-frontend:u #    0,29% frontend cycles idle     (83,34%)
          15.917.051.157      stalled-cycles-backend:u  #   78,04% backend cycles idle      (83,33%)
          22.396.075.369      instructions:u            #    1,10  insn per cycle
                                                        #    0,71  stalled cycles per insn  (83,34%)
           4.567.250.732      branches:u                #  826,431 M/sec                    (83,34%)
              21.088.217      branch-misses:u           #    0,46% of all branches          (83,32%)
      
             5,527112576 seconds time elapsed
      
             4,724403000 seconds user
             0,780738000 seconds sys
      
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::pmr::slist':
      
                4.096,87 msec task-clock:u              #    1,000 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
                 910.958      page-faults:u             #    0,222 M/sec
          15.097.193.259      cycles:u                  #    3,685 GHz                      (83,30%)
              30.032.956      stalled-cycles-frontend:u #    0,20% frontend cycles idle     (83,30%)
          12.890.862.885      stalled-cycles-backend:u  #   85,39% backend cycles idle      (83,32%)
           9.589.139.418      instructions:u            #    0,64  insn per cycle
                                                        #    1,34  stalled cycles per insn  (83,38%)
           1.683.224.838      branches:u                #  410,857 M/sec                    (83,38%)
              19.650.364      branch-misses:u           #    1,17% of all branches          (83,31%)
      
             4,097475027 seconds time elapsed
      
             3,366684000 seconds user
             0,714428000 seconds sys
      ```
      
      Memory consumption is pretty much unaffected, but that's obviously OK:
      
      ```
      slist:     Maximum resident set size (kbytes): 3649904
      pmr::list: Maximum resident set size (kbytes): 3649888
      ```
      4c97231f
    • Milian Wolff's avatar
      Make numNodes reusable · 1518d3b3
      Milian Wolff authored
      1518d3b3
    • Milian Wolff's avatar
      Set tree node parents · 13d55e7d
      Milian Wolff authored
      This has a significant impact on the vector-based trees, as we now
      need to do one full run over the tree to set the parents. Now, the
      list based trees are already significantly better:
      
      ```
      for tag in QVector std::vector std::list boost::slist; do perf stat ~/projects/build/heaptrack/tests/benchmarks/bench_tree $tag; done
      8, 40223304
      
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree QVector':
      
                7.258,90 msec task-clock:u              #    1,000 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
               1.066.324      page-faults:u             #    0,147 M/sec
          28.597.338.274      cycles:u                  #    3,940 GHz                      (83,30%)
             106.681.727      stalled-cycles-frontend:u #    0,37% frontend cycles idle     (83,33%)
          18.253.436.626      stalled-cycles-backend:u  #   63,83% backend cycles idle      (83,34%)
          35.021.338.418      instructions:u            #    1,22  insn per cycle
                                                        #    0,52  stalled cycles per insn  (83,35%)
           6.910.244.080      branches:u                #  951,968 M/sec                    (83,34%)
              24.486.655      branch-misses:u           #    0,35% of all branches          (83,33%)
      
             7,259623426 seconds time elapsed
      
             6,407177000 seconds user
             0,823973000 seconds sys
      
      8, 40223304
      
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std::vector':
      
                6.559,15 msec task-clock:u              #    1,000 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
                 913.372      page-faults:u             #    0,139 M/sec
          25.178.246.000      cycles:u                  #    3,839 GHz                      (83,31%)
              61.494.028      stalled-cycles-frontend:u #    0,24% frontend cycles idle     (83,32%)
          18.412.909.524      stalled-cycles-backend:u  #   73,13% backend cycles idle      (83,35%)
          25.738.738.521      instructions:u            #    1,02  insn per cycle
                                                        #    0,72  stalled cycles per insn  (83,35%)
           5.149.522.578      branches:u                #  785,090 M/sec                    (83,35%)
              23.133.623      branch-misses:u           #    0,45% of all branches          (83,32%)
      
             6,559888601 seconds time elapsed
      
             5,804386000 seconds user
             0,730870000 seconds sys
      
      8, 40223304
      
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std::list':
      
                6.328,67 msec task-clock:u              #    1,000 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
               1.068.075      page-faults:u             #    0,169 M/sec
          23.138.038.923      cycles:u                  #    3,656 GHz                      (83,31%)
              57.658.887      stalled-cycles-frontend:u #    0,25% frontend cycles idle     (83,31%)
          18.076.221.406      stalled-cycles-backend:u  #   78,12% backend cycles idle      (83,31%)
          22.337.549.206      instructions:u            #    0,97  insn per cycle
                                                        #    0,81  stalled cycles per insn  (83,34%)
           4.534.944.282      branches:u                #  716,572 M/sec                    (83,36%)
              21.234.052      branch-misses:u           #    0,47% of all branches          (83,35%)
      
             6,329363185 seconds time elapsed
      
             5,384786000 seconds user
             0,908987000 seconds sys
      
      8, 40223304
      
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::slist':
      
                5.377,76 msec task-clock:u              #    1,000 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
                 910.954      page-faults:u             #    0,169 M/sec
          20.828.436.359      cycles:u                  #    3,873 GHz                      (83,32%)
              31.529.423      stalled-cycles-frontend:u #    0,15% frontend cycles idle     (83,32%)
          16.674.025.296      stalled-cycles-backend:u  #   80,05% backend cycles idle      (83,32%)
          22.416.653.118      instructions:u            #    1,08  insn per cycle
                                                        #    0,74  stalled cycles per insn  (83,32%)
           4.560.100.713      branches:u                #  847,956 M/sec                    (83,38%)
              21.152.807      branch-misses:u           #    0,46% of all branches          (83,34%)
      
             5,378376956 seconds time elapsed
      
             4,596761000 seconds user
             0,761155000 seconds sys
      ```
      13d55e7d
    • Milian Wolff's avatar
      Also benchmark boost::container::slist · eb572d8d
      Milian Wolff authored
      This one looks very promising, as it is already at the performance
      of the current best container, std::vector, but will allows us to
      easily use a PMR container in the future:
      
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree boost::slist':
      
                5.455,84 msec task-clock:u              #    1,000 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
                 910.951      page-faults:u             #    0,167 M/sec
          20.295.488.575      cycles:u                  #    3,720 GHz                      (83,32%)
              33.927.249      stalled-cycles-frontend:u #    0,17% frontend cycles idle     (83,34%)
          16.038.938.738      stalled-cycles-backend:u  #   79,03% backend cycles idle      (83,34%)
          22.319.974.139      instructions:u            #    1,10  insn per cycle
                                                        #    0,72  stalled cycles per insn  (83,34%)
           4.564.927.240      branches:u                #  836,705 M/sec                    (83,34%)
              21.123.736      branch-misses:u           #    0,46% of all branches          (83,32%)
      
             5,456638203 seconds time elapsed
      
             4,656348000 seconds user
             0,777383000 seconds sys
      ```
      
      Memory:
      
      ```
              Maximum resident set size (kbytes): 3649960
      ```
      eb572d8d
    • Milian Wolff's avatar
      Also benchmark std::list for tree construction · bc0be0d1
      Milian Wolff authored
      As-is, this lies somewhere in-between std::vector and QVector:
      
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std::list':
      
                6.227,71 msec task-clock:u              #    1,000 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
               1.068.071      page-faults:u             #    0,172 M/sec
          23.672.061.063      cycles:u                  #    3,801 GHz                      (83,33%)
             142.641.882      stalled-cycles-frontend:u #    0,60% frontend cycles idle     (83,33%)
          17.758.938.284      stalled-cycles-backend:u  #   75,02% backend cycles idle      (83,33%)
          22.212.698.253      instructions:u            #    0,94  insn per cycle
                                                        #    0,80  stalled cycles per insn  (83,33%)
           4.492.589.760      branches:u                #  721,387 M/sec                    (83,33%)
              21.193.689      branch-misses:u           #    0,47% of all branches          (83,33%)
      
             6,228371570 seconds time elapsed
      
             5,293329000 seconds user
             0,909763000 seconds sys
      ```
      
      Peak RSS:
      
      ```
              Maximum resident set size (kbytes): 4278300
      ```
      bc0be0d1
    • Milian Wolff's avatar
      Allow using std::vector instead of QVector in bench_tree · c9231de2
      Milian Wolff authored
      This is - contrary to what I thought - significantly faster than
      using QVector here:
      
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree std':
      
                5.657,58 msec task-clock                #    1,000 CPUs utilized
                      30      context-switches          #    0,005 K/sec
                       0      cpu-migrations            #    0,000 K/sec
                 913.370      page-faults               #    0,161 M/sec
          24.925.687.516      cycles                    #    4,406 GHz                      (83,30%)
           1.071.926.434      stalled-cycles-frontend   #    4,30% frontend cycles idle     (83,35%)
          15.463.080.632      stalled-cycles-backend    #   62,04% backend cycles idle      (83,35%)
          29.942.331.671      instructions              #    1,20  insn per cycle
                                                        #    0,52  stalled cycles per insn  (83,35%)
           5.935.680.160      branches                  # 1049,154 M/sec                    (83,35%)
              24.781.895      branch-misses             #    0,42% of all branches          (83,30%)
      
             5,658346811 seconds time elapsed
      
             4,769389000 seconds user
             0,867054000 seconds sys
      ```
      
      Also, it uses less peak memory:
      
      ```
              Maximum resident set size (kbytes): 3659356
      ```
      c9231de2
    • Milian Wolff's avatar
      Add a benchmark for creating a large tree · 3a41b660
      Milian Wolff authored
      This is basically the status quo used in heaptrack. Current timings:
      
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_tree qt':
      
                6.389,58 msec task-clock                #    1,000 CPUs utilized
                      11      context-switches          #    0,002 K/sec
                       0      cpu-migrations            #    0,000 K/sec
               1.066.317      page-faults               #    0,167 M/sec
          27.562.650.647      cycles                    #    4,314 GHz                      (83,33%)
           1.112.140.389      stalled-cycles-frontend   #    4,03% frontend cycles idle     (83,33%)
          15.580.172.700      stalled-cycles-backend    #   56,53% backend cycles idle      (83,33%)
          39.147.522.858      instructions              #    1,42  insn per cycle
                                                        #    0,40  stalled cycles per insn  (83,33%)
           7.680.291.744      branches                  # 1202,002 M/sec                    (83,33%)
              25.106.869      branch-misses             #    0,33% of all branches          (83,34%)
      
             6,390189396 seconds time elapsed
      
             5,500982000 seconds user
             0,863794000 seconds sys
      ```
      
      Peak RSS:
      ```
              Maximum resident set size (kbytes): 4271420
      ```
      3a41b660
  12. 17 Nov, 2020 14 commits
    • Milian Wolff's avatar
      Revert "Leverage std::timed_mutex::try_lock_for" · c2beecb8
      Milian Wolff authored
      This reverts commit fadba34c.
      
      It is actually slower to use std::timed_mutex here. Since the
      sleeping in the timer thread isn't too important, it's better to
      optimize the common case of writing data.
      
      With std::mutex:
      ```
        heaptrack --analyze "/home/milian/projects/build/heaptrack/heaptrack.threaded.21113.zst"
      
       Performance counter stats for 'heaptrack ./tests/manual/threaded' (5 runs):
      
                8.903,06 msec task-clock                #   17,911 CPUs utilized            ( +-  1,29% )
                 314.336      context-switches          #    0,035 M/sec                    ( +-  1,50% )
                  40.313      cpu-migrations            #    0,005 M/sec                    ( +- 13,18% )
                  49.464      page-faults               #    0,006 M/sec                    ( +-  0,53% )
          36.517.191.099      cycles                    #    4,102 GHz                      ( +-  1,42% )  (83,37%)
          24.890.585.818      stalled-cycles-frontend   #   68,16% frontend cycles idle     ( +-  1,60% )  (84,13%)
             926.640.652      stalled-cycles-backend    #    2,54% backend cycles idle      ( +-  2,28% )  (83,55%)
           9.228.118.514      instructions              #    0,25  insn per cycle
                                                        #    2,70  stalled cycles per insn  ( +-  1,97% )  (83,08%)
           1.951.426.273      branches                  #  219,186 M/sec                    ( +-  2,14% )  (83,48%)
              21.883.712      branch-misses             #    1,12% of all branches          ( +-  1,32% )  (82,39%)
      
                 0,49707 +- 0,00592 seconds time elapsed  ( +-  1,19% )
      ```
      
      With std::timed_mutex:
      ```
               10.452,71 msec task-clock                #   18,617 CPUs utilized            ( +-  0,31% )
                 384.406      context-switches          #    0,037 M/sec                    ( +-  0,55% )
                  49.914      cpu-migrations            #    0,005 M/sec                    ( +- 13,92% )
                  49.685      page-faults               #    0,005 M/sec                    ( +-  0,38% )
          42.967.520.168      cycles                    #    4,111 GHz                      ( +-  0,51% )  (82,96%)
          29.172.821.383      stalled-cycles-frontend   #   67,90% frontend cycles idle     ( +-  0,51% )  (83,39%)
           1.069.232.909      stalled-cycles-backend    #    2,49% backend cycles idle      ( +-  1,60% )  (83,91%)
          10.450.752.968      instructions              #    0,24  insn per cycle
                                                        #    2,79  stalled cycles per insn  ( +-  1,47% )  (83,18%)
           2.225.628.518      branches                  #  212,924 M/sec                    ( +-  1,50% )  (83,15%)
              24.390.495      branch-misses             #    1,10% of all branches          ( +-  0,75% )  (83,41%)
      
                 0,56145 +- 0,00924 seconds time elapsed  ( +-  1,65% )
      ```
      c2beecb8
    • Milian Wolff's avatar
      143af1d0
    • Milian Wolff's avatar
      5dd0bc68
    • Milian Wolff's avatar
      Leverage tsl::robin_{map,set} in GUI parser · 33df9c34
      Milian Wolff authored
      Before:
      ```
           Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst' (5 runs):
      
                    7.077,40 msec task-clock                #    1,142 CPUs utilized            ( +-  0,35% )
                         845      context-switches          #    0,119 K/sec                    ( +-  2,37% )
                         245      cpu-migrations            #    0,035 K/sec                    ( +-  5,21% )
                     139.577      page-faults               #    0,020 M/sec                    ( +-  0,27% )
              30.284.047.519      cycles                    #    4,279 GHz                      ( +-  0,22% )  (83,44%)
                 750.013.909      stalled-cycles-frontend   #    2,48% frontend cycles idle     ( +-  1,05% )  (83,24%)
               6.383.953.436      stalled-cycles-backend    #   21,08% backend cycles idle      ( +-  1,17% )  (83,19%)
              57.113.121.471      instructions              #    1,89  insn per cycle
                                                            #    0,11  stalled cycles per insn  ( +-  0,05% )  (83,30%)
              11.452.771.446      branches                  # 1618,217 M/sec                    ( +-  0,02% )  (83,40%)
                 209.271.140      branch-misses             #    1,83% of all branches          ( +-  0,17% )  (83,42%)
      
                      6,1972 +- 0,0240 seconds time elapsed  ( +-  0,39% )
      ```
      
      After:
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':
      
                6.144,14 msec task-clock                #    1,156 CPUs utilized
                     611      context-switches          #    0,099 K/sec
                     167      cpu-migrations            #    0,027 K/sec
                 173.538      page-faults               #    0,028 M/sec
          26.075.118.232      cycles                    #    4,244 GHz                      (83,44%)
             542.631.671      stalled-cycles-frontend   #    2,08% frontend cycles idle     (83,40%)
           6.013.468.979      stalled-cycles-backend    #   23,06% backend cycles idle      (83,12%)
          50.576.695.591      instructions              #    1,94  insn per cycle
                                                        #    0,12  stalled cycles per insn  (83,32%)
          10.046.690.454      branches                  # 1635,166 M/sec                    (83,35%)
             179.839.003      branch-misses             #    1,79% of all branches          (83,38%)
      
             5,316701146 seconds time elapsed
      
             5,790727000 seconds user
             0,338084000 seconds sys
      ```
      33df9c34
    • Milian Wolff's avatar
      Use tsl::robin_set for heaptrack_print · 88c6ae8b
      Milian Wolff authored
      Should be faster, but I don't really care how much
      in this utility.
      88c6ae8b
    • Milian Wolff's avatar
      Use tsl::pointer_map for string remapping during diffing · 4c2e0e72
      Milian Wolff authored
      Before:
      ```
      
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst ./heaptrack.kdevelop.189071.zst':
      
               10.913,06 msec task-clock                #    1,317 CPUs utilized
                     948      context-switches          #    0,087 K/sec
                     279      cpu-migrations            #    0,026 K/sec
                 224.389      page-faults               #    0,021 M/sec
          46.901.681.727      cycles                    #    4,298 GHz                      (83,43%)
             704.668.201      stalled-cycles-frontend   #    1,50% frontend cycles idle     (83,33%)
          14.905.704.111      stalled-cycles-backend    #   31,78% backend cycles idle      (83,23%)
         108.365.697.169      instructions              #    2,31  insn per cycle
                                                        #    0,14  stalled cycles per insn  (83,30%)
          19.402.933.947      branches                  # 1777,956 M/sec                    (83,35%)
             222.289.116      branch-misses             #    1,15% of all branches          (83,36%)
      
             8,285154704 seconds time elapsed
      
            10,577763000 seconds user
             0,313054000 seconds sys
      ```
      
      After:
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst ./heaptrack.kdevelop.189071.zst':
      
               10.372,46 msec task-clock                #    1,321 CPUs utilized
                     865      context-switches          #    0,083 K/sec
                     229      cpu-migrations            #    0,022 K/sec
                 222.593      page-faults               #    0,021 M/sec
          45.087.440.140      cycles                    #    4,347 GHz                      (83,38%)
             596.167.950      stalled-cycles-frontend   #    1,32% frontend cycles idle     (83,23%)
          14.191.678.571      stalled-cycles-backend    #   31,48% backend cycles idle      (83,22%)
         108.176.660.565      instructions              #    2,40  insn per cycle
                                                        #    0,13  stalled cycles per insn  (83,40%)
          19.375.195.741      branches                  # 1867,946 M/sec                    (83,40%)
             223.596.938      branch-misses             #    1,15% of all branches          (83,37%)
      
             7,850580978 seconds time elapsed
      
            10,029832000 seconds user
             0,319581000 seconds sys
      ```
      4c2e0e72
    • Milian Wolff's avatar
      Leverage tsl::robin_map in bench_pointerhash · 8866b584
      Milian Wolff authored
      Before:
      ```
      allocated vector:               73376
      allocated input pointers:       320000000
      freed input pointers:           224
      begin actual benchmark:         224
      pointers added:                 320000000 (100% overhead)
      pointers removed:               73600
      trimmed:                        73600
      
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointerhash':
      
                9.160,03 msec task-clock                #    1,000 CPUs utilized
                      26      context-switches          #    0,003 K/sec
                       0      cpu-migrations            #    0,000 K/sec
                 250.563      page-faults               #    0,027 M/sec
          40.159.559.869      cycles                    #    4,384 GHz                      (83,33%)
             340.503.937      stalled-cycles-frontend   #    0,85% frontend cycles idle     (83,33%)
          36.881.695.646      stalled-cycles-backend    #   91,84% backend cycles idle      (83,33%)
          11.942.611.820      instructions              #    0,30  insn per cycle
                                                        #    3,09  stalled cycles per insn  (83,33%)
           2.468.955.121      branches                  #  269,536 M/sec                    (83,33%)
              10.938.236      branch-misses             #    0,44% of all branches          (83,35%)
      
             9,160886661 seconds time elapsed
      
             8,911365000 seconds user
             0,225871000 seconds sys
      ```
      
      After:
      ```
      allocated vector:               73376
      allocated input pointers:       320000000
      freed input pointers:           224
      begin actual benchmark:         224
      pointers added:                 224 (7e-05% overhead)
      pointers removed:               73600
      trimmed:                        73600
      
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointerhash':
      
                3.281,53 msec task-clock                #    1,000 CPUs utilized
                       6      context-switches          #    0,002 K/sec
                       0      cpu-migrations            #    0,000 K/sec
                 490.237      page-faults               #    0,149 M/sec
          14.705.858.311      cycles                    #    4,481 GHz                      (83,27%)
           1.799.042.842      stalled-cycles-frontend   #   12,23% frontend cycles idle     (83,35%)
           8.573.087.340      stalled-cycles-backend    #   58,30% backend cycles idle      (83,36%)
          11.080.704.070      instructions              #    0,75  insn per cycle
                                                        #    0,77  stalled cycles per insn  (83,36%)
           2.223.987.372      branches                  #  677,728 M/sec                    (83,36%)
              36.702.866      branch-misses             #    1,65% of all branches          (83,30%)
      
             3,282066401 seconds time elapsed
      
             2,802599000 seconds user
             0,465220000 seconds sys
      ```
      
      Note how bench_pointerhash is now seemingly faster and uses less memory than
      bench_pointermap! But blindly replacing one with the other shows
      much worse performance in heaptrack - so it seems like the benchmarks
      aren't showing real-world usage currently.
      8866b584
    • Milian Wolff's avatar
      Leverage tsl::robin_{map/set} for pointermap · 95a58296
      Milian Wolff authored
      Before:
      ```
      allocated vector:               73376
      allocated input pointers:       320000000
      freed input pointers:           224
      begin actual benchmark:         224
      pointers added:                 62208848 (19.4403% overhead)
      pointers removed:               96448
      trimmed:                        96448
      
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointermap':
      
                5.685,95 msec task-clock                #    1,000 CPUs utilized
                       9      context-switches          #    0,002 K/sec
                       0      cpu-migrations            #    0,000 K/sec
                 114.357      page-faults               #    0,020 M/sec
          25.672.059.553      cycles                    #    4,515 GHz                      (83,33%)
             416.801.612      stalled-cycles-frontend   #    1,62% frontend cycles idle     (83,33%)
          17.952.673.581      stalled-cycles-backend    #   69,93% backend cycles idle      (83,33%)
          12.248.196.258      instructions              #    0,48  insn per cycle
                                                        #    1,47  stalled cycles per insn  (83,33%)
           2.341.601.571      branches                  #  411,822 M/sec                    (83,34%)
              95.859.176      branch-misses             #    4,09% of all branches          (83,35%)
      
             5,686546406 seconds time elapsed
      
             5,510546000 seconds user
             0,146093000 seconds sys
      ```
      
      After:
      ```
      allocated vector:               73376
      allocated input pointers:       320000000
      freed input pointers:           224
      begin actual benchmark:         224
      pointers added:                 60645536 (18.9517% overhead)
      pointers removed:               96448
      trimmed:                        96448
      
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_pointermap':
      
                4.906,87 msec task-clock                #    1,000 CPUs utilized
                       9      context-switches          #    0,002 K/sec
                       0      cpu-migrations            #    0,000 K/sec
                 115.556      page-faults               #    0,024 M/sec
          21.930.029.620      cycles                    #    4,469 GHz                      (83,31%)
             314.571.970      stalled-cycles-frontend   #    1,43% frontend cycles idle     (83,31%)
          14.492.419.080      stalled-cycles-backend    #   66,08% backend cycles idle      (83,31%)
          12.059.692.467      instructions              #    0,55  insn per cycle
                                                        #    1,20  stalled cycles per insn  (83,35%)
           2.287.570.344      branches                  #  466,197 M/sec                    (83,37%)
              95.750.624      branch-misses             #    4,19% of all branches          (83,36%)
      
             4,907463968 seconds time elapsed
      
             4,749223000 seconds user
             0,132838000 seconds sys
      ```
      95a58296
    • Milian Wolff's avatar
      Import tsl::robin-map commit 84c1bee16e2449c28589ccd6ab366df257d18c24 · e316a352
      Milian Wolff authored
      Upstream: https://github.com/Tessil/robin-map
      
      Not using a git submodule, but may rethink that in the future...
      
      This is a much faster hash map which we can leverage in multiple
      places in heaptrack. It is MIT licensed, so should be fine to use
      here.
      e316a352
    • Milian Wolff's avatar
      Leverage monotonic_buffer_resource for the ReusableGuardBuffer · be5aff3f
      Milian Wolff authored
      This requires boost 1.60 which was released roughly 5 years ago,
      so I hope this is fine for everyone. Eventually we can just replace
      it all by std::pmr, once we depend on C++17.
      
      The impact is small but noticeable:
      
      Before:
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst' (5 runs):
      
                7.339,25 msec task-clock                #    1,153 CPUs utilized            ( +-  1,98% )
                     860      context-switches          #    0,117 K/sec                    ( +-  9,56% )
                     242      cpu-migrations            #    0,033 K/sec                    ( +-  8,87% )
                 139.946      page-faults               #    0,019 M/sec                    ( +-  0,33% )
          31.016.368.103      cycles                    #    4,226 GHz                      ( +-  1,36% )  (83,45%)
             549.091.116      stalled-cycles-frontend   #    1,77% frontend cycles idle     ( +-  5,87% )  (83,25%)
           7.195.865.845      stalled-cycles-backend    #   23,20% backend cycles idle      ( +-  1,28% )  (83,19%)
          58.379.008.232      instructions              #    1,88  insn per cycle
                                                        #    0,12  stalled cycles per insn  ( +-  0,05% )  (83,31%)
          11.802.836.750      branches                  # 1608,180 M/sec                    ( +-  0,01% )  (83,40%)
             209.444.455      branch-misses             #    1,77% of all branches          ( +-  0,33% )  (83,41%)
      
                   6,365 +- 0,142 seconds time elapsed  ( +-  2,23% )
      ```
      
      After:
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst' (5 runs):
      
                7.077,40 msec task-clock                #    1,142 CPUs utilized            ( +-  0,35% )
                     845      context-switches          #    0,119 K/sec                    ( +-  2,37% )
                     245      cpu-migrations            #    0,035 K/sec                    ( +-  5,21% )
                 139.577      page-faults               #    0,020 M/sec                    ( +-  0,27% )
          30.284.047.519      cycles                    #    4,279 GHz                      ( +-  0,22% )  (83,44%)
             750.013.909      stalled-cycles-frontend   #    2,48% frontend cycles idle     ( +-  1,05% )  (83,24%)
           6.383.953.436      stalled-cycles-backend    #   21,08% backend cycles idle      ( +-  1,17% )  (83,19%)
          57.113.121.471      instructions              #    1,89  insn per cycle
                                                        #    0,11  stalled cycles per insn  ( +-  0,05% )  (83,30%)
          11.452.771.446      branches                  # 1618,217 M/sec                    ( +-  0,02% )  (83,40%)
             209.271.140      branch-misses             #    1,83% of all branches          ( +-  0,17% )  (83,42%)
      
                  6,1972 +- 0,0240 seconds time elapsed  ( +-  0,39% )
      ```
      be5aff3f
    • Milian Wolff's avatar
      Use std::unordered_set for recursion guards · 2437baf9
      Milian Wolff authored
      Will allow me to experiment with PMR and monotonic resource.
      Performance-wise it's pretty much the same compared to QHash now
      that we simplified Symbol.
      2437baf9
    • Milian Wolff's avatar
      Refactor GUI code to leverage string ids · db40cdf3
      Milian Wolff authored
      Basically this allows us to operate more on cheap ids instead of
      the fat strings. This approach has multiple advantages:
      
      - Symbol is now just 8byte large instead of 32byte
      - Accordingly RowData is down to 56byte from 80byte
      - We can now experiment with using PMR allocators and put a monotonic
        buffer resources into the new ResultData
      
      There are two downsides to this approach:
      
      - On one hand, we are now sorting our data differently. I.e. before
      the data was sorted by the actual string contents whereas now it's
      sorted by the string index. That shouldn't be a problem though, as
      we want to sort by actual metrics in most areas anyways. The only
      area where this is noticeable is the flame graph.
      
      - On the other side, we are now doing some things repeatedly. Most
      notably we don't cache the string basename anymore. But that's not
      overly noticeable either, as we usually only need that for a handful
      of rows at any time. If needed, we can even cache this in the future.
      
      The benchmark also shows some significant wins with this new approach.
      Most notably compare the total runtime of ~6.82s and peak RSS memory
      consumption of 555MB with ~6.17s and 490MB.
      
      Before:
      ``
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':
      
                7.825,54 msec task-clock:u              #    1,148 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
                 155.988      page-faults:u             #    0,020 M/sec
          32.848.963.956      cycles:u                  #    4,198 GHz                      (83,22%)
             434.588.005      stalled-cycles-frontend:u #    1,32% frontend cycles idle     (83,18%)
           6.992.025.659      stalled-cycles-backend:u  #   21,29% backend cycles idle      (83,36%)
          63.255.414.218      instructions:u            #    1,93  insn per cycle
                                                        #    0,11  stalled cycles per insn  (83,39%)
          12.941.138.245      branches:u                # 1653,705 M/sec                    (83,43%)
             225.630.395      branch-misses:u           #    1,74% of all branches          (83,41%)
      
             6,817015591 seconds time elapsed
      
             7,613402000 seconds user
             0,189718000 seconds sys
      
              Command being timed: "/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst"
              User time (seconds): 7.48
              System time (seconds): 0.19
              Percent of CPU this job got: 114%
              Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.70
              Average shared text size (kbytes): 0
              Average unshared data size (kbytes): 0
              Average stack size (kbytes): 0
              Average total size (kbytes): 0
              Maximum resident set size (kbytes): 554496
              Average resident set size (kbytes): 0
              Major (requiring I/O) page faults: 0
              Minor (reclaiming a frame) page faults: 156906
              Voluntary context switches: 827
              Involuntary context switches: 65
              Swaps: 0
              File system inputs: 0
              File system outputs: 0
              Socket messages sent: 0
              Socket messages received: 0
              Signals delivered: 0
              Page size (bytes): 4096
              Exit status: 0
      ```
      
      After:
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':
      
                7.244,75 msec task-clock:u              #    1,174 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
                 139.184      page-faults:u             #    0,019 M/sec
          29.982.409.942      cycles:u                  #    4,139 GHz                      (83,44%)
             542.168.522      stalled-cycles-frontend:u #    1,81% frontend cycles idle     (83,22%)
           6.438.890.717      stalled-cycles-backend:u  #   21,48% backend cycles idle      (83,17%)
          57.743.166.637      instructions:u            #    1,93  insn per cycle
                                                        #    0,11  stalled cycles per insn  (83,40%)
          11.758.342.073      branches:u                # 1623,016 M/sec                    (83,40%)
             221.034.296      branch-misses:u           #    1,88% of all branches          (83,37%)
      
             6,170274615 seconds time elapsed
      
             6,975850000 seconds user
             0,249394000 seconds sys
      
              Command being timed: "/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst"
              User time (seconds): 6.73
              System time (seconds): 0.27
              Percent of CPU this job got: 116%
              Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.00
              Average shared text size (kbytes): 0
              Average unshared data size (kbytes): 0
              Average stack size (kbytes): 0
              Average total size (kbytes): 0
              Maximum resident set size (kbytes): 490132
              Average resident set size (kbytes): 0
              Major (requiring I/O) page faults: 0
              Minor (reclaiming a frame) page faults: 140792
              Voluntary context switches: 647
              Involuntary context switches: 110
              Swaps: 0
              File system inputs: 0
              File system outputs: 0
              Socket messages sent: 0
              Socket messages received: 0
              Signals delivered: 0
              Page size (bytes): 4096
              Exit status: 0
      ```
      db40cdf3
    • Milian Wolff's avatar
    • Script Kiddy's avatar
      GIT_SILENT made messages (after extraction) · a12d5515
      Script Kiddy authored
      a12d5515
  13. 16 Nov, 2020 8 commits
    • Milian Wolff's avatar
      Use std::remove_if to optimize diffing · a1d85b6c
      Milian Wolff authored
      Calling erase in a loop leads to quadratic behavior, as can be
      seen by the impact of this patch.
      
      Before:
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser heaptrack.kdevelop.189071.zst heaptrack.kdevelop.189071.zst':
      
               75.307,69 msec task-clock                #    0,944 CPUs utilized
                   5.104      context-switches          #    0,068 K/sec
                     376      cpu-migrations            #    0,005 K/sec
                 196.391      page-faults               #    0,003 M/sec
         307.733.655.134      cycles                    #    4,086 GHz                      (83,34%)
           1.516.564.276      stalled-cycles-frontend   #    0,49% frontend cycles idle     (83,30%)
         111.558.209.335      stalled-cycles-backend    #   36,25% backend cycles idle      (83,30%)
         333.691.330.970      instructions              #    1,08  insn per cycle
                                                        #    0,33  stalled cycles per insn  (83,33%)
          37.612.802.347      branches                  #  499,455 M/sec                    (83,35%)
             243.957.169      branch-misses             #    0,65% of all branches          (83,38%)
      
            79,799304157 seconds time elapsed
      
            74,538971000 seconds user
             0,344263000 seconds sys
      ```
      
      After:
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser heaptrack.kdevelop.189071.zst heaptrack.kdevelop.189071.zst':
      
               11.540,35 msec task-clock                #    1,310 CPUs utilized
                   1.173      context-switches          #    0,102 K/sec
                     309      cpu-migrations            #    0,027 K/sec
                 195.605      page-faults               #    0,017 M/sec
          48.974.438.032      cycles                    #    4,244 GHz                      (83,40%)
             749.567.505      stalled-cycles-frontend   #    1,53% frontend cycles idle     (83,42%)
          17.760.476.929      stalled-cycles-backend    #   36,26% backend cycles idle      (83,06%)
         114.694.556.598      instructions              #    2,34  insn per cycle
                                                        #    0,15  stalled cycles per insn  (83,32%)
          20.690.990.493      branches                  # 1792,926 M/sec                    (83,39%)
             236.844.837      branch-misses             #    1,14% of all branches          (83,41%)
      
             8,808486636 seconds time elapsed
      
            11,187237000 seconds user
             0,322839000 seconds sys
      ```
      a1d85b6c
    • Milian Wolff's avatar
      Optimize Parser::handleTimeStamp when assembling chart data · 19313ef5
      Milian Wolff authored
      Instead of looping through the allocations at every timestamp,
      looking up whether to include it in the chart, do this once and then
      remember the allocation indices that should be used.
      
      Before:
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':
      
                9.912,44 msec task-clock:u              #    1,114 CPUs utilized
                       0      context-switches:u        #    0,000 K/sec
                       0      cpu-migrations:u          #    0,000 K/sec
                 154.526      page-faults:u             #    0,016 M/sec
          41.541.932.930      cycles:u                  #    4,191 GHz                      (83,42%)
             650.025.055      stalled-cycles-frontend:u #    1,56% frontend cycles idle     (83,31%)
           8.109.154.392      stalled-cycles-backend:u  #   19,52% backend cycles idle      (83,08%)
          70.928.385.225      instructions:u            #    1,71  insn per cycle
                                                        #    0,11  stalled cycles per insn  (83,36%)
          14.890.145.672      branches:u                # 1502,168 M/sec                    (83,39%)
             303.284.007      branch-misses:u           #    2,04% of all branches          (83,44%)
      
             8,898062968 seconds time elapsed
      
             9,584299000 seconds user
             0,294486000 seconds sys
      ```
      
      After:
      ```
       Performance counter stats for '/home/milian/projects/build/heaptrack/tests/benchmarks/bench_parser ./heaptrack.kdevelop.189071.zst':
      
                7.959,38 msec task-clock                #    1,143 CPUs utilized
                     963      context-switches          #    0,121 K/sec
                     241      cpu-migrations            #    0,030 K/sec
                 155.984      page-faults               #    0,020 M/sec
          34.013.707.752      cycles                    #    4,273 GHz                      (83,42%)
             944.704.460      stalled-cycles-frontend   #    2,78% frontend cycles idle     (83,14%)
           6.987.086.160      stalled-cycles-backend    #   20,54% backend cycles idle      (83,18%)
          64.337.973.769      instructions              #    1,89  insn per cycle
                                                        #    0,11  stalled cycles per insn  (83,42%)
          13.140.848.740      branches                  # 1650,989 M/sec                    (83,42%)
             225.817.303      branch-misses             #    1,72% of all branches          (83,42%)
      
             6,960656687 seconds time elapsed
      
             7,704212000 seconds user
             0,232872000 seconds sys
      ```
      19313ef5
    • Milian Wolff's avatar
      merge comment lines · 594d89da
      Milian Wolff authored
      594d89da
    • Milian Wolff's avatar
      a3dfb3d5
    • Milian Wolff's avatar
      Move test-only code to test · 7e97f69d
      Milian Wolff authored
      7e97f69d
    • Milian Wolff's avatar
      Cleanup: use find_if instead of raw loop · d6a80f7c
      Milian Wolff authored
      d6a80f7c
    • Milian Wolff's avatar
      Simply code by leveraging QHash::operator[] · f2464df4
      Milian Wolff authored
      It has exactly the semantics that we need here, no need for special
      functions that replicate it.
      f2464df4
    • Milian Wolff's avatar
      Further optimize recursion guard handling · 0cf8cfa6
      Milian Wolff authored
      Replace lookup+insert with just an insert and check if the size
      changed to track whether we actually overwrote an item or not.
      Sadly, QSet doesn't have the nice API of std::unordered_set here,
      but otherwise it's faster.
      
      Also use SymbolId in one more place.
      
      Sadly, the performance impact isn't really noticable.
      0cf8cfa6