    Experiment 1: Can we cache trig values during first computation? · 2d4f65a2
    Akarsh Simha authored
    Is there an improvement if we cache trig values at the cost of one
    conditional branch to see if the cache is available or not?
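    A minimal sketch of the kind of lazy cache being tested, for
    illustration only. LazyAngle and its members are hypothetical
    stand-ins, not the actual dms code; the sketch assumes the cache
    check is a float comparison against the angle the cache was
    computed for.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for an angle class; not the real KStars dms.
class LazyAngle {
public:
    explicit LazyAngle(double radians = 0.0) : m_rad(radians) {}

    void setRadians(double r) { m_rad = r; }

    // Each accessor pays for one conditional branch (inside refresh())
    // before it can return the cached value.
    double sin() const { refresh(); return m_sin; }
    double cos() const { refresh(); return m_cos; }

    // Number of times the trig values were actually recomputed.
    int recomputations() const { return m_recomputes; }

private:
    void refresh() const {
        // The cache starts out as NaN, which compares unequal to
        // everything, so the first call always computes.
        if (m_cachedFor != m_rad) {
            m_sin = std::sin(m_rad);
            m_cos = std::cos(m_rad);
            m_cachedFor = m_rad;
            ++m_recomputes;
        }
    }

    double m_rad;
    mutable double m_cachedFor = std::numeric_limits<double>::quiet_NaN();
    mutable double m_sin = 0.0;
    mutable double m_cos = 0.0;
    mutable int m_recomputes = 0;
};
```

    With this layout, calling sin() and then cos() on the same angle
    computes once; changing the angle triggers one more computation on
    the next access.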
    
    The answer seems to be no! Running this code and comparing the console
    output against the previous run does not seem to indicate any
    meaningful improvement:
    
    BEFORE:
    
    DeepStarComponent::draw (322) - Spent 0.254752 seconds doing 2233550
    trigonometric function calls amounting to an average of 0.000114057 ms
    per call
    
    AFTER:
    
    DeepStarComponent::draw (322) - Spent 0.229613 seconds doing 2233550
    trigonometric function calls amounting to an average of 0.000102802 ms
    per call
    
    The difference is small: about 10% less time overall. Although there
    might be a consistent trend of improvement, it is not staggering.
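    The per-call averages and the relative saving can be re-derived
    from the two console lines. The helpers below are a hypothetical
    sketch for checking the arithmetic, not part of the instrumented
    code.

```cpp
#include <cassert>
#include <cmath>

// Average time per call in milliseconds, from total seconds and call count.
constexpr double msPerCall(double totalSeconds, long calls) {
    return totalSeconds * 1000.0 / static_cast<double>(calls);
}

// Fraction of time saved when going from `before` to `after`.
constexpr double relativeSaving(double before, double after) {
    return (before - after) / before;
}
```

    For the figures above, relativeSaving(0.254752, 0.229613) comes out
    just under 0.10.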
    
    I rationalize the results as follows (on Intel x86_64):
    
    1. If we assume that a branch mispredict takes about 10 CPU cycles
       (https://gist.github.com/jboner/2841832) and the float comparison
       takes another 10 CPU cycles, we add about 20 CPU cycles by
       introducing the branch.
    
    2. sin() and cos() seem to take about 55 CPU cycles each, sincos()
       taking about 210. So let's suppose 52 CPU cycles on average.
    
    3. Since about 50% of the trig function calls are redundant, and the
       redundant and non-redundant calls are interspersed, the branch
       predictor is very likely to mispredict.
    
    So this suggests that in the 50% of cases where we are asked to do a
    fresh computation, we spend ~ 72 CPU cycles (52 + 20), whereas in the
    50% of cases where we don't need to do any computation, we spend
    ~ 20 CPU cycles. The resulting average is about 46 CPU cycles, which
    is not a significant improvement over 52 CPU cycles.
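    The estimate above can be written as a small cost model. The
    constant and function names are made up for this sketch, and the
    cycle counts are the rough assumptions from the list above, not
    measurements.

```cpp
#include <cassert>

// Assumed costs in CPU cycles, taken from the rough estimates above.
constexpr double kTrigCycles = 52.0;    // average cost of one trig call
constexpr double kBranchCycles = 20.0;  // mispredict + float comparison

// Expected cycles per call when a fraction `redundant` of the calls can
// be served from the cache. Every call pays for the branch; only the
// non-redundant calls also pay for the trig computation.
constexpr double expectedCycles(double redundant) {
    return redundant * kBranchCycles +
           (1.0 - redundant) * (kTrigCycles + kBranchCycles);
}

static_assert(expectedCycles(0.5) == 46.0, "50% redundancy gives 46 cycles");
```

    Under these assumptions the cache only beats the plain 52 cycles
    when more than 20/52 (about 38%) of the calls are redundant, and
    the model is pessimistic in charging a mispredict on every call.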
    
    This suggests that Experiment 2 should be as follows:
    
    Create a subclass of dms called FastDms that caches trigonometric
    values every time the angle changes. So in this class, we basically
    assume that we _will_ call sin() and cos() eventually. If that
    assumption fails, the eager computation is pure overhead. The
    assumption can be verified by counting calls and profiling.
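    A rough sketch of what such a class could look like. The Dms base
    class here is a deliberately tiny, hypothetical stand-in (the real
    dms has a different and larger interface, and may not use virtual
    setters); only the eager-caching idea is the point.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal base class standing in for dms.
class Dms {
public:
    explicit Dms(double radians = 0.0) : m_rad(radians) {}
    virtual ~Dms() = default;

    virtual void setRadians(double r) { m_rad = r; }
    double radians() const { return m_rad; }

    virtual double sin() const { return std::sin(m_rad); }
    virtual double cos() const { return std::cos(m_rad); }

private:
    double m_rad;
};

// Recomputes sin and cos whenever the angle changes, so the accessors
// are plain loads with no branch.
class FastDms : public Dms {
public:
    explicit FastDms(double radians = 0.0) : Dms(radians) { recompute(); }

    void setRadians(double r) override {
        Dms::setRadians(r);
        recompute();
    }

    double sin() const override { return m_sin; }
    double cos() const override { return m_cos; }

private:
    void recompute() {
        m_sin = std::sin(radians());
        m_cos = std::cos(radians());
    }

    double m_sin = 0.0;
    double m_cos = 0.0;
};
```

    The trade-off is the one described above: every angle change pays
    for both trig calls up front, which only wins if the values are
    actually read afterwards.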