Improving parallelization of the blur effect
Currently, the blur effect works like this:
1. copy part of the main framebuffer to an offscreen texture
2. do the blur steps
3. copy the blurred result back to the main framebuffer
Step 1 hides a cost though: the copy forces the GPU to wait until all rendering to the main framebuffer is complete before it can read from it. When multiple blur regions are involved, that wait happens once per region, even if the regions don't overlap.
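To make that concrete, here is a minimal sketch of the copy-based flow in plain OpenGL. The function and parameter names are hypothetical, not the actual effect code, and I'm assuming libepoxy for the GL header; the real code may also read with glCopyTexSubImage2D instead of glBlitFramebuffer, but either read triggers the same implicit synchronization:

```cpp
// Hypothetical sketch of the current copy-based flow.
#include <epoxy/gl.h>

void blurRegionCopyBased(GLuint blurFbo, GLuint blurTexture,
                         int x, int y, int width, int height)
{
    // Step 1: copy the blur region out of the main framebuffer.
    // This read forces the GPU to finish every draw that touches the
    // region first, which is the serialization described above.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);       // main framebuffer
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, blurFbo); // offscreen target
    glBlitFramebuffer(x, y, x + width, y + height,
                      0, 0, width, height,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);

    // Step 2: run the blur passes, sampling from blurTexture (the
    // color attachment of blurFbo).
    glBindTexture(GL_TEXTURE_2D, blurTexture);
    // ... downsample/upsample passes go here ...

    // Step 3: copy the blurred result back into the main framebuffer,
    // at the same position it was taken from.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, blurFbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
    glBlitFramebuffer(0, 0, width, height,
                      x, y, x + width, y + height,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);
}
```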
I propose that the blur effect should use a different approach, which avoids that serialization:
1. render everything below the window into an offscreen framebuffer
2. do the blur steps
3. render the blurred image to the main framebuffer
This would improve parallelization a lot and thus reduce the time needed for rendering, as the GPU can process the main scene and each blur region independently. It would also increase CPU and GPU usage a bit, since the scene now renders the parts below blurred windows that just get painted over by the blur effect afterwards, but that should be fixable.
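For comparison, a sketch of the proposed flow under the same assumptions; renderSceneBelowWindow() and drawTexturedQuad() are hypothetical stand-ins for whatever the effect would actually call:

```cpp
// Hypothetical sketch of the proposed offscreen flow.
#include <epoxy/gl.h>

void renderSceneBelowWindow(); // hypothetical: repaints occluded content
void drawTexturedQuad();       // hypothetical: draws the region quad

void blurRegionOffscreen(GLuint offscreenFbo, GLuint offscreenTexture,
                         int width, int height)
{
    // Step 1: render everything below the window into the offscreen
    // framebuffer. Nothing is read back from the main framebuffer, so
    // there is no wait on the main scene's rendering.
    glBindFramebuffer(GL_FRAMEBUFFER, offscreenFbo);
    glViewport(0, 0, width, height);
    renderSceneBelowWindow();

    // Step 2: run the blur passes on the offscreen texture. Each blur
    // region has its own framebuffer, so regions stay independent of
    // each other and of the main scene.
    glBindTexture(GL_TEXTURE_2D, offscreenTexture);
    // ... downsample/upsample passes go here ...

    // Step 3: draw the blurred image into the main framebuffer as a
    // textured quad. This is an ordinary draw, not a copy, so the GPU
    // is free to schedule it alongside other work.
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    drawTexturedQuad();
}
```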
As another advantage, it would also fix the issues mentioned in #115 (closed), as the effect could then render any unblurred parts below the window that it needs, at any time.