Sometimes you stumble upon an exceptional bug. Most bugs are stupidly boring, but some are interesting, especially the ones that are perfectly deterministic but whose causes are really mysterious at first.
My weirdest bug ever was in a UI involving layered windows on Windows. If you’re unfamiliar with Win32 programming, layered windows are the technology that allows transparent windows on Windows. That UI was a set of very fluid, transparent floating buttons over the desktop.
The mystery machine
This bug manifested itself as a slow display rate about one time out of two. Once the UI was loaded, it was either slow or fast, but we could never know beforehand. I didn’t notice that distribution before knowing what the problem really was, because of course the slow and fast runs didn’t follow any regular pattern. It only made sense at the end.
Kernrate profiling
Facing a performance issue you know nothing about, your first tool is kernrate, a statistical profiler. It polls the instruction pointer at periodic intervals and summarizes where the time goes. It’s a very fast, easy and non-invasive way of profiling. Nyanaeve has a good introduction about it.
With kernrate, we learned that the time was spent at kernel level, not at user level. Kernel level mostly means drivers. I also got a position:
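To make the idea of statistical profiling concrete, here is a toy sketch in Python. It is not how kernrate itself is implemented: the workload, the function names and the sampling period are all invented for illustration. It only shows the principle of peeking at "what is running" at a fixed interval and counting the hits.

```python
from collections import Counter

def build_timeline(workload):
    """Expand [(name, ticks), ...] into one entry per tick of CPU time."""
    timeline = []
    for name, ticks in workload:
        timeline.extend([name] * ticks)
    return timeline

def sample(timeline, period):
    """Peek at the 'instruction pointer' every `period` ticks and count hits."""
    return Counter(timeline[t] for t in range(0, len(timeline), period))

def summarize(hits):
    """Turn raw hit counts into a share of all samples, hottest first."""
    total = sum(hits.values())
    return [(name, count / total) for name, count in hits.most_common()]

# Hypothetical workload: names and durations are made up for this sketch.
workload = [("parse_input", 100), ("blend_pixels", 800), ("update_layout", 100)] * 10

report = summarize(sample(build_timeline(workload), period=7))
for name, share in report:
    print(f"{name:15s} {share:6.1%}")
```

The profiler never measures any function directly. It only looks at a small fraction of the ticks, yet the share of samples landing in each function approximates the share of time spent there, which is why the approach is cheap and non-invasive.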
- EngModifySurface+0x8cc
The only issue is that this meant nothing.
Windows source code is obviously unavailable to most of us, but Microsoft knows that sometimes you need a little more insight into the platform and provides debug symbol files, which can tell you more precisely where you are. If you’re unfamiliar with them, they mainly provide the correspondence between memory addresses and function names.
With the debug symbols installed, we got much more interesting positions:
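A rough way to picture what symbol files change: a sampled address is reported as the closest known name at or below it, plus an offset. With only a few names known, you get a big offset from something unrelated; with full symbols, the same address lands in the real function. The Python sketch below illustrates that lookup. The addresses and the layout of the functions are invented, and real symbol files contain far more than a sorted list of names.

```python
import bisect

def resolve(address, symbols):
    """Return 'name+0xoffset' for the closest symbol at or below `address`.

    `symbols` is a list of (start_address, name) sorted by start address.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return hex(address)          # below every known symbol
    start, name = symbols[i]
    offset = address - start
    return name if offset == 0 else f"{name}+{offset:#x}"

# All addresses below are hypothetical, chosen only to illustrate the lookup.
exported_only = [(0x1000, "EngModifySurface")]
full_symbols = sorted(exported_only + [
    (0x1800, "vSrcAlignCopyMemory"),
    (0x18C0, "mmxAlphaPerPixelOnly"),
])

sampled = 0x18CC
print(resolve(sampled, exported_only))  # EngModifySurface+0x8cc
print(resolve(sampled, full_symbols))   # mmxAlphaPerPixelOnly+0xc
```

A large offset such as +0x8cc is a hint that the address probably belongs to some other, unnamed function that simply sits after the last name the tool knows about.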
- mmxAlphaPerPixelOnly
- vSrcAlignCopyMemory
MMX & memory allocations
It took me a while to understand exactly what was going on. What happened was that the video card didn’t have a hardware implementation for layered windows, so its driver allocated memory and forwarded it to the default implementation.
Writing GUI drivers for Windows is not an easy task: you have to handle pixel writing in many pixel formats like 2/2/2, 2/2/4, etc., and conversions between all those formats as well. So MS was nice enough to write default implementations for all those cases. When a driver doesn’t know how to handle something, it allocates memory and gives it to the default Windows implementation. So you write a miniui driver and let Windows handle all the special cases.
This default implementation is potentially slower, but not that slow, and besides, it wouldn’t be random. So that couldn’t be the whole explanation. But this was all we had, and we needed to know.
This specific default implementation of the operation was MMX-optimised, because it’s faster of course, and was probably enabled as soon as an appropriate processor was available. MMX instructions work on 8-byte operands, and this code path only operated on 8-byte-aligned memory.
The driver didn’t care about that and provided a 4-byte-aligned memory block. Sometimes, by sheer luck, the allocation happened to be 8-byte aligned; other times it was only 4-byte aligned.
The Windows implementation couldn’t operate on memory that was only 4-byte aligned, so it copied it to another memory block, processed it and copied it back again.
As the driver only allocated the memory once, at the creation of the window, the UI was fast one time out of two and never changed after that.
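The "one time out of two" falls straight out of the arithmetic: among addresses that are multiples of 4, exactly every other one is also a multiple of 8. The Python sketch below models that, together with the copy-in, process, copy-out fallback described above. It is only a model: the addresses are invented, the halving step is a stand-in for the real blending, and it assumes the allocator is equally likely to return either kind of address.

```python
REQUIRED_ALIGNMENT = 8   # what the MMX path needs, per the text
DRIVER_ALIGNMENT = 4     # what the driver guarantees, per the text

def is_aligned(address, alignment):
    """True if `address` is a multiple of `alignment`."""
    return address % alignment == 0

def blend(address, pixels):
    """Model of the default implementation.

    Returns (result, extra_copies). The halving is a placeholder for
    the real per-pixel work, which is not described in the text.
    """
    if is_aligned(address, REQUIRED_ALIGNMENT):
        return [p // 2 for p in pixels], 0   # fast path: work in place
    scratch = list(pixels)                   # copy 1: into an aligned scratch block
    scratch = [p // 2 for p in scratch]      # same processing as the fast path
    return list(scratch), 2                  # copy 2: back into the driver's block

# Hypothetical addresses an allocator with 4-byte alignment could return.
addresses = range(0x10000, 0x10000 + 64, DRIVER_ALIGNMENT)
slow = [a for a in addresses if blend(a, [10, 20])[1] > 0]
print(f"{len(slow)} of {len(addresses)} 4-byte-aligned addresses hit the slow path")
```

Both paths produce the same pixels; only the number of copies differs. Since the address is chosen once when the window is created, each window stays on whichever path it started on.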
Dead end
That’s a leaky abstraction at its fullest: a hardware requirement caused a very visible user experience problem, sadly unfixable.
A fix would require the source code of the driver, rebuilding it and redeploying it to everyone. A workaround would be to keep creating windows until one got created with the correct memory alignment, but there is no way to detect it.
Dead end, won’t fix! But the trip was still awesome.
And you, what was your weirdest bug?