My weirdest bug

Sometimes you stumble upon an exceptional bug. Most bugs are stupidly boring, but some are interesting, especially the ones that are perfectly deterministic but whose causes are really mysterious at first.

My weirdest bug ever was in a UI involving layered windows on Windows. If you’re unfamiliar with win32 programming, that is the name of the technology that allows transparent windows on Windows. The UI was a set of very fluid, transparent buttons floating over the desktop.

The mystery machine

The bug manifested itself as a slow display rate roughly one time out of two. Once the UI was loaded, it was either slow or fast for good, but we could never know beforehand which it would be. I didn’t notice that distribution before knowing what the problem really was, because of course slow and fast runs didn’t strictly alternate. It only made sense at the end.

Kernrate profiling

When facing a performance issue you know nothing about, a good first tool is kernrate, a statistical profiler. It polls the instruction pointer at periodic intervals and summarizes where the time goes. It’s a fast, easy and non-invasive way of profiling. Nyanaeve has a good introduction to it.
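To make the idea of statistical profiling concrete, here is a minimal sketch in Python. It is not how kernrate itself is implemented: it only simulates a made-up execution timeline and samples it at a fixed period. The function name `sample_timeline` and the workload are assumptions chosen for illustration.

```python
from collections import Counter


def sample_timeline(timeline, period):
    """Sample a simulated execution timeline at a fixed period.

    timeline: list of (location, duration) pairs in execution order,
              with durations as positive integers (arbitrary time units).
    period:   positive integer sampling interval, in the same units.
    Returns a Counter mapping each location to the number of samples
    that landed in it.
    """
    total = sum(duration for _, duration in timeline)
    hits = Counter()
    index = 0   # segment currently being executed
    start = 0   # time at which that segment started
    for t in range(0, total, period):
        # Skip ahead to the segment that is running at time t.
        while t >= start + timeline[index][1]:
            start += timeline[index][1]
            index += 1
        hits[timeline[index][0]] += 1
    return hits


# Hypothetical workload: 100 time units, 70 of them inside a driver.
timeline = [("user code", 20), ("kernel driver", 70), ("user code", 10)]
hits = sample_timeline(timeline, period=5)
samples = sum(hits.values())
for location, count in hits.most_common():
    print(f"{location}: {100 * count // samples}% of samples")
```

The profiler never measures a function directly. It only counts where the samples land, which is why it is cheap and why its answer is a distribution rather than an exact timing.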

With kernrate, we learned that the time was spent at kernel level, not at user level. Kernel level mostly means drivers. I also got a position:

- EngModifySurface+0x8cc

The only issue is that this told us nothing.

Windows source code is obviously unavailable to most of us, but Microsoft knows that sometimes you need a little more insight into the platform and provides debug symbol files, which can tell you more precisely where you are. If you’re unfamiliar with them, they mainly provide the correspondence between memory addresses and function names.

With the debug symbols installed, we got a much more interesting position:

- mmxAlphaPerPixelOnly
- vSrcAlignCopyMemory
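The difference between the two positions can be sketched as a lookup in a sorted symbol table. This is only an illustration of the principle, assuming that without debug symbols the tool falls back to the nearest known symbol below the address. All addresses below are made up, and `resolve` is a hypothetical helper, not a real debugger API.

```python
import bisect


def resolve(address, symbols):
    """Map an address to "name+0xoffset" using the closest preceding symbol.

    symbols: list of (start_address, name) pairs sorted by start_address.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return hex(address)  # address is below every known symbol
    start, name = symbols[i]
    return f"{name}+{address - start:#x}"


# Hypothetical addresses, for illustration only.
without_debug_symbols = [(0x1000, "EngModifySurface")]
with_debug_symbols = [
    (0x1000, "EngModifySurface"),
    (0x1800, "mmxAlphaPerPixelOnly"),
]

sampled_address = 0x18CC
print(resolve(sampled_address, without_debug_symbols))
print(resolve(sampled_address, with_debug_symbols))
```

With the sparse table, the address is reported as a large offset from an unrelated function. With the fuller table, the same address resolves to the function that actually contains it.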

MMX & memory allocations

It took me a while to understand exactly what was going on. The video card didn’t have a hardware implementation for layered windows, so its driver allocated memory and forwarded it to the default implementation.

Writing GUI drivers for Windows is not an easy task: you have to handle pixel writing in many pixel formats like 2/2/2, 2/2/4, etc., and conversions between all those formats as well. So MS was nice enough to write default implementations for all those cases. When a driver doesn’t know how to handle something, it allocates memory and gives it to the default Windows implementation. So you write a miniui driver and let Windows handle all the special cases.

This default implementation is potentially slower, but not that slow, and
above all it wouldn’t be random. So that alone couldn’t be the reason. But this was
all we had, and we needed to know.

This specific default implementation of the operation was MMX-optimised,
because it’s faster of course, and was probably enabled as soon as an
appropriate processor was available. The MMX code path could only operate on 8-byte aligned memory.

The driver didn’t care about that and only guaranteed a 4-byte aligned memory block. Sometimes, by sheer luck, the allocation happened to be 8-byte aligned; other times it was only 4-byte aligned.

The Windows implementation couldn’t operate on 4-byte
aligned memory, so it copied it to another memory block, processed it and copied it back again.

As the driver only allocated the memory once, at the creation of
the window, the UI was fast one time out of two and never changed after that.
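The one-in-two distribution falls out of simple arithmetic: of all the addresses that are multiples of 4, exactly half are also multiples of 8. Here is a small sketch of that reasoning. The names `is_aligned` and `blend_path` are hypothetical and only model the decision described above, not the real Windows code.

```python
def is_aligned(address, alignment):
    """True if address is a multiple of alignment."""
    return address % alignment == 0


def blend_path(address):
    """Which path a routine needing 8-byte alignment would take (modelled)."""
    if is_aligned(address, 8):
        return "in place"
    return "copy, process, copy back"


# Every address a 4-byte aligned allocator could return in a small range.
addresses = range(0x1000, 0x1040, 4)
fast = sum(1 for a in addresses if blend_path(a) == "in place")
print(f"{fast} of {len(addresses)} allocations take the fast path")
```

Since the buffer is allocated once per window, each window draws one address from that lottery and keeps the result for its whole lifetime.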

Dead end

That’s a leaky abstraction at its fullest: a hardware requirement caused a very visible user experience problem, sadly unfixable.

A fix would require getting the source code of the driver, rebuilding it and redeploying it to everyone. A workaround would be to keep creating windows until one got created with the correct memory alignment, but there is no way to detect it.

Dead end, won’t fix! But the trip was awesome all the same.

And you, what was your weirdest bug?

Should I put more bugs into my software?

Issues are one of the rare occasions where, when everything is automated, you get to talk with the users of your software. Recently I had some contact with users who liked my software but had run into some bugs.

SwiffOut, my browser extension, is exactly the kind of problem I’d prefer to stay away from; it’s unfixable. I parse webpages to find games and make them truly fullscreen in a browserless way. No website is built the same way, of course, so I use a heuristic to find what’s most likely to be a game, pick that and pray for the best.
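To give an idea of what such a heuristic can look like, here is a purely hypothetical sketch in Python. It is not SwiffOut’s actual logic: it simply assumes that the game is the `embed` or `object` element with the largest pixel area, and the class and function names are invented for the example.

```python
from html.parser import HTMLParser


class CandidateCollector(HTMLParser):
    """Collect (area, url) for every embed/object tag with a pixel size."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("embed", "object"):
            return
        attributes = dict(attrs)
        try:
            width = int(attributes.get("width", ""))
            height = int(attributes.get("height", ""))
        except (TypeError, ValueError):
            return  # missing or non-pixel size such as "100%": skip it
        url = attributes.get("src") or attributes.get("data")
        self.candidates.append((width * height, url))


def pick_likely_game(html):
    """Return the URL of the largest sized embed/object, or None."""
    collector = CandidateCollector()
    collector.feed(html)
    if not collector.candidates:
        return None
    return max(collector.candidates, key=lambda candidate: candidate[0])[1]


page = """
<embed src="banner.swf" width="728" height="90">
<embed src="game.swf" width="640" height="480">
"""
print(pick_likely_game(page))
```

Even this toy version shows why the approach is fragile: a page with a huge video player, or a game sized in percentages, defeats it immediately.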

Please, gods of Mozilla, Chrome and IE, add a way to get the position of an
element across iframes…

Social games like Farmville tend not to work, as they do some JavaScript
transactions between the webpage and the servers; some websites don’t really like that you transform their stuff, so they add tokens so that pages and games can only load as a pair. I could have created a sub-webpage that would have solved some of the issues, but it would probably still not fix everything. Try getting the Facebook environment without the UI!

Some time ago, one of my users suggested that I add a donation button because he’d like to give back. While it was a very nice proposition, I had tried it before and I knew that very few people actually give back. Still, I would have really liked to accept, and it cost me to refuse (who doesn’t want some money for their work?). But I’m not really in need of 5 or 10 bucks, so I asked him to
tell his friends about it instead.

So he went and wrote this on reddit:

“It’s a pretty clever idea; instead of resizing the flash (which typically will
cause lag) it resizes your monitor resolution to the closest / best size for
displaying the swf full-screen. That’s why it’s called Swiffout — it gets the
.SWF out of the browser. After playing a ton of games on Kongregate and Armor it’s pretty clear to me that many of these games were intended to be played full-screen. I gave the developer some love (and a crash report) via email and suggested he put a donate button on his page so i could support his efforts. He requested that instead of donating that I please spread the word about his plug-in. So that’s what I’m doing. If you check this out and give it a
thumbs-up, please thank him by doing the same.”

Probably because it was a good story, it did quite well on reddit, making it up to the
home page of the gaming subreddit.

Two weeks later ghack.net spotted my extension and wrote a nice review, then lifehacker.ru wrote another review, then some Italian websites did too.

The fun thing is that all of this happened because SwiffOut didn’t work. A smartbear already pointed out some time ago that tech support is sales, but I now realise how true it can be.

So should I add more bugs into my software?