The cause of Z2k ...
No doubt by now some of you have heard of the Z2K bug that hit all first-generation Zune 30 models on the day before the new year, mine being one of them. Apparently, and the Microsoft Zune guys confirm this, there was a bug in the driver that controls the real-time clock on that specific model. The problem was with how the Zune's firmware handled resetting its internal clock for the last day of a leap year.
Bloops!
Anyone who has this model of the Zune and tried using it saw a hanging boot screen with a full progress bar.
Word from Microsoft was that the counter would reset itself exactly 24 hours after the day our Zunes failed. The fix was, basically, to do nothing but wait a day without our Zunes. Well, that day has come and gone and I can gladly say that mine is back up and running like a champ.
I was curious as to what the exact problem was with the firmware and why it would go tits up while doing something as mundane as a simple date check. Well, I wasn't the only curious coder as someone managed to post the offending source file on the internets.
Mulling over the code snippet, we can traverse the library's path of execution to figure out where it all went wrong. At some point during the boot process, the kernel calls the OEMGetRealTime() function to grab the current date. This function invokes a helper function declared as GetTime(), which calculates the time and day from a provided timestamp.
Everything seems fine until it calls yet another helper function named ConvertDays() which calculates the current day number of the provided year. Taking a very close look at this function reveals the problem. Here is the offending bit of code at around line 259:
while (days > 365)
{
    if (IsLeapYear(year))
    {
        if (days > 366)
        {
            days -= 366;
            year += 1;
        }
    }
    else
    {
        days -= 365;
        year += 1;
    }
}
This loop turns a running count of days into a year: each pass peels one year's worth of days off the count and bumps the year, until no more than 365 days are left. For a standard year that's all you need. But, what about leap years? 2008 was a leap year because it's divisible by 4 (and not by 100). So, this means that 2008 had 366 days. If this is the case, the count left over on December 31st, 2008 is exactly 366, and it only reaches 367 the day after.
We can see the check for the leap year in line 3 of the code snippet above. What happens next is what killed the Zune. The inner check only handles a day count greater than 366, which would include 367. But, the zinger here is that nothing handles a day count that IS EQUAL TO 366. On December 31st the outer condition (days > 365) is true, the leap year check is true, and the inner check (days > 366) is false. Nothing gets subtracted, the year never advances, and nothing breaks out of the loop, so the poor Zune is stuck in an infinite loop which causes the boot screen to hang.
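To watch the hang without bricking anything, here's a minimal sketch that runs the same loop with a cap on the number of passes. The names run_buggy_loop and is_leap_year are mine, not from the leaked source, and I'm assuming IsLeapYear() follows the usual Gregorian rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed Gregorian rule; the firmware's IsLeapYear() may be simpler. */
static bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/*
 * Hypothetical harness: runs the loop from the snippet above, but gives up
 * after max_iters passes. Returns true if the loop finished on its own,
 * false if it was still spinning when the cap was reached.
 */
static bool run_buggy_loop(int *days, int *year, int max_iters)
{
    assert(max_iters > 0);

    for (int i = 0; i < max_iters; i++) {
        if (*days <= 365)
            return true;        /* the while condition would be false here */

        if (is_leap_year(*year)) {
            if (*days > 366) {
                *days -= 366;
                *year += 1;
            }
            /* days == 366: nothing changes, so every later pass is identical */
        } else {
            *days -= 365;
            *year += 1;
        }
    }
    return *days <= 365;
}
```

Starting from year 2008 with days set to 366, the call returns false and leaves both values untouched. With days set to 367 it returns true and ends up at day 1 of 2009.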
What should have happened is something like the following:
while (days > 365)
{
    if (IsLeapYear(year))
    {
        if (days > 366)
        {
            days -= 366;
            year += 1;
        }
        else
        {
            // days == 366: December 31st of a leap year, so we're done
            break;
        }
    }
    else
    {
        days -= 365;
        year += 1;
    }
}
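With the day count at exactly 366 in a leap year, the loop now terminates instead of spinning forever, and the Zune can continue booting. If the day counter reaches 367, we know this is the first day after the new year following a leap year, January 1st, 2009, and the loop rolls over into the new year just as it did before.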
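Here's a minimal sketch of another way to structure the loop: compare the count against the length of the current year, so there's no leap year special case left to get wrong. I'm assuming the day count is 1-based (1 means January 1st), and split_days, days_in_year and is_leap_year are hypothetical names of mine, not anything from the firmware.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed Gregorian rule; the firmware's IsLeapYear() may be simpler. */
static bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

static int days_in_year(int year)
{
    return is_leap_year(year) ? 366 : 365;
}

/*
 * Hypothetical stand-in for the year loop in ConvertDays().
 * days is 1-based: 1 means January 1st of start_year.
 */
static void split_days(int days, int start_year, int *year, int *day_of_year)
{
    assert(days >= 1);

    int y = start_year;

    /* Only roll over when the count is past the end of year y. */
    while (days > days_in_year(y)) {
        days -= days_in_year(y);
        y += 1;
    }

    *year = y;
    *day_of_year = days;
}
```

Written this way, a count of exactly 366 in a leap year is left alone as December 31st instead of being reduced to day 0 of the following year, and 367 comes out as day 1 of the next year.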
Microsoft told us to wait 24 hours because by then the day count would have ticked over to 367. That passes the days > 366 check, the loop rolls over into 2009, and the boot carries on. We shouldn't see this happen again for another 4 years!
Could this have been prevented? Well, of course it could have. Code review doesn't always catch all the problems because it's being reviewed by human eyes. We tend to miss things in the code, especially if we've been staring at thousands of lines of it all day. What could have prevented this is simple and thorough unit testing.
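As a rough idea of what such a test could look like, here's a sketch that feeds every day count in a range through a copy of the loop and reports the first one that never finishes. The pass limit, the function names and the origin year parameter are all my own assumptions; I'm not claiming this is how the firmware was, or could be, tested on the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed Gregorian rule; the firmware's IsLeapYear() may be simpler. */
static bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* The loop under test, copied from the snippet, with a pass limit added. */
static bool year_loop_terminates(int days, int year, int max_passes)
{
    for (int pass = 0; pass < max_passes; pass++) {
        if (days <= 365)
            return true;
        if (is_leap_year(year)) {
            if (days > 366) {
                days -= 366;
                year += 1;
            }
        } else {
            days -= 365;
            year += 1;
        }
    }
    return days <= 365;
}

/*
 * Sweeps every day count from 1 to max_days, starting at origin_year, and
 * returns the first count for which the loop fails to finish, or -1 if
 * every count finishes.
 */
static int first_hanging_day(int origin_year, int max_days)
{
    assert(max_days >= 1);

    /* Each productive pass removes at least 365 days, so this cap is ample. */
    int max_passes = max_days / 365 + 2;

    for (int days = 1; days <= max_days; days++) {
        if (!year_loop_terminates(days, origin_year, max_passes))
            return days;
    }
    return -1;
}
```

With a made-up origin year of 2005, first_hanging_day(2005, 2000) returns 1461, which is December 31st, 2008. A sweep like this takes a blink to run and trips on the very first leap year it meets.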
But, hey, I'm gonna give Microsoft a break. They've made a great product and after going through 5 5th generation iPods in ONE year, I decided it was time for a change. I picked up the white brick the week it was released and haven't had a single issue until 2 days ago. Not bad, guys, not bad at all.
Besides, how many times have we caused self-inflicted facepalm moments? 'Duh' moments like this will happen to any software engineer.
One last thing before I go, you gotta hand it to the guys at Redmond, their code is quite pleasant to look at. I don't care what your coding standard is as long as you use it consistently and these guys got it juuuust the way I like it. Kudos 2 u!
Till next time!
Note: This article has been republished after being accidentally wiped from my hosting provider. Hurray for cached RSS feeds!
Also Note: This article has been partially updated to reflect a great suggestion from Nick Shepherd.


