Downtime
From TheCommandLineWiki
N.B. The notes for this episode are especially anemic; the transcript is going to be important.
Part of The Inner Chapters Unbook.
Originally part of podcast episode number forty-nine.
Dedicated audio available from Podiobooks.
Original Notes
- Valuable for learning
- Experiment, prototype
- Writing first JMX beans
- DHTML scheduling application
- Guerilla coding projects
Transcript
All of the hassle, heartache, grief, and stress that I went through over the past five or six weeks finally did taper off. We got a pretty reasonable release out, followed by some patches to deal with stability issues, and then I had about a week and a half of downtime once everything had settled. We had set a meeting for this past Monday to do release planning for the next release and the rest of the year. That left me, as I said, with about a week, a week and a half, with nothing pressing to fix for a short-cycle release and no marching orders for the next major release. So I was... well, I won't say at loose ends, but I didn't really have anything pressing that I needed to get done. Thinking back over that, now that I'm ramping up on the next release, I realized that it's important to take good advantage of those opportunities. It's so often the case that you get into sprint mode from release to release to release to release, that when you're given a breather you don't sit back and take full advantage of it for what it is. You immediately go, "What do I have to do next?!" and just start doing that, instead of saying, "Well, what would be best for me to do right now? I don't have to do something else. Nobody's asking it of me. The stakeholders have to meet and chew it over. There's a natural opportunity to look at some off-critical-path things."
I think that downtime is, first and foremost, valuable for learning. When you've got a week without a project plan in front of you and milestones you have to hit, that's the time to crack open a new book you've been saving. Downtime is also very valuable for experimenting and prototyping, and that's how I spent mine. I had been meaning to write some instrumentation using JMX, the Java Management Extensions (that's JSR 3), for our current applications, specifically for the panel server. That's my favorite piece of code to work on and arguably the most critical piece of our system, at least from the server perspective, as I've talked about. The firmware on the embedded side is just as critical for those devices; each is the critical link that makes our business possible on its own side.
In any case, our instrumentation was too coarse-grained to tell exactly what was going on with a lot of the stability issues we had to address over the last release. If we had good logging in the right place, we had a pretty good idea. But if the problem was somewhere we hadn't given much attention to logging, we really couldn't tell. One of the things I'd been meaning to do about that was to implement some management hooks. These would capture things of interest to us and also give us some operations, so we'd have a little more control over the running panel server. Unfortunately, this was never a high enough priority to say, "This is going to happen as part of a given release, and therefore it's going to be given due time and due testing." And JMX just wasn't something I had implemented before. So during my downtime week I grabbed the book and took a one-off component. We have this, well, we call it the Poker Daemon, although I don't really like the image that name conjures up. It sends a four-byte ping out when it detects a change. Unfortunately it's a polling daemon; we'll probably work it over in the coming year to make it more efficient. Every five minutes it wakes up and figures out whether a panel has changes pending. That matters for the wireless panels, which talk to us less frequently, because all the panels have to initiate communications first. It's HTTP; we can't contact those clients. That's actually an advantage for us when selling into a company, because it requires less of an IT manager to configure and there's less concern. They can treat a panel just like any web browser on their network.
05:03
But anyway, it does leave us with these wireless panels. We turn their polling down to conserve bandwidth, because we pay for all of the aggregate bandwidth on the wireless panels, but the service level may drop off as a consequence. In the worst case, you may have to wait 15 minutes for a panel configuration change to go down: a new user, a new card, a new PIN code, or a new access or unlock schedule on a door. So the Poker Daemon was a compromise. We've got the timestamps, so we can tell when the panel last checked in and when a change occurred. It runs on a five-minute interval, which is comparable to the service level for the wired panels (the ones plugged directly into Ethernet that go out through a standard existing LAN and internet connection). Every five minutes, if it detects a change in the database, it sends a four-byte ping out to the wireless panel. The panel receives it, wakes up, and comes in. It sends up whatever it has queued for us, but more importantly it pulls its data file down and gets those changes.
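The Poker Daemon's decision, as described above, boils down to comparing two timestamps per panel. A minimal Java sketch of that rule follows. It assumes a hypothetical `PanelState` row carrying a panel ID, a wireless flag, the last check-in time, and the last configuration-change time; the real daemon's schema and query aren't described in the episode. Wired panels already come in on roughly the same five-minute cycle, so the sketch only pings wireless ones. The `worstCaseWait` helper is also an assumption: it captures the rough arithmetic behind the compromise (a change waits for whichever comes first, the panel's own poll or the next sweep) and ignores network and processing time.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PokerSweepSketch {

    /** Hypothetical per-panel row; the real schema isn't described in the episode. */
    public static final class PanelState {
        final String id;
        final boolean wireless;
        final Instant lastCheckIn; // when the panel last called in to the server
        final Instant lastChange;  // when its configuration last changed in the database

        public PanelState(String id, boolean wireless, Instant lastCheckIn, Instant lastChange) {
            this.id = id;
            this.wireless = wireless;
            this.lastCheckIn = lastCheckIn;
            this.lastChange = lastChange;
        }
    }

    /** A wireless panel needs a nudge if its configuration changed after it last checked in. */
    public static boolean needsPing(PanelState p) {
        return p.wireless && p.lastChange.isAfter(p.lastCheckIn);
    }

    /** One sweep: collect the IDs of every panel that should get the four-byte ping. */
    public static List<String> panelsToPing(List<PanelState> panels) {
        List<String> ids = new ArrayList<>();
        for (PanelState p : panels) {
            if (needsPing(p)) {
                ids.add(p.id);
            }
        }
        return ids;
    }

    /** Rough worst case: the panel's own poll or the next sweep, whichever comes sooner. */
    public static Duration worstCaseWait(Duration panelPoll, Duration sweepInterval) {
        return panelPoll.compareTo(sweepInterval) <= 0 ? panelPoll : sweepInterval;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.EPOCH; // arbitrary reference time for the demo
        List<PanelState> panels = Arrays.asList(
                new PanelState("wireless-changed", true, t0, t0.plusSeconds(60)),
                new PanelState("wireless-current", true, t0.plusSeconds(120), t0),
                new PanelState("wired-changed", false, t0, t0.plusSeconds(60)));
        System.out.println(panelsToPing(panels)); // prints [wireless-changed]
        System.out.println(worstCaseWait(Duration.ofMinutes(15), Duration.ofMinutes(5))); // prints PT5M
    }
}
```

Note that in this sketch a panel that has been pinged but hasn't come in yet would simply be pinged again on the next sweep. Whether the real daemon suppresses repeat pings isn't mentioned.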
The problem with the Poker Daemon as implemented was that it was single-threaded, so it wasn't very efficient. If it had to contact more than a handful of panels, it would take a lot longer than it should, and we weren't getting the service levels out of it. So it has needed a rewrite for a while. It's very small: a single class of less than 200 to 250 lines of code, a very well-encapsulated piece of work. So I looked at rewriting it, and in the process (because the rewrite isn't mission critical) I experimented with prototyping some management code. And I have to say I'm pretty pleased with the results. I can tell what the new ping daemon is doing and how many threads it's running. Thanks to the JMX folks, I can change its configuration on the fly, and I can stop it and start it without having to stop and start the container, which is nice. If I have to stop the container, I have to stop the panel server as well, and renegotiating all of our open SSL sessions has a bandwidth cost.
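The rewrite described above combines two ideas. First, the pings are fanned out across a thread pool instead of contacting panels one at a time. Second, the daemon is exposed as a JMX MBean, so it can be inspected, reconfigured, stopped, and started without bouncing the container. The sketch below shows both using only the JDK: `java.util.concurrent` plus a Standard MBean registered with the platform `MBeanServer`. Everything specific to the panel server is an assumption. The `panelsToPing` supplier stands in for the timestamp query, and the `pinger` callback stands in for sending the four-byte ping. The class names, attributes, and operations are hypothetical, not the code discussed in the episode.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Supplier;
import javax.management.JMException;
import javax.management.ObjectName;

public class PingDaemonJmxSketch {

    /** Standard MBean: getX/setX/isX become attributes; other methods become operations. */
    public interface PingDaemonMBean {
        int getWorkerThreads();
        void setWorkerThreads(int n);
        long getIntervalSeconds();
        long getPingsSent();
        boolean isRunning();
        void start();
        void stop();
        void sweepNow();
    }

    public static class PingDaemon implements PingDaemonMBean {
        private final Supplier<List<String>> panelsToPing; // hypothetical: the timestamp query
        private final Consumer<String> pinger;            // hypothetical: sends the four-byte ping
        private final AtomicLong pingsSent = new AtomicLong();
        private final long intervalSeconds;
        private int workerThreads;
        private ScheduledExecutorService scheduler; // both pools are null while stopped
        private ExecutorService workers;

        public PingDaemon(Supplier<List<String>> panelsToPing, Consumer<String> pinger,
                          int workerThreads, long intervalSeconds) {
            this.panelsToPing = panelsToPing;
            this.pinger = pinger;
            this.workerThreads = workerThreads;
            this.intervalSeconds = intervalSeconds;
        }

        public synchronized int getWorkerThreads() { return workerThreads; }
        public long getIntervalSeconds() { return intervalSeconds; }
        public long getPingsSent() { return pingsSent.get(); }
        public synchronized boolean isRunning() { return scheduler != null; }

        // Reconfigure on the fly: only the daemon's own pools restart, not the container.
        public synchronized void setWorkerThreads(int n) {
            if (n < 1) throw new IllegalArgumentException("need at least one worker thread");
            workerThreads = n;
            if (isRunning()) { stop(); start(); }
        }

        public synchronized void start() {
            if (scheduler != null) return;
            workers = Executors.newFixedThreadPool(workerThreads, PingDaemon::daemonThread);
            scheduler = Executors.newSingleThreadScheduledExecutor(PingDaemon::daemonThread);
            // Fixed delay so sweeps never overlap; the catch keeps one failure from ending the schedule.
            scheduler.scheduleWithFixedDelay(() -> {
                try { sweepNow(); } catch (RuntimeException e) { /* log it in real code */ }
            }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        }

        public synchronized void stop() {
            if (scheduler == null) return;
            scheduler.shutdownNow();
            // Cancel queued pings so a sweep blocked in invokeAll() returns promptly.
            for (Runnable queued : workers.shutdownNow()) {
                if (queued instanceof Future) ((Future<?>) queued).cancel(false);
            }
            scheduler = null;
            workers = null;
        }

        /** One sweep: fan the pings out across the worker pool and wait for all of them. */
        public void sweepNow() {
            ExecutorService pool;
            synchronized (this) { pool = workers; }
            if (pool == null) return; // stopped: nothing to do
            List<Callable<Void>> tasks = new ArrayList<>();
            for (String panel : panelsToPing.get()) {
                tasks.add(() -> { pinger.accept(panel); pingsSent.incrementAndGet(); return null; });
            }
            try {
                pool.invokeAll(tasks);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (RejectedExecutionException e) {
                // Stopped or reconfigured mid-sweep; the next sweep retries whatever is still pending.
            }
        }

        private static Thread daemonThread(Runnable r) {
            Thread t = new Thread(r, "ping-daemon");
            t.setDaemon(true); // never keeps the JVM alive on its own
            return t;
        }
    }

    /** Registers the daemon with the platform MBeanServer, where JConsole can browse it. */
    public static ObjectName register(PingDaemon daemon, String name) throws JMException {
        ObjectName objectName = new ObjectName(name);
        ManagementFactory.getPlatformMBeanServer().registerMBean(daemon, objectName);
        return objectName;
    }
}
```

Once registered, the attributes (`WorkerThreads`, `IntervalSeconds`, `PingsSent`, `Running`) and operations (`start`, `stop`, `sweepNow`) show up in JConsole or any other JMX client. For example, you could call `register(daemon, "com.example.panels:type=PingDaemon")` and then `daemon.start()`; that domain name is only a placeholder. The Standard MBean convention requires a public interface named after the implementing class plus `MBean`, which is why the interface is public here. Scheduling with a fixed delay rather than a fixed rate is a deliberate choice in this sketch: a slow sweep pushes the next one back instead of piling up behind it.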
So that was an excellent use of downtime. I prototyped something I wasn't comfortable taking on as part of a release while keeping the confidence in my estimates that I normally like to maintain. The nice thing is that going into the next release, I'm going to be a lot more comfortable adding instrumentation as a parallel feature for a lot of the new and refactored code. That prototype, that experiment, made me that much more comfortable.
Another great example of this was Trithemius (whom I've talked about in the past, if you've been listening for some time), my counterpart on the embedded side. He's the lead embedded engineer working on the new stand-alone products. These run on the same hardware and most of the same firmware, but they have an embedded web server and an administrative UI with maybe 80 to 85% of the main server product's feature set. The scheduling application for setting up a week-long unlock or access schedule for a door wasn't very strong. We have a Flash application that nobody really likes, and it was a royal pain to localize on the server side. He didn't want to run Flash on the stand-alone because he has to conserve storage. He's only got, I think, 32 megs of what is essentially disk storage; it's all flash, and you want to minimize reads and writes on that anyway. He's been ramping up incredibly on both Ajax and plain pre-Ajax DHTML. So he set about writing a drop-in replacement for this Flash application in just HTML and JavaScript. Now, I don't know his schedule, so this is my perception. I know he's an occasional listener to the show, so if he thinks I'm mischaracterizing this as a downtime project, I apologize if that wasn't the case. But I think this is a great downtime kind of project, a great experiment/prototype sort of project.
9:12
It wasn't critical. He had something that worked well enough. Nobody was complaining, or if they were, a new and better schedule editor wasn't at the top of their lists. But it was something he wanted to do, and something we had talked about. We thought that if he did it right, he could do it very efficiently. He did. It's only a couple hundred lines of JavaScript. It's phenomenal. It looks great, it works great, and we'll be able to drop it into the server code and replace the Flash app we're using on the online server as well. That means localizing is going to be a real treat in comparison, because it's all going to be plain text and HTML. We'll be able to use the same Java-based localization we're using elsewhere in the server product.
So, again, I don't think it was important enough for him to pursue as a "this has to be there for feature complete" item. If he'd had other pressing work, I don't think he could have justified prioritizing it above that. I think he was also doing some diagnostic work and needed a breather. So this was almost a good example of enforced downtime: he shifted his brain to a different, more constructive and positive problem, to give his hind-brain more time to chew over something he was having trouble with. Unfortunately, I don't think he's made any progress on the buggy flash chips. But he's maintained his sanity: he hasn't flipped out and hurt himself or anybody else. So I think it's time well spent. We have a great little widget. It really is great! If he's listening, I hope he's chuckling as I talk about this and understands why I'm talking about him, and isn't getting upset. If he is listening, I'll hear about it Tuesday when I get back to work, and he'll probably smack me or whatever, but...
Actually, that brings me to my final point about downtime and these second-priority, second-tier projects. A couple of jobs ago, I worked with a great peer group of senior lead developers, and we used to call these guerilla coding projects. These were things we couldn't take to project management and say, "We have to do this in such-and-such release or the architecture will fall apart." We knew they were purely engineering projects. The priorities and the reasons for doing them were based purely on engineering and had nothing to do with the business case or a business opportunity. So we knew we'd have a really hard time justifying them. We would very often use our downtime, or the interstitial spaces between other features we were developing. Or, as in the DHTML scheduling example, we might force it: work a couple of late nights or a Saturday to inject a little extra time into the schedule. That let us address some of the things we knew had to be done from an engineering standpoint. You know, rewriting a mail component that was single-threaded and slow. Ah geez, it's been too long, I can't think of any great first-hand examples, and I'm running long anyway.
But I think you get the idea of why I consider downtime valuable. You can shift your perspective and work on a project that's more enjoyable, more positive, more fun for you. You can undertake something a little riskier without feeling like you're putting the rest of the project at risk. So, like the man said, the music is as much about the spaces as it is about the notes. Appreciate the pauses you get in your development cycle from time to time, and capitalize on them.

