Debugging

From TheCommandLineWiki

This transcription is complete.

Part of The Inner Chapters Unbook.

Originally part of podcast episode number twenty-seven.

Dedicated audio available from Podiobooks.

A second chapter on the same subject was released in the February 13th, 2008 episode. I am pretty sure these two are distinct and will need their own transcriptions that can then be coalesced into either a single chapter or two chapters, the latter building on the former.


Notes

December 27th, 2005 Release

  • For me, the week before last week was heavy debugging
  • "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan
  • Forensic mentality
    • "Works for me" is not an acceptable answer
      • This means something is different between you and the tester
      • Even if it is a valid data difference, you should understand this and either correct their data or make the code more robust
    • Understand the variables
      • If reported by someone else, talk to them, observe them
      • Otherwise, change one variable at a time
      • Variables include the OS and environment
    • Run many repetitions to reveal load and timing issues
  • Techniques
    • Printf/logging
      • Still valid, even with interactive debuggers
      • Sometimes, this is quickest for large systems with multiple dependencies
      • Also reveals information about running system, when there is a new or unexpected problem
    • Interactive
      • Great for small programs with limited dependencies
      • Can be worth the investment for larger programs
    • Unit testing
      • When you can duplicate a bug, write an automated test to stress it
      • Once you've fixed it, the new unit test ensures it stays fixed
  • When you have fixed a bug, the code should reflect what you learned
    • There should be enhanced or new guards in the code for these problem conditions
    • Logging should be enhanced to help reveal similar problems if they occur later
    • Comments should capture any implicit knowledge that may have been confusing the issue

February 13th, 2008 Release

  • What is a bug?
    • A programming error in your own code
    • An error in someone else's code
    • An edge case or counter example not considered
    • Bad user input not handled well
    • Unusual circumstances, like a server outage, not handled well
  • Bugs are usually multivariate
    • The simple, obvious stuff gets tested during development
    • Even automated unit testing only exercises what the hacker anticipated
    • Fully understanding a bug requires asking questions
    • Need a way to answer those questions
  • Debugging is the set of practices used to answer questions about a bug
    • Usually with the goal of fixing
    • Not always, bugs vary in frequency and severity
    • If a bug is not very bad or happens rarely, may not be worth fixing
  • Similar to scientific method
    • Observe something unusual
    • Form a hypothesis about how, why it is happening
    • Craft an experiment
      • Send particular inputs into the application
      • Stress it in a specific way
      • Change its configuration
    • Invalidate or confirm hypothesis
    • Iterate if the hypothesis is invalid
    • Make a plan if you have a working theory
    • Important to control the experiment
      • Create a baseline first
      • Only change one variable
      • If you don't have a baseline, no idea whether things are improving
      • If you change more than one variable, no idea what causes what difference
      • Run the experiment repeatedly to smooth out fluctuations
      • Some applications have one-time conditions on startup
      • Also smooths out small differences in time if working on performance
  • Advanced tools
    • Logging
      • Sometimes called print or printf
      • printf refers to a C function for formatted output
      • Usually involves confirming via output statements that a program hit some code
      • Originally, could only really write to standard out or standard error
      • Even then, idea of verbosity added some control
      • Turn on debug, more verbose, get more output
      • Ability to turn off lets application run faster
      • Logging refers to a built-in facility to write directly to a log file
      • Some systems, like log4j, are very sophisticated, flexible
      • Many try to optimize performance so it is as close to a non-concern as possible
      • Logging is useful if you would want to output something anyway
      • Drawback is it requires code changes to uncover new things
      • Expensive for watching variables change
      • After a bug is resolved, do you still need that logging?
    • Interactive debuggers
      • Usually application is run inside another, custom execution environment
      • Can inspect all data of a running application as it runs
      • Can pause the application, can step line by line through it
      • Usually can step into functions, procedures
      • Break points, places where code pauses, can often have conditions
      • Changed execution environment can alter program behavior
      • Debuggers require a way to map to source
      • Variable, function names usually stripped by compiler
      • Can usually choose to leave debug symbols in
      • This can inflate the size of the executable
      • It often makes decompiling easier, a concern if secrecy matters
    • Permanent, semi-permanent instrumentation
      • Some languages, environments exploring permanent instrumentation
      • Java has debugging hooks built in
      • Makes building debuggers easier
      • Means a debugger can attach to a regular JVM, as it runs
      • Less likely to alter runtime behavior
      • Also means debugger can more easily attach over a network
      • gdb, traditional debugger, can also support remote debugging
      • If debugger is detached, little or no overhead
      • dtrace is similar, allows investigation of running application
      • Doesn't require intrusive instrumentation
      • Can investigate on demand, use code like queries, expressions
  • Like any other practice, need to understand all aspects, variations
    • You may not need them for each case
    • Should be able to make appropriate choices
    • If you cannot successfully debug, you cannot maintain software
    • Software spends more of its life in maintenance than any other phase of its life cycle

Transcript

December 27, 2005 Release

As promised, I have another Inner Chapter for you. This week the Inner Chapter is on the practice of debugging. Now, a good friend of mine sent me a wonderful quote from Brian W. Kernighan: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." Wow! What a wonderful quote to lead off with. I couldn't have said it any better than that myself. That speaks to what I've been talking about with simplicity and budgeting, simplicity and complexity in your designs and in your implementation, that particular dimension guiding your refactoring work. But it really speaks to why, in the implementation, you want to strive for simplicity as much as possible, and I think that's a great point to lead off with: debugging actually doesn't start when you have a problem, but goes all the way back to your design and your implementation -- that, if you write maintainable code, if you write the simplest code, the clearest code to get the job done, then your debugging is going to be that much easier as a consequence -- with, again, the admission that debugging is harder than coding in the first place.

But I think that, in order to approach debugging well, do a good job of it, and serve your organization and your user base well, you have to cultivate a forensic mentality. What I mean by this is, well, maybe a couple of things: an observance of the scientific method; formulating hypotheses and theories; testing them rigorously and in a disciplined way to either validate and verify them or falsify them, in order to amend and improve your theories, your hypotheses about what is causing the buggy behavior, what is causing the problem, until you're able to arrive somewhat confidently at a solution, or at least an understanding of what's going on. And if you have that understanding, that's the best position to be in to effect a fix.

Now I find, this being the case, that the pat answer of "works for me!" -- which, sadly, I've actually seen encoded into, institutionalized into bug reporting systems as a valid resolution status for a defect -- is not an acceptable answer. "Works for me" implies that you don't know what's going on. There's a difference between your behavior in trying to duplicate and express the bug and the tester's behavior. Or there's a difference in the tester's data environment, and you don't understand what those differences are. Even if it just boils down to a valid difference in the data, that historically the tester has built up their data set slightly differently from you, and they just happened to express the bug and you didn't, you need to understand that. And you either need to be able to correct that data and test your case, or you need to figure out how the code needs to be made more robust to deal with that emergent characteristic of the data set that you found.

You definitely need to understand the variables at play. This goes back to my point at the outset, talking about the definition of a forensic mentality. You have to be disciplined enough to only change one variable at a time. Actually, for some of this, you have to go back to just pure observation: empiricism. If the problem was reported by somebody else, if it wasn't something that you yourself found, you need to go talk to them so you really understand what it is they were doing, instead of operating on implicit knowledge or assumptions. Sometimes you even need to watch them cause the bug to express itself, so you can see something you took for granted that they do slightly differently. If you operate on your implicit knowledge, on your assumptions based on a simplistic description of the bug, of the behavior, you might miss that, and you wouldn't be able to reproduce it. If it's something you found yourself, I think understanding and controlling for variables is even more critical. Say you're just going through your integration testing and you see a bug pop up out of that. You want to stop what you're doing and back up: regress to the last known point where the code unit you're working on, the feature you're working on, functioned correctly, functioned as expected, and then move forward one variable at a time until it breaks. You need to isolate the problem; this is classic troubleshooting. It's the same thing if you're supporting your mom's or your grandmom's PC and they're reporting a problem. What do you do? You eliminate all the things that couldn't be a cause until all that's left are the actual causes.

And the variables in this case, if you're dealing with any software of non-trivial complexity, are going to include the operating system and the running environment, so you have to be sensitive to that as well: the problem may be beyond your code. Or it may be a combination of something in your code with the environment in which it runs that expresses the bug. So again, that informs my point about controlling for variables and, if somebody else reported it, observing them closely. You might observe, "oh, hey, they're running on XP Home Edition instead of XP Pro," or they're running on a slightly older Linux kernel, or a different version of OS X, or they have their dynamic library path set differently than you do. You'd be surprised: that can be a contributing part of the equation, and if you're not paying enough attention, if you're not observant enough, you're going to miss it and cause yourself a lot of heartburn and stress as a consequence.
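
One low-tech way to act on this is to have the reporter capture a snapshot of their environment and compare it against yours. The sketch below is a hypothetical Python helper, not part of any tool mentioned here; the `snapshot` and `diff` names and the list of environment variables worth checking are assumptions to adapt to your own stack.

```python
import os
import platform
import sys

# Which environment variables to compare is an assumption; extend the list for your stack.
INTERESTING_VARS = ("PATH", "LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH", "LANG", "TZ")


def snapshot():
    """Capture facts about the running environment that often differ between machines."""
    snap = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "machine": platform.machine(),
    }
    for name in INTERESTING_VARS:
        snap["env:" + name] = os.environ.get(name, "<unset>")
    return snap


def diff(mine, theirs):
    """Return {key: (mine, theirs)} for every key whose value differs."""
    keys = sorted(set(mine) | set(theirs))
    return {k: (mine.get(k), theirs.get(k)) for k in keys if mine.get(k) != theirs.get(k)}


if __name__ == "__main__":
    # The reporter runs this and sends you the output; run it yourself and compare.
    for key, value in snapshot().items():
        print(f"{key} = {value}")
```

Every key that shows up in the diff is a candidate variable to test on its own.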

Also, you want to run many repetitions to expose bugs that may be a consequence of load or timing issues. Say you try to reproduce a bug by writing a script or an automated unit test that performs some action once. If that doesn't do it, and you're absolutely certain that you're reproducing the inputs as described by the original reporter, and you're tempted to go for that "works for me" value on the drop-down, try putting it in a loop to run it a thousand times. Try setting up multiple threads to execute it concurrently to see if you can make it pop that way. I should have said earlier that you really have to approach debugging with humility, too. I think that's part of the mentality as well: your assumptions are invariably going to be wrong. That's something debugging has in common with performance tuning: you really have to observe, observe, observe, and let that empirical data lead you to the underlying causes, rather than letting your assumptions lead you astray.
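
Here is a minimal Python sketch of "put it in a loop and run it on several threads." The `stress` helper is hypothetical, and it assumes the action under test is safe to call concurrently; it tallies outcomes so that a one-in-a-thousand failure shows up as a count instead of being missed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def stress(action, repetitions=1000, threads=8):
    """Call `action` `repetitions` times across a pool of threads.

    Returns a Counter of outcomes: "ok" for each clean call, otherwise the
    exception's class name, so rare failures are tallied instead of hidden."""
    def attempt(_):
        try:
            action()
            return "ok"
        except Exception as exc:  # keep going: we want the failure rate, not just the first failure
            return type(exc).__name__

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return Counter(pool.map(attempt, range(repetitions)))
```

You would call it as `stress(lambda: submit_order(sample))`, with `submit_order` standing in for your own code. Any key besides `"ok"` in the result means the bug was reproduced, along with a rough sense of how often it fires.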

It's worth taking a little bit of time to talk about technique as well. Again, I don't intend for the Inner Chapters to be prescriptive, to lay out any particular methodology. I really just want to talk about some of the concerns and issues at play, things to think about to improve your overall game. And here it's worth stating, I think, that even though interactive debuggers are becoming more prevalent, more common, more popular, printf-style or log-style debugging still has a lot of validity. For instance, you may have a very large system like a web application: a container that runs on top of an interpreter that runs on top of your OS, perhaps with a back-end dependency on some sort of data store or database, and a client that may be a very different critter, like a web browser. In that instance, setting up an interactive debugger is not necessarily worth the overhead when you have a very simple problem and you just want to get in and see: am I making it to a certain point? If you want to test the validity of assumptions, essentially doing assertion-style debugging, then calling printf to emit a message to standard out, or sending a message into your logging subsystem, is going to be good enough. Adding that, recompiling, and redeploying is going to be quicker than figuring out the vagaries of setting up a remote debugging session.

If, in particular, you leave some of that logging in there, some of that checkpoint-style "hey, I got here," "here's the argument I was given," "here's the state of the system," running at either an informational or a debug level of logging, you can actually use it to work on problems proactively, because you're getting snapshots of your running systems over time. That may have value beyond debugging a particular bug, defect, or issue.
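
As a sketch of what leaving checkpoints in might look like, here is a Python function instrumented with the standard `logging` module at DEBUG level. The `billing` logger name and the discount table are hypothetical. The point is the entry and exit checkpoints, which cost little when DEBUG is off and can be switched on when needed.

```python
import logging

log = logging.getLogger("billing")  # hypothetical component name


def apply_discount(total, code):
    # Checkpoint: "I got here, and here is what I was given."
    log.debug("apply_discount entered: total=%r code=%r", total, code)
    rate = {"SAVE10": 0.10}.get(code, 0.0)  # hypothetical discount table
    result = round(total * (1 - rate), 2)
    # Checkpoint: state on the way out, left in permanently at DEBUG level.
    log.debug("apply_discount leaving: rate=%r result=%r", rate, result)
    return result


if __name__ == "__main__":
    # Turn checkpoints on while investigating; use level=logging.INFO to silence them.
    logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
    apply_discount(50.0, "SAVE10")
```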

I think the interactive debuggers, on the flip side, are great for small programs with limited dependencies. And I think that if you get into a much more difficult problem to debug in a large system, the cost, the investment, to get a remote debugging session set up can be worth it. The limitations inherent in assertion-style debugging, either printf or logging, may be such that you're just spending too much cycle time recompiling or redeploying and not seeing a lot of progress. You want to set up some watches; you want to set up some break points; you want to set up some conditional break points; the problem is just a little more intractable. I think that, even as difficult as it may be to do remote debugging, it may be warranted. Now that's a discretion call for you, based on your experience, the problem at hand, and the technologies that you're working with.

I think something that's critical on the back side of understanding a bug, going through all the debugging techniques, getting to that underlying cause, and then effecting a fix, is that you should capture the bug as a unit test, ideally as part of your efforts to duplicate the bug. And if I didn't mention it earlier, I'm going to stress it right now: duplicating any reported bug or defect is a critical first step. I'm going to state that very assertively, almost prescriptively: if you can't duplicate it on demand, then anything else you do is absolutely worthless. But anyway, I think a good confluence of techniques here is to leverage automated unit testing to encapsulate the work required to duplicate the bug. Once you understand what you need to do to stress it, to make it pop, you write an automated unit test to do that for you, and then you can stress it on demand. And on the back side of that, like my point about leaving in some of your logging statements when you're doing assertion-style debugging, that automated unit test then becomes part of your integration suite or your regression suite on an ongoing basis. It ensures that once you fix that bug, if any other fixes you put into the system, any feature enhancements or changes you make, cause a regression of that particular fix, it's going to pop right away, before your customer sees it, before your end user sees it. And that's critical. It's very, very valuable. So I think automating the unit test is absolutely worth the cost in that case.
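
A minimal sketch of capturing a bug as an automated test, using Python's standard `unittest` module. The bug itself is invented for illustration: a hypothetical `split_name` that crashed on single-word names. The first test duplicates the report on demand; once the fix is in, both tests stay in the regression suite.

```python
import unittest


def split_name(full_name):
    """Hypothetical function from a bug report: split "First Last" into a pair.

    The original version did `first, last = full_name.split(" ")`, which
    raised ValueError for single-word names such as "Cher"."""
    parts = full_name.split()
    if not parts:
        return ("", "")
    first, *rest = parts
    return (first, " ".join(rest))


class SplitNameRegressionTest(unittest.TestCase):
    def test_single_word_name(self):
        # Duplicates the reported bug on demand; this fails against the original code.
        self.assertEqual(split_name("Cher"), ("Cher", ""))

    def test_ordinary_name_still_works(self):
        # Guards the behavior that already worked, so the fix can't break it.
        self.assertEqual(split_name("Ada Lovelace"), ("Ada", "Lovelace"))
```

Saved in a file named like `test_names.py`, it runs with `python -m unittest` alongside the rest of the suite.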

The last recommendation and piece of advice I'm going to give you is very similar in character to what I had to say about leaving in assertion-style debugging through a logging subsystem, and about automated unit tests being used for duplicating and stressing bugs: when you have fixed the bug, your fix, the code you worked on to resolve that bug, should reflect what you learned in the course of fixing it. There should be enhanced or new guards in the code for these problem conditions. So if you're familiar with design by contract, hopefully you will have identified some new constraints to add, or improved the conditionals in the existing constraints, so that you're catching your failure early, you're catching it often, and you're doing it in a very verbose fashion. Your logging should also be enhanced to reveal similar problems if they occur elsewhere. Think about your informational logging and the checkpoints you're emitting for operational purposes: there may be new conditions, there may be parts of the state graph of the overall system that you're adding some visibility into, because you now understand that some of the bugs come out of your state transitions. So the problem may not be localised to one particular piece of code or one particular component of a running system, but may be more systemic, based on what your actual state graph looks like.
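
A sketch of what "new guards for the problem condition" might look like in Python, in the spirit of design by contract: explicit preconditions that fail early and verbosely, plus a postcondition check. The `reserve_stock` function and its bug (a negative request silently increasing stock) are hypothetical.

```python
import logging

log = logging.getLogger("inventory")  # hypothetical component name


def reserve_stock(on_hand, requested):
    """Hypothetical example: return the stock remaining after reserving `requested` units.

    Preconditions (checked explicitly, failing early and verbosely):
      * requested is a non-negative int
      * requested does not exceed on_hand
    """
    if not isinstance(requested, int) or requested < 0:
        log.error("reserve_stock rejected requested=%r (on_hand=%r)", requested, on_hand)
        raise ValueError(f"requested must be a non-negative int, got {requested!r}")
    if requested > on_hand:
        log.warning("reserve_stock over-request: requested=%d on_hand=%d", requested, on_hand)
        raise ValueError(f"cannot reserve {requested}; only {on_hand} on hand")
    remaining = on_hand - requested
    # Postcondition: stock only goes down, never below zero.
    # (Python strips `assert` under -O, so keep real input checks as the ifs above.)
    assert 0 <= remaining <= on_hand, remaining
    return remaining
```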

Absolutely the last thing I'm going to leave you with is that your comments should capture any implicit knowledge that may have been confusing the issue, or that just may never have been captured, made explicit, at any point. I think a large variety of bugs pop out not because somebody explicitly made a mistake in their coding or chose the wrong algorithm or data structure or something like that, but because at the time they were originally implementing some code, they were so close to it that there was some assumption, some piece of implicit knowledge they were taking for granted, that was never documented in the code, whether in the running code itself (the names of variables, methods, what have you) or in code comments. Then somebody else doing maintenance work on that code, or integrating with it, who did not have that implicit knowledge, violated some assumption or some assumed constraint and caused the defect that you saw. So when you sort that out and come to an understanding that that was actually the source of the problem, it behooves you to go ahead and document it for the next people in.
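
A small Python illustration of making implicit knowledge explicit. Binary search only works on sorted input, an assumption that is easy to take for granted; here it is captured both in the parameter name and in the docstring. The `contains` function is a hypothetical example built on the standard `bisect` module.

```python
from bisect import bisect_left


def contains(sorted_ids, target):
    """Return True if `target` is in `sorted_ids`.

    ASSUMPTION MADE EXPLICIT: `sorted_ids` must be sorted in ascending order.
    Binary search does not fail loudly on unsorted input; it silently returns
    wrong answers, so callers must sort first (or use `target in ids`)."""
    i = bisect_left(sorted_ids, target)
    return i < len(sorted_ids) and sorted_ids[i] == target
```

With unsorted input, `contains([5, 1, 3], 5)` returns `False` even though 5 is present, which is exactly the kind of silent defect a maintainer without that knowledge would introduce.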

February 13, 2008 Release

07:13

I've talked about errors and failure conditions, and I've talked about testing. And it occurs to me, especially in light of the recent feedback from Jed on using aspect-oriented programming to help with logging and debugging, that to date I haven't actually talked about the practice of debugging.

So what is debugging? Well, just deconstructing the name: it's the process of removing bugs from the program. Let's back up a little bit first, then. What's a “bug”? It's quite simply a programming error in your code. Realistically, it might also be an error in someone else's code (a library or component you have included because of its utility, but that you yourself did not author).

It could also be an edge case or a counter-example in the development of your code that you had not considered. And this is where some of the silly hackish terms like “misfeature” come up: it's not a bug in the sense that the program isn't operating correctly; it's a problem because the program is just doing something you didn't anticipate, something you didn't plan for. If somebody gives the program what would otherwise seem like valid inputs and it does something surprising, that can often be considered a form of bug. Bad user input not being handled well is a very common bug and, unfortunately, also has some distressing security implications. We see SQL injection and cross-site scripting arise because of bad user input handling. So it doesn't necessarily have to be something deep in the program; it could be something closer to the user, in the user interface.
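
Since poor input handling is where SQL injection comes from, here is a small Python sketch using the standard `sqlite3` module with an in-memory database. The table and both lookup functions are hypothetical; the contrast is between pasting input into the SQL text and passing it through a `?` placeholder.

```python
import sqlite3


def make_demo_db():
    """In-memory database with one hypothetical table, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # BUG: user input is pasted into the SQL text, so the input can change the query itself.
    return conn.execute(f"SELECT id, name FROM users WHERE name = '{name}'").fetchall()


def find_user(conn, name):
    # Fix: the "?" placeholder makes the driver treat `name` strictly as data.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()
```

With the input `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized version returns none.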

It's not just the interface to users that commonly surfaces bugs, though; often it's the interface to other systems, like servers (database servers in particular). A bug can arise when the circumstance of a database server going away is not handled well. So this gives you some feeling for bugs: the issues that coders are asked to resolve during software maintenance.
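
One way to handle a database server that goes away more gracefully is to retry briefly and then fail with a clear message. This Python sketch rests on an assumption: it treats the built-in `ConnectionError` as the outage signal, whereas a real database driver raises its own exception types, so the `except` clause would need adapting. `call_with_retry` and `ServiceUnavailable` are hypothetical names.

```python
import time


class ServiceUnavailable(Exception):
    """Raised when the dependency is still down after every retry."""


def call_with_retry(operation, attempts=3, delay=0.5, backoff=2.0):
    """Call `operation`, retrying on ConnectionError with exponential backoff.

    Instead of crashing with an obscure traceback when the server goes away,
    this either succeeds or raises one clear, descriptive error."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as exc:  # assumption: adapt to your driver's exception type
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
                delay *= backoff
    raise ServiceUnavailable(f"gave up after {attempts} attempts: {last_error}") from last_error
```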

09:23

Bugs are usually multivariate; that is, they have multiple aspects to them, so they're not simple and obvious. The bugs that do fall into that simpler category, involving just a single variable or a single component, usually get flushed out during testing and development before software is released. I mean, think about it: developers are going to think to stress the obvious stuff themselves and deal with it directly as part of their core development activity. Even automated unit testing only exercises, really, what the hacker anticipates.

Fully understanding a bug, then, requires asking a lot of questions to help pin down all the variables involved in causing this particular problem to arise. We need a way to answer those questions. So debugging, generally speaking, is a set of practices used to answer questions about a bug. I know I said it was actually removing a bug from a program, but once you understand and can characterize a bug fully, fixing the line of code or taking it out is a relatively straightforward exercise. The real challenge is understanding it, so that's why I've shifted emphasis in characterizing debugging that way. And certainly the lion's share of the time you spend debugging goes to asking questions about some buggy behavior and seeking to answer them.

You can also find yourself debugging a bug that ultimately doesn't get fixed. Bugs vary in both frequency and severity. If a bug is not very bad (i.e., it doesn't cause a program crash, it's just an odd cosmetic quirk), or it happens very, very rarely (you know, the database server has to be out at the same time that some other unlikely set of actions takes place), then it may just not be worth fixing. The cost of dealing with something that's just not that stressful to the end user, or so astronomically unlikely to happen that it's not a concern, may lead to that circumstance: the bug is left in the program intentionally because there are other, more severe or more frequent bugs that require attention. After all, we don't (typically) have infinite resources in software engineering efforts. One of the first complaints of software engineers is: if only we had more people. And then, usually, of course, the second complaint is: if only we had better people.

I like to think of the exercise of debugging as being very comparable to the scientific method. In fact, I highly encourage you to try to think about it the same way. I think that's probably one of the most fruitful and sane ways to think about the procedure of answering questions about why some strange thing is happening in a program. It starts out with observation, of course: something out of the ordinary is happening in the program. Something you didn't anticipate. Something surprising. You form a hypothesis about how or why this is happening, and then the key part (and the part that I enjoy the most, definitely) is crafting an experiment in order to confirm or invalidate that hypothesis. So, figuring out what particular sets of inputs you want to try in order to confirm this behavior.

Or stressing the application in a particular way. Often (not usually, but often) bugs only show up under extreme load: you've got an unusual number of users accessing the system, or an unusually large file, or something like that. We usually refer to these as edge cases. And sometimes it's a limit case as well as an edge case, so it's at the upper limits of what the program is capable of doing that you see strange things happening. That goes back to my point that bugs can often just be graceless or poor handling of hitting the limits of what an application is capable of doing.

Increasingly, as we deal more and more with server applications, and web applications in particular, bugs can also arise as a consequence of being deployed to a particular server, usually because of some sort of configuration or configuration management issue. So that's also something you might have to experiment with: alter the deployment profile, alter the configuration, or deploy to a different server, to isolate whether it's something about the particular hardware you were originally deploying to, or something more portable, something closer to the code that goes along with it as it moves from server to server. That can be a valid experiment as well, one that we increasingly have to consider with these newer classes of applications.

Ultimately, the experiment (as I said) should either prove the hypothesis correct or prove that there was something wrong with your thinking, and ideally in that failure you learn just as much, and it suggests a new direction. If you do invalidate your hypothesis, you just iterate. You form a new hypothesis based on what you learned from that experiment, from the failure to confirm it, and you experiment again. And so it goes until you've got a working hypothesis that holds up under repeated, consistent empirical evidence.

At the point where you do have a working theory, you can make a plan as to what you want to do with it, including whether the bug is severe enough that it's worth the cost to fix. Usually when you have that working theory, you've done enough static analysis, poking around in the source code, and considering the problem more generally that you should have at least a ballpark figure of how much effort it's going to take to fix the bug. So it's a good point to make that sort of go/no-go decision: “holy crap, we're going to have to completely retool all the application code for a bug that only happens once a month,” or something silly like that. You get the idea.

It's very important, as with experimentation in the scientific method, that you control the experiments. As I already mentioned, that means repeatability and consistent empirical observation. Ideally, if you can, you even want to start out by creating a baseline set of observations: what does the application do ordinarily, in the absence of the bug? That way, as you deliberately alter the inputs to an application to change its behavior, you can compare the results to some known starting point. In the process of changing the inputs and the environment and experimenting with the application, you want to be very careful to only change one thing, one variable, at a time. If you change more than one variable, when you observe a change in the output of the program, you have no idea which of the variables you changed is responsible for that difference. I cannot stress that enough. When running an experiment for the purpose of debugging, for any class of bug, only change one variable at a time.
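
The one-variable-at-a-time discipline can be sketched in Python as follows. The hypothetical `isolate` function starts from a known-good baseline, changes exactly one variable toward the failing setup per trial, and reports which single changes reproduce the bug. The `reproduces` callback is an assumption: it stands for whatever experiment you run, whether a script, a deployment, or a manual check.

```python
def isolate(baseline, failing, reproduces):
    """Change one variable at a time, from a known-good baseline toward a failing setup.

    baseline, failing: dicts of variable -> value (e.g. OS, library version, config flag).
    reproduces: callable(config) -> True if the bug appears with that config.
    Returns the variables that, changed alone, reproduce the bug."""
    if reproduces(baseline):
        raise ValueError("baseline already shows the bug; establish a clean baseline first")
    culprits = []
    for name in sorted(set(baseline) | set(failing)):
        if baseline.get(name) == failing.get(name):
            continue  # identical in both setups, nothing to test
        trial = dict(baseline)
        trial[name] = failing.get(name)  # exactly one variable differs from the baseline
        if reproduces(trial):
            culprits.append(name)
    return culprits
```

If no single change reproduces the bug but the full failing setup does, that is a strong hint the bug is multivariate and its variables need to be tested in combination.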

Take good notes, a practice I've recommended a lot when it comes to hacking. Write things down. The act of just documenting forces you to slow your pace down and consider what you're doing. Very often, looking at it on the page can help patterns start to coalesce and suggest some additional iterations of the experiment to run.

Iterate your experiments. Run them repeatedly. You may see fluctuations in behavior. A lot of applications take a one-time hit on start-up, when you're talking about performance, or they may have other conditions that only apply at certain points during the application's run time. If you run your experiment repeatedly over a longer duration, you're more likely to smooth out any of these small differences based on different states of the program: right as it starts, right before it ends, while it's doing some checkpointing or cleaning up resources, stuff like that. If you run it once, you run the risk of something purely coincidental queering your results and ultimately leading you to the wrong working theory. So that's something else that's very critical. It's much more so in dealing with performance issues, but I think that, depending on the bug, it may be applicable more generally. So it's worth considering, especially if you have a long-running application. Again, going back to multiuser server applications or web applications, you're more likely to want to find ways to smooth out the sorts of hiccups and bumps that occur as a long-running app runs.
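
For performance experiments, Python's standard `timeit.repeat` runs the same measurement several times, so one-off startup costs and hiccups stand out instead of skewing a single reading. The `measure` wrapper is a hypothetical convenience; reporting min, median, and max is one reasonable choice, not the only one.

```python
import statistics
import timeit


def measure(stmt, setup="pass", repeat=7, number=1000):
    """Time `stmt` in `repeat` separate runs, each executing it `number` times.

    A single run can be skewed by one-time costs (imports, caches warming up,
    a background task); looking at the spread across runs smooths that out."""
    runs = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    per_call = [t / number for t in runs]
    return {
        "min": min(per_call),  # best case: least disturbed by noise
        "median": statistics.median(per_call),
        "max": max(per_call),  # a max far above the median hints at a one-off hiccup
    }


if __name__ == "__main__":
    print(measure("sorted(data)",
                  setup="import random; data = [random.random() for _ in range(1000)]",
                  number=200))
```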

18:02

There are a few tools I want to talk about (some tried and true, some a little bit newer) that are very useful in the practice of debugging. And, the first I generally refer to as logging (and again, this goes back to Jed getting me thinking about using aspects for the purposes of logging and debugging). This goes even further back than formal logging as such: writing to a log file, using syslog, or writing to an application console log, or an event log, or something like that.

It goes all the way back to the practice, in some procedural languages (C most notably), of just using a print statement, or better yet a printf statement, to send some message out to standard out or standard error. “Printf” here refers to a particular C function for formatted output. (And you shouldn't use printf anymore; you should use one of the newer, safer variations of printf.) What printf would take is a template string with embedded placeholders that say “some text goes here,” “a number goes there,” giving you some control over the ordering, the formatting, and the typing. So rather than doing a lot of string concatenation (which is prone to errors, especially if you have a null pointer), it gives you a simpler way to do something similar and write much more expressive messages for the purpose of debug logging, or even for giving feedback to the user in a console application. The point is: you'll often hear older hackers refer to this class of debugging, this approach to debugging, as “printf debugging” or “print debugging”. It usually involves confirming, via these output statements, that the code has reached a particular point.
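
Python's `%` string operator follows the same template-with-placeholders idea as C's printf, so it can stand in here to show the point about concatenation. The two `report` functions are hypothetical; concatenation fails on a `None` value, while the template simply renders it.

```python
import sys


def report_unsafe(user, count):
    # Concatenation: raises TypeError when user is None or count is an int.
    return "user " + user + " has " + count + " items"


def report(user, count):
    # printf-style template: placeholders handle None and numbers,
    # and the shape of the message is visible at a glance.
    return "user %s has %d items" % (user, count)


if __name__ == "__main__":
    # Debug output conventionally goes to standard error, keeping standard out clean.
    print(report(None, 3), file=sys.stderr)
```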

Originally, you could only really write to, as I said, standard out or standard error, so “logging” refers to some of the advancements: building generalized facilities to send that output to a file of some sort and then manage it based on date, time, or size. But even in the days when all we had was the choice of standard out or standard error, verbosity was often added as a form of control. So you could turn debug on, or turn it up further to make it more verbose, to see more information about the running state of the program (just on the console as it ran). And if you weren't having problems, you could turn debug or verbosity down to let the application run faster, since it isn't spending as much time doing in-line I/O as it runs. Systems like log4j, these newer, more log-oriented systems, are quite sophisticated. They bring a lot more flexibility.

As I said, they allow you to redirect your log output to a variety of facilities. You can continue to output it to the console, you can apply more advanced formatting to it, and you can enrich your log output with context, such as which thread is currently running and which class the logged statement is associated with. You can send it to syslog, which, as I said, is a great operating-system-level logging facility; you can send it out through email; you can send it out through SMS. You can do all kinds of things in terms of categorizing more verbose messages and less verbose messages. Great, great tool. And there are comparable tools available in most programming languages.
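
The Python standard library's `logging` module is one of those comparable tools, and this sketch uses it in place of log4j. One logger fans out to two handlers: the console, plus `ListHandler`, a hypothetical in-memory stand-in for a file, syslog, or email destination. The formatter enriches each line with the timestamp, thread name, logger name, and level. The component name `billing.InvoiceService` is made up.

```python
import logging
import threading


class ListHandler(logging.Handler):
    """Keeps formatted records in a list; stands in for any other
    destination (file, syslog, email) to show output can be redirected."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


# Enrich every message with context: time, thread, logger name, level.
formatter = logging.Formatter(
    "%(asctime)s [%(threadName)s] %(name)s %(levelname)s: %(message)s")

log = logging.getLogger("billing.InvoiceService")  # hypothetical component name
log.setLevel(logging.DEBUG)
log.propagate = False  # keep this demo's output away from any root handlers

console = logging.StreamHandler()  # standard error, like classic console output
memory = ListHandler()
for handler in (console, memory):
    handler.setFormatter(formatter)
    log.addHandler(handler)
# The standard library ships further destinations, for example
# logging.handlers.SysLogHandler and logging.handlers.SMTPHandler.


def process_invoice(number: int) -> None:
    log.info("processing invoice %d", number)


threads = [threading.Thread(target=process_invoice, args=(i,), name=f"worker-{i}")
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```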

Also, if you use a logging toolkit, one of the things the authors of these toolkits spend a fair amount of time on is making them as efficient as possible. So, yeah, it's not going to be zero-cost, but they try to get as close to zero-cost as they can, so you can leave your logging statements in permanently and just control verbosity via external mechanisms. You can turn it up or down like the traditional console logging I mentioned. And that leads me to my objection to Jed's suggestion of using aspects for logging.
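
Python's `logging` module illustrates the efficiency point in a Python-specific way (log4j's `isDebugEnabled()` check plays a similar role there). Format arguments are passed separately from the template, so a suppressed message is never formatted, and `isEnabledFor` lets you skip computing costly arguments entirely. `CacheState` and `compute_stats` are hypothetical stand-ins for expensive debug data.

```python
import logging

log = logging.getLogger("app.cache")  # hypothetical logger name
log.setLevel(logging.INFO)            # the external knob: DEBUG is switched off
log.propagate = False
log.addHandler(logging.StreamHandler())


class CacheState:
    """Stand-in for a large object whose string form is costly to build."""
    renders = 0

    def __str__(self) -> str:
        CacheState.renders += 1
        return "<large cache dump>"


stats_calls = 0


def compute_stats() -> dict:
    """Hypothetical costly computation needed only for debug output."""
    global stats_calls
    stats_calls += 1
    return {"hits": 10, "misses": 2}


state = CacheState()

# Arguments are passed separately from the template, so the string is only
# built if the record is actually emitted. At INFO, __str__ is never called here.
log.debug("cache state: %s", state)

# By contrast, log.debug(f"cache state: {state}") would build the string
# eagerly, before the logger even checks the level.

# When computing an argument is itself costly, guard the call explicitly.
if log.isEnabledFor(logging.DEBUG):
    log.debug("cache stats: %s", compute_stats())

# INFO is enabled, so this record is formatted (once) and written out.
log.info("cache state: %s", state)
```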

With a declarative logging system like log4j, or a comparable logging toolkit, the goal is to leave these statements in there permanently and control their output by turning the priority on the logging toolkit up or down. As such, you really want to think about: if I'm logging at debug, what is another programmer really going to want to see in the normal course of operation? And it's not everything, because all of that information at once is useless. Like I said, you can perform some static analysis. There are other tools, which I'll get to in a minute, that do a far better job if you want to know everything about everything, letting you explore it in a more interrogative way rather than just spewing everything out to a console or to a log. And there are log levels like info, warn, error, fatal, severe, and so forth, that are designed for permanent usage. As things are running, you log an error to let an operator know that some condition needing attention has occurred. You log a warning to say: “something might be going on that's not fatal, not problematic, but it may indicate that an error is about to happen.” And fatal can be a “help me, I'm about to die, help me right now” kind of thing.
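
A short Python sketch of level thresholds, using the standard `logging` levels: DEBUG, INFO, WARNING, ERROR, and CRITICAL. Python's `logging.FATAL` is an alias for CRITICAL, and there is no separate SEVERE level. With the threshold at WARNING, only the messages an operator should act on get through. The logger name `orders` and the `LevelRecorder` handler are illustrative.

```python
import logging


class LevelRecorder(logging.Handler):
    """Remembers the level name of every record that gets through."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.levels.append(record.levelname)


log = logging.getLogger("orders")  # hypothetical subsystem
log.propagate = False
recorder = LevelRecorder()
log.addHandler(recorder)

# A typical production threshold: only what an operator needs to see.
log.setLevel(logging.WARNING)

log.debug("discount computed for cart %d", 42)      # suppressed
log.info("order %d accepted", 42)                    # suppressed
log.warning("payment gateway slow: %.1f s", 4.8)     # may precede an error
log.error("order %d could not be saved", 42)         # needs attention
log.critical("database unreachable, shutting down")  # Python's name for FATAL
```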

When you say “logging” (and maybe this is just a bias from how I've used logging and the log toolkits I've used), maybe Jed, to be fair and to give him the benefit of the doubt, meant something else when he referred to using aspects for logging and debugging. But I still think discretion is key, especially when you frame the practice of debugging the way I have: in terms of asking questions and trying to find answers. Signal-to-noise ratio is important here. Having all the information in the running program available to you doesn't really help you answer the question, does it? You want some way to filter, to dial in and get just to the things of interest.
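
One way to “dial in” without drowning in noise, sketched with Python's hierarchical loggers (the `shop.*` logger names are hypothetical). Everything stays at WARNING by default, and DEBUG is switched on for just the one subsystem under suspicion.

```python
import logging


class Collect(logging.Handler):
    """Records (logger name, level) for every message that gets through."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: list[tuple[str, str]] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.seen.append((record.name, record.levelname))


app = logging.getLogger("shop")  # hypothetical top-level logger
app.propagate = False
app.setLevel(logging.WARNING)    # quiet by default: only warnings and worse
sink = Collect()
app.addHandler(sink)

# Dial in on the one subsystem under suspicion; its siblings stay quiet.
payments = logging.getLogger("shop.payments")
payments.setLevel(logging.DEBUG)
catalog = logging.getLogger("shop.catalog")  # inherits WARNING from "shop"

payments.debug("retrying charge, attempt %d", 2)     # emitted
catalog.debug("loaded %d products", 500)             # suppressed
catalog.warning("image missing for product %d", 17)  # emitted
```

In a real deployment the level changes would typically come from configuration rather than code, which is what lets someone turn a single subsystem up or down without touching the application.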

Now, Jed's idea of using an aspect may address one of the criticisms of logging: because the logging lives in your main code, you actually have to alter and recompile your code to log something new. With an aspect, in most systems I'm familiar with, there is still going to be some compilation involved, but applying advice from the outside may be less cumbersome than recompiling the entire app, especially for an app beyond a certain size or a certain amount of complexity. So that may be part of what he was thinking in suggesting it. But, as I said, I would temper that with this: logging is a permanent part of the system's behavior, and operators, system administrators, and application administrators very often like to see regular information about the ongoing state of the system. So it has other advantages outside of debugging that just happen to overlap with debugging. I think there are some better tools, which brings me to the next one: interactive debuggers, which can address some of the gaps in logging-style or print-style debugging.
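
Python has no built-in aspect-oriented programming, but a decorator applied from outside a function is a loose analogue of “around” advice, so this sketch uses one to illustrate the idea. The names `traced` and `apply_discount` are hypothetical. Note that it logs every call indiscriminately, which is exactly the signal-to-noise concern raised above.

```python
import functools
import logging

messages: list[str] = []  # in-memory sink so the effect is easy to inspect


class _Collect(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        messages.append(record.getMessage())


trace_log = logging.getLogger("trace")
trace_log.propagate = False
trace_log.setLevel(logging.DEBUG)
trace_log.addHandler(_Collect())


def traced(func):
    """'Around advice': log entry and exit of `func` without editing its body."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace_log.debug("enter %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        result = func(*args, **kwargs)
        trace_log.debug("exit %s -> %r", func.__qualname__, result)
        return result
    return wrapper


# Ordinary business code, with no logging statements in it.
def apply_discount(total: float, percent: float) -> float:
    return round(total * (1 - percent / 100), 2)


# Woven in from the outside at runtime; remove this line and the tracing is gone.
apply_discount = traced(apply_discount)

apply_discount(200.0, 15)
```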

25:07

It took me a while, personally, to be convinced that interactive debuggers were worth the time to set up and use correctly. But I have long since been convinced that this is a must-have tool and skill in every modern hacker's toolkit. Usually, an interactive debugger works by running application code inside a custom execution environment. As such, it's able to inspect all of the data of the running application as it runs: for any variable whose value you want to know, snap, it's right at your fingertips. This is usually referred to as instrumentation, or an instrumented environment, to use more accurate, more specific terminology. Typically, you'd run gdb followed by a command-line tool, say ls or Netcat, and it will run that tool in a debug environment and even allow you to stop execution, stop the world, so that you can walk through the stack and look at values on the heap (these are just two different areas of memory allocation) and see what's what: “For the variable foo, what is the value at this exact moment in time?” Debuggers also allow you to step through the application code one line at a time, usually associating it back to the original sources (I'll get to how that's done in a minute), so that you can interactively interpret the program step by step. The better, more advanced modern debuggers will even let you step into and out of functions and procedures, and step over them as well if you can take something as written. You don't have to go all the way down into its guts, especially if it's some third-party code that you don't need to sort out; you just want to skip over it, look at its output captured into a local variable, and not concern yourself so much with the gyrations that third-party code goes through.
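
For a sense of how an instrumented environment works, CPython's `sys.settrace` hook is what its own `pdb` debugger has traditionally been built on (via the `bdb` module). This minimal sketch uses the hook directly to record each line executed inside one function, along with a snapshot of its local variables, much like single-stepping in a debugger. The helper `make_tracer` and the sample `average` function are illustrative.

```python
import sys


def make_tracer(target_name: str, steps: list):
    """Build a trace function that records (line number, copy of locals)
    for each line executed inside functions named `target_name`."""

    def local_trace(frame, event, arg):
        if event == "line":
            steps.append((frame.f_lineno, dict(frame.f_locals)))
        return local_trace

    def global_trace(frame, event, arg):
        # Called on every new function call; only follow the one of interest.
        if event == "call" and frame.f_code.co_name == target_name:
            return local_trace
        return None

    return global_trace


def average(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)


steps: list = []
sys.settrace(make_tracer("average", steps))
try:
    result = average([2, 4, 9])
finally:
    sys.settrace(None)  # always remove the hook, as a debugger does on detach

for lineno, local_vars in steps:
    print(f"line {lineno}: {local_vars}")
```

Each recorded entry shows the line about to execute and the locals at that moment, so you can watch `total` accumulate across iterations. A real debugger adds the interactive part: pausing at those callbacks and waiting for your commands.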

That ability to associate the running state of the program back to your static sources does come at a cost. It typically requires that you compile your code with debug symbols left in it, and the consequence is usually twofold. First, it increases the overall size of the final executable. Second, if secrecy (that is, protecting your code from being decompiled and having somebody else examine what it looks like) is a concern, debug symbols can make decompilation easier. It's not a silver bullet; leaving them in doesn't mean people can recover your source instantly, but it does make it easier. So if you're concerned about that at all, because you're not doing open-source software or you're in a very draconian environment, you may get some pressure not to share binaries that have debug symbols in them outside your organization. To be honest, I'm seeing that concern less and less as I go on professionally, but it is there; I have encountered it. The underlying issue is that the compiler strips out those human-readable names. It doesn't need them. It uses its own internal, usually much shorter references to functions, variables, and so forth. So the debug symbols add back the human-readable names and the associations, as I said, back to the static sources, for the purposes of running an interactive debugger.

There's a newer class of debugging tools, permanent or semi-permanent instrumentation, that seems to be in vogue; I'm seeing it a lot more commonly. Some languages and environments just leave debug hooks in place. First of all, that makes building interactive debuggers simpler: if the language execution environment or the operating system has instrumentation semi-hardwired or fully hardwired into it, you don't have to provide that instrumentation on your own. You can kind of meet the resource developer halfway. It also means that very often a debugger can attach to a running program without having to run it under its direct control. A great example of this is the Java Virtual Machine. In recent versions, since about version 1.3 or so, the instrumentation has been more or less permanent, and there are wire-level and memory-level protocols that let the Java debugger, jdb, and third-party implementers of that same API hook into any JVM that's out there, interrogate its heap and its stack and so forth, and do all the things I described that an interactive debugger can do.
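
Python offers a small-scale taste of always-present instrumentation, though it is nowhere near the JVM's debugger protocols. The interpreter already tracks every thread's current frames, and `sys._current_frames()` (underscore-prefixed, and documented as intended for internal and specialized use) exposes them, so a running program can report where each thread is without being started under a debugger. The helper `snapshot_stacks` and the `slow_job` worker are hypothetical.

```python
import sys
import threading
import time
import traceback


def snapshot_stacks() -> dict[str, list[str]]:
    """Map each live thread's name to its function names, outermost first,
    read from frame data the interpreter maintains anyway."""
    names = {t.ident: t.name for t in threading.enumerate()}
    result = {}
    for ident, frame in sys._current_frames().items():
        summary = traceback.extract_stack(frame)
        result[names.get(ident, str(ident))] = [fs.name for fs in summary]
    return result


def slow_job(done: threading.Event) -> None:
    """Hypothetical long-running work that we want to peek into."""
    while not done.is_set():
        time.sleep(0.01)


done = threading.Event()
worker = threading.Thread(target=slow_job, args=(done,), name="slow-worker")
worker.start()
time.sleep(0.05)  # give the worker time to reach its loop
stacks = snapshot_stacks()
done.set()
worker.join()

for name, funcs in stacks.items():
    print(f"{name}: {' > '.join(funcs)}")
```

On Unix, the standard `faulthandler.register(signal.SIGUSR1)` call goes a step further: afterwards, sending that signal to the running process from outside (for example with `kill -USR1 <pid>`) dumps every thread's traceback to standard error.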

One of the advantages of this is that, in the old way of running an interactive debugger under a custom execution environment, the very act of instrumenting a program can alter its behavior. In these permanent or semi-permanent instrumented environments, well, nothing changes. It's always there. The cost is kept to a minimum, since the resource developer is able, as I said, to pay closer attention to performance concerns and to the overhead of leaving that instrumentation in place, and it's less likely to alter the program in a telling way when you're debugging because, by definition, the instrumentation is always there.

Another great example of this, and I'm not as familiar with it, but I think this is a fair characterization, is DTrace under Solaris, and it's also been recently ported to other systems, including FreeBSD and, most notably, OS X. DTrace provides OS-level permanent or semi-permanent instrumentation, and you're able to attach a tool (the name DTrace actually refers to the tool) to do semi-interactive querying of running applications. And DTrace is kind of cool because, beyond simple operations like setting break points, conditional break points, pause, step into, step over, and step next, you can actually write small scripts in its D language to do a bit more than that. So you've got small programs that run on top of the instrumentation alongside your running app and let you do some surprisingly cool and sophisticated things. Just Google for DTrace and you'll find a plethora of articles gushing about what a great tool it is. And I have to agree: from all the descriptions I've read, DTrace sounds like a very nice, very forward-looking approach to making debugging simpler and a lot more powerful.

The last thought I'm going to leave you with is just to emphasize why I think understanding the practice of debugging is so important. Really, what I've given you here in 20 minutes or so is just the briefest glimpse, if you're unfamiliar, into all that debugging really is. There's so much more for you to explore, and it's really important that you do, that you kind of “get religion”, as it were, about formal tools and practices of debugging. The better you understand them, the better equipped you are to maintain your software. Not all of the tools and techniques I talked about may be applicable to every class of bug you're dealing with, which is why there are several in a debugger's toolkit (“debugger” here being the person who debugs, not the interactive software debuggers I was talking about earlier!). So if you have a diverse kit of tools to apply in trying to figure out why some bug is occurring, that obviously increases your odds of resolving that bug and improving the quality of your software. You should be able to make appropriate choices. This is something you should read up on and experiment with: go out there with the languages and tools you're familiar with, find out what's available, and get to know them a lot better. After all, the bulk of the time software is active and in use is spent in maintenance: not in new development, not in testing and QA, but in maintenance. And debugging is the most common activity you're going to undertake during maintenance. So, just from sheer utilitarian logic, it's worth your while to look into the subject much, much further than what I've presented here.
