Wandering Pointers
Dangling pointers kill, but they are everywhere. Originally in Embedded Systems
Programming, November 1999.
It's rather interesting, and to me distressing, to
look over the history of embedded debug tools and realize that most of the
features we use today were introduced by Intel in their very first emulator back
in the mid-70s. After a quarter century our products have rocketed from 2 MHz
eight bit CPUs to 10 gigawowiehertz 32 bit computing monsters, yet sometimes our
tools seem trapped in a time warp. We’re using breakpoints, real time trace,
emulation RAM, and not a lot more.
During my time in the in-circuit emulator business,
in the pursuit of product differentiation we were constantly on the prowl for
solutions to common debugging problems. Our dream was the ultimate debugging
interface: two buttons, one labeled “find bug” and the other “fix bug”.
A lot of generally unsuccessful time spent looking for any sort of improvement
to the tools left me feeling that debugging is still an art, something
generations away from being codified into science.
Every programming text urges us to use the very best
defensive strategies in our code, yet, whether due to panic-inducing schedules
or just poor self-management skills, few developers ever think beyond today's
coding to tomorrow’s inescapable debugging.
In 15 years as a tool vendor I metaphorically looked
over the shoulders of thousands of embedded developers and saw virtually all
just cranking code as fast as possible, with nary a thought about just how
they’d make their creations actually work. Been there, done that myself.
We’re an optimistic bunch, usually convinced that this
time things will require minimal debugging. The ugly truth is that debugging
eats something like 50% of project development time. An uglier truth is that
reactive debugging (responding to bugs as they surface) guarantees poor-quality
firmware. Too many bugs won't exhibit problems immediately; many lie latent
for years until the code is unusually stressed.
Wandering Pointers
C compilers should come with a warning label: “Danger!
Use of this product may lead to memory leaks, corrupt data, and erratic
crashes.” C brings joys and perils; we must deal with both.
It's intriguing how two major languages went in quite different directions. Ada
proposed the idea of creating correct code from the outset; an old saw says
that if an Ada program compiles, it will work. A very picky compiler largely
keeps developers from writing code that fails due to stupid problems. C, on
the other hand, fits the 60s
“anything goes” image: free love, free music, and the freedom to write
incomprehensible programs that fail in mysterious ways.
Anyone who has worked in C carries scars from wielding pointers. On one hand,
pointers helped make C the embedded language of choice: they give us much of
the power of assembly while packaging the capabilities inside a HLL. On the
other hand, pointers almost always lead to travail. Novices misuse them,
mixing up referencing and dereferencing, adding too few or too many asterisks.
Professionals bypass such simple errors, but still create code that overruns
buffers or writes over the code itself.
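To make the distinction concrete, here's a contrived fragment (all names
invented for illustration) showing both flavors of trouble:

#define BUF_SIZE 8

void pointer_trouble(void)
{
    char buf[BUF_SIZE];
    char *p = buf;
    int i;

    *p = 'A';   /* correct: write through the pointer             */
    /* p = 'A';    the novice's reference/dereference mix-up;
                   most compilers only warn about it              */

    /* The professional's error: an off-by-one loop that overruns
       the buffer, trashing whatever follows it in memory.        */
    for (i = 0; i <= BUF_SIZE; i++)   /* note the <=; should be < */
        buf[i] = 0;
}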
Though pointers create agony in pretty much every
system ever produced, most of us react to each problem with surprise. Given that
pointer issues are so common, why don’t we build our systems from the outset
in a way to trap these inevitable problems? Why don’t we buy tools that track
and flag these problems automatically?
A couple of companies do sell pointer and memory
checking tools, though they are mostly aimed at the desktop applications
market. Geodesic, which sells the Great Circle memory analyzer
(www.geodesic.com), claims that 99% of all applications ship with significant
memory and pointer problems. How this number translates to the embedded market
no one knows. Even if it's off by an order of magnitude, it's still pretty
scary.
Parasoft (www.parasoft.com)
and Nu-Mega (www.numega.com) both have
tools aimed at the desktop and Windows CE market, about as close to the embedded
industry as any commercial product I’ve found. Nu-Mega’s BoundsChecker has
surely been an industry staple for a very long time.
It's a shame all of this great technology hasn't
been ported to the mainstream embedded world. That task has been left to
a few charitable souls who wrote decent tools which they put into the public
domain. Walter Bright, author of a popular C compiler, made his mem.c routines
available to us (available at www.snippets.org). This package detects the most
common problems associated with memory allocation.
Mem.c tracks obvious problems like frees without
corresponding mallocs. More interestingly, it picks up many sorts of pointer
problems by allocating a bit of storage before and after each malloc'ed block,
and then filling these extra areas with signatures. After a free, mem.c checks
to ensure the signature is intact; if not, a pointer over- or under-ran the
buffer.
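The idea is simple enough to sketch. The fragment below is my own illustration,
not Bright's actual code; a real package stashes the block size in a hidden
header, while here the caller passes it in to keep the sketch short:

#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 4
static const unsigned char guard[GUARD_SIZE] = { 0xDE, 0xAD, 0xBE, 0xEF };

void *dbg_malloc(size_t size)
{
    unsigned char *raw = malloc(size + 2 * GUARD_SIZE);

    if (raw == NULL)
        return NULL;
    memcpy(raw, guard, GUARD_SIZE);                     /* front signature */
    memcpy(raw + GUARD_SIZE + size, guard, GUARD_SIZE); /* rear signature  */
    return raw + GUARD_SIZE;       /* caller sees only the usable region  */
}

void dbg_free(void *ptr, size_t size)
{
    unsigned char *raw = (unsigned char *)ptr - GUARD_SIZE;

    /* A damaged signature means a pointer under- or overran the block. */
    if (memcmp(raw, guard, GUARD_SIZE) != 0 ||
        memcmp(raw + GUARD_SIZE + size, guard, GUARD_SIZE) != 0) {
        /* log the corruption, or loop here with a breakpoint set */
    }
    free(raw);
}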
Jeff Dunlop’s memory checking package, also at www.snippets.org,
offers more checks, including some more appropriate for embedded systems that
may not use the malloc() function call. Malloc(), of course, often leads to
memory fragmentation in systems that run for months or years. A desktop
application might tolerate fragmentation, since the user probably exits the
program from time to time… and ultimately (at least in the Windows
environment) expects a certain number of system crashes. Though Dunlop’s
package includes tests for malloc’d blocks, it also supports arrays and
statics.
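One common way to dodge fragmentation entirely, sketched below with invented
names and sizes, is to replace malloc() with a pool of fixed-size blocks
chained on a free list. Blocks of one size cannot fragment, and allocation
time is deterministic:

#include <stddef.h>

#define BLOCK_SIZE 32
#define NUM_BLOCKS 64

typedef union block {
    union block *next;                 /* link while on the free list */
    unsigned char payload[BLOCK_SIZE]; /* storage while allocated     */
} block_t;

static block_t pool[NUM_BLOCKS];
static block_t *free_list;

void pool_init(void)
{
    int i;

    for (i = 0; i < NUM_BLOCKS - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[NUM_BLOCKS - 1].next = NULL;
    free_list = &pool[0];
}

void *pool_alloc(void)
{
    block_t *b = free_list;

    if (b != NULL)
        free_list = b->next;
    return b;                          /* NULL when the pool is exhausted */
}

void pool_free(void *ptr)
{
    block_t *b = ptr;

    b->next = free_list;
    free_list = b;
}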
Both Bright's and Dunlop's code replaces standard
library functions, so it must be linked into your code during development. The
moral to this is that if you link the code in from the very beginning of product
development, errors will pop up as your code does unreasonable things. Don’t
link it in, and odd crashes will leave you puzzled. Even if you suspect a memory
problem you probably won’t relink to include the diagnostic routines,
believing (as we all do) that a few more minutes work will turn up the source of
the problem. This is sort of like avoiding make files, since creating the make
might eat up 20 or 30 minutes and we just know we’ll only need to build the code a few times.
One commercial product aimed squarely at embedded memory
troubles is CodeTest from Applied Microsystems (www.amc.com).
It’s an external hardware tool that relies on instrumented code to track what
gets allocated and freed when.
There’s a performance penalty, of course,
associated with using any of these packages. If your code must run so fast that
no speed degradation is possible, then these tools are not for you. Remember the
rule of thumb: a 90% loaded system doubles development time; at 95% loading the
schedule is three times longer than for a lightly loaded system. When
performance issues are so severe that reasonable tools fail, it's time to
reconsider the design.
Wandering Code
Embedded code written in any language seems determined to
exit the required program flow and miraculously start running from data space or
some other address range a very long way from code store. Sometimes keeping the
code executing from ROM addresses feels like herding a flock of sheep, each
determined to head off in its own direction.
In assembly a simple typo can lead to a jump to a
data item; in C, support for function pointers means an imperfectly coded
state machine may execute all over the CPU's address space. Hardware issues,
like interrupt service routines with improperly initialized vectors and
controllers, also lead to sudden and bizarre changes in program context.
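A little defensive dispatch goes a long way here. The sketch below, with
invented names and not drawn from any particular product, bounds-checks the
index before every call through a function-pointer table:

typedef void (*state_fn)(void);

static void state_idle(void)  { /* ... */ }
static void state_run(void)   { /* ... */ }
static void state_fault(void) { /* recover, log, or halt */ }

static state_fn const state_table[] = { state_idle, state_run, state_fault };
#define NUM_STATES (sizeof(state_table) / sizeof(state_table[0]))

static unsigned current_state;

void dispatch(void)
{
    /* A corrupted index would send the CPU anywhere; trap it first. */
    if (current_state >= NUM_STATES)
        current_state = NUM_STATES - 1;   /* force the fault state */
    state_table[current_state]();
}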
Over the course of a few years I checked a couple of dozen
embedded systems sent into my lab. The logic analyzer showed writes to ROM
(surely an exercise in futility and a symptom of a bug) in more than half of the
products.
Though there’s no sharp distinction between
wandering code and wandering pointers (as both often come from the same sorts of
problems), diagnosing the problems requires different strategies and tools.
Quite a few companies sell products designed to find
wandering code, or that can easily be adapted to this use. Some emulators, for
instance, let you set up rules for the CPU’s address space: a region might be
enabled as execute-only, another for data read-writes but no executions, and a
third tagged as no accesses allowed. When the code violates a rule the emulator
stops, immediately signaling a serious bug. If your emulator includes this sort
of feature, use it!
One of the most frustrating parts of being a tool
vendor is that most developers use 10% of a tool’s capability. We see
engineers fighting difficult problems for hours, when a simple built-in feature
might turn up the problem in seconds. I found that less than 1% of people I’ve
worked with use these execution monitors, yet probably 100% run into crashes
stemming from code flaws that the tools would pick up instantly.
Developers fall into four camps when using an
execution monitoring device: the first bunch don’t have the tool. Another
group has one but never uses it, perhaps because they have simply not learned
its fundamentals. To have unused debugging power seems a great pity to me. A
third segment sets up and arms the monitoring tool only when it’s obvious the
code indeed wanders off somewhere, somehow.
The fourth, and sadly tiny, group builds a
configuration file loaded by their ICE or debugger on every startup, that
profiles what memory is where. These, in my mind, are the professional
developers, the ones who prepare for disaster long before it inevitably strikes.
Just like with make files, building configuration files takes tens of minutes so
is too often neglected.
If your debugger or ICE doesn’t come with this sort
of feature, then adapt something else! A simple trick is to monitor the address
bus with a logic analyzer programmed to look for illegal memory references. Set
it to trigger on accesses to unused memory (most embedded systems use far less
than the entire CPU address space; any access to an unused area indicates
something is terribly wrong), or data-area executes, etc.
I’ve had great success doing this with HP’s MSO,
a sort of combined logic analyzer and scope. Since the scope half of the
instrument gets the bulk of the use, I’ll leave the analyzer set up as a poor
man’s monitor.
If a logic analyzer is too rich for your budget,
check out the $1295 PodAlyzer (http://www.associatedpro.com/aps/pod-8020.htm),
a device the size of a roll of stamps that connects to a PC’s serial port.
With 18 channels it’s ideal for monitoring 8 and 16 bit systems.
Part of the downside of using any logic analyzer is
that it takes too long to connect all of those annoying probes to a typical
CPU’s whisker thin SMT leads. After using these devices for more than a
generation I’ve gotten neither faster at connecting the leads, nor more
accurate at getting them right, than when I first started. The best solution is
to build a logic analyzer connector onto the prototype target system. Without
it, you’ll resist using this very effective software-diagnosis tool. Add the
connector and you’ll use the analyzer constantly and effectively.
Note that some emulator vendors, frustrated with the
difficulty of connecting to SMT processors, now suggest users install a special
emulator connector on target boards (see www.hitex.com
for one company’s clever approach). Even those of us using nothing more than
an analyzer should emulate this example.
Some ICEs include code coverage, a feature that tells
you whether every line of code executed. One study indicated that fully 50% of
the code in embedded systems is never tested (after all, error handling, deep
IF conditions, and complex switch statements all lead to special cases most QA
programs can't manage). Code coverage tools ensure that test cases do check
each possible condition.
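Lacking coverage hardware, you can fake a crude version in software. The
fragment below is purely illustrative: tag the paths you care about with
flags, run the tests, and then inspect the array for zeroes:

#define MAX_MARKS 128
unsigned char coverage_mark[MAX_MARKS];

#define COVER(id) (coverage_mark[(id)] = 1)

int process_sample(int x)
{
    if (x < 0) {
        COVER(0);   /* error path: the kind QA rarely exercises */
        return -1;
    }
    COVER(1);       /* normal path */
    return x * 2;
}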
Hardware-only code coverage tools watch the address
bus and log each instruction fetch. These will generally flag executions at
addresses holding no code, which typically indicates wandering code.
Beyond the hardware approaches, write the application
defensively. For instance, fill your unused interrupt vectors with pointers to
a debug routine. Configure the tools to set a breakpoint on that routine
automatically every time you load the debugger. A bad vector will show up
immediately, not after the processor executes a million instructions from your
data area.
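In C the whole scheme might look like the following sketch; the table layout
and names are invented, since real vector tables are CPU-specific:

typedef void (*vector_t)(void);

void timer_isr(void) { /* real handler */ }
void uart_isr(void)  { /* real handler */ }

void unexpected_interrupt(void)
{
    for (;;)        /* a breakpoint set here catches any bad vector */
        ;
}

const vector_t vector_table[8] = {
    timer_isr,                /* vectors actually in use         */
    uart_isr,
    unexpected_interrupt,     /* everything else lands here      */
    unexpected_interrupt,
    unexpected_interrupt,
    unexpected_interrupt,
    unexpected_interrupt,
    unexpected_interrupt,
};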
Seed unused memory with illegal instructions. Few
apps use every last byte of RAM and ROM; instead of leaving these areas set to
random values, take advantage of the one-byte call, illegal instruction, or
breakpoint instruction that almost every processor supports. On a Z80 it’s RST
7; a 68000 has an illegal instruction trap; the 683xx includes a specific
breakpoint instruction. If the code wanders into one of these unused regions it
will take the exception. You’ve wisely (hopefully) set a breakpoint on the
exception handler, so you'll find the problem immediately.
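A few lines in the startup code handle the RAM side of this; the addresses
below are invented, and the opcode shown is the Z80's RST 7. Unused ROM gets
the same fill at build time, via the linker or the ROM burner:

#include <string.h>

#define UNUSED_RAM_START ((unsigned char *)0x8000)
#define UNUSED_RAM_SIZE  0x4000
#define TRAP_OPCODE      0xFF   /* RST 7 on a Z80; vectors to 0x0038 */

void seed_unused_memory(void)
{
    memset(UNUSED_RAM_START, TRAP_OPCODE, UNUSED_RAM_SIZE);
    /* Breakpoint the RST 7 handler; any wander into this region
       now stops the system at the moment of failure.            */
}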
If some addresses are not tied to a memory
device, use resistors to pull the bus to an illegal instruction, at least for
the prototype, for the same reasons.
Given that you've detected code wandering
merrily outside the ROM range, what then? If you're using an ICE, logic
analyzer, or a trace-enhanced BDM, use real-time trace, triggering it to stop
collecting when the exception handler starts. Look back a few instructions in
the buffer to find the problem. A more limited tool like a ROM monitor can
still yield significant clues if you examine the call stack.
Conclusion
Solving problems is a high-visibility process; preventing
problems is low-visibility. This is illustrated by an old parable:
In ancient China there was a family of healers, one
of whom was known throughout the land and employed as a physician to a great
lord. The physician was asked which of his family was the most skillful healer.
He replied, "I tend to the sick and dying with drastic and dramatic
treatments, and on occasion someone is cured and my name gets out among the
lords."
"My elder brother cures sickness when it just begins
to take root, and his skills are known among the local peasants and
neighbors."
"My eldest brother is able to sense the spirit
of sickness and eradicate it before it takes form. His name is unknown outside
our home."
Great developers recognize that their code will be
flawed, so they instrument it and build toolchains designed to sniff out
problems before a symptom even exists.