Reality vs. best practices (and getting from one to the other)

If you’ve spent any time studying best practices in software engineering, it can be jarring to discover how far the reality of software production often diverges from the ideal.  It isn’t rare to see code in production — stuff that’s critical to a company’s business — that isn’t even under version control, much less documented with bug-fixes tracked to requirements, testing infrastructure in place, etc.

Personally, I like doing reverse-engineering (restoration of lost knowledge) — taking a bundle of mysterious old code and figuring out exactly how it works and what it’s doing, so that it can be upgraded or replaced.  I’ve done it for more than one company.  However (at the risk of cutting down on the need for this sort of work), I’ll tell you that it’s cheaper to fix “free-fall code” while you still have the engineers who designed it on hand, before a reverse-engineering effort becomes necessary.  It’s a little like an old building with electrical wiring that’s not up to code:  replacing all the wiring seems expensive for something that yields no visible results, yet it’s cheaper than leaving the old wiring in place to catch fire and burn the building down.

How does free-fall code get all the way to production?  In my experience, there are a couple of main causes.  First, a lot of code bases still in use were started before many best-practice ideas became commonplace, and overhauling the process just never made it to the company’s priority list.  Second, small start-ups often have no choice but to produce code that way.  Typically, you have a critical deadline and just one engineer (who is required to make a superhuman coding effort to get it done).  It’s release or perish, so there’s no time for frills like documentation, precise requirements (which you don’t need anyway if you have only one engineer), or tests.

One company I worked for had recently grown from start-up size to “real company” size, and one of the higher-ups decided that we should try to keep acting like a start-up because “start-ups produce code so much more efficiently.”  That is wrong, wrong, wrong.  Start-up mode means sacrificing a long-term code-maintainability strategy for the sake of the shortest-possible-term results.  And for every start-up that makes it, there are plenty that don’t, so start-ups aren’t that amazingly efficient if you look at the big picture: not only do the survivors have to pay more later to get their missing best practices on track, but the coding effort sunk into all the failed start-ups is simply wasted, and that waste balances out the impressive results of the few that succeed.

So the moral is that once your start-up has passed that first hurdle — and has a little cushion of money to spend — the critical task is to move the engineering department from start-up mode to something scalable, efficiently and effectively.

That’s the critical task for the internal operations manager, that is.  The overall #1 priority for your start-up (once you have a little cushion of money) is to make sure that your business strategy makes sense and is on track.  Ideally, your start-up began with a business visionary who knows what to sell and how to sell it, a gifted manager who can run internal operations and grow the company wisely to meet the strategy, and an excellent engineer who can make it so — covering all of the engineering bases until the company is in a position to grow.  I know this because I’ve worked for a number of start-ups, and I’ve kept my eyes open the whole time.  It’s volatile work (as employment goes) but on the plus side there’s never a dull moment. 😉

It’s easy to underestimate the importance of having a talented manager running internal operations — especially in a small company where others are producing visible results like writing an application or selling a contract.  But teamwork and morale are critical when growing a small company.  You can’t afford to pay people to work at less than full capacity, so everybody has to be on the same page.  As soon as you have more than one engineer, you need a skilled manager who will make sure that everyone knows (and agrees on) what you’re implementing and who’s doing what, with clear requirements and objectives.

(To give you an idea of how important clear requirements are to the efficient production of code, note that in Switzerland there’s now a conference covering nothing but requirements engineering.  At least that’s what I think it’s about, given my current level of German comprehension…)

In the first stage of growing your company from “start-up” mode to “real company” mode, you’ll want to hire a QA engineer (in addition to hiring more engineers for R&D).  The QA engineer should be a bit of a jack-of-all-trades (kind of like your initial development engineer), who can write the requirements (based on verbal discussions with the business or sales personnel), get business/commercial and R&D to agree on the requirements, write a test plan, install bug-tracking software, set up version control and a nightly build (unless someone on the development team is doing that), set up a unit testing framework, and get everyone’s agreement on a set of coding and documentation guidelines.  In the early stages, manual testing can be outsourced as long as you have a clear test plan.  The internal operations manager should work closely with the QA engineer to decide on priorities and to ensure that R&D and QA cooperate with each other.  This will get your initial growth off on the right foot.
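To make the “nightly build” item concrete, here’s a minimal sketch of what such a script might look like, assuming a make-based project hosted in git.  The repository URL, directories, and build commands are hypothetical placeholders, and a real setup would also notify the team (by e-mail, chat, etc.) when the build breaks:

```python
#!/usr/bin/env python3
"""Minimal nightly-build sketch.  Run from cron (or any scheduler)
once a night.  Everything project-specific below is a placeholder."""
import subprocess
import sys

REPO = "git@example.com:acme/product.git"   # hypothetical repository
WORKDIR = "/tmp/nightly-build"

def sh(*cmd, cwd=None):
    """Run a command, echoing it first; raise on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)

def main():
    sh("rm", "-rf", WORKDIR)              # always start from a clean slate
    sh("git", "clone", REPO, WORKDIR)     # fresh checkout, no local cruft
    sh("make", cwd=WORKDIR)               # build
    sh("make", "test", cwd=WORKDIR)       # run the unit tests

if __name__ == "__main__":
    try:
        main()
    except subprocess.CalledProcessError as exc:
        # In a real setup, this is where you'd alert the team.
        print(f"NIGHTLY BUILD FAILED: {exc}", file=sys.stderr)
        sys.exit(1)
```

The point isn’t the particular commands; it’s that the build runs unattended from a pristine checkout every night, so integration breakage surfaces within a day instead of at release time.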

Now, what if you’re moved to a new project and you discover that your company is already dangerously dependent on a block of free-fall software?  Or what if your company buys a code block that has no tests or documentation, and it’s thrown in your lap?

It’s a little like the situation above (the company’s initial growth spurt): you catch the free-fall code by building a QA framework around it.  First, put it under version control, and second, write a clear document explaining exactly what you need to do in order to build and run the code, including all the required tools and other software, with exact hardware and software version numbers.  The document should be complete enough that an engineer who has never seen the code before can take it out of version control and build and run it on virgin hardware.  If you can’t do that, then you are missing critical (burn-down-the-building level) information.  If you still have access to the engineers who wrote the code, it’s a lot cheaper to get their input on this document than it is to hire specialists to figure it out, especially if the code is designed to use proprietary and/or now-obsolete hardware or software.
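One way to keep that document honest is to pair it with an automated sanity check that a new engineer can run on virgin hardware before attempting a build.  A minimal sketch in Python follows; the tool names and version strings are placeholders, to be replaced by whatever your build document actually specifies:

```python
#!/usr/bin/env python3
"""Verify that this machine matches the build document before building.
The (tool, version-command, documented-version) entries are placeholders."""
import shutil
import subprocess
import sys

REQUIREMENTS = [
    ("gcc",  ["gcc", "--version"],  "9.4.0"),   # hypothetical documented versions
    ("make", ["make", "--version"], "4.2.1"),
]

def main() -> int:
    failures = 0
    for name, cmd, wanted in REQUIREMENTS:
        if shutil.which(cmd[0]) is None:
            print(f"MISSING:  {name} (build document specifies {wanted})")
            failures += 1
            continue
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        if wanted not in out:
            found = out.splitlines()[0] if out else "unknown"
            print(f"MISMATCH: {name}: document says {wanted}, machine has: {found}")
            failures += 1
    if failures:
        print(f"{failures} discrepancy(ies): update the document or the machine.")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```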

Once the code is under version control and you’re sure it can be rebuilt, the emergency stage is over.  The next (hopefully more relaxed) stage is to determine your objectives for the software, in terms of bug-fixes, performance, and upgradability.

If the code is working with no major problems, it may just need a bit of documentation to explain what it does and how it works.  If the code is not documented at all, that may be a sign that the engineer who wrote it is not particularly skilled at documentation.  (This is not a failing in engineering terms — often excellent engineers aren’t very good at explaining how code works, even to other engineers — it’s a bit of a separate skill, just as being able to manage engineers is a separate skill that is not directly related to engineering ability.)  If you have access to the original engineer, just assign another engineer to write the document (with input from the person who wrote the code) at a rate of about one or two days a week until it’s done.

If you need to fix or upgrade your free-fall code, then the next step is to analyze it, and get an engineering estimate on the effort/risk involved in fixing it vs. the effort/risk involved in retiring and replacing it.  The fixing estimate should include the effort to write a set of automated regression tests for the code.  Then management should use this estimate to decide which course to take.
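What do regression tests look like when there’s no documentation describing correct behavior?  A standard approach (not specific to any one shop) is characterization or “golden master” testing: record what the code does today on a set of representative inputs, and pin that behavior down.  Here’s a minimal pytest-style sketch, assuming the legacy program can be driven from the command line; the program name and directory layout are hypothetical:

```python
"""Characterization ("golden master") tests for a legacy command-line
program.  tests/inputs holds representative inputs; tests/golden holds
the program's current output for each one, recorded once and committed."""
import pathlib
import subprocess

INPUT_DIR = pathlib.Path("tests/inputs")
GOLDEN_DIR = pathlib.Path("tests/golden")

def run_legacy(input_file: pathlib.Path) -> str:
    # Replace "legacy_app" with however the real program is invoked.
    result = subprocess.run(
        ["legacy_app", str(input_file)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def test_behavior_matches_golden_master():
    for input_file in sorted(INPUT_DIR.iterdir()):
        expected = (GOLDEN_DIR / input_file.name).read_text()
        assert run_legacy(input_file) == expected, (
            f"behavior changed for {input_file.name}"
        )
```

These tests don’t tell you whether the recorded behavior is correct, only whether your changes have altered it; that’s exactly the safety net you need before touching code nobody fully understands.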

If the only problems are a handful of bug-fixes and a few very specific performance bottlenecks, then often a surgical approach is better than refactoring, particularly when dealing with complex, fragile old code.  As an engineer, when you look at old spaghetti code, you will certainly see things that are inefficient and ugly, and it is soooo tempting to just start ripping it out willy-nilly and rewriting it.  But the problem with that approach is that you will almost certainly lose information.  Nine times out of ten, you’re right that the ugly code is just due to inefficient design and/or hastily patching new features onto code that wasn’t designed to accommodate them.  But one time out of ten, the counter-intuitive code was written that way for a real (yet non-obvious) reason, and if you don’t have access to the original engineers, you can’t be certain exactly which parts those are.  Even if you’re an excellent engineer (which I assume you are), and even if you’ve set up a framework of automated tests before you begin (which you should!), undocumented code may be doing things you’re not aware of, so major changes bring the risk of breaking functionality that isn’t covered by your tests.  If your objectives are limited to a few specific points, then focus on fixing the parts that are broken, and for the rest, remember the old adage about what to do with stuff that ain’t broke. 😉
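To illustrate that one time out of ten, here’s a contrived Python example of the kind of code that looks redundant but isn’t.  Nothing in the source says the round() is load-bearing; you only find out when you “clean it up”:

```python
def price_in_cents(amount: float) -> int:
    # The round() below looks redundant, and a well-meaning cleanup
    # might simplify this to int(amount * 100).  But floating-point
    # representation makes 19.99 * 100 == 1998.9999999999998, so the
    # "simplified" version silently returns 1998 instead of 1999.
    return int(round(amount * 100))
```

Multiply that by a few hundred quirks scattered through a real code base, and the case for surgical fixes over wholesale rewrites becomes clear.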

Sometimes, however, you will need to refactor the old code.  Keep in mind that code can be optimized for a number of different objectives (readability, speed, memory use, extensibility in various directions, etc.) and these optimizations are not necessarily compatible with one another.  It is a direct corollary that all code can be refactored in a number of different possible ways.  Therefore, any refactoring you do should be done with specific objectives in mind that you’ve discussed and prioritized with your manager.  For example, you may need to re-write a section to accommodate a requested new feature, or to make the code compatible with the latest version of some third-party software that it’s dependent on, or you may need to clean up an inefficient design for clarity if the code is particularly buggy and needs a lot of maintenance.
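As a contrived sketch of what competing objectives look like in code (the function and its names are hypothetical), here is the same logic refactored two different ways.  Neither version is “the right one” until you and your manager have agreed on the objective:

```python
# Objective: readability.  The control flow mirrors the requirements,
# so it's easy to audit and easy to extend with new grade boundaries.
def classify_readable(score: int) -> str:
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    return "C"

# Objective: speed on a hot path.  Precompute a lookup table once at
# import time, trading memory and some clarity for a single indexed read.
_GRADE_TABLE = ["C"] * 80 + ["B"] * 10 + ["A"] * 11   # indices 0..100
def classify_fast(score: int) -> str:
    return _GRADE_TABLE[score]
```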

Of course, that brings us to the other possibility: retire and replace the free-fall code instead of trying to fix it.  Even if you’re planning to replace an application entirely, you should still document it carefully before retiring it, in order to avoid information loss.  And don’t imagine that you can skimp on the build instructions document (mentioned above): since the old code is effectively its own spec, the engineers developing the replacement may need to be able to experiment with it in order to determine exactly what it does.

Once you’ve done these things, you’re on your way to having a QA safety net to catch your free-fall code.  It’s an up-front investment in best practices that will pay off in terms of decreased risk and increased software development efficiency.
