Sunday 18 November 2012

Exploratory Refactoring - Understanding Legacy Code

Remember how Neo can see people, places and events behind the code? Well, not all of us mortals can see the shape of things at first glance. After all, "He's the One".

Martin Fowler suggests refactoring code as a way of understanding it, and Michael Feathers describes lots of techniques for working with legacy code. This post is about step zero: understanding the legacy code.

Exploratory Refactoring is the act of applying a series of refactorings to legacy code merely to understand what it does. That's it. There is no magic here. Once you understand the inner workings of the code in question, you typically throw away all of that refactoring and start working on the code for real, with end-to-end tests, unit tests and the whole shebang. But I'm getting ahead of myself.
I typically start off with simple refactorings, like Extract Method, Decompose Conditional, Rename Method, Rename Variable and Introduce Explaining Variable. This phase is like an archeologist hand-cleaning finds during an excavation. She might already have ideas about what each part is or was, but that's not the point. In this phase she just wants to make the parts as apparent as possible.
In our case, however, you don't have to be very careful, as this is a throw-away refactoring. In extreme cases you could just pseudo-refactor the code. However, the closer you get to real-life refactoring, the more you can learn from it. You will get to see unexpected inter-dependencies, which can influence your real refactoring later on.
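To make this first pass concrete, here is a minimal sketch in Python. The legacy function `calc`, its cryptic dictionary keys and the discount rule are all invented for illustration; they are not from any real code base. The second version applies Rename Method, Rename Variable, Extract Method and Introduce Explaining Variable without changing behaviour.

```python
# Hypothetical legacy function, invented for this example.
def calc(o):
    t = 0
    for i in o["items"]:
        t += i["p"] * i["q"]
    if o["c"] == "DE" and t > 100 and not o["x"]:
        t = t * 0.9
    return t


# Extract Method: the summing loop gets its own name.
def items_subtotal(items):
    return sum(item["p"] * item["q"] for item in items)


# Rename Method / Rename Variable: calc -> order_total, o -> order, t -> subtotal.
# Introduce Explaining Variable: the condition gets a name that states our guess
# about its intent. The keys "c" and "x" stay cryptic until we learn more.
def order_total(order):
    subtotal = items_subtotal(order["items"])
    qualifies_for_discount = (
        order["c"] == "DE" and subtotal > 100 and not order["x"]
    )
    if qualifies_for_discount:
        return subtotal * 0.9
    return subtotal
```

Note that the names are guesses. If `qualifies_for_discount` later turns out to be wrong, renaming it again is cheap, and being wrong is itself a piece of information about the code.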
Once you have done that, you can start to fit the pieces together and continue to dig deeper - or go higher in terms of abstraction - and use more complex refactorings which help you unveil the intrinsic structure of the code, like Consolidate Duplicate Conditional Fragments, Remove Control Flag, Split Loop and Reverse Conditional. It could be that after quite a few refactorings you still don't see structure or intention. Worry not! You will see it soon. Just keep factoring each small bit of information back into the code.
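As a rough illustration of two of these, the sketch below applies Split Loop and Remove Control Flag to a made-up function. The function `process` and its behaviour are assumptions invented for this example; the point is only how the two intertwined purposes of the loop become visible once they are pulled apart.

```python
# Hypothetical legacy function, invented for this example.
# One loop does two unrelated jobs, steered by a control flag.
def process(rows):
    found = False
    first_bad = None
    total = 0
    for r in rows:
        if not found:
            if r < 0:
                first_bad = r
                found = True
        total += abs(r)
    return first_bad, total


# Split Loop: the search gets its own loop.
# Remove Control Flag: an early return replaces `found`.
def first_negative(rows):
    for row in rows:
        if row < 0:
            return row
    return None


# Split Loop: the accumulation gets its own loop as well.
def sum_of_magnitudes(rows):
    return sum(abs(row) for row in rows)


# Same result as process(), but the two intentions are now named.
def process_explored(rows):
    return first_negative(rows), sum_of_magnitudes(rows)
```

Iterating twice may be slower than the original single loop. For a throw-away refactoring that does not matter; the goal here is understanding, not production code.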

One last bit of advice:

Don't make the typical mistake of dissing and cussing the author (or the poet, as we like to call them) of the legacy code. Not because it's simply bad manners, and not because we all have legacy skeletons in our career closets, but more importantly because that way you distance yourself from the poet of that code, making it that much harder to see what he meant to say when he, for example, named a variable $last_element.
(Whereas if you try to identify with him, you might realize he really meant $previous_element.)
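A small sketch of how such a name can mislead, written in Python rather than the language of the original variable. The function is hypothetical and invented for illustration: read on its own, `last_element` suggests "the final element", but the loop shows it holds the one seen just before the current one.

```python
# Hypothetical legacy function, invented for this example.
# "last" here means "the one before", not "the final one".
def count_increases(values):
    count = 0
    last_element = None
    for element in values:
        if last_element is not None and element > last_element:
            count += 1
        last_element = element
    return count


# After Rename Variable: the name now says what the poet meant.
def count_increases_renamed(values):
    count = 0
    previous_element = None
    for element in values:
        if previous_element is not None and element > previous_element:
            count += 1
        previous_element = element
    return count
```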
