Agile development in the ’80s

As with many innovations in the software business, if you look back in time, you can see glimpses of agile development long before it was “invented”. Even though we did not know the term “agile development” at the time, the team I worked with on my first project for a software vendor, 25 years ago, was more agile than many of the teams I see today.

I was the lead developer on a screen-based product that ran under MVS, a popular IBM mainframe operating system.

Our team consisted of the following people:

  • Our boss, who was a subject matter expert and a well-known speaker in the industry, with lots of industry contacts.
  • An experienced MVS systems programmer, with some subject-matter expertise.
  • Another experienced systems programmer.
  • A young programmer of modest experience, but who was extremely sharp.
  • An experienced technical writer.
  • A secretary.

We were all new to the company, and we all sat together in our own room, separated from the rest of the development organization. We had a four-person cubicle, where the three programmers and the technical writer sat. The secretary had her own desk, and the boss had his own office, just off the main room.

Since our product was screen based, we started out by designing the screens, printing out the designs, and taping them to the glass wall of our conference room. We wrote the bare minimum of code required to display the screens, and mocked them up, using simple scripts to drive them with dummy data. We let a few potential customers, as well as several people from the greater development organization, try out the demo, adjusting things based on their feedback.

As we developed the production code to run each screen, providing real system data instead of the canned data from the mockup, we would replace the script, and mark that screen done in the conference room. Occasionally we would need to reorganize the hierarchy of screens based on the feedback we got. When that happened, we rearranged the pieces of paper taped to our conference room window. We continued this process until we had a shippable product.

Although we were not officially using agile development, what we were doing had a lot in common with it.

  • Our boss served as our product owner. He had many years of subject-matter experience and a lot of connections in the industry, so he had a good handle on what the industry wanted.
  • We made changes in small increments, demonstrated them to stakeholders, and changed our direction based on their feedback.
  • While we did not have an official backlog or burndown charts, the printed-out screens on the window of our conference room, clearly marked when they were done, served as both.
  • We met every day to discuss our direction and the status of things. These were longer than 15-minute standups; they frequently happened over a long lunch, and would include discussions of the design of new features. One could think of the design discussions as parking lot items.
  • We were all in the same room. The fact that the room was quiet, since we were isolated from the rest of the development organization, and that we were all just one cubicle wall away from each other, meant that one usually didn’t need to get up to ask a question of a co-worker; one could just ask, and receive an answer through the cubicle wall.

What were we lacking?

  • Mostly, the fact that we didn’t use formal sprints. What we were doing was probably more akin to Kanban: a free developer would pick a screen off the wall to work on, and usually was not working on more than one screen at a time, so the work-in-progress limit was effectively one screen per developer.
  • A formal done-done criterion. A screen was considered done when the developer showed it to the boss (product owner) and convinced him (in other words, demonstrated) that it was done. So what we were doing was actually compatible with agile development; we just didn’t formalize it.
  • Automated tests. Testing was done by the developers, but we did not have the capability to do automated testing of screens at the time, and I’m not sure we would have thought of that even if we had had the capability.
  • Documentation was treated as a completely separate project, so screens were generally considered done long before their documentation was completed. On the good side, the technical writer attended all of our design sessions, so she was far more familiar with the product than writers usually are today, when we just send them documentation changes.
  • We didn’t have formal stories or burndown charts. The screens on the conference room window served as both.
  • We didn’t do any formal estimation like planning poker. We would look at the screens, guess at how hard they would be to implement, and with the boss’s input, decide how to prioritize them.

Why did this work?

  • The nature of the product, with its individual screens, made it very easy to create bite-sized stories.
  • This was a new product, so there was no legacy code to get in the way of doing things the way we wanted to.
  • This was a completely new development team, one that did not interface very much with the existing development organization, so we had no entrenched corporate culture to deal with.
  • There were no existing build tools to deal with. The whole build environment was up for grabs, and we could develop whatever we needed to support us, and change it as we felt necessary.

Interestingly, the team became less agile once we got the first release out. We could no longer rely on our screen mockups taped to the wall to keep track of the work to be done, and how it was progressing. We were forced to change to the more traditional release-cycle-oriented model, and had to deal with the existing procedures and facilities for shipping products, and adopt the ways of the existing development groups. And we started having to do technical support as well as development, which interfered with our development schedule, just as it does today.


An extensible information radiator

Note: This post is being shared with my other blog, The XML Adventure, since it pertains to both agile development and XML.

Teams that are practicing continuous integration often have a central information radiator to indicate when the build fails. Several different kinds have been used, such as an Ambient Orb (a translucent globe that can glow in different colors), lava lamps, talking teddy bears, and all sorts of other things. The idea is that when the information radiator indicates that the build is broken, it is everyone’s first responsibility to get the build working again, so that problems are not compounded by checking in more changes on top of the code that broke the build.

I was working with a team that was not yet ready for continuous integration, but I had some long-running test streams, where each test job, if successful, would submit the next one. Rather than constantly monitoring the test stream, I wanted to have a light, similar to what continuous integration teams use, that would turn amber when my test stream was running, green if all the tests passed, and red if it failed along the way.

First, I needed a light. I chose a red-green-amber LED from Delcom Products.  It comprised the following parts:

  • 904017 USB HID Visual Signal Indicator RGY Black Case w/ Center Cable ($91.35)
  • 804136 Straight Al. Mounting Pole 36″ ($16.80)
  • 804301 Mounting Pole Base ($9.35)

The total cost was $117.50.

I later replaced the mounting pole base with a Grooming Arm Clamp ($17.05), for a total cost of $125.20. This allowed the light to be clamped to the edge of the desk, whereas the mounting pole base had to be screwed into a block of wood.  (In case you are not familiar with grooming arms, they are used in grooming dogs. A grooming arm is a curved pole that clamps to the table, and the dog’s leash is hooked to it to keep the dog in place.)

Here is a picture of the test light in use:

To make things even more interesting, the test jobs were running on an IBM mainframe (z/OS) system, and the light was a USB device that plugged into my desktop computer, so I had to establish communication between the two.

I already had a test framework on the mainframe that would monitor the results of each test job. If the results were as expected, it would submit the next job in the sequence; if they were not, it would stop.

I modified the framework to update a file, LIGHT.DATA.XML. The first job in the sequence updates it with a status of “running”. Any failing job updates it with a status of “failed”. If all jobs are successful, the final job updates it with a status of “passed”. Here is an example of the file. The <jobname> and <system> elements are there for future expansion. Note that the encoding is specified as “EBCDIC-CP-US”, because the IBM mainframe uses the EBCDIC character code, rather than ASCII or UTF-8.
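
Something along these lines (the job name and system name are placeholders):

<?xml version="1.0" encoding="EBCDIC-CP-US"?>
<test>
   <status>running</status>
   <jobname>TEST1234</jobname>
   <system>SYS1</system>
</test>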

On the PC, there is a Windows batch script, LightDriver.bat, that loops continuously, calling the UpdateLight.bat script, waiting 15 seconds, and repeating the procedure. It looks like this:

@echo off
rem Call the UpdateLight script every 15 seconds to update the light status

:loop
timeout 15
call UpdateLight.bat > nul
goto loop

The UpdateLight.bat script uses FTP to retrieve the LIGHT.DATA.XML file from the mainframe. FTP translates the file from EBCDIC to ASCII as it is downloading it, but the XML header still says its encoding is EBCDIC-CP-US, so the sed stream editor is used to change the encoding specification to UTF-8.
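
The retrieval can be driven by the standard Windows command-line FTP client and a response file; something along these lines, where the host name and credentials are placeholders:

rem Pull LIGHT.DATA.XML from the mainframe; FTP's text mode handles the EBCDIC-to-ASCII translation
ftp -s:getlight.ftp mvs.example.com > nul

with getlight.ftp containing something like:

myuserid
mypassword
ascii
get 'LIGHT.DATA.XML' dynamic/light_data_ebcdic.xml
quit

The sed command that fixes the encoding declaration looks like this: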

sed "s/EBCDIC-CP-US/UTF-8/" dynamic/light_data_ebcdic.xml > dynamic/light_data.xml

The encoding is really ASCII, rather than UTF-8, but it is close enough, and it makes subsequent XML processing happier. (By the way, I probably would have been OK if I had just specified UTF-8 in the XML declaration in the EBCDIC file on MVS. But I wanted to allow for the possibility of processing the XML on MVS also, and that would require a valid specification of the EBCDIC encoding.)

The modified XML file is processed by an XQuery program, UpdateLight.xq, using the Zorba XQuery processor, to produce a new Windows batch script, setlight.bat:

zorba -f -q C:/SignalLight/eclipse/UpdateLight.xq ^
      -o C:/SignalLight/dynamic/setlight.bat ^
      --external-variable input:="C:/SignalLight/dynamic/light_data.xml" ^
      --serialize-text

The XQuery program looks like this:

(: The $input variable is passed into Zorba, and determines the input file name :)
declare variable $input external;

(: Extract the status of the test run from the rest of the XML data :)
let $testStatus := doc( $input )/test/status/text()

(: Set up a variable to represent the Newline character :)
let $nl := "&#10;"

(: Set the light color, based on the test status :)
let $testColor := 
    if ( $testStatus = "passed" )
    then "green"
    else if ($testStatus = "failed")
    then "red"
    else if ($testStatus = "running")
    then "amber"
    else "unknown"

(: Set the parameters to the light utility, based on the color :)
let $lightParm :=
   if ( $testColor = "green" )
   then "1"
   else if ( $testColor = "red" )
   then "2"
   else if ( $testColor = "amber" )
   then "4"
   else "0"

(: Generate the lines for the light-setting script :)
let $line01 := concat( "@echo off", $nl )    
let $line02 := concat( "rem Turn signal light ", $testColor, $nl )
let $line03 := $nl
let $line04 := concat( "usbcmdapx64 0 0 101 12 ", $lightParm, " 7", $nl )

(: Assemble all the lines. We do this with concat, rather than just returning
   a sequence, to avoid a blank at the beginning of the second and 
   subsequent lines :)
let $lines  := concat( $line01, $line02, $line03, $line04 )

return $lines

The program extracts the value of the <status> element, and maps its three values, running, passed, or failed, into corresponding colors, amber, green, or red. If the <status> has an unexpected value, the color is set to dark.

The color is then mapped to a parameter for the command-line utility that sets the color of the LED light, 0 for dark, 1 for green, 2 for red, and 4 for amber.

The Windows command-line utility that comes with the Delcom light is USBCMDAPx64.exe. The program takes the following parameters, specified in this order:

  • v – Verbose.  If specified, this prints more information.
  • TID – type.  Specifying 0 means all.
  • SID – serial ID.  Specifying 0 means all.
  • Major command – Specifying 101 means send an eight-byte write command.
  • Minor command – Specifying 12 means set or reset the port 1 pins individually.
  • LSBDATA – specifies the pins to reset.  (Resetting a pin turns it on.)
  • MSBDATA – specifies the pins to set.  (Setting a pin turns it off.)

LSBDATA takes precedence, so specifying 7 for MSBDATA turns off all the pins, and then any pins specified in LSBDATA are turned on.

The pins for the various colors are:

  • 1 – Green
  • 2 – Red
  • 4 – Amber

It is possible to turn more than one color on at the same time, but it does not look very good.

The following commands will set the light to the various colors:

Green:  usbcmdapx64 0 0 101 12 1 7 
Red:    usbcmdapx64 0 0 101 12 2 7 
Amber:  usbcmdapx64 0 0 101 12 4 7 
Dark:   usbcmdapx64 0 0 101 12 0 7

The LSBDATA parameter (1, 2, 4, or 0) is the one that was set by the XQuery program.

The resulting Windows batch script looks like this:

@echo off
rem Turn signal light red

usbcmdapx64 0 0 101 12 2 7

The generated script is then called:

call dynamic/setlight.bat

This two-step process, generating a dynamic Windows script and then calling it, is necessary because it is not easy to call external programs from an XQuery program.

This process worked well, with the LED light on my desk letting me know the status of my test jobs. But when the test stream failed, I really wanted to know which job failed. So I got a BetaBrite Prism scrolling LED sign. (These are sometimes called ticker-tape signs or Times Square signs.) Here are a couple of images of one, from a previous blog post, XML for LEDs:

 

The BetaBrite Prism connects to the PC via USB. The previous post was about BetaBrite Classic signs, which used an RS-232 connection. The method described in that post does not work for the USB signs. Luckily, the folks at Industro Logic have a free command-line program to control BetaBrite Prism signs; the file you want is PRISMCOM.EXE, available from their website.

If you run the program with no parameters, it gives you a help screen that shows you the options, but the basics are that you enclose control information in braces. In our case, we want to specify the color of the message, so the command would look like this:

prismcom usb {red}Test job TEST1234 failed

I modified the UpdateLight.bat script to run another XQuery program that takes the same input XML file and creates a Windows batch script, setsign.bat, that contains the prismcom command to send the message to the BetaBrite sign. As before, the script is then called.
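
The additions to UpdateLight.bat follow the same pattern as the earlier zorba call; roughly this (UpdateSign.xq is an illustrative name for the second XQuery program):

rem Generate and call the script that updates the BetaBrite sign
zorba -f -q C:/SignalLight/eclipse/UpdateSign.xq ^
      -o C:/SignalLight/dynamic/setsign.bat ^
      --external-variable input:="C:/SignalLight/dynamic/light_data.xml" ^
      --serialize-text

call dynamic/setsign.bat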

Because the XQuery programs use XPath expressions to select the parts of the XML file that they need, I can add new information to the LIGHT.DATA.XML file without breaking anything. For example, I decided I wanted to have the job number, as well as its name, displayed on the sign, so I changed the test framework to add a <jobnum> element to the XML file. Once that was in place, I could update the XQuery program for the sign at my leisure.

Since the LightDriver.bat script invokes the UpdateLight.bat script every 15 seconds, it is easy to play with the message on the sign to see what works best. You edit the XQuery program, save it, and within 15 seconds you can see the result.

For example, after I added the job number, I had a message that looked like this:

{red}Test job TEST1234{amber}({red}JOB01723{amber}){red}failed

The parentheses surrounding the job number were amber to better set it off from the job name. The messages for running and passed status were similar, except for the colors.

The BetaBrite sign scrolls rather quickly, and it was hard to read the job name and number as they scrolled by. So I changed the message to just display the job name and number, in the appropriate color:

{hold}{red}TEST1234 JOB01723

The {hold} tells the sign not to scroll. Instead, it alternates displaying the job name and the job number. I decided that the rest of the message was extraneous. The sign is located right next to the light, so people who are interested in it already know what it means. (It actually would not even need to change color, but having it match the color of the light looks better, and reassures people that the two are in sync.)
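
The XQuery program for the sign ends up being a close cousin of UpdateLight.xq. Stripped to its essentials, it looks something like this (a sketch; details differ in the real program):

(: The $input variable is passed into Zorba, and determines the input file name :)
declare variable $input external;

(: Pull the pieces of the message out of the XML file :)
let $status  := doc( $input )/test/status/text()
let $jobname := doc( $input )/test/jobname/text()
let $jobnum  := doc( $input )/test/jobnum/text()

(: Match the color of the signal light :)
let $color :=
    if ( $status = "passed" )      then "green"
    else if ( $status = "failed" ) then "red"
    else "amber"

(: Set up a variable to represent the Newline character :)
let $nl := "&#10;"

(: Generate the batch script that drives the sign :)
return concat( "@echo off", $nl,
               "rem Update the BetaBrite sign", $nl, $nl,
               "prismcom usb {hold}{", $color, "}", $jobname, " ", $jobnum, $nl )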

But what about color-blind coworkers? Over-reliance on color coding can be a problem for them. Right now, no one using this information radiator has that problem. But should it occur in the future, it can be easily solved by changing to a different kind of light:

This is a Patlite LED stack light. They are available on eBay, frequently for less than $50. (This light has four colors, but you really need only three.) They are not USB devices; they just have wires, one for each color plus a common wire. You will need a 24 VDC power supply, a USB relay board from Numato Labs, and slightly different programming. (The relay board looks like a virtual COM port.)
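
The generated setlight.bat then talks to the relay board’s virtual COM port instead of running usbcmdapx64. Assuming the board shows up as COM3, the standard Numato ASCII relay commands, and the green lamp wired to relay 0, it would look something like this:

@echo off
rem Turn signal light green (via the Numato relay board on COM3 -- port and wiring are assumptions)

rem Configure the virtual COM port presented by the relay board
mode COM3 BAUD=9600 PARITY=n DATA=8 STOP=1 > nul

rem Turn the other lamps off, then turn the green one on
echo relay off 1 > COM3
echo relay off 2 > COM3
echo relay on 0 > COM3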

With a stack light like this, people can tell what the light is saying by the position of the illuminated segment, even if they cannot differentiate between the colors.

This method lets the mainframe control the light and the sign without a bunch of TCP programming, or that sort of thing, so it can be set up very quickly. Using an XML file as the communication medium makes it easy to extend: new elements can be added to the XML file, and information radiators that are not interested in them will just ignore them.

With proper locking on the mainframe side, the XML file could even convey the status of several test streams, and individual information radiators could decide whether they wanted to report the status of a particular stream or all of them.


Low-cost, scalable, corporate information radiators

Many teams and companies have found information radiators useful. These are displays that show information and statistics, such as burndown charts, open issues, top backlog stories, or days till product release. They are located in well-trafficked areas, so rather than having to look up the information, people can almost absorb it by osmosis as they walk by. And seeing the information each time they walk by makes it more likely that they will notice important changes.

Most of the information radiators reported in the literature have been projects done by single teams, or a few teams in small companies. When you try to scale this up to a company with a lot of teams, it becomes trickier: each team needs displays customized to it, but teams should not have to come up with their own solutions, taking up time that would be better spent on product development. And standardizing hardware makes support easier, as well as keeping individual teams from getting too extravagant. (The 55-inch commercial display used by panic.com’s information radiator is tempting, but at over $3000, it is a bit hard on the budget.)

What is needed is a standard way to set up information radiators that is not overly expensive, and does not take a lot of time to do. Here is a proposal for such a scheme.

A lot of companies have spare laptops and monitors, either because of upgrades or, in today’s lagging economy, reductions in force. The monitors are frequently in the 21-inch range which, while not as impressive as a 55-inch display, will do the job if placed on a counter or bookcase. The laptops do not need to be particularly powerful; they just need to be able to run a web browser. They can be running Windows, Linux, FreeBSD Unix, or Mac OSX; it doesn’t matter.

Some companies may be nervous that laptops that sit out day and night running the information displays might be stolen after hours. If this is a concern, or if you do not have a surplus of laptops, Android-powered set-top boxes, which cost less than $100, could be used instead. (You will probably have to clear this with your local IT department, as many have rules about attaching non-standard equipment to their networks.)

Once you have the laptop and monitor set up, you just need a start-up script that brings up the browser, which has its home page set to a special URL for information radiators, perhaps something like radiator.example.com. Spare laptops could be set up this way in advance, so when someone needs an information radiator, you just hand them the laptop and a monitor, and they find an Ethernet connection and plug it in. This relieves the IT department of the burden of setting up the information radiator.

The special URL points to a web server that supplies the pages for the displays. The server checks the IP address of the requester against a list of known IP addresses. If it does not recognize the IP address, it returns a page that displays the IP address, as well as who to contact to register the IP address. (It is assumed that the IP address will be static, since the information radiator is not moving around. If this is not the case, the registrar could, with a bit more effort, add its MAC address to the DHCP server to ensure that it remains the same.)

When someone registers an information radiator, they indicate the team they are with and what reports they want. Most teams will want the same sort of reports, using data extracted from the sprint-tracking or issue-tracking system and customized for their team. The server will return web pages for the various reports. Each page will include a META REFRESH tag that will cause the page to refresh every 15 seconds, and a different page will be displayed each time, cycling through the reports registered for that IP address. The company can also insert additional pages, like the days till product launch, or the date of the company picnic.
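
The cycling does not require anything fancier than the refresh tag itself, with each page naming the next report in the rotation; for example (the report URL here is just a placeholder):

<meta http-equiv="refresh" content="15; url=http://radiator.example.com/team7/burndown.html">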

Although the pages are refreshing every 15 seconds, the reports will not actually be generated that often. Since the server knows what reports the various information radiators want for each team, it can pre-generate the reports at reasonable intervals and cache them. For example, if sprint hours are burned down in the daily standup, there is no need to generate the burndown report more than a few times a day (to accommodate teams with morning or afternoon standup schedules). On the other hand, a report of outstanding issues should probably be generated much more often.

If teams want to create their own reports, they can contribute them to the server, as long as the reports are parameterized so that other teams can use them.

This approach lets teams get the benefits of information radiators without a lot of expense or setup time, imposes some standardization without being onerous, and lets teams easily share custom reports with other teams that might find them useful.

If the volume of new information radiator requests becomes high enough, a web-based GUI could be developed, to let teams register their information radiators and select which reports they would like. Chances are, though, that doing so would be more work than just having someone manually register them.


High-Performance Teams

Bob Schatz from Agile Infusion is visiting our facility this week, as he does from time to time.

(If you hire someone to train your developers about agile development, I highly recommend that you have them come back periodically. Teams forget some things that they had learned, and also, as they understand more about agile development, they can ask better questions, to improve their understanding even more.)

Bob was talking about high-performance teams, and he asked if we knew what it felt like to work in a team like that. The best analogies I can think of come from outside the software business:

  • When I lived in Detroit, I used to have a lot of friends involved in SCCA car racing, and we used to get together and work on our cars. When you are working on a car with someone, and just as you realize you need a 15 mm wrench, your partner hands it to you, that is a high-performance team. He is watching what you are doing, he knows that the nut you want to tighten is a 15 mm, and he sees that you are going to be ready to tighten it in a moment, so with no words spoken, he has the wrench ready for you.
  • Another example is a jazz band I used to go see in nightclubs around the Detroit area. They had two keyboard players, one who sang and played keyboards, and one who doubled on sax and keyboards. One night one of them was in the middle of a solo on the keyboard when a microphone stand started to fall over. He stopped playing and grabbed the stand, and the other keyboard player finished his solo, without missing a note.

I have been on emergency teams with a number of very impressive colleagues. Usually, we would divide the work up based on our areas of expertise, and then go off in our corners and work on our parts, periodically getting back together to check our status. It was a great experience, but it still wasn’t a high-performance team, because we were not working closely together on the same thing, and had not been working that way over a long period of time.

This is one of the bad points about shifting resources around between teams frequently. It changes the team dynamics. Everyone has to learn how to work with the new person, or without the person who was pulled off the team. When teams are always in flux, it is difficult to foster the working relationships that result in high-performance teams.

 


Failing so you can win

It has long been known in engineering circles that much can be learned from failure. Claude Albert Claremont, in his 1937 book on bridge building, “Spanning Space,” wrote:

The history of engineering is really the history of breakages, and of learning from those breakages. I was taught at college “the engineer learns most on the scrapheap.”

In one of my past lives, I was involved in performance car rallying. This involved racing in beefed-up cars over forest logging roads. In the ’70s, there was a driver named John Buffum. His day job was running a car dealership in Vermont, but he was also the top rally driver in the U.S. and had factory support from British Leyland, who supplied him with Triumph TR-7s, like this one:

John Buffum, Libra Racing, © 1979, Lynn Grant

He was very fast, but he crashed a lot. Other drivers nicknamed him “Stuff ’em Buffum,” because he stuffed his car into the ditch so often. Each time, British Leyland would ship him a new TR-7 for the next race.

As time went on, he stopped crashing, but he was still very fast. Since he had crashed so many times, he knew exactly what the car felt like when it was right on the edge, so he knew when to back off. Other drivers who hadn’t had the luxury of getting a new car every race didn’t have this experience, so they had to be more cautious, and were thus slower. Buffum went on to win 11 National Pro Rally Championship titles and 117 Pro Rallies.

Tony Dismukes, a martial artist whose blog, BJJ Contemplations, I follow, said this about practicing failure:

On the mats we have the opportunity to fail over and over and over again.  This is the only way to learn the limits of our techniques and of ourselves. As Rener and Ryron Gracie are fond of saying: every technique can work some of the time, no technique works all the time. Only by testing our techniques to failure can we learn exactly when and how much we can rely on each one. Only by testing them to failure can we truly understand which details are crucial for success and why.  Only by testing ourselves to failure can we understand exactly where our personal limitations are and begin to learn how to improve upon them.

Engineers, racers, and martial artists all see the benefits of learning from failure, but too often in software development we consider failure a luxury we cannot afford.

Much of this is because we are always under the gun, trying to develop software to meet some too-early deadline. This causes us to stick to things that worked in the past, rather than trying something new, something that, if it works out, could make the team much more efficient or, even if it fails, might teach us valuable lessons.

In his book, Kanban: Successful Evolutionary Change for Your Technology Business, David J. Anderson wrote:

You need slack to enable continuous improvement. You need to balance demand against throughput and limit the quantity of work-in-progress to enable slack.

Too often, we completely fill our release schedules with work, so that any failure that delays something by a sprint is a catastrophe. If we allowed ourselves enough slack to have experiments that failed from time to time, those failures would be made up for by the improved efficiency that comes from continuous improvement.


Meetings: Stop Wasting My Time

At most companies, there seems to be a never-ending supply of meetings to attend, and complaining about their sheer volume is popular water-cooler conversation. We may not be able to reduce the number of meetings, but can we make them more effective (and thus, shorter)? Certainly!

For example, many people still start and end their meetings on the hour. This ensures that the people in the meeting have to wait for everyone who had a previous meeting that ended on the hour to arrive. And if that meeting slopped over its end time by a few minutes, the delay is even longer. If you start your meetings five minutes after the hour and end them five minutes before the hour, it gives people time to get from one conference room to another, and maybe even stop for a coffee or a bathroom break.

And about meetings that go over their scheduled time: When you do that, you are taking time that does not belong to you. It may be convenient for you to extend the meeting on-the-fly, but if the attendees have subsequent meetings that you are making them late for, you are wasting the time of the people in those meetings, which is terribly rude. On top of that, Agile development is all about delivering on time. If we can’t even bring a meeting in on time, how can we hope to deliver software on time?

Another example: Many Scrum teams use what Mike Cohn calls commitment-driven iteration planning. (Agile Estimating and Planning, p158). The team first figures out the number of hours available in the sprint, based on the percentage of their time they expect to be able to spend on stories and any days off that team members have. Then they take stories one at a time, break them into tasks, estimate how long each task will take, and subtract the time from the available hours. They continue to do this until they run out of hours. At that point, their sprint is loaded.

I have been in sprint planning meetings where figuring out the days off can take 15 to 20 minutes. The team goes around the table, with each team member saying something like “OK, I was going to take a few days off, maybe this sprint, or maybe in July. Or maybe I could take them in April.” Eventually, after a two- or three-minute monologue, the team member figures out how many days he or she is going to be out during the sprint. Then they move on to the next person.

It doesn’t have to be this way. They all know that they are going to be asked about days off in the sprint; this happens every planning meeting. Couldn’t they figure this out in advance, rather than making everyone sit around twiddling their thumbs while they muse about when it would be good to take vacation? In some cases, there are coverage issues. This can be handled by announcing in the daily standup that you would like to take some time off in the next sprint, and if there is a problem, working it out before the planning meeting.

And in spite of all that has been written in the literature about not keeping a room full of people waiting while a poor typist updates a story in the sprint-tracking tool, we are still doing it regularly. Write it down and do it later!

I have seen lots of stories in the Agile tracking tool that refer to issues in the problem tracking system, or to documents in documentation repositories. It is frequently possible to get a URL for these issues or documents, and this should be put in the story, rather than just saying “Issue 12345” or “the code spec is in the repository under this project”. If you make people search for information that you must have had a link to at the time you were looking at it, you waste the time of everyone who has to look at your story.

Wasting people’s time in meetings isn’t just rude; it severely impacts the flow of the meeting. While you are searching through the document repository to try to find the referenced code specification, the rest of the team starts checking their phone for messages, or talking about last night’s ballgame, or they decide this would be a good time to get a coffee or take a bathroom break. It may take you five or ten minutes to get the meeting back on track.

It is easy to gripe about meetings, but it is not much harder to make them a lot less wasteful, which will go a long way towards making them less irritating.


Creating chaos for better agility

Chaos Monkey is a tool developed by Netflix to test the resiliency of their servers on the Amazon cloud when faced with failures. It periodically terminates a random virtual machine that is running their application. Their automated error recovery is supposed to spin up a new virtual machine to replace the one that failed, and do so in a manner that appears seamless to customers.

Rather than just implementing the error recovery code, testing it once, and assuming that it will do the job, they are constantly testing it, figuring that if it really can seamlessly recover from failures, there should be no problem with randomly blowing away virtual machines.

Realizing that there is the possibility that the recovery could fail, they run Chaos Monkey between 9 AM and 3 PM on weekdays, so if a problem does occur, there will be people present who can deal with it. They also have a way for applications that they know are not ready for this to opt out.

This got me thinking about testing the agility of our teams.

One of the big reasons we do Agile development is so that we can change direction at sprint boundaries, if the priorities for delivering particular stories change. By finishing all their work by the end of the sprint, the team is able to change direction immediately.

Some teams have trouble understanding this. They resist breaking large stories into sprint-sized pieces, because they say it will increase the overall elapsed time to implement the change. This overlooks several things:

  • If it takes you six months to implement the change, the customer’s needs may have changed by the time you finish.
  • If you try to test six months’ worth of coding and it doesn’t work, you have to wade through all that code to find the error. If you are implementing it in sprint-sized stories, you only have one sprint’s worth of code to look through.
  • Priorities may change. The Product Owner may need to have the team implement some feature sooner than planned to keep from losing a customer. If you are two months into a six-month project and you have dozens of modules open, it is very difficult to change direction.

For teams that cling to elapsed time as the only viable metric, I would propose engaging the Product Owner in the following exercise:

If you have several epics that have similar priorities, mix them up each Sprint. If the first epic has stories A, B, and C, and the second epic has stories P, Q, and R, and the team is currently working on story A, they will expect that in the next sprint they will be working on story B, and in the next, story C. Instead, have them work on story A, then P, then B, then Q, etc.

This will make transparent how agile they are, and will help get them out of the habit of assuming that if they don’t finish a story in a sprint, they can just roll it over to the next with no consequences.

Of course, just as Netflix runs Chaos Monkey only during weekdays when people are present, you would want to be careful about how you do this exercise:

  • Don’t do it during a deadline crunch.
  • Let the team know a sprint or two in advance that you are going to be doing this, and that you expect them to be able to handle it.
  • Discuss in the Retrospective how well this worked, and what they can do to make it work better.
  • A few stubborn teams may say that this is stupid and a waste of their time. You may have to remind them that the Product Owner is responsible for the priority of stories in the backlog, and the team is responsible for committing only to what they can accomplish in a sprint.

Even teams that already buy into the idea of being able to change direction at sprint boundaries may discover impediments to doing so that they didn’t see before.

Just as Netflix exercises their recovery software so they know it will work when they get a real failure, teams should exercise their agility regularly, so that when a critical customer demand comes along, they are practiced and ready for it.


The Andon Light (or Stop the line, I want to get off)

One oft-mentioned feature of Lean manufacturing is the andon light or the andon cord. The idea is that any employee on the assembly line who encounters a problem pulls the andon cord, the line is stopped, and the light comes on to indicate where the problem is.

By the way, andon is the Japanese word for paper lantern, which has apparently been generalized to mean any lantern. Here are a couple of good references about the history of andon lanterns:

Although andon lights are frequently mentioned in the Agile development literature, some of their most important points are sometimes glossed over.

In A Study of the Toyota Production System, Shigeo Shingo, the industrial engineer noted for his work on Toyota’s SMED (Single Minute Exchange of Dies) program, which reduced setup times for punch presses from many hours to less than a minute, said:

The andon is a visual control that communicates important information and signals the need for immediate action by supervisor. There are some managers who believe that a variety of production problems can be overcome by implementing Toyota’s visual control system. At Toyota, however, the most important issue is not how quickly personnel are alerted to a problem, but what solutions are implemented. Makeshift or temporary measures, although they may restore the operation to normal most quickly, are not appropriate.

The key point is that each time the line is stopped because of the andon, the team strives to make sure that the same error does not happen again. Shingo states this more forcefully when he says “At Toyota, there is only one reason to stop the line—to ensure that it won’t have to stop again.”

The andon lights used by continuous integration teams come close to this philosophy. The light comes on when a build fails, or the automated tests that run as part of the build fail. Everyone on the team stops what they are doing and works on fixing the build.

What is missing sometimes is the idea of making sure that it doesn’t happen again, perhaps by implementing a Poka-yoke solution that will prevent the error from happening again.

When you read about the Toyota production system, what is striking is the commitment to doing whatever it takes to eliminate waste and errors, even at the expense of some short-term pain.

This is difficult to do in a mature company that is trying to adapt to Agile development, because it is a major change, and various departments are not used to working closely together. But it is essential to eliminate waste.


Poka-yoke and software development

Poka-yoke (the final “e” is pronounced like “eh?” in English) is the Japanese term for “error proofing”, formalized by industrial engineer Shigeo Shingo as part of the Toyota Production System. (He is said to have picked the term “error proofing” rather than “fool proofing” [baka-yoke] to underscore that the problem was not foolish workers, but the fact that everyone makes mistakes from time to time, given the chance.)

In the Toyota Production System, poka-yoke deals with designing vehicles and their assembly processes in such a way that it is difficult to assemble them incorrectly.

Here is an example from the 1940s of the need for poka-yoke: My father worked at Kaiser-Frazer when they used to make a car called the Frazer. It had “FRAZER” spelled out in individual chrome letters on the front of the hood, like this.

Posts that stuck out the back of each letter went through holes in the hood to hold the letters in place. The positions of the posts were standardized, making all the letters interchangeable. Sometimes, when an assembly worker was having a bad day, he would end up making ZEFFERs instead of FRAZERs. Automakers subsequently changed the letters so they could not be interchanged, either by putting the posts in a different position on each letter, or by making the logo a single unit, like this.

A more modern example is the floppy disk drives that used to be in PCs. They used an unkeyed four-pin power connector that could be installed the right way, which would make the drive work, or the wrong way, which would make it go up in smoke. You had to remember that the red wire went away from the data connector, unless you were working with the odd brand where the red wire had to go toward the data connector. (I fried several floppy disk drives this way.) That was the last unkeyed connector I remember seeing on a PC, so the PC industry evidently adopted poka-yoke.

There are all kinds of modern examples, like polarized power plugs and IKEA furniture, which is difficult to assemble incorrectly. The same concept applies to software development.

For example, suppose you have a routine that performs several different functions. (That, in itself, is probably a violation of the Single Responsibility Principle, but suppose there is a good reason for it to be that way.) Some of the functions require few parameters, while others require many. Callers have to remember to pass the right number of parameters, including dummy parameters. So you get something like this:

CALL BADRTN,(A,B,,,,,,,C,,,D),MF=(E,PARMLIST)

If you miscount the commas, you have a problem. It would be much less error-prone to have multiple entry points, each with a fixed number of parameters.
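
For example, the single call above might be replaced by calls like these, one per function, with no placeholder commas to count (the entry point names are made up):

CALL BADRTNA,(A,B,C),MF=(E,PARMLIST)      Entry point for the function that needs A, B, and C
CALL BADRTNB,(D),MF=(E,PARMLIST)          Entry point for the function that needs only D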

The same concept applies to development tools. In one build system I worked with, you have to type in the names of all the modules that are part of your change. If the build fails because you left out one of the changed modules, you have to re-submit the build request, again typing in the names of all the modules. If you have 30 or 40 modules in your change, you might mistype or leave out one of the names you got right in the first request, causing the build to fail again. If you could just call up the first request and say you wanted to add a module, there would be much less chance of error.

Another case is anywhere that you have to enter the same information into two different places. Eventually, you will forget to update one of them, or will mistype the information while entering it the second time. If the systems can be made to talk to each other, this greatly lessens the chance for error.


Customer feedback: What we can learn from game developers

Online game companies are frequently on the forefront of technology, both the technology of the games, as well as how they are developed. For example, IMVU, a 3D online chat website, has been a leader in continuous deployment, deploying as many as 50 changes a day.

Another development leader is Cmune, a Chinese company that produces the MMO (Massively Multiplayer Online) first-person shooter game, UberStrike.

In case you are not familiar with first-person-shooter games, there are various levels in a game, each one more difficult than the previous. As a player proceeds through the game, gaining more points (however this is achieved in the mechanics of the particular game), the player ascends to harder and harder levels.

Each level has a map, which defines the terrain and buildings that the player has to negotiate while playing the level. Designing a map is a two-pronged affair. First the terrain and buildings have to be defined in such a way as to be fun to play. Next they have to be modeled and textured, so they look realistic.

Traditionally, both steps are completed before the level is made available to players. If the majority of players decide that the level is too easy or too difficult, then all of the effort in modeling and texturing it is wasted.

Cmune has decoupled these two steps for UberStrike with their Bluebox Maps program. Proposed level maps are made available to interested customers. They are not textured (they have a uniform blue color, thus the name of the program), and high-quality modeling has not been completed. Also, game mechanics, such as shooting, are not implemented. Here is an example of a Bluebox map.

Participating customers can download a Bluebox map and try it out, in order to determine whether it will be fun to play. Based on the feedback Cmune receives on a map, they either continue with the high-quality modeling and texturing, or they discard the map.

Developers outside of the game world can learn from the Bluebox program. When we think about getting feedback from customers, we usually think of showing them completed features. Since we break even large epics down into sprint-sized pieces, the feature we are demonstrating to the customer may be a small, incremental change, but it is generally complete.

In some cases, it may be beneficial to break changes into even smaller pieces, large enough that the customer can see if we are going in the right direction, but not polished enough to actually release.

This must be done with caution, particularly if we are using continuous integration, or other SCM methodologies where everything gets checked into the main branch. (For some hints, see my previous post, Small Stories, Legacy Code, and Scaffolding.) Perhaps a feature flag can be added, so when it is turned on, it lets the customer go as far as the part being demonstrated, and then stops.

One of the buzzwords of Agile development is failing fast. The sooner you can find out that what you are developing is not what the customer wants, the sooner you can change course, without a lot of wasted development time.
