Blue Flower

Bob Schatz from Agile Infusion is visiting our facility this week, as he does from time to time.

(If you hire someone to train your developers about agile development, I highly recommend that you have them come back periodically. Teams forget some things that they had learned, and also, as they understand more about agile development, they can ask better questions, to improve their understanding even more.)

Bob was talking about high-performance teams, and he asked if we knew what it felt like to work in a team like that. The best analogies I can think of come from outside the software business:

  • When I lived in Detroit, I used to have a lot of friends involved in SCCA car racing, and we used to get together and work on our cars. When you are working on a car with someone, and just as you realize you need a 15 mm wrench, your partner hands it to you, that is a high-performance team. He is watching what you are doing, he knows that the nut you want to tighten is a 15 mm, and he sees that you are going to be ready to tighten it in a moment, so with no words spoken, he has the wrench ready for you.
  • Another example is a jazz band I used to go see in nightclubs around the Detroit area. They had two keyboard players, one who sang and played keyboards, and one who doubled on sax and keyboards. One night one of them was in the middle of a solo on the keyboard when a microphone stand started to fall over. He stopped playing and grabbed the stand, and the other keyboard player finished his solo, without missing a note.

I have been on emergency teams with a number of very impressive colleagues. Usually, we would divide the work up based on our areas of expertise, and then go off in our corners and work on our parts, periodically getting back together to check our status. It was a great experience, but it still wasn't a high-performance team, because we were not working closely together on the same thing, and had not been working that way over a long period of time.

This is one of the bad points about shifting resources around between teams frequently. It changes the team dynamics. Everyone has to learn how to work with the new person, or without the person who was pulled off the team. When teams are always in flux, it is difficult to foster the working relationships that result in high-performance teams.

It has long been known in engineering circles that much can be learned from failure. Claude Albert Claremont, in his 1937 book on bridge building, "Spanning Space," wrote:

The history of engineering is really the history of breakages, and of learning from those breakages. I was taught at college "the engineer learns most on the scrapheap."

In one of my past lives, I was involved in performance car rallying. This involved racing in beefed-up cars over forest logging roads. In the ’70s, there was a driver named John Buffum. His day job was running a car dealership in Vermont, but he was also the top rally driver in the U.S. and had factory support from British Leyland, who supplied him with Triumph TR-7s, like this one:

John Buffum, Libre Racing © 1979, Lynn Grant

He was very fast, but he crashed a lot. Other drivers nicknamed him “Stuff 'em Buffum,” because he stuffed his car into the ditch so often. Each time, British Leyland would ship him a new TR-7 for the next race.

As time went on, he stopped crashing, but he was still very fast. Since he had crashed so many times, he knew exactly what the car felt like when it was right on the edge, so he knew when to back off. Other drivers who hadn’t had the luxury of getting a new car every race didn’t have this experience, so they had to be more cautious, and were thus slower. Buffum went on to win 11 National Pro Rally Championship titles and 117 Pro Rallies.

Tony Dismukes, a martial artist whose blog, BJJ Contemplations, I follow, said this about practicing failure:

On the mats we have the opportunity to fail over and over and over again. This is the only way to learn the limits of our techniques and of ourselves. As Rener and Ryron Gracie are fond of saying: every technique can work some of the time, no technique works all the time. Only by testing our techniques to failure can we learn exactly when and how much we can rely on each one. Only by testing them to failure can we truly understand which details are crucial for success and why. Only by testing ourselves to failure can we understand exactly where our personal limitations are and begin to learn how to improve upon them.

Engineers, racers, and martial artists all see the benefits of learning from failure, but too often in software development we consider failure a luxury we cannot afford.

Much of this is because we are always under the gun, trying to develop software to meet some too-early deadline. This causes us to stick to things that worked in the past, rather than trying something new, something that, if it works out, could make the team much more efficient or, even if it fails, might teach us valuable lessons.

In his book, Kanban: Successful Evolutionary Change for Your Technology Business, David J. Anderson wrote:

You need slack to enable continuous improvement. You need to balance demand against throughput and limit the quantity of work-in-progress to enable slack.

Too often, we completely fill our release schedules with work, so that any failure that delays something by a sprint is a catastrophe. If we allowed ourselves enough slack to have experiments that failed from time to time, those failures would be made up for by the improved efficiency that comes from continuous improvement.

Chaos Monkey is a tool developed by Netflix to test the resiliency of their servers on the Amazon cloud when faced with failures. It periodically terminates a random virtual machine that is running their application. Their automated error recovery is supposed to spin up a new virtual machine to replace the one that failed, and do so in a manner that appears seamless to customers.

Rather than just implementing the error recovery code, testing it once, and assuming that it will do the job, they are constantly testing it, figuring that if it really can seamlessly recover from failures, there should be no problem with randomly blowing away virtual machines.

Realizing that there is the possibility that the recovery could fail, they run Chaos Monkey between 9 AM and 3 PM on weekdays, so if a problem does occur, there will be people present who can deal with it. They also have a way for applications that they know are not ready for this to opt out.
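The rules described above (pick a random victim, only during staffed hours, and skip anything that has opted out) can be sketched in a few lines of Python. This is only an illustration of the idea, not Netflix's code: the function names, the instance dictionaries, and the opt_out flag are all made up for the example, and no cloud API is called.

```python
import random
from datetime import datetime

def in_staffed_hours(now):
    """True on weekdays between 9 AM and 3 PM (the window mentioned above)."""
    return now.weekday() < 5 and 9 <= now.hour < 15

def pick_victim(instances, now, rng=random):
    """Return one instance to terminate, or None if nothing should be killed.

    `instances` is a hypothetical list of dicts such as
    {"id": "vm-1", "opt_out": False}.
    """
    if not in_staffed_hours(now):
        return None
    # Applications that are not ready for this can opt out.
    candidates = [vm for vm in instances if not vm.get("opt_out", False)]
    if not candidates:
        return None
    return rng.choice(candidates)

if __name__ == "__main__":
    fleet = [{"id": "vm-1", "opt_out": True}, {"id": "vm-2", "opt_out": False}]
    print(pick_victim(fleet, datetime.now()))
```

A real tool would go on to terminate the chosen machine and let the recovery automation replace it; the sketch stops at the selection step because that is where the safety rules live.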

This got me thinking about testing the agility of our teams.

One of the big reasons we do Agile development is so that we can change direction at sprint boundaries, if the priorities for delivering particular stories change. By finishing all their work by the end of the sprint, the team is able to change direction immediately.

Some teams have trouble understanding this. They resist breaking large stories into sprint-sized pieces, because they say it will increase the overall elapsed time to implement the change. This overlooks several things:

  • If it takes you six months to implement the change, the customer's needs may have changed by the time you finish.
  • If you try to test six months' worth of coding and it doesn't work, you have to wade through all that code to find the error. If you are implementing it in Sprint-sized stories, you only have one Sprint's worth of code to look through.
  • Priorities may change. The Product Owner may need to have the team implement some feature ahead of time to keep from losing a customer. If you are two months into a six-month project and you have dozens of modules open, it is very difficult to change direction.

For teams that cling to elapsed time as the only viable metric, I would propose engaging the Product Owner in the following exercise:

If you have several epics that have similar priorities, mix them up each Sprint. If the first epic has stories A, B, and C, and the second epic has stories P, Q, and R, and the team is currently working on story A, they will expect that in the next sprint they will be working on story B, and in the next, story C. Instead, have them work on story A, then P, then B, then Q, etc.

This will make transparent how agile they are, and will help get them out of the habit of assuming that if they don't finish a story in a sprint, they can just roll it over to the next with no consequences.
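The interleaving in this exercise is just a round-robin over the epics. Here is a minimal Python sketch, assuming each epic is simply an ordered list of story names; the function name interleave_epics is invented for the example.

```python
from itertools import zip_longest

def interleave_epics(*epics):
    """Round-robin: first story of each epic, then the second of each, and so on."""
    missing = object()  # marks the gap when one epic is shorter than another
    order = []
    for round_of_stories in zip_longest(*epics, fillvalue=missing):
        order.extend(s for s in round_of_stories if s is not missing)
    return order

if __name__ == "__main__":
    print(interleave_epics(["A", "B", "C"], ["P", "Q", "R"]))
    # prints ['A', 'P', 'B', 'Q', 'C', 'R']
```

If one epic has more stories than the other, the leftover stories simply run back to back at the end.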

Of course, just as Netflix runs Chaos Monkey only during weekdays when people are present, you would want to be careful about how you do this exercise:

  • Don't do it during a deadline crunch.
  • Let the team know a sprint or two in advance that you are going to be doing this, and that you expect them to be able to do this.
  • Discuss in the Retrospective how well this worked, and what they can do to make it work better.
  • A few stubborn teams may say that this is stupid and a waste of their time. You may have to remind them that the Product Owner is responsible for the priority of stories in the backlog, and the team is responsible for committing only to what they can accomplish in a Sprint.

Even teams that already buy into the idea of being able to change direction at sprint boundaries may discover impediments to doing so that they didn't see before.

Just as Netflix exercises their recovery software so they know it will work when they get a real failure, teams should exercise their agility regularly, so that when a critical customer demand comes along, they are practiced and ready for it.

At most companies, there seems to be a never-ending supply of meetings to attend, and complaining about their sheer volume is popular water-cooler conversation. We may not be able to reduce the number of meetings, but can we make them more effective (and thus, shorter)? Certainly!

For example, many people still start and end their meetings on the hour. This ensures that the people in the meeting have to wait for everyone who had a previous meeting that ended on the hour to arrive. And if that meeting slopped over its end time by a few minutes, the delay is even more. If you start your meetings five minutes after the hour and end them five minutes before the hour, it gives people time to get from one conference room to another, and maybe even stop for a coffee or a bathroom break.

And about meetings that go over their scheduled time: When you do that, you are taking time that does not belong to you. It may be convenient for you to extend the meeting on-the-fly, but if the attendees have subsequent meetings that you are making them late for, you are wasting the time of the people in those meetings, which is terribly rude. On top of that, Agile development is all about delivering on time. If we can't even bring a meeting in on time, how can we hope to deliver software on time?

Another example: Many Scrum teams use what Mike Cohn calls commitment-driven iteration planning (Agile Estimating and Planning, p. 158). The team first figures out the number of hours available in the sprint, based on the percentage of their time they expect to be able to spend on stories and any days off that team members have. Then they take stories one at a time, break them into tasks, estimate how long each task will take, and subtract the time from the available hours. They continue to do this until they run out of hours. At that point, their sprint is loaded.
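To make the arithmetic concrete, here is a rough Python sketch of that procedure. It is my own simplification, not code from Cohn's book: the field names (days_off, focus_pct), the eight-hour day, and the rule of stopping at the first story that does not fit are all assumptions made for the example.

```python
def available_hours(members, sprint_days, hours_per_day=8):
    """Add up each member's story hours: working days minus days off,
    scaled by the percentage of time they expect to spend on stories."""
    total = 0.0
    for m in members:
        days = sprint_days - m["days_off"]
        total += days * hours_per_day * m["focus_pct"] / 100
    return total

def load_sprint(stories, capacity):
    """Take stories in priority order, subtracting their task estimates,
    and stop at the first story that no longer fits."""
    committed = []
    remaining = capacity
    for name, task_hours in stories:
        needed = sum(task_hours)
        if needed > remaining:
            break
        committed.append(name)
        remaining -= needed
    return committed, remaining

if __name__ == "__main__":
    team = [{"days_off": 0, "focus_pct": 60}, {"days_off": 2, "focus_pct": 50}]
    capacity = available_hours(team, sprint_days=10)  # 48 + 32 = 80 hours
    backlog = [("login", [8, 16, 6]), ("search", [20, 20]), ("export", [12, 10])]
    print(load_sprint(backlog, capacity))
    # prints (['login', 'search'], 10.0)
```

Notice that the only inputs the team has to supply in the meeting are days off and task estimates, which is why it pays to have the days off settled beforehand.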

I have been in sprint planning meetings where figuring out the days off can take 15 to 20 minutes. The team goes around the table, with each team member saying something like "OK, I was going to take a few days off, maybe this sprint, or maybe in July. Or maybe I could take them in April." Eventually, after a two- or three-minute monologue, the team member figures out how many days he or she is going to be out during the sprint. Then they move on to the next person.

It doesn't have to be this way. They all know that they are going to be asked about days off in the sprint; this happens every planning meeting. Couldn't they figure this out in advance, rather than making everyone sit around twiddling their thumbs while they muse about when it would be good to take vacation? In some cases, there are coverage issues. This can be handled by announcing in the daily standup that you would like to take some time off in the next sprint, and if there is a problem, working it out before the planning meeting.

And in spite of all that has been written in the literature about not keeping a room full of people waiting while a poor typist updates a story in the sprint-tracking tool, we are still doing it regularly. Write it down and do it later!

I have seen lots of stories in the Agile tracking tool that refer to issues in the problem tracking system, or to documents in documentation repositories. It is frequently possible to get a URL for these issues or documents, and this should be put in the story, rather than just saying "Issue 12345" or "the code spec is in the repository under this project". If you make people search for information that you must have had a link to at the time you were looking at it, you waste the time of everyone who has to look at your story.

Wasting people's time in meetings isn't just rude; it severely impacts the flow of the meeting. While you are searching through the document repository to try to find the referenced code specification, the rest of the team starts checking their phones for messages, or talking about last night's ballgame, or they decide this would be a good time to get a coffee or take a bathroom break. It may take you five or ten minutes to get the meeting back on track.

It is easy to gripe about meetings, but it is not much harder to make them a lot less wasteful, which will go a long way towards making them less irritating.

One oft-mentioned feature of Lean manufacturing is the andon light or the andon cord. The idea is that any employee on the assembly line who encounters a problem pulls the andon cord, the line is stopped, and the light comes on to indicate where the problem is.

By the way, andon is the Japanese word for paper lantern, which has apparently been generalized to mean any lantern. Here are a couple of good references about the history of andon lanterns:

Although andon lights are frequently mentioned in the Agile development literature, some of their most important points are sometimes glossed over.

In A Study of the Toyota Production System, Shigeo Shingo, an industrial engineer noted for his work on Toyota’s SMED (Single Minute Exchange of Dies) program, which reduced setup times for punch presses from many hours to less than ten minutes, said:

The andon is a visual control that communicates important information and signals the need for immediate action by supervisor. There are some managers who believe that a variety of production problems can be overcome by implementing Toyota’s visual control system. At Toyota, however, the most important issue is not how quickly personnel are alerted to a problem, but what solutions are implemented. Makeshift or temporary measures, although they may restore the operation to normal most quickly, are not appropriate.

The key point is that each time the line is stopped because of the andon, the team strives to make sure that the same error does not happen again. Shingo states this more forcefully when he says “At Toyota, there is only one reason to stop the line—to ensure that it won’t have to stop again.”

The andon lights used by continuous integration teams come close to this philosophy. The light comes on when a build fails, or the automated tests that run as part of the build fail. Everyone on the team stops what they are doing and works on fixing the build.

What is missing sometimes is the idea of making sure that it doesn’t happen again, perhaps by implementing a Poka-yoke solution that will prevent the error from happening again.

When you read about the Toyota production system, what is striking is the commitment to doing whatever it takes to eliminate waste and errors, even at the expense of some short-term pain.

This is difficult to do in a mature company that is trying to adapt to Agile development, because it is a major change, and various departments are not used to working closely together. But it is essential to eliminate waste.