Blue Flower

Poka-yoke (the final "e" is pronounced like "eh?" in English) is the Japanese term for "error proofing", formalized by industrial engineer Shiego Shingo as part of the Toyota Production System. (He is said to have picked the term "error proofing" rather than "fool proofing" [baka-yoke] to underscore that the problem was not foolish workers, but the fact that everyone makes mistakes from time to time when given a chance.)

In the Toyota Production System, poka-yoke deals with designing vehicles and their assembly processes in such a way that it is difficult to assemble them incorrectly.

Here is an example from the 1940s of the need for poka-yoke: My father worked at Kaiser-Frazer when they used to make a car called the Frazer. It had "FRAZER" spelled out in individual chrome letters on the front of the hood, like this.

Posts that stuck out the back of each letter went through holes in the hood to hold the letters in place. The positions of the posts were standardized, making all the letters interchangeable. Sometimes, when an assembly worker was having a bad day, he would end up making ZEFFERs instead of FRAZERs. Automakers subsequently changed the letters so they could not be interchanged, either by putting the posts in a different position on each letter, or by making the logo a single unit, like this.

A more modern example is the floppy disk drives that used to be in PCs. They used an unkeyed four-pin power connector that could be installed the right way, which would make the drive work, or the wrong way, which would make it go up in smoke. You had to remember that the red wire went away from the data connector, unless you were working with the odd brand where the red wire had to go toward the data connector. (I fried several floppy disk drives this way.) That was the last unkeyed connector I remember seeing on a PC, so the PC industry evidently adopted poka-yoke.

There are all kinds of modern examples, like polarized power plugs, and IKEA furniture, which is difficult to assemble wrong, and the same concept applies to software development.

For example, suppose you have a routine that performs several different functions. (That, in itself, is probably a violation of the Single Responsibility Principle, but suppose there is a good reason for it to be that way.) Some of the functions require few parameters, while others require many. Callers have to remember to pass the right number of parameters, including dummy parameters. So you get something like this:

CALL BADRTN,(A,B,,,,,,,C,,,D),MF=(E,PARMLIST)

If you miscount the commas, you have a problem. It would be much less error-prone to have multiple entry points, each with a fixed number of parameters.

The same concept applies to development tools. In one build system I worked with, you have to type in the names of all the modules that are part of your change. If the build fails because you left out one of the changed modules, you have to re-submit the build request, again typing in the names all the modules. If you have 30 or 40 modules in your change, you might mistype or leave out one of the names you got right in the first request, causing the build to fail again. If you could just call up the first request and say you wanted to add a module, there would be much less chance of error.

Another case is anywhere that you have to enter the same information into two different places. Eventually, you will forget to update one of them, or will mistype the information while entering it the second time. If the systems can be made to talk to each other, this greatly lessens the chance for error.

Online game companies are frequently on the forefront of technology, both the technology of the games, as well as how they are developed. For example, IMVU, a 3D online chat website, has been a leader in continuous deployment, deploying as many as 50 changes a day.

Another development leader was Cmune, a Chinese company that used to make the MMO (Massively Multiplayer Online) first-person shooter game, UberStrike. (UberStrike was "sunsetted" in June 2016.)

In case you are not familiar with first-person-shooter games, there are various levels in a game, each one more difficult than the previous ones. As a player proceeds through the game, gaining more points (however this is achieved in the mechanics of the particular game), the player ascends to harder and harder levels.

Each level has a map, which defines the terrain and buildings that the player has to negotiate while playing the level. Designing a map is a two-pronged affair. First the terrain and buildings have to be defined in such a way as to be fun to play. Next they have to be modeled and textured, so they look realistic.

Traditionally, both steps are completed before the level is made available to players. If the majority of players decide that the level is too easy or too difficult, then all of the effort in modeling and texturing it is wasted.

Cmune decoupled these two steps for UberStrike with their Bluebox Maps program. Proposed level maps were made available to interested customers. They were not textured (they had a uniform blue color, thus the name of the program), and high-quality modeling had not been completed. Also, game mechanics, such as shooting, were not implemented. Here is an example of a Bluebox map.

Participating customers could download a Bluebox map and try it out, in order to determine whether it would be fun to play. Based on the feedback Cmune receives on a map, they either continued with the high-quality modeling and texturing, or they discarded the map.

Developers outside of the game world can learn from the Bluebox program. When we think about getting feedback from customers, we usually think of showing them completed features. Since we break even large epics down into sprint-sized pieces, the feature we are demonstrating to the customer may be a small, incremental change, but it is generally complete.

In some cases, it may be beneficial to break changes into even smaller pieces, large enough that the customer can see if we are going in the right direction, but not polished enough to actually release.

This must be done with caution, particularly if we are using continuous integration, or other SCM methodologies where everything gets checked into the main branch. (For some hints, see my previous post, Small Stories, Legacy Code, and Scaffolding.) Perhaps a feature flag can be added, so when it is turned on, it lets the customer go as far as the part being demonstrated, and then stops.

One of the buzzwords of Agile development is failing fast. The sooner you can find out that what you are developing is not what the customer wants, the sooner you can change course, without a lot of wasted development time.

Continuous Deployment (CD), where changes are released several times a day, is popular among online game sites. IMVU for example, is a very strong champion of CD, and they use it to 50 times a day.

I used to use IMVU, and as far as I could tell, the releasing of changes was completely invisible to the user. But I have tried some other online games the use CD, and was not the case for them. What frequently happens is that you are doing the game thing, and a window pops up that says "The game has been enhanced. Please refresh your browser." And when you refresh, the game takes two minutes to load in all of its assets, which really disrupts the game play.

An even worse situation is a Real-Time Strategy wargame that a friend was playing. The game deployed a change in the battle rules, regarding how much food had to be sent along with troops in battle. The change took effect immediately, right in the middle of a battle, and all her troops starved to death because the amount of food she had sent was no longer sufficient.

In case some of you are thinking "These are just games. What's the big deal if your imaginary troops starve to death?" there are a couple of things to keep in mind. One is that there is a lot of money in online games, and things like this make your users get angry and go play somebody else's game. Another is that the same thing could apply to any online system. Suppose someone is half way through booking a hotel room, and you deploy a change that alters the quoted room rate. They will not be happy when they get their bill, and it does not match what you quoted them.

It's probably not possible to come up with a universal design pattern to make live deployments transparent to users, but as a minimum, the change should not completely lose the user's state, should not require a bunch of time to reload assets that were already loaded, and should not change the outcome of transactions that are in process (like battles in a game, or hotel rooms that are being booked).