The overlooked failure mode where technical strategy documents optimize for internal legibility—clean diagrams, neat swim lanes, phased roadmaps—instead of capturing the actual messy constraints that will determine success or failure

I spent last weekend rereading the Columbia Accident Investigation Board report, specifically how the foam strike risk got communicated up the chain. The Board's diagnosis stayed with me. They wrote that the uncertainties and assumptions which signaled danger dropped out when one manager condensed a formal engineering presentation into a verbal brief. The dangerous parts did not survive the condensation.

I think about that every time someone hands me a strategy deck.

There is a strange ritual in our industry. A team spends weeks wrestling with a hard problem. They debate what they do not know: whether the vendor will deliver, whether the migration assumption holds, whether the team has the skills required. Then, for the review, it all gets compressed into swim lanes and quarterly milestones. The diagram looks great. It no longer contains the actual reasons the plan might fail.

The compression is not accidental. Presentability has its own gravity. Conditional logic looks ugly on a slide. Risk factors make readers nervous. Honest assumptions sound like excuses prepared in advance. So we round them off and call it simplification. What we have really done is remove the parts that mattered most.

The military worked this out long ago. An operations order has a required Assumptions paragraph, and Air Force doctrine is blunt: if an assumption being wrong would not require a branch plan, it does not deserve to be called an assumption. Plans come with branches and sequels, each with named execution criteria, so "what do we do if reality disagrees with us" has an answer written down before contact. H. R. McMaster, who banned PowerPoint in his regiment, called the clean slide an illusion of understanding and control.

Infrastructure planners do something similar. A serious capital project does not promise a date. It reports a P50 and a P90, with contingency reserves for risks you can name and management reserves for ones you cannot. Bent Flyvbjerg's database of more than 16,000 projects shows that roughly 1 in 200 hits its cost, time, and benefit targets together. Honest planners write the distribution down.

Compare that to the average quarterly roadmap, which expresses deadlines as points, dependencies as arrows, and uncertainty as a small section near the back that everyone scrolls past.

When I review a strategy document now, I ask three questions. Where does the author tell me what must be true for this plan to work? Where is the branch we execute if the central assumption proves wrong? Which of these milestones is a date, versus a probability dressed up as a date?

When the answers are not in the document, what I am holding is a picture of a strategy rather than the strategy itself.

A clean-looking deck should make you suspicious. Cleanliness usually means the messy parts were removed for clarity, and in technical work, those messy parts are where the strategy actually lives.

The legibility trap in technical strategy: a research overview

This brief synthesizes evidence across six domains for a senior engineering leadership audience. It explores the thesis that clean strategy documents (decks, swim lanes, roadmaps, OKR cascades) systematically strip out the conditional logic, assumptions, and uncertainty that determine whether strategies actually succeed. It flags explicitly where the evidence is strong and where it is weak.

1. The legibility / presentability trap

1a. James C. Scott, "Seeing Like a State" (Yale University Press, 1998)

Scott's framing is the most-cited intellectual foundation for the "presentable plan" critique outside management literature.

Core concept of legibility. Scott argues that states build "abridged maps" that "did not successfully represent the actual activity of the society they depicted, nor were they intended to" (p. 3). "Certain forms of knowledge and control require a narrowing of vision. The great advantage of such tunnel vision is that it brings into sharp focus certain limited aspects of an otherwise far more complex and unwieldy reality" (p. 11). [1]

Scientific forestry (the canonical example, pp. 11 to 22). German and Prussian "wissenschaftliche Forstwirtschaft" replaced multi-use mixed forests with rectilinear monoculture plantations of Norway spruce or Scotch pine, organized around the "Normalbaum" (a standardized tree of given size class whose saleable volume could be tabulated). The first rotation was a commercial success; the second rotation collapsed. Scott quotes the forestry literature (p. 20): "Many of the pure stands grew excellently in the first generation but already showed an amazing retrogression in the second generation. ... a production loss of 20 to 30 percent." The German vocabulary acquired a new noun, Waldsterben (forest death). The model "succeeded" only because it was parasitic on accumulated ecological capital the new system could not regenerate. [2]

Metis vs. techne (Chapter 9, pp. 309 to 341). Techne is codified, universal, deductive knowledge. Metis is practical, local, experience-based cunning: "the kind of knowledge that can be acquired only by long practice at similar but rarely identical tasks, which requires constant adaptation to changing circumstances." Scott: "any formula that excludes or suppresses the experience, knowledge, and adaptability of metis risks incoherence and failure" (p. 316). [3][4]

Direct organizational analogy. Venkatesh Rao's "A Big Little Idea Called Legibility" (Ribbonfarm) is the canonical pop-management adaptation; the book has been widely imported into software and management writing by Patrick McKenzie and the Stripe Press milieu. The Normalbaum analogy maps cleanly to OKR dashboards, traffic-light portfolios, and one-page strategies: artifacts that "grow well" in the first cycle because they rest on organizational metis they did not capture and cannot regenerate.

1b. Edward Tufte's PowerPoint critique

Sources. "PowerPoint Is Evil," Wired, Issue 11.09, September 2003; "The Cognitive Style of PowerPoint: Pitching Out Corrupts Within," Graphics Press, 2003 (2nd ed. 2006), also reprinted in "Beautiful Evidence" (2006).

Specific arguments. A typical business slide holds about 40 words, "about eight seconds' worth of silent reading material." NASA technical slides commonly had "4 to 6 levels of hierarchy" of nested bullets. Bullets dissolve causal logic: they show "effects without causes, actions without actors, verbs without subjects, and nouns without predicates." Units of measurement and uncertainty are routinely dropped ("dequantification"). His epigram: "Power corrupts. PowerPoint corrupts absolutely." Also (2nd ed., p. 25): "PowerPoint allows speakers to pretend that they are giving a real talk, and audiences to pretend that they are listening." [5]

The Columbia foam-strike slide. Boeing's Debris Assessment Team slide, titled "Review of Test Data Indicates Conservatism for Tile Penetration" (Feb 21, 2003), used six levels of bullet hierarchy. The most consequential information ("Flight condition is significantly outside of test database"; foam volume of 1,920 cubic inches vs. 3 cubic inches in the calibration test, a roughly 640x ratio) was buried at the lowest hierarchy level in the smallest type. The word "significant/significantly" appeared with five conflicting meanings on the single slide. Tufte's verdict: "a PowerPoint festival of bureaucratic hyper-rationalism," and a more honest title would have been "Review of Test Data Indicates Irrelevance of Two Models." [6]

The Columbia Accident Investigation Board (CAIB), Volume I, August 2003, p. 191 and p. 201. Direct quotes:

  • "The Board views the endemic use of PowerPoint briefing slides instead of technical papers as an illustration of the problematic methods of technical communication at NASA." [7]
  • "When engineering analyses and risk assessments are condensed to fit on a standard form or overhead slide, information is inevitably lost. In the process, the priority assigned to information can be easily misrepresented by its placement on a chart and the language that is used." [8]
  • "As information gets passed up an organization hierarchy ... key explanations and supporting information is filtered out. ... The uncertainties and assumptions that signaled danger dropped out of the information chain when the Mission Evaluation Room manager condensed the Debris Assessment Team's formal presentation to an informal verbal brief at the Mission Management Team meeting." [9]

The last sentence is the operational money quote: condensation upward eliminated exactly the uncertainty markers that signaled lethal risk.

Tufte also reconstructed the Challenger decision in "Visual Explanations" (1997): Morton Thiokol engineers had the O-ring data the night before launch, but plotted damage by launch date rather than by temperature. When replotted against temperature, the correlation between cold and erosion is immediate. Tufte: the engineers were "thinking causally" but "were not displaying causally." [10][11]

1c. Military critiques of PowerPoint

Primary source: Elisabeth Bumiller, "We Have Met the Enemy and He Is PowerPoint," NYT, April 26, 2010.

  • H.R. McMaster (a brigadier general when quoted; as a colonel he banned PowerPoint at the 3rd Armored Cavalry Regiment in Tal Afar, 2005): "It's dangerous because it can create the illusion of understanding and the illusion of control. Some problems in the world are not bullet-izable." [12][13]
  • James Mattis (then Gen., Joint Forces Command): "PowerPoint makes us stupid" (Bumiller noted parenthetically: "He spoke without PowerPoint."). [14]
  • Stanley McChrystal in Kabul, summer 2009, shown the famous Afghanistan COIN "spaghetti slide": "When we understand that slide, we'll have won the war." Note: the slide was actually a sophisticated causal-loop diagram from PA Consulting Group, slide 22 of a 31-page progressive build; the standalone image, leaked by NBC's Richard Engel, became the meme. This actually strengthens the legibility argument: the artifact that travels is the artifact that gets read, and what travels well is usually thin. [15]
  • T.X. Hammes, "Dumb-Dumb Bullets," Armed Forces Journal, July 2009: PowerPoint is "the antithesis of thinking" and "actively hostile to thoughtful decision-making." [16]
  • Wong and Gerras, US Army War College, "Lying to Ourselves" (2015): officers commonly falsified PowerPoint reports because they were seen as administrative theater detached from operational reality. [17]

1d. Organizational pressures that drive clean-but-empty documents

  • One-pager culture. Sull, Homkes, and Sull (HBR, March 2015) found only 55% of middle managers can name even one of their company's top five strategic priorities, driving structural demand for ever-shorter strategy artifacts.
  • Board reporting. Quarterly board pre-reads of 50 to 200 pages incentivize compression to slides; uncertainty migrates to appendices or vanishes (PwC and NACD governance surveys).
  • RAG status reports. Compressing status into three colors produces the "watermelon" pattern (green on the outside, red inside), which is well documented in UK Major Projects Authority reviews from 2012 to 2016 and fits Flyvbjerg's "strategic misrepresentation" thesis.
  • Counter-prescription: Amazon's six-page memo. Bezos's 2017 shareholder letter: "We don't do PowerPoint ... we write narratively structured six-page memos. We silently read one at the start of each meeting in a kind of 'study hall'." His 2004 internal email: "Powerpoint-style presentations somehow give permission to gloss over ideas, flatten out any sense of relative importance, and ignore the interconnectedness of ideas." [18]
  • Academic frame. Chris Argyris, "Skilled Incompetence" (HBR 1986) and "Teaching Smart People How to Learn" (HBR 1991): organizations train executives in routines that protect them from threatening information. Karl Weick (1993) on Mann Gulch: thin formal communications fail to preserve crucial context. Diane Vaughan (1996) on Challenger: "normalization of deviance" causes danger signals to be recategorized as acceptable variance until they vanish from the formal record.

2. How military planning explicitly captures uncertainty

The military is the cleanest counter-example: doctrine is built around the assumption that plans will fail.

2a. Commander's Intent and mission command

Core doctrine: ADP 6-0 (Mission Command, July 2019). "Friction and unforeseeable combinations of variables impose uncertainty in all operations and require an approach to command and control that does not attempt to impose perfect order, but rather accepts uncertainty and makes allowances for unpredictability." [19]

Mission command's seven principles are competence, mutual trust, shared understanding, commander's intent, mission orders, disciplined initiative, and risk acceptance. [20]

Commander's Intent is the doctrinally-required paragraph (3.a. of an OPORD) capturing purpose, key tasks, and end state. It exists specifically so subordinates can "act to achieve the commander's desired results without further orders, even when the operation does not unfold as planned." [21][22]

Auftragstaktik lineage. Capt. Adolf von Schell, German exchange officer at Fort Benning, 1930s: "In the German Army we use what we term 'mission tactics'; orders are not written out in the minutest detail, a mission is merely given to a commander. How it shall be carried out is his problem. This is done because the commander on the ground is the only one who can correctly judge existing conditions and take proper action if a change occurs in the situation." [23]

2b. The five-paragraph OPORD and explicit assumptions

The standard OPORD (Situation, Mission, Execution, Sustainment, Command/Signal) requires an explicit Assumptions subparagraph in Paragraph 1 (Situation). Doctrine: "An assumption is a supposition on the current situation or a presupposition on the future course of events, either or both assumed to be true in the absence of facts." [24]

AFDP 5-0 adds a critical disciplinary rule: "Potential plan changes due to incorrect assumptions should be addressed as branches or sequels ... An assumption, proved invalid, may lead to substantial changes in the approved plan. If an issue does not have this level of impact, it should not be an assumption." This is the discipline that most strategy decks lack: a forcing function that every named assumption either matters enough to spawn a branch plan or does not deserve to be called an assumption. [25]

2c. MDMP, MCPP, branches and sequels

MDMP (seven steps including Mission Analysis, COA Development, COA Analysis/Wargaming, COA Comparison, COA Approval, Orders Production) and the Marine Corps Planning Process (MCPP, six steps) both make conditional logic first-class. [26][27]

Branches are contingency options built into the base plan in case of changed conditions; sequels are follow-on operations predicated on the current operation's outcome. JP 5-0: "the CONOPS to describe how component actions will be integrated, synchronized, and phased to accomplish the mission, including potential branches and sequels." [28]

FM 3-0: "Comprehensive planning may be feasible only for the first engagement or phase of a battle; succeeding actions depend on enemy responses and circumstances. The art of tactical planning lies in anticipating and developing sound branches and sequels." Both branches and sequels have execution criteria: observable triggers tied via the Decision Support Matrix/Template (DSM/DST) to specific decisions with named owners. This is the structural equivalent of an engineering "if/then" gate. [29]
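The "if/then" gate can be made concrete with a small sketch. It is an illustration under stated assumptions, not a rendering of any doctrinal template: the `DecisionPoint` structure, the metric names, the thresholds, and the owners are all hypothetical. The point it shows is that each branch or sequel carries an observable trigger and a named owner, so checking the plan against reality becomes a mechanical step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DecisionPoint:
    """One row of a hypothetical decision support matrix."""
    name: str                                   # branch or sequel to execute
    owner: str                                  # person who makes the call
    criterion: str                              # human-readable execution criterion
    is_met: Callable[[Dict[str, float]], bool]  # observable trigger


def triggered(matrix: List[DecisionPoint], observed: Dict[str, float]) -> List[DecisionPoint]:
    """Return every decision point whose execution criterion is currently met."""
    return [dp for dp in matrix if dp.is_met(observed)]


# Illustrative values only: metric names and thresholds are invented.
matrix = [
    DecisionPoint(
        name="Branch A: keep legacy system through Q3",
        owner="Migration lead",
        criterion="dual-write error rate above 1% after week 6",
        is_met=lambda o: o["week"] >= 6 and o["dual_write_error_rate"] > 0.01,
    ),
    DecisionPoint(
        name="Sequel: decommission legacy cluster",
        owner="Infrastructure director",
        criterion="all traffic on the new system for 4 weeks",
        is_met=lambda o: o["traffic_share_new"] >= 1.0 and o["weeks_at_full_traffic"] >= 4,
    ),
]

observed = {
    "week": 7,
    "dual_write_error_rate": 0.03,
    "traffic_share_new": 0.4,
    "weeks_at_full_traffic": 0,
}

for dp in triggered(matrix, observed):
    print(f"{dp.owner} decides: {dp.name} (criterion: {dp.criterion})")
```

A roadmap that cannot be expressed in roughly this shape probably has no written answer to "what do we do if reality disagrees with us."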

2d. "No plan survives contact with the enemy"

Helmuth von Moltke the Elder, "Über Strategie" (1871). The accurate translation (Hughes and Bell, 1993, p. 92): "no plan of operations extends with any certainty beyond the first contact with the main hostile force." Paired in US doctrine with Eisenhower (1957): "Plans are worthless, but planning is everything." Both are operationalized as: produce the order, but design the system (intent, branches, sequels, decision points, CCIRs, AAR loops) so plan failure is anticipated and absorbed. [30][31]

2e. Red teaming

The University of Foreign Military and Cultural Studies (UFMCS) at Fort Leavenworth (the "Red Team University") was named with deliberate blandness because the Army Chief of Staff "did not want our adversaries to know that we had invested in red teaming education." Mission, per the UFMCS Red Team Handbook: provide commanders "an independent capability to fully explore alternatives to plans, operations, concepts, organizations, and capabilities." Techniques include Key Assumptions Check, Analysis of Competing Hypotheses, Devil's Advocacy, Team A/Team B, Liberating Structures, and pre-mortem. Red Team Leader's Course is 18 weeks (720 hours). [32]

2f. OODA loop

John Boyd (1927 to 1997), USAF Col., Pentagon reformer. Primary biography: Robert Coram, "Boyd: The Fighter Pilot Who Changed the Art of War" (2002). The OODA loop (Observe, Orient, Decide, Act, with feedback) is not a simple circle: "Orient" is the dominant box, containing genetic heritage, cultural traditions, prior experience, new information, and the destruction-and-creation engine. Boyd's strategic claim: victory comes from operating "inside" the adversary's OODA loop. Doctrinally adopted in the Marine Corps manual Warfighting (FMFM 1, 1989, signed by Gen. Al Gray; revised as MCDP 1 in 1997). [33][34]

2g. Wargaming and the Millennium Challenge 2002 cautionary tale

Wargaming is built into MDMP Step 4 and MCPP Step 3 specifically to surface assumptions and identify branches/sequels.

Millennium Challenge 2002, a $250M joint exercise (24 July to 15 August 2002, ~13,500 participants). The Red OPFOR led by retired Marine Lt. Gen. Paul Van Riper used motorcycle couriers, mosque-loudspeaker coded signals, and swarming explosive speedboats to sink approximately 16 Blue warships, including a carrier and ten cruisers, preempting the Blue invasion plan. JFCOM refloated the ships and re-ran the exercise on rails. The post-mortem (declassified Nov 2024): "As the exercise progressed, the OPFOR free-play was eventually constrained to the point where the end state was scripted." Van Riper: "Nothing was learned from this. A culture not willing to think hard and test itself does not augur well for the future." Lesson for senior leaders: wargames have value only to the extent the red team is empowered to win. [35]

2h. AARs and pre-mortems

After-Action Reviews were codified in TC 25-20 (Sept 1993), now FM 7-0 Appendix K. Four canonical questions: What was supposed to happen? What actually happened? Why was there a difference? What can we learn? TC 25-20 explicitly: "An AAR is not a critique. ... An AAR does not grade success or failure." Peter Senge: "The Army's AAR is arguably one of the most successful organizational learning methods yet devised. Yet, most every corporate effort to graft this truly innovative practice into their culture has failed because, again and again, people reduce the living practice of AARs to a sterile technique." [36]

Pre-mortems: Gary Klein, "Performing a Project Premortem," HBR, September 2007. Grounded in Mitchell, Russo, and Pennington (1989): "prospective hindsight, imagining that an event has already occurred, increases the ability to correctly identify reasons for future outcomes by 30%." Klein: "Unlike a typical critiquing session ... the premortem operates on the assumption that the patient has died, and so asks what did go wrong." Kahneman in "Thinking, Fast and Slow" (2011) calls pre-mortem the "single greatest cure" for excessive optimism in groups because it "legitimizes doubts." [37]

2i. Specific operational examples

  • Tal Afar 2005, McMaster's 3rd ACR: the doctrinal exemplar of disciplined initiative under intent, cited in FM 3-24 Chapter 5 (Petraeus/Mattis, 2006). [38]
  • Inchon 1950: MacArthur accepted the operational risk of an Inchon landing; the discarded Kunsan COA was repurposed as the deception story. A rejected branch became a decisive asset (MCWP 5-1). [39]
  • Air Force Webb (9/11 air ops): AFDP 1-1 (2023) uses Lt. Col. Webb's WTC/Pentagon response missions as an exemplar of executing on intent in the total absence of follow-on orders. [40]

3. Infrastructure and engineering practices for documenting uncertainty

Mature engineering disciplines treat plans as probability distributions, not point estimates. Conditional logic lives in dedicated artifacts.

3a. Risk registers, contingency reserves, Monte Carlo

Risk registers (PMI/PMBOK): risk ID, description, cause/trigger, probability, impact, affected objectives, owner, response strategy (avoid, transfer, mitigate, accept), planned actions, residual risk, status. PMI definition: a risk is "an uncertain event or condition that, if it occurs, has a positive or negative effect on a project's objectives." [41]

Contingency reserve vs. management reserve. Contingency reserve is calculated for known unknowns and lives inside the cost/schedule baseline; management reserve is held above the baseline for unknown unknowns and requires governance action to release. [42]
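A minimal sketch of how the two reserves stack, assuming contingency is sized by expected monetary value (probability times impact, summed over the register). That sizing method is an assumption made here for illustration; the source does not say how the reserve is calculated, and the register entries, the 5% management reserve rate, and the function names are all invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Risk:
    risk_id: str
    description: str
    probability: float   # chance the risk occurs, 0.0 to 1.0
    cost_impact: float   # cost if it occurs, in currency units
    owner: str
    response: str        # avoid, transfer, mitigate, or accept


def contingency_reserve(register: List[Risk]) -> float:
    """Expected monetary value of the named risks (the known unknowns)."""
    return sum(r.probability * r.cost_impact for r in register)


def budget_layers(base_estimate: float, register: List[Risk],
                  management_reserve_rate: float) -> Dict[str, float]:
    """Stack the layers: contingency sits inside the baseline, management reserve above it."""
    contingency = contingency_reserve(register)
    baseline = base_estimate + contingency
    management = baseline * management_reserve_rate  # policy choice, assumed here
    return {
        "base_estimate": base_estimate,
        "contingency_reserve": contingency,
        "cost_baseline": baseline,
        "management_reserve": management,
        "total_budget": baseline + management,
    }


# Invented register entries for illustration.
register = [
    Risk("R1", "Vendor delivers API late", 0.30, 400_000, "Program manager", "mitigate"),
    Risk("R2", "Migration assumption fails", 0.10, 1_000_000, "Architect", "mitigate"),
    Risk("R3", "Key engineer leaves", 0.50, 80_000, "Engineering manager", "accept"),
]

layers = budget_layers(2_000_000, register, management_reserve_rate=0.05)
for name, value in layers.items():
    print(f"{name:>20}: {value:,.0f}")
```

The useful property is not the arithmetic but the separation: anything in the contingency line traces back to a named risk with an owner, while the management reserve is visibly a judgment about what has not been named.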

Monte Carlo schedule risk analysis (QSRA). Probability distributions on activity durations and cost line items, thousands of iterations, S-curves at P10/P50/P80/P90 with sensitivity tornado charts. Tools: Oracle Primavera Risk Analysis (Pertmaster), @RISK, Safran Risk, Acumen Risk. [43]
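The mechanics can be sketched in a few lines of standard-library Python. This is a toy under loud assumptions: three invented activities executed strictly in sequence, independent triangular distributions, and a nearest-rank percentile. The commercial tools named above handle full network logic and correlated risks, which this sketch does not attempt.

```python
import math
import random
from typing import List, Tuple

# (optimistic, most likely, pessimistic) duration in working days.
# The three activities and their values are invented for illustration.
ACTIVITIES: List[Tuple[float, float, float]] = [
    (10, 15, 30),   # vendor integration
    (20, 25, 60),   # data migration
    (5, 8, 20),     # cutover and validation
]


def simulate_totals(activities: List[Tuple[float, float, float]],
                    iterations: int = 10_000, seed: int = 42) -> List[float]:
    """Sample the total duration of a strictly sequential chain, sorted ascending."""
    rng = random.Random(seed)
    totals = [
        # random.triangular takes (low, high, mode) in that order.
        sum(rng.triangular(low, high, mode) for low, mode, high in activities)
        for _ in range(iterations)
    ]
    return sorted(totals)


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile: the value not exceeded in p percent of runs."""
    rank = max(1, math.ceil(round(p / 100 * len(sorted_values), 9)))
    return sorted_values[rank - 1]


totals = simulate_totals(ACTIVITIES)
point_estimate = sum(mode for _, mode, _ in ACTIVITIES)  # the single "roadmap date"
on_time_share = sum(t <= point_estimate for t in totals) / len(totals)

print(f"Sum of most-likely durations: {point_estimate} days")
for p in (10, 50, 80, 90):
    print(f"P{p}: {percentile(totals, p):.1f} days")
print(f"Share of runs finishing by the point estimate: {on_time_share:.1%}")
```

With right-skewed inputs like these, the P50 lands well above the sum of the most-likely durations, which is the gap a single-date roadmap hides.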

Crossrail (London Elizabeth line). Documented use of an Active Risk Manager (ARM) database, four-weekly qualitative reviews, quarterly Quantitative Cost Risk Assessments, reporting "Anticipated Final Crossrail Direct Cost (AFCDC) at P50 and P95" (Crossrail Learning Legacy). Strategic Risk Register of "around 10 to 12 high level risk areas" reviewed monthly by the Crossrail Board. Despite the discipline, Crossrail still missed milestones substantially and overran by roughly 27% (final cost about £18.8B vs. original £14.8B), a reminder that quantitative risk analysis on a "squeezed" baseline schedule still understates risk. [44]

3b. Bent Flyvbjerg and reference class forecasting

Bent Flyvbjerg (BT Professor of Major Programme Management, Oxford Saïd; KPMG calls him "the world's leading megaproject expert"). [45][46]

Iron Law of Megaprojects: "Over budget, over time, under benefits, over and over again." Also: "Best practice is an outlier, average practice a disaster." [47]

Headline statistics (Flyvbjerg 2014, "How Big Things Get Done" 2023):

  • "Nine out of ten such projects have cost overruns." [47][48]
  • Channel Tunnel: 80% real-terms overrun. Denver International Airport: 200%. Big Dig: 220%. Canadian firearms registry: 590%. Sydney Opera House: 1,400%. [49][48]
  • From a database of 16,000+ projects: only 8.5% of projects hit both cost and time targets; just 0.5% achieve cost, time, AND benefits targets. 99.5% miss at least one. [50][51]
  • "92% of megaprojects come in over budget or over schedule, or both." [46][52]
  • Cost forecast inaccuracy: 44.7% for rail; 33.8% for bridges and tunnels; 20.4% for roads. "For the 70 year period for which cost data are available, accuracy in cost forecasts has not improved." [53]
  • IT projects exhibit fat tails: 18% have cost overruns exceeding 50%, with an average overrun within that tail of 447%. [51]

Reference Class Forecasting (RCF). Identify a reference class of past similar projects; establish their distribution; compare your project to derive most likely outcome and uplift. Rooted in Kahneman and Tversky's "inside view vs. outside view." First operational use: HM Treasury and UK Department for Transport, 2004 to 2005, producing the official UK "Optimism Bias Guidance" with mandated percentage uplifts. American Planning Association endorsed RCF in April 2005. [54]
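The three steps can be sketched as reading an uplift off the empirical distribution of past overruns. This is a simplified illustration, not the published UK guidance, which supplies its own uplift tables by project type. The reference-class numbers below are invented, and `required_uplift` is a hypothetical helper.

```python
import math
from typing import List


def required_uplift(reference_overruns: List[float], acceptable_overrun_risk: float) -> float:
    """Uplift (as a fraction of the base estimate) such that the share of
    reference projects with a larger overrun is at most acceptable_overrun_risk."""
    if not reference_overruns:
        raise ValueError("reference class is empty")
    if not 0 < acceptable_overrun_risk < 1:
        raise ValueError("acceptable_overrun_risk must be between 0 and 1")
    ordered = sorted(reference_overruns)
    rank = math.ceil(round((1 - acceptable_overrun_risk) * len(ordered), 9))
    rank = min(max(rank, 1), len(ordered))
    return max(0.0, ordered[rank - 1])


# Invented cost overruns (actual / estimate - 1) for ten past, similar projects.
REFERENCE_OVERRUNS = [-0.05, 0.00, 0.10, 0.15, 0.20, 0.30, 0.45, 0.60, 0.90, 1.50]

base_estimate = 10_000_000
uplift = required_uplift(REFERENCE_OVERRUNS, acceptable_overrun_risk=0.2)
budget = base_estimate * (1 + uplift)

print(f"Uplift for a 20% chance of overrun: {uplift:.0%}")
print(f"Budget to request: {budget:,.0f}")
```

The design choice that matters is that the team's own inside-view estimate enters only as `base_estimate`; the spread comes entirely from what happened to comparable projects.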

Flyvbjerg's two diagnoses: optimism bias (psychological) and strategic misrepresentation (political; deliberate underestimation to win approval), producing a "break-fix model" of project delivery. [55][56]

3c. IPA and Edward Merrow

Independent Project Analysis (IPA), founded 1987 by Edward Merrow after 14 years at RAND. Database: 24,000+ capital projects, including 300+ megaprojects averaging USD 2.6 billion. [57][58]

Headline findings (Merrow, "Industrial Megaprojects," Wiley 2011/2024; SPE 153695, 2011):

  • 65% megaproject failure rate against cost, duration, quality. [59]
  • Upstream oil and gas: only 22% of megaprojects successful; unsuccessful 78% had 33% real cost overruns, 30% execution schedule slip, and 64% had serious production attainment problems in the first two years after first oil. [60][61]
  • Nuclear: median cost overrun 110%, median schedule slip 65%; about 10% success rate vs. ~33% for non-nuclear megaprojects. [62]

Merrow: "The saddest aspect of these big project failures is that we almost always knew the right things to do, but for a variety of (bad) reasons, we failed to do them." [59]

Front-End Loading (FEL). IPA's stage-gated model: FEL-1 (Appraise, +/-50%), FEL-2 (Select, +/-30%), FEL-3 (Define, +/-10 to 15%, basis for Final Investment Decision). IPA's FEL Index measures Site Factors, Engineering Status, and Project Execution Planning. "IPA research across more than 24,000 capital projects shows that the completeness of FEL is the single best predictor of safety, cost, schedule, and operability outcomes." [63]

3d. Probabilistic estimates: P10/P50/P90

SPE Petroleum Resources Management System (PRMS):

  • P90 = "Proved" (1P): ≥90% probability actuals equal or exceed estimate. [64][65]
  • P50 = "Proved + Probable" (2P, best estimate). [64]
  • P10 = "Proved + Probable + Possible" (3P, high estimate). [64]

Convention warning. In oil and gas reserves, P90 is the "low" volumetric estimate (probability that actuals exceed it). In cost and schedule risk, P90 is typically the "high" cost (only 10% chance of being exceeded). Senior leaders must clarify which convention is in use.
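The two conventions can be shown on one sample. The sketch below is illustrative only: the ten outcomes are arbitrary, and the function names are hypothetical labels for the two readings of "P90" rather than terms from PRMS or any costing standard.

```python
import math
from typing import List


def cumulative_percentile(values: List[float], p: float) -> float:
    """Cost and schedule convention: p percent of outcomes fall at or below this value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(round(p / 100 * len(ordered), 9)))
    return ordered[rank - 1]


def exceedance_percentile(values: List[float], p: float) -> float:
    """Reserves convention: p percent of outcomes equal or exceed this value."""
    ordered = sorted(values, reverse=True)
    rank = max(1, math.ceil(round(p / 100 * len(ordered), 9)))
    return ordered[rank - 1]


# Ten arbitrary outcomes; think of them as simulated results in any unit.
outcomes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print("P90, cumulative convention:", cumulative_percentile(outcomes, 90))
print("P90, exceedance convention:", exceedance_percentile(outcomes, 90))
```

On the same data the label "P90" points at opposite ends of the distribution, so a document that quotes one should state which reading it uses.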

3e. Aviation, nuclear, aerospace

Probabilistic Risk Assessment (PRA) originated in WASH-1400 (Rasmussen Report, NUREG-75/014, 1975), led by Prof. Norman Rasmussen of MIT. First major commercial-reactor PRA using fault trees and event trees. Vindicated by Three Mile Island (1979), which matched its warning that "small breaks in piping were much more significant than the large break accident." Modern: NRC Levels 1, 2, 3 PRA (core damage frequency, containment performance, offsite consequences); standard ASME/ANS RA-S-1.1. [66]

INPO pre-job brief (INPO 06-002, 06-003, 07-006). The pre-job brief explicitly covers critical steps, error-likely situations, worst case scenarios, abort/hold criteria, contingencies, and operating experience. Tools include STAR (Stop, Think, Act, Review), peer check, three-way communication, place-keeping. INPO: "Conservative actions are taken when understanding is incomplete." [67]

Aviation Crew Resource Management (CRM), post-United 173 (1978) and Tenerife (1977), formalized at a 1979 NASA workshop. Threat and Error Management (TEM), first full-scale LOSA at Continental Airlines 1996, categorizes external threats, errors, and undesired aircraft states. [68][69]

Bowtie analysis (commonly traced to ICI hazard-analysis course notes from 1979 and developed into standard practice at Shell after the 1988 Piper Alpha disaster): unwanted event at center; threats and preventive barriers left; consequences and recovery barriers right; escalation factors degrading barriers. Adopted in ICAO Annex 19 and the ICAO Safety Management Manual. [70]

NASA NPR 7123.1D life cycle reviews: SRR, MDR/SDR, PDR, CDR, PRR, SIR, TRR, SAR, ORR, FRR, etc. Each has formal Entrance and Success Criteria plus a Standing Review Board of independent experts. TRL 6 (technology demonstrated in a relevant environment) is desirable before integration. [71][72]

Assumption logs and decision logs. PMBOK 6th edition formally introduced both as distinct project documents. The assumption log captures: assumption ID, statement, owner, basis, validity date, status, and action if invalidated. This makes conditional reasoning auditable.
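A minimal sketch of an assumption log using the fields listed above, with a check that makes the log auditable rather than decorative. The status values, the example entries, and the `needs_attention` helper are assumptions made for illustration; PMBOK does not prescribe this code or these exact checks.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Assumption:
    assumption_id: str
    statement: str
    owner: str
    basis: str
    valid_until: date
    status: str                                  # "open", "validated", or "invalidated"
    action_if_invalidated: Optional[str] = None


def needs_attention(log: List[Assumption], today: date) -> List[str]:
    """List the entries a reviewer should question before approving the plan."""
    findings = []
    for a in log:
        if not a.action_if_invalidated:
            findings.append(f"{a.assumption_id}: no action recorded if invalidated")
        if a.status == "open" and a.valid_until < today:
            findings.append(f"{a.assumption_id}: validity date passed without review")
        if a.status == "invalidated" and a.action_if_invalidated:
            findings.append(f"{a.assumption_id}: invalidated, execute '{a.action_if_invalidated}'")
    return findings


# Invented entries for illustration.
log = [
    Assumption("A1", "Vendor delivers the API by end of Q2", "Program manager",
               "Signed statement of work", date(2025, 6, 30), "open",
               "Branch B: build the adapter in-house"),
    Assumption("A2", "Team can operate the new platform without new hires",
               "Engineering manager", "Manager judgment", date(2025, 3, 31), "open"),
    Assumption("A3", "Legacy schema is frozen during migration", "Architect",
               "Change-board decision", date(2025, 9, 30), "invalidated",
               "Branch C: add dual-write shim"),
]

for finding in needs_attention(log, today=date(2025, 5, 1)):
    print(finding)
```

An entry with no action if invalidated is the code-level version of the doctrinal test in Section 2b: if nothing changes when it proves wrong, it may not belong in the log.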

4. Software engineering parallels: ADRs, RFCs, pre-mortems, the cone of uncertainty

4a. Architecture Decision Records (ADRs)

Origin: Michael Nygard, "Documenting Architecture Decisions," Cognitect blog, November 15, 2011. Nygard's framing: "A new person coming on to a project may be perplexed, baffled, delighted, or infuriated by some past decision. Without understanding the rationale or consequences, this person has only two choices: Blindly accept the decision [or] Blindly change it." [73]

Standard template:

  • Title: short noun phrase.
  • Context: "describes the forces at play, including technological, political, social, and project local. These forces are probably in tension, and should be called out as such. The language in this section is value-neutral." [74]
  • Decision: stated in full sentences, active voice: "We will ..." [74]
  • Status: proposed / accepted / deprecated / superseded. [74]
  • Consequences: "All consequences should be listed here, not just the 'positive' ones." [74]

ThoughtWorks Technology Radar moved Lightweight ADRs to Adopt in November 2017 (Vol. 17). ThoughtWorks: "Lightweight Architecture Decision Records is a technique for capturing important architectural decisions along with their context and consequences. ... For most projects, we see no reason why you wouldn't want to use this technique." [75]

Critically: superseded ADRs are not deleted. The historical context remains visible. This is the structural opposite of a strategy deck that replaces its previous version.
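The append-only property can be sketched as a small in-memory log. This is an illustration, not Nygard's tooling or any ADR tool's API: the `DecisionLog` class and its method names are hypothetical, and real ADRs usually live as text files in the repository. The fields follow the template above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ADR:
    number: int
    title: str
    context: str
    decision: str
    consequences: List[str]
    status: str = "proposed"   # proposed, accepted, deprecated, or superseded
    superseded_by: Optional[int] = None


class DecisionLog:
    """Append-only log: records change status but are never removed."""

    def __init__(self) -> None:
        self._records: Dict[int, ADR] = {}

    def propose(self, title: str, context: str, decision: str,
                consequences: List[str]) -> ADR:
        adr = ADR(len(self._records) + 1, title, context, decision, list(consequences))
        self._records[adr.number] = adr
        return adr

    def accept(self, number: int) -> None:
        self._records[number].status = "accepted"

    def supersede(self, old_number: int, title: str, context: str, decision: str,
                  consequences: List[str]) -> ADR:
        """Record a replacement decision while keeping the old one readable."""
        old = self._records[old_number]
        new = self.propose(title, context, decision, consequences)
        new.status = "accepted"
        old.status = "superseded"
        old.superseded_by = new.number
        return new

    def history(self) -> List[ADR]:
        return [self._records[n] for n in sorted(self._records)]


# Invented decisions for illustration.
decisions = DecisionLog()
first = decisions.propose(
    "Use a single shared database",
    "Two teams, low write volume, no dedicated operations staff.",
    "We will keep all services on one PostgreSQL instance.",
    ["Simple operations", "Schema changes require cross-team coordination"],
)
decisions.accept(first.number)
decisions.supersede(
    first.number,
    "Split the database per service",
    "Six teams now contend for schema changes; write volume has grown.",
    "We will give each service its own database.",
    ["Independent deploys", "Cross-service queries need an API or replication"],
)

for adr in decisions.history():
    pointer = f" (superseded by ADR {adr.superseded_by})" if adr.superseded_by else ""
    print(f"ADR {adr.number}: {adr.title} [{adr.status}]{pointer}")
```

The original context ("two teams, low write volume") stays on the record, so a later reader can see which assumption changed rather than only which decision won.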

4b. RFC processes

Rust RFCs (rust-lang/rfcs): Summary, Motivation, Guide-level explanation, Reference-level explanation, Drawbacks, Rationale and alternatives, Prior art, Unresolved questions, Future possibilities. The Rust process states: "RFCs that do not present convincing motivation, demonstrate lack of understanding of the design's impact, or are disingenuous about the drawbacks or alternatives tend to be poorly-received." [76]

Python PEPs (since PEP 1, 2000): Abstract, Motivation, Rationale, Specification, Backwards Compatibility, Security Implications, How to Teach This, Reference Implementation, Rejected Ideas, Open Issues.

IETF RFCs (since RFC 1, 7 April 1969, Steve Crocker, "Host Software"): named "Request for Comments" specifically to avoid asserting authority. RFC 2119 standardized MUST/SHOULD/MAY semantics.

Internal tech RFCs: Google design docs, Stripe RFCs, Cloudflare/Squarespace public posts, Oxide Computer's fully public RFDs (rfd.shared.oxide.computer) with status states Prediscussion / Ideation / Discussion / Published / Committed / Abandoned.

4c. Pre-mortems in tech

See Section 2h above. Tech adopters include Atlassian (formal "Play" library), Google SRE ("disaster role-playing"), Amazon (PR-FAQ failure narratives), Shopify, Stripe, Airbnb.

4d. The cone of uncertainty

Steve McConnell, "Software Project Survival Guide" (1997) and "Software Estimation: Demystifying the Black Art" (2006). At Initial Concept, accurate estimates have a range of 0.25x to 4x, a 16x ratio. At Approved Product Definition, 0.5x to 2x. By Detailed Design, 0.9x to 1.1x. [77]
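As a worked illustration, the multipliers quoted above can be applied to a single point estimate. The sketch covers only the three phases named in this section; the dictionary keys and the `estimate_range` helper are hypothetical names, and the multipliers are taken as best-case bounds.

```python
from typing import Dict, Tuple

# Best-case multipliers quoted above for three phases of the cone.
CONE: Dict[str, Tuple[float, float]] = {
    "initial_concept": (0.25, 4.0),
    "approved_product_definition": (0.5, 2.0),
    "detailed_design": (0.9, 1.1),
}


def estimate_range(point_estimate: float, phase: str) -> Tuple[float, float]:
    """Turn a single-point estimate into the low/high range implied by the cone."""
    low, high = CONE[phase]
    return point_estimate * low, point_estimate * high


point_estimate_months = 12
for phase in CONE:
    low, high = estimate_range(point_estimate_months, phase)
    print(f"{phase:>28}: {low:.1f} to {high:.1f} months")
```

A "12 month" commitment made at Initial Concept is therefore, at best, a statement that the work will take somewhere between 3 and 48 months.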

McConnell: "The Cone of Uncertainty represents the best-case accuracy ... it is impossible to have more accuracy; it's possible to have less." And: "The primary purpose of software estimation is not to predict a project's outcome; it is to determine whether a project's targets are realistic enough to allow the project to be controlled to meet them." [78][79]

Todd Little (IEEE Software, 2006) showed real projects exhibit a much wider, slower-narrowing cone than McConnell's idealized version. The implication: most tech strategy decks and roadmaps express deadlines as points, not distributions, which inverts the actual epistemics of software work.

5. Strategy execution research: what the data actually says

5a. The "70% fail" claim, scrutinized

The 70% figure is widely repeated but not traceable to a single rigorous study. The most consistent original sources are:

  • Ram Charan and Geoffrey Colvin, "Why CEOs Fail," Fortune, June 1999: in failed strategies, "the real problem isn't bad strategy ... it's bad execution" in roughly 70% of cases. Not "70% of all strategies fail"; rather "70% of failed strategies are caused by execution." [80]
  • Walter Kiechel III, "Corporate Strategists Under Fire," Fortune, 27 Dec 1982: only "less than 10% of formulated strategies get successfully implemented." This is the lineage for Kaplan and Norton's "fewer than 10%" claim.

Cândido and Santos (2015), "Strategy implementation: What is the failure rate?", Journal of Management and Organization: reviewed the literature and concluded no single defensible point estimate exists, but central tendency clusters between 50% and 70%.

5b. Mankins and Steele: the strategy-to-performance gap

Michael C. Mankins and Richard Steele, "Turning Great Strategy into Great Performance," HBR, July to August 2005. Marakon Associates and Economist Intelligence Unit surveys of senior executives at 197 large companies.

Headline: "Companies on average deliver only 63% of the financial performance their strategies promise." [81]

Causes of the 37% gap, in percentage points of lost performance, largest first:

  • Inadequate or unavailable resources (7.5%)
  • Poorly communicated strategy (5.2%)
  • Actions required to execute not clearly defined (4.5%)
  • Unclear accountabilities for execution (4.1%)
  • Organizational silos and culture (3.7%)
  • Inadequate performance monitoring (3.0%)
  • Inadequate consequences/rewards (3.0%)
  • Poor senior leadership (2.6%)
  • Uncommitted leadership (1.9%)
  • Unapproved strategy (0.7%)
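
A quick arithmetic check on the breakdown above, as a sketch: it totals the ten listed causes and compares them with the 37-point headline gap. The listed items come to slightly less than 37. This section does not say how the source accounts for the remainder (rounding, or a category not reproduced here), so the code only reports it.

```python
# Shares of the gap exactly as listed above, in percentage points.
GAP_CAUSES = {
    "Inadequate or unavailable resources": 7.5,
    "Poorly communicated strategy": 5.2,
    "Actions required to execute not clearly defined": 4.5,
    "Unclear accountabilities for execution": 4.1,
    "Organizational silos and culture": 3.7,
    "Inadequate performance monitoring": 3.0,
    "Inadequate consequences/rewards": 3.0,
    "Poor senior leadership": 2.6,
    "Uncommitted leadership": 1.9,
    "Unapproved strategy": 0.7,
}

HEADLINE_GAP = 37.0  # 100% promised minus the 63% delivered

listed_total = sum(GAP_CAUSES.values())
residual = HEADLINE_GAP - listed_total

print(f"Listed causes total: {listed_total:.1f} points")
print(f"Not itemized here:   {residual:.1f} points")

# Each cause as a share of the headline gap, largest first.
for cause, points in sorted(GAP_CAUSES.items(), key=lambda kv: -kv[1]):
    print(f"{points / HEADLINE_GAP:6.1%} of the gap  {cause}")
```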

Their Rule 2 is the most directly relevant to assumption documentation: "Debate assumptions, not forecasts." [81]

5c. Donald Sull (MIT Sloan)

"Why Strategy Execution Unravels, and What to Do About It," Sull, Homkes, and Sull, HBR, March 2015. Survey of 7,600+ managers in 262 companies across 30 industries and 30+ countries.

Key statistics:

  • "Two-thirds to three-quarters of large organizations struggle to implement their strategies." [82]
  • Only 55% of middle managers can name even one of their company's top five priorities.
  • 84% can rely on their boss; only 9% can rely on colleagues in other functions all the time (the real execution problem is lateral coordination, not vertical alignment).

The five myths debunked: execution equals alignment (wrong: it's coordination); execution means sticking to the plan (wrong: agility); communication equals understanding; performance culture drives execution; execution should be top-down. [83]

"Promise-Based Management: The Essence of Execution," Sull and Spinosa, HBR, April 2007. Good promises are Public, Active, Voluntary, Explicit, and Mission-based. "Explicitness is crucial especially when parties have different cultural backgrounds or the promise involves an abstract construct ('optimization,' 'innovation') subject to multiple interpretations." [84]

"Simple Rules" (2015), with Kathleen Eisenhardt: in complex, fast-changing environments, a few simple, transparent rules outperform elaborate plans. Six rule types: boundary, prioritizing, stopping, how-to, coordination, timing. [85]

5d. Kaplan and Norton

Robert S. Kaplan and David P. Norton, "The Balanced Scorecard," HBR Jan to Feb 1992; "The Strategy-Focused Organization" (2001).

The "9 out of 10" claim: "fewer than 10% of strategies effectively formulated are effectively executed." Traces to Kiechel (1982); widely cited and widely disputed.

1996 Renaissance Solutions follow-up survey:

  • Only 5% of the workforce understands the strategy.
  • Only 25% of managers have incentives linked to strategy.
  • 60% of organizations don't link budgets to strategy.
  • 86% of executive teams spend less than one hour per month discussing strategy. [80]

Strategy maps: causal models linking financial / customer / internal process / learning-and-growth perspectives. Functionally an ADR for management strategy: they make the conditional logic explicit. [86]

5e. Roger Martin

Former Dean (1998 to 2013), Rotman School of Management. Named #1 management thinker by Thinkers50 in 2017. [87][88]

"Playing to Win" (with A.G. Lafley, HBS Press, 2013): strategy as "a coordinated and integrated set of five choices: a winning aspiration, where to play, how to win, core capabilities, and management systems." [89]

"The Big Lie of Strategic Planning," HBR, January to February 2014: "Strategy making forces executives to confront a future they can only guess at. It's not surprising, then, that they try to make the task less daunting by preparing a comprehensive plan ... But good strategy is not the product of endless research and modeling." [90][91]

Martin: "Strategy isn't about finding answers. It's about placing bets and shortening odds. Make the logic explicit. Be clear about what must be true for the choices to make sense." Asking "what must be true" is among the cleanest strategy-side analogs to the alternatives section of a software RFC. [92][93]

5f. Richard Rumelt

Richard P. Rumelt, "Good Strategy/Bad Strategy" (Crown Business, 2011). Professor Emeritus, UCLA Anderson.

Kernel of good strategy: Diagnosis ("What's going on here?"), Guiding Policy ("Like the guardrails on a highway, the guiding policy directs and constrains action without fully defining it"), Coherent Actions. [94][95]

Four hallmarks of bad strategy:

  1. Fluff: "a form of gibberish masquerading as strategic concepts or arguments. It uses 'Sunday' words (words that are inflated and unnecessarily abstruse) and apparently esoteric concepts to create the illusion of high-level thinking." His famous bank example: "Our fundamental strategy is one of customer-centric intermediation" reduces to "Our bank's fundamental strategy is being a bank." [96]
  2. Failure to face the challenge: "If you fail to identify and analyze the obstacles, you don't have a strategy. Instead, you have either a stretch goal, a budget, or a list of things you wish would happen." [95]
  3. Mistaking goals for strategy: "statements of desire rather than plans for overcoming obstacles." [96]
  4. Bad strategic objectives: the "dog's dinner" of mislabeled lists.

Quotable for senior leaders: "A hallmark of true expertise and insight is making a complex subject understandable. A hallmark of mediocrity and bad strategy is unnecessary complexity, a flurry of fluff masking an absence of substance." And: "Bad strategy flourishes because it floats above analysis, logic, and choice, held aloft by the hot hope that one can avoid dealing with these tricky fundamentals." [97]

5g. The planning fallacy

Daniel Kahneman and Amos Tversky, "Intuitive Prediction: Biases and Corrective Procedures," TIMS Studies in Management Science 12 (1979), 313 to 327; reprinted in "Judgment Under Uncertainty" (1982).

Definition: "to underestimate the time required to complete a project, even when they have considerable experience of past instances in which they have underestimated such times." An optimism bias rooted in dominance of the inside view (narrative built from features of this project) over the outside view (statistical base rates from analogous past projects). [98]

Empirical magnitude: Buehler, Griffin, and Ross (1994) thesis-completion experiments find actuals 1.5x to 3x predictions; deviations persist even when subjects are given their prior track record. [99]

Operationalized for management by Lovallo and Kahneman, "Delusions of Success: How Optimism Undermines Executives' Decisions," HBR, July 2003. Kahneman won the 2002 Nobel for the broader judgment-under-uncertainty research program. [53]

6. Mechanisms by which presentability strips honesty

6a. Compression and the format constraint

CAIB's diagnosis is the cleanest formulation: "When engineering analyses and risk assessments are condensed to fit on a standard form or overhead slide, information is inevitably lost. ... the priority assigned to information can be easily misrepresented by its placement on a chart and the language that is used." [8]

Tufte's information-density measurement: a typical PowerPoint statistical graphic shows about 12 numbers, "below every major world publication except Pravda." His prescription: distribute a brief written report read in the first 5 to 10 minutes of the meeting (the Amazon six-pager is the most prominent adoption). [100][101]

6b. Risk-aversion in upward communication

The MUM effect. Rosen and Tesser, "On Reluctance to Communicate Undesirable Information: The MUM Effect," Sociometry 33 (1970), 253 to 263. Subjects readily transmitted neutral content but delayed or withheld negative content. Organizational research before and after (Read 1962; Athanassiades 1973; Roberts and O'Reilly 1974; Milliken et al. 2003) found the same pattern: subordinates "distort the information that they convey to their superiors, communicating upward in a way that minimizes negative information." [102]

Festinger (1950): "structuring groups into hierarchies automatically introduces restraints against free communication, particularly criticisms by low-status members toward those in higher-status positions." [103]

Morrison (Annual Review of Organizational Psychology, 2014): "Employees are also cognizant of the social discomfort created by difficult conversations and the transmittal of bad news. The desire to avoid such discomfort and to maintain social harmony often gives rise to the well-known MUM effect." [104]

Watermelon status reporting. Practitioner term of art (no single founding academic citation): a project shows green RAG status externally while internally red. The structural mechanism: RAG ratings are subjective; red is punished as blame rather than as a request for help; reporting cadences reward calm. Healthcare.gov's status pre-launch is the canonical case: the OIG and the Grassley Senate report both document that 18 written warnings and a McKinsey review were "swept under the rug" while top-line status remained green into October 2013. [105]

6c. Visual conventions that imply false precision

  • Bullet hierarchies (Tufte/CAIB): "The slide created six levels of hierarchy ... These levels prioritized information that was already contained in 11 simple sentences" - misallocating salience. [8]
  • Gantt charts and phased roadmaps: per John Cutler (TBM 350), linear timelines "resist the mental model of a cascade. Once you factor in causality, feedback loops, and the time it takes for outcomes to emerge, most cascade models break down. Product work is not factory work. Resist the metaphors." Cutler's alternative: a "causal, exploratory DAG, not a linear execution plan ... capturing the uncertainty and portfolio thinking often missing in deterministic planning." [106]
  • Swim lanes: imply clean functional handoffs that rarely exist in practice; they encode an ideal-state process and erase the actual coordination work, the relationships, and the slack that make execution possible.
  • Tufte's "lie factor" (Visual Display of Quantitative Information, 1983): the ratio of the size of an effect as drawn in the graphic to its size in the data. When chart area, axis, or hierarchy departs from the proportional structure of the underlying data, the chart manufactures false precision.
  • Challenger as the textbook example: arranging O-ring damage by launch date concealed a perfect temperature correlation; the data was present, but the chosen visualization "displayed poor reasoning and furthered it." [11]
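
Tufte's lie factor, mentioned in the list above, is simple enough to compute: the size of the effect shown in the graphic divided by the size of the effect in the data, where each size of effect is the relative change from the first value to the second. The sketch below uses hypothetical measurements, and the function names are chosen for illustration.

```python
def relative_change(first, second):
    """Size of effect: change from first to second, relative to first."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Ratio of the effect as drawn to the effect in the underlying data."""
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    return shown / actual


# Hypothetical chart: the metric rises from 100 to 150 (a 50% increase),
# but the bars are drawn 20 mm and 80 mm tall (a 300% increase).
print(lie_factor(20, 80, 100, 150))

# A proportional chart of the same data scores 1.0.
print(lie_factor(10, 15, 100, 150))
```

A value near 1.0 means the drawing is proportional to the data. The first example overstates the change sixfold.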

6d. Sequential certainty vs. conditional reality

Roadmaps encode "do A, then B, then C." Real engineering is "if A succeeds, do B1; if A reveals X, do B2." Frameworks that better represent conditionality:

  • Decision trees (Howard Raiffa, "Decision Analysis," 1968).
  • Scenario planning (Pierre Wack, Shell, HBR 1985, "Scenarios: Uncharted Waters Ahead").
  • Real options (Trigeorgis; Dixit and Pindyck, "Investment Under Uncertainty," 1994), which formalizes the value of "wait and learn" branches that linear roadmaps eliminate.
  • Military branches and sequels with execution criteria (Section 2c).
  • SpaceX's build-test-learn with explicit failure budgets converted to design data per cycle.
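
The "if A succeeds, do B1; if A reveals X, do B2" structure can be written as a small decision tree and rolled back by expected value. This is a minimal sketch: the node layout, the function names, and every probability and payoff are hypothetical, chosen only to show how a "wait and learn" branch gets a value that a linear roadmap never computes.

```python
def expected_value(node):
    """Roll back a decision tree: average over chance, maximize over decisions."""
    kind = node["kind"]
    if kind == "outcome":
        return node["value"]
    if kind == "chance":
        return sum(p * expected_value(child) for p, child in node["branches"])
    if kind == "decision":
        return max(expected_value(child) for child in node["options"])
    raise ValueError(f"unknown node kind: {kind}")


def best_option(decision):
    """Label of the option with the highest expected value."""
    return max(decision["options"], key=expected_value)["label"]


def outcome(value):
    return {"kind": "outcome", "value": value}


# Option 1: commit to the full migration now (illustrative numbers only).
commit_now = {
    "kind": "chance",
    "label": "Commit to full migration now",
    "branches": [(0.6, outcome(100)), (0.4, outcome(-80))],
}

# Option 2: run a pilot first. Payoffs are net of the assumed pilot cost.
# If the pilot reveals a problem, the team can still choose plan B2.
pilot_first = {
    "kind": "chance",
    "label": "Run a pilot first",
    "branches": [
        (0.6, outcome(90)),  # A succeeds: proceed with B1
        (0.4, {
            "kind": "decision",
            "label": "Pilot reveals X",
            "options": [outcome(-90), outcome(10)],  # push on, or switch to B2
        }),
    ],
}

root = {"kind": "decision", "label": "Migration", "options": [commit_now, pilot_first]}

print(f"Commit now:  {expected_value(commit_now):.1f}")
print(f"Pilot first: {expected_value(pilot_first):.1f}")
print(f"Best option: {best_option(root)}")
```

Under these made-up numbers the pilot wins because it buys the right to switch to B2. That right is the option value a "do A, then B, then C" roadmap leaves out.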

6e. OKR cascade critiques

Christina Wodtke ("Radical Focus" 2nd ed.; "Cascading OKRs at Scale"): "When the organization has only one or maybe two levels of hierarchy, a straight cascade might make sense ... But when a company grows, it changes." On hard cascades: "Cascading is you take your boss' Key Result and make it your Objective (I really hate this one)." And: "OKRs are NOT for Command and Control ... OKRs ONLY work for empowered teams, otherwise they are a travesty." [107]

John Cutler: cascades impose a "deterministic planning" frame and obscure the causal DAG; "efforts to visualize reality across teams will often trigger a threat response." [106]

Documented failure modes:

  • Cascades hide cross-team dependencies because each team's KRs appear self-contained.
  • Cascades hide conflicts: when KR A for one team requires resources committed to KR B for another, the alignment diagram presents harmony while the calendar shows collision.
  • KRs framed as deliverables ("launch X") rather than outcomes preserve the appearance of progress when the underlying hypothesis is invalidated.
  • Watermelon dynamics apply: quarterly check-ins reward green grading.

7. Case studies: clean plans, hidden uncertainty

7a. The Big Dig (Boston Central Artery/Tunnel)

  • Original 1982 estimate: ~$2.2 to $2.8 billion. Congressional approval in 1991: $2.5 billion, 1998 completion. [108][109]
  • Final construction cost: ~$14.6 to $14.8 billion (substantially complete December 2007). [110][111]
  • All-in with bond interest by 2038: roughly $24 billion (Massachusetts officials, 2012). [112][113]
  • Five deaths during construction; Milena Del Valle killed by a falling concrete ceiling panel in July 2006; Bechtel/Parsons Brinckerhoff paid $407 million in restitution. [114][111]

How the plan hid risk. US DOT IG report (Oct 7, 1999): the 1998 project finance plan "did not disclose significant cost information about the project, such as construction cost increases or that contract awards were exceeding budget." The project tracked these internally in its Cost and Information System but did not report them to FHWA or GAO. Massachusetts Turnpike Authority chair James Kerasiotes capped the budget publicly at $10.8 billion in 1994 and concealed a $1.4 billion overrun until February 1, 2000, when he was forced into a "preemptive strike" disclosure to the Boston press; he was fired shortly after. Mass IG: "Employing staff from Bechtel Corporation and Parsons Brinckerhoff to perform value engineering analyses compromised the program's independence." The federal DOT IG called the project one of "the most flagrant breaches of integrity in the history of the 85 year old Federal-aid highway program." This is Flyvbjerg's canonical example of strategic misrepresentation, deliberate concealment rather than honest optimism error. [115]

7b. Healthcare.gov

  • Launched October 1, 2013; collapsed within roughly two hours under ~250,000 concurrent users (about 5x expected). [116]
  • CGI Federal contract grew from $93.7M to ~$292M pre-launch; total program cost ~$1.7B (HHS OIG, August 2014). [117]

How status reports hid problems. HHS OIG (Feb 2016) case study OEI-06-14-00350: "CMS continued on a failing path to developing HealthCare.gov despite signs of trouble, making rushed corrections shortly before the launch that proved insufficient." IG Daniel Levinson: "Most critical was the absence of clear leadership, which caused delays in decision-making, lack of clarity in project tasks, and the inability of CMS to recognize the magnitude of problems as the project deteriorated." [118]

GAO-14-694: CMS "delayed key governance reviews, moving an assessment of FFM readiness from March to September 2013, just weeks before the launch, and did not receive required approvals." Washington Post (Goldstein, Feb 22, 2016): "During the two years before the disastrous opening of HealthCare.gov, federal officials in charge of creating the online insurance marketplace received 18 written warnings that the mammoth project was mismanaged and off course but never considered postponing its launch." A March 2013 McKinsey review warned senior White House and CMS officials of critical risks; its findings were not shared with subordinates until after launch. At launch, only 23% of website code had been tested per TurningPoint independent verification. [119]

Mikey Dickerson rescue. Called in via White House CTO Todd Park, October 11, 2013. Twice-daily stand-ups from October 24, 10:00 a.m. and 6:30 p.m., approximately 45 minutes each, with three explicit rules (Steven Brill, "Code Red," Time, March 10, 2014): [120]

  • Rule 1: no finger-pointing. [121]
  • Rule 2: "The ones who should be doing the talking are the people who know the most about an issue, not the ones with the highest rank. If anyone finds themselves sitting passively while managers and executives talk over them with less accurate information, we have gone off the rails."
  • Rule 3: "We need to stay focused on the most urgent issues, like things that will hurt us in the next 24 to 48 hours."

Over ~6 weeks, ~400 defects fixed; concurrency raised to ~25,000 users; page response to ~1 second. The episode led to creation of the U.S. Digital Service in August 2014. [122]

7c. NHS National Programme for IT (NPfIT)

  • Launched 2002, dismantled September 2011. [123][124]
  • Original cost: £2.3 billion over three years. NAO (June 2006): £12.4 billion over ten years. Department of Health (2013): £9.8 billion. PAC (Sept 2013): one of the "worst and most expensive contracting fiascos" in UK public-sector history. [125]
  • Fujitsu southern care-records contract termination: £31.5 million in legal costs over four years. Lorenzo deployed to only ~22 trusts vs. the 160 originally promised. [126]

How the clean plan hid complexity. The Wanless Review (2002) had warned the programme needed "clear and well developed views about the benefits ... and how they will be delivered, with patients at the core of the system." Ignored. The mega-contract structure (five Local Service Provider clusters) was locked in before requirements were stable, so the contract effectively defined the solution rather than emerging from validated clinical workflow. When Accenture withdrew in September 2006, the contract entitled the government to up to £1 billion in compensation; Director-General Richard Granger settled for £63 million, an indicator of the actual leverage problem behind the originally clean cluster contracting model. PAC (2013) found benefits to March 2012 totaled £3.7 billion against £6.4 billion of cost; two-thirds of forecast benefits unrealized. [127]

7d. Columbia STS-107 (engineering by viewgraphs)

See Section 1b. The CAIB sidebar "Engineering by Viewgraphs" (Volume I, p. 191) and the Boeing slide "Review of Test Data Indicates Conservatism for Tile Penetration" remain the single best-documented case of a presentable engineering artifact directly causing lethal misjudgment.

7e. Other notable cases

Denver International Airport baggage system: BAE Systems integrated automated baggage handling grew from ~$193M to over $400M; abandoned for normal operations and decommissioned by 2005. Airport opening delayed 16 months; delay cost ~$1.1M per day (GAO/RCED-95-35BR). Breier Neidle Patrone Associates' 1990 commissioned study had warned the proposed system was "too complex" and required prior R&D commitment. The warning was ignored. The Munich comparator required two years of build plus six months of 24x7 testing; Denver's plan provided neither.

FBI Virtual Case File / Sentinel: Virtual Case File abandoned April 2005 after ~$170M (Aerospace Corp Jan 2005 review: SAIC software "incomplete, inadequate and so poorly designed that it would be essentially unusable"). Sentinel: original $425M budget (2006). After IG criticism and a benchmark test failure in October 2010, the FBI brought development in-house, switched to Agile, cut the team from ~400 to 40, and delivered agency-wide on July 1, 2012, within the $451M ceiling. Lesson: replacing the waterfall plan with smaller incremental segments and honest defect telemetry rescued the program. [128]

UK Post Office Horizon precursor: the cancelled benefits-card project (1999) was called by the PAC "one of the biggest IT failures in the public sector"; ~£700M lost. The optimistic procurement narrative that followed (Horizon as the largest non-military IT contract in Europe) was built on an already-troubled foundation and contributed to the later Horizon scandal. [129]

7f. Counter-examples: honest uncertainty led to success

  • Apollo program: George Mueller's "all-up testing" plus a program-office structure forced explicit risk capture. After Apollo 1, NASA institutionalized an "engineering culture that encouraged an environment of open communications, attention to detail, and ability to challenge technical assumptions" (NASA, Harry Jones, NTRS 20190002249). The CAIB later contrasted this with the Shuttle program's safety-culture deterioration. [130][131]
  • SpaceX: Musk: "Failure is an option here. If things are not failing, you are not innovating enough." NASA Commercial Crew comparative review: "SpaceX focuses on rapidly iterating through a build-test-learn approach that drives modifications toward design maturity. Boeing utilizes a well-established systems engineering methodology targeted at an initial investment in engineering studies and analysis to mature the system design prior to building and testing the hardware." [132][133]
  • Heathrow Terminal 5: BAA used an integrated risk register, owner-carried risk, and reference-class uplifts; opened on time and on budget in 2008.
  • Madrid Metro extensions: built on time and to budget under a sustained portfolio approach (Flyvbjerg 2005). [134][135]
  • Norwegian KS2 Quality Assurance scheme: mandated reference class forecasting since 2000; improved cost outcomes vs. pre-mandate baseline.
  • Inchon (1950): see Section 2i. A rejected branch became a decisive deception asset.
  • Tal Afar (2005): see Section 2i. McMaster's discarded NTC scenarios and substitution of realistic COIN scenarios with Arab-American role players is the doctrinal exemplar of disciplined initiative under intent. [136]

8. Synthesis: the convergent prescription

Across military doctrine, capital project disciplines, aerospace safety, nuclear operations, aviation, and software engineering, the mature practices converge on a few principles for documenting strategy under uncertainty:

  1. Plans are probability distributions, not point estimates. P10/P50/P90 (oil and gas, infrastructure, Crossrail's P50/P95 AFCDC); JCL (NASA); the cone of uncertainty (McConnell); reference class forecasting (Flyvbjerg).
  2. Conditional logic lives in dedicated artifacts. Branches and sequels with execution criteria (JP 5-0, FM 3-0); risk registers (PMI); assumption logs and decision logs (PMBOK 6th); event trees and fault trees (PRA); bowties (ICAO); decision trees with EMV; ADRs (Nygard); RFCs (Rust, Python, IETF, Oxide). Each is a structured place where conditionality is required, not buried. [29]
  3. The reserve structure distinguishes types of uncertainty. Contingency reserve for known unknowns inside the baseline; management reserve for unknown unknowns above the baseline with governance gating. Strategy decks rarely make this distinction. [137][138]
  4. Front-end quality dominates. IPA's 24,000+ project database: "the completeness of FEL is the single best predictor of safety, cost, schedule, and operability outcomes." Merrow: "the saddest aspect of these big project failures is that we almost always knew the right things to do, but for a variety of (bad) reasons, we failed to do them." [139][59]
  5. Adversarial assumption testing is institutionalized. Red teaming (UFMCS), wargaming (MDMP/MCPP), pre-mortems (Klein), Team A/Team B, devil's advocacy, structured analytic techniques.
  6. Post-event learning is structured and blameless. AARs (TC 25-20 / FM 7-0 App K); INPO post-job critiques; aviation LOSA; superseded ADRs preserved rather than deleted.
  7. Intent is mandatory and separated from execution detail. Commander's Intent (ADP 6-0 paragraph 3.a.): purpose + key tasks + end state, with the "how" delegated.
  8. The artifact format is engineered to admit uncertainty. Amazon six-pagers force prose. Rust RFCs require Drawbacks, Alternatives, Unresolved questions. ADR templates require Consequences (both positive and negative). NASA review boards require Entrance Criteria. INPO pre-job briefs require abort/hold criteria. Slides, swim lanes, and Gantt charts do none of this.
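
Principle 1 can be illustrated with a small Monte Carlo roll-up that turns three-point task estimates into P10/P50/P90 totals. This is a sketch under stated assumptions: the task figures are hypothetical, tasks are treated as independent and strictly sequential, and a triangular distribution stands in for whatever distribution a real risk model would use.

```python
import random

# Hypothetical three-point estimates in weeks: (optimistic, most likely, pessimistic).
TASKS = [
    (4, 6, 12),   # data migration
    (2, 3, 8),    # vendor integration
    (5, 8, 20),   # cutover and stabilization
]


def simulate_totals(tasks, trials=10_000, seed=7):
    """Sample total duration many times, assuming independent sequential tasks."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = []
    for _ in range(trials):
        # random.triangular takes (low, high, mode) in that order.
        totals.append(sum(rng.triangular(low, high, mode) for low, mode, high in tasks))
    return totals


def percentile(values, pct):
    """Nearest-rank style percentile; adequate for a sketch."""
    ordered = sorted(values)
    index = round(pct / 100 * (len(ordered) - 1))
    return ordered[index]


totals = simulate_totals(TASKS)
most_likely_sum = sum(mode for _, mode, _ in TASKS)

print(f"Sum of most-likely values: {most_likely_sum} weeks")
for pct in (10, 50, 90):
    print(f"P{pct}: {percentile(totals, pct):.1f} weeks")
```

Because each estimate is skewed toward the pessimistic side, the P50 lands above the sum of the most-likely values. That sum is the single date a roadmap slide would typically show.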

The unifying empirical finding is Flyvbjerg's Iron Law: "Over budget, over time, under benefits, over and over again." Best practice is an outlier; average practice is a disaster. Strategy documents that look clean almost certainly resemble the average practice: they look clean precisely because the messy parts have been removed. The CAIB's diagnosis is the clearest single sentence: "The uncertainties and assumptions that signaled danger dropped out of the information chain when the Mission Evaluation Room manager condensed the Debris Assessment Team's formal presentation to an informal verbal brief at the Mission Management Team meeting." The same dynamic operates in nearly every executive review, board pre-read, and quarterly OKR cascade. [9]

Strength of evidence: a quick guide

Very strong evidence (primary documents, multiple independent sources):

  • CAIB findings on PowerPoint and Columbia (CAIB Vol. I, August 2003).
  • Big Dig cost progression and concealment (US DOT IG, Mass IG, GAO).
  • Healthcare.gov failure mechanics (HHS OIG OEI-06-14-00350; GAO-14-694; Senate report).
  • NPfIT cost and failure (NAO 2006; PAC 2007, 2013).
  • Mankins and Steele's 63% gap (single study, but rigorously documented in HBR 2005).
  • Sull et al.'s 7,600-manager survey on coordination vs. alignment.
  • Klein's pre-mortem mechanism (Mitchell, Russo, Pennington 1989 underlying study).
  • Flyvbjerg's database statistics (16,000+ projects; the underlying database is proprietary but sub-samples are peer-reviewed).
  • Rosen and Tesser MUM effect (1970 Sociometry paper).
  • Scott's "Seeing Like a State" arguments (the book itself is the source; widely cited and influential).

Moderate evidence (well-known but with caveats):

  • The "70% of strategies fail" claim: widely cited; not traceable to a single rigorous study; central tendency 50% to 70% per Cândido and Santos 2015 literature review.
  • Kaplan and Norton's "fewer than 10% executed" figure: traces to Kiechel (Fortune 1982), not a peer-reviewed source.
  • Klein's "30% improvement" from pre-mortem: based on Mitchell, Russo, Pennington (1989) prospective hindsight studies, not a separate trial of pre-mortem itself.
  • "Watermelon" reporting: well documented in practitioner literature; no single founding academic citation.
  • Boyd's specific influence on Desert Storm planning: Coram's account is the most-cited but partisan; contested in some Air Force historiography.

Specific corrections to common misattributions:

  • "PowerPoint makes us stupid" is most accurately Mattis (April 2010, confirmed by Bumiller). McMaster's documented quote is "illusion of understanding and illusion of control."
  • The Afghanistan "spaghetti slide" was a system-dynamics causal-loop diagram by PA Consulting, slide 22 of a 31-page progressive build; the meme treats it as incompetent PowerPoint, which strengthens rather than weakens the legibility argument because the artifact that traveled was the static image, not the build.
  • Moltke's "no plan survives contact with the enemy" is actually "no plan of operations extends with any certainty beyond the first contact with the main hostile force" ("Über Strategie," 1871). [30]
  • Charan and Colvin's Fortune (1999) "70%" referred to execution being the cause among failed strategies, not 70% of all strategies failing.

References

  1. The Anarchist Library — https://theanarchistlibrary.org/library/james-c-scott-seeing-like-a-state.html
  2. Goodreads — https://www.goodreads.com/book/show/20186.Seeing_Like_a_State
  3. Goodreads — https://www.goodreads.com/work/quotes/21381-seeing-like-a-state-how-certain-schemes-to-improve-the-human-condition
  4. Nat Eliason — https://www.nateliason.com/notes/seeing-like-a-state-james-c-scott
  5. uky — https://www.uky.edu/~gmswan3/544/PowerPoint_Is_Evil.pdf
  6. SlideServe — https://www.slideserve.com/search/tile-test-data-ppt-presentation
  7. Edward Tufte & Graphics Press — https://www.edwardtufte.com/notebook/columbia-accident-investigation-board-the-boeing-powerpoint-slide/
  8. gocivilairpatrol — https://www.gocivilairpatrol.com/media/cms/Columbiapdf_5161B0B3295B0.pdf
  9. Timothyblee — https://timothyblee.com/2009/11/12/bottom-up-thinker-edward-tufte/
  10. Statsthinking21 — https://statsthinking21.github.io/statsthinking21-core-site/data-visualization.html
  11. Online Ethics — https://onlineethics.org/cases/representation-and-misrepresentation-tufte-and-morton-thiokol-engineers-challenger
  12. PCWorld — https://www.pcworld.com/article/195081/powerpoint_makes_you_stupid_says_military_intel.html
  13. Wordpress — https://aidontheedge.wordpress.com/2010/04/27/powerpoint-complexity-and-the-art-of-hypnotising-chickens/
  14. Slashdot — https://news.slashdot.org/story/10/04/27/1425207/powerpoint-of-afghan-war-strategy
  15. SD wise — http://sdwise.com/2013/07/hey-new-york-times-a-causal-loop-diagram-is-not-a-powerpoint-fail/
  16. Slate — https://slate.com/technology/2023/02/military-powerpoint-memes-colin-powell.html
  17. The Week — https://theweek.com/articles/673091/general-mattis-save-military-ban-powerpoint
  18. Anecdote — https://www.anecdote.com/2018/05/amazons-six-page-narrative-structure/
  19. The Lightning Press — https://www.thelightningpress.com/adp-6-0-mission-command/
  20. War Room — https://warroom.armywarcollege.edu/articles/new-doctrine-mission-command/
  21. Army — https://rdl.train.army.mil/catalog-ws/view/100.ATSC/29574D4B-F3F9-425F-921E-9760E0AC3E4C-1637777286078/report.pdf
  22. Brainscape — https://www.brainscape.com/flashcards/adp-6-0-adrp-6-0-mission-command-4018532/packs/5932898
  23. Smallwarsjournal — https://archive.smallwarsjournal.com/index.php/jrnl/art/how-germans-defined-auftragstaktik-what-mission-command-and-not
  24. LegalClarity — https://legalclarity.org/what-is-an-operations-order-opord-and-its-key-elements/
  25. DoD — https://www.doctrine.af.mil/Portals/61/documents/AFDP_5-0/AFDP5-0Planning.pdf
  26. Grokipedia — https://grokipedia.com/page/Military_Decision_Making_Process
  27. Scribd — https://www.scribd.com/document/537883099/9-MCWP-5-10-FRMLY-MCWP-5-1-Marine-Corps-Planning-Process-USMC-2010-Informativa
  28. Brainly — https://brainly.com/question/15897292
  29. Global Security — https://www.globalsecurity.org/military/library/policy/army/fm/3-0/ch6.htm
  30. Wikiquote — https://en.wikiquote.org/wiki/Helmuth_von_Moltke_the_Elder
  31. Kaiserslautern American — https://www.kaiserslauternamerican.com/commander-explains-science-of-planning/
  32. Buzzsprout — https://www.buzzsprout.com/2109174/episodes/13805320-red-teaming-with-colonel-ret-steven-rotkoff
  33. Waru — https://www.waru.edu/library/damag/september-october2021/revisiting-john-boyd
  34. Marine Corps Association — https://www.mca-marines.org/gazette/opening-the-loop/
  35. Grokipedia — https://grokipedia.com/page/Millennium_Challenge_2002
  36. Grokipedia — https://grokipedia.com/page/After-action_review
  37. ResearchGate — https://www.researchgate.net/publication/3229642_Performing_a_Project_Premortem
  38. Eurasia Review — https://www.eurasiareview.com/20032017-counterinsurgency-from-bottom-up-colonel-h-r-mcmaster-and-the-3rd-armored-cavalry-regiment-in-tel-afar-spring-fall-2005-analysis/
  39. Marines — https://www.marines.mil/Portals/1/MCWP%205-1.pdf
  40. DoD — https://www.doctrine.af.mil/Portals/61/documents/AFDP_1-1/AFDP%201-1%20Mission%20Command.pdf
  41. PMI — https://www.pmi.org/learning/library/model-risk-contingency-reserve-9310
  42. ProjectManagement.com +2 — https://www.projectmanagement.com/blog-post/5806/management-reserves-and-contingency-reserves--what-s-the-difference-
  43. Iqrm — https://iqrm.net/blog/monte-carlo-simulation-project-risk-management
  44. Crossrail + 3 — https://learninglegacy.crossrail.co.uk/learning-legacy-themes/project-and-programme-management/programme-and-control-reporting/risk-management/
  45. Bcghendersoninstitute — https://bcghendersoninstitute.com/how-big-things-get-done-with-bent-flyvbjerg/
  46. Amazon — https://www.amazon.com/How-Big-Things-Get-Done/dp/0593239512
  47. Towards Data Science — https://towardsdatascience.com/the-iron-law-of-megaprojects-18b886590f0b/
  48. Medium — https://medium.com/data-science/the-iron-law-of-megaprojects-18b886590f0b
  49. Cato Institute — https://www.cato.org/policy-report/january/february-2017/megaprojects-over-budget-over-time-over-over
  50. Independent Institute — https://www.independent.org/tir/2023-fall/how-big-things-get-done/
  51. HowToes — https://howtoes.blog/2025/07/04/how-big-things-get-done-complete-book-summary-all-key-ideas/
  52. Penguin Random House — https://sites.prh.com/how-big-things-get-done-book
  53. arXiv — https://arxiv.org/pdf/1302.3642
  54. arXiv + 4 — https://arxiv.org/pdf/1710.09419
  55. PMI — https://www.pmi.org/learning/library/accuracy-hybrid-reference-class-forecasting-6456
  56. Project Management Institute — https://www.pmi.org/-/media/pmi/documents/public/pdf/research/research-summaries/flyvbjerg_megaprojects.pdf
  57. IPA — https://www.ipaglobal.com/team/edward-merrow/
  58. IPA — https://www.ipaglobal.com/resources/books/industrial-megaprojects-concepts-strategies-and-practices-for-success/
  59. IPA — https://www.ipaglobal.com/news/article/edward-merrow-reveals-why-megaprojects-fail-in-project-manager-magazine-cover-story/
  60. Society of Petroleum Engineers — https://www.spe.org/media/filer_public/de/15/de15f740-fa58-4ca9-9383-ff54030f990f/153695.pdf
  61. ResearchGate — https://www.researchgate.net/publication/254520430_Oil_Industry_Megaprojects_Our_Recent_Track_Record
  62. IPA — https://www.ipaglobal.com/news/article/weak-project-systems-imperil-next-generation-nuclear-projects/
  63. Grokipedia + 2 — https://grokipedia.com/page/Front-end_loading
  64. Wikipedia — https://en.wikipedia.org/wiki/Proven_reserves
  65. Bassexp — https://www.bassexp.com/glossary-of-oil-and-gas-terms/probabilistic-reserves-p90-p50-p10
  66. HandWiki + 2 — https://handwiki.org/wiki/Physics:WASH-1400
  67. NRC — https://www.nrc.gov/docs/ml1303/ml13031a707.pdf
  68. Wikipedia — https://en.wikipedia.org/wiki/Crew_resource_management
  69. SKYbrary Aviation Safety — https://skybrary.aero/articles/threat-and-error-management-tem
  70. Public Health Wales + 2 — https://phw.nhs.wales/services-and-teams/improvement-cymru/improvement-cymru-academy1/resource-library/academy-toolkit-guides/bow-tie-safety-model-toolkit/
  71. Nasa — https://nodis3.gsfc.nasa.gov/displayCA.cfm?Internal_ID=N_PR_7123_0001_&page_name=AppendixG
  72. Nasa — https://nodis3.gsfc.nasa.gov/displayCA.cfm?Internal_ID=N_PR_7123_001A_&page_name=AppendixG
  73. Cognitect + 2 — https://www.cognitect.com/blog/2011/11/15/documenting-architecture-decisions
  74. cognitect — https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions.html
  75. Thoughtworks — https://www.thoughtworks.com/en-us/radar/techniques/lightweight-architecture-decision-records
  76. GitHub — https://github.com/rust-lang/rfcs
  77. Construx — https://www.construx.com/books/the-cone-of-uncertainty/
  78. Valueflowsolutions — https://resources.valueflowsolutions.co.uk/lean-concepts-for-organisations/the-cone-of-uncertainty
  79. Modern Analyst — https://www.modernanalyst.com/Careers/InterviewQuestions/tabid/128/ID/2324/What-is-the-Cone-of-Uncertainty.aspx
  80. Excitant — https://www.excitant.co.uk/do-9-out-of-10-strategies-fail/
  81. PubMed — https://pubmed.ncbi.nlm.nih.gov/16028817/
  82. ResearchGate — https://www.researchgate.net/publication/289998893_Why_Strategy_Execution_Unravels-and_What_to_Do_About_It
  83. Harvard Business Review — https://store.hbr.org/product/why-strategy-execution-unravels-and-what-to-do-about-it/R1503C
  84. PubMed + 2 — https://pubmed.ncbi.nlm.nih.gov/17432155/
  85. Amazon — https://www.amazon.com/Simple-Rules-Thrive-Complex-World/dp/0544705203
  86. CBS News — https://www.cbsnews.com/news/turning-great-strategy-into-great-performance-09-06-2008/
  87. Amazon — https://www.amazon.com/Playing-Win-Strategy-Really-Works/dp/142218739X
  88. Rogerlmartin — https://rogerlmartin.com/archive/articles
  89. LinkedIn — https://www.linkedin.com/pulse/playing-win-ag-lafley-roger-l-martin-juan-carlos-zambrano
  90. Harvard Business Review — https://hbr.org/2014/01/the-big-lie-of-strategic-planning
  91. Harvard Business Review — https://hbr.org/product/the-big-lie-of-strategic-planning/R1401F-PDF-ENG
  92. Consultant's Mind — https://www.consultantsmind.com/2020/04/12/hbr-big-lie-of-strategic-planning/
  93. Todopmp — https://todopmp.com/wp-content/uploads/2018/12/The-Big-Lie-of-Strategic-Planning-1.pdf
  94. Alex Murrell — https://www.alexmurrell.co.uk/summaries/richard-rumelt-good-strategy-bad-strategy
  95. Aydoo Services — https://aydoo.services/en/articles/good-strategy-bad-strategy/
  96. LinkedIn + 2 — https://www.linkedin.com/pulse/good-strategy-bad-richard-rumelt-juan-carlos-zambrano?trk=pulse-article
  97. The Right Questions + 2 — https://therightquestions.co/book-review-of-a-top-book-on-strategy/
  98. ScienceDirect — https://www.sciencedirect.com/science/article/abs/pii/S0065260110430014
  99. Grokipedia — https://grokipedia.com/page/Planning_fallacy
  100. Washington Monthly — https://washingtonmonthly.com/2011/04/26/the-information-sage/
  101. Thefullwiki — http://www.thefullwiki.org/Edward_Tufte
  102. Wiley Online Library + 2 — https://onlinelibrary.wiley.com/doi/10.1111/j.1467-6494.1972.tb00651.x
  103. CiteSeerX — https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=3ce1a6507833f3c0facc69cf3743cc2d1988ac19
  104. ResearchGate — https://www.researchgate.net/publication/229527091_Being_Polite_and_Keeping_MUM_How_Bad_News_is_Communicated_in_Organizational_Hierarchies1
  105. XploreAgile + 4 — https://www.xploreagile.com/when-green-means-red-the-risks-of-watermelon-status-reporting/
  106. Substack — https://cutlefish.substack.com/p/tbm-350-connecting-dots
  107. Medium — https://cwodtke.medium.com/you-cant-handle-okrs-5465cf161e81
  108. Taxpayers for Common Sense — https://www.taxpayer.net/transportation-infrastructure/big-dig-billions-over-budget/
  109. Taxpayers for Common Sense — https://www.taxpayer.net/national-security/bostons-big-dig-will-it-be-back-for-a-special-earmark/
  110. NBC News — https://www.nbcnews.com/id/wbna22394932
  111. Wikipedia — https://en.wikipedia.org/wiki/Big_Dig
  112. CBS News — https://www.cbsnews.com/boston/news/new-estimate-puts-rising-big-dig-costs-at-24-3-million/
  113. Boston.com — https://www.boston.com/uncategorized/noprimarytagmatch/2012/07/10/true-cost-of-big-dig-exceeds-24-billion-with-interest-officials-determine/
  114. Interesting Engineering — https://interestingengineering.com/lists/7-big-facts-about-the-big-dig
  115. CIO + 7 — https://www.cio.com/article/264430/budget-the-money-pit-could-it-have-prevented-budget-overruns-in-boston-s-big-dig.html
  116. Harvard Digital Data Design Institute — https://d3.harvard.edu/platform-rctom/submission/the-failed-launch-of-www-healthcare-gov/
  117. Wikipedia — https://en.wikipedia.org/wiki/HealthCare.gov
  118. Government Executive — https://www.govexec.com/management/2016/02/poor-leadership-derailed-obamacare-rollout-not-technology/126146/
  119. U.S. GAO + 4 — https://www.gao.gov/products/gao-14-694
  120. Time — https://time.com/magazine/us/10209/march-10th-2014-vol-183-no-9-u-s/
  121. 4sight Health — https://www.4sighthealth.com/healthcare-govs-death-defying-2013-launch-implications-govt-led-payment-reform/
  122. Medium + 2 — https://medium.com/@bishr_tabbaa/small-is-beautiful-the-launch-failure-of-healthcare-gov-5e60f20eb967
  123. PubMed Central — https://pmc.ncbi.nlm.nih.gov/articles/PMC3206716/
  124. Computer Weekly — https://www.computerweekly.com/news/2240205626/MPs-brand-NHS-National-Programme-for-IT-a-fiasco-as-posthumous-costs-rise
  125. Wikipedia + 2 — https://en.wikipedia.org/wiki/NHS_Connecting_for_Health
  126. UK Parliament — https://committees.parliament.uk/committee/127/public-accounts-committee/news/181704/dismantled-national-programme-for-it-in-nhs-report-published/
  127. University of Cambridge + 4 — https://www.cl.cam.ac.uk/archive/rja14/Papers/npfit-mpp-2014-case-history.pdf
  128. Wikipedia — https://en.wikipedia.org/wiki/Sentinel_(FBI
  129. Blogger — https://liberalengland.blogspot.com/2024/04/karl-popper-post-office-horizon-it.html
  130. American Enterprise Institute — https://www.aei.org/research-products/report/beyond-the-moonshot-apollos-hidden-lessons-for-managing-complex-technological-projects/
  131. NASA Technical Reports Server — https://ntrs.nasa.gov/api/citations/20190002249/downloads/20190002249.pdf
  132. Substack — https://thecogitatingceviche.substack.com/p/spacexs-revolutionary-development
  133. Futureblind — https://futureblind.com/p/take-the-iterative-path
  134. Ti — https://www.ti.org/pdfs/IronLawofMegaprojects.pdf
  135. ResearchGate — https://www.researchgate.net/publication/299393235_Introduction_The_Iron_Law_of_Megaproject_Management
  136. AUSA — https://www.ausa.org/sites/default/files/Baker_0609.pdf
  137. projectcubicle — https://www.projectcubicle.com/contingency-reserve-management-reserve/
  138. Dee Project Manager — https://deeprojectmanager.com/contingency-reserves-vs-management-reserves/
  139. IPA — https://www.ipaglobal.com/news/article/what-is-front-end-loading-fel-in-project-management/

Commissioned from our research desk. Subject to final editorial discretion.
