The Last Mile in Legal Has Its Own Geography

McKinsey published the data. HBR named the gap. Both are right. Both are general. Here is what the terrain looks like inside a law firm.

May 14, 2026

The McKinsey numbers do not need a second read to be sobering. The first read does the job.

The State of Organizations 2026 surveyed more than ten thousand senior leaders across fifteen countries and sixteen industries. Eighty-eight percent of organizations are deploying AI in at least parts of their business. Fewer than twenty percent see significant impact on the bottom line. Eighty-six percent of leaders say they are not ready to embed AI into day-to-day operations. One in six organizations has no clear C-level owner for AI adoption at all.

HBR’s “last mile” diagnosis from a closed-door Harvard summit fills in the texture McKinsey’s spreadsheet cannot. The summit included a global investment bank with more than two hundred and fifty LLM-connected applications in production, a payments network where copilot adoption sits above ninety-nine percent of employees, and an apparel group running eighteen thousand automated finance processes. In every one of those firms, finance teams hunt for measurable impact in headcount and cycle-time numbers and come up empty. The work is happening. The value is not landing.

Andrei Savine pulled both pieces together in March: enterprise AI dies in the last mile because organizations under-funded the operational layer that turns model output into business value. He calls it the Production Layer. He cites an executive in the McKinsey report who put the corrective ratio at five to one — for every dollar spent on technology, five should be spent on people, process, and verification.

That diagnosis is correct.

That diagnosis is also a tourist map.

Inside the firm, the disconnect is on the calendar. A partner explains why AI will not materially change the way they practice. The IT department, meanwhile, is months into a slow restructure built on the opposite assumption. Both positions are held with conviction. Neither side can yet describe the outcome on the other side of the change.

The general diagnosis works at the macro level because it abstracts away the structural features of the firms it describes. Software companies can absorb productivity gains by reducing headcount. They have CEOs who can decree adoption. Their work product is a piece of software, not a lawyer-hour. None of that is true in a law firm.

This piece is a map of what the last mile actually looks like inside one. The hazards in the terrain. The path that does not end in either a mass layoff or a restructuring announcement masquerading as strategy. The question is not whether five-to-one is the right ratio. It is whether law firms know what the five is supposed to build.

Why the General Diagnosis Underestimates Legal

The five features that make a law firm break the standard model are not exotic. They are the structure of the business. Anyone who has worked inside a firm knows them. Anyone trying to apply a generic enterprise-AI playbook to a firm runs into them within the first quarter.

Start with the billable hour. The standard productivity gain from AI is a productivity gain. In a firm billing by the hour, the same gain is a revenue cut unless something changes upstream. Two software companies have absorbed their productivity gains by reducing headcount in the last few months — roughly 1,600 people at Atlassian, and a 2,000-person restructure at WiseTech shortly before. Law firms cannot run that play. The work product is the lawyer-hour itself. Wolters Kluwer’s 2026 Future Ready Lawyer survey of 810 lawyers found 54% expect firms to use AI efficiency for more clients or competitive pricing; the 2024 edition projected that AI automation could reduce hourly billing per lawyer by roughly $27,000 a year. Those numbers cut against revenue, not toward it. The inversion is what makes generic AI advice land badly in partner meetings: efficiency arguments are heard as compensation arguments.

Beneath the billable hour sits the apprenticeship. Junior associates learn judgment by doing the work AI does best — first-pass document review, citation checks, due diligence summaries, contract redlining. The training pipeline runs through the same activities AI is now positioned to consume. Automate the work without redesigning the apprenticeship, and the firm produces senior associates whose calibration never developed. The Citi Hildebrandt 2026 Client Advisory reports revenue growth of 11.3% across surveyed firms, productivity per lawyer down 0.6%, and 88% of firms planning continued associate growth — built on the assumption that the apprenticeship pipeline still works. The assumption may not hold.

Governance compounds the problem. McKinsey’s number — one in six organizations without a clear C-level AI owner — understates the legal case. In an equity partnership, even a C-level owner cannot decree adoption. Decisions move through practice-group chairs, executive committees, and partner votes. The ILTA 2025 Technology Survey of 580 firms found that user resistance is the top barrier to AI adoption at 57%, up from 54% the year prior; half of those firms have no formal AI policy at all. The gap is not a failure of legal IT. It is a structural feature of how authority is held and exercised in a partnership.

Privilege and conflicts sit beneath the policy gap. Technical friction in legal includes constraints other industries do not face. Solicitor-client privilege. Conflict-of-interest screens. Data residency for regulated client matters. Prohibitions on third-party model training. Ethical-wall enforcement that must survive at the model and the prompt level. ABA Formal Opinion 512 and the Federation of Law Societies of Canada Model Code both treat AI use as a competence question, a confidentiality question, and a supervision question simultaneously. The provincial regulators — Ontario, British Columbia, Alberta, and the Barreau du Québec — have all issued guidance that constrains which tools are usable for which matters. None of this is a security checklist. It is a substantive constraint on adoption.

Last comes the asymmetry. The price of being wrong exceeds the benefit of being right. Mata v. Avianca made the failure mode public in 2023. Damien Charlotin’s AI Hallucination Cases Database has been logging the failure mode ever since, at the rate of dozens of new entries per month from courts in more than a dozen countries. The downside of an AI-fabricated citation is career-ending. The upside of an AI-generated brief is incremental. When the loss function is asymmetric, rational professionals default to avoidance — which produces exactly the shallow utilization HBR’s last-mile diagnosis describes.

Generic diagnoses do not survive these five features. A different map is needed.

The Map: Five Hazards in the Terrain

These hazards are not theoretical. They are already arriving.

Hazard 1 — The Apprenticeship Gap

The work AI handles best is the work juniors learn from. Document review. Due diligence summaries. Citation checks. Contract markup. Strip those activities out without replacement and the firm is running an apprenticeship system whose training data has been deleted.

The signal does not appear in the first year. It appears two and three years in, when an associate who learned to draft alongside an LLM is asked to negotiate a non-standard term in a deposition prep meeting and cannot triangulate why the standard term existed. The output looks fine. The pattern-matching against past work — the thing that turns a third-year into a fourth-year — has not happened.

The hazard sits at the intersection of two operating questions. The decomposition question asks what the junior was actually doing inside the document review. Some of it was the review itself. Some of it was learning what a representation looks like when it is doing real work in a credit agreement, learning which questions to ask before flagging an issue to a senior, learning the difference between a problem and an artifact of the drafting party’s preferences. The collaboration-quality question asks what the junior was learning by doing the work, not what they were producing. If only the production is automated, the learning evaporates with it.

The Citi Hildebrandt advisory caught the contradiction. Firms are still planning associate growth on the old assumption — that the same training pipeline still works — while productivity per lawyer ticked down. The senior-heavy staffing model some firms are leaning into assumes juniors continue to develop into seniors capable of senior work. The pipeline that produced that development is being quietly automated. The hazard arrives on the promotion clock. Juniors will keep being hired; the calibration that used to develop alongside them will not appear on schedule, and the firm will not notice until the class it tries to promote is already on the partnership runway.

Hazard 2 — The Pricing Collision

The hourly model inverts under AI. The same brief that took twenty hours last year takes seven this year. The client knows. The client has run the spreadsheet.

The signal arrives in the form of an RFP that asks specific questions. Which tools does the firm use? On which matter types? What productivity assumption is baked into the proposed rates? The 2025 ACC Chief Legal Officers Survey, summarized in subsequent analysis, found that 59% of general counsel see no clear cost savings from outside counsel using AI. A transparency gap, the report called it. That phrase will appear in a procurement deck within twelve months.

Pricing models that align with the new reality already exist. Fixed fees. Capped fees. Success fees. Capacity-based arrangements. AI-augmented hourly with a productivity discount. The menu is not new. What is new is the pressure to choose, and the pressure to choose visibly. Thomson Reuters’ 2025 Future of Professionals projects up to 240 hours per professional saved per year by AI-augmented workflows; the 2024 edition projected 12 hours per week saved by 2029 and roughly $100,000 in additional billable hours per U.S. lawyer. Those numbers are useful to a firm that has redesigned its pricing to capture them. They are punitive to a firm that has not.
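The inversion can be sketched in a few lines of arithmetic. This is a minimal illustration, not a pricing model: the twenty-hour and seven-hour figures come from the brief described above, while the hourly rate, the fixed fee, and the discount are assumed values chosen only to make the comparison visible.

```python
# Illustrative only: the rate, fixed fee, and discount are assumed figures.
# The 20-hour and 7-hour values are the ones used in the text above.
HOURLY_RATE = 500            # assumed blended rate, dollars per hour
FIXED_FEE = 8_000            # assumed fee agreed in advance for the same brief
PRODUCTIVITY_DISCOUNT = 20   # assumed discount on the hourly rate, in percent


def hourly_fee(hours: int, rate: int = HOURLY_RATE, discount_pct: int = 0) -> int:
    """Fee under hourly billing, with an optional percentage discount on the rate."""
    return hours * rate * (100 - discount_pct) // 100


hours_before, hours_after = 20, 7

# Each scenario maps to (fee in dollars, hours actually worked).
scenarios = {
    "hourly, before AI": (hourly_fee(hours_before), hours_before),
    "hourly, after AI": (hourly_fee(hours_after), hours_after),
    "discounted hourly, after AI": (
        hourly_fee(hours_after, discount_pct=PRODUCTIVITY_DISCOUNT),
        hours_after,
    ),
    "fixed fee, after AI": (FIXED_FEE, hours_after),
}

for label, (fee, hours) in scenarios.items():
    print(f"{label:<30} fee=${fee:>7,}  per hour worked=${fee / hours:>9,.2f}")
```

Under these assumptions the pure hourly model gives up most of the fee when the hours fall, while the fixed fee keeps the revenue and raises the yield per hour worked. Which structure a client will accept is a separate question the sketch does not answer.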

This hazard is the one most likely to reproduce the software-industry failure mode inside legal. A firm that does not redesign pricing and roles will eventually balance the books with headcount. The redesign is uncomfortable. The alternative is worse. The choice is whether to lead the pricing conversation or inherit one shaped by clients who started running their own AI math two budget cycles ago.

Hazard 3 — The Judgment Hollowing

Apprenticeship is about who comes next. Judgment hollowing is about who is already in the chair.

Senior judgment was built on years of doing junior work. Take that work away through AI mediation — not removal, mediation — and judgment does not develop the same way. The hazard appears in two opposing modes. One is overtrust: lawyers who accept AI output uncritically because the immersion that produced calibration has thinned out. The other is blanket rejection: lawyers who refuse to engage with AI assistance because the output feels uncalibrated even when it is sound. The first failure ships errors. The second ships sluggishness. Both come from the same root.

This is what skill-formation research in adjacent professions has been pointing at. When the practice that produced calibration is mediated, calibration drifts. Output calibration and signal discrimination — the two senior capacities that distinguish a partner from a senior associate — degrade in opposite directions when the underlying practice that built them disappears. The senior who used to read a memo and feel the wrong sentence is still reading the memo. The reading is faster. The feel is duller.

The signal is not visible on a dashboard. It is visible in review cycles. A senior who used to spot the problem in the first pass now spots it in the third. A junior whose draft used to be rebuilt is now approved with light edits — not because the draft is better but because the senior’s bar moved. The work product reads acceptable in both cases. The institutional capability does not develop. The hazard is the part of the iceberg the engagement letter cannot describe and the matter close-out cannot bill.

Hazard 4 — The Partnership Fracture

Adoption splits the equity table. Litigation moves on one tool while M&A holds out. IP runs a different stack than tax. Practice groups operate as small businesses inside the firm; their adoption velocities can diverge by a factor of five within the same year, and the divergence shows up in realization rates before it shows up in compensation discussions.

The ILTA 2025 survey captured the inflection point. Eighty percent of firms reported using or exploring generative AI. Half reported no formal AI policy. Those two numbers describe a partnership in which adoption is happening faster than governance, and where the governance vacuum is being filled at the group level by whoever decided to act first.

The signal is not the adoption rate. The signal is the compensation tension that follows. Two practice groups that bill different realization rates against similar matter types invite a comp-committee conversation no managing partner enjoys. Recruitment messaging that promises one thing across a firm where the lived experience varies by group invites a different conversation with laterals. The fracture is not technological. The fracture is political, and it arrives on the partnership’s calendar approximately one fiscal year after the first group commits to its first material AI deployment.

Inside the firm, the divergence is already concrete. One practice group, pressed by a client to deliver AI-driven efficiencies, has mandated tool use and is rebuilding workflows accordingly. Other groups have not opened the conversation. All of them sit inside the same partnership. The comp committee will eventually have to reconcile what the calendar already shows.

The leadership challenge in this hazard is velocity matching, not adoption. Getting compatible adoption velocities across groups is a different problem from getting any adoption at all, and it is not solved by the same instruments.

Hazard 5 — The Asymmetric Stakes

The price of being wrong exceeds the benefit of being right.

Mata v. Avianca was the warning shot: a Southern District of New York judge, a $5,000 sanction, and the first widely circulated story of fabricated citations submitted to a federal court. Charlotin’s database has been the running count ever since: well over a thousand decisions tracked across more than a dozen countries, with new entries arriving at the pace of dozens per month. Johnson v. Dunn, decided in the Northern District of Alabama in July 2025, extended the line into BigLaw. A practice-group co-leader at a large, well-regarded firm signed a motion containing fabricated citations. The court signaled that monetary sanctions are no longer sufficient to deter AI-generated errors, and that future cases may see referrals to bar counsel and other escalations.

The signal is not the headline cases. The signal is the malpractice underwriting questionnaire. Carriers have started asking which tools a firm uses, for which matter types, with which verification process. Some are writing AI-specific exclusions. The Federal Court of Canada has issued a notice to the parties and the profession on AI in court proceedings; provincial superior courts have followed; U.S. district courts have done the same on a docket-by-docket basis. Disclosure of AI use is no longer optional in many jurisdictions, and in some it must include disclosure of how the AI was used.

The hazard lives in the loss function. As long as one fabricated citation costs more than ten useful drafts save, the rational play is to avoid the tool — and that avoidance is what produces the shallow utilization the survey data keeps measuring. The accountability avoidance pattern McKinsey identifies is rational behavior in an asymmetric environment. Resolution is not the elimination of risk. Resolution is clarity about who is responsible for which verification, at which step, with what record.
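The loss-function argument can be written as a small expected-value sketch. Everything here is an assumption made for illustration: values are expressed in units of one draft's worth of savings, the ten-to-one cost ratio is the threshold named above, and the per-draft error rates with and without verification are hypothetical.

```python
def expected_net_value(drafts: int, saving_per_draft: float,
                       p_fabrication: float, cost_of_fabrication: float) -> float:
    """Expected net value of using the tool on `drafts` drafts.

    Assumes each draft saves `saving_per_draft` and independently carries a
    probability `p_fabrication` that a fabricated citation gets filed.
    """
    return drafts * (saving_per_draft - p_fabrication * cost_of_fabrication)


def break_even_probability(saving_per_draft: float, cost_of_fabrication: float) -> float:
    """Per-draft error rate above which avoiding the tool is the rational choice."""
    return saving_per_draft / cost_of_fabrication


# Units are "one draft's worth of savings"; all values are hypothetical.
saving = 1.0
cost = 10.0  # the text's threshold: one fabrication costs ten drafts' savings

print("break-even error rate:", break_even_probability(saving, cost))

# Assumed per-draft error rates without and with a recorded verification step.
for label, p in [("unverified", 0.15), ("verified", 0.01)]:
    print(label, expected_net_value(10, saving, p, cost))
```

The sketch shows why clarity about verification matters more than the tool itself: the savings per draft do not change between the two cases, only the probability that an error survives to filing. The higher the real cost of a fabrication, the lower the tolerable error rate.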

Five hazards. None of them appear on the McKinsey or HBR map. The path through them is what the next section describes.

The Path: What Implementation Actually Looks Like

Six moves. Each one resolves a hazard. None of them resolve all the hazards. The work has to be done in sequence and in combination.

Move 1 — Start with measurable operations before client-facing work

Knowledge management. Marketing. Finance. IT. Recruiting. These are the areas of the firm where measurement is possible, where the stakes are bounded, and where the team can build the muscle for measurement before pointing AI at client matters.

The reason this comes first is not theoretical. A firm that cannot measure value in its own back office has no business claiming to measure value in client work. Most firms skip the step anyway, because the political appeal of a client-facing pilot is too strong. There is a partner who wants it. There is a vendor who will demo it. There is a press release in the budget. The internal pilot is less photogenic. It is also where the failure modes show up cheaply.

What measurable internal deployments produce is more valuable than the deployments themselves. Real numbers. Real failure modes. Real governance precedents. The early adopters of internal AI become the people who teach the practice groups, because they have already had the embarrassing first conversation with privacy, the awkward second conversation with risk, and the corrective third conversation with finance.

A firm that has not staffed its internal AI team before standing up its client-facing AI team is not running an enterprise AI program. It is running a procurement exercise with extra steps. The move that says “we will start where it matters most” sounds bold and usually fails. The starting point that matters most is the one where mistakes cost the least.

Move 2 — Run the Task Audit at the practice-group level, not the firm level

A litigation matter is not an M&A transaction is not a regulatory filing is not an IP prosecution. The work is structurally different. The AI fit is structurally different. The realistic adoption pace is structurally different. Firm-wide rollouts produce shallow utilization because they ignore those differences and try to deploy one set of tools, with one set of metrics, across groups that need different things.

What the federated version looks like is unglamorous. Each practice group decomposes its own work into the activities that compose a matter — research, drafting, review, analysis, communication, project management. Each group runs its own audit of where AI fits, where it does not, and where the answer is uncertain. Litigation may find AI fits document review and timeline construction; M&A may find AI fits diligence summaries and disclosure schedule drafts; tax may find narrow but high-value uses around regulatory text comparison. Each group owns its own map.

This is what the HBR prescription — redesign roles, budgets, and processes — looks like when the redesign is translated into a firm. Not a firm-wide redesign. A federated one, in which the firm-level work is to set guardrails and standardize the verification, and the group-level work is to choose the use cases.

The Task Audit is the artifact. The honest version takes a quarter per group and produces something a partner can defend in a comp committee. The dishonest version takes a week, looks like a deck, and ages badly.
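One way to picture the artifact is as a small per-group record. The sketch below is hypothetical: the class and field names are invented for illustration, and the entries only echo the examples given above of what a group may find. It is not a description of any firm's actual audit.

```python
from dataclasses import dataclass, field
from enum import Enum


class Fit(Enum):
    """The three answers the audit allows for each activity."""
    FITS = "fits"
    DOES_NOT_FIT = "does not fit"
    UNCERTAIN = "uncertain"


@dataclass
class TaskAudit:
    """Hypothetical record of one practice group's own audit."""
    practice_group: str
    findings: dict[str, Fit] = field(default_factory=dict)

    def record(self, activity: str, fit: Fit) -> None:
        self.findings[activity] = fit

    def by_fit(self, fit: Fit) -> list[str]:
        """Activities with the given answer, sorted for stable reporting."""
        return sorted(a for a, f in self.findings.items() if f is fit)


# Each group owns its own map; entries mirror the "may find" examples above.
litigation = TaskAudit("Litigation")
litigation.record("document review", Fit.FITS)
litigation.record("timeline construction", Fit.FITS)
litigation.record("client communication", Fit.UNCERTAIN)  # placeholder entry

m_and_a = TaskAudit("M&A")
m_and_a.record("diligence summaries", Fit.FITS)
m_and_a.record("disclosure schedule drafts", Fit.FITS)

for audit in (litigation, m_and_a):
    print(audit.practice_group, "->", audit.by_fit(Fit.FITS))
```

Keeping "uncertain" as a first-class answer is the design choice that matters here. It lets a group record what it does not yet know instead of forcing a yes or no that ages badly.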

Move 3 — A single accountable owner per practice group, not a committee

Committees defer. Owners decide. McKinsey’s number — one in six organizations without a clear C-level AI owner — is worse in legal, because even where an owner exists at the firm level, the partnership structure dilutes accountability. The committee is the partnership’s default response to a contested decision, and AI adoption is a contested decision.

What works is a named partner in each practice group who owns AI decisions for that group. Not a committee chair. Not a project sponsor. Not a steering-group member. An owner — someone whose performance review includes a line item for the group’s AI capability development, and whose decisions do not require a partnership-wide vote to take effect inside the group. The owner reports up to a chief AI officer or equivalent at the firm level, but the accountability lives at the group level because that is where the work lives.

This configuration is unfashionable in firms that prefer consensus. It is also the only configuration that produces decisions on the timeline AI requires. The alternative is a steering committee that meets monthly, defers two of the three decisions on the agenda, and ratifies the decision a partner already made between meetings.

The named-owner model has a useful side effect. Partners who own decisions become accountable for outcomes. Partners on committees do not. Functional AI capability over the next three years will sit with firms that staffed for ownership early, not with firms that staffed for governance theater.

Move 4 — Redesign the apprenticeship deliberately

The apprenticeship gap does not close by accident. It closes by design — or it does not close.

The redesign question is not how the firm automates junior work. The redesign question is what juniors do instead, such that they emerge with the judgment senior practice requires. The automation question is the easy one. The redesign question is where the actual capacity sits.

What the redesign looks like in practice is structural. A second-year associate who used to spend a third of their hours on first-pass review now spends a third of their hours on something else. The “something else” has to do for the second-year’s development what the first-pass review used to do. Possibilities exist. Supervised secondary review, where the associate critiques the AI’s output and learns to spot what a senior would spot. Structured client-interaction time, where the associate develops the judgment that is hardest to automate. Deliberate cross-practice exposure, where the associate sees how a deal partner and a litigator weight the same fact pattern differently. None of these is automatic. All of them require partner time, which is the scarcest resource in a firm.

The cost of the redesign is real. The cost of not redesigning is realized in five years, when the firm tries to promote a class of senior associates and discovers their calibration is not where it needs to be. The redesign work carries into the Role Redefinition section below, where role redefinition becomes the unit of analysis.

Move 5 — Address pricing before clients force the conversation

The general counsel office is going to ask. Better to have an answer than to be asked. Better to have proposed the answer than to be in defensive negotiation when the question arrives, because the question will not arrive politely.

Fixed fees. Success fees. Capacity arrangements. AI-augmented hourly with a productivity discount. The menu exists. The choice is which structure fits which matter type and which client. The choice is also which structure preserves margin while signaling that the firm has thought seriously about the productivity assumption clients are making.

Pricing is not a finance department problem. Pricing is a partnership problem, because the realization-rate consequences of every pricing experiment land on individual partner P&Ls. The right pattern is to run experiments inside specific practice groups, with specific clients, on specific matter types, and harvest the data before the pricing committee tries to set a firm-wide policy. Firm-wide policy on pricing arrives last, after the experiments.

This is uncomfortable. It is also where the strategic self-direction question gets answered. Firms that lead the pricing conversation define the market. Firms that wait inherit a market other firms defined. The Wolters Kluwer survey found 54% of lawyers expect firms to use AI efficiency for more clients or competitive pricing — the client side has already done the math. The question for the firm is whether the math gets done in the partnership’s frame or in the client’s frame.

Move 6 — Measure capability development, not hours saved

Hours saved is the metric that produced the software-industry cuts described earlier. It is the wrong metric in legal, because the answer to “what did we do with the saved hours” cannot be “we cut the people.” The model does not work if the people are gone. The whole apprenticeship hazard, the whole judgment-hollowing hazard, every reason a firm has a future at all is grounded in the people being there to develop into the next generation of senior practitioners.

What to measure instead lives at the capability layer. Practice-group adoption depth — how many lawyers in the group are using AI for substantive work, not for one-off email drafts. Output calibration scores against blinded review. Role evolution — whether roles are changing in the direction the firm intends, or drifting. Time reinvested — when a task that took ten hours takes three, where do the seven hours go, and what shows up at the end of the quarter that did not exist at the beginning.
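The time-reinvested measure can be sketched as a simple ledger. This is an assumption-heavy illustration: the ten-hour and three-hour figures come from the example above, but the destination labels and the split of the seven freed hours are invented, and a real firm would draw them from its own time entries.

```python
from collections import defaultdict


def reinvestment_summary(entries):
    """Total freed hours by destination.

    `entries` is an iterable of (destination, hours) pairs.
    """
    totals = defaultdict(float)
    for destination, hours in entries:
        totals[destination] += hours
    return dict(totals)


def unallocated_share(totals, bucket="unallocated"):
    """Fraction of freed hours with no recorded destination."""
    total = sum(totals.values())
    return totals.get(bucket, 0.0) / total if total else 0.0


# A task that took ten hours now takes three, so seven hours are freed.
freed = 10 - 3

# Hypothetical split of those seven hours.
entries = [
    ("supervised secondary review", 3),
    ("structured client interaction", 2),
    ("unallocated", 2),
]

totals = reinvestment_summary(entries)
print(totals)
print(f"unallocated share: {unallocated_share(totals):.0%}")
```

The unallocated bucket is the number to watch. Hours that are saved but never assigned a destination are the ones most likely to turn into a headcount conversation later.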

The five-to-one ratio from the executive quoted in the McKinsey report surfaces at this move. Five-to-one is not a budget rule. It is a measurement principle. If the dollar count on the people side is small, the measurement on the people side will be small, and the capability the people side is supposed to build will not develop. A firm that spends one dollar on tools and twenty cents on capability development is running a one-to-five program with extra slides, not a five-to-one program.
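The arithmetic behind that last sentence is short enough to check directly. The sketch below assumes the ratio is read as people-side dollars per technology dollar, and the function name is hypothetical. Exact fractions are used so that twenty cents does not pick up floating-point noise.

```python
from fractions import Fraction

TARGET = Fraction(5, 1)  # people-side dollars per technology dollar


def people_to_tech_ratio(people_spend: Fraction, tech_spend: Fraction) -> Fraction:
    """Dollars spent on people, process, and verification per dollar of technology."""
    if tech_spend <= 0:
        raise ValueError("tech_spend must be positive")
    return people_spend / tech_spend


tech = Fraction(1)          # one dollar on tools
people = Fraction(20, 100)  # twenty cents on capability development

ratio = people_to_tech_ratio(people, tech)
print(f"people:tech = {ratio.numerator}:{ratio.denominator}")  # prints 1:5
print("meets five-to-one:", ratio >= TARGET)

# People-side spend still missing per technology dollar to reach the target.
shortfall = TARGET * tech - people
print(f"shortfall per tech dollar: ${float(shortfall):.2f}")
```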

Role Redefinition Through Live Operating Questions

The Production Layer cannot be installed. It has to be developed.

Savine’s prescription assumes installation — build the layer, fund it five-to-one, install the agentic control plane, run the pipeline. Inside a firm, the layer is not infrastructure. The layer is a set of capabilities held inside a set of roles that are changing shape faster than HR can document them. The Production Layer in a law firm looks less like a new platform team and more like existing roles being re-aimed at capability development, with new measurement underneath.

The work is still early. That matters. If the last mile were mainly a deployment problem, the evidence inside a firm would appear as tool rollouts, usage charts, and completed implementation plans. The evidence looks different inside the work. It appears as roles changing shape before the metrics are settled, and as operating questions the organization has to learn how to answer before outcomes are clean enough to report. Here’s what my team is doing to build the Production Layer in the firm I work in, and what the operating questions look like in each case:

Conversation sophistication scoring. We are building scoring metrics across LLM conversations in the firm’s AI products. The unresolved question is not whether conversations can be scored — they can. The hard question is which scores should change what happens next. A metric that does not connect to an intervention becomes dashboard ornamentation. A useful metric tells the organization where users need better scaffolding, where a workflow is too vague, where output calibration is weak, or where a product is encouraging shallow prompting. Someone has to decide what better conversations mean, what action follows the signal, and who owns the intervention. That deciding is role design, not analytics.
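The test described above, that a score has to connect to an intervention and an owner, can be expressed as a small check. The sketch is hypothetical throughout: the class, the signal names, the interventions, and the owners are placeholders rather than the firm's actual scoring scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoredSignal:
    """A conversation metric plus what it is supposed to trigger."""
    name: str
    intervention: Optional[str] = None  # what changes when the score moves
    owner: Optional[str] = None         # who is responsible for that change

    @property
    def actionable(self) -> bool:
        # A signal counts only if both an action and an owner are named.
        return bool(self.intervention and self.owner)


# Placeholder signals, interventions, and owners for illustration.
signals = [
    ScoredSignal("shallow prompting",
                 "add prompt scaffolding to the product",
                 "Learning Enablement"),
    ScoredSignal("weak output calibration",
                 "schedule blinded review of sampled outputs"),  # no owner yet
    ScoredSignal("average conversation length"),  # no action, no owner
]

ornamental = [s.name for s in signals if not s.actionable]
print("dashboard ornamentation:", ornamental)
```

A metric with an intervention but no owner is flagged the same way as a metric with neither, which matches the point that deciding who acts is role design and not analytics.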

Technology Training becomes Learning Enablement. We’re moving from technology training to learning enablement. The old mandate was organized around software instruction. The new mandate is organized around business outcomes, particularly where AI software changes the work itself. The work moves from showing people how to use the tool to helping groups develop the capacity to get a better business result from the tool — which includes adoption, workflow fit, practice-group context, measurement, and reinforcement after the formal training ends. The Production Layer in this case looks less like new infrastructure and more like an existing team with a redefined mandate and a different measurement underneath.

Merging Business Intelligence, AI Enablement, and Application Development. We are reconfiguring a programming leadership role into an Engineering Manager with a portfolio across BI, AI Enablement, and Applications. The team is small. The mandate is to create an AI-native development process — which is not the same as a faster ticket pipeline. Artifacts are cheap. Judgment and systems are the hard part. The role is not only about producing more code or more applications. It is about building a system where AI changes the development process without hollowing out architecture, review, accountability, or product judgment. The hazard inside the role is the same hazard inside the firm: surface productivity gain at the cost of underlying capability.

These are signs of the terrain, not success stories yet. In each case, the software is the easy part to name. The hard part is deciding what new capacity the organization needs, which role owns it, and how the firm will know whether that capacity is improving. The Production Layer thesis points at this. It cannot specify it. The specification has to come from inside the practice, inside the roles, and inside the unresolved questions that appear before outcomes are clean enough to report.

Closing

The map drawn here is real. The hazards are observable inside the work. The path is operational. None of that does the work.

Generic diagnoses produce generic responses. The McKinsey numbers will be quoted at every legal industry conference for the next eighteen months. The HBR last-mile language will become a slide-deck staple, then a panel topic, then a vendor positioning line. None of that will move a single firm forward, because none of it engages with the structural features that determine whether the firm has a future on the other side of the curve.

What moves a firm forward is the development work that happens at the practice-group level, in the pricing conversation, in the apprenticeship redesign, in the named partner who decides to own a decision instead of distributing it across a committee. The Production Layer is not a build. It is a development arc. The arc takes years and runs through individual roles whose mandates are changing faster than the org chart can document them.

The five-to-one ratio at the top of this piece is the right principle. It is not the right plan. The plan has to be built inside a partnership, against billable-hour math the partnership does not want to give up, inside a regulatory landscape that does not allow shortcuts, and with case law and malpractice carriers ratcheting up the cost of being wrong while the survey data ratchets up the expectation of efficiency. The work is not optional. The pace of the work is not negotiable. The map is the first artifact. The path is the second. Neither is the work.

AI does not make a law firm better. It exposes whether the firm was already doing the work.

Andrew Lewis was Here
