Friday, April 24, 2026

VBCC 20260424 Strategies from OCM to V2 to V3...

 

First-pass strategic take

Yes, EOM is meaningfully different from OCM, but not enough to be reassuring. It is better designed, narrower, more patient-centered, more equity-aware, and more digitally ambitious. But the early evidence still looks like OCM 2.0 rather than a true new operating system for value-based cancer care: modest payment reductions, no visible quality/utilization gain yet, and net losses once care-management and incentive payments are counted.

That is actually a strong panel premise: VBCC has not failed as an aspiration; it has failed because the measurement layer is still too weak, too claims-centric, and too poorly connected to what oncology patients and clinicians actually experience.


1. OCM vs EOM: what changed, and should we be reassured?

OCM: the disappointment

The final Brooks/JAMA paper is devastating in a quiet, technocratic way. OCM ran from 2016–2022, included more than 200 practices, covered about one-fourth of systemic cancer treatment in FFS Medicare, and produced a statistically significant but modest $616 reduction per episode, increasing to $1,282 in the final performance period. But there were no statistically significant differences in hospitalizations, ED visits, or quality, and after MEOS and performance payments, Medicare had an estimated $639 million net loss.

That says: OCM learned how to bend some spending, but not enough to pay for itself, and not with a measurable patient-quality signal. The Thomas/Ward critique anticipated part of this: OCM was not really a true bundle; it preserved FFS and layered on a complicated shared-savings/payment overlay, leaving practices partly paid to do more and partly rewarded for doing less.

EOM: the real improvements

EOM is not just a rename. It makes several real changes:

Design feature | OCM | EOM
Cancer scope | Broad systemic therapy episodes | Narrowed to seven cancer types
Hormonal therapy-only episodes | Included | Generally excluded; focus on systemic chemotherapy, not hormonal-only
Risk | Shared savings; downside risk evolved | Downside risk from the start
MEOS | $160 PBPM | $110 PBPM, or $140 PBPM for dual eligibles
Equity | Less central | HRSN screening, equity plans, dual-eligible enhanced payment
Patient-reported outcomes | Not central | ePRO collection and monitoring required
Digital/data strategy | Less mature | Requires quality, clinical, and sociodemographic data; CEHRT and CQI use
Participation | >200 practices | As of March 2026, 28 practices and 1 commercial payer

CMS describes EOM as focused on seven cancers, six-month episodes, total-cost accountability, MEOS payments, required 24/7 access, navigation, evidence-based guidelines, comprehensive care plans, HRSN screening, ePROs, CQI data use, and certified EHR technology. (Centers for Medicare & Medicaid Services)

The non-reassuring part

The first EOM evaluation already has a familiar smell. CMS’s first at-a-glance report for Performance Period 1, July–December 2023, says EOM likely reduced payments but produced a net loss to Medicare after MEOS and incentive payments, and did not affect quality or utilization measures, including hospice use before death or acute care utilization. (Centers for Medicare & Medicaid Services)

That is the core answer: EOM is more sophisticated, but the early signal is still not a proof of VBCC. It is a better-engineered experiment, not yet a successful value model.


2. Are there new and better metrics in EOM?

Yes, but with a big caveat. EOM adds the right categories of measurement, but many are still not mature, outcome-forward, or digitally reliable enough to bear payment accountability.

The good news is that EOM formally moves beyond pure claims logic. It requires ePROs, HRSN screening, clinical data elements, sociodemographic data, and quality reporting. CMS’s participant resources now include an EOM Clinical Data Elements Guide, Sociodemographic Data Elements Guide, Quality Measures Guide, and cost/quality performance data files for Performance Periods 1–2. (Centers for Medicare & Medicaid Services)

Your prior measurement report frames this well: the plausible springboard is the convergence of EOM redesign requirements, digital quality measures, and oncology interoperability infrastructure such as USCDI+ Cancer and mCODE.

But the caveat is decisive: EOM has better measurement ingredients, not yet better measured outcomes. The model asks for ePROs and HRSN data, but the early evaluation did not show measurable improvement in acute care utilization or quality. (Centers for Medicare & Medicaid Services)

The most promising domains for real VBCC metrics are the ones you already identified:

ePRO symptom/toxicity control. This is probably the single strongest new metric domain because it connects directly to patient experience, acute care avoidance, and clinical response workflows. Your report proposes measuring both completion of standardized symptom assessments and timely response to severe symptom alerts. (A minimal sketch of computing this paired metric follows this list.)

Physical function preservation. More meaningful than “patient satisfaction,” and closer to what cancer patients care about: can I function, work, walk, sleep, eat, and live my life?

Avoidable acute care utilization. Still useful, but claims alone are crude. ED visits and admissions become more meaningful if paired with symptom-triggered preventability review or ePRO context.

Evidence-based regimen/pathway concordance. This could matter greatly, but it requires structured stage, biomarkers, treatment intent, and exception logic. Otherwise, it becomes a documentation game.

Goal-concordant end-of-life care. Claims can measure late chemotherapy and hospice timing, but the real outcome is whether care matched patient goals. Your report rightly pairs claims-based EOL metrics with structured goals-of-care documentation.

Financial toxicity. This is a missing VBCC domain. If cancer care bankrupts or destabilizes a patient, “value” has not been achieved. Your report proposes validated financial toxicity screening plus navigation response.
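
As a minimal sketch of how the paired ePRO metric from the first domain above could be computed: the record schema and the 48-hour response window here are illustrative assumptions of mine, not EOM specifications.

```python
from datetime import timedelta

def epro_metrics(records, response_window=timedelta(hours=48)):
    """Completion rate of standardized symptom assessments, plus the share of
    severe-symptom alerts answered within the response window.

    Hypothetical record schema: {"completed": bool,
    "severe_alert_time": datetime | None, "response_time": datetime | None}.
    """
    if not records:
        return 0.0, None
    completion_rate = sum(r["completed"] for r in records) / len(records)
    alerts = [r for r in records if r.get("severe_alert_time")]
    if not alerts:
        return completion_rate, None  # no severe alerts in the period
    timely = sum(
        1 for r in alerts
        if r.get("response_time") is not None
        and r["response_time"] - r["severe_alert_time"] <= response_window
    )
    return completion_rate, timely / len(alerts)
```

The point of pairing the two rates is that collection without response is a data exercise, not a clinical service.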

So the answer is: EOM points toward better metrics, but it has not yet demonstrated better outcomes. It is more promising as a measurement platform than as a proven payment model.


3. What should “V3” of CMS value-based oncology look like?

Your strongest forward-looking thesis is:

OCM tested whether care-management payments plus shared savings could make oncology cheaper. EOM tests whether a narrower, risk-bearing, equity-aware model with ePROs can do better. V3 should test whether digitally computable, patient-centered oncology outcomes can finally become the basis of value-based cancer care.

In other words, V3 should not be merely EOM with different benchmarks. It should be a model where measurement is the product.

V3 should have five design principles

First, V3 should make oncology clinically legible. It cannot rely mainly on claims. It needs structured diagnosis, stage, biomarkers, line of therapy, treatment intent, progression/recurrence, performance status, and death date. EOM’s clinical data element work and mCODE/FHIR alignment are early scaffolding, but V3 should make this the required substrate. Your report puts it neatly: oncology’s real clinical context is often buried in notes and cannot be reliably inferred from claims.

Second, V3 should use AI/NLP as measurement infrastructure, not as a magic wand. The panel topic is strongest if you say: AI review of EHRs may finally allow VBCC to measure what claims cannot—stage, progression, toxicity, treatment intent, ECOG-like function, adverse events, biomarker appropriateness, and goals-of-care discussions. But AI-derived metrics must be validated, audited, version-controlled, and bias-tested. Otherwise V3 becomes a digital façade: computable, impressive, and wrong.

Third, V3 should use a small core measure set. Something like 8–12 measures, not 50. Candidate domains: symptom control, functional preservation, avoidable acute care, evidence-based regimen appropriateness, time to treatment, EOL goal-concordance, financial toxicity, and equity/whole-person supports. That aligns with your prior report’s proposed VBCC portfolio.

Fourth, V3 should separate “drug price exposure” from “care delivery performance.” OCM and EOM struggle because oncology spending is dominated by therapies whose prices and clinical indications are not fully controlled by the practice. EOM’s early report says systemic cancer treatment drug spending accounts for about 58% of EOM episode costs, and participants reported focusing on drug spending interventions. (Centers for Medicare & Medicaid Services) V3 should distinguish: Did the practice choose appropriate therapy? Did it manage toxicity? Did it avoid preventable acute care? Did it align care with patient goals? It should not simply punish a practice because the correct therapy is expensive.

Fifth, V3 should make equity measurable without making safety-net care financially toxic to providers. EOM adds HRSN screening and dual-eligible enhanced MEOS payments, which is directionally right. But V3 should require equity stratification and closed-loop resource referral metrics while protecting practices that care for medically and socially complex populations. CMS describes EOM as requiring HRSN screening, equity plans, expenditure/utilization reports to identify disparities, and higher MEOS payments for dual eligibles.


A sharper conference thesis

Here is the central framing I would use:

“Value-based cancer care has been stuck in an awkward middle stage: payers can measure cost, but not value; clinicians can describe value, but not compute it; patients can feel value, but it rarely appears in payment models. OCM showed that care redesign can produce modest savings without measurable quality improvement. EOM improves the model by adding downside risk, ePROs, health-related social-needs screening, and richer data requirements, but early results still look financially and clinically inconclusive. The next version of VBCC will depend less on another tweak to shared savings and more on whether AI-enabled EHR review, ePROs, mCODE/USCDI+ Cancer, and digital quality measures can produce auditable, patient-centered, oncology-specific metrics at scale.”

That is a strong panel because it is neither naïvely optimistic nor drearily cynical.


Draft paragraph-long panel proposal

Are We Finally Ready for Real Value-Based Cancer Care? From OCM and EOM to AI-Enabled Measurement.
For more than a decade, value-based cancer care has promised to reward better outcomes rather than higher volume, yet progress on the ground has remained limited. The Oncology Care Model produced modest reductions in Medicare episode payments but no significant improvements in utilization or quality, and net losses to Medicare after model payments. Its successor, the Enhancing Oncology Model, adds important improvements—downside risk, narrower cancer scope, electronic patient-reported outcomes, health-related social-needs screening, enhanced services, and richer clinical data requirements—but early results remain inconclusive. This panel will ask whether the missing ingredient has been measurement itself: the ability to capture oncology stage, biomarkers, treatment intent, toxicity, function, goals of care, financial toxicity, and equity outcomes in computable, auditable form. We will explore whether AI-enabled EHR review, ePROs, mCODE/USCDI+ Cancer, and digital quality measures can support a true next-generation “V3” oncology value model—one that moves beyond claims-based cost control toward patient-centered, clinically meaningful cancer care performance.


Possible panel title options

Best straightforward title:
From OCM to EOM to V3: Can Better Measurement Finally Make Cancer Care Value-Based?

More provocative:
Value-Based Cancer Care’s Missing Operating System: Metrics, AI, and the Road Beyond EOM

Most “conference program” friendly:
The Next Generation of Value-Based Cancer Care: Lessons from OCM, Early EOM, and AI-Enabled Outcome Measurement

Most Bruce-style:
After OCM and EOM: Is Value-Based Cancer Care Still Waiting for Its Measurement System?


Suggested panel architecture

You as moderator/host should set up the tension: “We have had a decade of VBCC conferences, but the field still often means payer coverage plus ASP drug pricing. What would make it real?”

Ideal panelists:

  1. OCM evaluation author
    Gabriel Brooks would be excellent, because the JAMA paper is now the cleanest empirical anchor. Nancy Keating would also be very strong, especially for evaluation design and interpretation. The Brooks paper lists Brooks, Trombley, Landrum, Liu, Simon, and Keating among key authors, with Brooks, Trombley, and Keating drafting the manuscript.
  2. CMMI/EOM representative
    Someone who can speak to EOM’s design choices, first evaluation, clinical data elements, ePROs, HRSN screening, and what CMS hopes to learn by 2030. Even if they cannot speculate about “V3,” they can discuss what EOM is designed to test.
  3. Community oncology practice leader
    Preferably someone actually implementing EOM, ePROs, navigation, urgent care access, and “call us first” workflows. This prevents the panel from becoming a policy seminar in the clouds.
  4. Measurement/interoperability person
    Someone from ONC/ASTP, NCI’s USCDI+ Cancer effort, HL7/mCODE, NCQA, or a serious EHR/FHIR implementer. Their role: what needs to be standardized before oncology outcomes can be measured without heroic chart abstraction?
  5. Patient-centered outcomes / financial toxicity voice
    Ideally someone working on ePROs, symptom monitoring, financial toxicity, or patient advocacy. This keeps the panel honest: VBCC should not become “cost-based cancer care with nicer adjectives.”

My bottom-line answer to your three questions

OCM vs EOM: EOM is a real improvement in design, but not yet reassuring in results. It narrows the model, adds downside risk, requires ePROs and HRSN screening, and builds a richer data strategy. But early EOM still shows payment reductions offset by participant payments and no clear quality/utilization improvement.

Metrics in EOM: EOM has better metric ingredients—especially ePROs, HRSN screening, clinical data elements, sociodemographic data, and CQI reporting—but the decisive step from “data collection” to “valid outcome measurement” is still ahead.

V3 forecast: V3 should be an AI/digital-measurement-enabled oncology value model: clinically rich, patient-centered, equity-stratified, auditable, and built on structured oncology data plus validated AI extraction from EHR text. The goal is not simply to save money on chemotherapy episodes, but to measure whether cancer care actually improves symptoms, function, appropriateness, goal-concordance, financial well-being, and avoidable acute care—without drowning practices in manual reporting.

 

Thursday, April 23, 2026

Appendix S - A Full-on Rewrite by AI

I gave Chat GPT some drafts of AMA CPT Appendix S for software services and asked it to make the best rewrite it could, aiming to fix a number of problems. (The copy below is its second attempt at this rewrite.)

###

 


Appendix S - ReWritten Anew, by Chat GPT

Taxonomy for Software Output in Medical Services and Procedures

This taxonomy provides guidance for the use of the terms assistive, augmentative, and autonomous in CPT code descriptors and related procedural language for medical services and procedures that include software-enabled analysis.

Appendix S describes the clinical role of software output in medical services and procedures. It does not define “artificial intelligence,” “machine learning,” or other general classes of software. The relevant question is not whether a product is labeled “AI,” but what the software output does in clinical care.

A term from Appendix S should be used only when the software output is relevant to the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions and is pertinent to patient management in the clinical context described by the code. Software used only for administrative, financial, scheduling, workflow, or general productivity purposes does not fall within this taxonomy solely because it is used in health care.

The terms in Appendix S describe the software-enabled analytical role at issue. In some cases, that role may correspond closely to the reportable service. In other cases, it may describe only one component of a broader service. Accordingly, Appendix S does not by itself determine whether a service is separately reportable or require reclassification of an already coded physical service or device solely because software is used within it.

Use of a term from Appendix S should be supported by evidence appropriate to the role of the software output in clinical care and consistent with CPT code criteria. Depending on the claimed function of the software output, such evidence may include technical validity, analytical validity, clinical validity, and, when appropriate, evidence of impact on patient management or clinical utility.

For purposes of this taxonomy, a parameter is a quantitative or categorical output derived by software and reported for clinical use, such as an index, score, classification, risk estimate, measurement, or other metric. Not every transformed output is a parameter. The key question is whether the output is a distinct reportable result used in clinical care.

For purposes of this taxonomy, automatic or automatically means that software derives its output without concurrent human involvement in the analytic process from input to output. Automatic derivation may occur in either augmentative or autonomous services and does not, by itself, establish autonomous status.

Assistive

Services and procedures performed by software for the physician or other qualified health care professional (QHP) are assistive when the software output detects, identifies, highlights, prioritizes, or otherwise brings clinically relevant data to the attention of the physician or other QHP without deriving a distinct reportable parameter and without independently generating a clinical interpretation or conclusion.

Assistive software supports physician or QHP performance. It may improve accuracy, precision, consistency, efficiency of review, inter-observer variability, or timeliness of recognition of clinically relevant data, while the reportable outcome of the primary clinical service may remain unchanged.

When the service is reported as a professional service, the output from assistive software requires physician or other QHP interpretation and report. Assistive software does not independently establish a diagnosis, recommendation, or management action.

Assistive outputs may include indications such as the presence, location, or salience of clinically relevant data. Terms such as “likelihood of,” “suggestive of,” or “risk for” may be consistent with assistive output only when they serve to draw attention to clinically relevant data rather than provide a definitive interpretation, conclusion, or management recommendation.

Evidence supporting assistive software should be appropriate to the claimed role of the output in clinical care. Where the assistive output does not itself alter the reportable clinical conclusion, technical or analytical validation showing benefit to clinical performance may be sufficient. Where the output implies a clinically significant inference, additional clinical validation may be appropriate.

Augmentative

Services and procedures performed by software for the physician or other QHP are augmentative when the software output derives a distinct quantitative or categorical parameter that is qualitatively different from the input and is used in clinical care, but does not independently provide a definitive clinical interpretation, conclusion, or management recommendation.

An augmentative output must be more than a restatement, summation, averaging, display, transmission, or descriptive presentation of input data. It must provide a distinct reportable result, such as a scale, index, classification, measurement, score, probability estimate, risk estimate, predictive measure, prognostic measure, mapping output, or other metric used in clinical care.

Augmentative output is clinically meaningful when the derived parameter contributes to patient management in the clinical context described by the code. Clinical meaningfulness may be supported by equivalence to a metric already used in current practice, consistency with clinical practice guidelines or accepted clinical frameworks, or evidence that a novel output has clinically relevant impact on patient management.

For augmentative services, evidence should show more than technical or analytical performance alone. It should support the clinical relevance of the derived parameter for the patient population, service, and clinical context described by the code.

Software with augmentative outputs may operate automatically or may require physician or other QHP interaction during the analytic process. The related physician work may be described by the same code, captured in another code, or incorporated into a broader service, depending on code structure.

Note: The presence of an augmentative output within a broader coded service does not by itself require separate reporting of the software function or reclassification of the broader service.

Autonomous

Services and procedures performed by software for the physician or other QHP are autonomous when the software independently generates a clinically meaningful interpretation, conclusion, recommendation, or management action from clinical input data without concurrent physician or other QHP involvement in the analytic process.

Autonomous output goes beyond derivation of a parameter alone. It independently interprets data or derived parameters and produces a clinically operative result, such as a definitive diagnostic conclusion, a specific management recommendation, or initiation of a management action.

Clinical meaningfulness for autonomous output should be supported by evidence appropriate to the role of the software in patient care. Such support may include consistency with clinical practice guidelines or accepted clinical standards, direct evidence of impact on patient management, or other evidence sufficient to support the degree of autonomy being described.

Automatic derivation is necessary but not sufficient for autonomous classification. A service is not autonomous merely because software operates automatically. It is autonomous when the software independently provides the clinically operative interpretation, conclusion, recommendation, or action.

There are three levels of autonomous medical services and procedures:

Level I

The output of autonomous software includes a definitive diagnostic conclusion and/or a specific management recommendation. The conclusion or recommendation is contestable and requires physician or other QHP judgment to implement, reject, or otherwise act upon it.

Level II

The output of autonomous software includes initiation of a management action based on the software’s interpretation or conclusion, but the service provides a reasonable opportunity for physician or other QHP review, interruption, or override before the action is carried out.

Level III

The output of autonomous software includes automatic initiation and continuation of a management action based on the software’s interpretation or conclusion, such that physician or other QHP oversight occurs at a supervisory level rather than through case-by-case concurrence before each action. Level III services require ongoing physician or other QHP oversight appropriate to the clinical circumstance and the management action being performed.

Note: Autonomous classification describes the role of the software output in the service at issue. It does not by itself require that a broader physical device, procedure, or other already coded service be separately reclassified or reported under a software-specific code family solely because autonomous software is embedded within that service.


Summary Table

Primary role of software output
  • Assistive: detects, highlights, prioritizes, or otherwise brings clinically relevant data to attention
  • Augmentative: derives a distinct quantitative or categorical parameter used in clinical care
  • Autonomous: independently interprets data or derived parameters to generate a clinically operative conclusion, recommendation, or action

Distinct reportable parameter derived by software
  • Assistive: no
  • Augmentative: yes
  • Autonomous: may include derived parameters, but autonomous status depends on the independent interpretation, conclusion, recommendation, or action

Independent clinical interpretation or conclusion by software
  • Assistive: no
  • Augmentative: no
  • Autonomous: yes

Machine-initiated management action
  • Assistive: no
  • Augmentative: no
  • Autonomous: may occur in Levels II and III

Physician or other QHP interpretation/report
  • Assistive: required for the professional service when applicable
  • Augmentative: may be required, or may be incorporated into another code or broader service
  • Autonomous: not required for the autonomous output itself, although physician or other QHP oversight may still be required

Evidence needed
  • Assistive: appropriate to the claimed clinical support function; often technical or analytical performance plus benefit to clinical performance
  • Augmentative: evidence sufficient to support clinical meaningfulness of the derived parameter in the code's clinical context
  • Autonomous: evidence sufficient to support the independent interpretation, recommendation, or management action claimed

 

#

This version tries to do five things better than the versions taken as input.

First, it adds an entry gate;

Second, it says clearly that the taxonomy classifies the software output function, not necessarily the whole product or service;

Third, it separates automatic from autonomous more crisply;

Fourth, it removes the awkward dependence of Autonomous on a prior Augmentative construct;

Fifth, it states expressly that embedded software does not automatically pull a broader physical service into a software-only coding framework. Those are all pressure points that emerged from the current draft and the related problems in implementation.

Can Appendix S Be Made into a Logic Flow Chart?

At an April 16, 2026, webinar on Appendix S, AMA management said they may someday do "beta testing" with sample codes to see whether they classify cleanly, and/or derive a logic flow chart (if possible) from Appendix S for rule-based classification.

Here's what Chat GPT 5 made of the test on April 23, 2026, using the most recent clean copy of the Appendix S revision proposal.

###

Yes — a provisional decision tree is possible, and your instinct is basically right: the cleanest place to start is usually Autonomous first, because that category is the most outcome-oriented and easiest to test at the top level. Does the software itself generate a clinically meaningful interpretation/conclusion and then recommend or initiate management? If yes, you are in Autonomous-land; if not, you fall back to the Assistive/Augmentative split. That structure fits the text of the clean Appendix S.

But the exercise also shows why the document is hard to operationalize. 

The taxonomy mixes several different axes at once:

  • detection vs parameter generation vs interpretation;
  • physician involvement vs machine action;
  • output vs report; and
  • evidentiary sufficiency for "clinical meaningfulness."

Your own blog puts this well: AMA is trying to classify several disjunct concepts simultaneously, and that makes clean bucket logic unstable.

Here is the best decision tree I can derive from the current proposal:


Proposed Appendix S decision tree

Step 0. Is there a software output relevant to clinical care?

Before using Appendix S at all, ask whether the software output is actually used in diagnosis, cure, mitigation, treatment, or prevention, and whether it is being characterized as part of a medical service/procedure rather than merely a productivity or workflow tool. The clean draft says the software output must be useful in clinical care, and that what matters is the role of the output in patient management.

If no, Appendix S may not apply, or at least the service may not be codifiable in the contemplated way. A healthcare productivity tool is generally not codifiable in CPT.

If yes, continue.


Step 1. Does the software itself generate a clinically meaningful interpretation or conclusion?

This is the best opening gate for Autonomous. The clean text defines Autonomous as software that automatically derives parameters and independently generates clinically meaningful interpretations or conclusions without concurrent physician/QHP involvement.

If yes, go to Step 1A.
If no, skip to Step 2.

Step 1A. Does the software merely recommend, or does it act?

This splits the three levels of Autonomous.

  • Level I Autonomous: software offers a definitive diagnostic or management recommendation, but physician/QHP judgment is required to implement or reject it.

  • Level II Autonomous: software initiates a medical management action, but there is a reasonable chance to negate/override before implementation.

  • Level III Autonomous: software automatically initiates management actions that continue unless the physician/QHP intervenes.

This is also where the protocol emphasizes the crucial distinction between automaticity and autonomy. A tool can be automatic in getting from input to output without being Autonomous in the Appendix S sense.  


Step 2. If not Autonomous: does the software output derive a new quantitative or categorical parameter distinct from the input?

This is the key gate for Augmentative. The clean draft says Augmentative exists when the output represents a quantitative or categorical parameter qualitatively different from the input, and more than adding/averaging/descriptive statistics.

If yes, go to Step 2A.
If no, go to Step 3.

Step 2A. Is the output clinically meaningful?

Under the proposal, Augmentative — and by extension Autonomous — hinges on “clinical meaningfulness.” The clean draft says the output must contribute to patient management and be supported by evidence beyond merely technical/analytical validation, beyond mere statistical significance, and directly pertinent to the code characteristics.

If yes, classify as Augmentative.
If no, the taxonomy becomes unstable: the text suggests you are not yet entitled to call it Augmentative, but it is also more than merely Assistive if it truly derives a new parameter. This is one of the places where the logic frays.


Step 3. If no new parameter is derived: does the software merely detect clinically relevant data, without interpretation or conclusions?

This is the cleanest definition of Assistive. The clean text says Assistive software detects clinically relevant data without deriving a parameter, generating an interpretation, or providing conclusions. It improves physician/QHP performance, while the primary service outcome may remain unchanged.

If yes, classify as Assistive.
If no, the taxonomy again becomes unstable, because there are outputs that may structure, summarize, triage, or otherwise transform information without fitting neatly into pure “detection” or pure “parameter generation.”  


The same logic as short yes/no questions

  1. Is the software output part of patient care / medical management rather than mere workflow?

  2. Does the software itself generate a clinically meaningful interpretation or conclusion?

    • If yes: Autonomous. Then ask whether it recommends only, acts with override, or acts until stopped.

  3. If not, does it derive a new quantitative or categorical parameter qualitatively different from the input?

    • If yes: likely Augmentative, assuming clinical meaningfulness is shown.

  4. If not, does it merely detect clinically relevant data, leaving interpretation/report to the physician/QHP?

    • If yes: Assistive.

  5. If none of the above fits cleanly, the taxonomy is under-specified for that use case.
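
To make this logic concrete, here is a minimal sketch of the autonomous-first triage as a rule-based classifier. Everything in it is an assumption for illustration: the attribute names, the reduction of each gate to a boolean, and especially the collapsing of the evidentiary judgment about "clinical meaningfulness" into a single flag. It is a thinking aid, not an implementation of AMA policy.

```python
from dataclasses import dataclass

@dataclass
class SoftwareOutput:
    """Illustrative attributes of a claimed software output (all names hypothetical)."""
    clinically_relevant: bool         # Step 0: used in diagnosis/treatment, not mere workflow
    independent_interpretation: bool  # Step 1: software itself generates the clinical conclusion
    initiates_action: bool            # Step 1A: software initiates a management action
    override_window: bool             # Step 1A: clinician can negate before implementation
    derives_parameter: bool           # Step 2: distinct quantitative/categorical parameter
    clinically_meaningful: bool       # Step 2A: evidence supports clinical meaningfulness
    detects_only: bool                # Step 3: merely brings clinically relevant data to attention

def classify(o: SoftwareOutput) -> str:
    """Autonomous-first triage mirroring Steps 0 through 3 above."""
    if not o.clinically_relevant:
        return "Outside Appendix S (productivity or workflow tool)"
    if o.independent_interpretation:              # Step 1: the Autonomous gate
        if not o.initiates_action:
            return "Autonomous, Level I (recommends; clinician implements or rejects)"
        if o.override_window:
            return "Autonomous, Level II (acts, with override opportunity)"
        return "Autonomous, Level III (acts and continues until stopped)"
    if o.derives_parameter:                       # Step 2: the Augmentative gate
        if o.clinically_meaningful:
            return "Augmentative"
        return "Unstable: new parameter without demonstrated clinical meaningfulness"
    if o.detects_only:                            # Step 3: the Assistive gate
        return "Assistive"
    return "Under-specified: exposes a drafting gap"
```

Note that "automatic" never appears as a branch: consistent with the draft, automaticity is deliberately not a routing question.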


Why the logic partly works

It works because the current draft does contain a rough hierarchy of outputs:

  • Assistive = detects clinically relevant data, no new parameter, no conclusion.

  • Augmentative = produces a distinct parameter, but not a definitive interpretation/conclusion.

  • Autonomous = goes further and independently interprets/concludes, then may recommend or initiate management.

That is why starting with Autonomous makes sense. It tests the most advanced function first, then backs down to parameter generation, then backs down to detection.


Why the logic also collapses

Here is where your skepticism is justified.

1. The tree is not built on a single dimension

The categories are not based on one clean axis. They mix:

  • output type,

  • evidentiary burden,

  • physician involvement,

  • whether the machine acts,

  • and whether the software is a full service or only a component of a service.

That is why the same service can feel “assistive” in one respect and “autonomous” in another unless you are very strict about what exactly is being classified.

2. “Output” versus “report” versus “service” is slippery

Does Appendix S classify the whole service, the algorithmic analysis, or just the software output component? The "Triple A" terms are adjectives of the algorithmic analysis/output, not necessarily of the entire service. But this variance may introduce logical wobble.

3. “Clinically meaningful” is doing too much work

The clean draft makes clinical meaningfulness central for Augmentative and, by extension, Autonomous. However, this extension of "clinically meaningful" from Augmentative to Autonomous is only implied.

That means your tree has to ask not only “what does the software do?” but also “has enough evidence been shown for me to call that output clinically meaningful?” That is not a pure logical branch; it is partly an evidentiary judgment.

4. Automatic is not Autonomous

This is important and non-intuitive. Software can run automatically from input to output and still not be Autonomous. That means a flow chart that begins with “Is it automatic?” would mislead users.

5. Borderline cases do not fit neatly

As the eligible services and codes expand, so will the gray zones:

  • detection plus notification/triage,

  • structuring or summarizing evidence rather than simple detection,

  • intraoperative or intraprocedural support,

  • tools embedded inside broader physical services.

Those are exactly the cases likely to “break” a decision tree.  They point to areas where the prose can be clarified and the logic tree improved thereafter.


Your hypothesis about path dependence

I think you are right that the logic is path dependent, and in two ways.

First, Autonomous-first is the cleaner routing logic, because Autonomous is partly defined by the presence of functions that exceed Augmentative: clinically meaningful interpretation/conclusion plus recommendation/initiation of management. If those are absent, you can step down to Augmentative and then Assistive.

Second, the answer can depend on which question is asked first. For example:

  • If you ask first, “Does it generate a parameter?”, you may classify something as Augmentative before noticing that it also independently concludes and initiates action, which would push it to Autonomous.

  • If you ask first, “Does it improve physician performance?”, almost everything starts sounding Assistive.

  • If you ask first, “Is there physician involvement?”, you can get misled by the automatic/autonomous distinction.

So yes: the same service can appear to move differently through the tree depending on the entry point, which is a sign the prose taxonomy is not yet logically orthogonal.
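
The path dependence can be shown mechanically. Using the hypothetical classify() sketch from earlier, here is a second router that asks the parameter question first; for a service that both derives a parameter and independently concludes, the two routers disagree.

```python
def classify_parameter_first(o: SoftwareOutput) -> str:
    """Same boolean tests as classify(), but asks the Augmentative question first."""
    if not o.clinically_relevant:
        return "Outside Appendix S"
    if o.derives_parameter and o.clinically_meaningful:
        return "Augmentative"  # stops here; the Autonomous test is never reached
    if o.independent_interpretation:
        return "Autonomous"
    return "Assistive" if o.detects_only else "Under-specified"

# Hypothetical service that derives a risk index AND independently issues a
# definitive diagnostic conclusion (both the Step 1 and Step 2 gates are true).
svc = SoftwareOutput(
    clinically_relevant=True, independent_interpretation=True,
    initiates_action=False, override_window=False,
    derives_parameter=True, clinically_meaningful=True, detects_only=False)

print(classify(svc))                  # Autonomous, Level I (recommends; ...)
print(classify_parameter_first(svc))  # Augmentative
```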


My bottom-line judgment

Today, a usable working flow chart is possible, but only as a triage tool, not as a fully determinate classifier.

The best working version is:

  1. Does the software independently generate a clinically meaningful interpretation/conclusion and recommend/initiate management?
    Autonomous.

  2. If not, does it generate a distinct quantitative/categorical parameter beyond descriptive statistics?
    Augmentative, if clinically meaningful.

  3. If not, does it merely detect clinically relevant data for physician interpretation/report?
    Assistive.

  4. If still unclear, the case exposes a drafting gap.

That would be illuminating precisely because it shows where the taxonomy works and where it does not. 

And if the flow chart repeatedly crashes on structured-evidence, triage, embedded-software, or mixed-function tools, that failure (of today's chart) would be as valuable as success — exactly the point you made.

The most important practical fix, in my view, would be to preface any flow chart with one explicit rule:

"Classify the software output function being claimed for coding — not the mere presence of software, not the whole physical service unless that software function itself is the claimed service."

 ###

I gave Chat GPT a couple of recent Cat III codes to assess as a beta test.

###

+0764T / 0765T — AI/computerized ECG analysis for detection of cardiac pathology/cardiac dysfunction, with physician/QHP interpretation/report, tied to concurrent or prior ECG use. CMS assigned OPPS/ASC payment status effective 2025 according to vendor coding guides and public reporting. 

0903T–0905T — algorithmically generated 12-lead ECG from a reduced-lead ECG, with separate codes for global/tracing/interpretation-only configurations.

###

Using only the code text you gave, plus the current Appendix S logic, I’d characterize them this way:

0764T / 0765T

Best fit: Assistive.
In fact, the current clean Appendix S table itself uses “Algorithmic electrocardiogram risk-based assessment for cardiac dysfunction (0764T, 0765T)” as the example under Assistive.

Why that is a bit surprising, but still the document’s answer:

  • Assistive is defined as software that detects clinically relevant data without deriving a parameter, interpretation, or conclusion, and it requires physician/QHP interpretation and report.
  • The drafters intentionally allow Assistive outputs to include terms like “likelihood of,” “suggestive of,” or “risk for.” 
  • Your code summary for 0764T/0765T says AI/computerized ECG analysis for detection of cardiac pathology/cardiac dysfunction, with physician/QHP interpretation/report. That lines up closely with the Assistive bucket as the draft currently describes it.

So, under the present proposal, 0764T/0765T = Assistive, even if a lot of readers might instinctively think “risk-based assessment” sounds more augmentative. The draft has effectively pre-decided that one example.

Autonomous Level 1/2/3?
No. Not autonomous. The code text expressly keeps the physician/QHP interpretation/report in the loop, and Autonomous in Appendix S requires the software to independently generate clinically meaningful interpretations or conclusions without concurrent physician/QHP involvement.
So there is no Level I/II/III assignment here.


0903T–0905T

Best fit from the code text alone: probably Augmentative — but this is much less certain.

My reasoning:

  • These codes describe an algorithmically generated 12-lead ECG from a reduced-lead ECG.
  • That sounds like more than mere detection. The software is not just flagging a feature; it is transforming limited input into a more elaborate clinically usable output.
  • Yet the code text, as you summarized it, does not sound like the software is independently issuing a diagnostic interpretation or management conclusion. So it does not sound Autonomous.
  • That leaves Augmentative as the closest fit, because Augmentative covers software whose output is a quantitative or categorical parameter qualitatively different from the input, and more than simple descriptive reporting, but without a definitive interpretation or conclusion.

That said, there is an awkwardness here: a reconstructed 12-lead tracing is not obviously the kind of “parameter” Appendix S talks about, like an index, score, classification, risk score, predictive/prognostic metric. That is why I say probably Augmentative, but not cleanly.

In plain English:

  • It is not Assistive, because it seems to do more than detect.
  • It is not Autonomous, because it does not appear to independently conclude/diagnose/manage.
  • So it lands closest to Augmentative, but as an example of how Appendix S fits some services awkwardly.

Autonomous Level 1/2/3?
Again, no. Nothing in the code text you gave suggests the software:

  • makes a definitive diagnosis recommendation,
  • initiates a management action with override,
  • or automatically initiates ongoing management.

So no autonomous level.


Bottom line

My classification from the code text alone would be:

  • 0764T / 0765T → Assistive
    • and not just by inference: the Appendix S draft’s own table uses them as the Assistive example.
  • 0903T–0905T → probably Augmentative
    • but with a real caveat that these codes expose a weakness in the Appendix S prose, because a reconstructed waveform is not a neat “score/index/classification” style output.
  • Neither appears Autonomous, so no Level I/II/III for either code family.

The truly interesting point is that 0903T–0905T may be exactly the sort of example that would stress-test the framework. It does something clearly more than detection, but not obviously what the Appendix S writers have in mind when they talk about "clinically meaningful parameter generation."

###

STRESS TEST LOGIC WITH TWELVE EXAMPLES

###

Yes — here is a fire test with 12 real examples. The broad result is that the framework works cleanly for a handful of services whose descriptors already use words like augmentative or match the Appendix S example table, but it starts to wobble when a code describes reconstruction, mapping, indexing, or alerting rather than straightforward detection, parameter-generation, or independent conclusion/action. 

That is very much in line with the tensions you flagged in your blog.

I’d use this shorthand:

  • Clean fit = Appendix S handles it fairly well.

  • Wobble = the code can be placed, but only by inference or with some strain.

  • Major wobble = the code exposes a real drafting gap.

The fire test

1) 0764T — AI/computerized ECG analysis for detection of cardiac pathology/cardiac dysfunction, with physician/QHP interpretation/report, performed with a concurrent ECG
Best fit: Assistive. Clean fit. The official Appendix S example table itself places 0764T/0765T under Assistive, and the descriptor retains physician/QHP interpretation/report rather than independent machine conclusion. (American Medical Association)

2) 0765T — same family, but based on a prior ECG
Best fit: Assistive. Clean fit. Same reasoning as 0764T. The code family is effectively the canonical Assistive ECG example. (American Medical Association)

3) 75580 — Noninvasive estimate of coronary fractional flow reserve (FFR) derived from augmentative software analysis of coronary CTA data, with interpretation/report
Best fit: Augmentative. Very clean fit. The descriptor itself uses the phrase “derived from augmentative software analysis,” and the Appendix S example table also places 75580 under Augmentative. (UHC Provider)

4) 92229 — Retinal imaging with point-of-care automated analysis and diagnostic report
Best fit: Autonomous, Level I. Very clean fit. The Appendix S example table places 92229 under Autonomous, so we can identify 92229 as the then-current Autonomous example. It produces an automated diagnostic report, but the code does not describe machine-initiated treatment, so it fits Level I rather than II or III. (American Medical Association)

5) 0903T — Algorithmically generated 12-lead ECG from a reduced-lead ECG; with interpretation and report
Best fit: Probably Augmentative. Wobble. The algorithm transforms limited input into a richer output, which feels more like parameterized/derived output than mere detection, but the output is a reconstructed tracing rather than a classic “index, score, or classification.” It does not appear to independently generate a definitive conclusion, so it falls short of Autonomous. (AliveCor | AliveCor Home)

6) 0904T — Algorithmically generated 12-lead ECG from a reduced-lead ECG; tracing only
Best fit: Probably Augmentative. Major wobble. This may be the most awkward ECG example. It is clearly more than Assistive detection, but it is only a tracing and not an interpretation. It seems most consistent with Augmentative, yet it reveals that Appendix S does not naturally handle “algorithmic waveform reconstruction” very elegantly. (AliveCor | AliveCor Home)

7) 0905T — Algorithmically generated 12-lead ECG from a reduced-lead ECG; interpretation and report only
Best fit: Probably Augmentative. Wobble. Same problem as 0903T/0904T: the service depends on algorithmic reconstruction, but the descriptor does not say the software itself independently concludes/acts in the Autonomous sense. So I would still place it in Augmentative, albeit not comfortably. (AliveCor | AliveCor Home)

8) 1036T — Noninvasive hemodynamic assessment with pulmonary pressures and ejection fraction, including passive signal acquisition, augmentative algorithmic analysis, and generation of a clinical report with physician review/integration
Best fit: Augmentative. Very clean fit. The descriptor literally says “augmentative algorithmic analysis.” The physician still reviews, interprets, and clinically integrates the report, which keeps it out of Autonomous. (American Medical Association)

9) 1041T — Augmentative algorithmic analysis of EEG waveforms to identify source and propagation of epileptiform activity, with interpretation/report
Best fit: Augmentative. Very clean fit. Again, the descriptor expressly uses augmentative, and the physician/QHP interpretation/report remains part of the service. (American Medical Association)

10) 1039T — Connectomic analysis of previously performed multimodal brain MRI, requiring physician/QHP analysis of software- and physician-generated maps, with interpretation/report
Best fit: Probably Augmentative. Wobble. This looks like algorithmic production of clinically useful maps/structures rather than mere detection. But the descriptor emphasizes joint software-plus-physician analysis and physician interpretation/report, so it is not Autonomous. It fits Augmentative better than Assistive, but only because “map generation/integration” gets forced into the parameter/output bucket. (American Medical Association)

11) 1043T — Quantitative magnetic resonance, without imaging, for analysis of liver tissue, assessing one or more parameters, with automatically generated report; no further professional interpretation required
Best fit: Probably Augmentative, with an Autonomous argument. Major wobble. This is one of the best stress fractures in the framework. On one hand, it measures one or more parameters and generates a report, which sounds Augmentative. On the other hand, the guidance says it does not require further professional interpretation, which starts to sound Autonomous. But the descriptor does not clearly say the software independently renders a clinically meaningful interpretation/conclusion or initiates action. I would place it in Augmentative, but with a bright red asterisk. (American Medical Association)

12) 1052T — Interrogation/evaluation of an insertable subcutaneous heart failure decompensation monitor; output includes daily physiologic data, historical trends, and an algorithmically derived decompensation index; threshold crossing sends clinician alerts
Best fit: Probably Augmentative. Wobble. The algorithm produces a composite decompensation index, so this is more than Assistive detection. But the alerts go to the clinician, and the report is still reviewed/interpreted by the physician/QHP; the machine is not itself initiating care in the Level II or III sense. So this lands best in Augmentative. (American Medical Association)

What the fire test shows

A few examples are easy:

  • 0764T/0765T = Assistive

  • 75580 = Augmentative

  • 92229 = Autonomous Level I

  • 1036T/1041T = Augmentative

Those are easy either because the descriptor itself uses the taxonomy word or because AMA's own example table pre-places them. (American Medical Association)

But several examples expose the weak spots:

First, the framework handles classic outputs better than reconstructed artifacts. The 0903T–0905T family is not mere detection, but it is not obviously an “index, score, classification, risk score, predictive/prognostic metric” either. It is a generated tracing. That is not a natural fit for the prose. (AliveCor | AliveCor Home)

Second, the framework is shaky when the code describes parameter generation plus automated reporting but not obvious autonomous conclusion/action. That is why 1043T is so useful as a stress test. It may be exactly the kind of service where a flow chart would either collapse or reveal that the prose needs another branch. (American Medical Association)

Third, the framework struggles with alerts and triage-like behavior. 1052T uses a derived index and threshold-triggered alerts, yet the clinician remains in the loop. That makes it more than Assistive, but not truly Autonomous as Appendix S defines it. (American Medical Association)

My practical summary

If I had to sort these 12 today under the current revised Appendix S logic, I would say:

  • Assistive: 0764T, 0765T
  • Augmentative, clean fit: 75580, 1036T, 1041T
  • Augmentative, with wobble: 0903T, 0904T, 0905T, 1039T, 1043T, 1052T
  • Autonomous, Level I: 92229
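
If AMA ever runs the "beta testing" it floated at the webinar, it could look mechanically like the following, reusing the hypothetical classify() sketch from earlier. The flag settings are my own readings of the descriptors, not AMA determinations; 0904T shows the reconstructed-tracing wobble being forced through the parameter gate.

```python
# Hypothetical beta test of three of the twelve codes (flags are my readings).
beta = {
    "0764T (ECG analysis; physician interprets/reports)": SoftwareOutput(
        clinically_relevant=True, independent_interpretation=False,
        initiates_action=False, override_window=False,
        derives_parameter=False, clinically_meaningful=False, detects_only=True),
    "92229 (automated retinal diagnostic report)": SoftwareOutput(
        clinically_relevant=True, independent_interpretation=True,
        initiates_action=False, override_window=False,
        derives_parameter=True, clinically_meaningful=True, detects_only=False),
    "0904T (algorithmic 12-lead reconstruction, tracing only)": SoftwareOutput(
        clinically_relevant=True, independent_interpretation=False,
        initiates_action=False, override_window=False,
        derives_parameter=True, clinically_meaningful=True, detects_only=False),
}
for label, flags in beta.items():
    print(label, "->", classify(flags))
# 0764T -> Assistive; 92229 -> Autonomous, Level I; 0904T -> Augmentative
```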

So the overall verdict is: Appendix S can classify a lot of existing codes, but not gracefully. It works best where the descriptor already contains the answer. It becomes much less stable when a service is about reconstruction, mapping, indexing, alerting, or automated reporting rather than straightforward detection or independent conclusion. That is exactly why your idea of “fire testing” with 20–30 examples is so strong. It is not just a nice exercise; it is a way to reveal where the taxonomy is genuinely underpowered.

I can turn this next into a one-page blog-ready table with three columns: code, my classification, why the framework wobbles or holds.

###

A pacemaker with an automatic defibrillator monitors the ECG, and if a sufficiently major problem is detected, it automatically emits a cardioversion shock (no asking the doctor!). This sounds like autonomous software to me, and would it be placed here and put on the future CMAA code list? I don't think anyone would expect that, but why or why not? 'Tis a puzzlement!

###

Yes — conceptually, your ICD example sounds very much like Autonomous, and probably Level III under the current draft language. But coding-wise, I do not think that means it would migrate into a future CMAA bucket just because autonomous software is embedded inside the device. That is exactly the kind of paradox that shows where Appendix S logic and real-world CPT architecture can diverge.

Why it sounds Autonomous first. The clean draft says Autonomous software automatically derives parameters and independently generates clinically meaningful interpretations or conclusions without concurrent physician/QHP involvement, and Level III covers software that automatically initiates management actions and continues unless the physician intervenes. Your example — device monitors rhythm, detects a qualifying malignant arrhythmia, and fires a shock with no doctor asked — sounds much closer to machine-initiated therapeutic action than to Assistive or Augmentative. In the draft’s own terms, that is the most natural place to put it, and more specifically it sounds closer to Level III than Level I or II because the therapy is actually delivered, not merely recommended or queued for override.
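
Run through the same hypothetical classify() sketch from earlier, the ICD's embedded logic lands exactly where the prose above suggests; note that the sketch classifies only the software function, which is precisely why the CPT placement question remains separate.

```python
# Hypothetical flags for the ICD's sense-and-shock logic (my reading, not AMA's).
icd_logic = SoftwareOutput(
    clinically_relevant=True, independent_interpretation=True,
    initiates_action=True, override_window=False,  # shock delivered, no pre-approval
    derives_parameter=True, clinically_meaningful=True, detects_only=False)

print(classify(icd_logic))  # Autonomous, Level III (acts and continues until stopped)
# classify() answers only the taxonomy question; it says nothing about whether
# the billable service stays in the implanted-device code family.
```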

But the second question is the real one: would that mean it should appear as a new CMAA-style software service? I think probably not, for a practical CPT reason. Implantable cardioverter-defibrillators are already recognized in Medicare/CPT as physical device procedures and device-management services. CMS materials describe an ICD as an implanted device that continuously monitors cardiac activity and delivers therapy when it detects life-threatening arrhythmias, and Medicare coding/payment infrastructure already exists for ICD implantation and device evaluation/programming rather than treating the sensing/shocking logic as a separately coded AI service. CMS and manufacturer reimbursement materials show established ICD procedure families and device-evaluation families, including implantation codes in the 33202–33275 range and device evaluation/programming codes such as 93282–93284, 93287, 93289, 93295, and 93296. (Medtronic)

So the clean answer is:

  • As a pure Appendix S thought experiment: yes, the device’s embedded decision-and-shock logic looks like Autonomous, probably Level III.

  • As a CPT coding reality: no, I would not expect the whole service to be pulled into CMAA merely because autonomous logic exists inside the implanted device. It is more naturally understood as part of the already-coded device therapy/service architecture. (Medtronic)

That is also very consistent with your own blog instinct that physical services should remain coded by the physical service component, and a folded-inside AI component should not automatically drag everything onto CMAA. You made that point about WGS, but the same structural logic applies even more strongly to an implanted therapeutic device.

There is also support for this from the webinar’s framing. The speakers repeatedly said Appendix S adjectives are meant to describe the algorithmic analysis/output, not necessarily to reclassify an entire broader service. They also said the terms can be used in many parts of CPT “wherever those would be used,” which implies a descriptor-level tool, not a command that every device containing autonomy must be uprooted into a software-only section.

So the puzzlement is real, but I think the resolution is:

Appendix S may classify the software function as Autonomous Level III, while CPT may still leave the billable service in the traditional implanted-device code family.

That is not actually inconsistent. It just means the taxonomy and the code placement are not the same question.

In fact, your ICD example is a very good stress fracture in the framework, because it exposes an unstated rule that probably needs to be said out loud:

“Appendix S can characterize a software function without requiring that the containing physical device/service be rehomed into CMAA.”

Without a rule like that, absurd results follow. A defibrillator, insulin pump, smart neurostimulator, or ventilator with closed-loop logic would all threaten to become “software codes,” which I agree no one actually expects. Your example therefore does not refute the taxonomy, but it strongly suggests the prose needs a boundary principle between:

  1. software as the service, and

  2. software embedded in an already-coded physical therapeutic device/service.

So my net answer is:

  • Three-A label: Autonomous

  • Level: probably Level III

  • Would it therefore move to CMAA? Probably no

  • Why not? Because the better coding principle is that the autonomous logic is intrinsic to the already-coded implanted therapeutic device service, not a separately reported software service. (Medtronic)

This is exactly the kind of example that would make a great paragraph in your blog, because it shows the taxonomy can have descriptive value while still needing a scope-limiting rule to prevent category mistakes.

###

It highlights that Appendix S does not define what enters into it, does not define AI or ML, and so on.

 This is compatible with the current committee zeitgeist and groupthink but it may bite them in the butt later.

###

Yes — I think that is a fair criticism.

The current approach seems to be: do not define AI, do not define the outer boundary of what enters Appendix S, and instead focus on classifying the software output once a case is already in front of the committee. The clean draft says exactly that there is no single product or service for which the term “AI” is sufficient or necessary, and therefore “AI” is not defined in this taxonomy. It then moves immediately to classifying outputs as assistive, augmentative, or autonomous.

That may be politically convenient in the short run. It avoids fights over definitions, avoids having to say what is “really AI,” and gives the committee flexibility. AMA wants Appendix S as a tool to help panel reviewers judge the role of the output in patient management, not as a general theory of AI or a boundary-setting document.

But your concern is the stronger one: if you do not define the domain, then the taxonomy may become unstable at the edges. That is exactly why examples like ICDs, reconstructed ECGs, WGS-plus-AI interpretation, intraoperative guidance, or structured-evidence tools become so troublesome. The committee can say, case by case, “we know it when we see it,” but that works only until the edge cases pile up.

So I think your point is:

the non-definition strategy buys flexibility now, but at the cost of scope creep, inconsistency, and future surprises.

That is also very close to what you wrote in your blog: Appendix S expects precise downstream categorization while leaving cloudy the upstream question of what falls into the universe in the first place.

My own phrasing would be:

The current committee view is understandable: keep Appendix S functional, not philosophical. 

But refusing to define the entry gate may work only as long as the examples remain easy. As software-intensive services proliferate, the lack of a scope rule may become the document’s biggest weakness.

Or even more sharply:

A taxonomy without an entry criterion is not just incomplete; it risks becoming path-dependent, ad hoc, and vulnerable to exactly the odd results no one intended.

That feels like a strong blog line.

APPENDIX S - THE FINAL REWRITE BY AI

Excellent concise summary. Now, here's a task for you. I reattach the clean Appendix S version B for the upcoming CPT meeting. You've seen numerous collateral documents and done much thinking and logic yourself. I want you to make a substantial rewrite of Appendix S. You don't need to follow the template paragraph by paragraph; take all you know now and produce "an" Appendix S, freed of confusing things that don't work and amplified and improved by some of the changes we've seen this afternoon. Of course, the general tone remains AMA CPT policy style. The macro structure has to stay the same (Assistive, Augmentative, Autonomous), although we now appreciate the problems caused by collapsing six or seven axes of variation onto one axis of three points. Do everything you can to improve Appendix S. If you need to delete paragraphs and insert new paragraphs, and so on, you have permission to do so. Be the best policy writer (and best thinker!) you can be.

###

Below is a fresh rewrite of Appendix S based on the current clean Option B text, the webinar’s stated goals, and the fault lines we identified: the need for an entry gate, clearer distinction between software output and the whole service, less circular reliance on “augmentative” inside “autonomous,” and a clearer statement that embedded software does not by itself reclassify an already-coded physical service. Those are all issues that appear in the current draft, the webinar discussion, and your blog critique.

I have kept the macro structure intact: Assistive, Augmentative, Autonomous. But I have rewritten the front end and the definitions to make the logic more explicit and less path-dependent.


Appendix S

Taxonomy for Software Output in Medical Services and Procedures

This taxonomy provides guidance for determining the appropriate use of the terms assistive, augmentative, and autonomous in CPT code descriptors, related procedural language, and supporting materials for medical services and procedures that include software-enabled analysis.

For purposes of this taxonomy, the terms in Appendix S classify the software output function that is being described for coding. They do not, by themselves, define the entirety of a product, determine whether a service is separately reportable, or require that an already-coded physical service or device be reclassified solely because software is used within that service.

Appendix S does not define “artificial intelligence,” “machine learning,” or any other general class of software technologies. Rather, it provides terminology for describing the clinical role of software output in a medical service or procedure. The relevant question is not whether a product is labeled “AI,” but what the software output does in clinical care.

For use of a term from Appendix S, the software output must be relevant to the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions, and must be pertinent to patient management within the clinical context described by the code. Software that is limited to administrative, financial, scheduling, workflow, or general productivity functions does not fall within this taxonomy solely by reason of being used in health care.

The terms in Appendix S are adjectives that characterize the software-enabled analytical function being described. In some circumstances, that analytical function may correspond closely to the billable service. In other circumstances, it may describe only a component of a broader service. Classification under Appendix S therefore characterizes the software output function at issue; it does not by itself determine whether the broader service is separately reportable, bundled, or more appropriately described elsewhere in CPT.

Use of a term from Appendix S should be supported by evidence appropriate to the role of the software output in clinical care and consistent with CPT code criteria. Depending on the claimed function of the software output, such evidence may include technical validity, analytical validity, clinical validity, and, where applicable, evidence of impact on patient management or clinical utility.

For purposes of this taxonomy, a parameter means a quantitative or categorical output derived by software and reported for clinical use, such as an index, score, classification, risk estimate, measurement, or other metric. Not every transformed output is necessarily a parameter for purposes of this taxonomy; the key consideration is whether the output is a distinct reportable result used in clinical care.

For purposes of this taxonomy, automatic or automatically refers to the software deriving its output without concurrent human involvement in the analytic process from input to output. Automatic derivation of output may occur in either augmentative or autonomous services and does not, by itself, establish autonomous status.

Assistive

Services and procedures performed by software for the physician or other qualified health care professional (QHP) are assistive when the software output detects, identifies, highlights, prioritizes, or otherwise brings clinically relevant data to the attention of the physician or other QHP without independently deriving a distinct reportable parameter, and without independently generating a clinical interpretation or conclusion.

Assistive software output is clinically supportive of physician or QHP performance. It may improve, for example, accuracy, precision, consistency (including reduced inter-observer variability), efficiency of review, or timeliness of recognition of clinically relevant data, while the reportable outcome of the primary clinical service may remain unchanged.

The output from assistive software requires physician or other QHP interpretation and report when the service is reported as a professional service. Assistive software does not, for purposes of this taxonomy, independently establish a diagnosis, recommendation, or management action.

Assistive outputs may include indications such as the presence, location, or salience of clinically relevant data. Terms such as “likelihood of,” “suggestive of,” or “risk for” may be consistent with assistive output only when they function to draw attention to clinically relevant data rather than to provide a definitive interpretation, conclusion, or management recommendation.

Evidence supporting assistive software should be appropriate to the claimed role of the output in clinical care. In general, technical or analytical validation demonstrating benefit to clinical performance may be sufficient where the assistive output does not itself alter the reportable clinical conclusion. Where the assistive output uses terminology that implies a clinically significant inference, additional clinical validation may be appropriate.

Augmentative

Services and procedures performed by software for the physician or other QHP are augmentative when the software output derives a distinct quantitative or categorical parameter that is qualitatively different from the input and is used in clinical care, but the software does not independently provide a definitive clinical interpretation, conclusion, or management recommendation.

The augmentative output must be more than a restatement, summation, averaging, display, transmission, or descriptive presentation of input data. It must provide a distinct reportable result, such as a scale, index, classification, measurement, score, probability estimate, risk estimate, predictive measure, prognostic measure, mapping output, or other metric used in clinical care.

Augmentative output is clinically meaningful when the derived parameter contributes to patient management within the clinical context of the code. Clinical meaningfulness may be supported by, for example:

  • equivalence to a metric already used in current clinical practice;

  • consistency with clinical practice guidelines, professional standards, or accepted clinical frameworks; or

  • evidence that the novel output has a clinically relevant impact on patient management.

For augmentative services, evidence should demonstrate more than technical or analytical performance alone. The evidence should support the clinical relevance of the derived parameter for the patient population, service, and clinical context described by the code.

Software with augmentative outputs may operate automatically or may require physician or other QHP interaction during the analytic process. The physician work associated with augmentative output may be separately described by the same code, may be captured in another code, or may be incorporated into a broader service, depending on the code structure.

Note: The fact that an augmentative output is used within a broader coded service does not by itself require separate reporting of the software function, nor does it require reclassification of the broader service.

Autonomous

Services and procedures performed by software for the physician or other QHP are autonomous when the software independently generates a clinically meaningful interpretation, conclusion, recommendation, or management action from clinical input data without concurrent physician or other QHP involvement in the analytic process.

Autonomous output goes beyond derivation of a parameter alone. It includes software output that independently interprets data or derived parameters and produces a clinically operative result, such as a definitive diagnostic conclusion, a specific management recommendation, or initiation of a management action.

Clinical meaningfulness for autonomous output should be supported by evidence appropriate to the claimed role of the software in patient care. Such support may include consistency with clinical practice guidelines or accepted clinical standards, direct evidence of impact on patient management, or other evidence sufficient to support the degree of autonomy being described.

Automatic derivation of output is necessary but not sufficient for autonomous classification. A service is not autonomous merely because the software operates automatically. It is autonomous when the software independently provides the interpretation, conclusion, recommendation, or action that is clinically operative.

There are three levels of autonomous medical services and procedures:

Level I

The output of autonomous software includes a definitive diagnostic conclusion and/or a specific management recommendation. The recommendation or conclusion is contestable and requires physician or other QHP judgment to implement, reject, or otherwise act upon it.

Level II

The output of autonomous software includes initiation of a management action based on the software’s interpretation or conclusion, but the service includes a reasonable opportunity for physician or other QHP review, interruption, or override before the action is carried out.

Level III

The output of autonomous software includes automatic initiation and continuation of a management action based on the software’s interpretation or conclusion, such that physician or other QHP oversight is exercised at a supervisory level rather than through case-by-case concurrence before each action. Level III services require ongoing physician or other QHP oversight appropriate to the clinical circumstance and the management action being performed.

Note: Autonomous classification describes the role of the software output in the service at issue. It does not by itself require that a broader physical device, procedure, or other already-coded service be separately reclassified or reported under a software-specific code family solely because autonomous software is embedded within that service.


Summary Table

Primary role of software output
  • Assistive: detects, highlights, prioritizes, or otherwise brings clinically relevant data to attention
  • Augmentative: derives a distinct quantitative or categorical parameter used in clinical care
  • Autonomous: independently interprets data or derived parameters to generate a clinically operative conclusion, recommendation, or action

Distinct reportable parameter derived by software
  • Assistive: no
  • Augmentative: yes
  • Autonomous: may include derived parameters, but autonomous status depends on the independent interpretation/conclusion/action

Independent clinical interpretation or conclusion by software
  • Assistive: no
  • Augmentative: no
  • Autonomous: yes

Machine-initiated management action
  • Assistive: no
  • Augmentative: no
  • Autonomous: may occur in Levels II and III

Physician or other QHP interpretation/report
  • Assistive: required for the professional service when applicable
  • Augmentative: may be required, or may be incorporated into another code or broader service
  • Autonomous: not required for the autonomous output itself, although physician or other QHP oversight may still be required

Evidence needed
  • Assistive: appropriate to the claimed clinical support function; often technical or analytical performance plus benefit to clinical performance
  • Augmentative: evidence sufficient to support clinical meaningfulness of the derived parameter in the code's clinical context
  • Autonomous: evidence sufficient to support the independent interpretation, recommendation, or management action claimed



This version tries to do five things better than the current draft:

  • it adds an entry gate;
  • it says clearly that the taxonomy classifies the software output function, not necessarily the whole product or service;
  • it separates automatic from autonomous more crisply;
  • it removes the awkward dependence of Autonomous on a prior Augmentative construct; and
  • it states expressly that embedded software does not automatically pull a broader physical service into a software-only coding framework.

Those are all pressure points that emerged from the current draft and the related discussion; a minimal sketch of the resulting classification logic follows below.
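As a way of checking that the rewritten definitions actually compose into a decision procedure, here is a hedged sketch in Python. The predicate names are my own shorthand for the draft's concepts (entry gate, distinct parameter, independent conclusion), not Appendix S language.

```python
# A minimal decision sketch of the rewritten taxonomy. Predicate names
# are illustrative shorthand, not Appendix S terminology.

def classify(output):
    # Entry gate: the output must be pertinent to patient management,
    # not merely administrative or workflow software used in health care.
    if not output["pertinent_to_patient_management"]:
        return None  # outside Appendix S entirely

    # Autonomous: the software itself supplies the clinically operative
    # interpretation, conclusion, recommendation, or action.
    if output["independent_clinical_conclusion_or_action"]:
        return "Autonomous"

    # Augmentative: a distinct reportable parameter is derived, but the
    # physician or other QHP still owns the clinical conclusion.
    if output["derives_distinct_parameter"]:
        return "Augmentative"

    # Assistive: brings data to attention; no parameter, no conclusion.
    return "Assistive"

# "Automatic" is deliberately orthogonal: a parameter derived with no
# human in the analytic loop is still only augmentative when the
# physician supplies the clinically operative conclusion.
hypothetical_quant_output = {
    "pertinent_to_patient_management": True,
    "independent_clinical_conclusion_or_action": False,
    "derives_distinct_parameter": True,
    "automatic": True,
}
assert classify(hypothetical_quant_output) == "Augmentative"
```

Note that classify never consults the "automatic" flag at all; that is precisely the point of separating automatic from autonomous in the rewrite.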

If you want, I’ll next do one of two things:

  1. turn this into a redline-style comparison memo against Option B, or

  2. compress it into a shorter, more committee-friendly version that sounds even more like something AMA staff might circulate.

###
The rewrite of Appendix S is good but somewhat dry and verbose; can you improve readability and reader engagement?
###

Absolutely. Here is a leaner, calmer rewrite of the Appendix S text. I kept the same architecture and policy tone, but tightened sentences, reduced repetition, and tried to make each paragraph carry one clear idea.


Appendix S

Taxonomy for Software Output in Medical Services and Procedures

This taxonomy provides guidance for the use of the terms assistive, augmentative, and autonomous in CPT code descriptors and related procedural language for medical services and procedures that include software-enabled analysis.

The terms in Appendix S classify the software output function described for coding. They do not, by themselves, define the whole product, determine whether a service is separately reportable, or require reclassification of an already coded physical service or device solely because software is used within it.

Appendix S does not define “artificial intelligence,” “machine learning,” or other general classes of software. Its purpose is narrower: to describe the clinical role of software output in a medical service or procedure. The relevant question is not whether a product is labeled “AI,” but what the software output does in clinical care.

A term from Appendix S should be used only when the software output is relevant to the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions, and is pertinent to patient management in the clinical context described by the code. Software used only for administrative, financial, scheduling, workflow, or general productivity purposes does not fall within this taxonomy solely because it is used in health care.

The terms in Appendix S are adjectives describing the software-enabled analytical function at issue. In some cases, that function may closely correspond to the reportable service. In other cases, it may describe only one part of a broader service. Classification under Appendix S therefore describes the software output function, not necessarily the whole service.

Use of a term from Appendix S should be supported by evidence appropriate to the role of the software output in clinical care and consistent with CPT code criteria. Depending on the claimed function of the software output, such evidence may include technical validity, analytical validity, clinical validity, and, when appropriate, evidence of impact on patient management or clinical utility.

For purposes of this taxonomy, a parameter is a quantitative or categorical output derived by software and reported for clinical use, such as an index, score, classification, risk estimate, measurement, or other metric. Not every transformed output is a parameter. The key question is whether the output is a distinct reportable result used in clinical care.

For purposes of this taxonomy, automatic or automatically means that software derives its output without concurrent human involvement in the analytic process from input to output. Automatic derivation may occur in either augmentative or autonomous services and does not, by itself, establish autonomous status.

Assistive

Services and procedures performed by software for the physician or other qualified health care professional (QHP) are assistive when the software output detects, identifies, highlights, prioritizes, or otherwise brings clinically relevant data to the attention of the physician or other QHP without deriving a distinct reportable parameter and without independently generating a clinical interpretation or conclusion.

Assistive software supports physician or QHP performance. It may improve accuracy, precision, consistency (including reduced inter-observer variability), efficiency of review, or timeliness of recognition of clinically relevant data, while the reportable outcome of the primary clinical service may remain unchanged.

When the service is reported as a professional service, the output from assistive software requires physician or other QHP interpretation and report. Assistive software does not independently establish a diagnosis, recommendation, or management action.

Assistive outputs may include indications such as the presence, location, or salience of clinically relevant data. Terms such as “likelihood of,” “suggestive of,” or “risk for” may be consistent with assistive output only when they serve to draw attention to clinically relevant data rather than provide a definitive interpretation, conclusion, or management recommendation.

Evidence supporting assistive software should be appropriate to the claimed role of the output in clinical care. Where the assistive output does not itself alter the reportable clinical conclusion, technical or analytical validation showing benefit to clinical performance may be sufficient. Where the output implies a clinically significant inference, additional clinical validation may be appropriate.

Augmentative

Services and procedures performed by software for the physician or other QHP are augmentative when the software output derives a distinct quantitative or categorical parameter that is qualitatively different from the input and is used in clinical care, but does not independently provide a definitive clinical interpretation, conclusion, or management recommendation.

An augmentative output must be more than a restatement, summation, averaging, display, transmission, or descriptive presentation of input data. It must provide a distinct reportable result, such as a scale, index, classification, measurement, score, probability estimate, risk estimate, predictive measure, prognostic measure, mapping output, or other metric used in clinical care.

Augmentative output is clinically meaningful when the derived parameter contributes to patient management in the clinical context described by the code. Clinical meaningfulness may be supported by equivalence to a metric already used in current practice, consistency with clinical practice guidelines or accepted clinical frameworks, or evidence that a novel output has clinically relevant impact on patient management.

For augmentative services, evidence should show more than technical or analytical performance alone. It should support the clinical relevance of the derived parameter for the patient population, service, and clinical context described by the code.

Software with augmentative outputs may operate automatically or may require physician or other QHP interaction during the analytic process. The related physician work may be described by the same code, captured in another code, or incorporated into a broader service, depending on code structure.

Note: The presence of an augmentative output within a broader coded service does not by itself require separate reporting of the software function or reclassification of the broader service.

Autonomous

Services and procedures performed by software for the physician or other QHP are autonomous when the software independently generates a clinically meaningful interpretation, conclusion, recommendation, or management action from clinical input data without concurrent physician or other QHP involvement in the analytic process.

Autonomous output goes beyond derivation of a parameter alone. It independently interprets data or derived parameters and produces a clinically operative result, such as a definitive diagnostic conclusion, a specific management recommendation, or initiation of a management action.

Clinical meaningfulness for autonomous output should be supported by evidence appropriate to the role of the software in patient care. Such support may include consistency with clinical practice guidelines or accepted clinical standards, direct evidence of impact on patient management, or other evidence sufficient to support the degree of autonomy being described.

Automatic derivation is necessary but not sufficient for autonomous classification. A service is not autonomous merely because software operates automatically. It is autonomous when the software independently provides the clinically operative interpretation, conclusion, recommendation, or action.

There are three levels of autonomous medical services and procedures:

Level I

The output of autonomous software includes a definitive diagnostic conclusion and/or a specific management recommendation. The conclusion or recommendation is contestable and requires physician or other QHP judgment to implement, reject, or otherwise act upon it.

Level II

The output of autonomous software includes initiation of a management action based on the software’s interpretation or conclusion, but the service provides a reasonable opportunity for physician or other QHP review, interruption, or override before the action is carried out.

Level III

The output of autonomous software includes automatic initiation and continuation of a management action based on the software’s interpretation or conclusion, such that physician or other QHP oversight occurs at a supervisory level rather than through case-by-case concurrence before each action. Level III services require ongoing physician or other QHP oversight appropriate to the clinical circumstance and the management action being performed.

Note: Autonomous classification describes the role of the software output in the service at issue. It does not by itself require that a broader physical device, procedure, or other already coded service be separately reclassified or reported under a software-specific code family solely because autonomous software is embedded within that service.


Summary Table

Primary role of software output
  • Assistive: detects, highlights, prioritizes, or otherwise brings clinically relevant data to attention
  • Augmentative: derives a distinct quantitative or categorical parameter used in clinical care
  • Autonomous: independently interprets data or derived parameters to generate a clinically operative conclusion, recommendation, or action

Distinct reportable parameter derived by software
  • Assistive: no
  • Augmentative: yes
  • Autonomous: may include derived parameters, but autonomous status depends on the independent interpretation, conclusion, recommendation, or action

Independent clinical interpretation or conclusion by software
  • Assistive: no
  • Augmentative: no
  • Autonomous: yes

Machine-initiated management action
  • Assistive: no
  • Augmentative: no
  • Autonomous: may occur in Levels II and III

Physician or other QHP interpretation/report
  • Assistive: required for the professional service when applicable
  • Augmentative: may be required, or may be incorporated into another code or broader service
  • Autonomous: not required for the autonomous output itself, although physician or other QHP oversight may still be required

Evidence needed
  • Assistive: appropriate to the claimed clinical support function; often technical or analytical performance plus benefit to clinical performance
  • Augmentative: evidence sufficient to support clinical meaningfulness of the derived parameter in the code's clinical context
  • Autonomous: evidence sufficient to support the independent interpretation, recommendation, or management action claimed

This version trims some of the longer explanatory passages, reduces repeated qualifiers, and tries to make each section easier to read in one pass. It also keeps the key improvements from the earlier rewrite: an entry gate, a distinction between software output and the whole service, a cleaner separation of automatic from autonomous, and explicit protection against dragging every embedded-software device into a separate coding bucket.
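For completeness, the three autonomy levels also reduce to a small decision rule. Here is a hedged sketch under the same caveats as before: the attribute names are mine, and a real service would need far more context than two booleans.

```python
# Hypothetical sketch of the Level I/II/III assignment in the draft
# above. Attribute names are illustrative, not CPT language.

def autonomy_level(service):
    if not service["initiates_management_action"]:
        # Level I: conclusion or recommendation only; a physician or
        # other QHP must still act on (or reject) it.
        return "Level I"
    if service["pre_action_review_opportunity"]:
        # Level II: machine-initiated, but reviewable or interruptible
        # before the action is carried out.
        return "Level II"
    # Level III: the software acts and continues acting, with oversight
    # exercised at a supervisory level rather than per action.
    return "Level III"

closed_loop_device = {
    "initiates_management_action": True,
    "pre_action_review_opportunity": False,  # therapy delivered without case-by-case concurrence
}
print(autonomy_level(closed_loop_device))  # -> "Level III"
```

On this reading, the earlier ICD example falls out as Level III automatically, which matches the intuition in the discussion above while leaving the placement question untouched.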

I can also do one more pass that is even closer to official AMA committee prose — slightly stiffer, slightly shorter, and more “editorial-panel-ready.”