Wednesday, April 29, 2026

AI Compares Option D 0429 and Appendix S 2024

Below is a detailed comparison of OPTION D ONLY against the published/public 2025 Appendix S you provided. I am treating the “purpose” of the revisions as inferred from the clean Option D text, since we do not have a marked-up author memo.

Bottom line

Public Appendix S is a compact, three-part taxonomy: Assistive = detects data; Augmentative = analyzes/quantifies data; Autonomous = interprets data and independently generates conclusions. It is short, but leaves many borderline cases unresolved, especially where software produces a score, risk category, likelihood statement, or recommendation.

Option D appears designed to make the taxonomy more operational and harder to game. It shifts the focus from generic “work performed by the machine” to the software output(s) and the role of those outputs in clinical care. It adds explicit requirements around derived parameters, clinical meaningfulness, validity, patient-management contribution, and the distinction between augmentative parameters and autonomous interpretations/conclusions. It also weakens the old bright-line rule that augmentative software always requires physician/QHP interpretation and report.


1. Overall purpose: from a simple AI taxonomy to a coding-descriptor decision framework

The published Appendix S says the taxonomy provides guidance for classifying AI applications into assistive, augmentative, and autonomous, and expressly says “AI” is not defined in the code set because the term alone is not sufficient to describe clinical use or utility. It classifies services based on the clinical procedure and the “work performed by the machine” on behalf of the physician/QHP.

Option D keeps that basic purpose but makes it more CPT-operational. It says the taxonomy provides guidance in determining the appropriate terminology for CPT code descriptors, not merely for classifying AI applications. It also says classification depends on the software output(s) and their role in clinical care.

That is a significant change in orientation. Public Appendix S is more like a conceptual taxonomy. Option D is more like a coding-language control document: what should the CPT descriptor call this software-mediated service?

Result: Option D makes the Appendix less about “what kind of AI is this?” and more about “what kind of clinically usable output is being generated, and what descriptor term is justified?”


2. Option D adds a threshold requirement: the software output must be clinically useful

The public version says the taxonomy applies to AI in medical services and procedures, but it does not add a strong gatekeeping sentence about medical-purpose usefulness.

Option D adds that, in CPT, the AI software outputs must be useful in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions.

This is important because it excludes or de-emphasizes software outputs that are merely technical, administrative, convenience-oriented, or workflow-supportive unless they have a clinical role.

Result: Option D narrows the meaningful universe of Appendix S. It says the relevant object is not “AI software” in the abstract, but clinically useful software output.


3. Option D adds a CPT application/validity requirement

This is one of the strongest additions. Option D says assignment of an Appendix S term requires that the service or procedure demonstrate validity applicable to the specific term included in the code descriptor through the appropriate CPT application process, such as Category I or Category III.

The public Appendix S does not state this with comparable explicitness. It classifies the work, but does not clearly say that the label “assistive,” “augmentative,” or “autonomous” must itself be supported by validity evidence tied to the descriptor.

Purpose: This appears intended to prevent applicants from casually claiming “augmentative” or “autonomous” status merely because the product uses AI or produces a complex-looking output.

Result: Option D turns the taxonomy into an evidentiary standard. A software service does not merely “sound augmentative”; it must show validity appropriate to the claimed category.


4. Assistive: from “detects data” to “provides clinically relevant data without derived parameters or conclusions”

Public Appendix S

The public version defines assistive software as machine work that detects clinically relevant data without analysis or generated conclusions, requiring physician/QHP interpretation and report.

Option D

Option D defines assistive software as software whose outputs provide clinically relevant data without deriving a quantitative or categorical parameter — such as an index, score, or classification — and without generating an interpretation or conclusion.

Option D then expands the assistive category by saying assistive output is clinically supportive and may improve physician/QHP performance — accuracy, precision, inter-observer variability — even if the output of the primary service is unchanged.

Why this matters

The public version’s phrase “without analysis” is potentially too blunt. Many “assistive” tools do some computational work — detection, segmentation, highlighting, flagging, image enhancement — but may not create a new clinical parameter or conclusion. Option D’s distinction is more precise: the issue is not whether the machine did computation, but whether the output crosses into a derived parameter or interpretive conclusion.

Result: Option D may keep some computer-assisted detection or highlighting tools in Assistive, even if they involve substantial algorithmic processing, so long as they do not produce a score, classification, interpretation, or conclusion.


5. Assistive now oddly includes “likelihood of,” “suggestive of,” and “risk for”

Option D states that assistive outputs may include terms such as “likelihood of,” “suggestive of,” or “risk for.”

This is a notable and potentially controversial shift. In the public version, “likelihood of pathophysiology” appears in the autonomous discussion as an example of a clinically meaningful conclusion used to establish a diagnosis or intervention.

Purpose: Option D may be trying to recognize that not every “likelihood” or “risk” phrase is autonomous. A tool might flag an area as “suggestive of X” while still requiring physician interpretation and report.

Result: The same words — “likelihood,” “suggestive,” “risk” — no longer automatically imply autonomy. The classification depends on whether the output is merely clinically relevant/supportive data or whether it becomes a derived parameter, definitive interpretation, diagnosis, or management conclusion.

This is clarifying in one sense, but it also creates a new ambiguity: a “risk for” output can sound like a risk score, which Option D elsewhere places under augmentative outputs. The practical distinction would need to rest on whether the output is a true quantitative/categorical parameter versus a supportive detection-like statement.


6. Augmentative: major rewrite from “analyzes/quantifies” to “derived parameter distinct from input”

Public Appendix S

The public version defines augmentative machine work as analyzing and/or quantifying data to yield clinically meaningful output, requiring physician/QHP interpretation and report.

Option D

Option D defines augmentative software as output that represents a quantitative or categorical parameter that is qualitatively different than the input. It must be more than a summation of data inputs — more than adding, averaging, measuring, or reporting descriptive statistics.

It then gives examples: clinical scales, indexes, categorical classifications, risk scores, or other metrics that may be used in diagnosis, cure, mitigation, treatment, or prevention.

Why this matters

This is probably the biggest conceptual revision in Option D. Public Appendix S could treat almost any analytic or quantitative software output as augmentative. Option D says: not so fast. A measurement, average, or descriptive statistic is not enough. The output must be a clinically meaningful derived parameter that is distinct from the input.

Result: Option D likely reduces overclassification as augmentative. Simple measurement tools, calculators, summation tools, or descriptive analytics would not automatically become augmentative merely because they quantify data.


7. Option D adds validation expectations for augmentative outputs

Option D adds that an augmentative output should be validated by equivalence to a metric currently in clinical use. If novel, with no such existing metric, it should be validated for impact on patient management, such as novel predictive or prognostic indices.

This is absent from the public version’s shorter augmentative definition.

Purpose: This seems designed to stop a proliferation of black-box scores that are “clinically meaningful” only by assertion.

Result: For a novel AI-derived score, Option D asks: Is this equivalent to something clinicians already use? If not, is there evidence it changes patient management? That is a much higher and more concrete bar.


8. Option D explicitly distinguishes augmentative from autonomous by excluding definitive interpretation/conclusion

Option D says software is classified as augmentative if the output does not include a definitive interpretation or conclusion as would be required for autonomous.

The public version’s boundary is less explicit: augmentative analyzes/quantifies; autonomous interprets and independently generates conclusions.

Purpose: Option D is trying to make the boundary less dependent on the vague verb “analyzes” and more dependent on the output’s clinical role.

Result: A risk score, index, scale, or classification may be augmentative if it supports physician decision-making. But once the software independently generates a definitive diagnosis, specific management recommendation, or intervention-level conclusion, it moves toward autonomous.


9. Clinical meaningfulness becomes a defined concept

Option D repeatedly uses “clinically meaningful” and then defines it as a key term: software outputs are clinically meaningful based on documentation that the output contributes to patient management. This definition applies to augmentative outputs in particular.

The public version uses “clinically meaningful output” and “clinically meaningful conclusions,” but does not define the term as tightly in the provided text.

Result: Option D makes “clinically meaningful” less rhetorical. The output must be tied to patient management, not merely statistically significant, technically impressive, or biologically interesting.


10. Augmentative no longer always requires physician/QHP interpretation and report

This is a major practical change.

In the public table, augmentative software requires physician/QHP interpretation and report: Yes.

In Option D, the table says the output requires physician/QHP interpretation and report “Conditionally” for augmentative, with a note that physician work related to augmentative services may be captured by existing codes.

Option D also states in the body that software with augmentative outputs may or may not require physician/QHP work through interaction with the software, such as adjustment of software settings based on clinical context.

And it adds that physician work related to augmentative services is typically captured by existing codes, such as when the output is a data element in E/M, a factor in pre-surgical planning, or integrated into a separate service with physician/QHP interpretation.

Purpose: Option D seems to be acknowledging real-world coding. Many augmentative outputs are not separately interpreted and reported as standalone physician work; they are incorporated into another service.

Result: This change may reduce the pressure to create separate CPT codes for every augmentative software output. It also clarifies that the software output may be part of a broader service rather than a separately reportable AI act.


11. Autonomous: from “interprets data and independently generates conclusions” to “derives parameters and generates interpretations”

Public Appendix S

The public version defines autonomous machine work as automatically interpreting data and independently generating clinically meaningful conclusions without concurrent physician/QHP involvement. It says autonomous services include interrogating and analyzing data and may include acquisition, preparation, and/or transmission.

Option D

Option D defines autonomous software as software that automatically, without concurrent physician/QHP involvement, derives parameters and independently generates clinically meaningful interpretations.

It then says reporting the derived parameters is essential for transparency and explainability at all autonomous levels.

Purpose: Option D appears to build autonomy on top of the same concept used for augmentative software: derived parameters. Autonomous software does not merely “interpret data”; it derives parameters and then generates interpretations/conclusions from them.

Result: Autonomous becomes more structurally defined. It is not just “AI made a conclusion.” It is: input → derived parameter(s) → independent clinical interpretation/conclusion/recommendation/action.


12. Option D adds transparency and explainability requirements for autonomous outputs

Option D states that reporting derived parameters is essential for transparency and explainability at all autonomous levels.

The public version does not include this explicit transparency/explainability requirement in the provided text.

Purpose: This likely responds to concerns that autonomous AI decisions need traceable intermediate outputs, not just black-box conclusions.

Result: Option D makes autonomous status harder to justify if the software cannot disclose or report the derived parameters underlying its recommendation or action.


13. Option D strengthens autonomous validity and utility language

Option D says recommendations for definitive diagnostic conclusions or specific management/interventions should provide clinical utility and have demonstrated validity. It adds that this may include placing results in the context of epidemiologic data or clinical practice guidelines, especially for Levels II and III.

The public version describes autonomous output as clinically meaningful conclusions that may be used to establish a diagnosis or implement an intervention, but Option D adds more explicit proof expectations.

Result: Option D raises the bar most sharply for Level II and Level III autonomy, where software actions can proceed with override opportunity or ongoing physician oversight rather than immediate physician implementation.


14. Autonomous Level I: changed from “contestable options” to “recommendations requiring physician judgment”

The public version says Level I autonomous AI draws conclusions and offers diagnosis and/or management options that are contestable and require physician/QHP action to implement.

Option D says Level I output includes recommendations of definitive diagnostic and/or specific management or interventions based on derived parameters, requiring physician/QHP judgment to implement or reject.

Purpose: Option D makes Level I more explicitly recommendation-based and less dependent on the somewhat slippery word “contestable.”

Result: The Level I category becomes a software-generated clinical recommendation that still requires an affirmative professional decision.


15. Autonomous Level II: refined around impending action and opportunity to negate

The public version says Level II draws conclusions and initiates diagnosis and/or management options with alert/opportunity for override, which may require physician/QHP action to implement.

Option D says Level II output includes medical management actions based on interpretations and conclusions from derived parameters, and the software must allow a reasonable opportunity to negate the impending action before implementation, such as by alert.

Purpose: Option D clarifies that Level II is not merely “offers options with override.” It is closer to “the system is about to act, but the clinician gets a chance to stop it.”

Result: Level II becomes a more specific automation category: software-initiated management with a pre-implementation override window.


16. Autonomous Level III: refined around automatic ongoing management with oversight

The public version says Level III draws conclusions and initiates management, requiring physician/QHP initiative to contest.

Option D says Level III automatically initiates management actions based on interpretations and conclusions from derived parameters. These actions require physician/QHP oversight and performance review and continue unless the physician/QHP intervenes. It adds that oversight is typically over multiple interventions to determine whether management goals are achieved.

Purpose: Option D is trying to make Level III less like a single autopilot event and more like a continuing autonomous management function.

Result: Level III is narrowed to systems that automatically initiate and continue management actions unless stopped, with clinician oversight of performance over time.


17. Option D adds key definitions that public Appendix S lacks or leaves implicit

Option D adds three explicit definitions:

Derived parameters are quantitative or categorical software outputs, such as an index, score, or classification.

Clinically meaningful means documented contribution to patient management.

Automated/automatically refers to algorithmic work from input to output — deriving parameters — and can apply to either augmentative or autonomous software.

This last point is very important. In public Appendix S, “automatically” appears mainly in the autonomous definition. Option D clarifies that automatic derivation of parameters does not make software autonomous. Automation can occur in augmentative software too.

Result: This is one of Option D’s most useful corrections. It separates automation from autonomy. A tool can automatically generate a risk score and still be augmentative if it does not independently generate a definitive interpretation, diagnosis, management recommendation, or action.


18. Table changes: the taxonomy becomes more output-centered

The public table says:

Assistive: detects clinically relevant data.
Augmentative: analyzes/quantifies data to yield clinically meaningful output.
Autonomous: interprets data and independently generates clinically meaningful conclusions.

Option D’s table changes the axis:

Assistive: clinically supportive data, including “likelihood of,” “suggestive of,” or “risk for.”
Augmentative: derives clinically meaningful parameters distinct from input and identifying a particular clinical condition, including “predictive of” or “prognostic of.”
Autonomous: derives clinically meaningful parameters and from them generates interpretations or conclusions in the form of diagnosis or intervention.

Result: Option D replaces the old “detects / analyzes / interprets” ladder with a more nuanced ladder:

data support → derived clinical parameter → derived parameter plus independent interpretation/conclusion/action.

That is the central logic of Option D.


19. Practical classification consequences

Here is how Option D would likely change real-world classification outcomes:

Simple detection/highlighting tools remain assistive, even if algorithmically sophisticated, if they do not generate a derived parameter or conclusion.

Simple measurement/quantification tools may not qualify as augmentative if they merely add, average, measure, or report descriptive statistics. Option D expressly says augmentative output must be more than that.

Risk scores, indexes, clinical scales, and categorical classifications are more clearly augmentative, provided they are clinically meaningful, distinct from inputs, and documented to contribute to patient management.

Predictive or prognostic indices can be augmentative, but if novel, they require validation for patient-management impact.

Software recommendations for definitive diagnosis or specific intervention are pushed toward autonomous Level I, if physician/QHP judgment is required to implement or reject.

Software that initiates actions unless stopped becomes autonomous Level II or Level III, depending on whether there is a pre-action opportunity to negate or ongoing automatic management that continues unless the physician intervenes.
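The output-centered ladder behind these consequences can be sketched as a small decision procedure. This is an illustrative sketch only; the attribute and function names are hypothetical and are not Appendix S or Option D terminology:

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    ASSISTIVE = "assistive"
    AUGMENTATIVE = "augmentative"
    AUTONOMOUS = "autonomous"

@dataclass
class SoftwareOutput:
    # Hypothetical attributes for illustration only.
    derives_parameter: bool           # an index, score, or classification distinct from the input
    definitive_interpretation: bool   # an independent diagnosis, recommendation, or action

def classify(out: SoftwareOutput) -> Category:
    """Sketch of Option D's ladder: data support -> derived parameter ->
    derived parameter plus independent interpretation/conclusion/action."""
    if out.derives_parameter and out.definitive_interpretation:
        # Parameter plus independent definitive interpretation: autonomous.
        return Category.AUTONOMOUS
    if out.derives_parameter:
        # Clinically meaningful derived parameter, no definitive conclusion: augmentative.
        return Category.AUGMENTATIVE
    # Clinically supportive data only (detection, highlighting, "suggestive of"): assistive.
    return Category.ASSISTIVE

# A highlighting tool -> assistive
print(classify(SoftwareOutput(False, False)).value)
# A risk score without a definitive conclusion -> augmentative
print(classify(SoftwareOutput(True, False)).value)
# A score plus an independent diagnosis -> autonomous
print(classify(SoftwareOutput(True, True)).value)
```

Note what the sketch deliberately omits: Option D's validity, clinical-meaningfulness, and patient-management-documentation requirements are evidentiary gates on top of this logic, not booleans a product can simply assert.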


20. The most important conceptual result

The published Appendix S uses three verbs that feel intuitive but imprecise:

Detects → analyzes/quantifies → interprets/concludes.

Option D substitutes a more coding-usable structure:

Clinically relevant data → clinically meaningful derived parameter → derived parameter plus independent interpretation/conclusion/action.

That is a meaningful improvement. It gives CPT applicants and reviewers more handles: Is there a derived parameter? Is it qualitatively different from the input? Is it clinically meaningful? Is it validated? Does it contribute to patient management? Does it merely support the physician, or does it independently interpret, recommend, or act?


21. Residual problems in Option D

Option D is clearer, but not perfectly clean.

First, the assistive examples “likelihood of,” “suggestive of,” and “risk for” may blur into augmentative or autonomous territory, depending on how the output is generated and reported.

Second, Option D says augmentative output may identify a particular clinical condition, including “predictive of” or “prognostic of,” while autonomous output generates interpretations or conclusions in the form of diagnosis or intervention. The distinction between “identifies a clinical condition” and “generates a diagnosis” may still be difficult in practice.

Third, Option D imports evidentiary standards — validity, clinical utility, patient-management contribution — into a nomenclature appendix. That is probably intentional, but it may create friction because CPT taxonomy, FDA validation, clinical utility, and payer coverage are adjacent but not identical concepts.

Fourth, Option D’s statement that physician work related to augmentative services is typically captured by existing codes may be useful but could also be read as discouraging separate coding for many software outputs.


Concise executive summary

Option D is not merely a wordsmithing revision of Appendix S. It changes the taxonomy from a broad “machine work” classification into an output-based CPT descriptor framework. Its main purpose is to clarify that the relevant object is the clinical software output, not the AI product. Its main result is a sharper boundary between assistive data, augmentative derived parameters, and autonomous interpretation or action. It also raises the evidentiary bar by requiring validity, clinical meaningfulness, patient-management contribution, and — for autonomous tools — transparency/explainability of derived parameters. The biggest practical change is that automatic generation of a score is not autonomous; automation can be augmentative. Autonomy begins when the system independently generates a clinically meaningful interpretation, recommendation, diagnosis, or management action.

Friday, April 24, 2026

VBCC 20260424 Strategies from OCM to V2 to V3...

 

First-pass strategic take

Yes, EOM is meaningfully different from OCM, but not enough to be reassuring. It is better designed, narrower, more patient-centered, more equity-aware, and more digitally ambitious. But the early evidence still looks like OCM 2.0 rather than a true new operating system for value-based cancer care: modest payment reductions, no visible quality/utilization gain yet, and net losses once care-management and incentive payments are counted.

That is actually a strong panel premise: VBCC has not failed as an aspiration; it has failed because the measurement layer is still too weak, too claims-centric, and too poorly connected to what oncology patients and clinicians actually experience.


1. OCM vs EOM: what changed, and should we be reassured?

OCM: the disappointment

The final Brooks/JAMA paper is devastating in a quiet, technocratic way. OCM ran from 2016–2022, included more than 200 practices, covered about one-fourth of systemic cancer treatment in FFS Medicare, and produced a statistically significant but modest $616 reduction per episode, increasing to $1,282 in the final performance period. But there were no statistically significant differences in hospitalizations, ED visits, or quality, and after MEOS and performance payments, Medicare had an estimated $639 million net loss.

That says: OCM learned how to bend some spending, but not enough to pay for itself, and not with a measurable patient-quality signal. The Thomas/Ward critique anticipated part of this: OCM was not really a true bundle; it preserved FFS and layered on a complicated shared-savings/payment overlay, leaving practices partly paid to do more and partly rewarded for doing less.
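The net-loss finding is easy to make intuitive with back-of-envelope arithmetic. A rough sketch, assuming six-month OCM episodes and the $160 PBPM MEOS rate, and ignoring performance-based payments, risk adjustment, and incomplete episodes:

```python
# Illustrative arithmetic only; not the evaluation's risk-adjusted methodology.
MEOS_PER_MONTH = 160              # OCM Monthly Enhanced Oncology Services payment, $ per beneficiary
EPISODE_MONTHS = 6                # OCM episodes ran six months
GROSS_SAVINGS_PER_EPISODE = 616   # average payment reduction per episode (final evaluation)

meos_per_episode = MEOS_PER_MONTH * EPISODE_MONTHS              # $960 paid on top of FFS
net_per_episode = GROSS_SAVINGS_PER_EPISODE - meos_per_episode  # negative: Medicare loses per episode
print(meos_per_episode, net_per_episode)  # 960 -344
```

Even at the final-period savings of $1,282 per episode, the margin over MEOS is thin before performance payments are counted, which is consistent with the estimated $639 million aggregate net loss.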

EOM: the real improvements

EOM is not just a rename. It makes several real changes:

| Design feature | OCM | EOM |
| --- | --- | --- |
| Cancer scope | Broad systemic therapy episodes | Narrowed to seven cancer types |
| Hormonal therapy-only episodes | Included | Generally excluded; focuses on systemic chemotherapy, not hormonal-only |
| Risk | Shared savings; downside risk evolved | Downside risk from the start |
| MEOS | $160 PBPM | $110 PBPM, or $140 PBPM for dual eligibles |
| Equity | Less central | HRSN screening, equity plans, dual-eligible enhanced payment |
| Patient-reported outcomes | Not central | ePRO collection and monitoring required |
| Digital/data strategy | Less mature | Requires quality, clinical, and sociodemographic data; CEHRT and CQI use |
| Participation | >200 practices | As of March 2026, 28 practices and 1 commercial payer |

CMS describes EOM as focused on seven cancers, six-month episodes, total-cost accountability, MEOS payments, required 24/7 access, navigation, evidence-based guidelines, comprehensive care plans, HRSN screening, ePROs, CQI data use, and certified EHR technology. (Centers for Medicare & Medicaid Services)

The non-reassuring part

The first EOM evaluation already has a familiar smell. CMS’s first at-a-glance report for Performance Period 1, July–December 2023, says EOM likely reduced payments but produced a net loss to Medicare after MEOS and incentive payments, and did not affect quality or utilization measures, including hospice use before death or acute care utilization. (Centers for Medicare & Medicaid Services)

That is the core answer: EOM is more sophisticated, but the early signal is still not a proof of VBCC. It is a better-engineered experiment, not yet a successful value model.


2. Are there new and better metrics in EOM?

Yes, but with a big caveat. EOM adds the right categories of measurement, but many are still not mature, outcome-forward, or digitally reliable enough to bear payment accountability.

The good news is that EOM formally moves beyond pure claims logic. It requires ePROs, HRSN screening, clinical data elements, sociodemographic data, and quality reporting. CMS’s participant resources now include an EOM Clinical Data Elements Guide, Sociodemographic Data Elements Guide, Quality Measures Guide, and cost/quality performance data files for Performance Periods 1–2. (Centers for Medicare & Medicaid Services)

Your prior measurement report frames this well: the plausible springboard is the convergence of EOM redesign requirements, digital quality measures, and oncology interoperability infrastructure such as USCDI+ Cancer and mCODE.

But the caveat is decisive: EOM has better measurement ingredients, not yet better measured outcomes. The model asks for ePROs and HRSN data, but the early evaluation did not show measurable improvement in acute care utilization or quality. (Centers for Medicare & Medicaid Services)

The most promising domains for real VBCC metrics are the ones you already identified:

ePRO symptom/toxicity control. This is probably the single strongest new metric domain because it connects directly to patient experience, acute care avoidance, and clinical response workflows. Your report proposes measuring both completion of standardized symptom assessments and timely response to severe symptom alerts.

Physical function preservation. More meaningful than “patient satisfaction,” and closer to what cancer patients care about: can I function, work, walk, sleep, eat, and live my life?

Avoidable acute care utilization. Still useful, but claims alone are crude. ED visits and admissions become more meaningful if paired with symptom-triggered preventability review or ePRO context.

Evidence-based regimen/pathway concordance. This could matter greatly, but it requires structured stage, biomarkers, treatment intent, and exception logic. Otherwise, it becomes a documentation game.

Goal-concordant end-of-life care. Claims can measure late chemotherapy and hospice timing, but the real outcome is whether care matched patient goals. Your report rightly pairs claims-based EOL metrics with structured goals-of-care documentation.

Financial toxicity. This is a missing VBCC domain. If cancer care bankrupts or destabilizes a patient, “value” has not been achieved. Your report proposes validated financial toxicity screening plus navigation response.
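The ePRO domain above is concrete enough to express as computable measures. A minimal sketch of the two metrics the report proposes, completion of standardized symptom assessments and timely response to severe symptom alerts; the function names, tuple shapes, and 24-hour threshold are assumptions for illustration, not EOM specifications:

```python
from datetime import datetime, timedelta

def completion_rate(expected: int, completed: int) -> float:
    """Share of expected standardized symptom assessments actually completed."""
    return completed / expected if expected else 0.0

def timely_response_rate(alerts, threshold_hours: int = 24) -> float:
    """Share of severe-symptom alerts with a clinical response inside the window.

    alerts: list of (alert_time, response_time) tuples; response_time is None
    if no response was documented.
    """
    if not alerts:
        return 0.0
    timely = sum(
        1 for sent, responded in alerts
        if responded is not None
        and responded - sent <= timedelta(hours=threshold_hours)
    )
    return timely / len(alerts)

t0 = datetime(2026, 1, 5, 9, 0)
sample = [
    (t0, t0 + timedelta(hours=2)),   # responded within window
    (t0, t0 + timedelta(hours=30)),  # responded, but too late
    (t0, None),                      # never responded
]
print(completion_rate(10, 8), round(timely_response_rate(sample), 2))
```

The design point is that the denominator (alerts generated) and numerator (timely documented responses) are both auditable from structured data, which is exactly what claims-only measurement cannot provide.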

So the answer is: EOM points toward better metrics, but it has not yet demonstrated better outcomes. It is more promising as a measurement platform than as a proven payment model.


3. What should “V3” of CMS value-based oncology look like?

Your strongest forward-looking thesis is:

OCM tested whether care-management payments plus shared savings could make oncology cheaper. EOM tests whether a narrower, risk-bearing, equity-aware model with ePROs can do better. V3 should test whether digitally computable, patient-centered oncology outcomes can finally become the basis of value-based cancer care.

In other words, V3 should not be merely EOM with different benchmarks. It should be a model where measurement is the product.

V3 should have five design principles

First, V3 should make oncology clinically legible. It cannot rely mainly on claims. It needs structured diagnosis, stage, biomarkers, line of therapy, treatment intent, progression/recurrence, performance status, and death date. EOM’s clinical data element work and mCODE/FHIR alignment are early scaffolding, but V3 should make this the required substrate. Your report puts it neatly: oncology’s real clinical context is often buried in notes and cannot be reliably inferred from claims.

Second, V3 should use AI/NLP as measurement infrastructure, not as a magic wand. The panel topic is strongest if you say: AI review of EHRs may finally allow VBCC to measure what claims cannot—stage, progression, toxicity, treatment intent, ECOG-like function, adverse events, biomarker appropriateness, and goals-of-care discussions. But AI-derived metrics must be validated, audited, version-controlled, and bias-tested. Otherwise V3 becomes a digital façade: computable, impressive, and wrong.

Third, V3 should use a small core measure set. Something like 8–12 measures, not 50. Candidate domains: symptom control, functional preservation, avoidable acute care, evidence-based regimen appropriateness, time to treatment, EOL goal-concordance, financial toxicity, and equity/whole-person supports. That aligns with your prior report’s proposed VBCC portfolio.

Fourth, V3 should separate “drug price exposure” from “care delivery performance.” OCM and EOM struggle because oncology spending is dominated by therapies whose prices and clinical indications are not fully controlled by the practice. EOM’s early report says systemic cancer treatment drug spending accounts for about 58% of EOM episode costs, and participants reported focusing on drug spending interventions. (Centers for Medicare & Medicaid Services) V3 should distinguish: Did the practice choose appropriate therapy? Did it manage toxicity? Did it avoid preventable acute care? Did it align care with patient goals? It should not simply punish a practice because the correct therapy is expensive.

Fifth, V3 should make equity measurable without making safety-net care financially toxic to providers. EOM adds HRSN screening and dual-eligible enhanced MEOS payments, which is directionally right. But V3 should require equity stratification and closed-loop resource referral metrics while protecting practices that care for medically and socially complex populations. CMS describes EOM as requiring HRSN screening, equity plans, expenditure/utilization reports to identify disparities, and higher MEOS payments for dual eligibles.


A sharper conference thesis

Here is the central framing I would use:

“Value-based cancer care has been stuck in an awkward middle stage: payers can measure cost, but not value; clinicians can describe value, but not compute it; patients can feel value, but it rarely appears in payment models. OCM showed that care redesign can produce modest savings without measurable quality improvement. EOM improves the model by adding downside risk, ePROs, health-related social-needs screening, and richer data requirements, but early results still look financially and clinically inconclusive. The next version of VBCC will depend less on another tweak to shared savings and more on whether AI-enabled EHR review, ePROs, mCODE/USCDI+ Cancer, and digital quality measures can produce auditable, patient-centered, oncology-specific metrics at scale.”

That is a strong panel because it is neither naïvely optimistic nor drearily cynical.


Draft paragraph-long panel proposal

Are We Finally Ready for Real Value-Based Cancer Care? From OCM and EOM to AI-Enabled Measurement.
For more than a decade, value-based cancer care has promised to reward better outcomes rather than higher volume, yet progress on the ground has remained limited. The Oncology Care Model produced modest reductions in Medicare episode payments but no significant improvements in utilization or quality, and net losses to Medicare after model payments. Its successor, the Enhancing Oncology Model, adds important improvements—downside risk, narrower cancer scope, electronic patient-reported outcomes, health-related social-needs screening, enhanced services, and richer clinical data requirements—but early results remain inconclusive. This panel will ask whether the missing ingredient has been measurement itself: the ability to capture oncology stage, biomarkers, treatment intent, toxicity, function, goals of care, financial toxicity, and equity outcomes in computable, auditable form. We will explore whether AI-enabled EHR review, ePROs, mCODE/USCDI+ Cancer, and digital quality measures can support a true next-generation “V3” oncology value model—one that moves beyond claims-based cost control toward patient-centered, clinically meaningful cancer care performance.


Possible panel title options

Best straightforward title:
From OCM to EOM to V3: Can Better Measurement Finally Make Cancer Care Value-Based?

More provocative:
Value-Based Cancer Care’s Missing Operating System: Metrics, AI, and the Road Beyond EOM

Most “conference program” friendly:
The Next Generation of Value-Based Cancer Care: Lessons from OCM, Early EOM, and AI-Enabled Outcome Measurement

Most Bruce-style:
After OCM and EOM: Is Value-Based Cancer Care Still Waiting for Its Measurement System?


Suggested panel architecture

You as moderator/host should set up the tension: “We have had a decade of VBCC conferences, but the field still often means payer coverage plus ASP drug pricing. What would make it real?”

Ideal panelists:

  1. OCM evaluation author
    Gabriel Brooks would be excellent, because the JAMA paper is now the cleanest empirical anchor. Nancy Keating would also be very strong, especially for evaluation design and interpretation. The Brooks paper lists Brooks, Trombley, Landrum, Liu, Simon, and Keating among key authors, with Brooks, Trombley, and Keating drafting the manuscript.
  2. CMMI/EOM representative
    Someone who can speak to EOM’s design choices, first evaluation, clinical data elements, ePROs, HRSN screening, and what CMS hopes to learn by 2030. Even if they cannot speculate about “V3,” they can discuss what EOM is designed to test.
  3. Community oncology practice leader
    Preferably someone actually implementing EOM, ePROs, navigation, urgent care access, and “call us first” workflows. This prevents the panel from becoming a policy seminar in the clouds.
  4. Measurement/interoperability person
    Someone from ONC/ASTP, NCI’s USCDI+ Cancer effort, HL7/mCODE, NCQA, or a serious EHR/FHIR implementer. Their role: what needs to be standardized before oncology outcomes can be measured without heroic chart abstraction?
  5. Patient-centered outcomes / financial toxicity voice
    Ideally someone working on ePROs, symptom monitoring, financial toxicity, or patient advocacy. This keeps the panel honest: VBCC should not become “cost-based cancer care with nicer adjectives.”

My bottom-line answer to your three questions

OCM vs EOM: EOM is a real improvement in design, but not yet reassuring in results. It narrows the model, adds downside risk, requires ePROs and HRSN screening, and builds a richer data strategy. But early EOM still shows payment reductions offset by participant payments and no clear quality/utilization improvement.

Metrics in EOM: EOM has better metric ingredients—especially ePROs, HRSN screening, clinical data elements, sociodemographic data, and CQI reporting—but the decisive step from “data collection” to “valid outcome measurement” is still ahead.

V3 forecast: V3 should be an AI/digital-measurement-enabled oncology value model: clinically rich, patient-centered, equity-stratified, auditable, and built on structured oncology data plus validated AI extraction from EHR text. The goal is not simply to save money on chemotherapy episodes, but to measure whether cancer care actually improves symptoms, function, appropriateness, goal-concordance, financial well-being, and avoidable acute care—without drowning practices in manual reporting.

 

Thursday, April 23, 2026

Appendix S - A Full-on Rewrite by AI

I gave ChatGPT some drafts of AMA CPT Appendix S for software services and asked it to make the best rewrite it could, aiming to fix a number of problems. (The copy below is its second attempt at this rewrite.)

###

 


Appendix S - Rewritten Anew, by ChatGPT

Taxonomy for Software Output in Medical Services and Procedures

This taxonomy provides guidance for the use of the terms assistive, augmentative, and autonomous in CPT code descriptors and related procedural language for medical services and procedures that include software-enabled analysis.


Appendix S describes the clinical role of software output in medical services and procedures. It does not define “artificial intelligence,” “machine learning,” or other general classes of software. The relevant question is not whether a product is labeled “AI,” but what the software output does in clinical care.

A term from Appendix S should be used only when the software output is relevant to the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions and is pertinent to patient management in the clinical context described by the code. Software used only for administrative, financial, scheduling, workflow, or general productivity purposes does not fall within this taxonomy solely because it is used in health care.

The terms in Appendix S describe the software-enabled analytical role at issue. In some cases, that role may correspond closely to the reportable service. In other cases, it may describe only one component of a broader service. Accordingly, Appendix S does not by itself determine whether a service is separately reportable or require reclassification of an already coded physical service or device solely because software is used within it.

Use of a term from Appendix S should be supported by evidence appropriate to the role of the software output in clinical care and consistent with CPT code criteria. Depending on the claimed function of the software output, such evidence may include technical validity, analytical validity, clinical validity, and, when appropriate, evidence of impact on patient management or clinical utility.

For purposes of this taxonomy, a parameter is a quantitative or categorical output derived by software and reported for clinical use, such as an index, score, classification, risk estimate, measurement, or other metric. Not every transformed output is a parameter. The key question is whether the output is a distinct reportable result used in clinical care.

For purposes of this taxonomy, automatic or automatically means that software derives its output without concurrent human involvement in the analytic process from input to output. Automatic derivation may occur in either augmentative or autonomous services and does not, by itself, establish autonomous status.

Assistive

Services and procedures performed by software for the physician or other qualified health care professional (QHP) are assistive when the software output detects, identifies, highlights, prioritizes, or otherwise brings clinically relevant data to the attention of the physician or other QHP without deriving a distinct reportable parameter and without independently generating a clinical interpretation or conclusion.

Assistive software supports physician or QHP performance. It may improve accuracy, precision, consistency, efficiency of review, inter-observer variability, or timeliness of recognition of clinically relevant data, while the reportable outcome of the primary clinical service may remain unchanged.

When the service is reported as a professional service, the output from assistive software requires physician or other QHP interpretation and report. Assistive software does not independently establish a diagnosis, recommendation, or management action.

Assistive outputs may include indications such as the presence, location, or salience of clinically relevant data. Terms such as “likelihood of,” “suggestive of,” or “risk for” may be consistent with assistive output only when they serve to draw attention to clinically relevant data rather than provide a definitive interpretation, conclusion, or management recommendation.

Evidence supporting assistive software should be appropriate to the claimed role of the output in clinical care. Where the assistive output does not itself alter the reportable clinical conclusion, technical or analytical validation showing benefit to clinical performance may be sufficient. Where the output implies a clinically significant inference, additional clinical validation may be appropriate.

Augmentative

Services and procedures performed by software for the physician or other QHP are augmentative when the software output derives a distinct quantitative or categorical parameter that is qualitatively different from the input and is used in clinical care, but does not independently provide a definitive clinical interpretation, conclusion, or management recommendation.

An augmentative output must be more than a restatement, summation, averaging, display, transmission, or descriptive presentation of input data. It must provide a distinct reportable result, such as a scale, index, classification, measurement, score, probability estimate, risk estimate, predictive measure, prognostic measure, mapping output, or other metric used in clinical care.

Augmentative output is clinically meaningful when the derived parameter contributes to patient management in the clinical context described by the code. Clinical meaningfulness may be supported by equivalence to a metric already used in current practice, consistency with clinical practice guidelines or accepted clinical frameworks, or evidence that a novel output has clinically relevant impact on patient management.

For augmentative services, evidence should show more than technical or analytical performance alone. It should support the clinical relevance of the derived parameter for the patient population, service, and clinical context described by the code.

Software with augmentative outputs may operate automatically or may require physician or other QHP interaction during the analytic process. The related physician work may be described by the same code, captured in another code, or incorporated into a broader service, depending on code structure.

Note: The presence of an augmentative output within a broader coded service does not by itself require separate reporting of the software function or reclassification of the broader service.

Autonomous

Services and procedures performed by software for the physician or other QHP are autonomous when the software independently generates a clinically meaningful interpretation, conclusion, recommendation, or management action from clinical input data without concurrent physician or other QHP involvement in the analytic process.

Autonomous output goes beyond derivation of a parameter alone. It independently interprets data or derived parameters and produces a clinically operative result, such as a definitive diagnostic conclusion, a specific management recommendation, or initiation of a management action.

Clinical meaningfulness for autonomous output should be supported by evidence appropriate to the role of the software in patient care. Such support may include consistency with clinical practice guidelines or accepted clinical standards, direct evidence of impact on patient management, or other evidence sufficient to support the degree of autonomy being described.

Automatic derivation is necessary but not sufficient for autonomous classification. A service is not autonomous merely because software operates automatically. It is autonomous when the software independently provides the clinically operative interpretation, conclusion, recommendation, or action.

There are three levels of autonomous medical services and procedures:

Level I

The output of autonomous software includes a definitive diagnostic conclusion and/or a specific management recommendation. The conclusion or recommendation is contestable and requires physician or other QHP judgment to implement, reject, or otherwise act upon it.

Level II

The output of autonomous software includes initiation of a management action based on the software’s interpretation or conclusion, but the service provides a reasonable opportunity for physician or other QHP review, interruption, or override before the action is carried out.

Level III

The output of autonomous software includes automatic initiation and continuation of a management action based on the software’s interpretation or conclusion, such that physician or other QHP oversight occurs at a supervisory level rather than through case-by-case concurrence before each action. Level III services require ongoing physician or other QHP oversight appropriate to the clinical circumstance and the management action being performed.

Note: Autonomous classification describes the role of the software output in the service at issue. It does not by itself require that a broader physical device, procedure, or other already coded service be separately reclassified or reported under a software-specific code family solely because autonomous software is embedded within that service.


Summary Table

Primary role of software output
  • Assistive: Detects, highlights, prioritizes, or otherwise brings clinically relevant data to attention
  • Augmentative: Derives a distinct quantitative or categorical parameter used in clinical care
  • Autonomous: Independently interprets data or derived parameters to generate a clinically operative conclusion, recommendation, or action

Distinct reportable parameter derived by software
  • Assistive: No
  • Augmentative: Yes
  • Autonomous: May include derived parameters, but autonomous status depends on the independent interpretation, conclusion, recommendation, or action

Independent clinical interpretation or conclusion by software
  • Assistive: No
  • Augmentative: No
  • Autonomous: Yes

Machine-initiated management action
  • Assistive: No
  • Augmentative: No
  • Autonomous: May occur in Levels II and III

Physician or other QHP interpretation/report
  • Assistive: Required for the professional service when applicable
  • Augmentative: May be required, or may be incorporated into another code or broader service
  • Autonomous: Not required for the autonomous output itself, although physician or other QHP oversight may still be required

Evidence needed
  • Assistive: Appropriate to the claimed clinical support function; often technical or analytical performance plus benefit to clinical performance
  • Augmentative: Evidence sufficient to support clinical meaningfulness of the derived parameter in the code’s clinical context
  • Autonomous: Evidence sufficient to support the independent interpretation, recommendation, or management action claimed
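For readers who think in data structures, the summary table can also be flattened into a small lookup. This is an illustrative sketch only: the field names are my own shorthand for the table's rows, not AMA terminology, and `None` marks the conditional ("may ...") cells.

```python
# Illustrative encoding of the summary table. Field names are shorthand,
# not AMA terminology; None marks cells that are conditional ("may ...").
TAXONOMY = {
    "Assistive": {
        "derives_parameter": False,
        "independent_conclusion": False,
        "machine_action": False,
        "physician_report_required": True,
    },
    "Augmentative": {
        "derives_parameter": True,
        "independent_conclusion": False,
        "machine_action": False,
        "physician_report_required": None,  # may be folded into another code
    },
    "Autonomous": {
        "derives_parameter": None,   # may include derived parameters
        "independent_conclusion": True,
        "machine_action": None,      # may occur at Levels II and III
        "physician_report_required": False,
    },
}

# Example check: only Augmentative strictly requires a derived parameter.
requires_parameter = [name for name, row in TAXONOMY.items()
                      if row["derives_parameter"] is True]
```

One thing the encoding makes visible at a glance: the three categories differ on more than one field at once, which foreshadows the multi-axis problem discussed later.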

 

#

This version tries to do five things better than the versions taken as input.

First, it adds an entry gate;

Second, it says clearly that the taxonomy classifies the software output function, not necessarily the whole product or service;

Third, it separates automatic from autonomous more crisply;

Fourth, it removes the awkward dependence of Autonomous on a prior Augmentative construct;

Fifth, it states expressly that embedded software does not automatically pull a broader physical service into a software-only coding framework. Those are all pressure points that emerged from the current draft and the related problems in implementation.

Can Appendix S Be Made into a Logic Flow Chart?

 At an April 16, 2026 webinar on Appendix S, AMA management said they may someday do "beta testing" with sample codes to see if they are clearly classifiable, and/or make a logic flow chart (if possible) from Appendix S for classification by rules.

Here's what ChatGPT 5 made of the test on April 23, 2026, using the most recent clean copy of the Appendix S revision proposal.

##


Yes — a provisional decision tree is possible, and your instinct is basically right: the cleanest place to start is usually Autonomous first, because that category is the most outcome-oriented and the easiest to test at the top level: does the software itself generate a clinically meaningful interpretation/conclusion and then recommend or initiate management? If yes, you are in Autonomous-land; if not, you fall back to the Assistive/Augmentative split. That structure fits the text of the clean Appendix S.

But the exercise also shows why the document is hard to operationalize. 

The taxonomy mixes several different axes at once: 

  • detection vs parameter generation vs interpretation; 
  • physician involvement vs machine action; 
  • output vs report; and 
  • evidentiary sufficiency for “clinical meaningfulness.” 

Your own blog puts this well: AMA is trying to classify several disjunct concepts simultaneously, and that makes clean bucket logic unstable.

Here is the best decision tree I can derive from the current proposal:


Proposed Appendix S decision tree

Step 0. Is there a software output relevant to clinical care?

Before using Appendix S at all, ask whether the software output is actually used in diagnosis, cure, mitigation, treatment, or prevention, and whether it is being characterized as part of a medical service/procedure rather than merely a productivity or workflow tool. The clean draft says the software output must be useful in clinical care, with classification keyed to the role of the output in patient management.

If no, Appendix S may not apply, or at least the service may not be codifiable in the contemplated way. A healthcare productivity tool is generally not codifiable in CPT.

If yes, continue.


Step 1. Does the software itself generate a clinically meaningful interpretation or conclusion?

This is the best opening gate for Autonomous. The clean text defines Autonomous as software that automatically derives parameters and independently generates clinically meaningful interpretations or conclusions without concurrent physician/QHP involvement.

If yes, go to Step 1A.
If no, skip to Step 2.

Step 1A. Does the software merely recommend, or does it act?

This splits the three levels of Autonomous.

  • Level I Autonomous: software offers a definitive diagnostic or management recommendation, but physician/QHP judgment is required to implement or reject it.

  • Level II Autonomous: software initiates a medical management action, but there is a reasonable chance to negate/override before implementation.

  • Level III Autonomous: software automatically initiates management actions that continue unless the physician/QHP intervenes.

This is also where the protocol emphasizes the crucial distinction between automaticity and autonomy. A tool can be automatic in getting from input to output without being Autonomous in the Appendix S sense.  


Step 2. If not Autonomous: does the software output derive a new quantitative or categorical parameter distinct from the input?

This is the key gate for Augmentative. The clean draft says Augmentative exists when the output represents a quantitative or categorical parameter qualitatively different from the input, and more than adding/averaging/descriptive statistics.

If yes, go to Step 2A.
If no, go to Step 3.

Step 2A. Is the output clinically meaningful?

Under the proposal, Augmentative — and by extension Autonomous — hinges on “clinical meaningfulness.” The clean draft says the output must contribute to patient management and be supported by evidence beyond merely technical/analytical validation, beyond mere statistical significance, and directly pertinent to the code characteristics.

If yes, classify as Augmentative.
If no, the taxonomy becomes unstable: the text suggests you are not yet entitled to call it Augmentative, but it is also more than merely Assistive if it truly derives a new parameter. This is one of the places where the logic frays.


Step 3. If no new parameter is derived: does the software merely detect clinically relevant data, without interpretation or conclusions?

This is the cleanest definition of Assistive. The clean text says Assistive software detects clinically relevant data without deriving a parameter, generating an interpretation, or providing conclusions. It improves physician/QHP performance, while the primary service outcome may remain unchanged.

If yes, classify as Assistive.
If no, the taxonomy again becomes unstable, because there are outputs that may structure, summarize, triage, or otherwise transform information without fitting neatly into pure “detection” or pure “parameter generation.”  


The same logic as short yes/no questions

  1. Is the software output part of patient care / medical management rather than mere workflow?

  2. Does the software itself generate a clinically meaningful interpretation or conclusion?

    • If yes: Autonomous. Then ask whether it recommends only, acts with override, or acts until stopped.

  3. If not, does it derive a new quantitative or categorical parameter qualitatively different from the input?

    • If yes: likely Augmentative, assuming clinical meaningfulness is shown.

  4. If not, does it merely detect clinically relevant data, leaving interpretation/report to the physician/QHP?

    • If yes: Assistive.

  5. If none of the above fits cleanly, the taxonomy is under-specified for that use case.
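The yes/no sequence above can be sketched as a small triage function. This is an illustrative sketch only: the boolean attributes and their names are my own simplification of the draft's criteria, not AMA terminology, and real classification also involves evidentiary judgment that booleans cannot capture.

```python
from dataclasses import dataclass

@dataclass
class SoftwareOutput:
    """Hypothetical attributes a coder might assert about one software output.

    Attribute names are illustrative shorthand, not AMA terminology.
    """
    clinical_use: bool            # used in diagnosis/treatment, not mere workflow
    independent_conclusion: bool  # software itself generates a clinically
                                  # meaningful interpretation or conclusion
    initiates_action: bool        # software initiates a management action
    overridable: bool             # action can be stopped before it is carried out
    derives_parameter: bool       # distinct quantitative/categorical parameter
    clinically_meaningful: bool   # evidence supports clinical meaningfulness
    detects_data: bool            # merely flags clinically relevant data

def classify(o: SoftwareOutput) -> str:
    # Step 0: entry gate -- Appendix S applies only to clinically used outputs.
    if not o.clinical_use:
        return "Outside Appendix S (workflow/productivity tool)"
    # Step 1: test Autonomous first -- the most advanced function.
    if o.independent_conclusion and o.clinically_meaningful:
        if not o.initiates_action:
            return "Autonomous Level I"   # recommends; physician implements/rejects
        if o.overridable:
            return "Autonomous Level II"  # acts, with opportunity to override
        return "Autonomous Level III"     # acts and continues; supervisory oversight
    # Step 2: Augmentative -- a distinct, clinically meaningful parameter.
    if o.derives_parameter:
        if o.clinically_meaningful:
            return "Augmentative"
        return "Unstable: parameter derived, clinical meaningfulness not shown"
    # Step 3: Assistive -- mere detection; physician interprets and reports.
    if o.detects_data:
        return "Assistive"
    return "Unclassifiable: drafting gap"
```

Run on a 0764T-style profile (detection only, physician interprets and reports), the sketch returns "Assistive"; the two residual branches ("Unstable" and "Unclassifiable") mark exactly the places where the prose taxonomy frays.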


Why the logic partly works

It works because the current draft does contain a rough hierarchy of outputs:

  • Assistive = detects clinically relevant data, no new parameter, no conclusion.

  • Augmentative = produces a distinct parameter, but not a definitive interpretation/conclusion.

  • Autonomous = goes further and independently interprets/concludes, then may recommend or initiate management.

That is why starting with Autonomous makes sense. It tests the most advanced function first, then backs down to parameter generation, then backs down to detection.


Why the logic also collapses

Here is where your skepticism is justified.

1. The tree is not built on a single dimension

The categories are not based on one clean axis. They mix:

  • output type,

  • evidentiary burden,

  • physician involvement,

  • whether the machine acts,

  • and whether the software is a full service or only a component of a service.

That is why the same service can feel “assistive” in one respect and “autonomous” in another unless you are very strict about what exactly is being classified.

2. “Output” versus “report” versus “service” is slippery

Does Appendix S classify the whole service, the algorithmic analysis, or just the software output component? The "Triple A" terms are adjectives of the algorithmic analysis/output, not necessarily of the entire service. But this variance may introduce logical wobble.

3. “Clinically meaningful” is doing too much work

The clean draft makes clinical meaningfulness central for Augmentative and, by extension, Autonomous. However, this extension of "clinically meaningful" from Augmentative to Autonomous is only implied, not stated expressly.

That means your tree has to ask not only “what does the software do?” but also “has enough evidence been shown for me to call that output clinically meaningful?” That is not a pure logical branch; it is partly an evidentiary judgment.

4. Automatic is not Autonomous

This is important and non-intuitive. Software can run automatically from input to output and still not be Autonomous. That means a flow chart that begins with “Is it automatic?” would mislead users.

5. Borderline cases do not fit neatly

As the eligible services and codes expand, so will the gray zones:

  • detection plus notification/triage,

  • structuring or summarizing evidence rather than simple detection,

  • intraoperative or intraprocedural support,

  • tools embedded inside broader physical services.

Those are exactly the cases likely to “break” a decision tree.  They point to areas where the prose can be clarified and the logic tree improved thereafter.


Your hypothesis about path dependence

I think you are right that the logic is path dependent, and in two ways.

First, Autonomous-first is the cleaner routing logic, because Autonomous is partly defined by the presence of functions that exceed Augmentative: clinically meaningful interpretation/conclusion plus recommendation/initiation of management. If those are absent, you can step down to Augmentative and then Assistive.

Second, the answer can depend on which question is asked first. For example:

  • If you ask first, “Does it generate a parameter?”, you may classify something as Augmentative before noticing that it also independently concludes and initiates action, which would push it to Autonomous.

  • If you ask first, “Does it improve physician performance?”, almost everything starts sounding Assistive.

  • If you ask first, “Is there physician involvement?”, you can get misled by the automatic/autonomous distinction.

So yes: the same service can appear to move differently through the tree depending on the entry point, which is a sign the prose taxonomy is not yet logically orthogonal.
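That path dependence can be made concrete with a toy example. Assume a hypothetical mixed-function tool that both derives a risk score and independently issues a diagnostic conclusion; the two question orders below then disagree. This is a sketch of the ordering problem only, not a model of any real product.

```python
# Hypothetical mixed-function tool: derives a risk score AND independently concludes.
tool = {"derives_parameter": True, "independent_conclusion": True}

def parameter_first(t: dict) -> str:
    # Asking "does it generate a parameter?" first stops at Augmentative
    # before the independent conclusion is ever examined.
    if t["derives_parameter"]:
        return "Augmentative"
    if t["independent_conclusion"]:
        return "Autonomous"
    return "Assistive"

def autonomous_first(t: dict) -> str:
    # Testing the most advanced function first reaches the higher bucket.
    if t["independent_conclusion"]:
        return "Autonomous"
    if t["derives_parameter"]:
        return "Augmentative"
    return "Assistive"

print(parameter_first(tool), "vs", autonomous_first(tool))
# prints: Augmentative vs Autonomous
```

A taxonomy whose categories were logically orthogonal would give the same answer regardless of question order; the disagreement here is the formal version of the "entry point" problem.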


My bottom-line judgment

Today, a usable working flow chart is possible, but only as a triage tool, not as a fully determinate classifier.

The best working version is:

  1. Does the software independently generate a clinically meaningful interpretation/conclusion and recommend/initiate management?
    Autonomous.

  2. If not, does it generate a distinct quantitative/categorical parameter beyond descriptive statistics?
    Augmentative, if clinically meaningful.

  3. If not, does it merely detect clinically relevant data for physician interpretation/report?
    Assistive.

  4. If still unclear, the case exposes a drafting gap.

That would be illuminating precisely because it shows where the taxonomy works and where it does not. 

And if the flow chart repeatedly crashes on structured-evidence, triage, embedded-software, or mixed-function tools, that failure (of today's chart) would be as valuable as success — exactly the point you made.

The most important practical fix, in my view, would be to preface any flow chart with one explicit rule:

“Classify the software output function being claimed for coding — not the mere presence of software, and not the whole physical service, unless that software function itself is the claimed service.”

 ###

I gave ChatGPT a couple of recent Cat III codes to assess as a beta test.

###

+0764T / 0765T — AI/computerized ECG analysis for detection of cardiac pathology/cardiac dysfunction, with physician/QHP interpretation/report, tied to concurrent or prior ECG use. CMS assigned OPPS/ASC payment status effective 2025 according to vendor coding guides and public reporting. 

0903T–0905T — algorithmically generated 12-lead ECG from a reduced-lead ECG, with separate codes for global/tracing/interpretation-only configurations.

###

Using only the code text you gave, plus the current Appendix S logic, I’d characterize them this way:

0764T / 0765T

Best fit: Assistive.
In fact, the current clean Appendix S table itself uses “Algorithmic electrocardiogram risk-based assessment for cardiac dysfunction (0764T, 0765T)” as the example under Assistive.

Why that is a bit surprising, but still the document’s answer:

  • Assistive is defined as software that detects clinically relevant data without deriving a parameter, interpretation, or conclusion, and it requires physician/QHP interpretation and report.
  • The drafters intentionally allow Assistive outputs to include terms like “likelihood of,” “suggestive of,” or “risk for.” 
  • Your code summary for 0764T/0765T says AI/computerized ECG analysis for detection of cardiac pathology/cardiac dysfunction, with physician/QHP interpretation/report. That lines up closely with the Assistive bucket as the draft currently describes it.

So, under the present proposal, 0764T/0765T = Assistive, even if a lot of readers might instinctively think “risk-based assessment” sounds more augmentative. The draft has effectively pre-decided that one example.

Autonomous Level 1/2/3?
No. Not autonomous. The code text expressly keeps the physician/QHP interpretation/report in the loop, and Autonomous in Appendix S requires the software to independently generate clinically meaningful interpretations or conclusions without concurrent physician/QHP involvement.
So there is no Level I/II/III assignment here.


0903T–0905T

Best fit from the code text alone: probably Augmentative — but this is much less certain.

My reasoning:

  • These codes describe an algorithmically generated 12-lead ECG from a reduced-lead ECG.
  • That sounds like more than mere detection. The software is not just flagging a feature; it is transforming limited input into a more elaborate clinically usable output.
  • Yet the code text, as you summarized it, does not sound like the software is independently issuing a diagnostic interpretation or management conclusion. So it does not sound Autonomous.
  • That leaves Augmentative as the closest fit, because Augmentative covers software whose output is a quantitative or categorical parameter qualitatively different from the input, and more than simple descriptive reporting, but without a definitive interpretation or conclusion.

That said, there is an awkwardness here: a reconstructed 12-lead tracing is not obviously the kind of “parameter” Appendix S talks about, like an index, score, classification, risk score, predictive/prognostic metric. That is why I say probably Augmentative, but not cleanly.

In plain English:

  • It is not Assistive, because it seems to do more than detect.
  • It is not Autonomous, because it does not appear to independently conclude/diagnose/manage.
  • So it lands closest to Augmentative, but as an example of how Appendix S fits some services awkwardly.

Autonomous Level 1/2/3?
Again, no. Nothing in the code text you gave suggests the software:

  • makes a definitive diagnosis recommendation,
  • initiates a management action with override,
  • or automatically initiates ongoing management.

So no autonomous level.


Bottom line

My classification from the code text alone would be:

  • 0764T / 0765T → Assistive
    • and not just by inference: the Appendix S draft’s own table uses them as the Assistive example.
  • 0903T–0905T → probably Augmentative
    • but with a real caveat that these codes expose a weakness in the Appendix S prose, because a reconstructed waveform is not a neat “score/index/classification” style output.
  • Neither appears Autonomous, so no Level I/II/III for either code family.

The truly interesting point is that 0903T–0905T may be exactly the sort of example that would stress-test the framework. It does something clearly more than detection, but not obviously what the Appendix S writers have in mind when they talk about "clinically meaningful parameter generation."

###

STRESS TEST LOGIC WITH TWELVE EXAMPLES

###

Yes — here is a fire test with 12 real examples. The broad result is that the framework works cleanly for a handful of services whose descriptors already use words like augmentative or match the Appendix S example table, but it starts to wobble when a code describes reconstruction, mapping, indexing, or alerting rather than straightforward detection, parameter-generation, or independent conclusion/action. 

That is very much in line with the tensions you flagged in your blog.

I’d use this shorthand:

  • Clean fit = Appendix S handles it fairly well.

  • Wobble = the code can be placed, but only by inference or with some strain.

  • Major wobble = the code exposes a real drafting gap.

The fire test

1) 0764T — AI/computerized ECG analysis for detection of cardiac pathology/cardiac dysfunction, with physician/QHP interpretation/report, performed with a concurrent ECG
Best fit: Assistive. Clean fit. The official Appendix S example table itself places 0764T/0765T under Assistive, and the descriptor retains physician/QHP interpretation/report rather than independent machine conclusion. (American Medical Association)

2) 0765T — same family, but based on a prior ECG
Best fit: Assistive. Clean fit. Same reasoning as 0764T. The code family is effectively the canonical Assistive ECG example. (American Medical Association)

3) 75580 — Noninvasive estimate of coronary fractional flow reserve (FFR) derived from augmentative software analysis of coronary CTA data, with interpretation/report
Best fit: Augmentative. Very clean fit. The descriptor itself uses the phrase “derived from augmentative software analysis,” and the Appendix S example table also places 75580 under Augmentative. (UHC Provider)

4) 92229 — Retinal imaging with point-of-care automated analysis and diagnostic report
Best fit: Autonomous, Level I. Very clean fit. The Appendix S example table places 92229 under Autonomous, so we can identify 92229 as the then-current Autonomous example. It produces an automated diagnostic report, but the code does not describe machine-initiated treatment, so it fits Level I rather than II or III. (American Medical Association)

5) 0903T — Algorithmically generated 12-lead ECG from a reduced-lead ECG; with interpretation and report
Best fit: Probably Augmentative. Wobble. The algorithm transforms limited input into a richer output, which feels more like parameterized/derived output than mere detection, but the output is a reconstructed tracing rather than a classic “index, score, or classification.” It does not appear to independently generate a definitive conclusion, so it falls short of Autonomous. (AliveCor | AliveCor Home)

6) 0904T — Algorithmically generated 12-lead ECG from a reduced-lead ECG; tracing only
Best fit: Probably Augmentative. Major wobble. This may be the most awkward ECG example. It is clearly more than Assistive detection, but it is only a tracing and not an interpretation. It seems most consistent with Augmentative, yet it reveals that Appendix S does not naturally handle “algorithmic waveform reconstruction” very elegantly. (AliveCor | AliveCor Home)

7) 0905T — Algorithmically generated 12-lead ECG from a reduced-lead ECG; interpretation and report only
Best fit: Probably Augmentative. Wobble. Same problem as 0903T/0904T: the service depends on algorithmic reconstruction, but the descriptor does not say the software itself independently concludes/acts in the Autonomous sense. So I would still place it in Augmentative, albeit not comfortably. (AliveCor | AliveCor Home)

8) 1036T — Noninvasive hemodynamic assessment with pulmonary pressures and ejection fraction, including passive signal acquisition, augmentative algorithmic analysis, and generation of a clinical report with physician review/integration
Best fit: Augmentative. Very clean fit. The descriptor literally says “augmentative algorithmic analysis.” The physician still reviews, interprets, and clinically integrates the report, which keeps it out of Autonomous. (American Medical Association)

9) 1041T — Augmentative algorithmic analysis of EEG waveforms to identify source and propagation of epileptiform activity, with interpretation/report
Best fit: Augmentative. Very clean fit. Again, the descriptor expressly uses augmentative, and the physician/QHP interpretation/report remains part of the service. (American Medical Association)

10) 1039T — Connectomic analysis of previously performed multimodal brain MRI, requiring physician/QHP analysis of software- and physician-generated maps, with interpretation/report
Best fit: Probably Augmentative. Wobble. This looks like algorithmic production of clinically useful maps/structures rather than mere detection. But the descriptor emphasizes joint software-plus-physician analysis and physician interpretation/report, so it is not Autonomous. It fits Augmentative better than Assistive, but only because “map generation/integration” gets forced into the parameter/output bucket. (American Medical Association)

11) 1043T — Quantitative magnetic resonance, without imaging, for analysis of liver tissue, assessing one or more parameters, with automatically generated report; no further professional interpretation required
Best fit: Probably Augmentative, with an Autonomous argument. Major wobble. This is one of the best stress fractures in the framework. On one hand, it measures one or more parameters and generates a report, which sounds Augmentative. On the other hand, the guidance says it does not require further professional interpretation, which starts to sound Autonomous. But the descriptor does not clearly say the software independently renders a clinically meaningful interpretation/conclusion or initiates action. I would place it in Augmentative, but with a bright red asterisk. (American Medical Association)

12) 1052T — Interrogation/evaluation of an insertable subcutaneous heart failure decompensation monitor; output includes daily physiologic data, historical trends, and an algorithmically derived decompensation index; threshold crossing sends clinician alerts
Best fit: Probably Augmentative. Wobble. The algorithm produces a composite decompensation index, so this is more than Assistive detection. But the alerts go to the clinician, and the report is still reviewed/interpreted by the physician/QHP; the machine is not itself initiating care in the Level II or III sense. So this lands best in Augmentative. (American Medical Association)

What the fire test shows

A few examples are easy:

  • 0764T/0765T = Assistive

  • 75580 = Augmentative

  • 92229 = Autonomous Level I

  • 1036T/1041T = Augmentative

Those are easy either because the descriptor itself uses the taxonomy word or because AMA’s own example table pre-places them. (American Medical Association)

But several examples expose the weak spots:

First, the framework handles classic outputs better than reconstructed artifacts. The 0903T–0905T family is not mere detection, but it is not obviously an “index, score, classification, risk score, predictive/prognostic metric” either. It is a generated tracing. That is not a natural fit for the prose. (AliveCor | AliveCor Home)

Second, the framework is shaky when the code describes parameter generation plus automated reporting but not obvious autonomous conclusion/action. That is why 1043T is so useful as a stress test. It may be exactly the kind of service where a flow chart would either collapse or reveal that the prose needs another branch. (American Medical Association)

Third, the framework struggles with alerts and triage-like behavior. 1052T uses a derived index and threshold-triggered alerts, yet the clinician remains in the loop. That makes it more than Assistive, but not truly Autonomous as Appendix S defines it. (American Medical Association)

My practical summary

If I had to sort these 12 today under the current revised Appendix S logic, the overall verdict would be this: Appendix S can classify a lot of existing codes, but not gracefully. It works best where the descriptor already contains the answer. It becomes much less stable when a service involves reconstruction, mapping, indexing, alerting, or automated reporting rather than straightforward detection or independent conclusion. That is exactly why your idea of “fire testing” with 20–30 examples is so strong. It is not just a nice exercise; it is a way to reveal where the taxonomy is genuinely underpowered.

I can turn this next into a one-page blog-ready table with three columns: code, my classification, why the framework wobbles or holds.

###

A pacemaker with an automatic defibrillator monitors the ECG and, if a sufficiently major problem is detected, automatically emits a cardioversion shock (no asking the doctor!). This sounds like autonomous software to me, so would it be placed here and put on the future CMAA code list? I don't think anyone would expect that, but why or why not? 'Tis a puzzlement!

###

Yes — conceptually, your ICD example sounds very much like Autonomous, and probably Level III under the current draft language. But coding-wise, I do not think that means it would migrate into a future CMAA bucket just because autonomous software is embedded inside the device. That is exactly the kind of paradox that shows where Appendix S logic and real-world CPT architecture can diverge.

Why it sounds Autonomous first. The clean draft says Autonomous software automatically derives parameters and independently generates clinically meaningful interpretations or conclusions without concurrent physician/QHP involvement, and Level III covers software that automatically initiates management actions and continues unless the physician intervenes. Your example — device monitors rhythm, detects a qualifying malignant arrhythmia, and fires a shock with no doctor asked — sounds much closer to machine-initiated therapeutic action than to Assistive or Augmentative. In the draft’s own terms, that is the most natural place to put it, and more specifically it sounds closer to Level III than Level I or II because the therapy is actually delivered, not merely recommended or queued for override.

But the second question is the real one: would that mean it should appear as a new CMAA-style software service? I think probably not, for a practical CPT reason. Implantable cardioverter-defibrillators are already recognized in Medicare/CPT as physical device procedures and device-management services. CMS materials describe an ICD as an implanted device that continuously monitors cardiac activity and delivers therapy when it detects life-threatening arrhythmias, and Medicare coding/payment infrastructure already exists for ICD implantation and device evaluation/programming rather than treating the sensing/shocking logic as a separately coded AI service. CMS and manufacturer reimbursement materials show established ICD procedure families and device-evaluation families, including implantation codes in the 33202–33275 range and device evaluation/programming codes such as 93282–93284, 93287, 93289, 93295, and 93296. (Medtronic)

So the clean answer is:

  • As a pure Appendix S thought experiment: yes, the device’s embedded decision-and-shock logic looks like Autonomous, probably Level III.

  • As a CPT coding reality: no, I would not expect the whole service to be pulled into CMAA merely because autonomous logic exists inside the implanted device. It is more naturally understood as part of the already-coded device therapy/service architecture. (Medtronic)

That is also very consistent with your own blog instinct that physical services should remain coded by the physical service component, and a folded-inside AI component should not automatically drag everything onto CMAA. You made that point about WGS, but the same structural logic applies even more strongly to an implanted therapeutic device.

There is also support for this from the webinar’s framing. The speakers repeatedly said Appendix S adjectives are meant to describe the algorithmic analysis/output, not necessarily to reclassify an entire broader service. They also said the terms can be used in many parts of CPT “wherever those would be used,” which implies a descriptor-level tool, not a command that every device containing autonomy must be uprooted into a software-only section.

So the puzzlement is real, but I think the resolution is:

Appendix S may classify the software function as Autonomous Level III, while CPT may still leave the billable service in the traditional implanted-device code family.

That is not actually inconsistent. It just means the taxonomy and the code placement are not the same question.

In fact, your ICD example is a very good stress fracture in the framework, because it exposes an unstated rule that probably needs to be said out loud:

“Appendix S can characterize a software function without requiring that the containing physical device/service be rehomed into CMAA.”

Without a rule like that, absurd results follow. A defibrillator, insulin pump, smart neurostimulator, or ventilator with closed-loop logic would all threaten to become “software codes,” which I agree no one actually expects. Your example therefore does not refute the taxonomy, but it strongly suggests the prose needs a boundary principle between:

  1. software as the service, and

  2. software embedded in an already-coded physical therapeutic device/service.

So my net answer is:

  • Three-A label: Autonomous

  • Level: probably Level III

  • Would it therefore move to CMAA? Probably no

  • Why not? Because the better coding principle is that the autonomous logic is intrinsic to the already-coded implanted therapeutic device service, not a separately reported software service. (Medtronic)

This is exactly the kind of example that would make a great paragraph in your blog, because it shows the taxonomy can have descriptive value while still needing a scope-limiting rule to prevent category mistakes.

###

It highlights that Appendix S does not define what enters into it, does not define AI or ML, etc.

 This is compatible with the current committee zeitgeist and groupthink but it may bite them in the butt later.

###

Yes — I think that is a fair criticism.

The current approach seems to be: do not define AI, do not define the outer boundary of what enters Appendix S, and instead focus on classifying the software output once a case is already in front of the committee. The clean draft says exactly that there is no single product or service for which the term “AI” is sufficient or necessary, and therefore “AI” is not defined in this taxonomy. It then moves immediately to classifying outputs as assistive, augmentative, or autonomous.

That may be politically convenient in the short run. It avoids fights over definitions, avoids having to say what is “really AI,” and gives the committee flexibility. AMA wants Appendix S as a tool to help panel reviewers judge the role of the output in patient management, not as a general theory of AI or a boundary-setting document.

But your concern is the stronger one: if you do not define the domain, then the taxonomy may become unstable at the edges. That is exactly why examples like ICDs, reconstructed ECGs, WGS-plus-AI interpretation, intraoperative guidance, or structured-evidence tools become so troublesome. The committee can say, case by case, “we know it when we see it,” but that works only until the edge cases pile up.

So I think your point is:

the non-definition strategy buys flexibility now, but at the cost of scope creep, inconsistency, and future surprises.

That is also very close to what you wrote in your blog: Appendix S expects precise downstream categorization while leaving cloudy the upstream question of what falls into the universe in the first place.

My own phrasing would be:

The current committee view is understandable: keep Appendix S functional, not philosophical. 

But refusing to define the entry gate may work only as long as the examples remain easy. As software-intensive services proliferate, the lack of a scope rule may become the document’s biggest weakness.

Or even more sharply:

A taxonomy without an entry criterion is not just incomplete; it risks becoming path-dependent, ad hoc, and vulnerable to exactly the odd results no one intended.

That feels like a strong blog line.

APPENDIX S - THE FINAL REWRITE BY AI

Excellent concise summary. Now, here's a task for you. I reattach the clean Appendix S version B for the upcoming CPT meeting. You've seen numerous collateral documents and done much thinking and logic yourself. I want you to make a substantial rewrite of Appendix S. You don't need to follow the template paragraph by paragraph; take all you know now and produce "an" Appendix S, freed of confusing things that don't work and amplified and improved by some of the changes we've seen this afternoon. Of course, the general tone remains AMA CPT policy style. The macro structure has to stay the same (Assistive, Augmentative, Autonomous), although we now appreciate the problems caused by collapsing six or seven axes of variation onto one axis of three points. Do everything you can to improve Appendix S. If you need to delete paragraphs and insert new paragraphs, and so on, you have permission to do so. Be the best policy writer (and best thinker!) you can be.

###

Below is a fresh rewrite of Appendix S based on the current clean Option B text, the webinar’s stated goals, and the fault lines we identified: the need for an entry gate, clearer distinction between software output and the whole service, less circular reliance on “augmentative” inside “autonomous,” and a clearer statement that embedded software does not by itself reclassify an already-coded physical service. Those are all issues that appear in the current draft, the webinar discussion, and your blog critique.

I have kept the macro structure intact: Assistive, Augmentative, Autonomous. But I have rewritten the front end and the definitions to make the logic more explicit and less path-dependent.


Appendix S

Taxonomy for Software Output in Medical Services and Procedures

This taxonomy provides guidance for determining the appropriate use of the terms assistive, augmentative, and autonomous in CPT code descriptors, related procedural language, and supporting materials for medical services and procedures that include software-enabled analysis.

For purposes of this taxonomy, the terms in Appendix S classify the software output function that is being described for coding. They do not, by themselves, define the entirety of a product, determine whether a service is separately reportable, or require that an already-coded physical service or device be reclassified solely because software is used within that service.

Appendix S does not define “artificial intelligence,” “machine learning,” or any other general class of software technologies. Rather, it provides terminology for describing the clinical role of software output in a medical service or procedure. The relevant question is not whether a product is labeled “AI,” but what the software output does in clinical care.

For use of a term from Appendix S, the software output must be relevant to the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions, and must be pertinent to patient management within the clinical context described by the code. Software that is limited to administrative, financial, scheduling, workflow, or general productivity functions does not fall within this taxonomy solely by reason of being used in health care.

The terms in Appendix S are adjectives that characterize the software-enabled analytical function being described. In some circumstances, that analytical function may correspond closely to the billable service. In other circumstances, it may describe only a component of a broader service. Classification under Appendix S therefore characterizes the software output function at issue; it does not by itself determine whether the broader service is separately reportable, bundled, or more appropriately described elsewhere in CPT.

Use of a term from Appendix S should be supported by evidence appropriate to the role of the software output in clinical care and consistent with CPT code criteria. Depending on the claimed function of the software output, such evidence may include technical validity, analytical validity, clinical validity, and, where applicable, evidence of impact on patient management or clinical utility.

For purposes of this taxonomy, a parameter means a quantitative or categorical output derived by software and reported for clinical use, such as an index, score, classification, risk estimate, measurement, or other metric. Not every transformed output is necessarily a parameter for purposes of this taxonomy; the key consideration is whether the output is a distinct reportable result used in clinical care.

For purposes of this taxonomy, automatic or automatically refers to the software deriving its output without concurrent human involvement in the analytic process from input to output. Automatic derivation of output may occur in either augmentative or autonomous services and does not, by itself, establish autonomous status.

Assistive

Services and procedures performed by software for the physician or other qualified health care professional (QHP) are assistive when the software output detects, identifies, highlights, prioritizes, or otherwise brings clinically relevant data to the attention of the physician or other QHP without independently deriving a distinct reportable parameter, and without independently generating a clinical interpretation or conclusion.

Assistive software output is clinically supportive of physician or QHP performance. It may improve, for example, accuracy, precision, consistency, inter-observer variability, efficiency of review, or timeliness of recognition of clinically relevant data, while the reportable outcome of the primary clinical service may remain unchanged.

The output from assistive software requires physician or other QHP interpretation and report when the service is reported as a professional service. Assistive software does not, for purposes of this taxonomy, independently establish a diagnosis, recommendation, or management action.

Assistive outputs may include indications such as the presence, location, or salience of clinically relevant data. Terms such as “likelihood of,” “suggestive of,” or “risk for” may be consistent with assistive output only when they function to draw attention to clinically relevant data rather than to provide a definitive interpretation, conclusion, or management recommendation.

Evidence supporting assistive software should be appropriate to the claimed role of the output in clinical care. In general, technical or analytical validation demonstrating benefit to clinical performance may be sufficient where the assistive output does not itself alter the reportable clinical conclusion. Where the assistive output uses terminology that implies a clinically significant inference, additional clinical validation may be appropriate.

Augmentative

Services and procedures performed by software for the physician or other QHP are augmentative when the software output derives a distinct quantitative or categorical parameter that is qualitatively different from the input and is used in clinical care, but the software does not independently provide a definitive clinical interpretation, conclusion, or management recommendation.

The augmentative output must be more than a restatement, summation, averaging, display, transmission, or descriptive presentation of input data. It must provide a distinct reportable result, such as a scale, index, classification, measurement, score, probability estimate, risk estimate, predictive measure, prognostic measure, mapping output, or other metric used in clinical care.

Augmentative output is clinically meaningful when the derived parameter contributes to patient management within the clinical context of the code. Clinical meaningfulness may be supported by, for example:

  • equivalence to a metric already used in current clinical practice;

  • consistency with clinical practice guidelines, professional standards, or accepted clinical frameworks; or

  • evidence that the novel output has a clinically relevant impact on patient management.

For augmentative services, evidence should demonstrate more than technical or analytical performance alone. The evidence should support the clinical relevance of the derived parameter for the patient population, service, and clinical context described by the code.

Software with augmentative outputs may operate automatically or may require physician or other QHP interaction during the analytic process. The physician work associated with augmentative output may be separately described by the same code, may be captured in another code, or may be incorporated into a broader service, depending on the code structure.

Note: The fact that an augmentative output is used within a broader coded service does not by itself require separate reporting of the software function, nor does it require reclassification of the broader service.

Autonomous

Services and procedures performed by software for the physician or other QHP are autonomous when the software independently generates a clinically meaningful interpretation, conclusion, recommendation, or management action from clinical input data without concurrent physician or other QHP involvement in the analytic process.

Autonomous output goes beyond derivation of a parameter alone. It includes software output that independently interprets data or derived parameters and produces a clinically operative result, such as a definitive diagnostic conclusion, a specific management recommendation, or initiation of a management action.

Clinical meaningfulness for autonomous output should be supported by evidence appropriate to the claimed role of the software in patient care. Such support may include consistency with clinical practice guidelines or accepted clinical standards, direct evidence of impact on patient management, or other evidence sufficient to support the degree of autonomy being described.

Automatic derivation of output is necessary but not sufficient for autonomous classification. A service is not autonomous merely because the software operates automatically. It is autonomous when the software independently provides the interpretation, conclusion, recommendation, or action that is clinically operative.

There are three levels of autonomous medical services and procedures:

Level I

The output of autonomous software includes a definitive diagnostic conclusion and/or a specific management recommendation. The recommendation or conclusion is contestable and requires physician or other QHP judgment to implement, reject, or otherwise act upon it.

Level II

The output of autonomous software includes initiation of a management action based on the software’s interpretation or conclusion, but the service includes a reasonable opportunity for physician or other QHP review, interruption, or override before the action is carried out.

Level III

The output of autonomous software includes automatic initiation and continuation of a management action based on the software’s interpretation or conclusion, such that physician or other QHP oversight is exercised at a supervisory level rather than through case-by-case concurrence before each action. Level III services require ongoing physician or other QHP oversight appropriate to the clinical circumstance and the management action being performed.

Note: Autonomous classification describes the role of the software output in the service at issue. It does not by itself require that a broader physical device, procedure, or other already-coded service be separately reclassified or reported under a software-specific code family solely because autonomous software is embedded within that service.


Summary Table

Primary role of software output
  • Assistive: detects, highlights, prioritizes, or otherwise brings clinically relevant data to attention
  • Augmentative: derives a distinct quantitative or categorical parameter used in clinical care
  • Autonomous: independently interprets data or derived parameters to generate a clinically operative conclusion, recommendation, or action

Distinct reportable parameter derived by software
  • Assistive: no
  • Augmentative: yes
  • Autonomous: may include derived parameters, but autonomous status depends on the independent interpretation, conclusion, or action

Independent clinical interpretation or conclusion by software
  • Assistive: no
  • Augmentative: no
  • Autonomous: yes

Machine-initiated management action
  • Assistive: no
  • Augmentative: no
  • Autonomous: may occur in Levels II and III

Physician or other QHP interpretation/report
  • Assistive: required for the professional service when applicable
  • Augmentative: may be required, or may be incorporated into another code or broader service
  • Autonomous: not required for the autonomous output itself, although physician or other QHP oversight may still be required

Evidence needed
  • Assistive: appropriate to the claimed clinical support function; often technical or analytical performance plus benefit to clinical performance
  • Augmentative: evidence sufficient to support clinical meaningfulness of the derived parameter in the code’s clinical context
  • Autonomous: evidence sufficient to support the independent interpretation, recommendation, or management action claimed


#

#

#

This version tries to do five things better than the current draft:

  • first, it adds an entry gate;
  • second, it says clearly that the taxonomy classifies the software output function, not necessarily the whole product or service;
  • third, it separates automatic from autonomous more crisply;
  • fourth, it removes the awkward dependence of Autonomous on a prior Augmentative construct;
  • and fifth, it states expressly that embedded software does not automatically pull a broader physical service into a software-only coding framework. Those are all pressure points that emerged from the current draft and the related discussion.
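
Read as a decision procedure, the rewritten taxonomy reduces to a short chain of questions: entry gate, then independent conclusion, then action and override, then parameter. The sketch below is a hypothetical illustration of that logic only, not AMA text; the attribute names (`clinically_relevant`, `override_window`, and so on) are my own shorthand for concepts in the prose.

```python
from dataclasses import dataclass

@dataclass
class SoftwareOutput:
    """Hypothetical attributes distilled from the rewritten taxonomy prose."""
    clinically_relevant: bool    # entry gate: pertinent to patient management
    derives_parameter: bool      # distinct reportable quantitative/categorical output
    independent_conclusion: bool # interpretation/conclusion without concurrent QHP involvement
    initiates_action: bool       # machine-initiated management action
    override_window: bool        # QHP can review/interrupt before the action proceeds

def classify(o: SoftwareOutput) -> str:
    if not o.clinically_relevant:
        return "outside Appendix S"          # entry gate fails
    if o.independent_conclusion:
        # automatic operation alone is not enough; autonomy turns on the
        # clinically operative conclusion, recommendation, or action
        if not o.initiates_action:
            return "Autonomous, Level I"     # contestable conclusion/recommendation only
        return ("Autonomous, Level II" if o.override_window
                else "Autonomous, Level III")
    if o.derives_parameter:
        return "Augmentative"
    return "Assistive"

# Worked example: an ICD's detect-and-shock logic (therapy delivered, no override)
icd = SoftwareOutput(clinically_relevant=True, derives_parameter=True,
                     independent_conclusion=True, initiates_action=True,
                     override_window=False)
```

Note that `classify(icd)` lands on Level III, which matches the thought experiment above, even though, as discussed, the billable service would likely remain in the traditional implanted-device code family.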

If you want, I’ll next do one of two things:

  1. turn this into a redline-style comparison memo against Option B, or

  2. compress it into a shorter, more committee-friendly version that sounds even more like something AMA staff might circulate.

###
The rewrite of Appendix S is good but somewhat dry and verbose. Can you improve readability and reader engagement?
###

Absolutely. Here is a leaner, calmer rewrite of the Appendix S text. I kept the same architecture and policy tone, but tightened sentences, reduced repetition, and tried to make each paragraph carry one clear idea.


Appendix S

Taxonomy for Software Output in Medical Services and Procedures

This taxonomy provides guidance for the use of the terms assistive, augmentative, and autonomous in CPT code descriptors and related procedural language for medical services and procedures that include software-enabled analysis.

The terms in Appendix S classify the software output function described for coding. They do not, by themselves, define the whole product, determine whether a service is separately reportable, or require reclassification of an already coded physical service or device solely because software is used within it.

Appendix S does not define “artificial intelligence,” “machine learning,” or other general classes of software. Its purpose is narrower: to describe the clinical role of software output in a medical service or procedure. The relevant question is not whether a product is labeled “AI,” but what the software output does in clinical care.

A term from Appendix S should be used only when the software output is relevant to the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions, and is pertinent to patient management in the clinical context described by the code. Software used only for administrative, financial, scheduling, workflow, or general productivity purposes does not fall within this taxonomy solely because it is used in health care.

The terms in Appendix S are adjectives describing the software-enabled analytical function at issue. In some cases, that function may closely correspond to the reportable service. In other cases, it may describe only one part of a broader service. Classification under Appendix S therefore describes the software output function, not necessarily the whole service.

Use of a term from Appendix S should be supported by evidence appropriate to the role of the software output in clinical care and consistent with CPT code criteria. Depending on the claimed function of the software output, such evidence may include technical validity, analytical validity, clinical validity, and, when appropriate, evidence of impact on patient management or clinical utility.

For purposes of this taxonomy, a parameter is a quantitative or categorical output derived by software and reported for clinical use, such as an index, score, classification, risk estimate, measurement, or other metric. Not every transformed output is a parameter. The key question is whether the output is a distinct reportable result used in clinical care.

For purposes of this taxonomy, automatic or automatically means that software derives its output without concurrent human involvement in the analytic process from input to output. Automatic derivation may occur in either augmentative or autonomous services and does not, by itself, establish autonomous status.

Assistive

Services and procedures performed by software for the physician or other qualified health care professional (QHP) are assistive when the software output detects, identifies, highlights, prioritizes, or otherwise brings clinically relevant data to the attention of the physician or other QHP without deriving a distinct reportable parameter and without independently generating a clinical interpretation or conclusion.

Assistive software supports physician or QHP performance. It may improve accuracy, precision, consistency, efficiency of review, or timeliness of recognition of clinically relevant data, or reduce inter-observer variability, while the reportable outcome of the primary clinical service may remain unchanged.

When the service is reported as a professional service, the output from assistive software requires physician or other QHP interpretation and report. Assistive software does not independently establish a diagnosis, recommendation, or management action.

Assistive outputs may include indications such as the presence, location, or salience of clinically relevant data. Terms such as “likelihood of,” “suggestive of,” or “risk for” may be consistent with assistive output only when they serve to draw attention to clinically relevant data rather than provide a definitive interpretation, conclusion, or management recommendation.

Evidence supporting assistive software should be appropriate to the claimed role of the output in clinical care. Where the assistive output does not itself alter the reportable clinical conclusion, technical or analytical validation showing benefit to clinical performance may be sufficient. Where the output implies a clinically significant inference, additional clinical validation may be appropriate.

Augmentative

Services and procedures performed by software for the physician or other QHP are augmentative when the software output derives a distinct quantitative or categorical parameter that is qualitatively different from the input and is used in clinical care, but does not independently provide a definitive clinical interpretation, conclusion, or management recommendation.

An augmentative output must be more than a restatement, summation, averaging, display, transmission, or descriptive presentation of input data. It must provide a distinct reportable result, such as a scale, index, classification, measurement, score, probability estimate, risk estimate, predictive measure, prognostic measure, mapping output, or other metric used in clinical care.

Augmentative output is clinically meaningful when the derived parameter contributes to patient management in the clinical context described by the code. Clinical meaningfulness may be supported by equivalence to a metric already used in current practice, consistency with clinical practice guidelines or accepted clinical frameworks, or evidence that a novel output has clinically relevant impact on patient management.

For augmentative services, evidence should show more than technical or analytical performance alone. It should support the clinical relevance of the derived parameter for the patient population, service, and clinical context described by the code.

Software with augmentative outputs may operate automatically or may require physician or other QHP interaction during the analytic process. The related physician work may be described by the same code, captured in another code, or incorporated into a broader service, depending on code structure.

Note: The presence of an augmentative output within a broader coded service does not by itself require separate reporting of the software function or reclassification of the broader service.

Autonomous

Services and procedures performed by software for the physician or other QHP are autonomous when the software independently generates a clinically meaningful interpretation, conclusion, recommendation, or management action from clinical input data without concurrent physician or other QHP involvement in the analytic process.

Autonomous output goes beyond derivation of a parameter alone. It independently interprets data or derived parameters and produces a clinically operative result, such as a definitive diagnostic conclusion, a specific management recommendation, or initiation of a management action.

Clinical meaningfulness for autonomous output should be supported by evidence appropriate to the role of the software in patient care. Such support may include consistency with clinical practice guidelines or accepted clinical standards, direct evidence of impact on patient management, or other evidence sufficient to support the degree of autonomy being described.

Automatic derivation is necessary but not sufficient for autonomous classification. A service is not autonomous merely because software operates automatically. It is autonomous when the software independently provides the clinically operative interpretation, conclusion, recommendation, or action.

There are three levels of autonomous medical services and procedures:

Level I

The output of autonomous software includes a definitive diagnostic conclusion and/or a specific management recommendation. The conclusion or recommendation is contestable and requires physician or other QHP judgment to implement, reject, or otherwise act upon it.

Level II

The output of autonomous software includes initiation of a management action based on the software’s interpretation or conclusion, but the service provides a reasonable opportunity for physician or other QHP review, interruption, or override before the action is carried out.

Level III

The output of autonomous software includes automatic initiation and continuation of a management action based on the software’s interpretation or conclusion, such that physician or other QHP oversight occurs at a supervisory level rather than through case-by-case concurrence before each action. Level III services require ongoing physician or other QHP oversight appropriate to the clinical circumstance and the management action being performed.

Note: Autonomous classification describes the role of the software output in the service at issue. It does not by itself require that a broader physical device, procedure, or other already coded service be separately reclassified or reported under a software-specific code family solely because autonomous software is embedded within that service.


Summary Table

| | Assistive | Augmentative | Autonomous |
| --- | --- | --- | --- |
| Primary role of software output | Detects, highlights, prioritizes, or otherwise brings clinically relevant data to attention | Derives a distinct quantitative or categorical parameter used in clinical care | Independently interprets data or derived parameters to generate a clinically operative conclusion, recommendation, or action |
| Distinct reportable parameter derived by software | No | Yes | May include derived parameters, but autonomous status depends on the independent interpretation, conclusion, recommendation, or action |
| Independent clinical interpretation or conclusion by software | No | No | Yes |
| Machine-initiated management action | No | No | May occur in Levels II and III |
| Physician or other QHP interpretation/report | Required for the professional service when applicable | May be required, or may be incorporated into another code or broader service | Not required for the autonomous output itself, although physician or other QHP oversight may still be required |
| Evidence needed | Appropriate to the claimed clinical support function; often technical or analytical performance plus benefit to clinical performance | Evidence sufficient to support clinical meaningfulness of the derived parameter in the code’s clinical context | Evidence sufficient to support the independent interpretation, recommendation, or management action claimed |
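Read as a decision procedure, the taxonomy above reduces to a short sequence of questions. The sketch below is illustrative only, not part of the appendix text; the function names, parameters, and enum values are hypothetical labels for the concepts in the draft (entry gate, derived parameter, independent conclusion, and the three autonomous levels).

```python
from enum import Enum

class Classification(Enum):
    OUT_OF_SCOPE = "not classified under Appendix S"
    ASSISTIVE = "assistive"
    AUGMENTATIVE = "augmentative"
    AUTONOMOUS = "autonomous"

def classify_output(clinically_pertinent: bool,
                    derives_parameter: bool,
                    independent_conclusion_or_action: bool) -> Classification:
    """Hypothetical sketch of the Appendix S decision path for one software output."""
    # Entry gate: administrative, financial, scheduling, or workflow-only
    # software is outside the taxonomy.
    if not clinically_pertinent:
        return Classification.OUT_OF_SCOPE
    # Autonomous: the software independently supplies the clinically operative
    # interpretation, conclusion, recommendation, or action.
    if independent_conclusion_or_action:
        return Classification.AUTONOMOUS
    # Augmentative: a distinct reportable parameter, but no independent conclusion.
    if derives_parameter:
        return Classification.AUGMENTATIVE
    # Assistive: brings clinically relevant data to attention only.
    return Classification.ASSISTIVE

def autonomous_level(initiates_action: bool, pre_action_review: bool) -> int:
    """Level I: conclusion or recommendation only; Level II: initiates an action
    with a pre-action review opportunity; Level III: initiates and continues an
    action under supervisory-level oversight, without case-by-case concurrence."""
    if not initiates_action:
        return 1
    return 2 if pre_action_review else 3
```

Note that "automatic" does not appear as an input to the classifier: per the draft, automatic derivation can occur in either augmentative or autonomous services, so it never decides the category by itself.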

This version trims some of the longer explanatory passages, reduces repeated qualifiers, and tries to make each section easier to read in one pass. It also keeps the key improvements from the earlier rewrite: an entry gate, a distinction between software output and the whole service, a cleaner separation of automatic from autonomous, and explicit protection against dragging every embedded-software device into a separate coding bucket.

I can also do one more pass that is even closer to official AMA committee prose — slightly stiffer, slightly shorter, and more “editorial-panel-ready.”