At an April 16, 2026 webinar on Appendix S, AMA management said they may someday do "beta testing" with sample codes to see whether they are clearly classifiable, and/or build a logic flow chart (if possible) from Appendix S for rule-based classification.
Here's what ChatGPT 5 made of the test on April 23, 2026, using the most recent clean copy of the Appendix S revision proposal.
###
Yes — a provisional decision tree is possible, and your instinct is basically right: the cleanest place to start is Autonomous, because that category is the most outcome-oriented and easiest to test at the top level. Does the software itself generate a clinically meaningful interpretation or conclusion and then recommend or initiate management? If yes, you are in Autonomous-land; if not, you fall back to the Assistive/Augmentative split. That structure fits the text of the clean Appendix S.
But the exercise also shows why the document is hard to operationalize.
The taxonomy mixes several different axes at once:
- detection vs parameter generation vs interpretation;
- physician involvement vs machine action;
- output vs report; and
- evidentiary sufficiency for “clinical meaningfulness.”
Your own blog puts this well: AMA is trying to classify several disjunct concepts simultaneously, and that makes clean bucket logic unstable.
Here is the best decision tree I can derive from the current proposal:
Proposed Appendix S decision tree
Step 0. Is there a software output relevant to clinical care?
Before using Appendix S at all, ask whether the software output is actually used in diagnosis, cure, mitigation, treatment, or prevention, and whether it is being characterized as part of a medical service/procedure rather than merely a productivity or workflow tool. The clean draft says the software output must be useful in clinical care, with the role of the output in patient/medical management as the touchstone.
If no, Appendix S may not apply, or at least the service may not be codifiable in the contemplated way. A healthcare productivity tool is generally not codifiable in CPT.
If yes, continue.
Step 1. Does the software itself generate a clinically meaningful interpretation or conclusion?
This is the best opening gate for Autonomous. The clean text defines Autonomous as software that automatically derives parameters and independently generates clinically meaningful interpretations or conclusions without concurrent physician/QHP involvement.
If yes, go to Step 1A.
If no, skip to Step 2.
Step 1A. Does the software merely recommend, or does it act?
This splits the three levels of Autonomous.
Level I Autonomous: software offers a definitive diagnostic or management recommendation, but physician/QHP judgment is required to implement or reject it.
Level II Autonomous: software initiates a medical management action, but there is a reasonable chance to negate/override before implementation.
Level III Autonomous: software automatically initiates management actions that continue unless the physician/QHP intervenes.
This is also where the protocol emphasizes the crucial distinction between automaticity and autonomy. A tool can be automatic in getting from input to output without being Autonomous in the Appendix S sense.
Step 2. If not Autonomous: does the software output derive a new quantitative or categorical parameter distinct from the input?
This is the key gate for Augmentative. The clean draft says Augmentative exists when the output represents a quantitative or categorical parameter qualitatively different from the input, and more than adding/averaging/descriptive statistics.
If yes, go to Step 2A.
If no, go to Step 3.
Step 2A. Is the output clinically meaningful?
Under the proposal, Augmentative — and by extension Autonomous — hinges on “clinical meaningfulness.” The clean draft says the output must contribute to patient management and be supported by evidence beyond merely technical/analytical validation, beyond mere statistical significance, and directly pertinent to the code characteristics.
If yes, classify as Augmentative.
If no, the taxonomy becomes unstable: the text suggests you are not yet entitled to call it Augmentative, but it is also more than merely Assistive if it truly derives a new parameter. This is one of the places where the logic frays.
Step 3. If no new parameter is derived: does the software merely detect clinically relevant data, without interpretation or conclusions?
This is the cleanest definition of Assistive. The clean text says Assistive software detects clinically relevant data without deriving a parameter, generating an interpretation, or providing conclusions. It improves physician/QHP performance, while the primary service outcome may remain unchanged.
If yes, classify as Assistive.
If no, the taxonomy again becomes unstable, because there are outputs that may structure, summarize, triage, or otherwise transform information without fitting neatly into pure “detection” or pure “parameter generation.”
The same logic as short yes/no questions
Is the software output part of patient care / medical management rather than mere workflow?
Does the software itself generate a clinically meaningful interpretation or conclusion?
If not, does it derive a new quantitative or categorical parameter qualitatively different from the input?
If not, does it merely detect clinically relevant data, leaving interpretation/report to the physician/QHP?
If none of the above fits cleanly, the taxonomy is under-specified for that use case.
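The gates above can be sketched as a small triage function. This is a reader's sketch, not AMA language: the parameter names, the mapping of the Level I/II/III split onto "recommends / acts-but-overridable / acts-until-stopped," and the UNDER_SPECIFIED fallback are illustrative labels of my own for the steps just listed.

```python
from enum import Enum

class Bucket(Enum):
    NOT_IN_SCOPE = "Appendix S does not apply"
    ASSISTIVE = "Assistive"
    AUGMENTATIVE = "Augmentative"
    AUTONOMOUS_I = "Autonomous, Level I"
    AUTONOMOUS_II = "Autonomous, Level II"
    AUTONOMOUS_III = "Autonomous, Level III"
    UNDER_SPECIFIED = "Taxonomy under-specified for this case"

def classify(used_in_clinical_care: bool,
             generates_conclusion: bool,
             initiates_action: bool,
             overridable_first: bool,
             derives_new_parameter: bool,
             clinically_meaningful: bool,
             merely_detects: bool) -> Bucket:
    """Triage tool only, per the caveats in the text: not a determinate classifier."""
    # Step 0: entry gate -- is the output part of clinical care at all?
    if not used_in_clinical_care:
        return Bucket.NOT_IN_SCOPE
    # Step 1: Autonomous first (the cleaner routing order).
    if generates_conclusion:
        if not initiates_action:
            return Bucket.AUTONOMOUS_I      # recommends; physician implements/rejects
        if overridable_first:
            return Bucket.AUTONOMOUS_II     # acts, but can be negated before effect
        return Bucket.AUTONOMOUS_III        # acts until the physician intervenes
    # Step 2: Augmentative -- a new parameter, if clinically meaningful.
    if derives_new_parameter:
        if clinically_meaningful:
            return Bucket.AUGMENTATIVE
        return Bucket.UNDER_SPECIFIED       # Step 2A: where the logic frays
    # Step 3: Assistive -- pure detection; physician interprets and reports.
    if merely_detects:
        return Bucket.ASSISTIVE
    return Bucket.UNDER_SPECIFIED           # structuring/summarizing/triage falls through
```

For example, an 0764T-style service (detection plus physician/QHP interpretation/report) routes to ASSISTIVE, while a service that independently concludes and acts without an override window routes to AUTONOMOUS_III.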
Why the logic partly works
It works because the current draft does contain a rough hierarchy of outputs:
Assistive = detects clinically relevant data, no new parameter, no conclusion.
Augmentative = produces a distinct parameter, but not a definitive interpretation/conclusion.
Autonomous = goes further and independently interprets/concludes, then may recommend or initiate management.
That is why starting with Autonomous makes sense. It tests the most advanced function first, then backs down to parameter generation, then backs down to detection.
Why the logic also collapses
Here is where your skepticism is justified.
1. The tree is not built on a single dimension
The categories are not based on one clean axis. They mix:
output type,
evidentiary burden,
physician involvement,
whether the machine acts,
and whether the software is a full service or only a component of a service.
That is why the same service can feel “assistive” in one respect and “autonomous” in another unless you are very strict about what exactly is being classified.
2. “Output” versus “report” versus “service” is slippery
Does Appendix S classify the whole service, the algorithmic analysis, or just the software output component? The "Triple A" terms are adjectives of the algorithmic analysis/output, not necessarily of the entire service, but this ambiguity may introduce logical wobble.
3. “Clinically meaningful” is doing too much work
The clean draft makes clinical meaningfulness central for Augmentative and, by extension, Autonomous. However, this extension of "clinically meaningful" from Augmentative to Autonomous is only implied.
That means your tree has to ask not only “what does the software do?” but also “has enough evidence been shown for me to call that output clinically meaningful?” That is not a pure logical branch; it is partly an evidentiary judgment.
4. Automatic is not Autonomous
This is important and non-intuitive. Software can run automatically from input to output and still not be Autonomous. That means a flow chart that begins with “Is it automatic?” would mislead users.
5. Borderline cases do not fit neatly
As the eligible services and codes expand, so will the gray zones:
detection plus notification/triage,
structuring or summarizing evidence rather than simple detection,
intraoperative or intraprocedural support,
tools embedded inside broader physical services.
Those are exactly the cases likely to “break” a decision tree. They point to areas where the prose can be clarified and the logic tree improved thereafter.
Your hypothesis about path dependence
I think you are right that the logic is path dependent, and in two ways.
First, Autonomous-first is the cleaner routing logic, because Autonomous is partly defined by the presence of functions that exceed Augmentative: clinically meaningful interpretation/conclusion plus recommendation/initiation of management. If those are absent, you can step down to Augmentative and then Assistive.
Second, the answer can depend on which question is asked first. For example:
If you ask first, “Does it generate a parameter?”, you may classify something as Augmentative before noticing that it also independently concludes and initiates action, which would push it to Autonomous.
If you ask first, “Does it improve physician performance?”, almost everything starts sounding Assistive.
If you ask first, “Is there physician involvement?”, you can get misled by the automatic/autonomous distinction.
So yes: the same service can appear to move differently through the tree depending on the entry point, which is a sign the prose taxonomy is not yet logically orthogonal.
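The ordering effect is easy to demonstrate in a few lines. The two toy classifiers below are my own illustration, using simplified one-word buckets, of why the entry point changes the answer for a service that both derives a parameter and independently concludes.

```python
def parameter_first(derives_param: bool, concludes: bool) -> str:
    # Mis-ordered tree: tests the Augmentative gate before the Autonomous gate.
    if derives_param:
        return "Augmentative"
    if concludes:
        return "Autonomous"
    return "Assistive"

def autonomous_first(derives_param: bool, concludes: bool) -> str:
    # Autonomous-first routing, the cleaner order argued for above.
    if concludes:
        return "Autonomous"
    if derives_param:
        return "Augmentative"
    return "Assistive"

# A service that both derives a new parameter AND independently concludes:
print(parameter_first(True, True))   # Augmentative -- stops too early
print(autonomous_first(True, True))  # Autonomous -- the more defensible bucket
```

The mis-ordered version "captures" the service at the first gate it satisfies, which is exactly the path dependence described above.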
My bottom-line judgment
Today, a usable working flow chart is possible, but only as a triage tool, not as a fully determinate classifier.
The best working version is:
Does the software independently generate a clinically meaningful interpretation/conclusion and recommend/initiate management?
→ Autonomous.
If not, does it generate a distinct quantitative/categorical parameter beyond descriptive statistics?
→ Augmentative, if clinically meaningful.
If not, does it merely detect clinically relevant data for physician interpretation/report?
→ Assistive.
If still unclear, the case exposes a drafting gap.
That would be illuminating precisely because it shows where the taxonomy works and where it does not.
And if the flow chart repeatedly crashes on structured-evidence, triage, embedded-software, or mixed-function tools, that failure (of today's chart) would be as valuable as success — exactly the point you made.
The most important practical fix, in my view, would be to preface any flow chart with one explicit rule:
“Classify the software output function being claimed for coding — not the mere presence of software, not the whole physical service unless that software function itself is the claimed service.”
###
I gave ChatGPT a couple of recent Category III codes to assess as a beta test.
###
+0764T / 0765T — AI/computerized ECG analysis for detection of cardiac pathology/cardiac dysfunction, with physician/QHP interpretation/report, tied to concurrent or prior ECG use. CMS assigned OPPS/ASC payment status effective 2025 according to vendor coding guides and public reporting.
0903T–0905T — algorithmically generated 12-lead ECG from a reduced-lead ECG, with separate codes for global/tracing/interpretation-only configurations.
###
Using only the code text you gave, plus the current Appendix S logic, I’d characterize them this way:
0764T / 0765T
Best fit: Assistive.
In fact, the current clean Appendix S table itself uses “Algorithmic electrocardiogram risk-based assessment for cardiac dysfunction (0764T, 0765T)” as the example under Assistive.
Why that is a bit surprising, but still the document’s answer:
- Assistive is defined as software that detects clinically relevant data without deriving a parameter, interpretation, or conclusion, and it requires physician/QHP interpretation and report.
- The drafters intentionally allow Assistive outputs to include terms like “likelihood of,” “suggestive of,” or “risk for.”
- Your code summary for 0764T/0765T says AI/computerized ECG analysis for detection of cardiac pathology/cardiac dysfunction, with physician/QHP interpretation/report. That lines up closely with the Assistive bucket as the draft currently describes it.
So, under the present proposal, 0764T/0765T = Assistive, even if a lot of readers might instinctively think “risk-based assessment” sounds more augmentative. The draft has effectively pre-decided that one example.
Autonomous Level 1/2/3?
No. Not autonomous. The code text expressly keeps the physician/QHP interpretation/report in the loop, and Autonomous in Appendix S requires the software to independently generate clinically meaningful interpretations or conclusions without concurrent physician/QHP involvement.
So there is no Level I/II/III assignment here.
0903T–0905T
Best fit from the code text alone: probably Augmentative — but this is much less certain.
My reasoning:
- These codes describe an algorithmically generated 12-lead ECG from a reduced-lead ECG.
- That sounds like more than mere detection. The software is not just flagging a feature; it is transforming limited input into a more elaborate clinically usable output.
- Yet the code text, as you summarized it, does not sound like the software is independently issuing a diagnostic interpretation or management conclusion. So it does not sound Autonomous.
- That leaves Augmentative as the closest fit, because Augmentative covers software whose output is a quantitative or categorical parameter qualitatively different from the input, and more than simple descriptive reporting, but without a definitive interpretation or conclusion.
That said, there is an awkwardness here: a reconstructed 12-lead tracing is not obviously the kind of “parameter” Appendix S talks about, like an index, score, classification, risk score, predictive/prognostic metric. That is why I say probably Augmentative, but not cleanly.
In plain English:
- It is not Assistive, because it seems to do more than detect.
- It is not Autonomous, because it does not appear to independently conclude/diagnose/manage.
- So it lands closest to Augmentative, but as an example of how Appendix S fits some services awkwardly.
Autonomous Level 1/2/3?
Again, no. Nothing in the code text you gave suggests the software:
- makes a definitive diagnosis recommendation,
- initiates a management action with override,
- or automatically initiates ongoing management.
So no autonomous level.
Bottom line
My classification from the code text alone would be:
- 0764T / 0765T → Assistive (and not just by inference: the Appendix S draft’s own table uses them as the Assistive example).
- 0903T–0905T → probably Augmentative, but with a real caveat that these codes expose a weakness in the Appendix S prose, because a reconstructed waveform is not a neat “score/index/classification” style output.
- Neither appears Autonomous, so no Level I/II/III for either code family.
The truly interesting point is that 0903T–0905T may be exactly the sort of example that would stress-test the framework. It does something clearly more than detection, but not obviously what Appendix S writers have in mind when they talk about “clinically meaningful parameter generation.”
###
STRESS TEST LOGIC WITH TWELVE EXAMPLES
###
Yes — here is a fire test with 12 real examples. The broad result is that the framework works cleanly for a handful of services whose descriptors already use words like augmentative or match the Appendix S example table, but it starts to wobble when a code describes reconstruction, mapping, indexing, or alerting rather than straightforward detection, parameter-generation, or independent conclusion/action.
That is very much in line with the tensions you flagged in your blog.
I’d use this shorthand:
Clean fit = Appendix S handles it fairly well.
Wobble = the code can be placed, but only by inference or with some strain.
Major wobble = the code exposes a real drafting gap.
The fire test
1) 0764T — AI/computerized ECG analysis for detection of cardiac pathology/cardiac dysfunction, with physician/QHP interpretation/report, performed with a concurrent ECG
Best fit: Assistive. Clean fit. The official Appendix S example table itself places 0764T/0765T under Assistive, and the descriptor retains physician/QHP interpretation/report rather than independent machine conclusion. (American Medical Association)
2) 0765T — same family, but based on a prior ECG
Best fit: Assistive. Clean fit. Same reasoning as 0764T. The code family is effectively the canonical Assistive ECG example. (American Medical Association)
3) 75580 — Noninvasive estimate of coronary fractional flow reserve (FFR) derived from augmentative software analysis of coronary CTA data, with interpretation/report
Best fit: Augmentative. Very clean fit. The descriptor itself uses the phrase “derived from augmentative software analysis,” and the Appendix S example table also places 75580 under Augmentative. (UHC Provider)
4) 92229 — Retinal imaging with point-of-care automated analysis and diagnostic report
Best fit: Autonomous, Level I. Very clean fit. The Appendix S example table places 92229 under Autonomous, so we can identify 92229 as the then-current Autonomous example. It produces an automated diagnostic report, but the code does not describe machine-initiated treatment, so it fits Level I rather than II or III. (American Medical Association)
5) 0903T — Algorithmically generated 12-lead ECG from a reduced-lead ECG; with interpretation and report
Best fit: Probably Augmentative. Wobble. The algorithm transforms limited input into a richer output, which feels more like parameterized/derived output than mere detection, but the output is a reconstructed tracing rather than a classic “index, score, or classification.” It does not appear to independently generate a definitive conclusion, so it falls short of Autonomous. (AliveCor)
6) 0904T — Algorithmically generated 12-lead ECG from a reduced-lead ECG; tracing only
Best fit: Probably Augmentative. Major wobble. This may be the most awkward ECG example. It is clearly more than Assistive detection, but it is only a tracing and not an interpretation. It seems most consistent with Augmentative, yet it reveals that Appendix S does not naturally handle “algorithmic waveform reconstruction” very elegantly. (AliveCor)
7) 0905T — Algorithmically generated 12-lead ECG from a reduced-lead ECG; interpretation and report only
Best fit: Probably Augmentative. Wobble. Same problem as 0903T/0904T: the service depends on algorithmic reconstruction, but the descriptor does not say the software itself independently concludes/acts in the Autonomous sense. So I would still place it in Augmentative, albeit not comfortably. (AliveCor)
8) 1036T — Noninvasive hemodynamic assessment with pulmonary pressures and ejection fraction, including passive signal acquisition, augmentative algorithmic analysis, and generation of a clinical report with physician review/integration
Best fit: Augmentative. Very clean fit. The descriptor literally says “augmentative algorithmic analysis.” The physician still reviews, interprets, and clinically integrates the report, which keeps it out of Autonomous. (American Medical Association)
9) 1041T — Augmentative algorithmic analysis of EEG waveforms to identify source and propagation of epileptiform activity, with interpretation/report
Best fit: Augmentative. Very clean fit. Again, the descriptor expressly uses augmentative, and the physician/QHP interpretation/report remains part of the service. (American Medical Association)
10) 1039T — Connectomic analysis of previously performed multimodal brain MRI, requiring physician/QHP analysis of software- and physician-generated maps, with interpretation/report
Best fit: Probably Augmentative. Wobble. This looks like algorithmic production of clinically useful maps/structures rather than mere detection. But the descriptor emphasizes joint software-plus-physician analysis and physician interpretation/report, so it is not Autonomous. It fits Augmentative better than Assistive, but only because “map generation/integration” gets forced into the parameter/output bucket. (American Medical Association)
11) 1043T — Quantitative magnetic resonance, without imaging, for analysis of liver tissue, assessing one or more parameters, with automatically generated report; no further professional interpretation required
Best fit: Probably Augmentative, with an Autonomous argument. Major wobble. This is one of the best stress fractures in the framework. On one hand, it measures one or more parameters and generates a report, which sounds Augmentative. On the other hand, the guidance says it does not require further professional interpretation, which starts to sound Autonomous. But the descriptor does not clearly say the software independently renders a clinically meaningful interpretation/conclusion or initiates action. I would place it in Augmentative, but with a bright red asterisk. (American Medical Association)
12) 1052T — Interrogation/evaluation of an insertable subcutaneous heart failure decompensation monitor; output includes daily physiologic data, historical trends, and an algorithmically derived decompensation index; threshold crossing sends clinician alerts
Best fit: Probably Augmentative. Wobble. The algorithm produces a composite decompensation index, so this is more than Assistive detection. But the alerts go to the clinician, and the report is still reviewed/interpreted by the physician/QHP; the machine is not itself initiating care in the Level II or III sense. So this lands best in Augmentative. (American Medical Association)
What the fire test shows
A few examples are easy:
0764T/0765T = Assistive
75580 = Augmentative
92229 = Autonomous Level I
1036T/1041T = Augmentative
Those are easy either because the descriptor itself uses the taxonomy word or because AMA’s own example table pre-places them. (American Medical Association)
But several examples expose the weak spots:
First, the framework handles classic outputs better than reconstructed artifacts. The 0903T–0905T family is not mere detection, but it is not obviously an “index, score, classification, risk score, predictive/prognostic metric” either. It is a generated tracing. That is not a natural fit for the prose. (AliveCor)
Second, the framework is shaky when the code describes parameter generation plus automated reporting but not obvious autonomous conclusion/action. That is why 1043T is so useful as a stress test. It may be exactly the kind of service where a flow chart would either collapse or reveal that the prose needs another branch. (American Medical Association)
Third, the framework struggles with alerts and triage-like behavior. 1052T uses a derived index and threshold-triggered alerts, yet the clinician remains in the loop. That makes it more than Assistive, but not truly Autonomous as Appendix S defines it. (American Medical Association)
My practical summary
If I had to sort these 12 today under the current revised Appendix S logic, the overall verdict is: Appendix S can classify a lot of existing codes, but not gracefully. It works best where the descriptor already contains the answer. It becomes much less stable when a service is about reconstruction, mapping, indexing, alerting, or automated reporting rather than straightforward detection or independent conclusion. That is exactly why your idea of “fire testing” with 20–30 examples is so strong. It is not just a nice exercise; it is a way to reveal where the taxonomy is genuinely underpowered.
I can turn this next into a one-page blog-ready table with three columns: code, my classification, why the framework wobbles or holds.
###
A pacemaker with automatic defibrillator monitors the ECG and, if a sufficiently major problem is detected, automatically emits a cardioversion shock (no asking the doctor!). This sounds like autonomous software to me, and would it be placed here and put on the future CMAA code list? I don't think anyone would expect that, but why or why not? 'Tis a puzzlement!
###
Yes — conceptually, your ICD example sounds very much like Autonomous, and probably Level III under the current draft language. But coding-wise, I do not think that means it would migrate into a future CMAA bucket just because autonomous software is embedded inside the device. That is exactly the kind of paradox that shows where Appendix S logic and real-world CPT architecture can diverge.
Why it sounds Autonomous first. The clean draft says Autonomous software automatically derives parameters and independently generates clinically meaningful interpretations or conclusions without concurrent physician/QHP involvement, and Level III covers software that automatically initiates management actions and continues unless the physician intervenes. Your example — device monitors rhythm, detects a qualifying malignant arrhythmia, and fires a shock with no doctor asked — sounds much closer to machine-initiated therapeutic action than to Assistive or Augmentative. In the draft’s own terms, that is the most natural place to put it, and more specifically it sounds closer to Level III than Level I or II because the therapy is actually delivered, not merely recommended or queued for override.
But the second question is the real one: would that mean it should appear as a new CMAA-style software service? I think probably not, for a practical CPT reason. Implantable cardioverter-defibrillators are already recognized in Medicare/CPT as physical device procedures and device-management services. CMS materials describe an ICD as an implanted device that continuously monitors cardiac activity and delivers therapy when it detects life-threatening arrhythmias, and Medicare coding/payment infrastructure already exists for ICD implantation and device evaluation/programming rather than treating the sensing/shocking logic as a separately coded AI service. CMS and manufacturer reimbursement materials show established ICD procedure families and device-evaluation families, including implantation codes in the 33202–33275 range and device evaluation/programming codes such as 93282–93284, 93287, 93289, 93295, and 93296. (Medtronic)
So the clean answer is:
As a pure Appendix S thought experiment: yes, the device’s embedded decision-and-shock logic looks like Autonomous, probably Level III.
As a CPT coding reality: no, I would not expect the whole service to be pulled into CMAA merely because autonomous logic exists inside the implanted device. It is more naturally understood as part of the already-coded device therapy/service architecture. (Medtronic)
That is also very consistent with your own blog instinct that physical services should remain coded by the physical service component, and a folded-inside AI component should not automatically drag everything onto CMAA. You made that point about WGS, but the same structural logic applies even more strongly to an implanted therapeutic device.
There is also support for this from the webinar’s framing. The speakers repeatedly said Appendix S adjectives are meant to describe the algorithmic analysis/output, not necessarily to reclassify an entire broader service. They also said the terms can be used in many parts of CPT “wherever those would be used,” which implies a descriptor-level tool, not a command that every device containing autonomy must be uprooted into a software-only section.
So the puzzlement is real, but I think the resolution is:
Appendix S may classify the software function as Autonomous Level III, while CPT may still leave the billable service in the traditional implanted-device code family.
That is not actually inconsistent. It just means the taxonomy and the code placement are not the same question.
In fact, your ICD example is a very good stress fracture in the framework, because it exposes an unstated rule that probably needs to be said out loud:
“Appendix S can characterize a software function without requiring that the containing physical device/service be rehomed into CMAA.”
Without a rule like that, absurd results follow. A defibrillator, insulin pump, smart neurostimulator, or ventilator with closed-loop logic would all threaten to become “software codes,” which I agree no one actually expects. Your example therefore does not refute the taxonomy, but it strongly suggests the prose needs a boundary principle between:
software as the service, and
software embedded in an already-coded physical therapeutic device/service.
So my net answer is:
Three-A label: Autonomous
Level: probably Level III
Would it therefore move to CMAA? Probably no
Why not? Because the better coding principle is that the autonomous logic is intrinsic to the already-coded implanted therapeutic device service, not a separately reported software service. (Medtronic)
This is exactly the kind of example that would make a great paragraph in your blog, because it shows the taxonomy can have descriptive value while still needing a scope-limiting rule to prevent category mistakes.
###
It highlights that Appendix S does not define what enters into it, and does not define AI or ML, etc.
This is compatible with the current committee zeitgeist and groupthink but it may bite them in the butt later.
###
Yes — I think that is a fair criticism.
The current approach seems to be: do not define AI, do not define the outer boundary of what enters Appendix S, and instead focus on classifying the software output once a case is already in front of the committee. The clean draft says exactly that there is no single product or service for which the term “AI” is sufficient or necessary, and therefore “AI” is not defined in this taxonomy. It then moves immediately to classifying outputs as assistive, augmentative, or autonomous.
That may be politically convenient in the short run. It avoids fights over definitions, avoids having to say what is “really AI,” and gives the committee flexibility. AMA wants Appendix S as a tool to help panel reviewers judge the role of the output in patient management, not as a general theory of AI or a boundary-setting document.
But your concern is the stronger one: if you do not define the domain, then the taxonomy may become unstable at the edges. That is exactly why examples like ICDs, reconstructed ECGs, WGS-plus-AI interpretation, intraoperative guidance, or structured-evidence tools become so troublesome. The committee can say, case by case, “we know it when we see it,” but that works only until the edge cases pile up.
So I think your point is:
the non-definition strategy buys flexibility now, but at the cost of scope creep, inconsistency, and future surprises.
That is also very close to what you wrote in your blog: Appendix S expects precise downstream categorization while leaving cloudy the upstream question of what falls into the universe in the first place.
My own phrasing would be:
The current committee view is understandable: keep Appendix S functional, not philosophical.
But refusing to define the entry gate may work only as long as the examples remain easy. As software-intensive services proliferate, the lack of a scope rule may become the document’s biggest weakness.
Or even more sharply:
A taxonomy without an entry criterion is not just incomplete; it risks becoming path-dependent, ad hoc, and vulnerable to exactly the odd results no one intended.
That feels like a strong blog line.
APPENDIX S - THE FINAL REWRITE BY AI
Excellent concise summary. Now, here's a task for you. I reattach the clean Appendix S version B for the upcoming CPT meeting. You've seen numerous collateral documents and done much thinking and logic yourself. I want you to make a substantial rewrite of Appendix S. You don't need to follow the template paragraph by paragraph; take all you know now and produce "an" Appendix S, freed of confusing things that don't work and amplified and improved by some of the changes we've seen this afternoon. Of course, the general tone remains AMA CPT policy style. The macro structure has to stay the same (Assistive, Augmentative, Autonomous), although we now appreciate the problems caused by collapsing six or seven axes of variation onto one axis of 3 points. Do everything you can to improve Appendix S. If you need to delete paragraphs and insert new paragraphs, and so on, you have permission to do so. Be the best policy writer (and best thinker!) you can be.
###
Below is a fresh rewrite of Appendix S based on the current clean Option B text, the webinar’s stated goals, and the fault lines we identified: the need for an entry gate, clearer distinction between software output and the whole service, less circular reliance on “augmentative” inside “autonomous,” and a clearer statement that embedded software does not by itself reclassify an already-coded physical service. Those are all issues that appear in the current draft, the webinar discussion, and your blog critique.
I have kept the macro structure intact: Assistive, Augmentative, Autonomous. But I have rewritten the front end and the definitions to make the logic more explicit and less path-dependent.
Appendix S
Taxonomy for Software Output in Medical Services and Procedures
This taxonomy provides guidance for determining the appropriate use of the terms assistive, augmentative, and autonomous in CPT code descriptors, related procedural language, and supporting materials for medical services and procedures that include software-enabled analysis.
For purposes of this taxonomy, the terms in Appendix S classify the software output function that is being described for coding. They do not, by themselves, define the entirety of a product, determine whether a service is separately reportable, or require that an already-coded physical service or device be reclassified solely because software is used within that service.
Appendix S does not define “artificial intelligence,” “machine learning,” or any other general class of software technologies. Rather, it provides terminology for describing the clinical role of software output in a medical service or procedure. The relevant question is not whether a product is labeled “AI,” but what the software output does in clinical care.
For use of a term from Appendix S, the software output must be relevant to the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions, and must be pertinent to patient management within the clinical context described by the code. Software that is limited to administrative, financial, scheduling, workflow, or general productivity functions does not fall within this taxonomy solely by reason of being used in health care.
The terms in Appendix S are adjectives that characterize the software-enabled analytical function being described. In some circumstances, that analytical function may correspond closely to the billable service. In other circumstances, it may describe only a component of a broader service. Classification under Appendix S therefore characterizes the software output function at issue; it does not by itself determine whether the broader service is separately reportable, bundled, or more appropriately described elsewhere in CPT.
Use of a term from Appendix S should be supported by evidence appropriate to the role of the software output in clinical care and consistent with CPT code criteria. Depending on the claimed function of the software output, such evidence may include technical validity, analytical validity, clinical validity, and, where applicable, evidence of impact on patient management or clinical utility.
For purposes of this taxonomy, a parameter means a quantitative or categorical output derived by software and reported for clinical use, such as an index, score, classification, risk estimate, measurement, or other metric. Not every transformed output is necessarily a parameter for purposes of this taxonomy; the key consideration is whether the output is a distinct reportable result used in clinical care.
For purposes of this taxonomy, automatic or automatically refers to the software deriving its output without concurrent human involvement in the analytic process from input to output. Automatic derivation of output may occur in either augmentative or autonomous services and does not, by itself, establish autonomous status.
Assistive
Services and procedures performed by software for the physician or other qualified health care professional (QHP) are assistive when the software output detects, identifies, highlights, prioritizes, or otherwise brings clinically relevant data to the attention of the physician or other QHP without independently deriving a distinct reportable parameter, and without independently generating a clinical interpretation or conclusion.
Assistive software output is clinically supportive of physician or QHP performance. It may, for example, improve accuracy, precision, consistency, efficiency of review, or timeliness of recognition of clinically relevant data, or reduce inter-observer variability, while the reportable outcome of the primary clinical service may remain unchanged.
The output from assistive software requires physician or other QHP interpretation and report when the service is reported as a professional service. Assistive software does not, for purposes of this taxonomy, independently establish a diagnosis, recommendation, or management action.
Assistive outputs may include indications such as the presence, location, or salience of clinically relevant data. Terms such as “likelihood of,” “suggestive of,” or “risk for” may be consistent with assistive output only when they function to draw attention to clinically relevant data rather than to provide a definitive interpretation, conclusion, or management recommendation.
Evidence supporting assistive software should be appropriate to the claimed role of the output in clinical care. In general, technical or analytical validation demonstrating benefit to clinical performance may be sufficient where the assistive output does not itself alter the reportable clinical conclusion. Where the assistive output uses terminology that implies a clinically significant inference, additional clinical validation may be appropriate.
Augmentative
Services and procedures performed by software for the physician or other QHP are augmentative when the software output derives a distinct quantitative or categorical parameter that is qualitatively different from the input and is used in clinical care, but the software does not independently provide a definitive clinical interpretation, conclusion, or management recommendation.
The augmentative output must be more than a restatement, summation, averaging, display, transmission, or descriptive presentation of input data. It must provide a distinct reportable result, such as a scale, index, classification, measurement, score, probability estimate, risk estimate, predictive measure, prognostic measure, mapping output, or other metric used in clinical care.
Augmentative output is clinically meaningful when the derived parameter contributes to patient management within the clinical context of the code. Clinical meaningfulness may be supported by, for example:
equivalence to a metric already used in current clinical practice;
consistency with clinical practice guidelines, professional standards, or accepted clinical frameworks; or
evidence that the novel output has a clinically relevant impact on patient management.
For augmentative services, evidence should demonstrate more than technical or analytical performance alone. The evidence should support the clinical relevance of the derived parameter for the patient population, service, and clinical context described by the code.
Software with augmentative outputs may operate automatically or may require physician or other QHP interaction during the analytic process. The physician work associated with augmentative output may be separately described by the same code, may be captured in another code, or may be incorporated into a broader service, depending on the code structure.
Note: The fact that an augmentative output is used within a broader coded service does not by itself require separate reporting of the software function, nor does it require reclassification of the broader service.
Autonomous
Services and procedures performed by software for the physician or other QHP are autonomous when the software independently generates a clinically meaningful interpretation, conclusion, recommendation, or management action from clinical input data without concurrent physician or other QHP involvement in the analytic process.
Autonomous output goes beyond derivation of a parameter alone. It includes software output that independently interprets data or derived parameters and produces a clinically operative result, such as a definitive diagnostic conclusion, a specific management recommendation, or initiation of a management action.
Clinical meaningfulness for autonomous output should be supported by evidence appropriate to the claimed role of the software in patient care. Such support may include consistency with clinical practice guidelines or accepted clinical standards, direct evidence of impact on patient management, or other evidence sufficient to support the degree of autonomy being described.
Automatic derivation of output is necessary but not sufficient for autonomous classification. A service is not autonomous merely because the software operates automatically. It is autonomous when the software independently provides the interpretation, conclusion, recommendation, or action that is clinically operative.
There are three levels of autonomous medical services and procedures:
Level I
The output of autonomous software includes a definitive diagnostic conclusion and/or a specific management recommendation. The recommendation or conclusion is contestable and requires physician or other QHP judgment to implement, reject, or otherwise act upon it.
Level II
The output of autonomous software includes initiation of a management action based on the software’s interpretation or conclusion, but the service includes a reasonable opportunity for physician or other QHP review, interruption, or override before the action is carried out.
Level III
The output of autonomous software includes automatic initiation and continuation of a management action based on the software’s interpretation or conclusion, such that physician or other QHP oversight is exercised at a supervisory level rather than through case-by-case concurrence before each action. Level III services require ongoing physician or other QHP oversight appropriate to the clinical circumstance and the management action being performed.
Note: Autonomous classification describes the role of the software output in the service at issue. It does not by itself require that a broader physical device, procedure, or other already-coded service be separately reclassified or reported under a software-specific code family solely because autonomous software is embedded within that service.
Summary Table
| | Assistive | Augmentative | Autonomous |
|---|---|---|---|
| Primary role of software output | Detects, highlights, prioritizes, or otherwise brings clinically relevant data to attention | Derives a distinct quantitative or categorical parameter used in clinical care | Independently interprets data or derived parameters to generate a clinically operative conclusion, recommendation, or action |
| Distinct reportable parameter derived by software | No | Yes | May include derived parameters, but autonomous status depends on the independent interpretation/conclusion/action |
| Independent clinical interpretation or conclusion by software | No | No | Yes |
| Machine-initiated management action | No | No | May occur in Levels II and III |
| Physician or other QHP interpretation/report | Required for the professional service when applicable | May be required, or may be incorporated into another code or broader service | Not required for the autonomous output itself, although physician or other QHP oversight may still be required |
| Evidence needed | Appropriate to claimed clinical support function; often technical or analytical performance plus benefit to clinical performance | Evidence sufficient to support clinical meaningfulness of the derived parameter in the code’s clinical context | Evidence sufficient to support the independent interpretation, recommendation, or management action claimed |
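The webinar floated the idea of "beta testing" Appendix S against sample codes, or reducing it to a logic flow chart. As a check that the rewritten definitions are actually machine-operable, the taxonomy above can be sketched as a rules-based classifier. This is an illustrative sketch, not AMA logic: the class, function, and field names below are my own assumptions about how each Appendix S criterion might be captured as a yes/no question.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SoftwareOutput:
    """Illustrative yes/no answers to the Appendix S questions (field names are assumptions)."""
    clinically_relevant: bool          # entry gate: used in diagnosis/treatment, not workflow/productivity
    derives_parameter: bool            # distinct reportable result (index, score, classification, etc.)
    independent_interpretation: bool   # software itself supplies the clinically operative conclusion
    initiates_action: bool             # software initiates a management action
    override_before_action: bool       # physician/QHP can review or interrupt before the action executes

def classify(o: SoftwareOutput) -> Optional[str]:
    """Apply the taxonomy top-down: entry gate first, then the Autonomous test,
    then the Assistive/Augmentative split. Returns None when Appendix S does not apply."""
    if not o.clinically_relevant:
        return None                        # productivity/workflow tool: outside the taxonomy
    if o.independent_interpretation:
        if not o.initiates_action:
            return "Autonomous Level I"    # conclusion/recommendation; physician judgment to act
        if o.override_before_action:
            return "Autonomous Level II"   # action initiated, reviewable before it is carried out
        return "Autonomous Level III"      # action carried out under supervisory-level oversight
    if o.derives_parameter:
        return "Augmentative"              # distinct parameter, no independent conclusion
    return "Assistive"                     # detects/highlights data for the physician or QHP
```

Note that `automatic` never appears as a branch: as the text says, automatic derivation is necessary but not sufficient for autonomous status, so the classifier turns only on who supplies the clinically operative interpretation. For example, `classify(SoftwareOutput(True, True, False, False, False))` returns `"Augmentative"`: a derived parameter with no independent conclusion.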
###
This version tries to do five things better than the current draft:
- first, it adds an entry gate;
- second, it says clearly that the taxonomy classifies the software output function, not necessarily the whole product or service;
- third, it separates automatic from autonomous more crisply;
- fourth, it removes the awkward dependence of Autonomous on a prior Augmentative construct;
- and fifth, it states expressly that embedded software does not automatically pull a broader physical service into a software-only coding framework. Those are all pressure points that emerged from the current draft and the related discussion.
If you want, I’ll next do one of two things:
turn this into a redline-style comparison memo against Option B, or
compress it into a shorter, more committee-friendly version that sounds even more like something AMA staff might circulate.
###
The rewrite of Appendix S is good but somewhat dry and verbose; can you improve readability and engagement for the reader?
###
Absolutely. Here is a leaner, calmer rewrite of the Appendix S text. I kept the same architecture and policy tone, but tightened sentences, reduced repetition, and tried to make each paragraph carry one clear idea.
Appendix S
Taxonomy for Software Output in Medical Services and Procedures
This taxonomy provides guidance for the use of the terms assistive, augmentative, and autonomous in CPT code descriptors and related procedural language for medical services and procedures that include software-enabled analysis.
The terms in Appendix S classify the software output function described for coding. They do not, by themselves, define the whole product, determine whether a service is separately reportable, or require reclassification of an already coded physical service or device solely because software is used within it.
Appendix S does not define “artificial intelligence,” “machine learning,” or other general classes of software. Its purpose is narrower: to describe the clinical role of software output in a medical service or procedure. The relevant question is not whether a product is labeled “AI,” but what the software output does in clinical care.
A term from Appendix S should be used only when the software output is relevant to the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions, and is pertinent to patient management in the clinical context described by the code. Software used only for administrative, financial, scheduling, workflow, or general productivity purposes does not fall within this taxonomy solely because it is used in health care.
The terms in Appendix S are adjectives describing the software-enabled analytical function at issue. In some cases, that function may closely correspond to the reportable service. In other cases, it may describe only one part of a broader service. Classification under Appendix S therefore describes the software output function, not necessarily the whole service.
Use of a term from Appendix S should be supported by evidence appropriate to the role of the software output in clinical care and consistent with CPT code criteria. Depending on the claimed function of the software output, such evidence may include technical validity, analytical validity, clinical validity, and, when appropriate, evidence of impact on patient management or clinical utility.
For purposes of this taxonomy, a parameter is a quantitative or categorical output derived by software and reported for clinical use, such as an index, score, classification, risk estimate, measurement, or other metric. Not every transformed output is a parameter. The key question is whether the output is a distinct reportable result used in clinical care.
For purposes of this taxonomy, automatic or automatically means that software derives its output without concurrent human involvement in the analytic process from input to output. Automatic derivation may occur in either augmentative or autonomous services and does not, by itself, establish autonomous status.
Assistive
Services and procedures performed by software for the physician or other qualified health care professional (QHP) are assistive when the software output detects, identifies, highlights, prioritizes, or otherwise brings clinically relevant data to the attention of the physician or other QHP without deriving a distinct reportable parameter and without independently generating a clinical interpretation or conclusion.
Assistive software supports physician or QHP performance. It may improve accuracy, precision, consistency, efficiency of review, or timeliness of recognition of clinically relevant data, or reduce inter-observer variability, while the reportable outcome of the primary clinical service may remain unchanged.
When the service is reported as a professional service, the output from assistive software requires physician or other QHP interpretation and report. Assistive software does not independently establish a diagnosis, recommendation, or management action.
Assistive outputs may include indications such as the presence, location, or salience of clinically relevant data. Terms such as “likelihood of,” “suggestive of,” or “risk for” may be consistent with assistive output only when they serve to draw attention to clinically relevant data rather than provide a definitive interpretation, conclusion, or management recommendation.
Evidence supporting assistive software should be appropriate to the claimed role of the output in clinical care. Where the assistive output does not itself alter the reportable clinical conclusion, technical or analytical validation showing benefit to clinical performance may be sufficient. Where the output implies a clinically significant inference, additional clinical validation may be appropriate.
Augmentative
Services and procedures performed by software for the physician or other QHP are augmentative when the software output derives a distinct quantitative or categorical parameter that is qualitatively different from the input and is used in clinical care, but does not independently provide a definitive clinical interpretation, conclusion, or management recommendation.
An augmentative output must be more than a restatement, summation, averaging, display, transmission, or descriptive presentation of input data. It must provide a distinct reportable result, such as a scale, index, classification, measurement, score, probability estimate, risk estimate, predictive measure, prognostic measure, mapping output, or other metric used in clinical care.
Augmentative output is clinically meaningful when the derived parameter contributes to patient management in the clinical context described by the code. Clinical meaningfulness may be supported by equivalence to a metric already used in current practice, consistency with clinical practice guidelines or accepted clinical frameworks, or evidence that a novel output has clinically relevant impact on patient management.
For augmentative services, evidence should show more than technical or analytical performance alone. It should support the clinical relevance of the derived parameter for the patient population, service, and clinical context described by the code.
Software with augmentative outputs may operate automatically or may require physician or other QHP interaction during the analytic process. The related physician work may be described by the same code, captured in another code, or incorporated into a broader service, depending on code structure.
Note: The presence of an augmentative output within a broader coded service does not by itself require separate reporting of the software function or reclassification of the broader service.
Autonomous
Services and procedures performed by software for the physician or other QHP are autonomous when the software independently generates a clinically meaningful interpretation, conclusion, recommendation, or management action from clinical input data without concurrent physician or other QHP involvement in the analytic process.
Autonomous output goes beyond derivation of a parameter alone. It independently interprets data or derived parameters and produces a clinically operative result, such as a definitive diagnostic conclusion, a specific management recommendation, or initiation of a management action.
Clinical meaningfulness for autonomous output should be supported by evidence appropriate to the role of the software in patient care. Such support may include consistency with clinical practice guidelines or accepted clinical standards, direct evidence of impact on patient management, or other evidence sufficient to support the degree of autonomy being described.
Automatic derivation is necessary but not sufficient for autonomous classification. A service is not autonomous merely because software operates automatically. It is autonomous when the software independently provides the clinically operative interpretation, conclusion, recommendation, or action.
There are three levels of autonomous medical services and procedures:
Level I
The output of autonomous software includes a definitive diagnostic conclusion and/or a specific management recommendation. The conclusion or recommendation is contestable and requires physician or other QHP judgment to implement, reject, or otherwise act upon it.
Level II
The output of autonomous software includes initiation of a management action based on the software’s interpretation or conclusion, but the service provides a reasonable opportunity for physician or other QHP review, interruption, or override before the action is carried out.
Level III
The output of autonomous software includes automatic initiation and continuation of a management action based on the software’s interpretation or conclusion, such that physician or other QHP oversight occurs at a supervisory level rather than through case-by-case concurrence before each action. Level III services require ongoing physician or other QHP oversight appropriate to the clinical circumstance and the management action being performed.
Note: Autonomous classification describes the role of the software output in the service at issue. It does not by itself require that a broader physical device, procedure, or other already coded service be separately reclassified or reported under a software-specific code family solely because autonomous software is embedded within that service.
Summary Table
| | Assistive | Augmentative | Autonomous |
|---|---|---|---|
| Primary role of software output | Detects, highlights, prioritizes, or otherwise brings clinically relevant data to attention | Derives a distinct quantitative or categorical parameter used in clinical care | Independently interprets data or derived parameters to generate a clinically operative conclusion, recommendation, or action |
| Distinct reportable parameter derived by software | No | Yes | May include derived parameters, but autonomous status depends on the independent interpretation, conclusion, recommendation, or action |
| Independent clinical interpretation or conclusion by software | No | No | Yes |
| Machine-initiated management action | No | No | May occur in Levels II and III |
| Physician or other QHP interpretation/report | Required for the professional service when applicable | May be required, or may be incorporated into another code or broader service | Not required for the autonomous output itself, although physician or other QHP oversight may still be required |
| Evidence needed | Appropriate to the claimed clinical support function; often technical or analytical performance plus benefit to clinical performance | Evidence sufficient to support clinical meaningfulness of the derived parameter in the code’s clinical context | Evidence sufficient to support the independent interpretation, recommendation, or management action claimed |
This version trims some of the longer explanatory passages, reduces repeated qualifiers, and tries to make each section easier to read in one pass. It also keeps the key improvements from the earlier rewrite: an entry gate, a distinction between software output and the whole service, a cleaner separation of automatic from autonomous, and explicit protection against dragging every embedded-software device into a separate coding bucket.
I can also do one more pass that is even closer to official AMA committee prose — slightly stiffer, slightly shorter, and more “editorial-panel-ready.”