2025 to 0326 summary:
The March 26, 2026 revision of Appendix S is not just an edit of the 2025 version; it is a substantial effort to turn Appendix S from a simple taxonomy into a more operational CPT policy framework for software-intensive services. The 2025 version mainly defined assistive, augmentative, and autonomous services at a high level. The 0326 version keeps those categories but adds much more about software outputs, reference services in current clinical practice, and the types of evidence needed to justify each category. It narrows assistive by warning that terms like “risk for” or “suggestive of” may require clinical validation. It raises the threshold for augmentative by demanding outputs that are not merely statistical but clinically meaningful, clinically important, and pertinent to the CPT descriptor. It also tightens autonomous claims by emphasizing transparency, guidelines, and clinical utility, suggesting the drafters want stricter boundaries and stronger evidentiary discipline.
0204 to 0326 summary:
The March 26, 2026 version is best seen as a tightening and sharpening of the February 4, 2026 draft rather than a wholesale rewrite. By February, Appendix S had already begun evolving beyond a simple AI taxonomy toward a framework about software outputs, evidence, and coding boundaries. The March draft pushes this further. It drops more of the device-oriented/FDA-style language and speaks more clearly in CPT terms, focusing on software outputs and their relationship to a reference service in current clinical practice. It more carefully restricts assistive status, especially for outputs using predictive language like “likelihood of” or “risk for.” It makes augmentative more demanding by tying clinical meaningfulness directly to the CPT code characteristics. It also removes February language that gave Category III applicants a more permissive developmental pathway. Overall, March appears more conservative, more evidence-calibrated, and more focused on preventing applicants from overclaiming sophistication or autonomy.
PUBLISHED VERSION VERSUS MARCH 2026 BALLOT
What jumps out first is that the March 26 ballot draft is not a light cleanup of the 2025 Appendix S. It is an attempt to rebuild the appendix on a more operational theory of software services. The 2025 text was short, elegant, and high-level: assistive detects data, augmentative analyzes/quantifies to yield clinically meaningful output, and autonomous interprets and independently generates clinically meaningful conclusions, with three escalating levels of action. The new draft keeps that skeleton but wraps it in a much thicker framework about software outputs, evidentiary standards, clinical meaningfulness, equivalence to reference services, and CPT code criteria. In other words, the authors seem to be trying to turn Appendix S from a taxonomy into something closer to a gatekeeping policy for future code applications.
The single biggest conceptual change is the move away from talking mainly about “AI” and “work done by machines” toward talking about software, software outputs, and the service performed by the software to produce the desired outputs. The 2025 text framed the issue as classification of “AI medical services and procedures” based on the work performed by the machine on behalf of the physician or QHP. The ballot draft repeatedly substitutes the language of software outputs and adds the statement that in CPT, software outputs are recognized as useful in diagnosis, cure, mitigation, treatment, or prevention, and that their clinical meaningfulness is established through clinical evidence. That change looks intentional and strategic. It deemphasizes the fashionable word AI and shifts the debate toward the thing CPT actually codes: a clinical service with a defined output and evidentiary basis. It also makes Appendix S more future-proof, because the policy can govern software-intensive services whether or not applicants call them AI.
A second major change is that the draft inserts an explicit reference-service concept. The new text says classification is based on equivalence to a primary or usual reference service in current clinical practice and that use of an Appendix S term should be supported by evidence of analytical validity, clinical validity, or clinical utility, as appropriate to the choice of term in the code descriptor and consistent with CPT code criteria. That is a very important move. It suggests the authors are trying to prevent Appendix S from becoming a free-floating vocabulary for novel products. Instead, they want it anchored to something familiar in CPT logic: what is the usual clinical service, what is this software doing relative to it, and what level of evidence matches that claim? Apparent intention: make it harder for applicants to jump too quickly from technical performance to broad claims of augmentative or autonomous status.
The assistive section is probably the clearest example of tightening. In 2025, assistive was simply software that detects clinically relevant data without analysis or generated conclusions, and it required physician interpretation and report. The draft keeps that bottom-line concept but elaborates it substantially. It now says assistive software may draw attention to clinically relevant data without deriving a new parameter, generating an interpretation, or providing conclusions; it clarifies that parameters include indexes, scores, or classifications; and it adds an evidence discussion saying that assistive outputs are clinically supportive if they improve physician/QHP performance, with benefit to the patient substantiated by technical or analytical validation. But then it introduces an important caveat: if the output uses language such as “likelihood of,” “suggestive of,” or “risk for,” then those terms should be substantiated by clinical validation rather than mere analytical validation. That is a significant policy move. It looks like the drafters are worried that applicants have been trying to package quasi-interpretive risk language as if it were only assistive triage. The message seems to be: you may stay assistive only if you truly remain non-interpretive; once you start implying clinical inference, your evidentiary burden rises.
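To make that boundary concrete, here is a minimal sketch, in Python, of how the evidentiary trigger could be expressed as a screening rule. The term list, function name, and tier labels are my own illustration; the draft states the principle in prose and prescribes no algorithm.

```python
# Minimal sketch of the assistive-language guardrail described above.
# The term list, function name, and tier labels are illustrative only;
# the draft states the principle in prose and prescribes no algorithm.

INFERENTIAL_TERMS = ("likelihood of", "suggestive of", "risk for")

def required_validation(descriptor: str) -> str:
    """Return the evidence tier an assistive descriptor would imply under
    the March caveat: inferential language raises the burden from
    analytical to clinical validation."""
    text = descriptor.lower()
    if any(term in text for term in INFERENTIAL_TERMS):
        return "clinical validation"
    return "technical/analytical validation"

# A descriptor that quietly implies clinical inference loses the lower tier:
print(required_validation("Flags regions suggestive of intracranial hemorrhage"))
# -> clinical validation
```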
The augmentative rewrite is even more consequential. The 2025 definition was compact: the machine analyzes and/or quantifies data to yield clinically meaningful output; physician interpretation/report remains required. The new draft expands this into a mini-doctrine. It says the output must be a quantitative or categorical parameter qualitatively different from the input, and not merely descriptive statistics such as adding or averaging. It says augmentative output does not include a definitive interpretation or conclusion, which is reserved for autonomous. It says augmentative outputs may be reported as clinical scales, indexes, categorical classifications, or other metrics in common clinical use, or may be novel predictive/prognostic indices validated for impact on patient care. It then defines “clinically meaningful” through a three-part test: the output must be clinically validated, clinically important, and directly pertinent to the CPT code characteristics. This is much more than clarification. It is a deliberate elevation of the threshold for what counts as augmentative. The apparent intention is to stop applicants from saying, in effect, “our software produces a score, therefore it is augmentative.” Under this markup, a score alone is not enough; it must be meaningful in clinical practice, not just mathematically nontrivial.
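The three-part test is essentially a conjunctive checklist, and it can be sketched as one. The following is a hypothetical encoding with invented field names; the draft defines the three prongs in prose, not as a data structure.

```python
# Hypothetical encoding of the draft's three-part "clinically meaningful"
# test for augmentative outputs. The field names are invented; the draft
# defines the three prongs in prose, not as a data structure.

from dataclasses import dataclass

@dataclass
class AugmentativeClaim:
    clinically_validated: bool     # validated beyond technical/analytical performance
    clinically_important: bool     # important beyond mere statistical significance
    pertinent_to_descriptor: bool  # directly tied to the CPT code characteristics

    def is_clinically_meaningful(self) -> bool:
        # Conjunctive test: all three prongs must hold. A score alone fails.
        return (self.clinically_validated
                and self.clinically_important
                and self.pertinent_to_descriptor)

# A statistically significant score with no clinical validation fails the test:
claim = AugmentativeClaim(clinically_validated=False,
                          clinically_important=True,
                          pertinent_to_descriptor=True)
assert not claim.is_clinically_meaningful()
```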
That augmentative section also contains an underappreciated policy signal: it says software may require physician or QHP interaction during the process between input and output, for example adjusting settings based on clinical context, and then notes in a footnote that physician work related to augmentative services may already be captured by existing codes, such as E/M or presurgical planning. That looks like a direct attempt to separate two questions that otherwise get tangled: which category the software output falls into, and where any associated physician work gets paid. The drafters seem to be saying that Appendix S should classify the software piece, but not automatically create new physician-work value around it. For future CPT applicants, this is potentially quite important: even if a service is accepted as augmentative, that does not mean CPT will agree there is separately payable physician work embedded in the same code.
The autonomous section is also being disciplined, though less radically than augmentative. The 2025 text already defined autonomous as software that automatically interprets data and independently generates clinically meaningful conclusions without concurrent physician/QHP involvement, and then divided autonomy into three levels. The new draft keeps the levels, but it tightens the entrance criteria. It now says autonomous software derives parameters similarly to augmentative outputs and independently generates clinically meaningful interpretations or conclusions in accordance with clinical scales/metrics in common use, clinical practice guidelines, or direct demonstration of impact on patient care. It then adds that reporting of derived parameters is essential for transparency and explicability, and states that recommendations for definitive diagnosis, specific management, or interventions should be validated for clinical utility. This is an unmistakable attempt to constrain bold autonomous claims. It implies that if software is going to generate conclusions or management recommendations, the bar is not merely analytical performance or even clinical association; it trends toward utility, guideline anchoring, and explainable transparency.
The draft’s treatment of the three autonomous levels is revealing. The levels themselves are broadly familiar: Level I recommends and requires physician action to implement; Level II initiates action with alert/opportunity to negate; Level III continues unless the physician intervenes. But the draft rewrites the prefatory language to emphasize outputs that include recommendations of definitive diagnosis and/or specific management, or medical management actions, or automatically initiated management actions. That phrasing makes the levels feel less like abstract AI maturity and more like a graded ladder of clinical authority and workflow control. The likely intention is to tie autonomy to what the software actually gets to do in the clinical workflow, not just to the sophistication of the model. That should make debates at CPT more practical: not “how smart is it?” but “what exactly does it conclude, recommend, initiate, and who has to stop it?”
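Read that way, the levels can be sketched as a ladder of escalating authority. The enum below is a hypothetical paraphrase of the draft's level semantics; the names and the `who_must_act` framing are mine, not the document's.

```python
# Hypothetical rendering of the three autonomous levels as a ladder of
# workflow authority. The level semantics paraphrase the draft; the enum
# and the who_must_act framing are mine, not the document's.

from enum import Enum

class AutonomyLevel(Enum):
    LEVEL_I = "recommends diagnosis/management; physician must act to implement"
    LEVEL_II = "initiates management action; physician is alerted and may negate"
    LEVEL_III = "automatically initiates action that continues unless physician intervenes"

def who_must_act(level: AutonomyLevel) -> str:
    """The practical question the ladder answers: who has to do what to
    stop or start the clinical action?"""
    return {
        AutonomyLevel.LEVEL_I: "physician must act for anything to happen",
        AutonomyLevel.LEVEL_II: "physician must object to stop the action",
        AutonomyLevel.LEVEL_III: "physician must actively intervene to halt it",
    }[level]
```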
The new ballot also adds a summary table covering primary objective, required evidence of clinical meaningfulness, and whether physician/QHP interpretation/report is required. This table is especially important because it exposes the authors' true architecture. In the 2025 appendix, the summary table was descriptive. In the new draft, the table becomes quasi-regulatory. It explicitly says assistive does not require evidence of clinical meaningfulness in the same way augmentative and autonomous do, although it still requires evidence of benefit to patient care; by contrast, augmentative and autonomous do require evidence of clinical meaningfulness. That is a new and consequential distinction. It formalizes a step-up in evidentiary burden across the taxonomy. I suspect the authors added this because earlier versions of Appendix S did not give enough practical help when panelists asked, "What kind of evidence is enough for each label?"
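The step-up the table formalizes can be summarized roughly as follows. This mapping paraphrases the prose description above, not the table's actual wording or layout.

```python
# Rough paraphrase of the evidentiary step-up the new summary table
# formalizes. Keys and flags restate the prose above, not the table's
# actual wording or layout.

EVIDENCE_BURDEN = {
    "assistive": {
        "clinical_meaningfulness_evidence": False,  # not required in the same way
        "patient_benefit_evidence": True,           # still required
        "physician_interpretation_required": True,
    },
    "augmentative": {
        "clinical_meaningfulness_evidence": True,
        "patient_benefit_evidence": True,
        "physician_interpretation_required": True,
    },
    "autonomous": {
        "clinical_meaningfulness_evidence": True,
        "patient_benefit_evidence": True,
        "physician_interpretation_required": False,  # no concurrent interpretation
    },
}
```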
Stepping back, the draft seems to pursue at least five apparent intentions.
First, to make Appendix S more usable for actual CPT decision-making by connecting taxonomy terms to evidence standards. The old text classified. The new text classifies and tells you what sort of validation must back the classification.
Second, to shift focus from the hype term AI to the more durable concept of software outputs in clinical services. That makes the appendix harder to game with branding and easier to apply across AI, algorithms, rules engines, and software-intensive services generally.
Third, to draw a firmer line between assistive and augmentative by blocking the quiet smuggling of predictive or inferential language into assistive territory. The new “likelihood of / risk for / suggestive of” language is almost certainly there for that reason.
Fourth, to prevent weakly justified scoring systems from claiming augmentative status unless they are truly clinically meaningful, clinically important, and pertinent to the coded service. That feels aimed at the many software products that can generate a score but have a shakier claim to real-world medical relevance.
Fifth, to cabin autonomous claims by demanding more explicit linkage to guidelines, patient-care impact, transparency, and in some cases clinical utility. That makes autonomous feel less like a prestige label and more like a serious claim that must be earned.
As for impact, I think the draft, if adopted in something like this form, would make life harder for applicants seeking ambitious software codes, but easier for the CPT Panel and staff who need a principled vocabulary for saying yes, no, or not yet. It will likely favor applicants whose services resemble existing reference services, whose outputs are well-specified, and whose evidence packages are aligned to the exact claim made in the descriptor. It will be less friendly to applicants who rely on broad claims of “AI-enabled” improvement, opaque risk outputs, or arguments that a score is self-evidently meaningful because it is statistically significant. It may also reduce category inflation: fewer things called autonomous, some things pushed down from augmentative to assistive, and many proposed descriptors forced to become more precise.
There are, however, a few tensions in the markup. One is that the document is trying to do two jobs at once: define taxonomy and articulate evidence policy. That may be exactly what the drafters want, but it also risks making Appendix S more contestable, because parties may disagree not only about category assignment but about the embedded evidentiary doctrines. Another is that phrases like clinical equivalence to current clinical practice, clinically important, and directly pertinent are sensible but still somewhat elastic. Those phrases invite interpretation, which means future debates will move from “what category is this?” to “what counts as clinically important enough?” That may be unavoidable, but it means the revised Appendix S could generate more argument, not less, even as it becomes more rigorous.
My overall read is that the markup authors are trying to make Appendix S do what many coding policies eventually do after a few years in the wild: move from a clean conceptual taxonomy to a workable adjudication framework. The 2025 version was elegant and easy to summarize. The March 26 version is denser because it is trying to answer the hard questions the 2025 version left open: What exactly is the software output? How different is it from the input? Is it only drawing attention, or deriving a parameter, or making a conclusion? What evidence is needed for that level of claim? How does it relate to existing clinical practice? Who still interprets it? Who acts? Who can override it? That is why the markup feels so heavily “lawyered” and policy-laden. The authors appear to be building a document meant not just to describe AI/software services, but to control the boundary-setting battles that are now arriving at CPT.
FEBRUARY 2026 VERSUS MARCH 2026 BALLOT
Comparing the February 4 draft with the March 26 draft, the March text again looks like a document that is trying to move Appendix S away from being merely a taxonomy of AI types and toward being a more operational CPT policy for software outputs, evidence, and coding boundaries. But this time the key point is slightly different from the 2025-to-0326 comparison: the February draft had already moved in that direction. So the March draft is not a revolution from February. It is better understood as a further tightening, sharpening, and legalizing of ideas that were already present on February 4.
The broadest change is at the level of framing and ontology. The February 4 version still spoke in the language of a "software device" and explicitly defined software in FDA-like terms as software intended for diagnosis, cure, mitigation, treatment, or prevention of disease. It also said that software integral to a total service or functioning only as support was not separately codable unless distinct as a service with a beginning, middle, and end. That language felt like an effort to define what kind of thing CPT was even talking about: a software device, perhaps subject to regulation, perhaps separately codable only under certain structural conditions. By March 26, that device-oriented language has been stripped back and replaced with a cleaner emphasis on software output(s) and on the service performed by the software to produce desired outputs on behalf of the physician or QHP. The March draft also adds the idea that classification is based on equivalence to a primary or usual reference service in current clinical practice. So the apparent intention was to shift from a somewhat product-centered formulation to a service-and-output-centered formulation more native to CPT. That is an important conceptual refinement. February still sounded partly like a hybrid of CPT and device-regulatory language; March sounds much more like CPT trying to speak in its own voice.
Related to that, March is markedly more explicit about evidence standards tied to terminology choice. The February version said clinical evidence should demonstrate that the output of the software device benefits patient care, and then, in its “clinically meaningful output” section, said such output must be clinically validated beyond technical or analytical validation, clinically important beyond statistical significance, and directly relevant to intended use of the CPT code. March preserves that structure but tightens it by saying that, to use a term from Appendix S, evidence should demonstrate analytical validity, clinical validity, or clinical utility, as appropriate to the choice of term in the code descriptor and consistent with CPT code criteria. That is more disciplined and more tactical. It appears designed to align Appendix S terminology directly with the evidentiary threshold implied by the descriptor claim. The likely purpose is to prevent overclaiming: an applicant should not be able to select a stronger Appendix S term than its evidence package can justify.
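One way to picture March's term-to-evidence alignment is as an ordered set of tiers with a minimum tier per claimed term. The tier ordering is standard (analytical validity < clinical validity < clinical utility), but the specific minimums in the sketch below are my inference from the surrounding prose, not a mapping the draft actually states.

```python
# Hypothetical sketch of March's "evidence appropriate to the term" rule.
# The tier ordering is standard; the per-term minimums below are inferred
# from the surrounding prose, not a mapping the draft actually states.

from enum import IntEnum

class EvidenceTier(IntEnum):
    ANALYTICAL_VALIDITY = 1  # technical performance
    CLINICAL_VALIDITY = 2    # association with clinical ground truth
    CLINICAL_UTILITY = 3     # demonstrated impact on patient care

MINIMUM_TIER = {
    "assistive": EvidenceTier.ANALYTICAL_VALIDITY,
    "augmentative": EvidenceTier.CLINICAL_VALIDITY,
    "autonomous": EvidenceTier.CLINICAL_UTILITY,
}

def claim_is_supported(term: str, evidence: EvidenceTier) -> bool:
    """An applicant should not select a stronger Appendix S term than its
    evidence package can justify."""
    return evidence >= MINIMUM_TIER[term]

# An autonomous claim backed only by clinical validity falls short:
assert not claim_is_supported("autonomous", EvidenceTier.CLINICAL_VALIDITY)
```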
The treatment of assistive is also more carefully delimited in March. In February, assistive meant drawing attention to clinically relevant data without deriving a new parameter, making an interpretation, or providing conclusions, and required physician/QHP interpretation and report. It also said that an assistive device improves physician/QHP performance and that improvement should be substantiated by clinical evidence. March keeps the same basic idea, but it adds several guardrails. It now explicitly defines parameters as quantitative or categorical outputs such as an index, score, or classification. It states that assistive outputs are clinically supportive because they improve physician/QHP performance, and that the improvement should be a patient benefit substantiated by technical or analytical validation where the primary service output is unchanged. But it then adds a very important qualification: if the output uses language such as “likelihood of,” “suggestive of,” or “risk for,” those terms should be substantiated by clinical validation. This is one of the clearest March-over-February moves. February’s assistive language still left room for products to flirt with predictive or inferential terminology while claiming low-level assistive status. March closes that gap. The apparent intention is to stop the semantic creep by which quasi-interpretive outputs masquerade as simple detection aids.
In augmentative, the March draft becomes more exacting and more practical than February. February said augmentative output derives a quantitative or categorical parameter qualitatively different from the input, that it must be more than descriptive statistics, and that it does not include an interpretation or conclusion. It also allowed expression through common clinical scales or other metrics, or validated predictive/prognostic indices. That was already fairly strong. But March goes a step further in several ways. First, it keeps the requirement that output be qualitatively different and more than mere summation, adding more emphatic language that it must provide something beyond adding, averaging, or otherwise reporting descriptive statistics. Second, March explicitly says that the designation of software output as augmentative is based on demonstration that it is clinically meaningful and distinct from the input function. Third, it refines the three-part test for clinical meaningfulness by tying the output directly to the CPT code characteristics, including typical patient, procedure description, and descriptor. This makes the test more CPT-specific and less abstract than February's intended-use wording.
Another notable deletion is that the February draft included special language for Category III codes, stating that use of “augmentative” could be substantiated by the design of the product and the design of ongoing clinical trials intended to yield Category I-level validation. A parallel Category III accommodation also appeared in autonomous. That language is absent from the March 26 text. I think that is one of the most consequential edits in the whole comparison. It suggests that between February and March, the drafters pulled back from giving what might have looked like a special evidentiary lane for emerging technologies. The likely reason is concern that such language could be read as an invitation to claim a higher Appendix S category based on future evidence plans rather than current evidence. Removing it makes the document more conservative and more immediate: classification should reflect what the software output can substantiate now, not what trials may later show. That deletion is highly consistent with a general March pattern of tightening access to stronger labels.
March also reframes the question of physician work more pointedly. February said augmentative output may involve non-traditional physician/QHP interaction and that most augmentative services do not involve traditional interpretive physician work; the output may serve as a data element in E/M, a factor in surgical planning, or input to another interpretive service. March retains this logic but makes it even more explicit that physician work related to augmentative services may already be captured by existing codes. This is a subtle but important hardening. It sounds like the authors want to ensure that Appendix S does not become a Trojan horse for arguments that every sophisticated software output should carry newly recognized physician work or stand-alone reimbursement logic. The intention seems to be to separate the classification of the software service from the valuation of physician effort, keeping both questions analytically distinct.
The March version of autonomous is likewise a refinement rather than a full rewrite, but it is a meaningful refinement. February defined autonomous as automatic derivation of parameters and independent generation of interpretations or conclusions in accordance with clinical scales, practice guidelines, or direct demonstration of impact on patient care. It also required reporting of derived parameters for oversight, transparency, and explicability, and said recommendations for diagnoses or interventions should be based on parameters and reported within the context of epidemiologic data, practice guidelines, or evidence for clinical utility. March keeps much of that structure but makes some of the language more pointed. It says recommendations for definitive diagnostic conclusions, specific management, or interventions should be validated for clinical utility, and notes that these standards are especially important for Levels II and III. The drift is toward a more explicit gradient of evidentiary seriousness as the software’s practical authority increases. February hinted at that when it said higher autonomy levels require higher evidentiary standards due to increasing patient risk. March operationalizes the point more concretely within the main autonomous text.
The treatment of the three autonomous levels is also revealing. February described them in cleaner prose: Level I provides recommendations requiring physician/QHP judgment; Level II initiates management actions with compulsory alert and opportunity for override; Level III automatically initiates management unless or until physician/QHP action reverses it. March keeps the same staircase but rewrites the wording to emphasize outputs that include recommendations of definitive diagnosis and/or specific management, then outputs that include medical management actions, and finally outputs that automatically initiate management actions that continue unless the physician intervenes. This makes the levels feel slightly less like design categories and more like an escalating sequence of clinical control and workflow consequence. The likely intention is to help the Panel judge not just sophistication, but how much authority the software is exercising in the patient-care chain.
One of the most striking March additions is the new summary table distinguishing primary objective, required evidence of clinical meaningfulness, and whether physician/QHP interpretation and report is required. February’s summary table was more conventional: primary objective, independent diagnosis/management, analyzes data, requires interpretation, evidence of patient benefit required. March’s table is more doctrinal. It distinguishes assistive from augmentative/autonomous by stating that assistive does not require evidence of clinical meaningfulness in the same way, though it does require evidence of benefit to patient care, while augmentative and autonomous do require evidence of clinical meaningfulness. It also gives examples of language associated with each category, such as assistive outputs drawing attention to data and even including terms like “likelihood of” or “risk for,” while augmentative includes outputs “predictive of” or “prognostic of.” That table does a lot of work. It is almost a cheat-sheet for future coding debates. Compared with February, it shows the authors trying to convert Appendix S into something more adjudicative and scalable.
So what are the apparent intentions behind the March revisions relative to February?
First, to make Appendix S sound less like a statement about software products/devices and more like a statement about software services and outputs in CPT terms. February still carried some residual FDA/device flavor; March is more squarely CPT.
Second, to narrow the assistive lane and stop inferential or risk-bearing outputs from receiving low-level categorization without stronger validation. The “likelihood of / suggestive of / risk for” language is central here.
Third, to make augmentative a higher and more disciplined threshold, not merely “any score or classification,” but a clinically meaningful parameter linked tightly to the descriptor, typical patient, and procedure.
Fourth, to eliminate what may have been perceived in February as a too-generous Category III glide path. By deleting those passages, March seems to reject the idea that design and planned trials are enough to justify stronger Appendix S terminology.
Fifth, to align increasing autonomy with increasing demands for transparency, guideline anchoring, and clinical utility, especially as the software moves from informing decisions to initiating actions.
In terms of practical impact, I think March will make Appendix S more useful to CPT leadership and staff, but more demanding for applicants. February already set out a serious framework. March turns the screws. It will likely reduce the room for applicants to rely on broad AI rhetoric, product design, or future-study arguments. It will favor services with crisp output definitions, solid present-tense evidence, and a clear relationship to existing clinical practice. It will also likely make coding debates more explicit around the questions the Panel actually cares about: what is the output, how different is it from the input, what exactly is the clinical claim, what evidence supports that claim, and where does physician work sit, if anywhere?
My overall read is that the February version was a serious and already mature draft, but it still carried traces of an effort to accommodate innovation by describing device structure, codifiability, and Category III developmental pathways. The March version is noticeably more guarded, CPT-native, evidence-calibrated, and boundary-conscious. It is not trying to be friendlier to applicants. It is trying to give the Panel a sturdier vocabulary for saying, with more confidence and less ambiguity, this output is assistive, this one is augmentative, this one is autonomous, and this is the level of proof needed for each claim.