Tuesday, June 9, 2026

Names of 87798-billing labs in 2024 vs names of other labs not billing 87798

Claude Opus 6 9 2026


 I dug into both tabs. A quick structural note first: the "Clean_Labs" tab is actually the full universe of 1,561 labs that don't bill 87798 (zero overlap with the 590 on the 87798 tab), so the two lists are cleanly disjoint and directly comparable. Two confounds are worth flagging up front, because they shape the raw word counts:

  • The 87798 list contains the national giants: Quest Diagnostics (≈19 entity variants) and LabCorp / "Laboratory Corporation of America Holdings" (≈20 variants). These are not fly-by-night labs; they're legitimate megalabs that naturally run nucleic acid amplification tests (NAAT). So the apparent enrichment of "America," "Quest," and "Holdings" in the 87798 group is an artifact of corporate-subsidiary naming, not a fraud signal. I stripped them out and the differences reported below held.
  • The clean set carries a large Puerto Rico contingent (~13%, "Laboratorio Clínico…"), which is entirely absent from the 87798 list and inflates the clean group's distinctiveness. Stripped that too.
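A minimal sketch of how that stripping step could look. The two regular expressions are my own assumptions (the matching rules actually used are not spelled out here), and the function names are hypothetical.

```python
import re
import unicodedata

# Hypothetical patterns: assumed stand-ins for however the confounds were matched.
GIANT_PATTERN = re.compile(
    r"\bquest\b|\blabcorp\b|laboratory corporation of america", re.IGNORECASE
)
PUERTO_RICO_PATTERN = re.compile(r"\blaboratorio\b", re.IGNORECASE)


def fold_accents(name: str) -> str:
    """Drop diacritics so accented and unaccented spellings match alike."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_confounds(names):
    """Return the names that match neither the megalab nor the Puerto Rico pattern."""
    kept = []
    for name in names:
        folded = fold_accents(name)
        if GIANT_PATTERN.search(folded) or PUERTO_RICO_PATTERN.search(folded):
            continue
        kept.append(name)
    return kept
```

Applying the same filter to both lists before counting keeps the two groups comparable; a looser or tighter pattern would shift the percentages somewhat.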

With those removed, here are the differences that survive.

Entity type is the single sharpest split. The 87798 labs are overwhelmingly LLCs; the clean labs lean toward Inc.

Suffix                  87798 labs   Clean labs
LLC                     50.5%        30.0%
Inc / Incorporated      27.8%        39.4%
PC / PA / LLP / PLLC    ~1.0%        ~5.7%

An LLC is the cheapest, fastest entity to spin up and dissolve, which fits the pop-up profile. Notably, the professional-corporation forms (PC, PA, PLLC) that signal physician-owned pathology/medical practices are almost entirely on the clean side.
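For reference, a rough sketch of how the suffix shares in the table could be tallied. The bucket labels and the rule of taking the last recognizable suffix in the name are assumptions of this sketch, not a description of the exact method used.

```python
import re
from collections import Counter

PROFESSIONAL = {"PC", "PA", "LLP", "PLLC"}
INCORPORATED = {"INC", "INCORPORATED"}


def entity_suffix(name: str) -> str:
    """Classify a lab name by the last entity suffix it contains."""
    # Remove periods and commas so "L.L.C." and "P.A." collapse to "LLC" and "PA".
    tokens = re.sub(r"[.,]", "", name).upper().split()
    for token in reversed(tokens):
        if token in PROFESSIONAL:
            return "professional"
        if token == "LLC":
            return "LLC"
        if token in INCORPORATED:
            return "Inc"
    return "other"


def suffix_shares(names):
    """Return the fraction of names falling into each suffix bucket."""
    counts = Counter(entity_suffix(n) for n in names)
    return {bucket: n / len(names) for bucket, n in counts.items()}
```

One caveat: "PA" anywhere in a name is read as a professional association unless a later suffix overrides it, so a name carrying the Pennsylvania abbreviation and no other suffix would be misfiled.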

"Modern molecular" branding vs. traditional/institutional naming. This is the clearest onomastic pattern. After removing the confounds above:

  • Words like Diagnostics, Molecular, Genomics, Genetic, Precision, DX, Bio, Sciences, Solutions, Advanced, Labs appear in ~39% of 87798 names vs. ~21% of clean names (roughly 2×).
  • Words signaling an established institution — Pathology, Pathologists, Physicians, University, Hospital, Medical, Associates, Clinic, Family, Community, plus personal surnames — appear in ~33% of clean names vs. ~19% of 87798 names.

So the 87798 set tilts toward the language of a new molecular-testing venture; the clean set tilts toward the language of a practice, hospital lab, or reference lab embedded in a care setting.
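The two vocabulary rates above can be sketched roughly as follows. The word lists are copied from the bullets; surnames are left out because they cannot be detected from a fixed word list, and whole-word matching is an assumption of this sketch.

```python
import re

MOLECULAR = {
    "diagnostics", "molecular", "genomics", "genetic", "precision", "dx",
    "bio", "sciences", "solutions", "advanced", "labs",
}
INSTITUTIONAL = {
    "pathology", "pathologists", "physicians", "university", "hospital",
    "medical", "associates", "clinic", "family", "community",
}


def name_tokens(name: str) -> set:
    """Lowercase alphabetic words in a lab name."""
    return set(re.findall(r"[a-z]+", name.lower()))


def share_with_any(names, vocabulary) -> float:
    """Fraction of names containing at least one word from the vocabulary."""
    if not names:
        return 0.0
    hits = sum(1 for n in names if name_tokens(n) & vocabulary)
    return hits / len(names)
```

Because matching is on whole words, a fused name such as Helixbiodx does not count toward "bio" or "dx" here; substring matching would push the molecular rate higher.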

Register: "Labs/Lab/Diagnostics" over "Laboratory/Pathology." The 87798 group favors the informal, brand-forward Labs (10.3% vs 4.5%), Lab (11% vs 6.2%), and Diagnostics/Diagnostic (21% vs 10%). The clean group is where Pathology (7.0% vs 2.9%), Physicians, University, and Pathologists concentrate.

The "Dx" tell and coined brand names. "Dx" as a token is ~3.5× enriched (1.4% vs 0.3%), and qualitatively the 87798 list is where you find the coined/portmanteau startup-style names with "Dx" fused on or abstract single-word brands: Sozo Dx, Citizensdx, Helixbiodx, Northstar Dx, Ipsum Diagnostics, Prescience Diagnostics, Proteus Molecular, Truth Diagnostics, Gezunt Diagnostics, Applied Ingenuity Diagnostics. This invented-brand register is much rarer in the clean set, which skews toward eponymous (surname-based) and place-or-institution-based names.
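A small sketch of a "Dx" detector and an enrichment ratio. Note the assumption: this version also catches fused forms like Citizensdx, which is broader than the standalone-token count quoted above, so its numbers would not match the 1.4% and 0.3% exactly.

```python
import re


def has_dx(name: str) -> bool:
    """True if any word is 'dx' or ends in 'dx' (e.g. 'Sozo Dx', 'Citizensdx')."""
    return any(word.endswith("dx") for word in re.findall(r"[a-z]+", name.lower()))


def enrichment(names_a, names_b, predicate) -> float:
    """Ratio of the predicate's hit rate in group A to its hit rate in group B."""
    if not names_a or not names_b:
        raise ValueError("both groups must be non-empty")
    rate_a = sum(1 for n in names_a if predicate(n)) / len(names_a)
    rate_b = sum(1 for n in names_b if predicate(n)) / len(names_b)
    # An infinite ratio means the feature never occurs in group B.
    return rate_a / rate_b if rate_b else float("inf")
```

Ratios are best taken on raw counts: dividing the rounded percentages quoted above (1.4 / 0.3) gives about 4.7, so the ~3.5× figure presumably reflects unrounded rates.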

What does not discriminate. I'd caution against over-reading aspirational adjectives (premier, elite, precision, advanced, prime, apex) as a fraud marker. They appear in both groups at similar rates (e.g., "precision" actually turns up in more clean names than 87798 names, 10 vs. 4 by raw count), so they're noise here despite the intuition. Name length and word count are also essentially identical (≈3.6 vs 3.8 words). And explicit state words (Texas, Florida) are only mildly enriched in the names themselves; the TX/FL concentration you've described lives in the address/PECOS data, not the lab name.

The honest summary: the differentiating signal is entity form (LLC) plus a "molecular-startup" lexical register (Diagnostics/Molecular/Genomics/DX/Labs, often as a coined brand), as against the clean set's institutional/eponymous/professional-corporation register (Pathology/Physicians/University/Medical/surnames, plus PC/PA forms). 

It's a meaningful descriptive contrast and a reasonable screening prior, but it's correlational onomastics, not a classifier — plenty of legitimate molecular labs (and the national giants sitting right there in your 87798 list) use the exact same modern vocabulary.

If useful, I can build a scored feature set from this (LLC flag + molecular-token flag + Dx flag + absence-of-institutional-token flag) and run it back over the 590 to rank the names by how strongly they fit the pattern — though you'd want to validate any such score against your billing-fingerprint and geographic-concentration signals rather than lean on the name alone.
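As a rough illustration of what such a score might look like, here is a sketch with four equally weighted flags. The equal weights, the function names, and the choice to keep "dx" out of the molecular list (so it is not counted twice) are all assumptions, not a validated model.

```python
import re

# Word lists from the analysis above; "dx" is scored by its own flag.
MOLECULAR = {
    "diagnostics", "molecular", "genomics", "genetic", "precision",
    "bio", "sciences", "solutions", "advanced", "labs",
}
INSTITUTIONAL = {
    "pathology", "pathologists", "physicians", "university", "hospital",
    "medical", "associates", "clinic", "family", "community",
}


def name_flags(name: str) -> dict:
    """Compute the four binary features for one lab name."""
    # Removing periods first lets "L.L.C." read as the single word "llc".
    words = re.findall(r"[a-z]+", name.lower().replace(".", ""))
    word_set = set(words)
    return {
        "llc": "llc" in word_set,
        "molecular": bool(word_set & MOLECULAR),
        "dx": any(w.endswith("dx") for w in words),
        "no_institutional": not (word_set & INSTITUTIONAL),
    }


def name_score(name: str) -> int:
    """Sum of the flags, from 0 (institutional register) to 4 (startup register)."""
    return sum(name_flags(name).values())


def rank_names(names):
    """Sort names from strongest to weakest fit; ties keep their input order."""
    return sorted(names, key=name_score, reverse=True)
```

A score like this only orders names by resemblance to the pattern; it says nothing on its own about billing behavior.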
