Tuesday, May 20, 2014

Nerdy Text Only Version: How To Use the Giant Medicare Physician Database (2 GB, April 2014)

Separately, I have posted a main article on the giant Medicare physician data base, here.

From that webpage, you can access a 40-slide PowerPoint explanation, and also access a 20 minute YouTube video in which I narrate the PowerPoint.

If you don't want the PowerPoint and/or the inspiring online video with the eloquent soundtrack I recorded, I am also posting some step by step instructions here...in writing alone.

GETTING THE DATA

For the CMS homepage for all the data files, click here.


  • The comprehensive database is a 1.7 GB data base that can't be used in Excel, but only in dedicated database programs.  (I didn't touch it.)

  • The aggregated data files are of two types:

    • PHYSICIAN AGGREGATED DATA.   This is each physician in the US, with his NPI number, address, gender, total billings, total beneficiaries he saw, etc.   
    • HCPCS AGGREGATED DATA.   This is every CPT and HCPCS (Supply/drug) code used in CY2012.by both USA aggregate and by state. 
  • The Physician/Supplier Data is so large that CMS breaks it down alphabetically.
    • The A physicians.
    • The B physicians.
    • The C-D physicians.
    • And so on.
    • Each is small enough it will open (although it may take 2-3 minutes) in Excel.  Each data base is less than 200 MB and less than 1M lines.
OPENING THE DATA

At least on my Windows 7 system, when I download the files (3-5 minutes), and click to open in Excel (another 2-3 minutes), I get a yellow bar warning the file is downloaded from the internet and may not be safe.  If you click on that, to use the file, my system AGAIN takes 3-4 minutes to save the file before I can manipulate and use it.  So there went 10 minutes of so.  At this point, I click "Save" so I don't have to go through that again, but of course, it triggers a couple minute Save.  But from then on the files will be faster to use and will be ready on your hard drive.

How big is the data?  The largest files approach 200 MB.  If a file is 900,000 lines long x 27 columns, it is 24.3 million cells.  And there are about 10 such files, so the CMS physician data files may be close to 200 million cells in total.

FILTER AND SORT THE DATA

FILTER.  The physician supplier databases are 27 columns wide.   Each column contains a "filter" box at the top.  For example, you can filter first on provider type "Allergist" and then on CPT code 99202 (20 minute new patient office visit.)   If you were in the "A" provider file, you now are looking at all "A" name allergists who billed for a 99202 type new patient visit.

SORT.  If you want to sort columns, I suggest first deleting Row 1, an AMA Copyright line, since it may mess up the sorting process.

To sort, click your cursor in the top data box of the column you want to sort on, go to top right, and select Sort, Z to A (to sort from small to large).  You can also use the Sort/Custom Sort feature to select the column you want to sort on from the menu options found there.

REDUCE THE DATA TO MANAGEABLE SIZE (e.g. 200KB from 20 MB)

After you filter and sort, the database will still be as huge as you started with.  To copy only what you see:
  • Click Ctl-A to select All.   Go to Find & Select at top right of the Home tab.   Click Go To Special, click Visible Cells Only.  Now click Ctl-C to copy everything.  Go to a new spreadsheet and paste.   Often, you will have reduced a 20M to 200M dataset to only a few hundred Kb by this maneuver.
From A to B to C to....Z, instead just "A-Z".   
Since you are working with one provider alphabetic group at a time, if you have just done this for "A" providers, repeat it for B providers and then the C providers and so on.  Then collate them all into one A-Z data page.  Even if it's now 500 KB or 1 MB, you've pulled that out of the original 1.7 GB of total data spread across the ten or twelve Excel files.

Skim the Data.
If you research a topic in the "A" file, it is about 5% of the total data.  The HIJ or KL files are about 10% of the total data.   So, for example, if 50 doctors under "HIJ" used a certain code, the total number in all the files will likely be ~500 doctors.  

Leave your Master Files intact.
I suggest if you have manipulated and filtered the original 200MB database files, do NOT save them unless these are the filter settings that you want to see when you open it again.   Just close them so they stay in their original state.

Procedures or Costs per 1000 Beneficiaries
If you have state level data, normalizing it by the number of beneficiaries per state is often helpful.  Beneficiaries per state can be found here which is one of the tables in a large set, here.  For example, for one service I researched, by state, variably 0.5 to 5 Medicare enrollees per 1000 got the service.  Spending on the service (per 1000 patients) varied by 10X among the 50 states.

How to Record PowerPoints as Videos, my lessons learned are here.

No comments:

Post a Comment