Importing the National Provider Database, Part II

Hello, I'm Jacob. I perform technology and strategy consulting, including data analytics and machine learning. I blog about various projects I'm working on. If this topic is interesting or relevant to your organization let's talk - hello at jacob dot vi

In Importing the National Provider Database, Part I, I reviewed how to wrangle SQL Server Import and Export Wizard into handling the 6GB NPPES file provided by CMS. To make this data set more usable, and to save disk space, we can pull out the Licensing Taxonomy data and the Provider Identification data. In the original data set this information is stored as 60 and 200 sparse columns, respectively. By creating tables for this data we can then remove those columns from the base table.

First, create two new tables

CREATE TABLE [Provider Taxonomy Licenses] (
	[NPI] [nvarchar](10) NULL,
	[Healthcare Provider Taxonomy Code] [nvarchar](10) NULL,
	[Provider License Number] [nvarchar](20) NULL,
	[Provider License Number State Code] [nvarchar](2) NULL,
	[Healthcare Provider Primary Taxonomy Switch] [nvarchar](1) NULL,
	[Healthcare Provider Primary Taxonomy Group] [nvarchar](70) NULL,
)

CREATE TABLE [Other Provider Identifiers](
	[NPI] [nvarchar](10) NULL,
	[Other Provider Identifier] [nvarchar](50) NULL,
	[Other Provider Identifier Type Code] [nvarchar](2) NULL,
	[Other Provider Identifier State] [nvarchar](2) NULL,
	[Other Provider Identifier Issuer] [nvarchar](80) NULL
)

Next, we load the data from NPPES_Import into these tables, ignoring any empty values. While other people on the Internet use external tools to do this step, such as R or Python, I have implemented it in pure SQL to be as stream-lined as possible. As a cautionary note, I am using dynamic SQL here, which should generally be avoided due to the possibility of SQL Injection attacks. However, since I am not taking user-input and thus have full control of all variables used within the dynamic statement, this is not an issue. Seriously though, don't use dynamic SQL on any user-facing system!

DECLARE @LoopVar int = 1
DECLARE @LoopStr nvarchar(3)
DECLARE @Query nvarchar(max)

WHILE	@LoopVar <= 15
BEGIN
	SET		@LoopStr = CONVERT(nvarchar(3), @LoopVar)
	SET		@Query = N'INSERT INTO [Provider Taxonomy Licenses]
SELECT	[NPI],
	[Healthcare Provider Taxonomy Code_' + @LoopStr + '],
	[Provider License Numbe_' + @LoopStr + '],
	[Provider License Number State Code_' + @LoopStr + '],
	[Healthcare Provider Primary Taxonomy Switch_' + @LoopStr + '],
	[Healthcare Provider Taxonomy Group_' + @LoopStr + ']
FROM	[Medicare_Import].dbo.[NPPES_Import]
WHERE	[Healthcare Provider Taxonomy Code_' + @LoopStr + '] <> '''''
	EXECUTE sp_executesql @Query
	--PRINT @Query
	SET @LoopVar = @LoopVar + 1    END
END

WHILE	@LoopVar <= 50
BEGIN
	SET		@Query = N'INSERT INTO [Other Provider Identifiers]
SELECT	[NPI],
	[Other Provider Identifier_' + @LoopStr + '],
	[Other Provider Identifier Type Code_' + @LoopStr + '],
	[Other Provider Identifier State_' + @LoopStr + '],
	[Other Provider Identifier Issuer_' + @LoopStr + ']
FROM	[NPPES_Import]
WHERE	[Other Provider Identifier_' + @LoopStr + '] <> '''''
	EXECUTE sp_executesql @Query
	--PRINT @Query
	SET @LoopVar = @LoopVar + 1
END
GO

Now we can create a new providers table and drop the original import table. Note that the insert statement uses a SELECT DISTINCT due to duplicates in the NPPES dataset.

-- create Provider Set
CREATE TABLE [dbo].[Providers](
[NPI] [nvarchar](10) NULL,
[Entity Type Code] [nvarchar](1) NULL,
[Replacement NPI] [nvarchar](10) NULL,
[Employer Identification Number (EIN)] [nvarchar](9) NULL,
[Provider Organization Name (Legal Business Name)] [nvarchar](70) NULL,
[Provider Last Name (Legal Name)] [nvarchar](50) NULL,
[Provider First Name] [nvarchar](50) NULL,
[Provider Middle Name] [nvarchar](50) NULL,
[Provider Name Prefix Text] [nvarchar](5) NULL,
[Provider Name Suffix Text] [nvarchar](5) NULL,
[Provider Credential Text] [nvarchar](50) NULL,
[Provider Other Organization Name] [nvarchar](70) NULL,
[Provider Other Organization Name Type Code] [nvarchar](1) NULL,
[Provider Other Last Name] [nvarchar](50) NULL,
[Provider Other First Name] [nvarchar](50) NULL,
[Provider Other Middle Name] [nvarchar](50) NULL,
[Provider Other Name Prefix Text] [nvarchar](5) NULL,
[Provider Other Name Suffix Text] [nvarchar](5) NULL,
[Provider Other Credential Text] [nvarchar](50) NULL,
[Provider Other Last Name Type Code] [nvarchar](1) NULL,
[Provider First Line Business Mailing Address] [nvarchar](256) NULL,
[Provider Second Line Business Mailing Address] [nvarchar](256) NULL,
[Provider Business Mailing Address City Name] [nvarchar](50) NULL,
[Provider Business Mailing Address State Name] [nvarchar](50) NULL,
[Provider Business Mailing Address Postal Code] [nvarchar](50) NULL,
[Provider Business Mailing Address Country Code (If outside U S )] [nvarchar](2) NULL,
[Provider Business Mailing Address Telephone Number] [nvarchar](50) NULL,
[Provider Business Mailing Address Fax Number] [nvarchar](50) NULL,
[Provider First Line Business Practice Location Address] [nvarchar](256) NULL,
[Provider Second Line Business Practice Location Address] [nvarchar](256) NULL,
[Provider Business Practice Location Address City Name] [nvarchar](50) NULL,
[Provider Business Practice Location Address State Name] [nvarchar](50) NULL,
[Provider Business Practice Location Address Postal Code] [nvarchar](50) NULL,
[Provider Business Practice Location Address Country Code (If outside U S )] [nvarchar](2) NULL,
[Provider Business Practice Location Address Telephone Number] [nvarchar](50) NULL,
[Provider Business Practice Location Address Fax Number] [nvarchar](50) NULL,
[Provider Enumeration Date] [date] NULL,
[Last Update Date] [date] NULL,
[NPI Deactivation Reason Code] [nvarchar](2) NULL,
[NPI Deactivation Date] [date] NULL,
[NPI Reactivation Date] [date] NULL,
[Provider Gender Code] [nvarchar](1) NULL,
[Authorized Official Last Name] [nvarchar](50) NULL,
[Authorized Official First Name] [nvarchar](50) NULL,
[Authorized Official Middle Name] [nvarchar](50) NULL,
[Authorized Official Title or Position] [nvarchar](50) NULL,
[Authorized Official Telephone Number] [nvarchar](50) NULL,
[Is Sole Proprietor] [nvarchar](1) NULL,
[Is Organization Subpart] [nvarchar](1) NULL,
[Parent Organization LBN] [nvarchar](70) NULL,
[Parent Organization TIN] [nvarchar](50) NULL,
[Authorized Official Name Prefix Text] [nvarchar](5) NULL,
[Authorized Official Name Suffix Text] [nvarchar](5) NULL,
[Authorized Official Credential Text] [nvarchar](50) NULL
)
GO

INSERT INTO Providers
SELECT	DISTINCT [NPI],
	[Entity Type Code],
	[Replacement NPI],
	[Employer Identification Number (EIN)],
	[Provider Organization Name (Legal Business Name)],
	[Provider Last Name (Legal Name)],
	[Provider First Name],
	[Provider Middle Name],
	[Provider Name Prefix Text],
	[Provider Name Suffix Text],
	[Provider Credential Text],
	[Provider Other Organization Name],
	[Provider Other Organization Name Type Code],
	[Provider Other Last Name],
	[Provider Other First Name],
	[Provider Other Middle Name],
	[Provider Other Name Prefix Text],
	[Provider Other Name Suffix Text],
	[Provider Other Credential Text],
	[Provider Other Last Name Type Code],
	[Provider First Line Business Mailing Address],
	[Provider Second Line Business Mailing Address],
	[Provider Business Mailing Address City Name],
	[Provider Business Mailing Address State Name],
	[Provider Business Mailing Address Postal Code],
	[Provider Business Mailing Address Country Code (If outside U S )],
	[Provider Business Mailing Address Telephone Number],
	[Provider Business Mailing Address Fax Number],
	[Provider First Line Business Practice Location Address],
	[Provider Second Line Business Practice Location Address],
	[Provider Business Practice Location Address City Name],
	[Provider Business Practice Location Address State Name],
	[Provider Business Practice Location Address Postal Code],
	[Provider Business Practice Location Address Country Code (If outside U S )],
	[Provider Business Practice Location Address Telephone Number],
	[Provider Business Practice Location Address Fax Number],
	[Provider Enumeration Date],
	[Last Update Date],
	[NPI Deactivation Reason Code],
	[NPI Deactivation Date],
	[NPI Reactivation Date],
	[Provider Gender Code],
	[Authorized Official Last Name],
	[Authorized Official First Name],
	[Authorized Official Middle Name],
	[Authorized Official Title or Position],
	[Authorized Official Telephone Number],
	[Is Sole Proprietor],
	[Is Organization Subpart],
	[Parent Organization LBN],
	[Parent Organization TIN],
	[Authorized Official Name Prefix Text],
	[Authorized Official Name Suffix Text],
	[Authorized Official Credential Text]
FROM [NPPES_Import]

DROP TABLE [NPPES_Import]

You now have a useable and relational Medicare Provider dataset.

Show Comments