Deadlifts and Derivatives

Updates and Research from Steve Bronder

The purpose of this site is to give current information on Steve Bronder's (me) research and personal life

Visualizing Payments to Doctors With Package Statebins

total_doc10414.png

Introduction

The purpose of this post is to explain how to graph topological data with the statebins package. To do this we will play with General Payment Data for non-research/ownership payments to physicians and teaching hospitals. This data was recently released and, in short, contains the data for “gifts” pharma companies and others give to doctors and teaching hospitals because they are just great people. The data used throughtout this tutorial can be found on the open payments data section of the center for Medicare and Medicaid Services here.

Statebins is an R package that produces choropleth maps for US states. These maps preserve the geographic placement of states, but have the look and feel of a traditional heatmap. This package is based on work by the Washington Post graphics department in their report on The States Most Threatened by Trade. Functions allow binned, discrete, and continuous scale heat maps. We will example two of these through binning the number of doctors receiving payments in each state as well as the mean and total payments in each state.

Loading Data

Lets get started, lets load in the data. The data is one GB in size so we will use the fread() function in the package data.table. stringsAsFactors is by default FALSE, however, call me old fashioned because we are going to put it in anyway.

library(statebins)
library(data.table)
pharm.data <- fread("./General_Payment_Data_2013.csv",header=TRUE,stringsAsFactors=FALSE,showProgress=FALSE)

Once the data is loaded note that using str(pharm.data) reveals a $ in one of our variables of interest, Total_Amount_of_Payment_USDollars. To make this into a numeric data type we will use the substring function starting at the second character to convert this column into numeric.

pharm.data$Total_Amount_of_Payment_USDollars <-  as.numeric(substring(pharm.data$Total_Amount_of_Payment_USDollars,2))

Aggregating Data

Now we can start messing with the data. Lets create three different subsets of the data. All of these will use data.table’s aggregation technique. The functions used in aggregate are the normal R mean(), sum(), and length() functions. Notice we use the data.table function setkey() to tell R we want the Recipient_State as the key for aggregating.

setkey(pharm.data,Recipient_State )

# Aggregate means of each state
pharm.data.mean <- as.data.frame(pharm.data[, mean(Total_Amount_of_Payment_USDollars, na.rm = TRUE),by = Recipient_State])

# Aggregate totals of each state
pharm.data.total <- as.data.frame(pharm.data[, sum(Total_Amount_of_Payment_USDollars, na.rm = TRUE),by = Recipient_State])

# Aggregate number of doctors in each state
# Notice na.omit() in length() instead of na.rm=TRUE
pharm.data.docs <- as.data.frame(pharm.data[, length(na.omit(Physician_Last_Name)),by = Recipient_State])

To get these aggregates we tell R to make a data frame out of pharm.data that consists of either the mean, sum, or length of the column choice variable. We tell R to group each of these by the recipient state. Now that we have all the data formatted we have to ask another question. Did we time travel to some wacky future where America has gained eight new states?? I don’t think so, but why do we have 59 observations? This dataset also includes some US army bases and territories. The only way I know how to remove these is to do them individually, but if someone knows an easier way please leave a comment!

pharm.data.mean<-pharm.data.mean[-1,]
pharm.data.mean<-pharm.data.mean[-1,]
pharm.data.mean<-pharm.data.mean[-1,]
pharm.data.mean<-pharm.data.mean[-3,]
pharm.data.mean<-pharm.data.mean[-12,]
pharm.data.mean<-pharm.data.mean[-41,]
pharm.data.mean<-pharm.data.mean[-38,]
pharm.data.mean<-pharm.data.mean[-47,]

pharm.data.total<-pharm.data.total[-1,]
pharm.data.total<-pharm.data.total[-1,]
pharm.data.total<-pharm.data.total[-1,]
pharm.data.total<-pharm.data.total[-3,]
pharm.data.total<-pharm.data.total[-12,]
pharm.data.total<-pharm.data.total[-41,]
pharm.data.total<-pharm.data.total[-38,]
pharm.data.total<-pharm.data.total[-47,]

pharm.data.docs<-pharm.data.docs[-1,]
pharm.data.docs<-pharm.data.docs[-1,]
pharm.data.docs<-pharm.data.docs[-1,]
pharm.data.docs<-pharm.data.docs[-3,]
pharm.data.docs<-pharm.data.docs[-12,]
pharm.data.docs<-pharm.data.docs[-41,]
pharm.data.docs<-pharm.data.docs[-38,]
pharm.data.docs<-pharm.data.docs[-47,]


colnames(pharm.data.mean)<- c("state","value")
colnames(pharm.data.total)<- c("state","value")
colnames(pharm.data.docs)<- c("state","length")

Notice at the end I slipped in a column name change. This is just to make the step of plotting a little easier for me and is not necessary.

Creating Visualization

We have data munged and paid our data dues, lets make some pretty graphs. We’ll use the statebins_continuous() function for our mean and total payments graphs. This function’s ability to go from simple to complex is noted by our ability to attach additional ggplot2 arguments to the function. This means that you can use the base function, but if you want to customize something in particular its very doable.

State.Payment.mean <- statebins_continuous(pharm.data.mean, "state", "value",legend_title="Mean of Money Transferred From Pharma companies to Doctors By State", font_size=3,brewer_pal="PuRd", text_color="black",plot_title="Mean Transfers of money from Pharmaceutical Companies to Doctors in each state", legend_position="bottom",title_position="top")+ guides(fill = guide_colorbar(barwidth = 10, barheight = 1))

State.Payment.mean
mean_tran_10414.png

First we specify our dataset, pharm.data.mean, then tell the function the name of the column of states (which can be an abbreviation like our example or full names) and name of the value to place in the heat map. Note, both of these columns are called by placing the column name as a string. brewer_pal sets the color gradient for each state. These are palettes from the RColorBrewer package. This guide from supstat.com gives details on what other pallets are possible. We increase the size of the guide through the ggplot2 function guides(). barwidth and barheight specify the size of the bar in the guide.

State.Payment.total <- statebins_continuous(pharm.data.total, "state", "value",legend_title="Total of Money Transferred From Pharma companies to Doctors By State", font_size=3, brewer_pal="Greens", text_color="black",  plot_title="Total Transfers of money from Pharmaceutical Companies to Doctors in each state", legend_position="bottom", title_position="top")+ guides(fill = guide_colorbar(barwidth = 10, barheight = 1))

State.Payment.total
total_tran_10414.png

For the number of doctors who receive payments from pharma companies per state we’ll use the statebins() function. The only changes to this function are where we specify the number of breaks breaks=6 and the labels for each break. labels=c("1", "2", "3", "4","5","6").

State.docs <- statebins(pharm.data.docs, "state", "length", breaks=6, labels=c("1", "2", "3", "4","5","6"), legend_title="Rank of states by number of doctors who receive payments from Pharma companies", font_size=3, brewer_pal="PuBu", text_color="black", plot_title="Number of doctors who receive transfers of money from Pharmaceutical Companies to Doctors in each state", title_position="top", legend_position="bottom")

State.docs
total_doc10414.png

Conclusion

The purpose of this post is to explain how to graph topological data with the statebins package. To do this we play with General Payment Data for non-research/ownership payments to physicians and teaching hospitals. This data was recently released and, in short, contains the data for “gifts” from pharma companies to doctors. Population needs to be accounted for as our graph of number of doctors receiving payments is skewed towards states with larger populations. Note that the max mean transfer of payments (800 dollars) is settled around California, Vermont, Colorado. I would love to hear if someone has a reasonable explenation for why these states in particular have the highest average payments.

I hope you enjoyed this post! My friend Andrew Bryk just recently wrote a tutorial for Shiny that we recently converted to Rmarkdown. We hope to have it on the site within the next week so stay posted and have a good time making pretty graphs!!