Data Science for Economists.


Project: US Criminal Recidivism

The Federal Bureau of Prisons (BoP) states on its website that "It is the mission of the Federal Bureau of Prisons to ... provide work and other self-improvement opportunities to assist offenders in becoming law-abiding citizens". This final part relates to recidivism, defined as "the tendency of a convicted criminal to reoffend" and, according to the National Institute of Justice, measured by rearrest, reconviction or return to prison within 3 years of release. Criminal recidivism is therefore a rough measure of how well prisons rehabilitate offenders.

Aims: To determine whether State-wide factors influence criminal recidivism.

Chart 1:

My first chart is a map on which different fields can be selected for display.

Chart 2:

My second chart plots each state's recidivism rate against the spending per prisoner, with the bubbles indicating the prisoners per 100,000 population.

There does not appear to be a strong correlation, suggesting that the amount of spending per prisoner alone does not directly impact recidivism.
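To put a number on "no strong correlation", one option is the Pearson correlation coefficient between spending per prisoner and the recidivism rate. The sketch below is only an illustration: the function name `pearson_r` and both lists are made up for this example and are not the project's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT the figures behind the chart.
spending_per_prisoner = [30_000, 45_000, 60_000, 75_000, 90_000]
recidivism_rate = [35.0, 42.0, 30.0, 38.0, 33.0]

r = pearson_r(spending_per_prisoner, recidivism_rate)
print(f"Pearson r = {r:.2f}")
```

A value of r close to 0 would be consistent with the scatter seen in the chart, while values near -1 or 1 would indicate a strong linear relationship.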

Chart 3:

For my third chart, I wanted to regress the recidivism rate on some of my variables.

No strong relationships were observed.

Chart 4:

For my fourth chart I wanted to see if these variables could accurately predict the Recidivism Rate using the sklearn library. The Google Colab Notebook can be accessed here.

The model had little success in predicting the Recidivism Rate from the data, with a calculated mean squared error of 285. This suggests that these State-wide factors, at least as measured here, have little power to explain criminal recidivism.
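A mean squared error is hard to judge in isolation. If the rate is measured in percentage points, an MSE of 285 corresponds to a typical error of about 17 points (the square root of 285). A useful reference point is the MSE of a naive baseline that always predicts the mean rate. The sketch below shows that comparison; the helper names and the numbers are hypothetical and are not taken from the Colab notebook.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def baseline_mse(actual):
    """MSE of a naive model that always predicts the mean of `actual`."""
    mean = sum(actual) / len(actual)
    return mse(actual, [mean] * len(actual))


def r_squared(actual, predicted):
    """1 - model MSE / baseline MSE; at or below 0 means no better than the mean."""
    return 1 - mse(actual, predicted) / baseline_mse(actual)


# Made-up held-out rates and model predictions, for illustration only.
actual = [30.0, 40.0, 50.0, 60.0]
predicted = [50.0, 35.0, 60.0, 40.0]

model_error = mse(actual, predicted)
naive_error = baseline_mse(actual)
score = r_squared(actual, predicted)
print(f"model MSE = {model_error}, baseline MSE = {naive_error}, R^2 = {score:.2f}")
```

If the model's MSE is not clearly below the baseline MSE, the features add little beyond simply guessing the average rate.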

Chart 5:

For my final chart, I wanted to look at recidivism over longer periods of time, to determine whether the standard 3-year reported recidivism gives a good insight into overall recidivism.

After 3 years the cumulative recidivism rate across all age groups was 68.4%, but by the 9-year mark it had reached 83.3%. This suggests that State governments relying on the standard 3-year measure underestimate the true scale of recidivism, and that current rehabilitation strategies are mostly ineffective.
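The gap between the two figures can be worked through directly. The short calculation below uses only the two rates quoted above; the variable names are illustrative.

```python
rate_3yr = 68.4  # cumulative recidivism rate after 3 years, in percent
rate_9yr = 83.3  # cumulative recidivism rate after 9 years, in percent

# Percentage points of recidivism that a 3-year window does not capture.
missed_points = rate_9yr - rate_3yr

# Share of everyone who recidivated within 9 years who had already done so by year 3.
share_captured = rate_3yr / rate_9yr

print(f"missed by the 3-year window: {missed_points:.1f} percentage points")
print(f"share of 9-year recidivism seen by year 3: {share_captured:.1%}")
```

So the 3-year measure misses about 14.9 percentage points, while still capturing roughly 82% of the recidivism observed over 9 years.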

Conclusions:

It seems that State recidivism cannot be accurately predicted from State-wide factors such as those tested, at least using the rates reported in the source used. However, as my final chart shows, States appear to be substantially underestimating recidivism rates, and more accurate figures could potentially allow more meaningful statistical analysis. State-level data should therefore be collected over longer time periods where possible, both to assess current rates accurately and to analyse their causes. More information about the individual circumstances of prisoners who reoffend is also likely to prove key in addressing recidivism.

Data Gathering:

The dataset for most of my charts was built by merging data from the range of sources listed below. The data for State-by-State criminal recidivism was published in 2018, so for each data source I have found data from 2018 or as close to it as possible. This also avoids any effects of COVID-19 interfering with the analysis.
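Merging per-source tables on the State name can be sketched as follows. This is a minimal standard-library illustration with made-up values, not the code used for the project; in a notebook, a pandas merge on the State column would be the more usual tool. The function name `merge_by_state` and the column names are hypothetical.

```python
def merge_by_state(*tables):
    """Inner-join tables of the form {state: {column: value}} on the state key.

    Returns the merged table and a sorted list of states that were dropped
    because at least one source had no row for them.
    """
    if not tables:
        return {}, []
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    merged = {}
    for state in sorted(common):
        row = {}
        for table in tables:
            row.update(table[state])
        merged[state] = row
    all_states = set().union(*tables)
    return merged, sorted(all_states - common)


# Made-up values for illustration only.
recidivism = {
    "Virginia": {"recidivism_rate": 25.0},
    "Alaska": {"recidivism_rate": 60.0},
}
spending = {
    "Virginia": {"spend_per_prisoner": 30_000},
    "Alaska": {"spend_per_prisoner": 50_000},
    "Ohio": {"spend_per_prisoner": 28_000},
}

merged, dropped = merge_by_state(recidivism, spending)
print(merged)
print("dropped:", dropped)
```

Reporting the dropped States makes it easier to spot name mismatches between sources (for example, differing abbreviations) before they silently shrink the dataset.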

Sources:

Recidivism Rate: VADOC (Virginia Department of Corrections) State Recidivism Comparison report, 2018, available here.

State Tax Revenue: Scraped from here.

Unemployment Rate: Scraped (same Colab notebook as above) from here.

Educational data: Scraped from Wikipedia.

Average Income: Scraped (same Colab notebook as above) from Wikipedia.

Corrections Expenditure: BJS Justice Expenditures and Employment in the United States, 2017, available here.

Prison Population: BJS Prisoners in 2018, available here.

Gini Coefficients: SSTI.

Prison Operational Capacity: State Prison Overcrowding and Capacity Data (University of Nebraska Omaha), available here.

State Population: US Census Bureau, available here.

The data for my final chart can be found here. I downloaded it as a CSV file and imported it into a Colab notebook in order to convert it to long format.
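The wide-to-long conversion can be sketched as below, going from one column per follow-up period to one row per (age group, period) pair. The column names and values are assumptions made up for this example and do not reflect the actual file; in pandas, `DataFrame.melt` does the same job.

```python
import csv
import io

# Made-up wide-format data: one column per follow-up period.
WIDE_CSV = """age_group,year_3,year_6,year_9
24 or younger,70.0,80.0,85.0
25-39,68.0,78.0,83.0
"""


def wide_to_long(rows, id_col, var_name, value_name):
    """Turn each non-id column of each row into its own (id, variable, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_col:
                continue
            long_rows.append(
                {id_col: row[id_col], var_name: column, value_name: float(value)}
            )
    return long_rows


rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
long_rows = wide_to_long(rows, "age_group", "period", "cumulative_rate")
for long_row in long_rows:
    print(long_row)
```

Long format is what grouped-bar encodings generally expect, since each bar then corresponds to exactly one row.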

Challenges in Data Cleaning and/or Analysis:

For State Tax Revenue data I had to research how to scrape a website which actively prevented web-scraping attempts.

In the VADOC report, certain States were excluded from the Recidivism Rate comparison because of differences in reporting standards: "The above states were excluded in the comparison for the following reasons: California's rate excludes parole violations; Georgia only reports re-conviction information; Hawaii only reports re-arrest information; Oregon's rate is based on a six-month release cohort and includes releases from prison and felons released from jails; Tennessee's rate combines re-arrest, re-conviction, and re-incarceration; Texas' rate includes both felons and misdemeanants; Utah's rate only includes releases to parole". I still wanted to include these States in my analysis, so I found data for each from their respective State Corrections departments.

I had to create a separate HTML page for my final chart and embed it into this page. I used the "xOffset" encoding to group the bars by year, which only works with Vega-Lite 5.2.0 or later, and when I tried to change the library version for the entire page my other charts broke.
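For reference, a grouped bar chart using "xOffset" looks roughly like the specification below, built here as a Python dictionary and serialised to JSON. This is a sketch rather than the chart's actual spec: the field names, the data values, the choice of which field goes on "x" versus "xOffset", and the pinned schema URL are all assumptions.

```python
import json

# Hypothetical long-format rows; values are made up for illustration.
rows = [
    {"year": "3", "age_group": "24 or younger", "cumulative_rate": 70.0},
    {"year": "3", "age_group": "25-39", "cumulative_rate": 68.0},
    {"year": "9", "age_group": "24 or younger", "cumulative_rate": 85.0},
    {"year": "9", "age_group": "25-39", "cumulative_rate": 83.0},
]

spec = {
    # Assumed schema URL, pinning a Vega-Lite version that supports xOffset.
    "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        # One cluster of bars per follow-up year...
        "x": {"field": "year", "type": "nominal"},
        # ...with the bars inside each cluster offset by age group.
        "xOffset": {"field": "age_group"},
        "y": {"field": "cumulative_rate", "type": "quantitative"},
        "color": {"field": "age_group", "type": "nominal"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Keeping such a spec on its own page means the newer Vega-Lite version only has to be loaded there, which matches the embedding workaround described above.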