Is crime on the rise? My buddy’s catalytic converter was stolen last night (~$2800 to replace including the damaged O2 sensor), and it got me thinking. The current politically charged environment suggests that everything is going down the drain, so I did the responsible thing and decided to do my own analysis.
I also had an hour of free time, and it’s always fun to play with data. Let’s dig in.
Prep:
I started by searching Google for a relevant dataset. This was time-consuming and often led to cumbersome government-created pages that mixed aggregate reports and raw figures. I also had to fill in a myriad of “boxes” to get some semblance of useful data.
After 10 minutes of endless clicking, I gave up and asked ChatGPT to find me data sources that were:
Recent
Official government sourced
Detailed / raw / row level (non-aggregated)
Within a few minutes, this led to the dataset at: https://data.lacity.org/Public-Safety/Crime-Data-from-2020-to-Present/2nrs-mtv8/about_data.
We’re in business! I wrote a quick notebook to clean up the data and export it in parquet, which took 5 minutes.
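For readers who want to reproduce the cleanup step, here is a minimal sketch of what such a notebook might look like. It is not the exact notebook used here. It assumes pandas (plus a parquet engine such as pyarrow for the export), and the column names ("Date Rptd", "DATE OCC", "Crm Cd Desc") and the date format are assumptions about the CSV export that should be checked against the actual file header. The function names are hypothetical.

```python
import pandas as pd

# Assumed column names and date format for the CSV export; verify before use.
DATE_COLUMNS = ["Date Rptd", "DATE OCC"]
CATEGORY_COLUMN = "Crm Cd Desc"
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def clean_crime_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: parsed dates, normalized categories, no duplicates."""
    df = raw.copy()
    # Trim stray whitespace from the header names.
    df.columns = [str(c).strip() for c in df.columns]

    # Parse date columns; values that do not match the format become NaT.
    for col in DATE_COLUMNS:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col], format=DATE_FORMAT, errors="coerce")

    # Normalize the crime description so grouping is consistent.
    if CATEGORY_COLUMN in df.columns:
        df[CATEGORY_COLUMN] = df[CATEGORY_COLUMN].str.strip().str.upper()

    # Drop exact duplicate rows and rows without a usable occurrence date.
    df = df.drop_duplicates()
    if "DATE OCC" in df.columns:
        df = df.dropna(subset=["DATE OCC"])
    return df.reset_index(drop=True)


def export_parquet(df: pd.DataFrame, path: str) -> None:
    """Write the cleaned frame to parquet (requires pyarrow or fastparquet)."""
    df.to_parquet(path, index=False)
```

Typical usage would be to load the downloaded CSV with pd.read_csv, pass it through clean_crime_data, and then call export_parquet with an output path of your choice.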
Stolen Catalytic Converters:
Let’s start by looking at some general trends for all crime types with more than 1k occurrences:
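If you want to build a similar view yourself, the sketch below shows one way to count incidents per month for every crime type with more than 1k total occurrences. It assumes pandas, a datetime column named "DATE OCC", and a crime description column named "Crm Cd Desc"; both names and the helper function are assumptions for illustration, not necessarily what produced the chart.

```python
import pandas as pd


def monthly_counts_for_common_types(
    df: pd.DataFrame,
    min_total: int = 1000,
    date_col: str = "DATE OCC",      # assumed column name
    type_col: str = "Crm Cd Desc",   # assumed column name
) -> pd.DataFrame:
    """Monthly incident counts, one column per crime type above the threshold."""
    # Keep only crime types with more than `min_total` occurrences overall.
    totals = df[type_col].value_counts()
    common_types = totals[totals > min_total].index
    subset = df[df[type_col].isin(common_types)]

    # Bucket each incident into its calendar month.
    month = subset[date_col].dt.to_period("M").rename("month")

    # Rows are months, columns are crime types, values are incident counts.
    return subset.groupby([month, type_col]).size().unstack(fill_value=0)
```

Calling .plot() on the result (with matplotlib installed) gives one line per crime type over time.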
Interesting but difficult to interpret - which category does a stolen catalytic converter fall under? I filtered for a few potential categories and did some ad hoc analysis. We can revisit the knowledge graph generated from that exploration through HyperGraph here:
Now, we can ask HyperGraph to summarize the analysis. I asked: “What's the trend for categories involving stolen catalytic converters?”
We get back the following response associated with a few charts:
One of the charts returned:
The analysis takes into account the relevant and related queries I ran during my exploration. It seems reasonable that there’s a slight increasing trend followed by heavy seasonal variability. Most importantly, it returned relevant and citable queries. It appears that vehicle theft-related crimes (excluding GRAND THEFT AUTO) have risen slightly compared to four years ago.
Homicide:
I spent the next 5 min exploring the LA crime data specifically for “CRIMINAL HOMICIDE.” This resulted in a few interesting charts (highlighting this one for visual):
Now let’s ask it a few questions:
Q: “Tell me about the homicide rates in LA”
Q: “Tell me about the victims and locations of the homicides”
Q: “What advice do you have for when I visit LA?”
What’s interesting is that our answers extend beyond the original question. Our belief is that data analysis is often not one-dimensional - it’s seldom a single “value.” Instead, the best analysts think about orthogonal data and questions to help portray a more complete picture.
Conclusion:
As far as I can tell, crime in LA is trending mostly sideways. It's not much safer, but it’s not much worse off either. The reporting of theft-related crimes has increased compared to during the pandemic, but not by a significant amount. This fits my personal experience, too.
We spent 20 min exploring the data, and we were able to revisit and summarize the actual insights derived from the tens of charts that we quickly created. The best part of all of this? By simply exploring the crime data, we’re generating more data than anyone on “how to do data analysis.” Imagine your analysts working in your tool while it organically creates analysis that aids your business. We believe that it’s time to go beyond closed-ended questions like “how many CRIMINAL HOMICIDE incidents occurred in LA,” and empower analysts to revisit and share the context of their analysis. Reach out for a demo if you want to learn more at hyperarc.com.
Side Note:
The data looked a bit funky for the year 2024, partly because of a system update described in a notice on the site: “(t)his new system is being implemented to comply with the FBI's mandate to collect NIBRS-only data” (NIBRS - FBI: https://www.fbi.gov/how-we-can-help-you/more-fbi-services-and-information/ucr/nibrs). I truncated the data to exclude reports after March 1st, 2024.
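The truncation itself is a one-line filter. Here is a minimal sketch, assuming pandas and a datetime column; whether to cut on the occurrence date ("DATE OCC") or the reported date ("Date Rptd") is a judgment call, and both column names as well as the helper function are assumptions for illustration.

```python
import pandas as pd

# Cutoff taken from the note above: exclude anything after March 1st, 2024.
CUTOFF = pd.Timestamp("2024-03-01")


def truncate_after_cutoff(
    df: pd.DataFrame,
    date_col: str = "DATE OCC",  # assumed column; "Date Rptd" is an alternative
    cutoff: pd.Timestamp = CUTOFF,
) -> pd.DataFrame:
    """Keep only rows dated on or before the cutoff."""
    return df[df[date_col] <= cutoff].copy()
```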
I couldn’t find clean and easy-to-analyze data before 2020. That data would paint a more complete picture.
We’re looking for people to help us analyze public data! If you are interested, ping me on LinkedIn.