Does Social Isolation Affect Every Area Equally in the US?
Comparing Urban, Suburban, and Rural areas with some AI assistance.
Author: David Wang | https://www.linkedin.com/in/davidwang001/
Dashboard Link: https://app.hyperarc.com/?isEmbed=true&embedId=8a5ea5fa-9394-4b89-b304-c56d938fbe95#/davidblog/cdc-isolation-blog/dashboard/cdc-isolation-dashboard
It seems like every other week, I stumble across a headline about social isolation and how young people prefer spending time with their cell phones rather than with each other.
I don’t doubt that social isolation is a widespread problem, but I wonder whether it affects people differently across geographic areas (i.e., urban vs. suburban vs. rural). Before pulling any data, my hypothesis (based partly on my own experience) was that urban areas would be less impacted, given the reduced friction of making social contact.
To test the hypothesis, I looked at the CDC website, where I found map data on social isolation.
You can follow the first link to enter a map view of the US with adjusted social isolation (zoom into where you live to see how your county stacks up!). https://www.arcgis.com/home/item.html?id=41179c913a404456bc5d16f419f66d28
Use this second link to access the CSV / API endpoint to download the dataset. https://data.cdc.gov/500-Cities-Places/PLACES-County-Data-GIS-Friendly-Format-2024-releas/i46a-9kgh/about_data
I downloaded the CSV into Google Sheets since the dataset was quite manageable. I cleaned up the columns and kept only the most interesting ones for analysis. Looking at the data, I was surprised by the number of physical and mental health-related columns in addition to social isolation, so I thought it would be interesting to look for correlations among some of those measures in addition to testing the original hypothesis.
Here is the list of columns I chose to keep and why:
StateAbbr, StateDesc, and County for general info on each row
TotalPopulation, TotalPop18Plus as the main dimensions for grouping the health measures
Depression, GHLTH, LPA, MHLTH, Obesity, Isolation, and EmotionSPT were the health measures. I took crude and adjusted measures but used adjusted for my analysis. I wondered if poorer mental or physical health would lead to more isolation since lower mobility and mental well-being could easily prevent people from meeting others.
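For anyone who prefers a script to manual column deletion, here is a minimal sketch of the same trimming step using only Python's standard library. The column names follow the list above and are assumptions about the export's exact headers, and the sample text is made up purely to keep the snippet runnable.

```python
import csv
import io

# Columns to keep. Names follow the list above and may not match the
# CDC export's headers exactly (assumption).
KEEP = ["StateAbbr", "County", "TotalPopulation", "Isolation", "Depression"]


def keep_columns(csv_text, keep=KEEP):
    """Return the rows as dicts containing only the columns in `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    missing = [c for c in keep if c not in headers]
    if missing:
        raise KeyError(f"columns not found in file: {missing}")
    return [{c: row[c] for c in keep} for row in reader]


# Tiny made-up sample standing in for the downloaded file.
sample = (
    "StateAbbr,County,TotalPopulation,Isolation,Depression,Unused\n"
    "XX,Alpha,1000,30.0,20.0,a\n"
    "XX,Beta,2000,35.0,22.0,b\n"
)
rows = keep_columns(sample)
print(rows[0])
```

Failing loudly on a missing column is a deliberate choice here: a misspelled header is easier to catch at load time than after the analysis.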
Oops... forgot something important...
The population counts are absolute numbers and can’t by themselves define urban, suburban, or rural categories. I found county categorizations on Jed Kolko’s website and joined them to my table, matching counties in my table with the classifications in his dataset. Below are the classifications (I added the ‘Class’ column to be more descriptive):
Ok! We finally have our final table, and we are ready to analyze. Plugging it into HyperArc was easy:
Uploaded Dataset
Initial Explorations:
Open a new query and plug in the desired fields/groupings. I decided to focus on Social Isolation and Depression, as they were the most relevant measures of isolation.
I found statistically significant differences between the county classifications, and it became clear that my original hypothesis was wrong. Surprisingly, urban counties have the most social isolation; the metric is lowest in the suburbs and rises again in lower-density areas.
If I had to guess, social isolation is high in cities because there are more career-focused single people who place less priority on close social relationships. In the suburbs, I would guess it is lowest because families tend to live there while still living near others, which gives a nice balance of friends and family. As the distance between people increases, the isolation metric rises again because it takes more effort to get together.
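The grouping behind this comparison can be sketched outside HyperArc as a plain average per county class. This is only an illustration of the idea, not how HyperArc computes it; the class labels "A" and "B" and the values are placeholders, not figures from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_by_class(rows, class_key="Class", value_key="Isolation"):
    """Average `value_key` within each distinct value of `class_key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_key]].append(float(row[value_key]))
    return {cls: mean(vals) for cls, vals in groups.items()}


# Placeholder rows; labels and numbers are invented for illustration.
rows = [
    {"Class": "A", "Isolation": 30.0},
    {"Class": "A", "Isolation": 34.0},
    {"Class": "B", "Isolation": 28.0},
]
averages = mean_by_class(rows)
for cls, avg in sorted(averages.items()):
    print(cls, avg)
```

Note that this weights every county equally; weighting by population would be a different (and arguably also reasonable) choice.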
I wanted to see if any other interesting trends existed, so I asked HyperArc’s AI for help.
Ok - the New Mexico social isolation average is way higher than the next one. That’s worth exploring, for sure. Let’s ask the AI a more direct question.
Makes sense! Older folks tend to be lower energy, less mobile, and have fewer friends. =( More rural areas also mean more distance to travel to socialize, adding another point of friction. This hypothesis would have taken me a while to come up with on my own, but with the AI it was much faster.
You can access the public dashboard here (Direct Link: https://app.hyperarc.com/?isEmbed=true&embedId=8a5ea5fa-9394-4b89-b304-c56d938fbe95#/davidblog/cdc-isolation-blog/dashboard/cdc-isolation-dashboard) and play around with the dataset yourself. How do isolation and depression in your state and county compare to others?
Hope you found this analysis interesting!
Appendix
Methodology for Pulling Data
CDC Data - downloaded as a CSV, then imported into Google Sheets. https://data.cdc.gov/500-Cities-Places/PLACES-County-Data-GIS-Friendly-Format-2024-releas/i46a-9kgh/about_data
Manually deleted all the columns I wasn’t going to use.
Jed Kolko’s County Classification Data - downloaded as a CSV and added to the same Sheets file. https://jedkolko.com/datasets/
Used a Jupyter notebook with Python / pandas to merge the two datasets by county name. Created a new column in Sheets with a descriptive name for each county classification code.
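A minimal sketch of the join step, written with the standard library so it runs without pandas. It differs from the merge described above in one respect, which is my assumption rather than the original method: it matches on state plus county name, because many county names (Washington County, for example) repeat across states. All column names and class labels here are hypothetical.

```python
def normalize(name):
    """Lowercase and trim a county name so small formatting differences still match."""
    return name.strip().lower()


def join_classification(health_rows, class_rows):
    """Attach a class label to each health row, keyed by (state, county).

    Returns the matched rows plus the keys that found no classification.
    """
    lookup = {
        (r["StateAbbr"], normalize(r["County"])): r["Class"]
        for r in class_rows
    }
    joined, unmatched = [], []
    for row in health_rows:
        key = (row["StateAbbr"], normalize(row["County"]))
        if key in lookup:
            joined.append({**row, "Class": lookup[key]})
        else:
            unmatched.append(key)
    return joined, unmatched


# Made-up rows: the same county name appears in two different states.
health = [
    {"StateAbbr": "AA", "County": "Washington", "Isolation": "30.0"},
    {"StateAbbr": "BB", "County": "Washington ", "Isolation": "33.0"},
    {"StateAbbr": "BB", "County": "Nowhere", "Isolation": "31.0"},
]
classes = [
    {"StateAbbr": "AA", "County": "washington", "Class": "class-1"},
    {"StateAbbr": "BB", "County": "Washington", "Class": "class-2"},
]
joined, unmatched = join_classification(health, classes)
print(len(joined), "matched;", unmatched, "unmatched")
```

In pandas, the equivalent would be `pandas.merge` with `on=` set to both key columns. Checking the unmatched list is worthwhile either way, since silently dropped counties would skew the class averages.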
Calculating Statistical Significance of County Classification
I ran a one-way ANOVA across the county classifications. The result is statistically significant, which indicates that mean isolation differs for at least one county class; ANOVA alone does not show that every pair of classes differs.
F-statistic: 11.550692300229647
P-value: 4.8413173711350395e-11
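For readers curious about what the F-statistic measures, here is a small standard-library sketch of the one-way ANOVA calculation: variance between group means divided by variance within groups. The three groups are toy numbers, not the county data, and the sketch stops at F because the p-value needs the F distribution, which the standard library does not provide. `scipy.stats.f_oneway` returns both values.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Variation of observations around their own group mean.
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_within += sum((v - m) ** 2 for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Toy groups standing in for isolation values per county class.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(toy_groups)
print(f"F = {f_stat:.2f}")
```

A large F says the classes are not all alike on average. Identifying which specific classes differ would take a follow-up pairwise comparison such as Tukey's HSD.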
Correlation of Other Health Factors
I also computed Pearson correlation coefficients for the Isolation measure versus the other health measures. Notably, there was a moderate to strong correlation with lack of social and emotional support, poor general health, poor mental health, depression, and even obesity. I believe this makes sense, given that interacting with others gets us moving more and satisfies a biological need to socialize. Keep in mind that these are county-level correlations and do not establish causation.
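As a reference for what the coefficient captures, here is a short standard-library sketch of Pearson's r: the covariance of two series divided by the product of their spreads. The inputs are toy lists chosen so the answer is obvious, not values from the CDC data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Toy inputs: one perfectly increasing pair, one perfectly decreasing pair.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3], [3, 2, 1])
print(round(r_up, 6), round(r_down, 6))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship. On Python 3.10 or later, `statistics.correlation` computes the same quantity.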
Future Explorations
Having more metadata for the CDC data would have been nice - I found more information via ChatGPT, but it would have been better to have it directly associated with the dataset. For example: How are the different health measures calculated? What is the difference between crude and adjusted?
An interesting follow-up is to find a dataset for social isolation paired with demographic information to conduct a similar analysis.