Here are the ten counties in New York State with the highest percentage of Hispanic or Latino residents, according to the 2020 U.S. Census.
| County | Percent Hispanic (%) |
|---|---|
| Bronx | 54.76 |
| Queens | 27.76 |
| Westchester | 26.81 |
| New York | 23.77 |
| Orange | 22.36 |
| Suffolk | 21.82 |
| Rockland | 19.64 |
| Richmond | 19.56 |
| Kings | 18.87 |
| Nassau | 18.37 |
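Each percentage in the table is a county's Hispanic or Latino population (field P0020002, table P2) divided by its total population (field P0020001), multiplied by 100. This is the same calculation the script performs:

$$
\text{Percent Hispanic} = \frac{\text{P0020002}}{\text{P0020001}} \times 100
$$

So the Bronx value of 54.76 means that roughly 55 of every 100 Bronx residents were counted as Hispanic or Latino.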
Here is how you can create this list using pandas. You will need the 2020 P.L. 94-171 Redistricting Data files for New York and the Legacy File Format Header Records (the script reads 2020_PLSummaryFile_FieldNames.xlsx from them). Expand the ZIP files and place their contents in the directory assigned to path in the script below.
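Before running the script, it can help to confirm that every file it expects is in place. The sketch below is a hypothetical helper and is not part of the original script. It assumes the file names follow the pattern given in the script's comments: a lowercase two-letter state code, the geoheader xxgeo2020.pl, and the segment files xx000012020.pl through xx000032020.pl.

```python
from pathlib import Path


# Hypothetical helper (not part of the original script): build the list of
# file names the script expects for one state, following the naming pattern
# described in the script's comments.
def expected_files(state: str) -> list[str]:
    state = state.lower()
    names = ['2020_PLSummaryFile_FieldNames.xlsx', f'{state}geo2020.pl']
    # Segment files 1 through 3: xx000012020.pl, xx000022020.pl, xx000032020.pl
    names += [f'{state}{seg:05d}2020.pl' for seg in range(1, 4)]
    return names


def missing_files(directory: str, state: str) -> list[str]:
    """Return the expected file names that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in expected_files(state) if not (folder / name).is_file()]


if __name__ == '__main__':
    # Directory from the original script; change it to your own location.
    missing = missing_files('/home/andy/Desktop/2020pl-94-171/', 'ny')
    print('Missing files:', missing if missing else 'none')
```

If the list is empty, the script's read_excel and read_csv calls should find all of their inputs.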
```python
import pandas as pd

# Directory containing 2020_PLSummaryFile_FieldNames.xlsx and the state files
# xxgeo2020.pl and xx000012020.pl through xx000032020.pl
# (xx = two-letter state code, lowercase)
path = '/home/andy/Desktop/2020pl-94-171/'
# state code
state = 'ny'

# Header file: read every tab into a dictionary of DataFrames keyed by sheet name
field_names = pd.read_excel(path + '2020_PLSummaryFile_FieldNames.xlsx', sheet_name=None)

# Load the geoheader. Read every field as str because some fields have mixed
# types; this also keeps leading zeros in codes such as SUMLEV ('050') and
# GEOID, and avoids mixed-type warnings when pandas parses the file in chunks.
gh = pd.read_csv(path + state + 'geo2020.pl', delimiter='|',
                 header=None,
                 names=field_names['2020 P.L. Geoheader Fields'].columns,
                 index_col='LOGRECNO',
                 dtype=str)
# Convert the join key to int so it matches the integer LOGRECNO index
# that pandas infers when reading the segment file below
gh.index = gh.index.astype(int)

# Load segment 1 of the 2020 P.L. 94-171 data, which holds the race and
# Hispanic-origin tables (P1 and P2)
segNum = 1
seg = pd.read_csv(path + state + '0000' + str(segNum) + '2020.pl', delimiter='|',
                  header=None,
                  names=field_names['2020 P.L. Segment ' + str(segNum) + ' Fields'].columns,
                  index_col='LOGRECNO')
# Discard FILEID, STUSAB, CHARITER, CIFSN: they also appear in the geoheader,
# and join() would fail on the overlapping column names
seg = seg.iloc[:, 4:]
# Join the segment to the geoheader on LOGRECNO
seg = gh.join(seg)

# Keep only counties: summary level (SUMLEV) '050' (see the Census documentation)
ql = "SUMLEV == '050'"
counties = seg.query(ql)

# Create a DataFrame with the county name and percent Hispanic.
# Field names are listed in 2020_PLSummaryFile_FieldNames.xlsx under the
# "2020 P.L. Segment 1 Definitions" tab:
# P0020001 = total population, P0020002 = Hispanic or Latino
his = pd.DataFrame({'County': counties['BASENAME'],
                    'Percent Hispanic': counties['P0020002'] / counties['P0020001'] * 100})

# Sort, print the ten most Hispanic counties, and save them to CSV
top10 = his.sort_values(by='Percent Hispanic', ascending=False).head(10)
print(top10.to_string(index=False))
top10.to_csv('/tmp/hispanics.csv', index=False)
```
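To try the join, filter, and percentage steps without downloading the Census files, here is a minimal sketch that runs the same logic on a few made-up rows. The sample rows, the county names Alpha, Beta and Gamma, and the function name percent_hispanic are hypothetical and are not Census data. It also assumes a trimmed column layout (only FILEID, STUSAB, SUMLEV, LOGRECNO and BASENAME in the geoheader, and only P0020001 and P0020002 after the segment's leading fields); the real files have many more columns.

```python
import io

import pandas as pd

# Made-up rows (NOT real Census data) in the same pipe-delimited layout.
GEO = """PLST|NY|040|0000001|New York
PLST|NY|050|0000002|Alpha
PLST|NY|050|0000003|Beta
PLST|NY|050|0000004|Gamma
"""
SEG = """PLST|NY|000|01|0000001|1000|300
PLST|NY|000|01|0000002|200|20
PLST|NY|000|01|0000003|500|250
PLST|NY|000|01|0000004|300|90
"""
# Trimmed column lists (assumption; the real field lists come from the xlsx)
GEO_COLS = ['FILEID', 'STUSAB', 'SUMLEV', 'LOGRECNO', 'BASENAME']
SEG_COLS = ['FILEID', 'STUSAB', 'CHARITER', 'CIFSN', 'LOGRECNO', 'P0020001', 'P0020002']


def percent_hispanic(geo_text: str, seg_text: str, top: int = 10) -> pd.DataFrame:
    # Read the geoheader as str so SUMLEV keeps its leading zero ('050', not 50)
    gh = pd.read_csv(io.StringIO(geo_text), sep='|', header=None, names=GEO_COLS,
                     index_col='LOGRECNO', dtype=str)
    gh.index = gh.index.astype(int)  # match the integer index parsed below
    seg = pd.read_csv(io.StringIO(seg_text), sep='|', header=None, names=SEG_COLS,
                      index_col='LOGRECNO')
    # Drop FILEID, STUSAB, CHARITER, CIFSN, then join on LOGRECNO
    seg = gh.join(seg.iloc[:, 4:])
    # Keep county rows only (the state row has SUMLEV '040')
    counties = seg.query("SUMLEV == '050'")
    his = pd.DataFrame({'County': counties['BASENAME'],
                        'Percent Hispanic': counties['P0020002'] / counties['P0020001'] * 100})
    return his.sort_values(by='Percent Hispanic', ascending=False).head(top)


print(percent_hispanic(GEO, SEG).to_string(index=False))
```

With these made-up rows the state-level row is filtered out, and Beta (50%) ranks first, followed by Gamma (30%) and Alpha (10%). Without dtype=str, pandas would parse SUMLEV as the integer 50 and the query for '050' would match nothing.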