Bulk Download NYS Election District Enrollments

For some of my projects, I need access to the full set of enrollments from the State Board of Elections. While you can download them one at a time, it sure is nice to have all of them for statewide coverage to easy processing. You can get them using Beautiful Soup.

import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

yr = '12'
mon = 'nov'
ext = '.pdf' #.xlsx

url = "https://www.elections.ny.gov/20"+yr+"EnrollmentED.html"

#If there is no such folder, the script will create one automatically
folder_location = r''+mon+yr+'-Enrollment'
if not os.path.exists(folder_location):os.mkdir(folder_location)

response = requests.get(url, { 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103197 Safari/537.36' })
soup= BeautifulSoup(response.text, "html.parser")     
for link in soup.select("a[href$='"+mon+yr+ext+"']"):
    #Name the pdf files using the last portion of each link which are unique in this case
    filename = os.path.join(folder_location,link['href'].split('/')[-1])
    with open(filename, 'wb') as f:
        f.write(requests.get(urljoin(url,link['href'])).content)

What can you do with them? If you are looking at the numbers from 2018 and later, you can load them directly in PANDAS.

import glob as g
import pandas as pd

df = pd.concat((pd.read_excel(f, header=4) for f in g.glob('/home/andy/2021-Enrollment/*xlsx')))

df = df[(df['STATUS']=='Active')]

df.insert(1,'Municipality',df['ELECTION DIST'].str[:-7].str.title())
df.insert(2,'Ward',df['ELECTION DIST'].str[-6:-3])
df.insert(3,'ED', df['ELECTION DIST'].str[-3:])

df=df.dropna()
df=df.sort_values(by=['COUNTY','ELECTION DIST'])

df

	COUNTY	Municipality	Ward	ED	ELECTION DIST	STATUS	DEM	REP	CON	WOR	OTH	BLANK	TOTAL
1	Albany	Albany	001	001	ALBANY 001001	Active	74.0	9.0	0.0	0.0	0.0	16.0	99.0
5	Albany	Albany	001	002	ALBANY 001002	Active	311.0	16.0	2.0	1.0	10.0	47.0	387.0
9	Albany	Albany	001	003	ALBANY 001003	Active	472.0	26.0	5.0	1.0	27.0	121.0	652.0
13	Albany	Albany	001	004	ALBANY 001004	Active	437.0	30.0	2.0	3.0	12.0	92.0	576.0
17	Albany	Albany	001	005	ALBANY 001005	Active	13.0	0.0	0.0	0.0	0.0	0.0	13.0
…	…	…	…	…	…	…	…	…	…	…	…	…	…
53	Yates	Milo	000	006	Milo 000006	Active	204.0	409.0	13.0	3.0	50.0	182.0	861.0
57	Yates	Potter	000	001	Potter 000001	Active	144.0	460.0	25.0	1.0	51.0	226.0	907.0
61	Yates	Starkey	000	001	Starkey 000001	Active	187.0	370.0	15.0	5.0	57.0	209.0	843.0
65	Yates	Starkey	000	002	Starkey 000002	Active	189.0	433.0	18.0	4.0	46.0	201.0	891.0
69	Yates	Torrey	000	001	Torrey 000001	Active	185.0	313.0	9.0	3.0	39.0	159.0	708.0

And then do all kinds of typical PANDAS processing. For example to categorize the counties into three pots — based on a comparison of the number of Democrats to Republicans, you could run this command:

tdf=df.groupby(by=['COUNTY']).sum()
pd.cut((tdf.div(tdf.iloc[:,:-1].sum(axis=1),axis=0)*100).sort_values(by='BLANK',ascending=False)['DEM'],3,labels=['Rep','Mix','Dem']).sort_values()

COUNTY
Rensselaer      Rep
Genesee         Rep
Oswego          Rep
Livingston      Rep
Schuyler        Rep
Orleans         Rep
Seneca          Rep
Otsego          Rep
Warren          Rep
St.Lawrence     Rep
Cattaraugus     Rep
Chemung         Rep
Cayuga          Rep
Tioga           Rep
Yates           Rep
Broome          Rep
Franklin        Rep
Oneida          Rep
Fulton          Rep
Allegany        Rep
Essex           Rep
Niagara         Rep
Steuben         Rep
Herkimer        Rep
Lewis           Rep
Delaware        Rep
Wyoming         Rep
Chenango        Rep
Hamilton        Rep
Greene          Rep
Sullivan        Rep
Wayne           Rep
Jefferson       Rep
Ontario         Rep
Chautauqua      Rep
Putnam          Rep
Madison         Rep
Washington      Rep
Schoharie       Rep
Dutchess        Rep
Montgomery      Rep
Clinton         Rep
Orange          Rep
Cortland        Rep
Suffolk         Rep
Onondaga        Rep
Saratoga        Rep
Erie            Mix
Schenectady     Mix
Ulster          Mix
Rockland        Mix
Columbia        Mix
Monroe          Mix
Westchester     Mix
Richmond        Mix
Albany          Mix
Tompkins        Mix
Nassau          Mix
Queens          Dem
Bronx           Dem
New York        Dem
Kings           Dem
Name: DEM, dtype: category
Categories (3, object): ['Rep' < 'Mix' < 'Dem']

If you are using the pre-2018 data, I suggest converting the PDF to text documents using the pdftotext -layout which is available on most Linux distributions as part of the poppler-utils package. This converts the PDF tables to text files, which you can process like this:

import pandas as pd
df = pd.read_csv('/home/andy/Desktop/AlbanyED_nov16.txt',
   skiprows=3,
   sep='\s{2,}',
   engine='python',
   error_bad_lines=False)
df=df[df['STATUS']=='Active']
df=df.dropna()

Bulk Download NYS Election District Enrollments

Leave a Reply Cancel reply