CS301_Proj3
Description
The City of Madison has many different agencies providing a variety of services. In this project, you’ll analyze real spending data from 2015 to 2018 for five of the largest agencies: police, fire, streets, library, and parks. You’ll get practice calling functions from a `project` module, which we’ll provide, and practice writing your own functions.

Start by downloading `project.py`, `test.py`, and `madison.csv`. Double-check that these files don’t get renamed by your browser (by running `ls` in the terminal from your `p3` project directory).

You’ll do all your work in a new `main.ipynb` notebook that you’ll create and hand in when you’re done (please do not write your functions in a separate `.py` file). You’ll test as usual by running `python test.py` (or similar, depending on your laptop setup). Before handing in, please put the project, submitter, and partner info in a comment in the first cell, in the same format you used for previous projects (please continue doing so for all projects this semester).

We won’t explain how to use the `project` module here (the code is in the `project.py` file). The lab this week is designed to teach you how it works, so be sure to do the lab from home (if you missed it) before starting the project.

This project consists of writing code to answer 20 questions. If you’re answering a particular question in a cell in your notebook, you need to put a comment in the cell so we know what you’re answering. For example, if you’re answering question 13, the first line of your cell should contain `#q13`.
Dataset
The data looks like this:
agency_id | agency | 2015 | 2016 | 2017 | 2018 |
---|---|---|---|---|---|
5 | police | 68.06346877 | 71.32575615000002 | 73.24794765999998 | 77.87553504 |
6 | fire | 49.73757877 | 51.96834048 | 53.14405332 | 55.215007260000014 |
9 | library | 16.96543425 | 18.12552139 | 19.13634773 | 19.845065799999997 |
12 | parks | 18.371421039999998 | 19.159243279999995 | 19.316837019999994 | 19.7607100000000 |
15 | streets | 25.368879940000006 | 28.2286218 | 26.655754419999994 | 27.798933740000003 |
The dataset is in the `madison.csv` file. We’ll learn about CSV files later in the semester. For now, you should know this about them:
- it’s easy to create them by exporting from Excel
- it’s easy to use them in Python programs
- we’ll give you a `project.py` module to help you extract data from CSV files until we teach you to do it directly yourself
All the numbers in the dataset are in millions of dollars. Answer questions in millions of dollars unless we specify otherwise.
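To give a feel for what a helper module of this kind might do, here is a minimal sketch that parses CSV text with Python's standard `csv` module and looks up values by agency and year. This is not the code in `project.py`: the comma-separated layout, the `load_rows` helper, and the `rows` argument passed to `get_id` and `get_spending` are all assumptions made for illustration, and only two rows of the table are copied in.

```python
import csv
import io

# Assumed file layout: a header row followed by one row per agency.
# Two rows are copied from the table above (values in millions of dollars).
CSV_TEXT = """agency_id,agency,2015,2016,2017,2018
5,police,68.06346877,71.32575615000002,73.24794765999998,77.87553504
6,fire,49.73757877,51.96834048,53.14405332,55.215007260000014
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per agency (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text)))

def get_id(rows, agency):
    """Return the integer ID of the agency with the given name."""
    for row in rows:
        if row["agency"] == agency:
            return int(row["agency_id"])
    raise KeyError(agency)

def get_spending(rows, agency_id, year):
    """Return the spending of the agency with the given ID in the given year."""
    for row in rows:
        if int(row["agency_id"]) == agency_id:
            return float(row[str(year)])
    raise KeyError(agency_id)

rows = load_rows(CSV_TEXT)
print(get_id(rows, "fire"))
print(get_spending(rows, 6, 2018))
```

Every value read from a CSV file starts out as a string, which is why the sketch converts IDs with `int` and spending values with `float` before returning them.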
# project: p3
# submitter: naixinzhang
# partner: none
import project
project.init("madison.csv")
streets_id = project.get_id("streets")
police_id = project.get_id("police")
fire_id = project.get_id("fire")
library_id = project.get_id("library")
parks_id = project.get_id("parks")
#q1 What is the agency ID of the parks agency?
parks_id
12
#q2 How much did the agency with ID 6 spend in 2018?
project.get_spending(6, 2018)
55.215007260000014
#q3 How much did "streets" spend in 2017?
project.get_spending(streets_id, 2017)
26.655754419999994
# Function 1: year_max(year)
def year_max(year):
    # grab the spending by each agency in the given year
    police_spending = project.get_spending(project.get_id("police"), year)
    fire_spending = project.get_spending(project.get_id("fire"), year)
    library_spending = project.get_spending(project.get_id("library"), year)
    parks_spending = project.get_spending(project.get_id("parks"), year)
    streets_spending = project.get_spending(project.get_id("streets"), year)
    # use builtin max function to get the largest of the five values
    return max(police_spending, fire_spending, library_spending, parks_spending, streets_spending)
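`year_max` returns the largest amount but not which agency spent it. As a side note, the sketch below shows one way to recover the agency name as well, using the built-in `max` with a `key` argument. The `spending_2015` dictionary is a hypothetical stand-in with values copied from the dataset table; it does not use the `project` module.

```python
# 2015 spending in millions of dollars, copied from the dataset table
spending_2015 = {
    "police": 68.06346877,
    "fire": 49.73757877,
    "library": 16.96543425,
    "parks": 18.371421039999998,
    "streets": 25.368879940000006,
}

# max iterates over the dictionary keys; key= tells it to compare by value
top_agency = max(spending_2015, key=spending_2015.get)
top_value = spending_2015[top_agency]
print(top_agency, top_value)
```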
#q4 What was the most spent by a single agency in 2015?
year_max(2015)
68.06346877
#q5 What was the most spent by a single agency in 2018?
year_max(2018)
77.87553504
# Function 2: agency_min(agency)
def agency_min(agency):
    agency_id = project.get_id(agency)
    y15 = project.get_spending(agency_id, 2015)
    y16 = project.get_spending(agency_id, 2016)
    # grab the other years
    y17 = project.get_spending(agency_id, 2017)
    y18 = project.get_spending(agency_id, 2018)
    # use the min function (similar to the max function)
    # to get the minimum across the four years, and return
    # that value
    return min(y15, y16, y17, y18)
#q6 What was the least the police ever spent in a year?
agency_min(agency = 'police')
68.06346877
#q7 What was the least that library ever spent in a year?
agency_min(agency = 'library')
16.96543425
#q8 What was the least that parks ever spent in a year?
agency_min(agency = 'parks')
18.371421039999998
# Function 3: agency_avg(agency)
def agency_avg(agency):
    agency_id = project.get_id(agency)
    y15 = project.get_spending(agency_id, 2015)
    y16 = project.get_spending(agency_id, 2016)
    y17 = project.get_spending(agency_id, 2017)
    y18 = project.get_spending(agency_id, 2018)
    num = [y15, y16, y17, y18]
    return sum(num) / 4
#q9 How much is spent per year on streets, on average?
agency_avg(agency='streets')
27.013047475
#q10 How much is spent per year on fire, on average?
agency_avg(agency='fire')
52.5162449575
#q11 How much did the police spend above their average in 2018, as a percentage of that average?
y18 = project.get_spending(police_id, 2018)
average = agency_avg(agency = 'police')
(y18 - average) / average * 100
7.224961934351909
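The q11 result is a percentage, not an amount in millions of dollars. The sketch below redoes the arithmetic with the four police values copied from the dataset table, so the two quantities can be seen side by side. The variable names are illustrative and the `project` module is not used.

```python
# police spending for 2015-2018 in millions of dollars, from the dataset table
police = [68.06346877, 71.32575615000002, 73.24794765999998, 77.87553504]

average = sum(police) / len(police)
above = police[-1] - average            # 2018 minus the average, in millions of dollars
percent_above = above / average * 100   # the same gap as a percentage of the average

print(round(above, 2), round(percent_above, 2))
```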
# Function 4: change_per_year(agency, start_year=2015, end_year=2018)
def change_per_year(agency, start_year=2015, end_year=2018):
    agency_id = project.get_id(agency)
    spending_startyear = project.get_spending(agency_id, start_year)
    spending_endyear = project.get_spending(agency_id, end_year)
    return (spending_endyear - spending_startyear) / (end_year - start_year)
#q12 how much has spending increased per year (on average) for police from 2015 to 2018?
change_per_year(agency ='police')
3.2706887566666674
#q13 how much has spending increased per year (on average) for police from 2017 to 2018?
change_per_year(agency = 'police', start_year = 2017)
4.627587380000023
#q14 how much has spending increased per year (on average) for streets from 2016 to 2018?
change_per_year(agency = 'streets', start_year = 2016)
-0.2148440299999983
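The average change per year is the slope of the line between two (year, spending) points, so it can be negative, as in q14. Here is a minimal standalone sketch of that calculation, assuming the two streets values copied from the dataset table. The `average_change` helper is hypothetical, and its guard against identical years is an addition that `change_per_year` does not have.

```python
def average_change(start_value, end_value, start_year, end_year):
    """Average change per year between two (year, value) points."""
    if end_year == start_year:
        # without this check the division below would raise ZeroDivisionError
        raise ValueError("start_year and end_year must differ")
    return (end_value - start_value) / (end_year - start_year)

# streets spending (millions of dollars) for 2016 and 2018, from the dataset table
streets_2016 = 28.2286218
streets_2018 = 27.798933740000003

change = average_change(streets_2016, streets_2018, 2016, 2018)
print(change)  # negative, because streets spent less in 2018 than in 2016
```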
# Function 5: extrapolate(agency, year1, year2, year3)
def extrapolate(agency, year1, year2, year3):
    change = change_per_year(agency, start_year=year1, end_year=year2)
    agency_id = project.get_id(agency)
    spending_year2 = project.get_spending(agency_id, year2)
    return spending_year2 + (year3 - year2) * change
#q15 how much will library spend in 2019?
extrapolate(agency ='library',year1 = 2015, year2 = 2018, year3 = 2019)
20.80494298333333
#q16 how much will library spend in 2100, extrapolating from the 2015-to-2018 data?
extrapolate(agency ='library', year1=2015, year2=2018, year3=2100)
98.55499483333321
#q17 how much will library spend in 2100, extrapolating from the 2017-to-2018 data?
extrapolate(agency = 'library', year1 =2017, year2=2018, year3=2100)
77.95994753999969
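q16 and q17 ask about the same year but give different answers because the slope depends on which two years define the line. The sketch below reproduces both extrapolations with library values copied from the dataset table. `extrapolate_linear` is a hypothetical standalone version that takes the values directly instead of calling the `project` module.

```python
def extrapolate_linear(year1, value1, year2, value2, year3):
    """Extend the straight line through (year1, value1) and (year2, value2) to year3."""
    slope = (value2 - value1) / (year2 - year1)
    return value2 + (year3 - year2) * slope

# library spending in millions of dollars, from the dataset table
library = {2015: 16.96543425, 2017: 19.13634773, 2018: 19.845065799999997}

# same target year, two different baselines
from_2015 = extrapolate_linear(2015, library[2015], 2018, library[2018], 2100)
from_2017 = extrapolate_linear(2017, library[2017], 2018, library[2018], 2100)
print(from_2015, from_2017)
```

The 2015-to-2018 baseline has the steeper slope, so over the 82 years to 2100 it ends up roughly 20 million dollars higher than the 2017-to-2018 baseline.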
# Function 6: extrapolate_error(agency, year1, year2, year3)
def extrapolate_error(agency, year1, year2, year3):
    agency_id = project.get_id(agency)
    predict = extrapolate(agency, year1, year2, year3)
    actual = project.get_spending(agency_id, year3)
    return predict - actual
#q18 what is the error if we extrapolate to 2018 from the 2015-to-2017 data for police?
extrapolate_error(agency ='police', year1=2015, year2=2017, year3=2018)
-2.0353479350000327
#q19 what is the error if we extrapolate to 2018 from the 2015-to-2016 data for streets?
extrapolate_error(agency='streets', year1=2015, year2=2016, year3=2018)
6.149171779999982
#q20 what is the standard deviation for library spending over the 4 years?
def std_cal(agency, year1, year2, year3, year4):
    agency_id = project.get_id(agency)
    y1 = project.get_spending(agency_id, year1)
    y2 = project.get_spending(agency_id, year2)
    y3 = project.get_spending(agency_id, year3)
    y4 = project.get_spending(agency_id, year4)
    mean = (y1 + y2 + y3 + y4) / 4
    # population variance: average squared deviation from the mean (divides by 4, not 3)
    var = ((y1 - mean)**2 + (y2 - mean)**2 + (y3 - mean)**2 + (y4 - mean)**2) / 4
    return var ** 0.5
#q20
std_cal(agency = 'library',year1 = 2015,year2=2016, year3=2017,year4=2018)
1.0848913984858986
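`std_cal` divides by the number of values, which makes it a population standard deviation. As a cross-check, the sketch below uses the standard library's `statistics.pstdev` on the four library values copied from the dataset table, and also shows `statistics.stdev`, which divides by one less than the number of values and therefore gives a larger result. The variable names are illustrative.

```python
import statistics

# library spending for 2015-2018 in millions of dollars, from the dataset table
library = [16.96543425, 18.12552139, 19.13634773, 19.845065799999997]

population_sd = statistics.pstdev(library)  # divides by N, like std_cal
sample_sd = statistics.stdev(library)       # divides by N - 1

print(population_sd, sample_sd)
```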