Loading ave_hi_nyc_jan_1895-2018.csv into a DataFrame
Load the data from ave_hi_nyc_jan_1895-2018.csv, rename the 'Value' column to 'Temperature', remove 01 from the end of each date value and display a few data samples. We added %matplotlib inline to enable Matplotlib in this notebook.
%matplotlib inline
import pandas as pd
nyc = pd.read_csv('ave_hi_nyc_jan_1895-2018.csv')
nyc.head(3)
nyc.columns = ['Date', 'Temperature', 'Anomaly']  # rename 'Value' to 'Temperature'
nyc.Date = nyc.Date.floordiv(100)  # e.g., 189501 becomes 1895
nyc.head(3)
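The date cleanup relies on integer floor division. A value such as 189501 (year 1895, month 01) divided by 100 with the remainder discarded leaves only the year. The following is a minimal sketch on a tiny in-memory CSV. It assumes the file uses the same Date,Value,Anomaly column layout, and the three rows and their numbers are made up for illustration. They are not the real NYC data.

```python
import io

import pandas as pd

# Made-up rows in the assumed Date,Value,Anomaly layout (not the real NYC data)
csv_text = """Date,Value,Anomaly
189501,30.0,-1.0
189601,31.0,0.0
189701,32.0,1.0
"""

df = pd.read_csv(io.StringIO(csv_text))
df.columns = ['Date', 'Temperature', 'Anomaly']  # rename 'Value' to 'Temperature'
df.Date = df.Date.floordiv(100)  # 189501 // 100 -> 1895, dropping the month suffix 01
print(df)
```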
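We'll use the LinearRegression estimator from sklearn.linear_model to perform simple linear regression, with one feature (the Date here) as the independent variable. Selecting one column from a two-dimensional DataFrame yields a one-dimensional Series. Scikit-learn estimators, however, expect their feature data to be two-dimensional, so we must transform the Series of n elements into two dimensions containing n rows and one column. nyc.Date.values returns a NumPy array containing the Date column's values, and reshape(-1, 1) tells reshape to infer the number of rows, based on the number of columns (1) and the number of elements (124) in the array.
from sklearn.model_selection import train_test_split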
X_train, X_test, y_train, y_test = train_test_split(
    nyc.Date.values.reshape(-1, 1), nyc.Temperature.values, random_state=11)
X_train.shape
X_test.shape
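The next sketch shows how reshape(-1, 1) and the default behavior of train_test_split affect the array shapes. It uses a stand-in array of 124 years (1895 through 2018) and made-up temperatures. When no size is given, train_test_split reserves 25% of the samples for testing, rounded up. As a result, 124 samples are split into 93 training rows and 31 testing rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for nyc.Date.values: 124 years, 1895 through 2018
years = np.arange(1895, 2019)
temps = np.linspace(30.0, 40.0, num=years.size)  # made-up temperatures

X = years.reshape(-1, 1)  # -1 lets NumPy infer 124 rows from the 1 column
print(years.shape, X.shape)  # (124,) (124, 1)

X_train, X_test, y_train, y_test = train_test_split(X, temps, random_state=11)
print(X_train.shape, X_test.shape)  # (93, 1) (31, 1)
```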
from sklearn.linear_model import LinearRegression
linear_regression = LinearRegression()
linear_regression.fit(X=X_train, y=y_train)
The LinearRegression estimator finds the slope and intercept that minimize the sum of the squares of the data points' vertical distances from the line (ordinary least squares). The slope is stored in the estimator's coef_ attribute (m in the equation y = mx + b), and the intercept is stored in its intercept_ attribute (b in the equation):
linear_regression.coef_
linear_regression.intercept_
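For a single feature, the least-squares slope and intercept also have a closed form: m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - m·x̄. The sketch below uses made-up noisy data to check that coef_ and intercept_ agree with that formula. It is a consistency check only and is not a description of how scikit-learn computes the fit internally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: a noisy line, used only to compare the two computations
rng = np.random.default_rng(0)
x = np.arange(1895, 2019, dtype=float)
y = 0.01 * x + 18.0 + rng.normal(scale=2.0, size=x.size)

model = LinearRegression().fit(X=x.reshape(-1, 1), y=y)

# Closed-form least-squares slope (m) and intercept (b)
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

print(model.coef_[0], m)
print(model.intercept_, b)
```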
Next, predict the temperatures for the dates in X_test and check some of the predictions:
predicted = linear_regression.predict(X_test)
expected = y_test
for p, e in zip(predicted[::5], expected[::5]):  # check every 5th element
    print(f'predicted: {p:.2f}, expected: {e:.2f}')
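For a one-feature model, predict simply evaluates y = mx + b for each row. The slice [::5] keeps every fifth element, starting at index 0. The sketch below uses hypothetical training data that lies exactly on y = 0.5x + 3. The variable name `model` is illustrative and stands in for linear_regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying exactly on y = 0.5x + 3
X = np.arange(0, 20, dtype=float).reshape(-1, 1)
y = 0.5 * X[:, 0] + 3.0
model = LinearRegression().fit(X=X, y=y)

predicted = model.predict(X)
manual = model.coef_[0] * X[:, 0] + model.intercept_  # y = mx + b, row by row
print(np.allclose(predicted, manual))  # True

# Slicing with [::5] keeps indices 0, 5, 10 and 15
for p, e in zip(predicted[::5], y[::5]):
    print(f'predicted: {p:.2f}, expected: {e:.2f}')
```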
# lambda implements y = mx + b
predict = (lambda x: linear_regression.coef_ * x +
           linear_regression.intercept_)
predict(2019)
predict(1890)
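Because coef_ is a one-element array, the lambda behaves differently depending on its input. For a scalar year it returns a one-element array. For an array of years it returns one value per year, because NumPy broadcasts the multiplication, and this is what lets the plotting code pass both line endpoints at once. The sketch below uses hypothetical data lying exactly on y = 0.5x + 3 and an illustrative variable named `model`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying exactly on y = 0.5x + 3
X = np.arange(0, 20, dtype=float).reshape(-1, 1)
y = 0.5 * X[:, 0] + 3.0
model = LinearRegression().fit(X=X, y=y)

# lambda implements y = mx + b
predict = lambda x: model.coef_ * x + model.intercept_

single = predict(10)                  # one-element array, approximately [8.]
several = predict(np.array([0, 19]))  # one value per input, approximately [3., 12.5]
print(single.shape, several.shape)    # (1,) (2,)
print(float(single[0]))               # extract a plain Python float
```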
Cooler temperatures are shown in darker colors.
Instructor Note: In a notebook, all code that modifies a graph must be in the same cell.
import seaborn as sns
axes = sns.scatterplot(data=nyc, x='Date', y='Temperature',
                       hue='Temperature', palette='winter', legend=False)
axes.set_ylim(10, 70)  # scale y-axis
import numpy as np
x = np.array([nyc.Date.min(), nyc.Date.max()])  # endpoints of the regression line
y = predict(x)
import matplotlib.pyplot as plt
line = plt.plot(x, y)
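The regression line is drawn by evaluating y = mx + b at only the minimum and maximum x values, because two points determine a straight line. The sketch below reproduces that idea with Matplotlib alone, without Seaborn. It uses a hypothetical slope and intercept and the non-interactive Agg backend, so it also runs as a plain script. Even so, in a notebook the scatter plot and the line must still be created in the same cell.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical slope and intercept, and points lying on that line
m, b = 0.5, 3.0
years = np.arange(0, 20)
temps = m * years + b

fig, axes = plt.subplots()
axes.scatter(years, temps, c=temps, cmap='winter')  # color points by value, like hue
x = np.array([years.min(), years.max()])  # only the two endpoints are needed
line = axes.plot(x, m * x + b)  # plot returns a list of Line2D objects
axes.set_ylim(0, 20)  # scale y-axis

buf = io.BytesIO()
fig.savefig(buf, format='png')  # render to memory instead of a file
```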
©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.