c = 5 / 9 * (f - 32)
f (the Fahrenheit temperature) is the independent variable.
c (the Celsius temperature) is the dependent variable.
c depends on the value of f used in the calculation.
# enable high-res images in notebook
%config InlineBackend.figure_format = 'retina'
%matplotlib inline
c = lambda f: 5 / 9 * (f - 32)
temps = [(f, c(f)) for f in range(0, 101, 10)]
Place the data in a DataFrame, then use its plot method to display the linear relationship between the temperatures.
The style keyword argument controls the data’s appearance.
'.-' indicates that each point should appear as a dot, and that lines should connect the dots.
import pandas as pd
temps_df = pd.DataFrame(temps, columns=['Fahrenheit', 'Celsius'])
axes = temps_df.plot(x='Fahrenheit', y='Celsius', style='.-')
y_label = axes.set_ylabel('Celsius')
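As a quick numerical check of the straight-line relationship shown in the plot, the following plain-Python sketch confirms that every 10-degree step in Fahrenheit changes the Celsius value by the same amount. The names `celsius`, `steps` and `is_linear` are illustrative and do not come from the original example.

```python
import math

def celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return 5 / 9 * (f - 32)

# Same sample points as above: 0, 10, ..., 100 degrees Fahrenheit
temps = [(f, celsius(f)) for f in range(0, 101, 10)]

# Change in Celsius between consecutive samples
steps = [c2 - c1 for (_, c1), (_, c2) in zip(temps, temps[1:])]

# On a straight line every step is the same: 10 * 5/9 degrees Celsius
is_linear = all(math.isclose(step, 10 * 5 / 9) for step in steps)
print(is_linear)
```

A constant step between evenly spaced inputs is exactly what a constant slope means.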
The points along any straight line can be calculated with:
\begin{equation} y = m x + b \end{equation}
where m is the line’s slope and b is its intercept with the y-axis.
Function linregress from SciPy’s stats Module
The linregress function (from the scipy.stats module) performs simple linear regression for you.
The data file ave_hi_nyc_jan_1895-2018.csv is in the ch10 examples folder. Its columns are:
Date: A value of the form 'YYYYMM' (such as '201801'). MM is always 01 because we downloaded data for only January of each year.
Value: A floating-point Fahrenheit temperature.
Anomaly: The difference between the value for the given date and average values for all dates (not used in this example).
Loading the Data into a DataFrame
nyc = pd.read_csv('ave_hi_nyc_jan_1895-2018.csv')
nyc.head()
nyc.tail()
For readability, rename the 'Value' column as 'Temperature'.
nyc.columns = ['Date', 'Temperature', 'Anomaly']
nyc.head(3)
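Assigning a list to `columns` replaces every column name by position. As an alternative sketch, `DataFrame.rename` changes only the named column and leaves the others alone. The small DataFrame `sample` below is a hypothetical stand-in for the real file, and its values are made up.

```python
import pandas as pd

# Hypothetical two-row stand-in for the real dataset (values are made up)
sample = pd.DataFrame({'Date': [189501, 189601],
                       'Value': [30.0, 31.5],
                       'Anomaly': [-1.0, 0.5]})

# Rename only the 'Value' column; other columns keep their names
sample = sample.rename(columns={'Value': 'Temperature'})
print(list(sample.columns))
```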
Each Date value ends with 01 (for January), so we’ll remove it from each Date.
nyc.Date.dtype
Series method floordiv performs integer division on every element of the Series.
nyc.Date = nyc.Date.floordiv(100)
nyc.head(3)
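To see the effect of the integer division in isolation, here is a minimal sketch on a two-element Series of dates in the YYYYMM form. The names `dates`, `years` and `months` are illustrative. `Series.mod` is shown only to make clear which digits `floordiv` discards.

```python
import pandas as pd

# Dates in the YYYYMM form used by the dataset
dates = pd.Series([189501, 201801])

# floordiv(100) drops the two-digit month from every element
years = dates.floordiv(100)
# mod(100) would recover the dropped month instead
months = dates.mod(100)

print(years.tolist(), months.tolist())
```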
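For some quick statistics, call describe on the Temperature column.
pd.set_option('display.precision', 2)  # full option name; newer pandas releases may reject the bare 'precision'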
nyc.Temperature.describe()
SciPy’s stats module provides function linregress, which calculates a regression line’s slope and intercept.
from scipy import stats
linear_regression = stats.linregress(x=nyc.Date,
                                     y=nyc.Temperature)
linregress receives two one-dimensional arrays of the same length representing the data points’ x- and y-coordinates.
Keyword arguments x and y represent the independent and dependent variables, respectively.
The object returned by linregress contains the regression line’s slope and intercept:
linear_regression.slope
linear_regression.intercept
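To make the slope and intercept less of a black box, here is a minimal pure-Python sketch of the ordinary least-squares formulas for simple linear regression. It is not SciPy's implementation, and the function name `least_squares_line` is hypothetical. It is applied to the Fahrenheit/Celsius pairs from earlier, because the exact line through those points is known.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

fahrenheit = list(range(0, 101, 10))
celsius = [5 / 9 * (f - 32) for f in fahrenheit]

slope, intercept = least_squares_line(fahrenheit, celsius)
print(slope, intercept)  # close to 5/9 and -160/9
```

Because these points lie exactly on a line, the fit recovers that line up to floating-point rounding. Real data such as the NYC temperatures scatter around the fitted line instead.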
In the following calculation based on y = mx + b, linear_regression.slope is m, 2019 is x (the date value for which you’d like to predict the temperature), and linear_regression.intercept is b:
linear_regression.slope * 2019 + linear_regression.intercept
linear_regression.slope * 1890 + linear_regression.intercept
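The same arithmetic can be wrapped in a small helper so that predictions for several x values reuse one expression. This is only a sketch: `predict` is a hypothetical name, and it is demonstrated on the Fahrenheit-to-Celsius line (slope 5/9, intercept -160/9) rather than on the NYC regression results, which are not reproduced here.

```python
def predict(slope, intercept, x):
    """Evaluate y = m x + b for the given slope m and intercept b."""
    return slope * x + intercept

# Fahrenheit-to-Celsius as a line: m = 5/9, b = -160/9
m, b = 5 / 9, -160 / 9

freezing = predict(m, b, 32)   # water freezes at 32 F
boiling = predict(m, b, 212)   # water boils at 212 F
print(round(freezing, 6), round(boiling, 6))
```

With the regression object from above, the equivalent call would be `predict(linear_regression.slope, linear_regression.intercept, 2019)`.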
Seaborn’s regplot function plots each data point with the dates on the x-axis and the temperatures on the y-axis.
It creates a scatter plot of the Temperatures for the given Dates and adds the regression line.
regplot’s x and y keyword arguments are one-dimensional arrays of the same length representing the x-y coordinate pairs to plot.
import seaborn as sns
sns.set_style('whitegrid')
axes = sns.regplot(x=nyc.Date, y=nyc.Temperature)
axes.set_ylim(10, 70)
Source of time-series datasets | Description |
---|---|
https://data.gov/ | The U.S. government’s open data portal. Searching for “time series” yields over 7200 time-series datasets. |
https://www.ncdc.noaa.gov/cag/ | The National Oceanic and Atmospheric Administration (NOAA) Climate at a Glance portal provides both global and U.S. weather-related time series. |
https://www.esrl.noaa.gov/psd/data/timeseries/ | NOAA’s Earth System Research Laboratory (ESRL) portal provides monthly and seasonal climate-related time series. |
https://www.quandl.com/search | Quandl provides hundreds of free financial-related time series, as well as fee-based time series. |
https://datamarket.com/data/list/?q=provider:tsdl | The Time Series Data Library (TSDL) provides links to hundreds of time-series datasets across many industries. |
http://archive.ics.uci.edu/ml/datasets.html | The University of California Irvine (UCI) Machine Learning Repository contains dozens of time-series datasets for a variety of topics. |
http://inforumweb.umd.edu/econdata/econdata.html | The University of Maryland’s EconData service provides links to thousands of economic time series from various U.S. government agencies. |
©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.