import pandas as pd
titanic = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/carData/TitanicSurvival.csv')
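The dataset contains 1309 rows, each representing one passenger. For large datasets, displaying the DataFrame in a notebook shows only the first 30 rows, followed by “…” and the last 30 rows (the exact counts depend on the pandas version and its display.max_rows and display.min_rows settings). To save space, view just the first and last five rows with the DataFrame methods head and tail, which return five rows by default: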
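# 'display.precision' is the full option name; the short form 'precision' is ambiguous in recent pandas versions
pd.set_option('display.precision', 2)  # show floating-point values with two digits after the decimal point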
titanic.head()
titanic.tail()
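A minimal sketch of head and tail on a small hypothetical DataFrame (the ages are invented, not taken from the Titanic dataset), showing the five-row default and the optional n argument:

```python
import pandas as pd

# Hypothetical ages for eight passengers; these are not values from the Titanic dataset.
df = pd.DataFrame({'age': [22.0, 38.5, 26.0, 35.0, 0.1667, 54.0, 2.0, 27.25]})

# Same option as above: display floats with two digits after the decimal point.
pd.set_option('display.precision', 2)

first = df.head()   # first five rows (n=5 is the default)
last = df.tail(3)   # pass n to get a different number of rows
print(first)
print(last)
```

The precision option changes only how values are displayed; the stored value 0.1667 is unchanged.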
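In the titanic.tail() output, the age value in row 1305 is NaN (not a number), indicating a missing value in the dataset.
The first column has the odd name 'Unnamed: 0', which pandas generates for a column whose header is blank. Assigning a list of new names to the DataFrame's columns attribute renames all five columns, changing 'Unnamed: 0' to 'name' and naming the last column 'class':
titanic.columns = ['name', 'survived', 'sex', 'age', 'class']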
titanic.head()
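To see where 'Unnamed: 0' comes from without downloading anything, here is a sketch that parses a tiny hypothetical CSV whose first header cell is blank; the rows and the passengerClass header are invented stand-ins, not the real file's contents. It then renames the columns the same way. Passing index_col=0 to read_csv is shown as one alternative, which uses that first column as the row index instead of keeping it as data:

```python
import io

import pandas as pd

# Hypothetical CSV text with a blank first header cell; the rows are invented examples.
csv_text = """,survived,sex,age,passengerClass
Passenger A,yes,female,29,1st
Passenger B,no,male,,3rd
"""

df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))  # the blank header becomes 'Unnamed: 0'

# Rename every column at once by assigning a list of the same length.
df.columns = ['name', 'survived', 'sex', 'age', 'class']
print(df.head())

# Alternative: treat the first column as the row index instead of a data column.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df_indexed.columns))
```

The empty age field for Passenger B is read as NaN, just like the missing ages in the real dataset.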
Calling describe on a DataFrame that contains both numeric and non-numeric columns produces descriptive statistics only for the numeric columns, which in this case is just the age column:
titanic.describe()
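A small sketch of how describe arrives at its numbers, using a hypothetical ten-row frame with two missing ages (all values invented). It checks that only the numeric column is summarized, that count and mean skip NaN, and that the 25%/50%/75% rows come from linear interpolation (Series.quantile's default) rather than from literally splitting the data into halves:

```python
import numpy as np
import pandas as pd

# Hypothetical passengers: one text column and one numeric column with two NaN ages.
df = pd.DataFrame({
    'survived': ['yes', 'no', 'no', 'yes', 'no', 'yes', 'no', 'no', 'yes', 'no'],
    'age': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, np.nan, np.nan],
})

stats = df.describe()          # only the numeric 'age' column is summarized
print(stats)

ages = df['age']
print(len(df), ages.count())   # 10 rows, but only 8 non-missing ages
print(ages.mean())             # NaN values are skipped: the mean of 1..8

# describe's percentile rows match Series.quantile with linear interpolation.
print(ages.quantile([0.25, 0.5, 0.75]).tolist())

# For comparison: medians of the lower and upper halves of the sorted valid ages.
valid = ages.dropna().sort_values()
half = len(valid) // 2
print(valid.iloc[:half].median(), valid.iloc[half:].median())
```

Here the interpolated quartiles are 2.75 and 6.25, while the half-medians are 2.5 and 6.5, which is why "median of each half" is only an approximation of what describe reports.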
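Note the discrepancy between the count (1046) and the dataset's number of rows (1309; the last row's index was 1308 when we called tail). Only 1046 of the records (the count above) contain an age value. The rest are missing and marked as NaN, as in row 1305. When performing calculations, pandas ignores missing data (NaN) by default.
For the 1046 people with valid ages, the average (mean) age was 29.88 years. The youngest passenger (min) was just over two months old (0.17 * 12 is 2.04 months), and the oldest (max) was 80. The median age was 28 (indicated by the 50% row). The 25% row is the first quartile, roughly the median age of the younger half of the passengers (sorted by age), and the 75% row is the third quartile, roughly the median of the older half. pandas computes these quartiles by linear interpolation between neighboring ages, so they can differ slightly from the exact medians of the two halves.
To gather statistics about the people who survived, compare the survived column to 'yes' to get a new Series containing True/False values, then use describe to summarize the results:
(titanic.survived == 'yes').describe()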
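A sketch of the same comparison on a small hypothetical survived column (invented values), showing the four fields describe returns for a Boolean Series. The final line, which uses the fact that True counts as 1 and False as 0 so the mean is the fraction of survivors, is an illustrative extra rather than part of the original analysis:

```python
import pandas as pd

# Hypothetical outcomes for six passengers.
survived = pd.Series(['yes', 'no', 'no', 'yes', 'no', 'no'], name='survived')

is_survivor = survived == 'yes'    # Boolean Series of True/False values
summary = is_survivor.describe()   # index: count, unique, top, freq
print(summary)

# True is treated as 1 and False as 0, so the mean is the survival rate (2 of 6 here).
print(is_survivor.mean())
```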
For non-numeric data such as this True/False Series, describe displays different descriptive statistics:
- count is the total number of items in the result
- unique is the number of unique values (2) in the result: True (survived) and False (died)
- top is the most frequently occurring value in the result
- freq is the number of occurrences of the top value
To display plots directly in a Jupyter notebook, enable inline plotting:
%matplotlib inline
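A DataFrame's hist method analyzes each numerical column's data and produces a separate histogram for each numerical column. In this dataset age is the only numerical column, so the result is a single histogram of passenger ages:
histogram = titanic.hist()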
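%matplotlib inline works only in IPython and Jupyter, so a plain script has to render or save the figure itself. A minimal sketch, assuming matplotlib is installed, that uses the non-interactive Agg backend, hypothetical ages (invented values), and a hypothetical output file name ages_hist.png in the system's temporary directory:

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')              # render without a display (no GUI window)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column ('age', with a NaN) and one text column.
df = pd.DataFrame({
    'age': [0.17, 2.0, 22.0, 28.0, 29.0, 35.0, 54.0, 80.0, np.nan],
    'sex': ['male', 'female', 'female', 'male', 'female', 'male', 'male', 'male', 'female'],
})

# hist draws one histogram per numeric column (NaN values are dropped)
# and returns the matplotlib Axes objects in an array.
axes = df.hist(bins=5)
print(np.asarray(axes).size)       # one Axes, since 'age' is the only numeric column

# Save the figure to a file instead of showing it inline.
fig = np.asarray(axes).ravel()[0].figure
out_path = os.path.join(tempfile.gettempdir(), 'ages_hist.png')
fig.savefig(out_path)
plt.close('all')
```

The text column is ignored, mirroring how titanic.hist() skips name, survived, sex and class.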
©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.