An edition of Data science from scratch (2015)

Data science from scratch

first principles with Python

First edition.
  • 5.0 (1 rating)
  • 166 Want to read
  • 16 Currently reading
  • 2 Have read

Last edited by ImportBot on December 20, 2023

Publish Date
2015
Publisher
O'Reilly Media
Language
English
Pages
311

Edition Availability
  • Data Science from Scratch: First Principles with Python (2019, O'Reilly Media, Incorporated), in English
  • Data Science from Scratch: First Principles with Python (2015, O'Reilly Media, Incorporated), in English
  • Data Science from Scratch: First Principles with Python (2015, O'Reilly Media, Incorporated), in English
  • Data science from scratch: first principles with Python (2015, O'Reilly Media), in English, First edition.

Book Details


Table of Contents

Preface (page xi)
1. Introduction (page 1)
  • The Ascendance of Data (page 1)
  • What Is Data Science? (page 1)
  • Motivating Hypothetical: DataSciencester (page 2)
  • Finding Key Connectors (page 3)
  • Data Scientists You May Know (page 6)
  • Salaries and Experience (page 8)
  • Paid Accounts (page 11)
  • Topics of Interest (page 11)
  • Onward (page 13)
2. A Crash Course in Python (page 15)
  • The Basics (page 15)
  • Getting Python (page 15)
  • The Zen of Python (page 16)
  • Whitespace Formatting (page 16)
  • Modules (page 17)
  • Arithmetic (page 18)
  • Functions (page 18)
  • Strings (page 19)
  • Exceptions (page 19)
  • Lists (page 20)
  • Tuples (page 21)
  • Dictionaries (page 21)
  • Sets (page 24)
  • Control Flow (page 25)
  • Truthiness (page 25)
  • The Not-So-Basics (page 26)
  • Sorting (page 27)
  • List Comprehensions (page 27)
  • Generators and Iterators (page 28)
  • Randomness (page 29)
  • Regular Expressions (page 30)
  • Object-Oriented Programming (page 30)
  • Functional Tools (page 31)
  • enumerate (page 32)
  • zip and Argument Unpacking (page 33)
  • args and kwargs (page 35)
  • Welcome to DataSciencester! (page 35)
  • For Further Exploration (page 35)
3. Visualizing Data (page 37)
  • matplotlib (page 37)
  • Line Charts (page 39)
  • Bar Charts (page 43)
  • Scatterplots (page 44)
  • For Further Exploration (page 47)
4. Linear Algebra (page 49)
  • Vectors (page 49)
  • Matrices (page 53)
  • For Further Exploration (page 55)
5. Statistics (page 57)
  • Describing a Single Set of Data (page 57)
  • Central Tendencies (page 59)
  • Dispersion (page 61)
  • Correlation (page 62)
  • Simpson's Paradox (page 65)
  • Some Other Correlational Caveats (page 66)
  • Correlation and Causation (page 67)
  • For Further Exploration (page 68)
6. Probability (page 69)
  • Dependence and Independence (page 69)
  • Conditional Probability (page 70)
  • Bayes's Theorem (page 72)
  • Random Variables (page 73)
  • Continuous Distributions (page 74)
  • The Normal Distribution (page 75)
  • The Central Limit Theorem (page 78)
  • For Further Exploration (page 80)
7. Hypothesis and Inference (page 81)
  • Statistical Hypothesis Testing (page 81)
  • Example: Flipping a Coin (page 81)
  • Confidence Intervals (page 85)
  • P-hacking (page 86)
  • Example: Running an A/B Test (page 87)
  • Bayesian Inference (page 88)
  • For Further Exploration (page 92)
8. Gradient Descent (page 93)
  • The Idea Behind Gradient Descent (page 93)
  • Estimating the Gradient (page 94)
  • Using the Gradient (page 97)
  • Choosing the Right Step Size (page 97)
  • Putting It All Together (page 98)
  • Stochastic Gradient Descent (page 99)
  • For Further Exploration (page 100)
9. Getting Data (page 103)
  • stdin and stdout (page 103)
  • Reading Files (page 105)
  • The Basics of Text Files (page 105)
  • Delimited Files (page 106)
  • Scraping the Web (page 108)
  • HTML and the Parsing Thereof (page 108)
  • Example: O'Reilly Books About Data (page 110)
  • Using APIs (page 114)
  • JSON (and XML) (page 114)
  • Using an Unauthenticated API (page 115)
  • Finding APIs (page 116)
  • Example: Using the Twitter APIs (page 117)
  • Getting Credentials (page 117)
  • For Further Exploration (page 120)
10. Working with Data (page 121)
  • Exploring Your Data (page 121)
  • Exploring One-Dimensional Data (page 121)
  • Two Dimensions (page 123)
  • Many Dimensions (page 125)
  • Cleaning and Munging (page 127)
  • Manipulating Data (page 129)
  • Rescaling (page 132)
  • Dimensionality Reduction (page 134)
  • For Further Exploration (page 139)
11. Machine Learning (page 141)
  • Modeling (page 141)
  • What Is Machine Learning? (page 142)
  • Overfitting and Underfitting (page 142)
  • Correctness (page 145)
  • The Bias-Variance Trade-off (page 147)
  • Feature Extraction and Selection (page 148)
  • For Further Exploration (page 150)
12. k-Nearest Neighbors (page 151)
  • The Model (page 151)
  • Example: Favorite Languages (page 153)
  • The Curse of Dimensionality (page 156)
  • For Further Exploration (page 163)
13. Naive Bayes (page 165)
  • A Really Dumb Spam Filter (page 165)
  • A More Sophisticated Spam Filter (page 166)
  • Implementation (page 168)
  • Testing Our Model (page 169)
  • For Further Exploration (page 172)
14. Simple Linear Regression (page 173)
  • The Model (page 173)
  • Using Gradient Descent (page 176)
  • Maximum Likelihood Estimation (page 177)
  • For Further Exploration (page 177)
15. Multiple Regression (page 179)
  • The Model (page 179)
  • Further Assumptions of the Least Squares Model (page 180)
  • Fitting the Model (page 181)
  • Interpreting the Model (page 182)
  • Goodness of Fit (page 183)
  • Digression: The Bootstrap (page 183)
  • Standard Errors of Regression Coefficients (page 184)
  • Regularization (page 186)
  • For Further Exploration (page 188)
16. Logistic Regression (page 189)
  • The Problem (page 189)
  • The Logistic Function (page 192)
  • Applying the Model (page 194)
  • Goodness of Fit (page 195)
  • Support Vector Machines (page 196)
  • For Further Investigation (page 200)
17. Decision Trees (page 201)
  • What Is a Decision Tree? (page 201)
  • Entropy (page 203)
  • The Entropy of a Partition (page 205)
  • Creating a Decision Tree (page 206)
  • Putting It All Together (page 208)
  • Random Forests (page 211)
  • For Further Exploration (page 212)
18. Neural Networks (page 213)
  • Perceptrons (page 213)
  • Feed-Forward Neural Networks (page 215)
  • Backpropagation (page 218)
  • Example: Defeating a CAPTCHA (page 219)
  • For Further Exploration (page 224)
19. Clustering (page 225)
  • The Idea (page 225)
  • The Model (page 226)
  • Example: Meetups (page 227)
  • Choosing k (page 230)
  • Example: Clustering Colors (page 231)
  • Bottom-up Hierarchical Clustering (page 233)
  • For Further Exploration (page 238)
20. Natural Language Processing (page 239)
  • Word Clouds (page 239)
  • n-gram Models (page 241)
  • Grammars (page 244)
  • An Aside: Gibbs Sampling (page 246)
  • Topic Modeling (page 247)
  • For Further Exploration (page 253)
21. Network Analysis (page 255)
  • Betweenness Centrality (page 255)
  • Eigenvector Centrality (page 260)
  • Matrix Multiplication (page 262)
  • Directed Graphs and PageRank (page 264)
  • For Further Exploration (page 266)
22. Recommender Systems (page 267)
  • Manual Curation (page 268)
  • Recommending What's Popular (page 268)
  • User-Based Collaborative Filtering (page 269)
  • Item-Based Collaborative Filtering (page 272)
  • For Further Exploration (page 274)
23. Databases and SQL (page 275)
  • CREATE TABLE and INSERT (page 275)
  • UPDATE (page 277)
  • DELETE (page 278)
  • SELECT (page 278)
  • GROUP BY (page 280)
  • ORDER BY (page 282)
  • JOIN (page 283)
  • Subqueries (page 285)
  • Indexes (page 285)
  • Query Optimization (page 286)
  • NoSQL (page 287)
  • For Further Exploration (page 287)
24. MapReduce (page 289)
  • Example: Word Count (page 289)
  • Why MapReduce? (page 291)
  • MapReduce More Generally (page 292)
  • Example: Analyzing Status Updates (page 293)
  • Example: Matrix Multiplication (page 294)
  • An Aside: Combiners (page 296)
  • For Further Exploration (page 296)
25. Go Forth and Do Data Science (page 299)
  • IPython (page 299)
  • Mathematics (page 300)
  • Not from Scratch (page 300)
  • NumPy (page 301)
  • pandas (page 301)
  • scikit-learn (page 301)
  • Visualization (page 301)
  • R (page 302)
  • Find Data (page 302)
  • Do Data Science (page 303)
  • Hacker News (page 303)
  • Fire Trucks (page 303)
  • T-shirts (page 304)
  • And You? (page 304)
Index (page 305)

Edition Notes

Includes index.

Subtitle from cover.

Classifications

Library of Congress
QA76.73.P98 G78 2015, QA76.9.D343, QA76.9.D3 G78 2015eb

The Physical Object

Pagination
xvi, 311 pages
Number of pages
311

Edition Identifiers

Open Library
OL27186779M
Internet Archive
datasciencefroms0000grus
ISBN 10
149190142X
ISBN 13
9781491901427
LCCN
2015472852
OCLC/WorldCat
898161437, 907532468
Wikidata
Q106987804

Work Identifiers

Work ID
OL20006690W
