Doing data science

Schutt, Rachel, 1976- author.

Book 2013

A guide to the usefulness of data science, covering topics such as algorithms, logistic regression, financial modeling, data visualization, and data engineering.

Explore Topics & Themes:

Big data

Cyberinfrastructure

Data mining

Data structures (Computer science)

Database management

Information science

ISBN:

9781449358655
1449358659

Edition:

First edition

Description:

xxiv, 375 pages : illustrations ; 23 cm

Notes:

Includes index

Contents:

Machine generated contents note: Big Data and Data Science Hype
Getting Past the Hype
Why Now?
Datafication
The Current Landscape (with a Little History)
Data Science Jobs
A Data Science Profile
Thought Experiment: Meta-Definition
OK, So What Is a Data Scientist, Really?
In Academia
In Industry
Statistical Thinking in the Age of Big Data
Statistical Inference
Populations and Samples
Populations and Samples of Big Data
Big Data Can Mean Big Assumptions
Modeling
Exploratory Data Analysis
Philosophy of Exploratory Data Analysis
Exercise: EDA
The Data Science Process
A Data Scientist's Role in This Process
Thought Experiment: How Would You Simulate Chaos?
Case Study: RealDirect
How Does RealDirect Make Money?
Exercise: RealDirect Data Strategy
Machine Learning Algorithms
Three Basic Algorithms
Linear Regression
k-Nearest Neighbors (k-NN)
k-means
Exercise: Basic Machine Learning Algorithms
Thought Experiment
Financial Modeling
In-Sample, Out-of-Sample, and Causality
Preparing Financial Data
Log Returns
Example: The S&P Index
Working out a Volatility Measurement
Exponential Downweighting
The Financial Modeling Feedback Loop
Why Regression?
Adding Priors
A Baby Model
Exercise: GetGlue and Timestamped Event Data
Exercise: Financial Data
William Cukierski
Background: Data Science Competitions
Background: Crowdsourcing
The Kaggle Model
A Single Contestant
Their Customers
Thought Experiment: What Are the Ethical Implications of a Robo-Grader?
Feature Selection
Example: User Retention
Filters
Wrappers
Embedded Methods: Decision Trees
Entropy
The Decision Tree Algorithm
Handling Continuous Variables in Decision Trees
Random Forests
User Retention: Interpretability Versus Predictive Power
David Huffaker: Google's Hybrid Approach to Social Research
Moving from Descriptive to Predictive
Social at Google
Privacy
Thought Experiment: What Is the Best Way to Decrease Concern and Increase Understanding and Control?
A Real-World Recommendation Engine
Nearest Neighbor Algorithm Review
Some Problems with Nearest Neighbors
Beyond Nearest Neighbor: Machine Learning Classification
The Dimensionality Problem
Singular Value Decomposition (SVD)
Important Properties of SVD
Principal Component Analysis (PCA)
Alternating Least Squares
Fix V and Update U
Last Thoughts on These Algorithms
Thought Experiment: Filter Bubbles
Exercise: Build Your Own Recommendation System
Sample Code in Python
Data Visualization History
Gabriel Tarde
Mark's Thought Experiment
What Is Data Science, Redux?
Processing
Franco Moretti
A Sample of Data Visualization Projects
Mark's Data Visualization Projects
New York Times Lobby: Moveable Type
Project Cascade: Lives on a Screen
Cronkite Plaza
eBay Transactions and Books
Public Theater Shakespeare Machine
Goals of These Exhibits
Data Science and Risk
About Square
The Risk Challenge
The Trouble with Performance Estimation
Model Building Tips
Data Visualization at Square
Ian's Thought Experiment
Data Visualization for the Rest of Us
Data Visualization Exercise
Social Network Analysis at Morningside Analytics
Case-Attribute Data versus Social Network Data
Social Network Analysis
Terminology from Social Networks
Centrality Measures
The Industry of Centrality Measures
Thought Experiment
Morningside Analytics
How Visualizations Help Us Find Schools of Fish
More Background on Social Network Analysis from a Statistical Point of View
Representations of Networks and Eigenvalue Centrality
A First Example of Random Graphs: The Erdős-Rényi Model
A Second Example of Random Graphs: The Exponential Random Graph Model
Data Journalism
A Bit of History on Data Journalism
Writing Technical Journalism: Advice from an Expert
Correlation Doesn't Imply Causation
Asking Causal Questions
Confounders: A Dating Example
OK Cupid's Attempt
The Gold Standard: Randomized Clinical Trials
A/B Tests
Second Best: Observational Studies
Simpson's Paradox
The Rubin Causal Model
Visualizing Causality
Definition: The Causal Effect
Three Pieces of Advice
Madigan's Background
Thought Experiment
Modern Academic Statistics
Medical Literature and Observational Studies
Stratification Does Not Solve the Confounder Problem
What Do People Do About Confounding Things in Practice?
Is There a Better Way?
Research Experiment (Observational Medical Outcomes Partnership)
Closing Thought Experiment
Claudia's Data Scientist Profile
The Life of a Chief Data Scientist
On Being a Female Data Scientist
Data Mining Competitions
How to Be a Good Modeler
Data Leakage
Market Predictions
Amazon Case Study: Big Spenders
A Jewelry Sampling Problem
IBM Customer Targeting
Breast Cancer Detection
Pneumonia Prediction
How to Avoid Leakage
Evaluating Models
Accuracy: Meh
Probabilities Matter, Not 0s and 1s
Choosing an Algorithm
A Final Example
Parting Thoughts
About David Crawshaw
Thought Experiment
MapReduce
Word Frequency Problem
Enter MapReduce
Other Examples of MapReduce
What Can't MapReduce Do?
Pregel
About Josh Wills
Thought Experiment
On Being a Data Scientist
Data Abundance Versus Data Scarcity
Designing Models
Economic Interlude: Hadoop
A Brief Introduction to Hadoop
Cloudera
Back to Josh: Workflow
So How to Get Started with Hadoop?
Process Thinking
Naive No Longer
Helping Hands
Your Mileage May Vary
Bridging Tunnels
Some of Our Work
What Just Happened?
What Is Data Science (Again)?
What Are Next-Gen Data Scientists?
Being Problem Solvers
Cultivating Soft Skills
Being Question Askers
Being an Ethical Data Scientist
Career Advice

Control Number:

1994892

Other Authors:

O'Neil, Cathy, author
