Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science
By Thomas W. Miller
Programs and Data to Accompany "Modeling Techniques in Predictive Analytics: Business Problems and Solutions with R (Revised and Expanded Edition)" Miller (2015) and "Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science" Miller (2015)
Note that many R programs contain library commands that load functions from R packages. To run these programs, users must first install the required packages in their R environment. Likewise, many Python programs use data structures and methods that require Python packages to be installed and imported beforehand.
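Before running a Python program, a quick check can report which packages in a list cannot be imported. This is a minimal sketch and is not part of the book's code. The function name `find_missing_packages` is hypothetical, and the package list in the usage section is an illustrative assumption, not the book's definitive requirements. The sketch uses only `importlib.import_module`, which is available in both Python 2.7 and Python 3.

```python
import importlib


def find_missing_packages(package_names):
    """Return the names in package_names that cannot be imported.

    Each name is passed to importlib.import_module. An ImportError
    means the package is not installed (or not importable) in the
    current Python environment.
    """
    missing = []
    for name in package_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative list only (an assumption): check the import
    # statements at the top of each chapter program for the
    # packages it actually needs.
    required = ["numpy", "pandas", "matplotlib", "statsmodels", "sklearn"]
    missing = find_missing_packages(required)
    if missing:
        print("Install these packages before running: " + ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Note that import names can differ from installation names: for example, scikit-learn is imported as `sklearn`.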
R programs were tested under R 3.1.1 on Mac OS 10.6.8. Python programs were tested under Enthought Canopy and Python 2.7 on Mac OS 10.6.8.
Book Location | Description of Directory or File | File Name |
Chapter 1 | Programming the Anscombe Quartet (Python) | chapter_1_program.py |
Chapter 1 | Programming the Anscombe Quartet (R) | chapter_1_program.R |
Chapter 2 | Shaking Our Bobbleheads Yes and No (data) | dodgers.csv |
Chapter 2 | Shaking Our Bobbleheads Yes and No (Python) | chapter_2_program.py |
Chapter 2 | Shaking Our Bobbleheads Yes and No (R) | chapter_2_program.R |
Chapter 3 | Questions for Conjoint Survey (documentation) | questions_for_survey.txt |
Chapter 3 | Measuring and Modeling Individual Preferences (data) | mobile_services_ranking.csv |
Chapter 3 | Conjoint Analysis Spine Chart (R) | R_utility_program_1.R |
Chapter 3 | Measuring and Modeling Individual Preferences (Python) | chapter_3_program.py |
Chapter 3 | Measuring and Modeling Individual Preferences (R) | chapter_3_program.R |
Chapter 4 | Market Basket Analysis of Grocery Store Data (Python) | chapter_4_program.py |
Chapter 4 | Market Basket Analysis of Grocery Store Data (R) | chapter_4_program.R |
Chapter 5 | New Orders for Durable Goods (data) | FRED_DGO_data.csv |
Chapter 5 | Employment Rate (data) | FRED_ER_data.csv |
Chapter 5 | Index of Consumer Sentiment (data) | FRED_ICS_data.csv |
Chapter 5 | New Homes Sold (data) | FRED_NHS_data.csv |
Chapter 5 | Working with Economic Data (Python) | chapter_5_program.py |
Chapter 5 | Working with Economic Data (R) | chapter_5_program.R |
Chapter 6 | Call Center Shifts and Needs for Wednesdays (data) | data_anonymous_bank_shifts.csv |
Chapter 6 | Call Center Traffic for February (data) | data_anonymous_bank_february.txt |
Chapter 6 | Split-plotting Utilities (R) | R_utility_program_3.R |
Chapter 6 | Wait-time Ribbon Plot (R) | R_utility_program_4.R |
Chapter 6 | Call Center Scheduling (Python) | chapter_6_program.py |
Chapter 6 | Call Center Scheduling (R) | chapter_6_program.R |
Chapter 7 | Movie Taglines Original Data (text data) | taglines_copy_data.txt |
Chapter 7 | Movie Tagline Data Preparation Script for Text Analysis (R) | R_utility_program_7.R |
Chapter 7 | Movie Taglines Parsed Data (text data) | movie_tagline_data_parsed.csv |
Chapter 7 | Split-plotting Utilities (R) | R_utility_program_3.R |
Chapter 7 | Text Analysis of Movie Taglines (Python) | chapter_7_program.py |
Chapter 7 | Text Analysis of Movie Taglines (R) | chapter_7_program.R |
Chapter 8 | Sentiment Analysis Negative Word List (text data) | Hu_Liu_negative_word_list.txt |
Chapter 8 | Sentiment Analysis Positive Word List (text data) | Hu_Liu_positive_word_list.txt |
Chapter 8 | Directories and Subdirectories of Movie Reviews (text data) | |
Chapter 8 | Training Data - Unsupervised/Unrated Reviews | reviews/train/unsup |
Chapter 8 | Training Data - Positive Reviews | reviews/train/pos |
Chapter 8 | Training Data - Negative Reviews | reviews/train/neg |
Chapter 8 | Test Data - Positive Reviews | reviews/test/pos |
Chapter 8 | Test Data - Negative Reviews | reviews/test/neg |
Chapter 8 | Test Data - Tom's Reviews | reviews/test/tom |
Chapter 8 | Split-plotting Utilities (R) | R_utility_program_3.R |
Chapter 8 | Initializer Module (Python) | __init__.py |
Chapter 8 | Utility Functions (Python) | python_utilities.py |
Chapter 8 | Evaluating the Predictive Accuracy of a Binary Classifier | |
Chapter 8 | Text Measures for Sentiment Analysis | |
Chapter 8 | Summative Scoring of Sentiment | |
Chapter 8 | Sentiment Analysis and Classification of Movie Ratings (Python) | chapter_8_program.py |
Chapter 8 | Sentiment Analysis and Classification of Movie Ratings (R) | chapter_8_program.R |
Chapter 9 | Team Winning Probabilities by Simulation (Python) | chapter_9_program.py |
Chapter 9 | Team Winning Probabilities by Simulation (R) | chapter_9_program.R |
Chapter 10 | California Housing Values (data) | houses_data.txt |
Chapter 10 | Regression Models for Spatial Data (Python) | chapter_10_program.py |
Chapter 10 | Regression Models for Spatial Data (R) | chapter_10_program.R |
Chapter 11 | Computer Choice Study (data) | computer_choice_study.csv |
Chapter 11 | Market Simulation Utilities (R) | R_utility_program_2.R |
Chapter 11 | Training and Testing a Hierarchical Bayes Model (R) | chapter_11a_program.R |
Chapter 11 | Preference, Choice, and Market Simulation (R) | chapter_11b_program.R |
Appendix C | Return of the Bobbleheads (data) | bobbleheads.csv |
Appendix C | DriveTime Sedans (data) | drive_time_sedans.csv |
Appendix C | Two Months' Salary (data) | two_months_salary.csv |
Appendix C | Wisconsin Dells (data) | wisconsin_dells.csv |
Appendix C | Computer Choice Study (data) | computer_choice_study.csv |
Appendix D | Utility Functions (Python) | python_utilities.py |
Appendix D | Evaluating the Predictive Accuracy of a Binary Classifier | |
Appendix D | Text Measures for Sentiment Analysis | |
Appendix D | Summative Scoring of Sentiment | |
Appendix D | Conjoint Analysis Spine Chart (R) | R_utility_program_1.R |
Appendix D | Market Simulation Utilities (R) | R_utility_program_2.R |
Appendix D | Split-plotting Utilities (R) | R_utility_program_3.R |
Appendix D | Wait-time Ribbon Plot (R) | R_utility_program_4.R |
Appendix D | Text Scoring Script for Sentiment Analysis (R) | R_utility_program_5.R |
Appendix D | Utilities for Spatial Data Analysis (R) | R_utility_program_6.R |
Appendix D | Movie Tagline Data Preparation Script for Text Analysis (R) | R_utility_program_7.R |
Appendix D | Python Code from Book (text data) | mtpa_Python_code.txt |
Appendix D | R Code from Book (text data) | mtpa_R_code.txt |
Appendix D | Making Word Clouds (R) | R_utility_program_8.R |