Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science
By Thomas W. Miller

 
Programs and Data to Accompany "Modeling Techniques in Predictive Analytics: Business Problems and Solutions with R (Revised and Expanded Edition)" Miller (2015) and "Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science" Miller (2015)

Many of the R programs use library commands to load functions from R packages. To run these programs, first install the required packages in your R environment. Likewise, many of the Python programs rely on data structures and methods from Python packages that must be installed and imported before use.

R programs were tested under R 3.1.1 on Mac OS 10.6.8. Python programs were tested under Enthought Canopy and Python 2.7 on Mac OS 10.6.8.
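
As a rough illustration of the installation check described above, the following Python sketch reports which packages cannot be found for import. The package names in REQUIRED are placeholders, not a list taken from the book's programs, and the code assumes Python 3.4 or later rather than the Python 2.7 environment used for testing.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list for illustration only; read the import statements
# at the top of each program to see what it actually needs.
REQUIRED = ["numpy", "pandas", "matplotlib"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Install before running:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Running a check like this before a chapter program avoids an import failure partway through a longer script.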


Book Location Description of Directory or File File Name
Chapter 1 Programming the Anscombe Quartet (Python) chapter_1_program.py
  Programming the Anscombe Quartet (R) chapter_1_program.R
     
Chapter 2 Shaking Our Bobbleheads Yes and No (data) dodgers.csv
  Shaking Our Bobbleheads Yes and No (Python) chapter_2_program.py
  Shaking Our Bobbleheads Yes and No (R) chapter_2_program.R
     
Chapter 3 Questions for Conjoint Survey (documentation) questions_for_survey.txt
  Measuring and Modeling Individual Preferences (data) mobile_services_ranking.csv
  Conjoint Analysis Spine Chart (R) R_utility_program_1.R
  Measuring and Modeling Individual Preferences (Python) chapter_3_program.py
  Measuring and Modeling Individual Preferences (R) chapter_3_program.R
     
Chapter 4 Market Basket Analysis of Grocery Store Data (Python) chapter_4_program.py
  Market Basket Analysis of Grocery Store Data (R) chapter_4_program.R
     
Chapter 5 New Orders for Durable Goods (data) FRED_DGO_data.csv
  Employment Rate (data) FRED_ER_data.csv
  Index of Consumer Sentiment (data) FRED_ICS_data.csv
  New Homes Sold (data) FRED_NHS_data.csv
  Working with Economic Data (Python) chapter_5_program.py
  Working with Economic Data (R) chapter_5_program.R
     
Chapter 6 Call Center Shifts and Needs for Wednesdays (data) data_anonymous_bank_shifts.csv
  Call Center Traffic for February (data) data_anonymous_bank_february.txt
  Split-plotting Utilities (R) R_utility_program_3.R
  Wait-time Ribbon Plot (R) R_utility_program_4.R
  Call Center Scheduling (Python) chapter_6_program.py
  Call Center Scheduling (R) chapter_6_program.R
     
Chapter 7 Movie Taglines Original Data (text data) taglines_copy_data.txt
  Movie Tagline Data Preparation Script for Text Analysis (R) R_utility_program_7.R
  Movie Taglines Parsed Data (text data) movie_tagline_data_parsed.csv
  Split-plotting Utilities (R) R_utility_program_3.R
  Text Analysis of Movie Taglines (Python) chapter_7_program.py
  Text Analysis of Movie Taglines (R) chapter_7_program.R
     
Chapter 8 Sentiment Analysis Negative Word List (text data) Hu_Liu_negative_word_list.txt
  Sentiment Analysis Positive Word List (text data) Hu_Liu_positive_word_list.txt
  Directories and Subdirectories of Movie Reviews (text data)
    Training Data - Unsupervised/Unrated Reviews reviews/train/unsup
    Training Data - Positive Reviews reviews/train/pos
    Training Data - Negative Reviews reviews/train/neg
    Test Data - Positive Reviews reviews/test/pos
    Test Data - Negative Reviews reviews/test/neg
    Test Data - Tom's Reviews reviews/test/tom
  Split-plotting Utilities (R) R_utility_program_3.R
  Initializer Module (Python) __init__.py
  Utility Functions (Python) python_utilities.py
    Evaluating the Predictive Accuracy of a Binary Classifier
    Text Measures for Sentiment Analysis
    Summative Scoring of Sentiment
  Sentiment Analysis and Classification of Movie Ratings (Python) chapter_8_program.py
  Sentiment Analysis and Classification of Movie Ratings (R) chapter_8_program.R
     
Chapter 9 Team Winning Probabilities by Simulation (Python) chapter_9_program.py
  Team Winning Probabilities by Simulation (R) chapter_9_program.R
     
Chapter 10 California Housing Values (data) houses_data.txt
  Regression Models for Spatial Data (Python) chapter_10_program.py
  Regression Models for Spatial Data (R) chapter_10_program.R
     
Chapter 11 Computer Choice Study (data) computer_choice_study.csv
  Market Simulation Utilities (R) R_utility_program_2.R
  Training and Testing a Hierarchical Bayes Model (R) chapter_11a_program.R
  Preference, Choice, and Market Simulation (R) chapter_11b_program.R
     
Appendix C Return of the Bobbleheads (data) bobbleheads.csv
  DriveTime Sedans (data) drive_time_sedans.csv
  Two Month's Salary (data) two_months_salary.csv
  Wisconsin Dells (data) wisconsin_dells.csv
  Computer Choice Study (data) computer_choice_study.csv
     
Appendix D Utility Functions (Python) python_utilities.py
    Evaluating the Predictive Accuracy of a Binary Classifier
    Text Measures for Sentiment Analysis
    Summative Scoring of Sentiment
  Conjoint Analysis Spine Chart (R) R_utility_program_1.R
  Market Simulation Utilities (R) R_utility_program_2.R
  Split-plotting Utilities (R) R_utility_program_3.R
  Wait-time Ribbon Plot (R) R_utility_program_4.R
  Text Scoring Script for Sentiment Analysis (R) R_utility_program_5.R
  Utilities for Spatial Data Analysis (R) R_utility_program_6.R
  Movie Tagline Data Preparation Script for Text Analysis (R) R_utility_program_7.R
  Python Code from Book (text data) mtpa_Python_code.txt
  R Code from Book (text data) mtpa_R_code.txt
  Making Word Clouds (R) R_utility_program_8.R
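
The movie review directories listed under Chapter 8 can be checked after download with a short Python sketch such as the one below. The directory names come from the listing; the function name, the default root of the current working directory, and the use of Python 3 are assumptions made for illustration.

```python
from pathlib import Path

# Subdirectories named in the Chapter 8 listing.
REVIEW_DIRS = [
    "reviews/train/unsup",
    "reviews/train/pos",
    "reviews/train/neg",
    "reviews/test/pos",
    "reviews/test/neg",
    "reviews/test/tom",
]


def count_review_files(root="."):
    """Map each expected directory to its file count, or None if it is absent."""
    counts = {}
    for rel in REVIEW_DIRS:
        folder = Path(root) / rel
        if folder.is_dir():
            # Count regular files only; nested directories are ignored.
            counts[rel] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[rel] = None
    return counts


if __name__ == "__main__":
    for rel, n in count_review_files().items():
        print(rel, "missing" if n is None else "%d files" % n)
```

A directory reported as missing usually means the archive was unpacked somewhere other than the working directory of the chapter program.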