Web and Network Data Science: Modeling Techniques in Predictive Analytics
By Thomas W. Miller
Programs and Data to Accompany "Web and Network Data Science: Modeling Techniques in Predictive Analytics" Miller (2015)
Note that many of the R programs contain library commands that load functions from R packages. To run these programs, first install those packages in your R environment. Likewise, many of the Python programs use data structures and methods that require the relevant Python packages to be installed and imported first.
R programs were tested under R 3.1.1 on Mac OS 10.6.8. Python programs were tested under Enthought Canopy and Python 2.7 on Mac OS 10.6.8.
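As a rough illustration of the installation note above, the following sketch reports which top-level Python packages cannot be found before a program is run. It is not part of the book's code. The function name `missing_packages` and the example package list are hypothetical, and the sketch assumes Python 3.4 or later (for `importlib.util.find_spec`), not the Python 2.7 environment the programs were tested under.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list for illustration only: replace it with the packages
# named in the import lines of the program you intend to run.
required = ["json", "csv", "a_package_that_is_not_installed"]

print(missing_packages(required))
```

Any name the function returns would need to be installed in the Python environment before the corresponding program can run.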
Book Location | Description of Directory or File | File Name |
WNDS Chapter 1 | Browser Usage Data | browser_usage_2008_2014.csv |
Analysis of Browser Usage (Python) | wnds_chapter_1.py | |
Analysis of Browser Usage (R) | wnds_chapter_1.R | |
WNDS Chapter 2 | ToutBay Website Traffic Data | toutbay_begins.csv |
Website Traffic Analysis (R) | wnds_chapter_2.R | |
WNDS Chapter 3 | Extracting and Parsing Web Site Data (Python) | wnds_chapter_3a.py |
Extracting and Parsing Web Site Data (R) | wnds_chapter_3a.R | |
Directory for Simple One-Page Web Scraper (Python) | wnds_chapter_3b | |
Directory for Crawling and Scraping while Napping (Python) | wnds_chapter_3c | |
WNDS Chapter 4 | Identifying Keywords for Testing Performance in Search (R) | wnds_chapter_4.R |
Directory of Keywords Data for the Angels | tickets_angels | |
Directory of Keywords Data for the Dodgers | tickets_dodgers | |
WNDS Chapter 5 | Competitive Intelligence: Spirit Airlines Financial Dossier (R) | wnds_chapter_5.R |
WNDS Chapter 6 | Enron E-Mail Network Data | enron_email_links.txt |
Defining and Visualizing Simple Networks (Python) | wnds_chapter_6a.py | |
Defining and Visualizing Simple Networks (R) | wnds_chapter_6a.R | |
Visualizing Networks-Understanding Organizations (R) | wnds_chapter_6b.R | |
WNDS Chapter 7 | Correlation Heat Map Utility (R) | correlation_heat_map_utility.R |
Wikipedia Votes Data | wiki_edges.txt | |
Network Models and Measures (R) | wnds_chapter_7a.R | |
Methods of Sampling from Large Networks (R) | wnds_chapter_7b.R | |
WNDS Chapter 8 | Sentiment Analysis Negative Word List (text data) | Hu_Liu_negative_word_list.txt |
Sentiment Analysis Positive Word List (text data) | Hu_Liu_positive_word_list.txt | |
Directories and Subdirectories of Movie Reviews (text data) | | |
Training Data - Unsupervised/Unrated Reviews | reviews/train/unsup | |
Training Data - Positive Reviews | reviews/train/pos | |
Training Data - Negative Reviews | reviews/train/neg | |
Test Data - Positive Reviews | reviews/test/pos | |
Test Data - Negative Reviews | reviews/test/neg | |
Test Data - Tom's Reviews | reviews/test/tom | |
Split-plotting Utilities (R) | R_utility_program_3.R | |
Text Scoring Script for Sentiment Analysis (R) | R_utility_program_5.R | |
Initializer Module (Python) | __init__.py | |
Utility Functions (Python) | python_utilities.py | |
Evaluating the Predictive Accuracy of a Binary Classifier | | |
Text Measures for Sentiment Analysis | | |
Summative Scoring of Sentiment | | |
Sentiment Analysis and Classification of Movie Ratings (Python) | wnds_chapter_8_program.py | |
Sentiment Analysis and Classification of Movie Ratings (R) | wnds_chapter_8_program.R | |
WNDS Chapter 9 | Directory of POTUS Speeches Data Organized by President Name (Oral Addresses Kennedy through Obama) | ALL_POTUS |
Directory of POTUS Speeches Data (Oral Addresses Kennedy through Obama) | POTUS | |
Discovering Common Themes: POTUS Speeches (Python) | wnds_chapter_9a.py | |
Multidimensional Scaling Results | POTUS_mds.csv | |
Making Word Clouds: POTUS Speeches (R) | wnds_chapter_9b.R | |
From Text Measures to Text Maps: POTUS Speeches (R) | wnds_chapter_9c.R | |
WNDS Chapter 10 | Anonymous Microsoft Web Attribute Data | microsoft_attribute_data.csv |
Anonymous Microsoft Web Test Data | microsoft_test_data.csv | |
Anonymous Microsoft Web Training Data | microsoft_training_data.csv | |
From Rules to Recommendations: The Microsoft Case (R) | wnds_chapter_10.R | |
Anonymous Microsoft Web Data Organized as Transactions (partial output from wnds_chapter_10.R) | microsoft_training_transactions.csv | |
WNDS Chapter 11 | Directory of NetLogo Simulation Results | NetLogo_results |
NetLogo Results Data | virus_results.csv | |
Analysis of Agent-Based Simulation Results (Python) | wnds_chapter_11.py | |
Analysis of Agent-Based Simulation Results (R) | wnds_chapter_11.R | |
WNDS Appendix C | E-Mail or Spam Case Study Data | email_or_spam.csv |
ToutBay Website Traffic Data | toutbay_begins.csv | |
Enron E-Mail Network Data | enron_email_links.txt | |
Directory of POTUS State of the Union Addresses (Oral and written, all Presidents) | POTUS_COMPLETE | |
Directory of POTUS Speeches Data Organized by President Name (Oral Addresses Kennedy through Obama) | ALL_POTUS | |
Directory of POTUS Speeches Data (Oral Addresses Kennedy through Obama) | POTUS | |
Directory of Keywords Data for the Angels | tickets_angels | |
Directory of Keywords Data for the Dodgers | tickets_dodgers | |
Wikipedia Votes Case Study Data | wiki_edges.txt | |
Anonymous Microsoft Web Attribute Data | microsoft_attribute_data.csv | |
Anonymous Microsoft Web Test Data | microsoft_test_data.csv | |
Anonymous Microsoft Web Training Data | microsoft_training_data.csv | |
WNDS Appendix D | Utility Functions (Python) | python_utilities.py |
Evaluating the Predictive Accuracy of a Binary Classifier | | |
Text Measures for Sentiment Analysis | | |
Summative Scoring of Sentiment | | |
Split-plotting Utilities (R) | R_utility_program_3.R | |
Text Scoring Script for Sentiment Analysis (R) | R_utility_program_5.R | |
Correlation Heat Map Utility (R) | correlation_heat_map_utility.R | |