Hugendubel.info - The B2B Online Bookshop


Practical Machine Learning with Python

E-book, PDF (watermarked)
530 pages
English
Apress, published 20.12.2017, 1st ed.
Master the essential skills needed to recognize and solve complex problems with machine learning and deep learning. Using real-world examples that leverage the popular Python machine learning ecosystem, this book is your perfect companion for learning the art and science of machine learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute machine learning systems and projects successfully.
Practical Machine Learning with Python follows a structured and comprehensive three-tiered approach packed with hands-on examples and code.


Part 1 focuses on understanding machine learning concepts and tools. This includes machine learning basics with a broad overview of algorithms, techniques, concepts and applications, followed by a tour of the entire Python machine learning ecosystem. Brief guides for useful machine learning tools, libraries and frameworks are also covered.


Part 2 details standard machine learning pipelines, with an emphasis on data processing and analysis, feature engineering, and modeling. You will learn how to process, wrangle, summarize, and visualize data in its various forms. Feature engineering and selection methodologies are covered in detail with real-world datasets, followed by model building, tuning, interpretation, and deployment.
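
To make that workflow concrete, here is a minimal sketch of such a pipeline using scikit-learn. The dataset, scaler, model, and parameter grid are illustrative assumptions for this listing, not code taken from the book.

```python
# Minimal illustrative pipeline: load data, scale features, fit a classifier,
# and tune a hyperparameter with cross-validation (assumed setup, not from the book).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Small example dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling and a classifier chained into a single pipeline
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search over the regularization strength with 5-fold cross-validation
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

print("Best C:", search.best_params_["clf__C"])
print("Held-out accuracy:", search.score(X_test, y_test))
```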






Part 3 explores multiple real-world case studies spanning diverse domains and industries like retail, transportation, movies, music, marketing, computer vision, and finance. For each case study, you will learn the application of various machine learning techniques and methods. The hands-on examples will help you become familiar with state-of-the-art machine learning tools and techniques and understand which algorithms are best suited to a given problem.









Practical Machine Learning with Python will empower you to start solving your own problems with machine learning today!

What You'll Learn



Execute end-to-end machine learning projects and systems

Implement hands-on examples with industry standard, open source, robust machine learning tools and frameworks

Review case studies depicting applications of machine learning and deep learning in diverse domains and industries
Apply a wide range of machine learning models including regression, classification, and clustering
Understand and apply the latest models and methodologies from deep learning including CNNs, RNNs, LSTMs, and transfer learning (see the sketch after this list)
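
As a flavor of the transfer-learning material, here is a minimal sketch of reusing a pretrained CNN as a frozen feature extractor, assuming a current TensorFlow/Keras installation; the architecture, input size, and binary task are illustrative assumptions rather than code from the book.

```python
# Illustrative transfer-learning sketch (assumed setup, not from the book):
# reuse frozen VGG16 ImageNet features and train only a small classification head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Pretrained convolutional base used purely as a feature extractor
base = VGG16(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
base.trainable = False  # freeze the pretrained layers

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # head for a hypothetical binary image task
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=5) would then train only the new head
# (train_images / train_labels are hypothetical arrays of shape (N, 150, 150, 3) and (N,)).
```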





Who This Book Is For




IT professionals, analysts, developers, data scientists, engineers, graduate students









Dipanjan Sarkar is a Data Scientist at Intel, on a mission to make the world more connected and productive. He primarily works on data science, analytics, business intelligence, application development, and building large-scale intelligent systems. He holds a master of technology degree in Information Technology with specializations in Data Science and Software Engineering from the International Institute of Information Technology, Bangalore. He is an avid supporter of self-learning, especially Massive Open Online Courses, and holds a Data Science Specialization from Johns Hopkins University on Coursera.


Dipanjan has been an analytics practitioner for several years, specializing in statistical, predictive, and text analytics. Having a passion for data science and education, he is a Data Science Mentor at Springboard, helping people up-skill in areas like Data Science and Machine Learning. Dipanjan has also authored several books on R, Python, Machine Learning, and Analytics, including Text Analytics with Python (Apress, 2016). Besides this, he occasionally reviews technical books and acts as a course beta tester for Coursera. Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups, data science, and more recently, artificial intelligence and deep learning.




Raghav Bali has a master's degree (gold medalist) in Information Technology from the International Institute of Information Technology, Bangalore. He is a Data Scientist at Intel, where he works on analytics, business intelligence, and application development to develop scalable machine learning-based solutions. He has also worked as an analyst and developer in domains such as ERP, finance, and BI with some of the leading organizations in the world.


Raghav is a technology enthusiast who loves reading and playing around with new gadgets and technologies. He has also authored several books on R, Machine Learning and Analytics. He is a shutterbug, capturing moments when he isn't busy solving problems.


Tushar Sharma has a master's degree from the International Institute of Information Technology, Bangalore. He works as a Data Scientist with Intel. His work involves developing analytical solutions at scale using enormous volumes of infrastructure data. In his previous role, he worked in the financial domain, developing scalable machine learning solutions for major financial organizations. He is proficient in Python, R, and Big Data frameworks like Spark and Hadoop.


Apart from work, Tushar enjoys watching movies and playing badminton, and is an avid reader. He has also authored a book on R and social media analytics.
Available formats
Book: Paperback
EUR 80.24
E-book: PDF (watermarked)
EUR 79.99

Product

Details
Further ISBN/GTIN: 9781484232071
Product type: E-book
Binding: E-book
Format: PDF
Format note: 1 - PDF Watermark
Format code: E107
Publisher: Apress
Year of publication: 2017
Publication date: 20.12.2017
Edition: 1st ed.
Pages: 530
Language: English
Illustrations: XXV, 530 p.
Item no.: 2545822
Categories
Genre: 9200

Contents / Reviews

Table of Contents
1;Contents;5
2;About the Authors;16
3;About the Technical Reviewer;18
4;Acknowledgments;19
5;Foreword;21
6;Introduction;22
7;Part I: Understanding Machine Learning;23
7.1;Chapter 1: Machine Learning Basics;24
7.1.1;The Need for Machine Learning;25
7.1.1.1;Making Data-Driven Decisions;25
7.1.1.2;Efficiency and Scale;26
7.1.1.3; Traditional Programming Paradigm;26
7.1.1.4;Why Machine Learning?;27
7.1.2;Understanding Machine Learning;29
7.1.2.1;Why Make Machines Learn?;29
7.1.2.2;Formal Definition;30
7.1.2.2.1;Defining the Task, T;31
7.1.2.2.2; Defining the Experience, E;33
7.1.2.2.3;Defining the Performance, P;33
7.1.2.3;A Multi-Disciplinary Field;34
7.1.3;Computer Science;35
7.1.3.1;Theoretical Computer Science;36
7.1.3.2; Practical Computer Science;36
7.1.3.3;Important Concepts;36
7.1.3.3.1; Algorithms;36
7.1.3.3.2;Programming Languages;37
7.1.3.3.3; Code;37
7.1.3.3.4;Data Structures;37
7.1.4; Data Science;37
7.1.5;Mathematics;39
7.1.5.1;Important Concepts;40
7.1.5.1.1; Scalar;40
7.1.5.1.2;Vector;40
7.1.5.1.3;Matrix;40
7.1.5.1.4;Tensor;42
7.1.5.1.5;Norm;42
7.1.5.1.6;Eigen Decomposition;42
7.1.5.1.7;Singular Value Decomposition;43
7.1.5.1.8;Random Variable;44
7.1.5.1.9;Probability Distribution;44
7.1.5.1.10; Probability Mass Function;44
7.1.5.1.11; Probability Density Function;44
7.1.5.1.12;Marginal Probability;44
7.1.5.1.13;Conditional Probability;44
7.1.5.1.14;Bayes Theorem;45
7.1.6;Statistics;45
7.1.7;Data Mining;46
7.1.8;Artificial Intelligence;46
7.1.9;Natural Language Processing;47
7.1.10; Deep Learning;49
7.1.10.1;Important Concepts;52
7.1.10.1.1;Artificial Neural Networks;52
7.1.10.1.2; Backpropagation;53
7.1.10.1.3;Multilayer Perceptrons;53
7.1.10.1.4;Convolutional Neural Networks;53
7.1.10.1.5; Recurrent Neural Networks;54
7.1.10.1.6;Long Short-Term Memory Networks;55
7.1.10.1.7; Autoencoders;55
7.1.11;Machine Learning Methods;55
7.1.12; Supervised Learning;56
7.1.12.1;Classification;57
7.1.12.2;Regression;58
7.1.13; Unsupervised Learning;59
7.1.13.1; Clustering;60
7.1.13.2;Dimensionality Reduction;61
7.1.13.3;Anomaly Detection;62
7.1.13.4;Association Rule-Mining;62
7.1.14;Semi-Supervised Learning;63
7.1.15; Reinforcement Learning;63
7.1.16;Batch Learning;64
7.1.17;Online Learning;65
7.1.18;Instance Based Learning;65
7.1.19;Model Based Learning;66
7.1.20;The CRISP-DM Process Model;66
7.1.20.1;Business Understanding;67
7.1.20.1.1;Define Business Problem;68
7.1.20.1.2; Assess and Analyze Scenarios;68
7.1.20.1.3;Define Data Mining Problem;69
7.1.20.1.4;Project Plan;69
7.1.20.2;Data Understanding;69
7.1.20.2.1;Data Collection;69
7.1.20.2.2;Data Description;70
7.1.20.2.3;Exploratory Data Analysis;70
7.1.20.2.4;Data Quality Analysis;70
7.1.20.3;Data Preparation;71
7.1.20.3.1;Data Integration;71
7.1.20.3.2; Data Wrangling;71
7.1.20.3.3;Attribute Generation and Selection;71
7.1.20.4;Modeling;72
7.1.20.4.1;Selecting Modeling Techniques;72
7.1.20.4.2;Model Building;72
7.1.20.4.3;Model Evaluation and Tuning;72
7.1.20.4.4;Model Assessment;72
7.1.20.5;Evaluation;73
7.1.20.6;Deployment;73
7.1.21;Building Machine Intelligence;73
7.1.21.1;Machine Learning Pipelines;73
7.1.21.2; Supervised Machine Learning Pipeline;75
7.1.21.3;Unsupervised Machine Learning Pipeline;76
7.1.22;Real-World Case Study: Predicting Student Grant Recommendations;76
7.1.22.1;Objective;77
7.1.22.2;Data Retrieval;77
7.1.22.3;Data Preparation;78
7.1.22.3.1;Feature Extraction and Engineering;78
7.1.22.4;Modeling;81
7.1.22.5;Model Evaluation;82
7.1.22.6;Model Deployment;82
7.1.22.7;Prediction in Action;83
7.1.23;Challenges in Machine Learning;85
7.1.24;Real-World Applications of Machine Learning;85
7.1.25;Summary;86
7.2;Chapter 2: The Python Machine Learning Ecosystem;87
7.2.1;Python: An Introduction;87
7.2.1.1;Strengths;88
7.2.1.2;Pitfalls;88
7.2.1.3;Setting Up a Python Environment;89
7.2.1.3.1; Set Up Anaconda Python Environment;89
7.2.1.3.2; Installing Libraries;91
7.2.1.4;Why Python for Data Science?;91
7.2.1.4.1;Powerful Set of Packages;91
7.2.1.4.2;Easy and Rapid Prototyping;91
7.2.1.4.3;Easy to Collaborate;92
7.2.1.4.4;One-Stop Solution;92
7.2.1.4.5;Large and Active Community Support;92
7.2.2;Introducing the Python Machine Learning Ecosystem;92
7.2.2.1; Jupyter Notebooks;92
7.2.2.1.1;Installation and Execution;93
7.2.2.2;NumPy;95
7.2.2.2.1; Numpy ndarray;95
7.2.2.2.2; Creating Arrays;96
7.2.2.2.3;Accessing Array Elements;97
7.2.2.2.3.1;Basic Indexing and Slicing;97
7.2.2.2.3.2;Advanced Indexing;99
7.2.2.2.4;Operations on Arrays;100
7.2.2.2.5;Linear Algebra Using numpy;102
7.2.2.3;Pandas;104
7.2.2.3.1;Data Structures of Pandas;104
7.2.2.3.2;Series;104
7.2.2.3.3; Dataframe;104
7.2.2.3.4;Data Retrieval;105
7.2.2.3.4.1; List of Dictionaries to Dataframe;105
7.2.2.3.4.2;CSV Files to Dataframe;105
7.2.2.3.4.3;Databases to Dataframe;107
7.2.2.3.5;Data Access;107
7.2.2.3.5.1; Head and Tail;107
7.2.2.3.5.2;Slicing and Dicing;108
7.2.2.3.6;Data Operations;111
7.2.2.3.6.1;Values Attribute;111
7.2.2.3.6.2;Missing Data and the fillna Function;111
7.2.2.3.6.3;Descriptive Statistics Functions;112
7.2.2.3.6.4;Concatenating Dataframes;114
7.2.2.3.6.4.1;Concatenating Using the concat Method;114
7.2.2.3.6.4.2;Database Style Concatenations Using the merge Command;115
7.2.2.4;Scikit-learn;116
7.2.2.4.1;Core APIs;117
7.2.2.4.2;Advanced APIs;118
7.2.2.4.3; Scikit-learn Example: Regression Models;119
7.2.2.4.3.1;The Dataset;119
7.2.2.5;Neural Networks and Deep Learning;122
7.2.2.5.1;Artificial Neural Networks;122
7.2.2.5.2;Deep Neural Networks;124
7.2.2.5.2.1; Number of Layers;124
7.2.2.5.2.2; Diverse Architectures;124
7.2.2.5.2.3; Computation Power;124
7.2.2.5.3; Python Libraries for Deep Learning;124
7.2.2.5.3.1; Theano;125
7.2.2.5.3.1.1;Installation;125
7.2.2.5.3.1.2;Theano Basics (Barebones Version);125
7.2.2.5.3.2; Tensorflow;127
7.2.2.5.3.2.1;Installation;127
7.2.2.5.3.3; Keras;128
7.2.2.5.3.3.1;Installation;128
7.2.2.5.3.3.2;Keras Basics;128
7.2.2.5.3.3.3; Model Building;129
7.2.2.5.3.3.4;Learning an Example Neural Network;129
7.2.2.5.3.3.5;The Power of Deep Learning;131
7.2.2.6;Text Analytics and Natural Language Processing;132
7.2.2.6.1; The Natural Language Tool Kit;133
7.2.2.6.1.1; Installation and Introduction;133
7.2.2.6.1.2; Corpora;134
7.2.2.6.1.3;Tokenization;134
7.2.2.6.1.4;Tagging;134
7.2.2.6.1.5;Stemming and Lemmatization;134
7.2.2.6.1.6;Chunking;135
7.2.2.6.1.7; Sentiment;135
7.2.2.6.1.8; Classification/Clustering;135
7.2.2.6.2;Other Text Analytics Frameworks;135
7.2.2.7;Statsmodels;136
7.2.2.7.1;Installation;136
7.2.2.7.2;Modules;136
7.2.2.7.2.1; Distributions;136
7.2.2.7.2.2; Linear Regression;137
7.2.2.7.2.3; Generalized Linear Models;137
7.2.2.7.2.4;ANOVA;137
7.2.2.7.2.5; Time Series Analysis;137
7.2.2.7.2.6;Statistical Inference;137
7.2.2.7.2.7; Nonparametric Methods;137
7.2.3;Summary;138
8;Part II: The Machine Learning Pipeline;139
8.1;Chapter 3: Processing, Wrangling, and Visualizing Data;140
8.1.1; Data Collection;141
8.1.1.1; CSV;141
8.1.1.2; JSON;143
8.1.1.3;XML;147
8.1.1.4; HTML and Scraping;150
8.1.1.4.1; HTML;150
8.1.1.4.2;Web Scraping;151
8.1.1.5; SQL;155
8.1.2; Data Description;156
8.1.2.1;Numeric;156
8.1.2.2;Text;156
8.1.2.3;Categorical;156
8.1.3; Data Wrangling;157
8.1.3.1; Understanding Data;157
8.1.3.2; Filtering Data;160
8.1.3.3;Typecasting;163
8.1.3.4;Transformations;163
8.1.3.5;Imputing Missing Values;164
8.1.3.6; Handling Duplicates;166
8.1.3.7;Handling Categorical Data;166
8.1.3.8; Normalizing Values;167
8.1.3.9;String Manipulations;168
8.1.4;Data Summarization;168
8.1.5;Data Visualization;170
8.1.5.1;Visualizing with Pandas;171
8.1.5.1.1;Line Charts;171
8.1.5.1.2;Bar Plots;173
8.1.5.1.3;Histograms;174
8.1.5.1.4;Pie Charts;175
8.1.5.1.5;Box Plots;176
8.1.5.1.6;Scatter Plots;177
8.1.5.2;Visualizing with Matplotlib;180
8.1.5.2.1;Figures and Subplots;181
8.1.5.2.2;Plot Formatting;186
8.1.5.2.3;Legends;189
8.1.5.2.4; Axis Controls;191
8.1.5.2.5; Annotations;194
8.1.5.2.6;Global Parameters;195
8.1.5.3; Python Visualization Ecosystem;195
8.1.6;Summary;195
8.2;Chapter 4: Feature Engineering and Selection;196
8.2.1; Features: Understand Your Data Better;197
8.2.1.1; Data and Datasets;197
8.2.1.2;Features;198
8.2.1.3;Models;198
8.2.2;Revisiting the Machine Learning Pipeline;198
8.2.3; Feature Extraction and Engineering;200
8.2.3.1; What Is Feature Engineering?;200
8.2.3.2;Why Feature Engineering?;202
8.2.3.3;How Do You Engineer Features?;203
8.2.4;Feature Engineering on Numeric Data;204
8.2.4.1;Raw Measures;204
8.2.4.1.1; Values;205
8.2.4.1.2; Counts;206
8.2.4.2;Binarization;206
8.2.4.3;Rounding;207
8.2.4.4; Interactions;208
8.2.4.5; Binning;210
8.2.4.5.1;Fixed-Width Binning;211
8.2.4.5.2; Adaptive Binning;213
8.2.4.6;Statistical Transformations;216
8.2.4.6.1;Log Transform;216
8.2.4.6.2; Box-Cox Transform;217
8.2.5;Feature Engineering on Categorical Data;219
8.2.5.1;Transforming Nominal Features;220
8.2.5.2; Transforming Ordinal Features;221
8.2.5.3;Encoding Categorical Features;222
8.2.5.3.1;One Hot Encoding Scheme;222
8.2.5.3.2;Dummy Coding Scheme;225
8.2.5.3.3; Effect Coding Scheme;226
8.2.5.3.4;Bin-Counting Scheme;227
8.2.5.3.5; Feature Hashing Scheme;227
8.2.6;Feature Engineering on Text Data;228
8.2.6.1;Text Pre-Processing;229
8.2.6.2; Bag of Words Model;230
8.2.6.3;Bag of N-Grams Model;231
8.2.6.4;TF-IDF Model;232
8.2.6.5;Document Similarity;233
8.2.6.6; Topic Models;235
8.2.6.7;Word Embeddings;236
8.2.7;Feature Engineering on Temporal Data;239
8.2.7.1;Date-Based Features;240
8.2.7.2; Time-Based Features;241
8.2.8; Feature Engineering on Image Data;243
8.2.8.1; Image Metadata Features;244
8.2.8.2;Raw Image and Channel Pixels;244
8.2.8.3;Grayscale Image Pixels;246
8.2.8.4; Binning Image Intensity Distribution;246
8.2.8.5;Image Aggregation Statistics;247
8.2.8.6; Edge Detection;248
8.2.8.7;Object Detection;249
8.2.8.8;Localized Feature Extraction;250
8.2.8.9;Visual Bag of Words Model;252
8.2.8.10;Automated Feature Engineering with Deep Learning;255
8.2.9;Feature Scaling;258
8.2.9.1;Standardized Scaling;259
8.2.9.2; Min-Max Scaling;259
8.2.9.3; Robust Scaling;260
8.2.10;Feature Selection;261
8.2.10.1;Threshold-Based Methods;262
8.2.10.2; Statistical Methods;263
8.2.10.3;Recursive Feature Elimination;266
8.2.10.4; Model-Based Selection;267
8.2.11;Dimensionality Reduction;268
8.2.11.1; Feature Extraction with Principal Component Analysis;269
8.2.12;Summary;271
8.3;Chapter 5: Building, Tuning, and Deploying Models;273
8.3.1; Building Models;274
8.3.1.1; Model Types;275
8.3.1.1.1; Classification Models;275
8.3.1.1.2; Regression Models;276
8.3.1.1.3; Clustering Models;277
8.3.1.2; Learning a Model;278
8.3.1.2.1; Three Stages of Machine Learning;279
8.3.1.2.1.1; Representation;279
8.3.1.2.1.2; Evaluation;279
8.3.1.2.1.3; Optimization;279
8.3.1.2.2; The Three Stages of Logistic Regression;280
8.3.1.2.2.1; Representation;280
8.3.1.2.2.2; Evaluation;280
8.3.1.2.2.3; Optimization;281
8.3.1.3; Model Building Examples;281
8.3.1.3.1; Classification;282
8.3.1.3.2; Clustering;284
8.3.1.3.2.1; Partition Based Clustering;285
8.3.1.3.2.2; Hierarchical Clustering;287
8.3.2; Model Evaluation;289
8.3.2.1; Evaluating Classification Models;289
8.3.2.1.1; Confusion Matrix;290
8.3.2.1.1.1; Understanding the Confusion Matrix;290
8.3.2.1.1.2; Performance Metrics;292
8.3.2.1.2; Receiver Operating Characteristic Curve;294
8.3.2.2; Evaluating Clustering Models;296
8.3.2.2.1; External Validation;296
8.3.2.2.2; Internal Validation;297
8.3.2.2.2.1; Silhouette Coefficient;298
8.3.2.2.2.2; Calinski-Harabaz Index;298
8.3.2.3; Evaluating Regression Models;299
8.3.2.3.1; Coefficient of Determination or R2;299
8.3.2.3.2; Mean Squared Error;300
8.3.3; Model Tuning;300
8.3.3.1; Introduction to Hyperparameters;301
8.3.3.1.1; Decision Trees;301
8.3.3.2; The Bias-Variance Tradeoff;302
8.3.3.2.1; Extreme Cases of Bias-Variance;305
8.3.3.2.1.1; Underfitting;305
8.3.3.2.1.2; Overfitting;305
8.3.3.2.2; The Tradeoff;305
8.3.3.3; Cross Validation;306
8.3.3.3.1; Cross-Validation Strategies;308
8.3.3.3.1.1; Leave One Out CV;309
8.3.3.3.1.2; K-Fold CV;309
8.3.3.4; Hyperparameter Tuning Strategies;309
8.3.3.4.1; Grid Search;309
8.3.3.4.2; Randomized Search;312
8.3.4; Model Interpretation;313
8.3.4.1; Understanding Skater;315
8.3.4.2; Model Interpretation in Action;316
8.3.5; Model Deployment;320
8.3.5.1; Model Persistence;320
8.3.5.2; Custom Development;321
8.3.5.3; In-House Model Deployment;321
8.3.5.4; Model Deployment as a Service;322
8.3.6; Summary;322
9;Part III: Real-World Case Studies;323
9.1;Chapter 6: Analyzing Bike Sharing Trends;324
9.1.1; The Bike Sharing Dataset;324
9.1.2;Problem Statement;325
9.1.3; Exploratory Data Analysis;325
9.1.3.1; Preprocessing;325
9.1.3.2;Distribution and Trends;327
9.1.3.3;Outliers;329
9.1.3.4;Correlations;331
9.1.4; Regression Analysis;332
9.1.4.1;Types of Regression;332
9.1.4.2; Assumptions;333
9.1.4.3;Evaluation Criteria;333
9.1.4.3.1; Residual Analysis;333
9.1.4.3.2; Normality Test (Q-Q Plot);333
9.1.4.3.3; R-Squared: Goodness of Fit;334
9.1.4.3.4;Cross Validation;334
9.1.5;Modeling;334
9.1.5.1;Linear Regression;336
9.1.5.1.1;Training;337
9.1.5.1.2;Testing;338
9.1.5.2;Decision Tree Based Regression;340
9.1.5.2.1; Node Splitting;341
9.1.5.2.2; Stopping Criteria;342
9.1.5.2.3;Hyperparameters;342
9.1.5.2.4;Decision Tree Algorithms;342
9.1.5.2.5;Training;343
9.1.5.2.6; Testing;346
9.1.6;Next Steps;347
9.1.7; Summary;347
9.2;Chapter 7: Analyzing Movie Reviews Sentiment;348
9.2.1; Problem Statement;349
9.2.2; Setting Up Dependencies;349
9.2.3;Getting the Data;350
9.2.4; Text Pre-Processing and Normalization;350
9.2.5; Unsupervised Lexicon-Based Models;353
9.2.5.1;Bing Liu's Lexicon;354
9.2.5.2;MPQA Subjectivity Lexicon;354
9.2.5.3;Pattern Lexicon;355
9.2.5.4; AFINN Lexicon;355
9.2.5.5; SentiWordNet Lexicon;357
9.2.5.6; VADER Lexicon;359
9.2.6;Classifying Sentiment with Supervised Learning;362
9.2.7; Traditional Supervised Machine Learning Models;363
9.2.8;Newer Supervised Deep Learning Models;366
9.2.9; Advanced Supervised Deep Learning Models;372
9.2.10;Analyzing Sentiment Causation;380
9.2.10.1; Interpreting Predictive Models;380
9.2.10.2; Analyzing Topic Models;385
9.2.11; Summary;389
9.3;Chapter 8: Customer Segmentation and Effective Cross Selling;390
9.3.1; Online Retail Transactions Dataset;391
9.3.2; Exploratory Data Analysis;391
9.3.3;Customer Segmentation;395
9.3.3.1; Objectives;395
9.3.3.1.1;Customer Understanding;395
9.3.3.1.2; Target Marketing;395
9.3.3.1.3;Optimal Product Placement;395
9.3.3.1.4;Finding Latent Customer Segments;396
9.3.3.1.5; Higher Revenue;396
9.3.3.2;Strategies;396
9.3.3.2.1; Clustering;396
9.3.3.2.2; Exploratory Data Analysis;396
9.3.3.2.3;Clustering vs. Customer Segmentation;396
9.3.3.3; Clustering Strategy;397
9.3.3.3.1; RFM Model for Customer Value;397
9.3.3.3.2; Data Cleaning;397
9.3.3.3.2.1;Recency;398
9.3.3.3.2.2;Frequency and Monetary Value;399
9.3.3.3.2.3;Data Preprocessing;400
9.3.3.3.2.4;Clustering for Segments;403
9.3.3.3.2.4.1; K-Means Clustering;403
9.3.3.3.2.5;Cluster Analysis;404
9.3.3.3.2.5.1; Cluster Descriptions;406
9.3.4;Cross Selling;409
9.3.4.1;Market Basket Analysis with Association Rule-Mining;410
9.3.4.2;Association Rule-Mining Basics;411
9.3.4.2.1; FP Growth;412
9.3.4.3;Association Rule-Mining in Action;413
9.3.4.3.1;Exploratory Data Analysis;413
9.3.4.3.2;Mining Rules;417
9.3.4.3.2.1;Orange Table Data Structure;417
9.3.4.3.2.2; Using the FP Growth Algorithm;418
9.3.5;Summary;422
9.4;Chapter 9: Analyzing Wine Types and Quality;423
9.4.1; Problem Statement;423
9.4.2; Setting Up Dependencies;424
9.4.3; Getting the Data;424
9.4.4; Exploratory Data Analysis;425
9.4.4.1; Process and Merge Datasets;425
9.4.4.2; Understanding Dataset Features;426
9.4.4.3; Descriptive Statistics;429
9.4.4.4; Inferential Statistics;430
9.4.4.5; Univariate Analysis;432
9.4.4.6;Multivariate Analysis;435
9.4.5; Predictive Modeling;442
9.4.6; Predicting Wine Types;443
9.4.7; Predicting Wine Quality;449
9.4.8; Summary;462
9.5;Chapter 10: Analyzing Music Trends and Recommendations;463
9.5.1; The Million Song Dataset Taste Profile;464
9.5.2; Exploratory Data Analysis;464
9.5.2.1; Loading and Trimming Data;464
9.5.2.2; Enhancing the Data;467
9.5.2.3; Visual Analysis;468
9.5.2.3.1; Most Popular Songs;468
9.5.2.3.2; Most Popular Artist;470
9.5.2.3.3; User vs. Songs Distribution;471
9.5.3; Recommendation Engines;472
9.5.3.1; Types of Recommendation Engines;473
9.5.3.2; Utility of Recommendation Engines;473
9.5.3.3; Popularity-Based Recommendation Engine;474
9.5.3.4; Item Similarity Based Recommendation Engine;475
9.5.3.5; Matrix Factorization Based Recommendation Engine;477
9.5.3.5.1; Matrix Factorization and Singular Value Decomposition;479
9.5.3.5.2; Building a Matrix Factorization Based Recommendation Engine;479
9.5.4; A Note on Recommendation Engine Libraries;482
9.5.5; Summary;482
9.6;Chapter 11: Forecasting Stock and Commodity Prices;483
9.6.1; Time Series Data and Analysis;483
9.6.1.1; Time Series Components;485
9.6.1.2; Smoothing Techniques;487
9.6.1.2.1; Moving Average;487
9.6.1.2.2; Exponential Smoothing;488
9.6.2; Forecasting Gold Price;490
9.6.2.1; Problem Statement;490
9.6.2.2; Dataset;490
9.6.2.3; Traditional Approaches;490
9.6.2.3.1; Key Concepts;491
9.6.2.3.2; ARIMA;491
9.6.2.4; Modeling;492
9.6.3; Stock Price Prediction;499
9.6.3.1; Problem Statement;500
9.6.3.2; Dataset;500
9.6.3.3; Recurrent Neural Networks: LSTM;501
9.6.3.3.1; Regression Modeling;502
9.6.3.3.2; Sequence Modeling;507
9.6.3.4; Upcoming Techniques: Prophet;511
9.6.4; Summary;513
9.7;Chapter 12: Deep Learning for Computer Vision;514
9.7.1; Convolutional Neural Networks;514
9.7.2; Image Classification with CNNs;516
9.7.2.1; Problem Statement;516
9.7.2.2; Dataset;516
9.7.2.3; CNN Based Deep Learning Classifier from Scratch;517
9.7.2.4; CNN Based Deep Learning Classifier with Pretrained Models;520
9.7.3; Artistic Style Transfer with CNNs;524
9.7.3.1; Background;525
9.7.3.2; Preprocessing;526
9.7.3.3; Loss Functions;528
9.7.3.3.1; Content Loss;528
9.7.3.3.2; Style Loss;528
9.7.3.3.3; Total Variation Loss;529
9.7.3.3.4; Overall Loss Function;529
9.7.3.4; Custom Optimizer;530
9.7.3.5; Style Transfer in Action;531
9.7.4; Summary;535
10;Index;536