Hugendubel.info - The B2B Online Bookshop


Machine Learning Models and Algorithms for Big Data Classification

E-Book (PDF, watermarked)
359 pages
English
Springer US, published 20.10.2015, 1st ed. 2016
This book presents machine learning models and algorithms for addressing big data classification problems. Existing machine learning techniques such as the decision tree (a hierarchical approach), random forest (an ensemble hierarchical approach), and deep learning (a layered approach) are well suited to systems that must handle such problems. The book helps readers, especially students and newcomers to big data and machine learning, gain a quick understanding of these techniques and technologies; to that end, the theory, examples, and programs (in Matlab and R) have been simplified, hard-coded, repeated, or spaced out for clarity. They provide vehicles for testing and understanding the complicated concepts of the field. Readers are expected to run these programs, experiment with the examples, and then modify or write their own programs, advancing their knowledge toward solving more complex and challenging problems.

The presentation focuses on simplicity, readability, and dependability so that undergraduate and graduate students, as well as new researchers, developers, and practitioners in this field, can easily trust and grasp the concepts and learn them effectively. The book has been written to reduce mathematical complexity and to help the broadest possible readership understand the topics and take an interest in the field. It consists of four parts, with a total of 14 chapters. The first part focuses on the topics needed to analyze and understand data and big data. The second part covers the systems required for processing big data. The third part presents the topics required to understand and select machine learning techniques for classifying big data. Finally, the fourth part concentrates on scaling up machine learning, an important solution for modern big data problems.
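The blurb names two of the model families the book covers: the hierarchical approach (decision tree) and the ensemble hierarchical approach (random forest). The book's own programs are in Matlab and R; purely as an illustrative stand-in (not code from the book), the pure-Python sketch below shows both ideas in miniature: a one-split decision "stump", and a random-forest-style ensemble that majority-votes stumps trained on bootstrap samples. The names `train_stump` and `bagged_forest` and the toy data set are invented for this sketch.

```python
import random
from collections import Counter

def train_stump(data):
    """Fit a one-split decision tree (a stump) on (x, label) pairs."""
    majority = Counter(y for _, y in data).most_common(1)[0][0]
    best = None  # (error_count, threshold, left_label, right_label)
    for t in sorted({x for x, _ in data}):
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        if not left or not right:
            continue
        ly = Counter(left).most_common(1)[0][0]   # majority label on each side
        ry = Counter(right).most_common(1)[0][0]
        err = sum(y != ly for y in left) + sum(y != ry for y in right)
        if best is None or err < best[0]:
            best = (err, t, ly, ry)
    if best is None:  # degenerate bootstrap sample: fall back to majority label
        return lambda x: majority
    _, t, ly, ry = best
    return lambda x: ly if x <= t else ry

def bagged_forest(data, n_trees=25, seed=0):
    """Random-forest-style bagging: majority vote over bootstrap-trained stumps."""
    rng = random.Random(seed)
    stumps = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_trees)]
    return lambda x: Counter(s(x) for s in stumps).most_common(1)[0][0]

# Toy, linearly separable 1-D data: class A on the left, class B on the right.
data = [(x, "A") for x in (1, 2, 3, 4)] + [(x, "B") for x in (7, 8, 9, 10)]
forest = bagged_forest(data)
print(forest(2), forest(9))  # expected: A B
```

Real random forests split recursively on randomized feature subsets rather than using single stumps; the book treats decision tree learning and random forest learning in depth in Chapters 10 and 11.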



Shan Suthaharan is a Professor of Computer Science at the University of North Carolina at Greensboro (UNCG), North Carolina, USA. He also serves as the Director of Undergraduate Studies in the Department of Computer Science at UNCG. He has more than twenty-five years of university teaching and administrative experience and has taught both undergraduate and graduate courses. His aspiration is to educate and train students so that they can prosper in the computing field by understanding current, complex real-world problems and developing efficient techniques and technologies. His current teaching interests include big data analytics and machine learning, cryptography and network security, and computer networking and analysis. He earned his doctorate in Computer Science from Monash University, Australia. Since then, he has been actively disseminating his knowledge and experience through teaching, advising, seminars, research, and publications. Dr. Suthaharan enjoys investigating complex real-world problems and developing and implementing algorithms to solve them using modern technologies. The main theme of his current research is signature discovery and event detection for a secure and reliable environment. The ultimate goal of his research is to build a secure and reliable environment using modern and emerging technologies. His current research primarily focuses on the characterization and detection of environmental events, the exploration of machine learning techniques, and the development of advanced statistical and computational techniques to discover key signatures and detect emerging events in structured and unstructured big data. Dr. Suthaharan has authored or co-authored more than seventy-five research papers in computer science, published in international journals and refereed conference proceedings.
He also invented a key management and encryption technology that has been patented in Australia, Japan, and Singapore. He has received visiting scholar awards from, and served as a visiting researcher at, the University of Sydney, Australia; the University of Melbourne, Australia; and the University of California, Berkeley, USA. He was a senior member of the Institute of Electrical and Electronics Engineers and twice served as the elected chair of its Central North Carolina Section. He is a member of Sigma Xi, the Scientific Research Society, and a Fellow of the Institution of Engineering and Technology.
Available formats
Book, hardcover: EUR 160,49
Book, paperback: EUR 160,49
E-Book (PDF, watermarked): EUR 149,79

Product

Details
Further ISBN/GTIN: 9781489976413
Product type: E-Book
Binding: E-Book
Format: PDF
Format note: 1 - PDF Watermark
Format code: E107
Year of publication: 2015
Publication date: 20.10.2015
Edition: 1st ed. 2016
Series no.: 36
Pages: 359
Language: English
Illustrations: XIX, 359 p., 149 illus., 82 illus. in color
Item no.: 1843302

Contents / Reviews

Table of Contents
Preface  8
Acknowledgements  10
About the Author  12
Contents  14
1 Science of Information  21
  1.1 Data Science  21
    1.1.1 Technological Dilemma  22
    1.1.2 Technological Advancement  22
  1.2 Big Data Paradigm  23
    1.2.1 Facts and Statistics of a System  23
      1.2.1.1 Data  23
      1.2.1.2 Knowledge  24
      1.2.1.3 Physical Operation  24
      1.2.1.4 Mathematical Operation  25
      1.2.1.5 Logical Operation  25
    1.2.2 Big Data Versus Regular Data  25
      1.2.2.1 Scenario  25
      1.2.2.2 Data Representation  26
  1.3 Machine Learning Paradigm  27
    1.3.1 Modeling and Algorithms  27
    1.3.2 Supervised and Unsupervised  27
      1.3.2.1 Classification  28
      1.3.2.2 Clustering  29
  1.4 Collaborative Activities  30
  1.5 A Snapshot  30
    1.5.1 The Purpose and Interests  30
    1.5.2 The Goal and Objectives  31
    1.5.3 The Problems and Challenges  31
  Problems  31
  References  32
Part I Understanding Big Data  34
  2 Big Data Essentials  35
    2.1 Big Data Analytics  35
      2.1.1 Big Data Controllers  36
      2.1.2 Big Data Problems  37
      2.1.3 Big Data Challenges  37
      2.1.4 Big Data Solutions  38
    2.2 Big Data Classification  38
      2.2.1 Representation Learning  39
      2.2.2 Distributed File Systems  40
      2.2.3 Classification Modeling  41
        2.2.3.1 Class Characteristics  41
        2.2.3.2 Error Characteristics  42
        2.2.3.3 Domain Characteristics  43
      2.2.4 Classification Algorithms  43
        2.2.4.1 Training  44
        2.2.4.2 Validation  44
        2.2.4.3 Testing  44
    2.3 Big Data Scalability  44
      2.3.1 High-Dimensional Systems  45
      2.3.2 Low-Dimensional Structures  45
    Problems  46
    References  46
  3 Big Data Analytics  48
    3.1 Analytics Fundamentals  48
      3.1.1 Research Questions  49
      3.1.2 Choices of Data Sets  50
    3.2 Pattern Detectors  51
      3.2.1 Statistical Measures  51
        3.2.1.1 Counting  51
        3.2.1.2 Mean and Variance  51
        3.2.1.3 Covariance and Correlation  54
      3.2.2 Graphical Measures  55
        3.2.2.1 Histogram  55
        3.2.2.2 Skewness  55
        3.2.2.3 Scatter Plot  58
      3.2.3 Coding Example  58
    3.3 Patterns of Big Data  61
      3.3.1 Standardization: A Coding Example  64
      3.3.2 Evolution of Patterns  66
      3.3.3 Data Expansion Modeling  68
        3.3.3.1 Orthogonalization: A Coding Example  69
        3.3.3.2 No Mean-Shift, Max Weights, Gaussian Increase  72
        3.3.3.3 Mean-Shift, Max Weights, Gaussian Increase  72
        3.3.3.4 No Mean-Shift, Gaussian Weights, Gaussian Increase  74
        3.3.3.5 Mean-Shift, Gaussian Weights, Gaussian Increase  74
        3.3.3.6 Coding Example  74
      3.3.4 Deformation of Patterns  79
        3.3.4.1 Imbalanced Data  80
        3.3.4.2 Inaccurate Data  80
        3.3.4.3 Incomplete data  81
        3.3.4.4 Coding Example  82
      3.3.5 Classification Errors  83
        3.3.5.1 Approximation  83
        3.3.5.2 Estimation  84
        3.3.5.3 Optimization  84
    3.4 Low-Dimensional Structures  84
      3.4.1 A Toy Example  84
      3.4.2 A Real Example  86
        3.4.2.1 Relative Scoring  86
        3.4.2.2 Coding Example  87
    Problems  90
    References  91
Part II Understanding Big Data Systems  93
  4 Distributed File System  94
    4.1 Hadoop Framework  94
      4.1.1 Hadoop Distributed File System  95
      4.1.2 MapReduce Programming Model  96
    4.2 Hadoop System  96
      4.2.1 Operating System  97
      4.2.2 Distributed System  97
      4.2.3 Programming Platform  98
    4.3 Hadoop Environment  98
      4.3.1 Essential Tools  99
        4.3.1.1 Windows 7 (WN)  99
        4.3.1.2 VirtualBox (VB)  99
        4.3.1.3 Ubuntu Linux (UB)  99
        4.3.1.4 Cloudera Hadoop (CH)  100
        4.3.1.5 R and RStudio (RR)  100
      4.3.2 Installation Guidance  100
        4.3.2.1 Internet Resources  101
        4.3.2.2 Setting Up a Virtual Machine  102
        4.3.2.3 Setting Up a Ubuntu O/S  102
        4.3.2.4 Setting Up a Hadoop Distributed File System  103
        4.3.2.5 Setting Up an R Environment  104
        4.3.2.6 RStudio  107
      4.3.3 RStudio Server  108
        4.3.3.1 Server Setup  108
        4.3.3.2 Client Setup  108
    4.4 Testing the Hadoop Environment  109
      4.4.1 Standard Example  109
      4.4.2 Alternative Example  110
    4.5 Multinode Hadoop  110
      4.5.1 Virtual Network  111
      4.5.2 Hadoop Setup  111
    Problems  112
    References  112
  5 MapReduce Programming Platform  113
    5.1 MapReduce Framework  113
      5.1.1 Parametrization  114
      5.1.2 Parallelization  115
    5.2 MapReduce Essentials  116
      5.2.1 Mapper Function  116
      5.2.2 Reducer Function  117
      5.2.3 MapReduce Function  118
      5.2.4 A Coding Example  118
    5.3 MapReduce Programming  121
      5.3.1 Naming Convention  121
      5.3.2 Coding Principles  122
        5.3.2.1 Input: Initialization  122
        5.3.2.2 Input: Fork MapReduce job  123
        5.3.2.3 Input: Add Input to dfs  123
        5.3.2.4 Processing: Mapper  124
        5.3.2.5 Processing: Reducer  124
        5.3.2.6 Processing: MapReduce  124
        5.3.2.7 Output: Get Output from dfs  124
      5.3.3 Application of Coding Principles  124
        5.3.3.1 A Coding Example  125
        5.3.3.2 Pythagorean Numbers  126
        5.3.3.3 Summarization  127
    5.4 File Handling in MapReduce  127
      5.4.1 Pythagorean Numbers  128
      5.4.2 File Split Example  129
      5.4.3 File Split Improved  130
    Problems  132
    References  132
Part III Understanding Machine Learning  134
  6 Modeling and Algorithms  135
    6.1 Machine Learning  135
      6.1.1 A Simple Example  136
      6.1.2 Domain Division Perspective  137
      6.1.3 Data Domain  140
      6.1.4 Domain Division  141
    6.2 Learning Models  142
      6.2.1 Mathematical Models  144
      6.2.2 Hierarchical Models  146
      6.2.3 Layered Models  147
      6.2.4 Comparison of the Models  147
        6.2.4.1 Data Domain Perspective  147
        6.2.4.2 Programming Perspective  148
    6.3 Learning Algorithms  152
      6.3.1 Supervised Learning  152
      6.3.2 Types of Learning  153
    Problems  154
    References  154
  7 Supervised Learning Models  156
    7.1 Supervised Learning Objectives  156
      7.1.1 Parametrization Objectives  157
        7.1.1.1 Prediction Point of View  157
        7.1.1.2 Classification Point of View  158
      7.1.2 Optimization Objectives  159
        7.1.2.1 Prediction Point of View  160
        7.1.2.2 Classification Point of View  161
    7.2 Regression Models  161
      7.2.1 Continuous Response  162
      7.2.2 Theory of Regression Models  162
        7.2.2.1 Standard Regression  162
        7.2.2.2 Ridge Regression  165
        7.2.2.3 Lasso Regression  167
        7.2.2.4 Elastic-Net Regression  169
    7.3 Classification Models  171
      7.3.1 Discrete Response  171
      7.3.2 Mathematical Models  173
        7.3.2.1 Logistic Regression  173
        7.3.2.2 SVM Family  175
    7.4 Hierarchical Models  177
      7.4.1 Decision Tree  178
      7.4.2 Random Forest  178
        7.4.2.1 A Coding Example  180
    7.5 Layered Models  181
      7.5.1 Shallow Learning  182
        7.5.1.1 A Coding Example  182
      7.5.2 Deep Learning  188
        7.5.2.1 Some Modern Deep Learning Models  190
    Problems  190
    References  191
  8 Supervised Learning Algorithms  193
    8.1 Supervised Learning  193
      8.1.1 Learning  195
      8.1.2 Training  196
      8.1.3 Testing  198
      8.1.4 Validation  200
        8.1.4.1 Testing of Models on Seen Data  201
        8.1.4.2 Testing of Models on Unseen Data  201
        8.1.4.3 Testing of Models on Partially Seen and Unseen Data  202
    8.2 Cross-Validation  202
      8.2.1 Tenfold Cross-Validation  203
      8.2.2 Leave-One-Out  203
      8.2.3 Leave-p-Out  204
      8.2.4 Random Subsampling  205
      8.2.5 Dividing Data Sets  205
        8.2.5.1 Possible Ratios  206
        8.2.5.2 Significance  206
    8.3 Measures  206
      8.3.1 Quantitative Measure  207
        8.3.1.1 Distance-Based  207
        8.3.1.2 Irregularity-Based  207
        8.3.1.3 Probability-Based  208
      8.3.2 Qualitative Measure  208
        8.3.2.1 Visualization-Based  208
        8.3.2.2 Confusion-Based  209
        8.3.2.3 Oscillation-Based  211
    8.4 A Simple 2D Example  212
    Problems  214
    References  215
  9 Support Vector Machine  217
    9.1 Linear Support Vector Machine  217
      9.1.1 Linear Classifier: Separable Linearly  218
        9.1.1.1 The Learning Model  220
        9.1.1.2 A Coding Example: Two Points, Single Line  221
        9.1.1.3 A Coding Example: Two Points, Three Lines  222
        9.1.1.4 A Coding Example: Five Points, Three Lines  226
      9.1.2 Linear Classifier: Nonseparable Linearly  228
    9.2 Lagrangian Support Vector Machine  229
      9.2.1 Modeling of LSVM  229
      9.2.2 Conceptualized Example  229
      9.2.3 Algorithm and Coding of LSVM  230
    9.3 Nonlinear Support Vector Machine  233
      9.3.1 Feature Space  234
      9.3.2 Kernel Trick  234
      9.3.3 SVM Algorithms on Hadoop  237
        9.3.3.1 SVM: Reducer Implementation  237
        9.3.3.2 LSVM: Mapper Implementation  240
      9.3.4 Real Application  243
    Problems  244
    References  245
  10 Decision Tree Learning  246
    10.1 The Decision Tree  246
      10.1.1 A Coding Example-Classification Tree  250
      10.1.2 A Coding Example-Regression Tree  253
    10.2 Types of Decision Trees  254
      10.2.1 Classification Tree  255
      10.2.2 Regression Tree  256
    10.3 Decision Tree Learning Model  257
      10.3.1 Parametrization  257
      10.3.2 Optimization  258
    10.4 Quantitative Measures  259
      10.4.1 Entropy and Cross-Entropy  259
      10.4.2 Gini Impurity  261
      10.4.3 Information Gain  264
    10.5 Decision Tree Learning Algorithm  265
      10.5.1 Training Algorithm  266
      10.5.2 Validation Algorithm  272
      10.5.3 Testing Algorithm  272
    10.6 Decision Tree and Big Data  275
      10.6.1 Toy Example  275
    Problems  277
    References  278
Part IV Understanding Scaling-Up Machine Learning  279
  11 Random Forest Learning  280
    11.1 The Random Forest  280
      11.1.1 Parallel Structure  281
      11.1.2 Model Parameters  282
      11.1.3 Gain/Loss Function  283
      11.1.4 Bootstrapping and Bagging  283
        11.1.4.1 Bootstrapping  283
        11.1.4.2 Overlap Thinning  284
        11.1.4.3 Bagging  285
    11.2 Random Forest Learning Model  285
      11.2.1 Parametrization  286
      11.2.2 Optimization  286
    11.3 Random Forest Learning Algorithm  286
      11.3.1 Training Algorithm  287
        11.3.1.1 Coding Example  288
      11.3.2 Testing Algorithm  290
    11.4 Random Forest and Big Data  291
      11.4.1 Random Forest Scalability  291
      11.4.2 Big Data Classification  291
    Problems  294
    References  295
  12 Deep Learning Models  296
    12.1 Introduction  296
    12.2 Deep Learning Techniques  298
      12.2.1 No-Drop Deep Learning  298
      12.2.2 Dropout Deep Learning  298
      12.2.3 Dropconnect Deep Learning  299
      12.2.4 Gradient Descent  300
        12.2.4.1 Conceptualized Example  301
        12.2.4.2 Numerical Example  302
      12.2.5 A Simple Example  304
      12.2.6 MapReduce Implementation  305
    12.3 Proposed Framework  308
      12.3.1 Motivation  308
      12.3.2 Parameters Mapper  308
    12.4 Implementation of Deep Learning  310
      12.4.1 Analysis of Domain Divisions  310
      12.4.2 Analysis of Classification Accuracies  310
    12.5 Ensemble Approach  312
    Problems  313
    References  313
  13 Chandelier Decision Tree  315
    13.1 Unit Circle Algorithm  315
      13.1.1 UCA Classification  316
      13.1.2 Improved UCA Classification  317
      13.1.3 A Coding Example  318
      13.1.4 Drawbacks of UCA  321
    13.2 Unit Circle Machine  321
      13.2.1 UCM Classification  321
      13.2.2 A Coding Example  322
      13.2.3 Drawbacks of UCM  324
    13.3 Unit Ring Algorithm  324
      13.3.1 A Coding Example  325
      13.3.2 Unit Ring Machine  327
      13.3.3 A Coding Example  327
      13.3.4 Drawbacks of URM  329
    13.4 Chandelier Decision Tree  329
      13.4.1 CDT-Based Classification  330
      13.4.2 Extension to Random Chandelier  334
    Problems  334
    References  334
  14 Dimensionality Reduction  335
    14.1 Introduction  335
    14.2 Feature Hashing Techniques  336
      14.2.1 Standard Feature Hashing  337
      14.2.2 Flagged Feature Hashing  337
    14.3 Proposed Feature Hashing  338
      14.3.1 Binning and Mitigation  338
      14.3.2 Mitigation Justification  339
      14.3.3 Toy Example  339
    14.4 Simulation and Results  340
      14.4.1 A Matlab Implementation  340
      14.4.2 A MapReduce Implementation  343
    14.5 Principal Component Analysis  346
      14.5.1 Eigenvector  347
      14.5.2 Principal Components  349
      14.5.3 The Principal Directions  352
      14.5.4 A 2D Implementation  354
      14.5.5 A 3D Implementation  356
      14.5.6 A Generalized Implementation  358
    Problems  360
    References  360
Index  362

Author

More titles by
Suthaharan, Shan