Hugendubel.info - The B2B Online Bookshop


Classification - the Ubiquitous Challenge

E-book, PDF (1 - PDF Watermark)
704 pages
English
Springer Berlin Heidelberg, published 30.03.2006, 2005 edition
The contributions in this volume represent the latest research results in the field of classification, clustering, and data analysis. Besides the theoretical analysis, papers focus on various application fields such as archaeology, astronomy, bio-sciences, business, electronic data and web, finance and insurance, library science and linguistics, marketing, music science, and quality assurance.
Available formats
Book: softcover, paperback
EUR 213.99
E-book: PDF (1 - PDF Watermark)
EUR 213.99

Product

Details
Additional ISBN/GTIN: 9783540280842
Product type: E-book
Binding: E-book
Format: PDF
Format note: 1 - PDF Watermark
Format: E107
Year of publication: 2006
Publication date: 30.03.2006
Edition: 2005
Pages: 704
Language: English
Illustrations: XX, 704 p., 181 illus.
Item no.: 1424263
Categories
Genre: 9200

Contents/Reviews

Table of contents
1;Preface;6
2;Contents;14
3;Part I (Semi-) Plenary Presentations;22
3.1;Classification and Data Mining in Musicology;24
3.1.1;1 Introduction;24
3.1.2;2 Music, 1/f-noise, fractal and chaos;24
3.1.3;3 Music and entropy;25
3.1.4;4 Score information and performance;27
3.1.5;Acknowledgements;28
3.1.6;References;31
3.2;Bayesian Mixed Membership Models for Soft Clustering and Classification;32
3.2.1;1 Introduction;32
3.2.2;2 Mixed membership models;35
3.2.3;3 Disability types among older adults;37
3.2.3.1;3.1 National Long Term Care Survey;37
3.2.3.2;3.2 Applying the mixed membership model;38
3.2.4;4 Classifying publications by topic;39
3.2.4.1;4.1 Proceedings of the National Academy of Sciences;39
3.2.4.2;4.2 Applying the mixed membership model;40
3.2.4.3;4.3 An alternative approach with related data;43
3.2.4.4;4.4 Choosing K to describe PNAS topics;43
3.2.5;5 Summary and concluding remarks;44
3.2.6;Acknowledgments;45
3.2.7;References;45
3.3;Predicting Protein Secondary Structure with Markov Models;48
3.3.1;1 Introduction;48
3.3.2;2 The method;49
3.3.3;3 Improvements;50
3.3.4;4 Ongoing research;53
3.3.5;5 Summary;54
3.3.6;References;54
3.4;Milestones in the History of Data Visualization: A Case Study in Statistical Historiography;55
3.4.1;1 Introduction;55
3.4.1.1;1.1 The Milestones Project;56
3.4.2;2 Milestones tour;57
3.4.2.1;2.1 1600-1699: Measurement and theory;57
3.4.2.2;2.2 1700-1799: New graphic forms;58
3.4.2.3;2.3 1800-1850: Beginnings of modern graphics;58
3.4.2.4;2.4 1850-1900: The Golden Age of statistical graphics;60
3.4.2.5;2.5 1900-1950: The modern dark ages;60
3.4.2.6;2.6 1950-1975: Re-birth of data visualization;61
3.4.3;3 Problems and methods in statistical historiography;62
3.4.3.1;3.1 What counts as a Milestone?;62
3.4.3.2;3.2 Who gets credit?;63
3.4.3.3;3.3 Dating milestones;63
3.4.3.4;3.4 What is milestones data;64
3.4.3.5;3.5 Analyzing milestones data;64
3.4.3.6;3.6 What was he thinking?: Understanding through reproduction;64
3.4.3.7;3.7 What kinds of tools are needed?;65
3.4.4;4 How to visualize a history?;66
3.4.4.1;4.1 Lessons from the past;67
3.4.4.2;4.2 Lessons from the present;68
3.4.4.3;4.3 Lessons from the web;69
3.4.4.4;4.4 Lessons from the data visualization;70
3.4.5;Acknowledgments;70
3.4.6;References;71
3.5;Quantitative Text Typology: The Impact of Word Length;74
3.5.1;1 Introduction: Structuring the universe of texts;74
3.5.1.1;1.1 Classification and quantification;74
3.5.1.2;1.2 Quantitative text analysis: From a definition of the basics towards data homogeneity;75
3.5.1.3;1.3 Word length in a synergetic context;76
3.5.1.4;1.4 Qualitative and quantitative classifications: A priori and a posteriori;77
3.5.2;2 A case study: Classifying 398 Slovenian texts;78
3.5.2.1;2.1 Post hoc analysis of mean word length;80
3.5.2.2;2.2 Discriminant analyses: The whole corpus;80
3.5.2.3;2.3 From four to two letter types;81
3.5.2.4;2.4 Towards a new typology;82
3.5.2.5;2.5 Conclusion;85
3.5.3;References;85
3.6;Cluster Ensembles;86
3.6.1;1 Introduction;86
3.6.2;2 Consensus partitions;88
3.6.3;3 Extensions;91
3.6.4;References;92
3.7;Bootstrap Confidence Intervals for Three-way Component Methods;94
3.7.1;1 Introduction;94
3.7.2;2 The bootstrap for fully determined solutions;95
3.7.3;3 Smaller bootstrap intervals using transformations;98
3.7.4;4 Performance of bootstrap confidence intervals;99
3.7.5;5 An application: Bootstrap confidence intervals for results from a Tucker3 Analysis;100
3.7.6;6 Discussion;102
3.7.7;References;104
3.8;Organising the Knowledge Space for Software Components;106
3.8.1;1 Introduction;106
3.8.2;2 The software development process;107
3.8.3;3 A knowledge space for software development;109
3.8.4;4 Organising the knowledge space;110
3.8.4.1;4.1 Ontologies;110
3.8.4.2;4.2 A discovery and composition ontology;111
3.8.4.3;4.3 Description of components;112
3.8.4.4;4.4 Discovery and composition of components;113
3.8.5;5 Conclusions;116
3.8.6;References;116
3.9;Multimedia Pattern Recognition in Soccer Video Using Time Intervals;118
3.9.1;1 Introduction;118
3.9.2;2 Multimedia event classification framework;119
3.9.2.1;2.1 Pattern representation;120
3.9.2.2;2.2 Pattern classification;122
3.9.3;3 Highlight event classification in soccer broadcasts;124
3.9.4;4 Evaluation;126
3.9.4.1;4.1 Evaluation criteria;126
3.9.4.2;4.2 Classification results;127
3.9.5;5 Conclusion;128
3.9.6;References;129
3.10;Quantitative Assessment of the Responsibility for the Disease Load in a Population;130
3.10.1;1 Introduction;130
3.10.2;2 Basic definitions of attributable risk;131
3.10.3;3 Crude and adjusted attributable risk;132
3.10.4;4 Sequential attributable risk;133
3.10.5;5 Partial attributable risk;134
3.10.6;6 Illustrative example: The G.R.I.P.S. Study;135
3.10.7;7 Conclusion;136
3.10.8;Acknowledgment;137
3.10.9;References;137
4;Part II Classification and Data Analysis;140
4.1;Bootstrapping Latent Class Models;142
4.1.1;1 Introduction;142
4.1.2;2 Bootstrap analysis;143
4.1.3;3 Bootstrap analysis in finite mixture models;144
4.1.4;4 An application to the latent class model;145
4.1.5;5 Conclusion;149
4.1.6;References;149
4.2;Dimensionality of Random Subspaces;150
4.2.1;1 Introduction;150
4.2.2;2 Model aggregation;151
4.2.3;3 Random Subspace Method;152
4.2.4;4 Feature selection for ensembles;153
4.2.5;5 Proposed method;154
4.2.6;6 Related work;154
4.2.7;7 Experiments;155
4.2.8;8 Summary;156
4.2.9;References;156
4.3;Two-stage Classification with Automatic Feature Selection for an Industrial Application;158
4.3.1;1 Introduction;158
4.3.2;2 Two-stage classification;159
4.3.2.1;2.1 Motivation;159
4.3.2.2;2.2 First stage - object classification;160
4.3.2.3;2.3 Second stage - image sequence classification;160
4.3.2.4;2.4 Polynomial classifier;160
4.3.3;3 System optimization;161
4.3.3.1;3.1 Wrapper approach;161
4.3.3.2;3.2 Search strategies in feature subsets;162
4.3.3.3;3.3 Efficiency;162
4.3.4;4 Experimental results;163
4.3.5;5 Conclusion and outlook;165
4.3.6;References;165
4.4;Bagging, Boosting and Ordinal Classification;166
4.4.1;1 Introduction;166
4.4.2;2 Aggregating classifiers;166
4.4.3;3 Ordinal prediction;168
4.4.4;4 Empirical studies;171
4.4.5;5 Concluding remarks;172
4.4.6;References;173
4.5;A Method for Visual Cluster Validation;174
4.5.1;1 Introduction;174
4.5.2;2 Optimal projection for separation;176
4.5.3;3 Optimal projection for heterogeneity;177
4.5.4;4 Example;178
4.5.5;5 Conclusion;181
4.5.6;References;181
4.6;Empirical Comparison of Boosting Algorithms;182
4.6.1;1 Introduction;182
4.6.2;2 Arcing algorithms;183
4.6.2.1;2.1 Adaboost;183
4.6.2.2;2.2 Arcing family;185
4.6.3;3 Empirical study;185
4.6.3.1;3.1 Base classifier and performance measure;186
4.6.3.2;3.2 Results;186
4.6.4;4 Conclusion;187
4.6.5;References;188
4.7;Iterative Majorization Approach to the Distance-based Discriminant Analysis;189
4.7.1;1 Introduction;189
4.7.2;2 Problem formulation;190
4.7.3;3 Iterative majorization;191
4.7.4;4 Dimensionality reduction and multiple-class setting;193
4.7.5;5 Experimental results;194
4.7.6;References;196
4.8;An Extension of the CHAID Tree-based Segmentation Algorithm to Multiple Dependent Variables;197
4.8.1;1 Background and summary of approach;197
4.8.2;2 The CHAID algorithm;198
4.8.3;3 Latent class modeling;200
4.8.4;4 The hybrid CHAID algorithm;201
4.8.5;5 Empirical example;202
4.8.6;6 Final comments;203
4.8.7;References;204
4.9;Expectation of Random Sets and the Mean Values of Interval Data;205
4.9.1;1 Introduction;205
4.9.2;2 Reduction to characteristic points;206
4.9.2.1;3.1 The Aumann expectation;207
4.9.2.2;3.2 The Frechet expectation;207
4.9.2.3;3.3 The Doss expectation;208
4.9.2.4;3.4 The Vorob'ev expectation;208
4.9.3;4 Expectations of Random Closed Rectangles;209
4.9.3.1;4.1 The Aumann expectation;209
4.9.3.2;4.2 The Frechet expectation;211
4.9.3.3;4.3 The Doss expectation;211
4.9.3.4;4.4 The Vorob'ev expectation;211
4.9.4;5 Discussion;212
4.9.5;References;212
4.10;Experimental Design for Variable Selection in Data Bases;213
4.10.1;1 Introduction;213
4.10.2;2 Data;214
4.10.3;3 Plackett-Burman designs;214
4.10.4;4 Results;216
4.10.4.1;4.1 Stepwise regression by forward selection;216
4.10.4.2;4.2 Classification methods;216
4.10.4.3;4.3 Variable assessment;217
4.10.5;5 Conclusion;220
4.10.6;References;220
4.11;KMC/EDAM: A New Approach for the Visualization of K-Means Clustering Results;221
4.11.1;1 Introduction;221
4.11.2;2 Methods;222
4.11.2.1;2.1 Preliminaries;222
4.11.2.2;2.2 Basic idea;223
4.11.2.3;2.3 KMC/EDAM;223
4.11.3;3 Examples;225
4.11.4;4 Conclusion;228
4.11.5;References;228
4.12;Clustering of Variables with Missing Data: Application to Preference Studies;229
4.12.1;1 Introduction;229
4.12.2;2 Clustering of variables around latent components;230
4.12.3;3 Imputation methods;230
4.12.3.1;3.1 Direct imputation methods;230
4.12.3.2;3.2 Imputation within each cluster;230
4.12.3.3;3.3 Method based on a cross-partition;231
4.12.4;4 Illustration: data set jam;232
4.12.5;5 Simulation study;233
4.12.5.1;5.1 Jam data set;233
4.12.5.2;5.2 Simulated data;233
4.12.5.3;5.3 Criterion for comparison;233
4.12.5.4;5.4 Results;234
4.12.6;6 Conclusion;236
4.12.7;Acknowledgment;236
4.12.8;References;236
4.13;Binary On-line Classification Based on Temporally Integrated Information;237
4.13.1;1 General framework;237
4.13.1.1;1.1 Data format;238
4.13.1.2;1.2 On-line classification;238
4.13.2;2 Integration of information across time;239
4.13.3;3 Application;240
4.13.3.1;3.1 Neurophysiology;241
4.13.3.2;3.2 Model;241
4.13.3.3;3.3 Results;243
4.13.4;References;244
4.14;Different Subspace Classification;245
4.14.1;1 Introduction;245
4.14.2;2 Notation and method;246
4.14.2.1;2.1 Characteristic regions;246
4.14.2.2;2.2 Classification rule;247
4.14.3;3 Visualization;248
4.14.4;4 Parameter choice for DiSCo;249
4.14.4.1;4.1 Building the regions;249
4.14.4.2;4.2 Optimizing the thresholds;250
4.14.5;5 Simulation study;250
4.14.5.1;5.1 Data generation;250
4.14.5.2;5.2 Results;251
4.14.6;6 Summary;252
4.14.7;References;252
4.15;Density Estimation and Visualization for Data Containing Clusters of Unknown Structure;253
4.15.1;1 Introduction;253
4.15.2;2 Information optimal sets, Pareto Radius, PDE;254
4.15.3;3 PDE in one dimension: PDEplot;256
4.15.4;4 Measuring and visualization of density of high dimensional data;257
4.15.5;5 Summary;259
4.15.6;References;260
4.16;Hierarchical Mixture Models for Nested Data Structures;261
4.16.1;1 Introduction;261
4.16.2;2 Model formulation;262
4.16.2.1;2.1 Standard finite mixture model;262
4.16.2.2;2.2 Hierarchical finite mixture model;262
4.16.3;3 Maximum likelihood estimation by an adapted EM algorithm;264
4.16.4;4 An empirical example;265
4.16.5;5 Variants and extensions;267
4.16.6;References;268
4.17;Iterative Proportional Scaling Based on a Robust Start Estimator;269
4.17.1;1 Introduction;269
4.17.2;2 Covariance selection models;270
4.17.3;3 Iterative proportional scaling (IPS);271
4.17.4;4 IPS robustified;272
4.17.5;5 Model selection with RIPS;273
4.17.6;6 Open questions;276
4.17.7;References;276
4.18;Exploring Multivariate Data Structures with Local Principal Curves;277
4.18.1;1 Introduction;277
4.18.2;2 Local principal curves;278
4.18.3;3 Simulated data examples;281
4.18.4;5 Conclusion;283
4.18.5;References;283
4.19;A Three-way Multidimensional Scaling Approach to the Analysis of Judgments About Persons;285
4.19.1;1 Introduction;285
4.19.2;2 The structure of judgments about persons;285
4.19.3;3 SUMM-ID model;286
4.19.4;4 Application;290
4.19.5;5 Concluding remarks;291
4.19.6;References;292
4.20;Discovering Temporal Knowledge in Multivariate Time Series;293
4.20.1;1 Introduction;293
4.20.2;2 Data;294
4.20.3;3 Unification-based Temporal Grammar;294
4.20.4;4 Time Series Knowledge Mining;296
4.20.5;5 Discussion;298
4.20.6;6 Summary;299
4.20.7;Acknowledgements;299
4.20.8;References;300
4.21;A New Framework for Multidimensional Data Analysis;301
4.21.1;1 Information in data;301
4.21.2;2 Illustrative example;302
4.21.3;3 Geometric model for categorical data;304
4.21.4;4 Squared item-component correlation;304
4.21.5;5 Correlation between multidimensional variables;305
4.21.6;6 Decomposition of information in data and total information;306
4.21.7;7 Conclusion;307
4.21.8;References;308
4.22;External Analysis of Two-mode Three-way Asymmetric Multidimensional Scaling;309
4.22.1;1 Introduction;309
4.22.2;2 The method;310
4.22.3;3 An application;311
4.22.4;4 Discussion;314
4.22.5;References;316
4.23;The Relevance Vector Machine Under Covariate Measurement Error;317
4.23.1;1 Introduction;317
4.23.2;2 Nonparametric regression using the RVM;318
4.23.2.1;2.1 The RVM model setup;318
4.23.2.2;2.2 Inference;319
4.23.3;3 Covariate measurement error and its correction;320
4.23.3.1;3.1 The classical error model;320
4.23.3.2;3.2 Error correction using regression calibration;320
4.23.3.3;3.3 Error correction using SIMEX;321
4.23.3.4;3.4 Simulation results for the SIMEX;322
4.23.4;4 Discussion;323
4.23.5;Acknowledgements;324
4.23.6;References;324
5;Part III Applications;326
5.1;A Contribution to the History of Seriation in Archaeology;328
5.1.1;1 Introduction;328
5.1.2;2 The early years;328
5.1.3;3 Mathematical models;329
5.1.4;4 The method of Brainerd and Robinson;330
5.1.5;5 Permutation search;331
5.1.6;6 Towards correspondence analysis;332
5.1.7;References;335
5.2;Model-based Cluster Analysis of Roman Bricks and Tiles from Worms and Rheinzabern;338
5.2.1;1 Introduction and task;338
5.2.2;2 Model-based Gaussian clustering;340
5.2.3;3 Results and archaeological discussion;342
5.2.4;4 Conclusion;345
5.2.5;References;345
5.3;Astronomical Object Classification and Parameter Estimation with the Gaia Galactic Survey Satellite;346
5.3.1;1 The Gaia Galactic survey mission;346
5.3.2;2 Astrophysical data;346
5.3.3;3 Classification challenges;347
5.3.4;4 Outlook;348
5.3.5;References;349
5.4;Design of Astronomical Filter Systems for Stellar Classification Using Evolutionary Algorithms;351
5.4.1;1 Astrophysical context;351
5.4.2;2 The optimization model;352
5.4.2.1;2.1 Parametrization;352
5.4.2.2;2.2 Figure-of-merit (fitness);353
5.4.2.3;2.3 Evolutionary algorithm;354
5.4.3;3 Application, results and interpretation;355
5.4.4;4 Conclusions and future work;358
5.4.5;References;358
5.5;Analyzing Microarray Data with the Generative Topographic Mapping Approach;359
5.5.1;1 Introduction;359
5.5.2;2 Data structure;360
5.5.3;3 The GTM approach;361
5.5.4;4 Application to a data set;363
5.5.5;5 Summary and outlook;365
5.5.6;References;366
5.6;Test for a Change Point in Bernoulli Trials with Dependence;367
5.6.1;1 Introduction;367
5.6.2;2 Test problem;368
5.6.3;3 Intercalary independence of Markov processes;370
5.6.4;4 Strategies for performing a test;371
5.6.5;5 Example;372
5.6.6;References;373
5.7;Data Mining in Protein Binding Cavities;375
5.7.1;1 Introduction;375
5.7.2;2 Other approaches;376
5.7.3;3 Theory and algorithm;377
5.7.4;4 First results;379
5.7.5;5 Conclusions;380
5.7.6;References;381
5.8;Classification of In Vivo Magnetic Resonance Spectra;383
5.8.1;1 Introduction;383
5.8.2;2 Data;384
5.8.2.1;2.1 General features;384
5.8.2.2;2.2 Details;384
5.8.3;3 Methods;385
5.8.3.1;3.1 Evaluated algorithms;386
5.8.3.2;3.2 Benchmark settings;387
5.8.4;4 Results;388
5.8.5;5 Conclusions;390
5.8.6;References;390
5.9;Modifying Microarray Analysis Methods for Categorical Data - SAM and PAM for SNPs;391
5.9.1;1 Introduction;391
5.9.2;2 Multiple testing and the false discovery rate;392
5.9.3;3 Significance analysis of microarrays;393
5.9.4;4 SAM applied to single nucleotide polymorphisms;394
5.9.5;5 Prediction analysis of microarrays;395
5.9.6;6 Prediction analysis of SNPs;396
5.9.7;7 Discussion;397
5.9.8;References;398
5.10;Improving the Identification of Differentially Expressed Genes in cDNA Microarray Experiments;399
5.10.1;1 Introduction;399
5.10.2;2 Data sets, LogRatio, RelDiff;400
5.10.3;3 Comparison of LogRatio and RelDiff;401
5.10.4;4 Stabilization of variance;405
5.10.5;5 Summary;406
5.10.6;References;406
5.11;PhyNav: A Novel Approach to Reconstruct Large Phylogenies;407
5.11.1;1 Introduction;407
5.11.2;2 Minimal k-distance subsets;408
5.11.3;3 The PhyNav algorithm;409
5.11.4;4 The efficiency of PhyNav;409
5.11.4.1;4.1 Simulated datasets;410
5.11.4.2;4.2 Biological datasets;411
5.11.5;5 Discussion and conclusion;412
5.11.6;Acknowledgments;413
5.11.7;References;413
5.12;NewsRec, a Personal Recommendation System for News Websites;415
5.12.1;1 Introduction;415
5.12.2;2 Requirements, system design, and implementation details;417
5.12.3;3 Website classification and evaluation measures;418
5.12.4;4 Empirical results;419
5.12.5;5 Conclusions and outlook;420
5.12.6;References;422
5.13;Clustering of Large Document Sets with Restricted Random Walks on Usage Histories;423
5.13.1;1 Motivation;423
5.13.2;2 Clustering with purchase histories;424
5.13.3;3 Time complexity;428
5.13.4;4 Results;428
5.13.5;5 Outlook;430
5.13.6;References;430
5.14;Fuzzy Two-mode Clustering vs. Collaborative Filtering;431
5.14.1;1 Introduction;431
5.14.2;2 Two-mode data analysis;432
5.14.2.1;2.1 Memory-based Collaborative Filtering (CF);432
5.14.2.2;2.2 (Fuzzy) Two-Mode Clustering (FTMC);433
5.14.3;3 The Delta-Method for fuzzy two-mode clustering;434
5.14.4;4 Examples and comparisons;435
5.14.5;5 Conclusions;437
5.14.6;References;437
5.15;Web Mining and Online Visibility;439
5.15.1;1 Introduction - Why measurement of online visibility?;439
5.15.2;2 (Human) Online search in a changing webgraph;439
5.15.2.1;2.1 The web as a graph;440
5.15.2.2;2.2 (Human) Online searching and surfing behavior;441
5.15.3;3 Measurement of Online Visibility;441
5.15.3.1;3.1 Main drivers of Online Visibility;442
5.15.3.2;3.2 Web data used for our sample;442
5.15.3.3;3.3 The measure GOVis;443
5.15.3.4;3.4 Results;444
5.15.4;4 Conclusion and managerial implications;445
5.15.5;References;446
5.16;Analysis of Recommender System Usage by Multidimensional Scaling;447
5.16.1;1 Introduction;447
5.16.2;2 Methodology;448
5.16.3;3 Empirical results;449
5.16.3.1;3.1 The data sets;449
5.16.3.2;3.2 Representation of products and search profiles;450
5.16.3.3;3.3 Analysis of system usage;451
5.16.4;4 Summary;453
5.16.5;References;454
5.17;On a Combination of Convex Risk Minimization Methods;455
5.17.1;1 Introduction;455
5.17.2;2 Strategy;455
5.17.3;3 Kernel logistic regression and ε-support vector regression;458
5.17.4;4 Application;460
5.17.5;5 Discussion;462
5.17.6;Acknowledgments;462
5.17.7;References;462
5.18;Credit Scoring Using Global and Local Statistical Models;463
5.18.1;1 Introduction;463
5.18.2;2 Description of the data set;464
5.18.3;3 Global scoring model;464
5.18.3.1;3.1 Global scoring using logistic discriminant analysis;464
5.18.3.2;3.2 Classification rule under constraints;465
5.18.4;4 Local scoring by two-stage classification;466
5.18.4.1;4.1 Clustering using self-organizing maps;467
5.18.4.2;4.2 K-means cluster analysis;468
5.18.4.3;4.3 Evaluation of two-stage classification;468
5.18.5;5 Application to the test sample;469
5.18.6;6 Conclusions;470
5.18.7;References;470
5.19;Informative Patterns for Credit Scoring: Support Vector Machines Preselect Data Subsets for Linear Discriminant Analysis;471
5.19.1;1 Introduction;471
5.19.2;2 Linear SVM and LDA;472
5.19.3;3 Subset preselection for LDA: Empirical results;475
5.19.3.1;3.1 About typical and critical subsets;475
5.19.3.2;3.2 LDA with subset preselection;476
5.19.3.3;3.3 Comparing SVM, LDA and LDA-SP;476
5.19.3.4;3.4 Advantages of LDA with subset preselection;477
5.19.4;4 Conclusions;477
5.19.5;References;478
5.20;Application of Support Vector Machines in a Life Assurance Environment;479
5.20.1;1 Introduction;479
5.20.2;2 Support vector machines;480
5.20.3;3 Problem context and the data;481
5.20.4;4 A measure of variable importance;482
5.20.5;5 Results;484
5.20.6;References;486
5.21;Continuous Market Risk Budgeting in Financial Institutions;487
5.21.1;1 Introduction;487
5.21.2;2 Analysis framework;488
5.21.3;3 Time dimension of risk limits;489
5.21.4;4 Continuous risk budgeting;490
5.21.5;5 Simulation analysis;492
5.21.6;Acknowledgement;493
5.21.7;References;494
5.22;Smooth Correlation Estimation with Application to Portfolio Credit Risk;495
5.22.1;1 Introduction;495
5.22.2;2 The sector variable;496
5.22.3;3 Testing for independence;497
5.22.4;4 Model generation;498
5.22.5;5 A one-factor model;499
5.22.6;6 Algebraic approximation;500
5.22.7;7 Impact on the practical performance;501
5.22.8;References;501
5.22.9;A Appendix;502
5.23;How Many Lexical-semantic Relations are Necessary?;503
5.23.1;1 Introduction;503
5.23.2;2 Concept calculus;504
5.23.3;3 Diagrammatic representation;506
5.23.4;4 Concept and linguistic sign;509
5.23.5;5 Summary;510
5.23.6;References;510
5.24;Automated Detection of Morphemes Using Distributional Measurements;511
5.24.1;1 Overview and introduction;511
5.24.2;2 Why bother with the segmentation of words at all?;512
5.24.3;3 The historical background of research: Distributional analysis;512
5.24.4;4 Basic method;513
5.24.5;5 Refinements of the evaluation;515
5.24.6;6 Transferring graphemic to phonemic representation;516
5.24.7;7 Concluding remarks;517
5.24.8;References;518
5.25;Classification of Author and/or Genre? The Impact of Word Length;519
5.25.1;1 Word length and the quantitative description of text(s) and author(s);519
5.25.2;2 A case study: text basis and analytical options;520
5.25.3;3 Methods of text discrimination;521
5.25.3.1;3.1 Quantitative measures for text analysis;522
5.25.3.2;3.2 Discriminant analysis;523
5.25.3.3;3.3 Statistical distance as a measure for data discrimination;523
5.25.4;4 Summary;526
5.25.5;References;526
5.26;Some Historical Remarks on Library Classification - a Short Introduction to the Science of Library Classification;527
5.26.1;1 Introduction;527
5.26.2;2 Classified arrangement in monastery libraries of the Middle Ages;528
5.26.3;3 Classified arrangement in private libraries of the Middle Ages;528
5.26.4;4 Classified arrangement in the late Middle Ages and at the beginning of modern times;529
5.26.5;6 Systematic cataloguing in the 18th century;530
5.26.6;7 Subject cataloguing in the 19th century;530
5.26.7;8 Subject cataloguing in the 20th century;531
5.26.8;References;532
5.27;Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry;534
5.27.1;1 Introduction;534
5.27.2;2 Pair-wise data clustering;535
5.27.3;3 Resampling techniques based on weights of observations;536
5.27.4;4 Rand's measure for comparing partitions;536
5.27.5;5 A simulation study;538
5.27.6;6 Application in quantitative linguistics;539
5.27.7;7 Conclusions;540
5.27.8;References;541
5.28;Discovering the Senses of an Ambiguous Word by Clustering its Local Contexts;542
5.28.1;1 Introduction;542
5.28.2;2 Approach;543
5.28.3;3 Algorithm;544
5.28.4;4 Results;546
5.28.5;5 Conclusions and prospects;548
5.28.6;Acknowledgements;549
5.28.7;References;549
5.29;Document Management and the Development of Information Spaces;550
5.29.1;1 Starting point and task;550
5.29.2;2 Implementation;550
5.29.3;3 Representation of the information space;551
5.29.4;4 Processing flow text;551
5.29.5;5 Processing partially structured documents;554
5.29.6;6 Summary and outlook;556
5.29.7;References;557
5.30;Stochastic Ranking and the Volatility Croissant: A Sensitivity Analysis of Economic Rankings;558
5.30.1;1 Introduction;558
5.30.2;2 Index definition and ranking;559
5.30.3;3 Data;561
5.30.4;4 Sensitivity analysis by randomised weights;562
5.30.5;5 Ranking results;563
5.30.6;6 Conclusions;565
5.30.7;References;565
5.31;Importance Assessment of Correlated Predictors in Business Cycles Classification;566
5.31.1;1 Problem;566
5.31.1.1;1.1 Introduction;566
5.31.1.2;1.2 Measures of importance;567
5.31.2;2 Correlated predictors in regression models;567
5.31.2.1;2.1 Overview;567
5.31.2.2;2.2 Orthogonalization;568
5.31.3;3 Correlated predictors in classification models;569
5.31.3.1;3.1 Orthogonalization;569
5.31.3.2;3.2 Using a large number of variables;569
5.31.3.3;3.3 Results for the business cycle model;570
5.31.4;4 Discussion and outlook;571
5.31.5;References;573
5.32;Economic Freedom in the 25-Member European Union: Insights Using Classification Tools;574
5.32.1;1 Introduction;574
5.32.2;2 Data description and distance measures;575
5.32.2.1;2.1 Description of the economic freedom index data;575
5.32.2.2;2.2 Distance measures;576
5.32.3;3 Cluster analysis methods and cluster patterns;578
5.32.3.1;3.1 Cluster analysis methods;578
5.32.3.2;3.2 Empirical cluster patterns;579
5.32.4;4 Conclusion and outlook;581
5.32.5;References;581
5.33;Intercultural Consumer Classifications in E-Commerce;582
5.33.1;1 Introduction;582
5.33.2;2 The concept of construction consumer typologies;582
5.33.3;3 Characteristics for constructing typologies relevant for E-Commerce;583
5.33.3.1;3.1 Requirements regarding criteria used for constructing typologies;583
5.33.3.2;3.2 Selected constructs for a classification;583
5.33.4;4 Empirical survey of the typology theory;584
5.33.4.1;4.1 Survey design and data collection;584
5.33.4.2;4.2 A typology of online customers;585
5.33.5;5 Conclusion;588
5.33.6;References;588
5.34;Reservation Price Estimation by Adaptive Conjoint Analysis;590
5.34.1;1 Introduction;590
5.34.2;2 Conjoint analysis for reservation price estimation;591
5.34.3;3 Reservation price estimation based on economic theory;592
5.34.4;4 Application of the method;595
5.34.5;5 Conclusion and further research;596
5.34.6;References;597
5.35;Estimating Reservation Prices for Product Bundles Based on Paired Comparison Data;598
5.35.1;1 Introduction;598
5.35.2;2 Gathering data for conjoint measurement;599
5.35.2.1;2.1 Direct vs. indirect elicitation of reservation prices;599
5.35.2.2;2.2 Relative direct elicitation of reservation prices;600
5.35.3;3 Study design and application situation;601
5.35.4;4 Results;602
5.35.5;5 Discussion;604
5.35.6;References;604
5.36;Classification of Perceived Musical Intervals;606
5.36.1;1 Background;606
5.36.2;2 Experimental setting;608
5.36.3;3 Results;610
5.36.4;4 Conclusion;612
5.36.5;References;613
5.37;In Search of Variables Distinguishing Low and High Achievers in Music Sight Reading Task;614
5.37.1;1 Background;614
5.37.2;2 Method;615
5.37.3;3 Results;617
5.37.4;4 Discussion;619
5.37.5;References;620
5.38;Automatic Feature Extraction from Large Time Series;621
5.38.1;1 Introduction;621
5.38.2;2 Systematization of statistical methods;622
5.38.2.1;2.1 Windowing extends the method space;622
5.38.2.2;2.2 Method trees for feature extraction;623
5.38.2.3;2.3 Dynamic windowing in method trees;624
5.38.3;3 Automatic feature extraction;625
5.38.4;4 Experiments;626
5.38.4.1;4.1 Results;627
5.38.5;5 Conclusion;627
5.38.6;References;628
5.39;Identification of Musical Instruments by Means of the Hough-Transformation;629
5.39.1;1 The Hough-transform;629
5.39.2;2 Application to sound data;630
5.39.2.1;2.1 Digital sounds;630
5.39.2.2;2.2 Motivation: signal edges;630
5.39.2.3;2.3 Parametrization;631
5.39.2.4;2.4 Resulting data format;631
5.39.3;3 Classification;632
5.39.3.1;3.1 Approaches;632
5.39.3.2;3.2 Data set;633
5.39.3.3;3.3 Methods;633
5.39.3.4;3.4 Variable selection;634
5.39.3.5;3.5 Results;634
5.39.3.6;3.6 Comparing the results;635
5.39.4;4 Conclusions;636
5.39.5;References;636
5.40;Support Vector Machines for Bass and Snare Drum Recognition;637
5.40.1;1 Introduction;637
5.40.2;2 Previous work;638
5.40.3;3 Data gathering;639
5.40.4;4 Descriptors for audio;640
5.40.5;5 Support Vector Machines;641
5.40.6;6 Experiments and results;642
5.40.7;7 Conclusions and future work;643
5.40.8;Acknowledgements;644
5.40.9;References;644
5.41;Register Classification by Timbre;645
5.41.1;1 Introduction;645
5.41.2;2 Data;646
5.41.3;3 Classification methods;647
5.41.4;4 Results;648
5.41.4.1;4.1 Individual tones, voices only;648
5.41.4.2;4.2 Individual tones, voices and instruments;649
5.41.4.3;4.3 Averaged tones, voices only;649
5.41.4.4;4.4 Averaged tones, voices and instruments;649
5.41.5;5 Acoustics;650
5.41.6;6 Conclusion;651
5.41.7;References;652
5.42;Classification of Processes by the Lyapunov Exponent;653
5.42.1;1 Introduction;653
5.42.2;2 Lyapunov exponent;654
5.42.3;3 Well-predictable and not-well-predictable processes;656
5.42.4;4 Experimental results;658
5.42.5;5 Conclusion;659
5.42.6;References;659
5.43;Desirability to Characterize Process Capability;661
5.43.1;1 Introduction;661
5.43.2;2 Combining capability and desirability - the indices EDU and EDM;663
5.43.3;3 Discussion;665
5.43.4;4 Estimation;666
5.43.5;5 Simulation;667
5.43.6;6 Conclusion;668
5.43.7;References;668
5.44;Application and Use of Multivariate Control Charts in a BTA Deep Hole Drilling Process;669
5.44.1;1 Introduction;669
5.44.2;2 Monitoring the process using multiple Residual Shewhart control charts;670
5.44.3;3 Monitoring the process using multivariate control charts;671
5.44.3.1;3.1 Data depth;671
5.44.3.2;3.2 A control chart based on sequential rank of data depth measures;672
5.44.4;4 Application;673
5.44.4.1;4.1 Choice of the control charts parameters;673
5.44.4.2;4.2 Results;674
5.44.4.3;4.3 Discussion;675
5.44.5;5 Conclusion;675
5.44.6;Acknowledgements;676
5.44.7;References;676
5.45;Determination of Relevant Frequencies and Modeling Varying Amplitudes of Harmonic Processes;677
5.45.1;1 Introduction;677
5.45.2;2 Determination of the distribution of periodogram ordinates;678
5.45.3;3 Regression models on periodogram ordinates;679
5.45.3.1;3.1 Modelling varying amplitudes;679
5.45.3.2;3.2 Estimating the variance of ε (σ_ε²);680
5.45.4;4 Simulation study on time-varying amplitudes;680
5.45.4.1;4.1 Design considerations;680
5.45.4.2;4.2 Results;681
5.45.5;5 Conclusions;684
5.45.6;References;684
6;Part IV Contest: Social Milieus in Dortmund;686
6.1;Introduction to the Contest Social Milieus in Dortmund;688
6.1.1;1 Contest goal and data;688
6.2;Application of a Genetic Algorithm to Variable Selection in Fuzzy Clustering;695
6.2.1;1 The problem;695
6.2.2;2 Tackling the problem;695
6.2.3;3 Methods;696
6.2.3.1;3.1 Fuzzy clustering;696
6.2.3.2;3.2 Measuring the clustering quality;697
6.2.3.3;3.3 Defining subgroups of variables;697
6.2.3.4;3.4 Genetic optimization algorithms;698
6.2.3.5;3.5 Implementation;699
6.2.4;4 Applying the procedure;699
6.2.4.1;4.1 The Dortmund data;699
6.2.4.2;4.2 Results;700
6.2.4.3;4.3 Comparing the results;701
6.2.5;5 Summary;702
6.2.6;References;702
6.3;Annealed k-Means Clustering and Decision Trees;703
6.3.1;1 Introduction;703
6.3.2;2 Preprocessing;704
6.3.3;3 Clustering;704
6.3.3.1;3.1 Annealed k-means;704
6.3.3.2;3.2 Learning about k;705
6.3.3.3;3.3 Solution;706
6.3.4;4 Classification;706
6.3.5;5 Interpretation;708
6.3.6;6 Outlook;709
6.3.7;References;710
6.4;Correspondence Clustering of Dortmund City Districts;711
6.4.1;1 Introduction;711
6.4.2;2 Material and methods;712
6.4.3;3 Results;715
6.4.4;4 Conclusion;718
6.4.5;References;718
6.4.6;Keywords;719
6.4.7;Authors;724
Excerpt
Quantitative Assessment of the Responsibility for the Disease Load in a Population (p. 109-110)

Wolfgang Uter and Olaf Gefeller

Department of Medical Informatics, Biometry and Epidemiology,
University of Erlangen-Nuremberg, Germany

Abstract. The concept of attributable risk (AR), introduced more than 50 years ago, quantifies the proportion of cases diseased due to a certain exposure (risk) factor. While valid approaches to the estimation of crude or adjusted AR exist, a problem remains concerning the attribution of AR to each of a set of several exposure factors. Inspired by mathematical game theory, namely, the axioms of fairness and the Shapley value, introduced by Shapley in 1953, the concept of partial AR has been developed. The partial AR offers a unique solution for allocating shares of AR to a number of exposure factors of interest, as illustrated by data from the German Göttingen Risk, Incidence, and Prevalence Study (G.R.I.P.S.).


1 Introduction

Analytical epidemiological studies aim at providing quantitative information on the association between a certain exposure, or several exposures, and some disease outcome of interest. Usually, the disease etiology under study is multifactorial, so that several exposure factors have to be considered simultaneously. The effect of a particular exposure factor on the dichotomous disease variable is quantified by some measure of association, including the relative risk (RR) or the odds ratio (OR), which will be explained in the next section.
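The two measures of association named above can be illustrated with a short Python sketch. It uses the standard textbook definitions for a 2x2 table; the function names and the cohort counts are hypothetical and do not come from the study discussed in the paper.

```python
def relative_risk(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Risk of disease among the exposed divided by the risk among the unexposed."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return risk_exposed / risk_unexposed


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of disease among the exposed divided by the odds among the unexposed."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical cohort: 200 exposed (30 cases), 800 unexposed (40 cases).
rr = relative_risk(30, 170, 40, 760)
odds = odds_ratio(30, 170, 40, 760)
print(f"RR = {rr:.2f}, OR = {odds:.2f}")
```

In this made-up cohort the exposed carry three times the risk of the unexposed; the odds ratio is somewhat larger because the disease is not rare among the exposed.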

While these measures indicate by which factor the disease risk increases if a certain exposure factor is present in an individual, the concept of attributable risk (AR) addresses the impact of an exposure on the overall disease load in the population. This paper focusses on the AR, which can be informally introduced as the answer to the question, "what proportion of the observed cases of disease in the study population suffers from the disease due to the exposure of interest?". In providing this information the AR places the concept of RR commonly used in epidemiology in a public health perspective, namely by also providing an answer to the reciprocal question, "what proportion of cases of disease could - theoretically - be prevented if the exposure factor could be entirely removed by some adequate preventive action?". Since its introduction in 1953 (Levin (1953)), the concept of AR has been increasingly used by epidemiological researchers. However, while the methodology of this invaluable epidemiological measure has constantly been extended to cover a variety of epidemiological situations, its practical use has not followed these advances satisfactorily (reviewed by Uter and Pfahlberg (1999)).

One of the difficulties in applying the concept of AR is the question of how to adequately estimate the AR associated with several exposure factors of interest, and not just one single exposure factor. The present paper briefly introduces the concept of sequential attributable risk (SAR) and then focusses on the partial attributable risk (PAR), following an axiomatic approach founded on game theory. For illustrative purposes, data from a German cohort study on risk factors for myocardial infarction are used.
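The allocation idea can be made concrete with a small Python sketch. The excerpt does not give the formulas, so the code assumes the usual Shapley-value construction: the joint AR of a set of factors is the proportion of cases avoided if those factors were removed, the sequential AR of a factor is the gain in joint AR when it is removed at a given position in an ordering, and the partial AR is the average of these gains over all orderings. The factor names, stratum probabilities and stratum risks are invented for illustration and are not taken from the G.R.I.P.S. data.

```python
from itertools import permutations

# Hypothetical population with two binary exposure factors (illustrative names).
FACTORS = ("smoking", "hypertension")
# P(stratum): probability of each exposure combination (1 = exposed).
STRATUM_PROB = {(0, 0): 0.5, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.1}
# P(disease | stratum): assumed disease risk in each exposure combination.
STRATUM_RISK = {(0, 0): 0.02, (1, 0): 0.06, (0, 1): 0.04, (1, 1): 0.12}


def disease_probability(removed):
    """Overall disease probability if the factors indexed in `removed` were eliminated."""
    total = 0.0
    for stratum, prob in STRATUM_PROB.items():
        # Everyone keeps their other exposures; removed factors are set to 0.
        counterfactual = tuple(0 if i in removed else x for i, x in enumerate(stratum))
        total += prob * STRATUM_RISK[counterfactual]
    return total


def joint_ar(removed):
    """Proportion of cases avoided by eliminating the factors in `removed`."""
    baseline = disease_probability(set())
    return (baseline - disease_probability(set(removed))) / baseline


def partial_ar():
    """Average each factor's sequential AR over all removal orderings (Shapley value)."""
    shares = [0.0] * len(FACTORS)
    orderings = list(permutations(range(len(FACTORS))))
    for order in orderings:
        removed = set()
        for i in order:
            before = joint_ar(removed)
            removed.add(i)
            shares[i] += joint_ar(removed) - before  # sequential AR of factor i
    return {FACTORS[i]: shares[i] / len(orderings) for i in range(len(FACTORS))}


shares = partial_ar()
for name, value in shares.items():
    print(f"partial AR({name}) = {value:.3f}")
print(f"joint AR(all factors) = {joint_ar({0, 1}):.3f}")
```

By construction the partial ARs add up to the joint AR of all factors, which is the property that makes them usable as shares.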

Author