Hugendubel.info - The B2B Online Bookstore


Statistical Hypothesis Testing with SAS and R

E-book: EPUB (2 - DRM Adobe)
312 pages
English
John Wiley & Sons, published 9 January 2014, 1st edition
This book provides a reference guide to statistical tests and their application to data using SAS and R. A general summary of statistical test theory is presented, along with a general description for each test, together with necessary prerequisites, assumptions, and the formal test problem. The test statistic is stated together with annotations on its distribution, along with examples in both SAS and R. Each example contains the code to perform the test, the output, and remarks that explain necessary program parameters.
Available formats
Book, hardcover: EUR 110.50
E-book, EPUB (2 - DRM Adobe): EUR 73.99
E-book, PDF (2 - DRM Adobe / Adobe Ebook Reader): EUR 73.99

Product

Details
Additional ISBN/GTIN: 9781118762615
Product type: E-book
Binding: E-book
Format: EPUB
Format note: 2 - DRM Adobe / EPUB
Format: reflowable (automatic page breaks)
Year of publication: 2014
Publication date: 9 January 2014
Edition: 1st edition
Pages: 312
Language: English
File size: 10901 KB
Item no.: 2951246
Categories
Genre: 9201

Contents/Reviews

Excerpt
Chapter 1
Statistical hypothesis testing
1.1 Theory of statistical hypothesis testing

Hypothesis testing is a key tool in statistical inference next to point estimation and confidence sets. All three concepts make an inference about a population based on a sample taken from it. Hypothesis testing aims at a decision on whether or not a hypothesis on the nature of the population is supported by the sample.

In the following we briefly run through the steps of a statistical test procedure and introduce the notation used throughout this book. For a detailed mathematical explanation please refer to the book by Lehmann (1997).

We denote a sample of size $n$ by $x_1, \ldots, x_n$, where the $x_i$ are observations of independent and identically distributed random variables $X_i$, $i = 1, \ldots, n$. Usually some further assumptions are needed concerning the nature of the mechanism generating the sample. These can be rather general assumptions, such as a symmetric continuous distribution. Often a parametric distribution is assumed with only the parameter values unknown, for example, the Gaussian distribution with unknown mean, unknown variance, or both. In this case hypothesis tests deal with statements on the unknown population parameters. We illustrate our general discussion with this situation.

Each of the statistical tests presented in the following chapters is introduced by a verbal description of the type of conjecture to be decided upon, together with the assumptions made. Next the test problem is formalized by the null hypothesis $H_0$ and the alternative hypothesis $H_1$. If a statement on population parameters is of interest, the parameter space $\Theta$ is often partitioned into disjoint sets $\Theta_0$ and $\Theta_1$ with $\Theta = \Theta_0 \cup \Theta_1$, corresponding to $H_0$ and $H_1$, respectively.
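As an illustration of this formalization, consider a Gaussian sample with unknown mean and known variance. This is an assumed example, with $\mu_0$ denoting a hypothetical reference value chosen by the researcher. The two-sided test problem and the corresponding partition of the parameter space could then be written as:

```latex
H_0\colon \mu = \mu_0 \quad \text{vs.} \quad H_1\colon \mu \neq \mu_0,
\qquad
\Theta = \mathbb{R}, \quad
\Theta_0 = \{\mu_0\}, \quad
\Theta_1 = \mathbb{R} \setminus \{\mu_0\}.
```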

As the next building block of a statistical test, the test statistic, which is a function of the random sample, is stated. This function fulfills two criteria. First, its value must provide insight on whether or not the null hypothesis might be true. Second, the distribution of the test statistic must be known, given that the null hypothesis is true. Table 1.1 shows the four possible outcomes of a statistical test. In two of the cases the result of the test is a correct decision: a true null hypothesis is not rejected, or a false null hypothesis is rejected. If the null hypothesis is true but is rejected as a result of the test, a type I error occurs. In the opposite situation, where the alternative hypothesis $H_1$ is true in nature but the test does not reject the null hypothesis, a type II error occurs.

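Table 1.1 Possible results in statistical testing.

Test decision                     | Null hypothesis true | Alternative hypothesis true
Do not reject the null hypothesis | Correct decision     | Type II error
Reject the null hypothesis        | Type I error         | Correct decision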

Generally, unless the sample size or the hypotheses are changed, a decrease in the probability of a type I error causes an increase in the probability of a type II error and vice versa. The significance level $\alpha$ fixes the maximal probability of a type I error, and the critical region of the test is chosen according to this condition. If the observed value of the test statistic lies in the critical region, the null hypothesis is rejected. Hence, the error probability is under control when a decision is made against the null hypothesis $H_0$ but not when the decision is for $H_0$, which needs to be kept in mind when drawing conclusions from test results. Where possible, the researcher's conjecture is formulated as the alternative hypothesis, because it is primarily the type I error that is controlled. However, in goodness-of-fit tests one is forced to formulate the researcher's hypothesis, that is, the specific distribution of interest, as the null hypothesis, as it is otherwise usually infeasible to derive the distribution of the test statistic.
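The decision rule based on a critical region can be sketched in R for a two-sided Gauss test (z-test) of a mean with known standard deviation. This is only a minimal sketch: the data values, the hypothesized mean `mu0`, the standard deviation `sigma`, and the significance level `alpha` are illustrative assumptions and are not taken from the book.

```r
# Illustrative data and settings (assumptions, not from the text)
x     <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.7)  # observed sample
mu0   <- 5.0    # mean under the null hypothesis
sigma <- 0.4    # standard deviation, assumed to be known
alpha <- 0.05   # significance level

# Test statistic: standard normal if the null hypothesis is true
z <- sqrt(length(x)) * (mean(x) - mu0) / sigma

# Two-sided critical region: |z| exceeds the upper alpha/2 quantile of N(0, 1)
z_crit <- qnorm(1 - alpha / 2)
reject <- abs(z) > z_crit

cat("z =", round(z, 3), " critical value =", round(z_crit, 3),
    " reject null hypothesis:", reject, "\n")
```

Only the quantile `z_crit` depends on the significance level, so choosing a smaller `alpha` shrinks the critical region and makes a rejection less likely for the same data.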

The power function measures the quality of a test. It yields the probability of rejecting the null hypothesis for a given true parameter value $\theta$. The test with the greatest power among all tests with a given significance level is called the most powerful test.
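To make the notion of a power function concrete, the following sketch evaluates it for a two-sided Gauss test with known standard deviation. The function name `power_gauss` and all numeric settings are hypothetical choices made for illustration; they do not come from the book.

```r
# Power function of a two-sided Gauss test with known sigma.
# All names and values are illustrative assumptions.
power_gauss <- function(mu, mu0, sigma, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)
  shift  <- sqrt(n) * (mu - mu0) / sigma  # mean of the test statistic if mu is true
  # Probability that a N(shift, 1) variable falls in the critical region
  pnorm(z_crit - shift, lower.tail = FALSE) + pnorm(-z_crit - shift)
}

mu_grid <- seq(4.5, 5.5, by = 0.25)
pw      <- power_gauss(mu_grid, mu0 = 5, sigma = 0.4, n = 8)
print(round(cbind(mu = mu_grid, power = pw), 3))
```

At the hypothesized value `mu = mu0` the function returns the significance level itself, and the power grows as the true mean moves away from `mu0` in either direction.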

Traditionally a pre-specified significance level of $\alpha = 0.05$ or $\alpha = 0.01$ is selected. However, there is no reason why a different value should not be chosen.

Up to here we are in the context of the Neyman-Pearson test theory. Most statistical computer programs do not report whether the calculated test statistic lies within the critical region. Instead the p-value (probability value) is given. This is the probability, calculated under the assumption that the null hypothesis $H_0$ is true, of obtaining the observed value of the test statistic or a value that is more extreme in the direction of the alternative hypothesis. If the p-value is smaller than the significance level $\alpha$, $H_0$ is rejected; otherwise $H_0$ is not rejected.
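The p-value rule can be sketched in R for a two-sided Gauss test with known standard deviation. The data and all settings below are illustrative assumptions rather than an example from the book. The sketch also checks that comparing the p-value with the significance level gives the same decision as checking the critical region.

```r
# Illustrative data and settings (assumptions, not from the text)
x     <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.7)
mu0   <- 5.0    # mean under the null hypothesis
sigma <- 0.4    # standard deviation, assumed to be known
alpha <- 0.05   # significance level

# Test statistic: standard normal if the null hypothesis is true
z <- sqrt(length(x)) * (mean(x) - mu0) / sigma

# Two-sided p-value: probability of a value at least as extreme as |z|
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Decision via the p-value and via the critical region
reject_p      <- p_value < alpha
reject_region <- abs(z) > qnorm(1 - alpha / 2)

cat("p-value =", signif(p_value, 3), " reject null hypothesis:", reject_p, "\n")
```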

As already mentioned in the introduction this is the common approach. For further reading on the differences please refer to Goodman (1994), Hubbard and Bayarri (2003), Johnstone (1987), and Lehmann (1993).
1.2 Testing statistical hypotheses with SAS and R

Testing statistical hypotheses with SAS and R is very convenient, as many tests are already integrated in these software packages. In SAS, tests are invoked via procedures, while R uses functions. Although many test problems are handled in this way, situations may occur where a SAS procedure or an R function is not available. The reasons are manifold. The SAS Institute decides which statistical tests to include in SAS. Even if a newly developed test is accepted for inclusion in SAS, it takes some time to develop a new procedure or to incorporate the test in an existing SAS procedure. If a test is not implemented in a SAS procedure or in the R standard packages, it is likely to be found as a SAS macro or in R user packages, which are available through the World Wide Web. However, in this book we have refrained from presenting tests from SAS macros or R user packages for several reasons. We do not know how long macros, program code, or user packages will be supported by their programmers and will therefore remain available for newer versions of SAS or R. In addition, it is not possible to verify that the code is correct. If a statistical test is not implemented as a procedure in the SAS software or in the R standard packages, we provide an algorithm with short SAS and R code to circumvent these problems. All presented statistical tests are accompanied by an example of their use on a given dataset, so it is easy to retrace the example and to translate the code to your own datasets. Sometimes more than one SAS procedure or R function is available to perform a statistical test; we present only one way to do so.
1.2.1 Programming philosophy of SAS and R

Testing statistical hypotheses in SAS and in R is not the same. While R is a matrix-oriented language, SAS follows a different philosophy (except for SAS/IML). With a matrix-oriented language some calculations are easier. For instance, the average of a few observations, for example, the ages 1, 4, 2, and 5 of four children in a family, can be calculated with one line of code in R by applying the function mean() to the vector containing the values, c(1,4,2,5).
mean(c(1,4,2,5))
Here the numeric vector of data values to be analyzed is inserted directly in the R function. However, it is also possible to call data from a previously defined object, for example, a data frame children. In SAS, by contrast, a dataset children holds the variable age with observed values 1, 4, 2, and 5, and the SAS procedure proc means calculates the mean value. This type of programming philosophy need not be a disadvantage. It can save a lot of time, because the SAS procedures are very powerful and incorporate many statistical calculations in one go.
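The data-frame variant mentioned above might look as follows in R. The object name `children` and the variable name `age` follow the text; the exact listing used in the book is not reproduced here, so this is only a minimal sketch.

```r
# Store the four ages in a data frame (names taken from the text)
children <- data.frame(age = c(1, 4, 2, 5))

# Access the variable through the data frame and compute its mean
mean(children$age)
```

Using `children$age` instead of a literal vector keeps the data in one place, so the same object can be passed on to further functions.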

We assume that the reader is familiar with the basic programming features of SAS or R, such as data input and output, and only remark on some important points related to conducting statistical tests. Concerning the data format, one row per observation and one column per variable are usually suitable. However, in some cases it may be necessary to reorganize the dataset for a test procedure. We accompany our examples with small datasets (see Appendix A), so that it is easy to see how the data need to be arranged for the specific test.
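As a small illustration of such a reorganization, the following R sketch turns two separately stored groups into a layout with one row per observation and one column per variable. The values and the names `group_a`, `group_b`, `value`, and `group` are made up for this sketch and do not correspond to the datasets in Appendix A.

```r
# Two groups measured separately (illustrative values only)
group_a <- c(12.1, 11.4, 13.0, 12.6)
group_b <- c(10.9, 11.8, 10.2, 11.1)

# Long format: one row per observation, one column per variable.
# stack() returns the columns "values" and "ind", renamed here for clarity.
long <- stack(list(a = group_a, b = group_b))
names(long) <- c("value", "group")

print(long)
```

In this layout the grouping variable is an ordinary column, which is the arrangement that many test functions with a formula interface expect.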

In SAS most statistical tests are performed with procedures, which usually follow the schema:
proc proc-name data=dataset-name options;
  var variable-names options;
  options;
run;
The data= option identifies the dataset to be analyzed. If it is missing, the most recently created dataset is used. In some procedures it is necessary to set options that define the statistical test, for example, the value to test against or whether the test is one-sided or two-sided. The var statement is followed by the variables on which the test is to be performed. Sometimes further options can be stated in separate command lines, for instance to request an exact test. Note that some procedures differ from this general set-up. The procedure proc freq, for example, has no var statement but a table statement. Occasionally the statement class class-variable is needed, indicating a grouping variable that assigns each observation to a specific group. As the options of a procedure can be numerous and not all of them may be needed for the test at hand, we restrict our exposition to the indispensable options. The same applies to the output we present for the examples.

Conducting a statistical test in the program R usually only requires one line of code. The common layout of R functions...
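As a minimal illustration of a test that needs only one line of R code, the sketch below applies the function `t.test()` from the standard stats package to the four ages used earlier. The choice of a one-sample t-test and the hypothesized mean `mu = 2` are assumptions made purely for illustration.

```r
# Ages of the four children from the earlier example
age <- c(1, 4, 2, 5)

# One-sample t-test of the null hypothesis that the mean equals 2
# (the value 2 is an arbitrary illustrative choice)
result <- t.test(age, mu = 2, alternative = "two.sided", conf.level = 0.95)

print(result)   # test statistic, degrees of freedom, p-value, confidence interval
result$p.value  # the p-value can also be extracted directly
```

The returned object is a list of class `htest`, so individual components such as `statistic`, `parameter`, and `p.value` can be accessed by name.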

Authors

Dirk Taeger, Institute for Prevention and Occupational Medicine of the German Social Accident Insurance, Institute of the Ruhr-Universität Bochum (IPA), Bochum, Germany

Sonja Kuhnt, Department of Computer Science, Dortmund University of Applied Sciences and Arts, Dortmund, Germany