Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386564315> ?p ?o ?g. }
Showing items 1 to 46 of 46 with 100 items per page.
- W4386564315 endingPage "922" @default.
- W4386564315 startingPage "920" @default.
- W4386564315 abstract "The distinction between the two uses of p-values described by Professor Greenland is related to two distinct interpretations of frequentist probability, that is, probability used to describe a random event. I will illustrate with a simple example. In the North Carolina Pick-4 lottery, 10 ping pong balls labeled with distinct digits from $$ I_9=\{0,1,\dots,9\} $$ are mixed in a clear container and opening a door allows a single ball to be selected. Prior to opening the door, blown air mixes the balls making equally likely selection of each ball plausible. This is repeated with three identical containers to obtain the remaining three digits. If a winning ticket is defined as one where the sum of the four digits exceeds 28, the state can charge $5 for a ticket with a $100 prize and expect a profit. There are 330 of $$ 10^4 $$ possible outcomes where the sum exceeds 28 so the expected value is $$ 0.033\times \$100=\$3.30 $$. This calculation requires no repeated sampling but it is natural for the state to interpret this value in the long run. For an individual ticket holder, all that is required is that each ball is given an equal chance to be selected for the drawing associated with his ticket. The ticket holder does not need to imagine a long sequence of draws just as a cancer patient does not need to consider a long sequence of 5-year periods to understand a 30% 5-year survival. Using terminology from Vos and Holbert (2022), the scope for the ticket holder is specific while that of the state is generic. The uniform distribution on 4-tuples $$ I_9^4=I_9\times I_9\times I_9\times I_9 $$ provides a model for repeated draws of the Pick-4 lottery, that is, of the data generation process. For most inference applications, the distribution of an unknown population can be modeled rather than the process that generated the data.
We modify this example to consider inference. We are told the sum of a single lottery draw and we are to infer whether the draw came from the NC lottery or lottery A that also has four containers but each contains 8 balls with labels from $$ I_7=\{0,1,\dots,7\} $$. The sum of the digits is 29 but no other information is given. A reduction-to-contradiction argument establishes that the result came from the NC lottery. Premise: lottery A produced our data; every possible sum from lottery A belongs to the set $$ \{0,1,\dots,28\} $$; 29 is not in this set; conclusion: the contradiction means it is impossible that the premise is true. The deductive argument used for a sum of 29 does not work if the sum is 28. Logical certainty is no longer possible but sums of 28 or less still provide evidence, to varying degrees, regarding which lottery was used. A reduction-to-incredibility argument modeled on the above deduction can be used. Premise: lottery A produced the sum of 28; of the $$ 8^4 $$ possible 4-tuples only one produces a sum as large as 28; each 4-tuple had an equal chance of being selected; the probability of a sum of 28 is $$ 1/8^4<0.00025 $$; conclusion: the unlikely observation makes it doubtful that the premise is true. An important distinction from the deductive argument is the second step regarding all possible outcomes being equally likely. Without this we can say 28 is in the upper 0.025 percentile of the sampling distribution of 4-tuples ordered by their sum, but we cannot say the probability is less than 0.00025. In contrast, the deductive argument is valid even if the balls are hand-picked; randomization plays no role. In the conclusion of the inductive argument, the word “unlikely” refers to the stochastic probability of obtaining a sum of 28 while “doubtful” describes a degree-of-belief regarding the lottery that was used.
While these are related quantities (observations that are less likely to have occurred would create greater doubt), failure to understand these as distinct can lead to confusion, especially when the numeric value of the stochastic probability, $$ 1/8^4 $$, is used to assign a numeric measure of one's doubt in the absence of any other information regarding the two lotteries. The p-value, $$ 1/8^4 $$, is obtained from a measurable function and so, by definition, is a random variable. All p-values are measurable functions and so all p-values are random variables. However, the adjective random describes only one use for this measurable function, namely to model a random process. Random variables also provide distributions that are relevant to the inference question. Although randomization plays no role in the definition of these distributions, their relevance to inference does depend on how the observed sample was obtained from the population. As a random process, $$ 1/8^4 $$ is the limiting relative frequency of draws from lottery A that result in a sum of 28. Using the random process interpretation means we have to create a hypothetical process by imagining repeated draws from lottery A when, in fact, the actual sample may have come from the NC lottery. That is, the hypothetical samples do not come from the population, as the actual sample did, but from a model for the population. This distinction between population and model for the population is especially important when the model is infinite. A more realistic example is inference for a dichotomous attribute of a population, say, high blood pressure (BP). The population distribution is the ordered pair of relative frequencies associated with the two attributes, $$ (1-p_{\mathrm{pop}},p_{\mathrm{pop}}) $$ where $$ p_{\mathrm{pop}} $$ is the unknown proportion with high BP.
The Bernoulli family of distributions, $$ \{(1-p,p),0<p<1\} $$, provides models for the population distribution. If the support for the Bernoulli family is $$ \{0,1\} $$, then the $$ n $$-fold convolution of $$ (1-p,p) $$ is the binomial distribution $$ B(n,p) $$ placing mass $$ \binom{n}{y}p^y(1-p)^{n-y} $$ on sum $$ y $$. For rational $$ p $$ the binomial distribution can be obtained by considering all possible samples (with replacement) of size $$ n $$ and calculating the relative frequency for each sum. When each sample is equally likely these relative frequencies are probabilities. Convolution extends this relationship between Bernoulli and binomial distributions to the case where $$ p $$ is any real number in the unit interval. The key here is that Bernoulli distributions and their relationship with the binomial family are part of mathematics and don't involve randomization. If $$ B(1,p) $$ provides a good model for the population, then so does $$ B(n,p) $$ for the sampling distribution of the sum. However, if the observation is in the extreme tail of $$ B(n,p) $$ this does not necessarily mean the observation should be considered unlikely. As we saw in the lottery example, a sum of 28 is in the extreme tail of the sampling distribution but it is the fact that every 4-tuple had an equal chance of selection that makes the tail area equal to the probability. The same is true for the binomial example, and for inference in general. Now that two models are replaced with a family of models, inference involves a continuum of null hypotheses and so, a continuum of reduction-to-incredibility arguments.
The p-value associated with each argument is a tail area that describes how extreme the sample is as a point in the sampling distribution obtained from the premise. The conclusion requires a probability, and a single random sample from the population justifies that each of the tail areas in this continuum is a probability. Interpreting a p-value as describing a random process, as one is inclined to do when it is labeled a random variable, is problematic when we have a continuum of hypotheses. The hypothetical samples come from a model for the population, and when this model is infinite it is not clear what it means to give every sample an equal chance of being selected. I recognize there are other interpretations for the hypothetical random process associated with the p-value. In fact, it is the plethora of such interpretations that can make p-values confusing. My final comment concerns geometry. Generalized estimators which are related to Godambe's (1960) estimating functions are useful for studying inferential properties of different methods for defining p-values. These generalized estimators form a vector bundle over a statistical manifold and their statistical properties are described by the relationship with the tangent bundle. Details are in Vos (2022)." @default.
- W4386564315 created "2023-09-10" @default.
- W4386564315 creator A5052104524 @default.
- W4386564315 date "2023-05-13" @default.
- W4386564315 modified "2023-10-16" @default.
- W4386564315 title "Comments on <i>Divergence vs. Decision P‐values</i>" @default.
- W4386564315 cites W2005986688 @default.
- W4386564315 doi "https://doi.org/10.1111/sjos.12647" @default.
- W4386564315 hasPublicationYear "2023" @default.
- W4386564315 type Work @default.
- W4386564315 citedByCount "0" @default.
- W4386564315 crossrefType "journal-article" @default.
- W4386564315 hasAuthorship W4386564315A5052104524 @default.
- W4386564315 hasBestOaLocation W43865643151 @default.
- W4386564315 hasConcept C105795698 @default.
- W4386564315 hasConcept C138885662 @default.
- W4386564315 hasConcept C149782125 @default.
- W4386564315 hasConcept C207390915 @default.
- W4386564315 hasConcept C33923547 @default.
- W4386564315 hasConcept C41895202 @default.
- W4386564315 hasConceptScore W4386564315C105795698 @default.
- W4386564315 hasConceptScore W4386564315C138885662 @default.
- W4386564315 hasConceptScore W4386564315C149782125 @default.
- W4386564315 hasConceptScore W4386564315C207390915 @default.
- W4386564315 hasConceptScore W4386564315C33923547 @default.
- W4386564315 hasConceptScore W4386564315C41895202 @default.
- W4386564315 hasIssue "3" @default.
- W4386564315 hasLocation W43865643151 @default.
- W4386564315 hasOpenAccess W4386564315 @default.
- W4386564315 hasPrimaryLocation W43865643151 @default.
- W4386564315 hasRelatedWork W187846026 @default.
- W4386564315 hasRelatedWork W2001800417 @default.
- W4386564315 hasRelatedWork W2023004453 @default.
- W4386564315 hasRelatedWork W2024955095 @default.
- W4386564315 hasRelatedWork W2043852408 @default.
- W4386564315 hasRelatedWork W2110693695 @default.
- W4386564315 hasRelatedWork W2119158312 @default.
- W4386564315 hasRelatedWork W2343708061 @default.
- W4386564315 hasRelatedWork W2552050053 @default.
- W4386564315 hasRelatedWork W3032898173 @default.
- W4386564315 hasVolume "50" @default.
- W4386564315 isParatext "false" @default.
- W4386564315 isRetracted "false" @default.
- W4386564315 workType "article" @default.
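
The counts quoted in the abstract above can be checked by direct enumeration. The following is a minimal sketch, not part of the SemOpenAlex record or of the original comment: the helper names `sum_counts` and `binomial_by_enumeration` are hypothetical, and the small population used for the binomial check (4 units, 1 with the attribute, samples of size 3) is an assumed illustration of the rational-p case. It tallies 4-tuples by digit sum for both lotteries and compares enumerated relative frequencies with the binomial mass function.

```python
from fractions import Fraction
from itertools import product
from math import comb


def sum_counts(n_labels, n_draws=4):
    """Count tuples by digit sum when each container holds labels 0..n_labels-1."""
    counts = {}
    for draw in product(range(n_labels), repeat=n_draws):
        s = sum(draw)
        counts[s] = counts.get(s, 0) + 1
    return counts


nc = sum_counts(10)     # NC Pick-4: labels 0..9 in each of four containers
lot_a = sum_counts(8)   # lottery A: labels 0..7 in each of four containers

# Tuples whose sum exceeds 28, and the expected payout of a $100 prize.
winning = sum(c for s, c in nc.items() if s > 28)
expected_payout = Fraction(winning, 10**4) * 100

# Tail area for a sum of 28 under lottery A; it is a probability only if
# every 4-tuple had an equal chance of selection.
tail_28 = Fraction(sum(c for s, c in lot_a.items() if s >= 28), 8**4)


def binomial_by_enumeration(n, k, m):
    """Relative frequency of each sum over all with-replacement samples of
    size n from m units, k of which carry the attribute (so p = k/m)."""
    units = [1] * k + [0] * (m - k)
    freq = [0] * (n + 1)
    for sample in product(units, repeat=n):
        freq[sum(sample)] += 1
    return [Fraction(f, m**n) for f in freq]


# Assumed illustration: p = 1/4, n = 3.
enumerated = binomial_by_enumeration(n=3, k=1, m=4)
formula = [comb(3, y) * Fraction(1, 4) ** y * Fraction(3, 4) ** (3 - y)
           for y in range(4)]

print(winning, float(expected_payout), tail_28, enumerated == formula)
```

Exact fractions are used so that the comparison with the binomial formula is an equality rather than a floating-point approximation; nothing in the sketch involves random sampling, which mirrors the abstract's point that these distributions are defined without randomization.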