Statistics 2


Types of Study

Case control study

  • Observational and retrospective
  • Compares a group with a disease to a group without it
  • Looks for prior exposure or risk factor
  • Asks “what happened?”
  • Measures Odds Ratio (think of odd case)

Cohort Study

  • Observational and prospective
  • Compares two groups, one with exposure, one without exposure
  • Looks to see if exposure increases likelihood of disease
  • Asks “what will happen?”
  • Measures Relative Risk (think of Arnold in court – relative co(h)ort)

Cross Sectional Study

  • Observational
  • Collects data from a group of people to assess the frequency of disease and related risk factors at a particular point in time
  • Asks “what is happening?”
  • Can show risk factor association – but not causality

Twin Concordance Study

  • Compares the frequency with which both monozygotic twins, versus both dizygotic twins, develop the same disease
  • Measures heritability

Adoption Study

  • Compares siblings raised by biological vs adoptive parents
  • Measures heritability and influence of environment

Evaluation of Diagnostic Tests

Uses a 2 x 2 table to compare test results with the actual presence of the disease

TP = True Positive, FP = False Positive, TN = True Negative, FN = False Negative.

Be aware that the disease and test axes may be swapped around, which will invert the table

  • Sensitivity = proportion of people with the disease who test positive
  • Sensitivity = TP / (TP + FN)
  • It’s the ability of the test to detect the disease when it is present
  • A value approaching 1 is desirable – indicates a low false-negative rate
  • A very sensitive test, when negative, rules OUT the disease – SeNsitive = OUT –> SNOUT
  • Specificity = proportion of those without the disease who test negative
  • Specificity = TN / (FP + TN)
  • The ability of the test to indicate NON-disease when disease is truly not present
  • A value approaching 1 is good – indicates a low false-positive rate
  • A very specific test, when positive, rules IN a disease with a high degree of certainty – SPecific = IN –> SPIN
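As a quick check on the two formulas above, here is a minimal Python sketch. The function names and the counts are made up for illustration and are not taken from any real test.

```python
def sensitivity(tp, fn):
    """Proportion of people WITH the disease who test positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of people WITHOUT the disease who test negative."""
    return tn / (fp + tn)


# Hypothetical 2 x 2 table counts (illustration only)
tp, fp, fn, tn = 90, 30, 10, 170

print(f"Sensitivity = {sensitivity(tp, fn):.2f}")  # 90 / (90 + 10)   = 0.90
print(f"Specificity = {specificity(tn, fp):.2f}")  # 170 / (30 + 170) = 0.85
```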

Positive Predictive Value

  • The proportion of positive test results which are true positives
  • Given a positive test result, this is the probability the person actually has the disease
  • NB: If the prevalence of a disease in a population is low, even a test with a high sensitivity and specificity will have a low positive predictive value
  • PPV = TP / (TP + FP)

Negative Predictive Value

  • The probability of a person actually not having the disease given a negative test result
  • NPV = TN / (TN + FN)
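The effect of prevalence on the predictive values can be seen with a small Python sketch. It builds the 2 x 2 table for an imaginary population and then applies the PPV and NPV formulas above. The function name, the population size and the 95% / 95% test are assumptions chosen for illustration.

```python
def predictive_values(sens, spec, prevalence, population=100_000):
    """Return (PPV, NPV) for a test applied to a hypothetical population."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = sens * diseased   # diseased people who test positive
    fn = diseased - tp     # diseased people who test negative
    tn = spec * healthy    # healthy people who test negative
    fp = healthy - tn      # healthy people who test positive
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Same test (95% sensitive, 95% specific) at two different prevalences
for prevalence in (0.01, 0.20):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.2f}, NPV = {npv:.3f}")
```

At 1% prevalence the PPV comes out at about 0.16, even though the test itself is good; at 20% prevalence it rises to about 0.83.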

Likelihood Ratio for positive test result

  • How much the odds of a disease increase when the test result is positive
  • sensitivity / (1 – specificity)

Likelihood Ratio for negative test result

  • How much the odds of the disease decrease when the test result is negative
  • (1 – sensitivity)  / specificity

Mnemonic: remember the PIG is always on top – (SNOUT = sensitivity is always in the numerator)
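A short Python sketch of the two likelihood ratios. The conversion from pre-test probability to post-test probability (probability to odds, multiply by the likelihood ratio, odds back to probability) is standard but is not spelled out in these notes, so treat that part as an added illustration. All names and numbers are hypothetical.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, lr):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical test: 90% sensitive, 80% specific, pre-test probability 20%
lr_pos, lr_neg = likelihood_ratios(0.9, 0.8)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")  # LR+ = 4.50, LR- = 0.125
print(f"After a positive test: {post_test_probability(0.2, lr_pos):.2f}")  # 0.53
print(f"After a negative test: {post_test_probability(0.2, lr_neg):.2f}")  # 0.03
```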


Prevalence Vs Incidence


Point Prevalence = total cases in a population at a given time / total population at that time

Incidence = new cases in a population over a given time period / total population at risk during that time period


Prevalence = incidence x disease duration

For chronic diseases prevalence is greater than incidence

For acute diseases prevalence is about the same as incidence

Note: when calculating incidence, those who currently have the disease or who have previously had it are not counted as at risk for that disease
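A minimal Python sketch of the two calculations, including the point that existing cases are taken out of the at-risk denominator. The population and case counts are invented for illustration.

```python
def point_prevalence(total_cases, total_population):
    """Total cases at a given time / total population at that time."""
    return total_cases / total_population


def incidence(new_cases, population_at_risk):
    """New cases over a period / population at risk during that period."""
    return new_cases / population_at_risk


# Hypothetical town: 10,000 people, 500 already have the disease,
# and 95 new cases appear during the year.
population = 10_000
existing_cases = 500
new_cases = 95

# People who already have the disease are not at risk of developing it
at_risk = population - existing_cases

print(f"Point prevalence at start = {point_prevalence(existing_cases, population):.3f}")  # 0.050
print(f"Incidence over the year   = {incidence(new_cases, at_risk):.3f}")                 # 0.010
```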


Odds Ratio Vs Relative Risk

  • EE = Experimental group where the event happened
  • EN = Experimental group where the event didn’t happen (negative)
  • CE = Control group where the event happened
  • CN = Control group where the event didn’t happen (never happened / negative)
  • ES = Total Number of Subjects in the Experimental group
  • CS = Total number of subjects in the Control Group
  • EER = Experimental Event Rate = EE / ES
  • CER = Control Event Rate = CE / CS

Odds Ratio OR

  • Used for case control studies
  • Mnemonic – think of this ODD suitCASE

Odds ratio = Odds of having the disease in the exposed group divided by the odds of having the disease in the unexposed group

Exposed group up top – imagine a flasher at the top of the railway bridge 

Odds ratio = (EE / EN) / (CE / CN)  

Or Odds ratio = (EE x CN) / (CE x EN)
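The two ways of writing the odds ratio give the same answer, which a short Python sketch can confirm. The counts are hypothetical and the function names are only for illustration.

```python
def odds_ratio(ee, en, ce, cn):
    """Odds of the event in the exposed group / odds in the unexposed group."""
    return (ee / en) / (ce / cn)


def odds_ratio_cross_product(ee, en, ce, cn):
    """The same quantity rearranged as a cross product."""
    return (ee * cn) / (ce * en)


# Hypothetical counts: exposed 20 events / 80 non-events,
# unexposed 10 events / 90 non-events
ee, en, ce, cn = 20, 80, 10, 90

print(f"OR (ratio of odds) = {odds_ratio(ee, en, ce, cn):.2f}")                # 2.25
print(f"OR (cross product) = {odds_ratio_cross_product(ee, en, ce, cn):.2f}")  # 2.25
```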

Relative Risk (RR)

  • Used for cohort studies
  • Mnemonic: Think of Arnold in Court – Relative in Court = Relative Cohort
  • Relative Risk Uses TOTALS (ES & CS)  – Arnold in court – a TOTAL FUCKUP (Odds ratio – doesn’t use totals) 
  • Relative probability of getting a disease in the exposed group compared to the unexposed group
  • Calculated as the percentage with the disease in the exposed group divided by percentage with disease in the unexposed group
  • (AGAIN the exposed group is up top – flasher on top of the railway bridge)
  • RR = (EE / ES) / (CE / CS) 
  • RR = EER/CER

Put another way: relative risk (used for cohort studies) = experimental event rate / control event rate
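A minimal Python sketch of relative risk, using the group TOTALS as the denominators. The counts are hypothetical.

```python
def event_rate(events, total):
    """Proportion of a group in which the event happened."""
    return events / total


def relative_risk(ee, es, ce, cs):
    """RR = EER / CER = (EE / ES) / (CE / CS)."""
    return event_rate(ee, es) / event_rate(ce, cs)


# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed develop the disease
rr = relative_risk(ee=20, es=100, ce=10, cs=100)
print(f"Relative risk = {rr:.2f}")  # 0.20 / 0.10 = 2.00
```

With the same counts the odds ratio would be (20/80) / (10/90) = 2.25, so the two measures are close but not identical.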

Relative Risk Reduction


RRR = ARR / CER = (CER – EER) / CER   (Lion = Pirate / cerebellum)

Attributable Risk

  • The difference in risk between exposed and unexposed groups
  • Or the proportion of disease occurrences which are attributable to exposure
  • Attributable risk = (EE/ES) – (CE/CS)
  • Or AR = EER – CER

Absolute Risk Reduction (or absolute risk increase)

  • The reduction or increase in risk with a treatment when compared to a placebo
  • The difference in the event rate in the intervention group compared with the event rate in the control group
  • ARR = CER – EER
  • If CER – EER > 0 then it’s an absolute risk reduction
  • If CER – EER < 0 then it’s an absolute risk increase

Attributable risk and absolute risk reduction are opposites – attributable risk is EER – CER, absolute risk reduction is CER – EER.

Att Eer Cer

Ab Cer Er (abs cool – er)

Number Needed to Treat

NNT = 1 / Absolute risk reduction


Number Needed To Harm

NNH = 1 / Absolute risk increase


Worked Example

EER = EE/ES = 15/150 = 0.1 = 10%

CER = CE/CS = 100/250 = 0.4 = 40%

Absolute Risk Reduction = ARR = CER – EER (abs cool er) = 0.4 – 0.1 = 0.3 = 30%

Relative Risk Reduction = ARR / CER  (lion = pirate / cerebellum) = 0.3/0.4 = 0.75 = 75%

Number needed to treat = 1/pirate = 1/ARR = 1/0.3 = 3.33 (round up to 4 patients)


Relative Risk = EER / CER = (EE/ES) / (CE/CS) = 0.1/0.4 = 0.25 = 25%

Odds Ratio = (it’s odd it doesn’t have total) = (EE/EN)/(CE/CN) = (15/135)/(100/150) = 0.111/0.667 = 0.167

Attributable Risk = (Att Eer Cer) = EER – CER = 0.1 – 0.4 = -0.3
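The whole worked example can be re-run as a short Python sketch. The counts are the ones implied by the figures above (EE = 15, ES = 150, CE = 100, CS = 250); the variable names simply follow the abbreviations used in these notes.

```python
import math

# Counts implied by the worked example
EE, ES = 15, 150             # events / total in the experimental group
CE, CS = 100, 250            # events / total in the control group
EN, CN = ES - EE, CS - CE    # non-events: 135 and 150

EER = EE / ES                # experimental event rate
CER = CE / CS                # control event rate
ARR = CER - EER              # absolute risk reduction
RRR = ARR / CER              # relative risk reduction
NNT = 1 / ARR                # number needed to treat
RR = EER / CER               # relative risk
OR = (EE / EN) / (CE / CN)   # odds ratio (no totals)
AR = EER - CER               # attributable risk

print(f"EER = {EER:.2f}, CER = {CER:.2f}")                # 0.10, 0.40
print(f"ARR = {ARR:.2f}, RRR = {RRR:.2f}")                # 0.30, 0.75
print(f"NNT = {NNT:.2f} (round up to {math.ceil(NNT)})")  # 3.33 (round up to 4)
print(f"RR  = {RR:.2f}, OR = {OR:.3f}")                   # 0.25, 0.167
print(f"AR  = {AR:.2f}")                                  # -0.30
```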


Type 1 error (α)

  • Stating that there is an effect or a difference when none exists – rejecting the null hypothesis (accepting the experimental hypothesis) in error
  • α = probability of making a type 1 error
  • The p value is judged against this preset level of significance – usually p < 0.05
  • AKA “false positive error”

Type 2 error (β)

  • Stating that there is not an effect or difference where one does exist
  • β is a “false negative error”
  • Accepting a null hypothesis which is actually false


Power

  • Probability of correctly rejecting the null hypothesis or correctly accepting the experimental hypothesis
  • Power = 1 – β

Normal distribution

In a normal distribution, 68% of values fall within 1 standard deviation of the mean, 95% within 2 standard deviations (more precisely 1.96), and 99.7% within 3 standard deviations.

95% confidence interval = Mean +/- 1.96 x Standard Error

Standard Error = Standard deviation / √n, where n = number of patients
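A minimal Python sketch of the confidence interval calculation, assuming a sample that is large enough for the 1.96 multiplier to apply. The blood pressure figures are invented for illustration.

```python
import math


def standard_error(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    return sd / math.sqrt(n)


def confidence_interval_95(mean, sd, n):
    """Mean +/- 1.96 x standard error."""
    margin = 1.96 * standard_error(sd, n)
    return mean - margin, mean + margin


# Hypothetical sample: mean systolic BP 120, standard deviation 15, 100 patients
low, high = confidence_interval_95(mean=120, sd=15, n=100)
print(f"SE = {standard_error(15, 100):.2f}")   # 15 / 10 = 1.50
print(f"95% CI = {low:.2f} to {high:.2f}")     # 117.06 to 122.94
```

Quadrupling the sample size halves the standard error, so the interval narrows as the study gets bigger.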

  • t-test: checks the difference between the MEANS of 2 groups (MR T is MEAN!)
  • ANOVA: checks the differences between the means of 3 or more groups (MR T AND ANOVER! – Mr T is two – add another and you have the means of three groups)
  • Chi-squared (χ²): checks the differences between percentages or proportions of a categorical variable (like eye colour), not means




Meta-analysis: Displaying / interpreting data

Forest Plot / Blobogram

Blob / Square

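  • Findings from each study are shown as a blob or square
  • By the usual convention, if the square is to the left of the line of no effect the new treatment is better; if to the right it’s worse (check the axis labels, as this depends on the outcome being measured)
  • The size of the square is proportional to the precision of the study (roughly proportional to the sample size)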



The horizontal line through each square

  • This represents the 95% confidence interval
  • Represents the UNCERTAINTY of the estimate of treatment effect
  • The wider the line, the less certainty
  • If the line crosses the vertical line of no effect, the study’s result is not statistically significant


The diamond

  • The aggregate effect found from all studies is displayed as a diamond
  • The width of the diamond shows the 95% confidence interval
  • If the diamond crosses the vertical line of no effect, it means overall there is no statistically significant effect

The vertical line

  • Line of no effect
  • Odds ratio (or relative risk) of 1
  • Risks and benefits are equal
  • The confidence interval of any statistically significant study does not cross this line


Test for heterogeneity

  • If the studies are reporting very different things (they are heterogeneous), the P value for this test will be very low
  • If the P value is larger than, say, 0.1, we can be reassured that they are all measuring fairly similar things


Funnel Plot

  • Funnel plots are designed to highlight the existence of publication bias in systematic reviews & meta-analyses
  • They assume that large studies will be near the average and smaller studies will be spread on both sides of the average
  • Variation from this assumption can indicate publication bias

It’s sometimes difficult to identify this by eye, so Egger’s Test is a statistical test to check for publication bias – a formal way of looking at the funnel plot and working out whether studies appear to be missing.


Cox Model

  • analyses survival data
  • isolates effects of treatment from effects of other variables

Kaplan-Meier Method

  • Censored survival time – you can’t know when some people in the study will die because they’re still alive when it ends
  • The Kaplan-Meier method calculates the proportion of people surviving a given length of time, taking such censored patients into account
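To show how censoring is handled, here is a small pure-Python sketch of the Kaplan-Meier product-limit estimate. At each time when a death occurs, survival is multiplied by (1 - deaths / number still at risk); censored patients leave the at-risk pool without counting as deaths. The function name and the six follow-up times are hypothetical.

```python
def kaplan_meier(times, died):
    """Return [(time, estimated survival)] at each time a death occurs.

    times: follow-up time for each patient
    died:  True if the patient died at that time, False if censored
           (still alive when last seen)
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, d in zip(times, died) if ti == t and d)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months; False marks a censored patient
times = [2, 3, 3, 5, 8, 8]
died = [True, True, False, True, False, False]

for t, s in kaplan_meier(times, died):
    print(f"month {t}: survival = {s:.3f}")
# month 2: survival = 0.833
# month 3: survival = 0.667
# month 5: survival = 0.444
```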