Reporting Statistical Results in Text and in Graphs




The Office of Evaluation Sciences reports evaluation results in the form of abstracts published on its website. Most evaluations are done via randomized controlled trials (RCTs), and results typically are presented for a linear regression model estimated with ordinary least squares (OLS) where one of the regression model’s parameters represents the effect of a treatment or intervention. This document describes OES’s preferred method of reporting these estimates, 95% confidence intervals, and p-values.

When to Apply This Guidance

When reporting treatment effects and statistical inferences, OES has a preferred style for

reporting results in text and for graphical presentation. These guidelines apply both for project

abstracts (published on the OES website) and for slide presentations. This guidance pertains most

directly to results from RCTs, and observational quasi-experimental designs (QEDs) may require

alternative ways of presenting results.

Required Elements in Text

One treatment condition and a control condition

For reporting the treatment effect estimate from a linear regression, required elements to include

in text are:

● The observed mean (proportion for binary responses) for the control group

● The point estimate of the regression coefficient (expressed in percentage points for binary

responses). Depending on the research design and statistical model, this may be a

covariate-adjusted estimate.

● The 95% confidence interval for the regression coefficient

● The p



-value, rounded to 2 or 3 decimal places as appropriate. Report the actual p



-value,

not just whether it falls below a threshold.

● If multiple comparisons adjustment is used for p-values, report the unadjusted values in text and note the adjusted values in a footnote (read more about OES guidance on multiple comparisons).

These values help to convey to an audience both the statistical and practical significance of treatment effects, with the former captured by p-values and the latter captured by point estimates and confidence intervals.
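As a rough illustration of where each required element comes from, the following R sketch fits a linear probability model to simulated data and assembles the values for reporting. The data, the variable names (treat, applied), and the effect sizes are invented for illustration. Base R's lm() with classical standard errors is used here only to keep the sketch dependency-free; the estimator named in the analysis plan (for example, lm_robust() from estimatr, as in the Code Appendix) should be used in practice.

```r
# Minimal sketch with simulated data; all names and numbers are hypothetical.
set.seed(20200901)
n <- 2000
treat <- rep(c(0, 1), each = n / 2)            # 0 = control, 1 = treatment
applied <- rbinom(n, 1, 0.05 + 0.03 * treat)   # binary response

# Linear probability model estimated with OLS
fit <- lm(applied ~ treat)
est <- summary(fit)$coefficients["treat", ]    # estimate, SE, t value, p-value
ci <- confint(fit, "treat", level = 0.95)      # 95% CI for the coefficient

# Observed proportion in the control group
control_mean <- mean(applied[treat == 0])

# Binary outcome: express the effect and its CI in percentage points
report <- sprintf(
  "Control: %.2f percent; effect: %.2f percentage points (p = %.3f, 95%% CI [%.2f, %.2f])",
  100 * control_mean, 100 * est["Estimate"], est["Pr(>|t|)"],
  100 * ci[1], 100 * ci[2]
)
print(report)
```

With no covariates in the model, the intercept equals the observed control mean; once covariates are added, the control mean should still be computed directly from the data as above.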

Here’s an example, from a fictional project that assessed the effect of letters on applications for a

benefit (required elements are highlighted in bold):

 Update September 2020



Of individuals who received a letter, 6.00 percent applied for benefits in the nine months after the letters were sent out, compared with 0.96 percent of individuals who did not receive a letter. After adjusting for age, socioeconomic status, and other factors that might influence whether an individual applies for benefits, we estimated that letters resulted in an increase of 5.09 percentage points (p = .000, 95% CI [5.02, 5.17]).

Note two things about how the treatment effect is described in this example. First, the text makes

clear that the estimated treatment effect (an increase of 5.09 percentage points) is not just the

difference between the observed means in the two experimental groups but rather is adjusted for

covariates included in the statistical model. Second, the text is clear that this treatment effect on a

binary outcome is measured in percentage points (as opposed to “an increase of 5.09 percent”).

Effects on binary outcomes should be described in percentage points. If it is also useful to describe

the effect in relative terms as a percentage of the baseline outcome (“a 530% increase”), then this

can be done too, but should be done in addition to, not instead of, an estimate in percentage

points.
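The relative figure quoted above follows directly from the two reported numbers. A short check in R, using only the values from the fictional example (the variable names are illustrative):

```r
control_pct <- 0.96   # control group rate, in percent (from the example)
effect_pp   <- 5.09   # adjusted treatment effect, in percentage points

# Relative effect: the effect as a percentage of the baseline (control) rate
relative_pct <- 100 * effect_pp / control_pct
print(round(relative_pct))   # 530
```

The two quantities answer different questions, which is why the relative figure supplements the percentage-point estimate rather than replacing it.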

Multiple treatment conditions and a control condition

Many evaluations include two or more treatment conditions. When this is the case, the preferred

elements to include in text are:

● The observed mean (proportion for binary responses) for the control group

● The point estimate (regression coefficient) for the treatment effect for each treatment

group, relative to the control/reference group. Depending on the research design and

statistical model, these may be covariate-adjusted treatment effects.

● The 95% confidence intervals for these point estimates

● The p



-values for these treatment effects, rounded to 2 or 3 decimal places as appropriate.

Report the actual p



-values, not just whether they fall below a threshold.

● If adjusting p



-values for multiple comparisons, report the un



adjusted values in text and

note the adjusted values in a footnote (read more about OES guidance on multiple

comparisons).

Depending on the research questions and comparisons specified in the analysis plan, differences among the treatment arms may be relevant. In each case, the (potentially covariate-adjusted) difference should be reported as described above (point estimate of the difference, 95% CI, and actual p-value).

Predicted means for the treatment arms may also be explicitly reported if relevant, but this is

optional (these are likely depicted graphically; see guidance on figures below).
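The multi-arm elements can be sketched in R as follows. This is an illustration on simulated data, not OES's own procedure: the arm labels, sample size, and true rates are invented, lm() with classical standard errors stands in for the estimator in the analysis plan, and the Holm method is an assumed choice of adjustment (the OES guidance on multiple comparisons may call for a different one).

```r
# Simulated three-arm example; all names and numbers are hypothetical.
set.seed(1738)
n <- 3000
arm <- factor(sample(c("control", "october", "november"), n, replace = TRUE),
              levels = c("control", "october", "november"))
rate <- c(control = 0.08, october = 0.11, november = 0.10)
y <- rbinom(n, 1, rate[as.character(arm)])

# Treatment effects relative to the control (reference) group
fit <- lm(y ~ arm)
coefs <- summary(fit)$coefficients[-1, , drop = FALSE]   # drop the intercept
p_unadj <- coefs[, "Pr(>|t|)"]
p_holm <- p.adjust(p_unadj, method = "holm")  # adjusted values go in a footnote

results <- data.frame(
  arm = sub("^arm", "", rownames(coefs)),
  effect_pp = 100 * coefs[, "Estimate"],
  ci_low_pp = 100 * confint(fit)[-1, 1],
  ci_high_pp = 100 * confint(fit)[-1, 2],
  p_unadjusted = p_unadj,
  p_adjusted = p_holm,
  row.names = NULL
)
print(results)

# Difference between two treatment arms: refit with "october" as the reference
arm_oct <- relevel(arm, ref = "october")
fit_oct <- lm(y ~ arm_oct)
diff_nov_oct <- coef(fit_oct)[["arm_octnovember"]]
ci_nov_oct <- confint(fit_oct, "arm_octnovember")
```

Refitting with a different reference level is one simple way to get the point estimate, confidence interval, and p-value for a comparison between arms from the same model.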

Here’s an example from project 1738, which evaluated the effect of postcard reminders to seniors on vaccine uptake. The trial used a stepped-wedge design in which the timing of the postcards varied among those assigned to treatment, for a total of three treatment arms and one control group. The results were summarized in text as follows:

Individuals in the control group received 8.58 percent of recommended vaccinations during the study period. Results indicated that, compared to the control group, individuals in the October group showed an increase of 0.27 percentage points (p = .002, CI [0.12, 0.42]), and individuals in the November group showed an increase of 0.15 percentage points (p = .13, CI [0.00, 0.30]).

In this particular example, the estimated treatment effect was statistically significant for only one

of the treatment arms. When this is the case, it may be useful to highlight this in-text. For example:

“The treatment effect was statistically significant only for the October group.”

Required Elements in Figures

In addition to reporting results in-text, it is helpful to present results graphically. This gives

readers a frame of reference for interpreting the magnitude of causal effects relative to the

baseline response.

One treatment condition and a control condition

When there is one treatment condition, required elements for graphical presentation are:

● Figure should be in the form of a bar plot where the y-axis starts at zero;

● Report the observed



mean for the control group with a blue bar;

● Report the (potentially covariate-adjusted) predicted



mean for the treatment group in

yellow (control mean + treatment effect); 

● Include 95% confidence intervals for those means. For the control group, the 95%

confidence interval is based on the standard error of the mean. For the treatment group,

the 95% confidence interval is based on the standard error of the treatment effect

estimate.

● In a footnote, explain (1) that the treatment group estimate is based on our estimate of the

treatment effect and covariate adjusted (if applicable); (2) that the error bars represent

95% confidence intervals; and (3) that the confidence interval for the treatment condition

reflects uncertainty in our treatment effect estimate and covariate adjustment (if

applicable). The last point is important to make clear that the confidence intervals for the

two groups are calculated in different ways and represent different sources of uncertainty;

○ Here is an example to be adapted as appropriate: “The estimate for the email group is based on our estimate of the treatment effect after adjusting for age, race/ethnicity, and other covariates. Error bars represent 95% confidence intervals, and the confidence interval for the email group reflects statistical uncertainty in our estimate of the treatment effect after adjustment for covariates.”

● Optional markers of statistical significance (e.g., an asterisk), with a legend indicating the p-value threshold for statistical significance;

● A horizontal line denoting the observed mean for the control condition is strongly recommended in most cases, but not required;

● Include a y-axis label indicating the scale and type of response (e.g., “proportion applications”).

Notes: The project 1738 example above has been edited for clarity. For more on what to report when using blocking, see OES guidance on block randomization.



Figure 1: Example figure using “dummy” data. 



Figure 1 shows an example using simulated data. It includes the observed mean of the control group (blue), the predicted mean of the treatment group (yellow), and 95% confidence intervals. A horizontal line at the level of the control mean makes clear that the treatment effect is evaluated by comparing the response under treatment with the mean under control. The y-axis begins at zero, and the y-axis label notes that the intervals shown are 95% confidence intervals.
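The bar heights and error bars described in this section can be computed along the following lines. This is a sketch on simulated data with invented names and numbers; it uses lm() with classical standard errors and t-based critical values, both of which are assumptions rather than documented OES choices.

```r
# Simulated two-arm example; all names and numbers are hypothetical.
set.seed(42)
n <- 1000
treat <- rep(c(0, 1), each = n / 2)
y <- rbinom(n, 1, 0.10 + 0.04 * treat)

# Treatment effect and its standard error from the regression
fit <- lm(y ~ treat)
effect <- coef(fit)[["treat"]]
se_effect <- summary(fit)$coefficients["treat", "Std. Error"]

# Control group: observed mean and standard error of the mean
y_control <- y[treat == 0]
control_mean <- mean(y_control)
se_control <- sd(y_control) / sqrt(length(y_control))

# Critical values for 95% intervals (t-based; an assumption of this sketch)
crit_control <- qt(0.975, df = length(y_control) - 1)
crit_effect <- qt(0.975, df = df.residual(fit))

# Treatment bar = control mean + treatment effect; its interval uses the
# standard error of the treatment effect, not of the treatment group mean
treat_mean <- control_mean + effect
bars <- data.frame(
  group = c("Control", "Treatment"),
  mean = c(control_mean, treat_mean),
  ci_low = c(control_mean - crit_control * se_control,
             treat_mean - crit_effect * se_effect),
  ci_high = c(control_mean + crit_control * se_control,
              treat_mean + crit_effect * se_effect)
)
print(bars)
```

Because the two intervals are built from different standard errors, they are not directly comparable, which is the point the figure footnote is meant to convey.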

Multiple treatment conditions and a control condition

When there are two or more treatment conditions:

● Figure should be in the form of a bar plot where the y-axis starts at zero;

● Report the observed



mean for the control group with a blue bar;

● Report the (potentially covariate-adjusted) predicted



means for the treatment groups in

yellow (control mean + treatment effect); 


● Include 95% confidence intervals for those means. For the control group, the 95%

confidence interval is based on the standard error of the mean. For the treatment groups,

the 95% confidence intervals are based on the standard error of the treatment effect

estimates.

● In a footnote, explain (1) that the treatment group estimates are based on our estimates of the treatment effects per treatment arm and covariate adjusted (if applicable); (2) that the error bars represent 95% confidence intervals; and (3) that the confidence interval for each treatment condition reflects uncertainty in our treatment effect estimate and covariate adjustment (if applicable). The last point is important to make clear that the confidence intervals for the control group and the treatment groups are calculated in different ways and represent different sources of uncertainty;

○ Here is an example to be adapted as appropriate: “The predicted means for the

information-only email group and the risk-framing email group are based on our

estimates of the treatment effects after adjusting for age, race/ethnicity, and other

covariates. Error bars represent 95% confidence intervals, and the confidence

interval for each email group reflects statistical uncertainty in our estimate of the

treatment effect after adjustment for covariates.”

● Optional markers of statistical significance (e.g., an asterisk), with a legend indicating the p-value threshold for statistical significance;

● A horizontal line denoting the observed mean for the control condition is strongly

recommended in most cases, but not necessary;

● Include a y-axis label indicating the scale and type of response (e.g., “proportion

applications”).



Figure 2: Example figure using “dummy”

data. 



R code to produce this figure is included at the end of this document.

 Update September 2020



Figure 2 provides an example using simulated data. This figure contains all of the recommended

elements: a bar plot starting at zero on the y-axis, the control group mean, the predicted

treatment means, and 95% confidence intervals.
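For readers without access to the oes_plot function, the recommended elements can be approximated with base R graphics. The sketch below is not the OES implementation; the summary values are hypothetical and would normally come from the fitted model, and the layout choices (dashed control line, asterisk placement, legend position) are assumptions.

```r
# Hypothetical summary values (proportions), one row per bar
bars <- data.frame(
  group   = c("Control", "Arm 1", "Arm 2", "Arm 3"),
  mean    = c(0.080, 0.105, 0.092, 0.110),
  ci_low  = c(0.068, 0.088, 0.075, 0.093),
  ci_high = c(0.092, 0.122, 0.109, 0.127),
  p       = c(NA, 0.004, 0.170, 0.001)   # no p-value for the control bar
)
alpha <- 0.05                            # threshold shown in the legend
cols <- c("blue", rep("yellow", nrow(bars) - 1))

pdf(NULL)  # null device so the sketch runs anywhere; remove to view the plot
mids <- barplot(bars$mean, names.arg = bars$group, col = cols,
                ylim = c(0, 1.2 * max(bars$ci_high)),   # y-axis starts at zero
                ylab = "Proportion (with 95% confidence intervals)")

# Error bars for the 95% confidence intervals
arrows(mids, bars$ci_low, mids, bars$ci_high,
       angle = 90, code = 3, length = 0.05)

# Horizontal reference line at the observed control mean
abline(h = bars$mean[1], lty = 2)

# Optional significance markers, with the threshold stated in a legend
sig <- !is.na(bars$p) & bars$p < alpha
text(mids[sig], bars$ci_high[sig], labels = "*", pos = 3)
legend("topleft", legend = sprintf("* p < %.2f", alpha), bty = "n")
invisible(dev.off())
```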

Code Appendix

Sample code is provided in both R and Stata:

R Code

The code below was used to generate Figures 1 and 2. The source code for the oes_plot(...) function can be found here (this link might not be accessible from outside OES). To use the code, click “Raw” to view the raw script file, then copy its URL and paste it in place of the placeholder URL below.

# pull oes_plot function from source code
gitHub.url <- "https://Place.Holder…"
source(gitHub.url)

# load and attach estimatr library
library(estimatr)

# df : a data frame of simulated data
# x1 : binary indicator for treatment assignment (two arms)
# x2 : factor indicating assignment to multi-arm treatment (four arms)
# y1 : response for binary treatment
# y2 : response for multi-arm treatment

# fit models
fit1 <- lm_robust(y1 ~ x1, data = df)
fit2 <- lm_robust(y2 ~ x2, data = df)

# plot results for each (Figures 1 and 2 in document)
oes_plot(fit1, treatment_arms = 1, title = "Outcomes")
oes_plot(fit2, treatment_arms = 3, title = "Outcomes")
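The code above assumes a data frame df that is not shown in this document. One way to build a stand-in with the same column names is sketched below; the sample size, assignment probabilities, and effect sizes are invented and are not the values behind Figures 1 and 2.

```r
# Hypothetical stand-in for the simulated data frame `df` used above
set.seed(2020)
n <- 4000

df <- data.frame(
  x1 = rbinom(n, 1, 0.5),                       # two arms: 0 = control, 1 = treatment
  x2 = factor(sample(0:3, n, replace = TRUE))   # four arms: 0 = control, 1-3 = treatments
)

# Binary responses with made-up baseline rate and treatment effects
df$y1 <- rbinom(n, 1, 0.08 + 0.03 * df$x1)
arm_effect <- c(0, 0.02, 0.01, 0.03)            # effect for arms 0, 1, 2, 3
df$y2 <- rbinom(n, 1, 0.08 + arm_effect[as.integer(df$x2)])

str(df)
```

Coding the control arm as level 0 keeps it as the reference level in R and matches the treatment == 0 convention in the Stata code.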

Stata Code

The code below was used to generate Figure 2. The source code for the oes_plot_stata program can be found here (this link might not be accessible from outside OES). To run a program in Stata, you can save the GitHub file locally (either pull the file from GitHub or copy and paste it into a local do file). Then load the program as per the “configure program” line below and run it with any arguments specific to your dataset.

/*Note that to maintain consistency, this example code uses the
simulated dataset referenced in the R code but renames x2 ->
treatment and y2 -> y given that the Stata code does not
differentiate between datasets with one or more treatments.*/

//generate dummies for each treatment group
forval i = 0/3 {
    gen treat_`i' = (treatment == `i')
}

//run specification from which to estimate core treatment effects
reg y treat_1 treat_2 treat_3, vce(hc2)

//capture standard errors
forval i = 1/3 {
    local treat_`i'_se = _se[treat_`i']
}

//capture p-values (two-sided, from the t distribution)
forval i = 1/3 {
    lincom _b[treat_`i']
    local p_`i' = (2 * ttail(e(df_r), abs(r(estimate)/r(se))))
}

//graph arguments
local ytitle = "Response" // y axis label
local alpha_level = .05 // pvalue cutoff
local ymin = 0 // minimum y axis value (strongly recommend starting at 0!)
local ymax = .15 // maximum y axis value
local ygap = .05 // gap between each y axis tick mark

//configure program
do ".../GitHub/code_library/data_visualization/oes_plot_stata.do"

//run figure - Note: arguments should always appear in the order they
//are listed in the program
oes_plot_stata "`ytitle'" "`alpha_level'" "`ymin'" "`ymax'" "`ygap'" ///
    "`p_1'" "`treat_1_se'" "`p_2'" "`treat_2_se'" "`p_3'" "`treat_3_se'"

//export your graph locally!


