AIR NCES Data Institute
Restricted-Use Data and License Webinar
Script Draft
Approximate Time
Dialogue
Welcome, introduction, and purpose (5 minutes)
Hi everyone.
Thank you for joining us for today’s webinar on restricted-use data and licenses at
the National Center for Education Statistics.
My name is Alli Bell and I will be chatting with you today a little bit about the
difference between restricted and public-use data, when you might need to access
restricted data, and how to obtain a license to do that.
I am familiar with the data that the National Center for Education Statistics
makes available to researchers both as a user and as a data provider. As a user, I
worked with restricted-use data for my dissertation, on other projects as a graduate
student, and professionally. As a data provider, I was a survey director at IPEDS.
Before we jump into the content, let’s take a look at what we will be talking about
over the next hour or so.
First, I’ll provide a little background on the kinds of data the National Center for
Education Statistics makes available and the difference between public-use and
restricted-use data.
We will follow that with a quick overview on why restricted data are restricted and
why you need a license to access them.
We will also talk about how to determine whether or not you need restricted-use
data or if you can use the publicly-available data to address your research question
or questions.
If you determine that you do need access to the restricted data, you will need to
obtain a license, and we will talk about what you need to do to get a restricted-use
license.
Once you have a license, you have a number of responsibilities in order to ensure
the data remain secure while you use them. We will talk about license holders’
requirements and obligations.
At the end of the webinar, there will be a chance for you to get answers to any
questions that have come up along the way.
Background on restricted-use data and licenses (10 minutes)
Let’s start by talking about the kinds of data the National Center for Education
Statistics, or NCES, makes available to researchers.
NCES is one of four centers at the Institute of Education Sciences, or IES.
As you can see from this chart, there are a number of programs and surveys at
NCES.
The data collected by NCES may be through a sample survey of individuals or
schools. Some examples include the National Postsecondary Student Aid Study and
the Middle Grades Longitudinal Study.
NCES also collects information from a population of schools, colleges, and
universities. For example, the Integrated Postsecondary Education Data System, or
IPEDS, is a collection that includes information from all the postsecondary
institutions that receive Title IV federal aid.
With few exceptions, the data collected by NCES are collected under a pledge of
confidentiality to protect personally identifiable information.
However, NCES also has an obligation to make data available to policymakers,
researchers, and other stakeholders.
Most of the data NCES collects are available through online data tools and data files.
The data are released either as public-use data or as restricted-use data.
Let’s first talk a little bit about public-use data. There are a lot of data that are
publicly available. These data have undergone procedures to ensure they have
been de-identified.
Through these procedures, there have been measures taken to ensure that none of
the individual cases in the data set could be traced back to a specific individual.
In some cases, the data have been swapped, recoded, or otherwise perturbed,
particularly in the cases of variables that may have small cell sizes.
Data may also have been removed from the data set. For example, ZIP code is not
available in publicly available data sets, and neither are some of the more detailed
categories in variables that have been collapsed.
Because of these procedures, there are some examples of policy or research
questions that are not able to be answered with publicly-available data.
To make sure that these data can be used to answer questions that require more
detailed data, IES has a restricted-use data program for data sets with more detail.
These restricted-use files contain microdata variables that could potentially allow
respondents to be identified, so these variables are never made public.
Through the restricted-use data program, IES loans data to non-IES data users
through a license or contract.
Licensees are allowed to use the restricted data under strict privacy and
confidentiality agreements for specific research projects.
As of 2016, there were over 1,200 restricted-use licenses.
How do you know if you need to access the restricted data?
Why do you need a restricted-use license, and when are the public data sufficient?
(15 minutes)
The answer is that for a majority of research questions, the data that are available
through the public releases are sufficient.
Let’s look at examples from the Postsecondary Branch (IPEDS) and the
Longitudinal Surveys Branch (the sample surveys).
All IPEDS data are publicly available through the IPEDS Data Center. IPEDS data are
not collected under a pledge of confidentiality.
Using the IPEDS Data Center, you can:
explore data for a single institution,
compare institutions,
create summary statistics,
download custom data files,
download survey data files, and
explore trends via Trend Generator.
If your research question is about postsecondary institutions, it is likely that you
can use the publicly-available IPEDS data to answer your question.
The sample surveys, however, are different.
These data sets contain individual level information. They each have publicly
available data that researchers can access through the tools available in DataLab.
The DataLab tools include:
QuickStats, which you can use to make quick descriptive tables;
PowerStats, which you can use to create more sophisticated tables and
perform regression analysis; and
TrendStats, which allows you to create tables of trends over time.
In addition, there is a library of tables that are publicly available.
So that these data can be released to the public, they have been de-identified so
that a single record cannot be traced back to a specific individual.
The statistical methods used to protect the data are such that record swapping or
recoding will not have a statistically significant impact on the results in the
aggregate.
Through the data tools, users do not have access to the individual level data, so
there is no risk of identification.
Even so, the public-use data files still have a lot of important and useful
information that is sufficient to answer a wide range of research questions.
Let’s take Baccalaureate and Beyond: 2008/2012 as an example.
In the publicly-available data, researchers have access to all kinds of demographic,
financial, educational, and outcomes information.
Using PowerStats, you can create a table or run a regression using a number of
variables without obtaining a restricted-use license.
In fact, the tools NCES has made available are constantly being improved so that
researchers are able to ask sophisticated questions and access a large amount of
data.
Questions that can be answered using the public-use Baccalaureate and Beyond
data include:
Are different groups of students more or less likely to have loans and be
able to pay back those loans?
From which majors are students more likely to be employed?
How is institutional type related to post-collegiate outcomes?
And many, many more.
However, there are times that the public-use data may be insufficient to answer a
research question.
For example, you may want to employ a statistical technique that you are unable to
accomplish through PowerStats.
In PowerStats, you can analyze data using linear regression and logistic
regression. You can also create a correlation matrix. For some analyses, you might
want to employ other techniques, such as a multinomial logistic model or a
time-series analysis for longitudinal data.
Or, there might be a variable that you need that is not available in the public file.
For example, the National Postsecondary Student Aid Study, or NPSAS, has
restricted-use data available. The difference between the publicly-available data
and the restricted-use data may seem minor; however, it could be important
depending on your research questions.
How can you tell if you need the restricted-use data file to answer your research
question?
I recommend starting with the publicly-available data through DataLab. Unless you
have a specific variable that is not available in DataLab or are using an analytical
technique that isn’t available through the tools, it is likely that you can answer your
research question by using publicly-available data.
If your analysis relies on a technique that is not available in DataLab, you will likely
need to adjust your statistical plan or use the restricted-use data.
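As a rough illustration of that decision, the two checks described above (is the technique available in the tools, and is the variable in the public-use data) can be written as a short checklist. This is only a sketch: the function name, the technique labels, and the variable names are hypothetical, and the set of supported techniques simply mirrors what this script says PowerStats offers.

```python
# Techniques this script says are available through PowerStats.
# The labels are illustrative, not official NCES terminology.
POWERSTATS_TECHNIQUES = {
    "table",
    "linear regression",
    "logistic regression",
    "correlation matrix",
}


def reasons_to_consider_restricted_use(techniques, variables, public_variables):
    """List the reasons a project might need restricted-use data.

    An empty list suggests the public-use data are likely sufficient.
    All three arguments are collections of plain-text labels (hypothetical).
    """
    reasons = []
    # Check 1: any planned technique that the public tools do not offer.
    for technique in sorted(set(techniques) - POWERSTATS_TECHNIQUES):
        reasons.append(f"technique not available in DataLab: {technique}")
    # Check 2: any needed variable that is missing from the public-use data.
    for variable in sorted(set(variables) - set(public_variables)):
        reasons.append(f"variable not in public-use data: {variable}")
    return reasons
```

An empty result points toward the public-use data; a non-empty result is a prompt to adjust the statistical plan or to look into a license, not a final determination.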
It’s more difficult to know if you need a restricted-use license because the variable
you want isn’t publicly available.
The Electronic Codebook, or ECB, isn’t made publicly available. You have to have
a restricted-use license to access it.
However, each survey has a data documentation report that has information on
the surveys or interviews used to collect data.
Looking at the NPSAS documentation, I see that in the interview respondents are
asked about their marital status. The possible responses are single, married,
separated, divorced, widowed, and living with partner in a marriage-like
arrangement. In the public-use data, these categories are collapsed to (1) single,
divorced, or widowed; (2) married; and (3) separated. This aggregation may not be
appropriate for an analysis that looks at the relationship between marital status
and financial aid.
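To make the effect of that collapsing concrete, here is a minimal sketch. The category labels are taken from the description above, but the dictionary and function names are hypothetical and are not actual NPSAS variable names or codes. The public-use category for "living with partner in a marriage-like arrangement" is not stated here, so the sketch leaves it unmapped rather than guessing.

```python
# Detailed interview response -> collapsed public-use category,
# as described in this script. Names are illustrative only.
PUBLIC_USE_CATEGORY = {
    "single": "single, divorced, or widowed",
    "divorced": "single, divorced, or widowed",
    "widowed": "single, divorced, or widowed",
    "married": "married",
    "separated": "separated",
    # "living with partner" is intentionally omitted: its public-use
    # category is not given in the script.
}


def collapse_marital_status(detailed_response):
    """Return the collapsed category, or None if the mapping is not stated."""
    return PUBLIC_USE_CATEGORY.get(detailed_response)


def distinguishable_in_public_file(response_a, response_b):
    """True only if both responses map to known, different collapsed categories."""
    a = collapse_marital_status(response_a)
    b = collapse_marital_status(response_b)
    return a is not None and b is not None and a != b
```

For example, divorced and widowed respondents fall into the same collapsed category, so an analysis that needs to compare those two groups could not be done with the public-use variable.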
Once you’ve done a little bit of background research, you can also reach out to the
NCES survey director in charge of the survey you’re interested in, or to the help
desk, to see if a restricted-use license may be best for your analysis.
How to obtain a restricted-use license (10 minutes)
So, if you have looked at your research question and the codebooks for the data
set you will be using to conduct your research and have determined that you need
to access the restricted-use data, you must obtain a license.
The license is an agreement between IES, the researcher, and the researcher’s
organization.
IES does not grant licenses to individuals without an organizational affiliation.
The organization can be a university, a think tank or other research organization, or
a company.
IES then lends the restricted data to the organization for the researchers named on
the license to use.
Licenses are only granted to qualified organizations in the 50 United States and
Washington, D.C.
If your organization does not already have a license for restricted-use data, you
must submit a formal online request that includes:
A designated Principal Project Officer, or PPO (who must be at least a post-doc);
Senior Official, or SO (who must have legal authority to sign the license on behalf
of the organization); and Systems Security Officer, or SSO (who is ultimately
responsible for the security of the data).
Name, title, institutional affiliation, full address, phone number and email for the
researcher, senior official, and system security officer.
A list of all authorized users on the data license. Up to seven staff can be identified.
A description of the project, including the name, year, and subject matter of the data
files requested; the project title and a brief description of the research objective
and how the data will be used; an explanation of why public-use files cannot be used;
a description of any linked data; an indication of which education sectors will be
served by the project; an agreement that data will not be used for administrative
or regulatory purposes; and the length of the requested loan.
A hard copy of the signed data-use agreements; signed and notarized affidavits of
nondisclosure; and a signed security plan form.
Licensees agree to:
Keep data safe
Participate in unannounced inspections to ensure compliance
Read the restricted-use data procedures
Adhere to approved IES procedures for reporting tabular results
If your organization already has a license, it is possible to be added through an
amendment to the license. You will need to speak with the PPO to start this
process.
Requirements and obligations for people with a restricted-use license (5 to 10
minutes)
Once a license has been awarded to an organization and data have been received,
the PPO and SSO are responsible for making sure the data are protected.
This includes making sure that:
Only authorized users can access the data.
Data are accessed only in the secure project office listed in the security plan and
are not transferred via the internet or a USB memory stick.
All draft documents are submitted to IES for disclosure review before
publication.
The requirements for presenting results in a table using restricted data are fairly
straightforward. Researchers must round figures to the nearest 10 (or 50 for
ECLS-B).
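The rounding rule above can be sketched in a few lines. This is an illustration only, not an official IES procedure: the function name is hypothetical, and the handling of exact ties (for example, 1,235 when rounding to the nearest 10) is an assumption, since the rule as stated here does not say which way ties go. Check the restricted-use data procedures for the authoritative requirements.

```python
def round_for_reporting(count, base=10):
    """Round a non-negative integer to the nearest multiple of `base`.

    Use base=10 for most surveys and base=50 for ECLS-B, per the script.
    Assumption: exact ties round up; the script does not specify tie handling.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    # Adding half the base before integer division rounds to the nearest multiple.
    return ((count + base // 2) // base) * base
```

For instance, `round_for_reporting(1234)` gives 1230, and `round_for_reporting(1234, base=50)` gives 1250.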
The PPO is also responsible for maintaining the restricted-use data and agrees
to:
Notify IES if there are any demands made for the data;
Notify IES of any changes to the license; and
Maintain a file with all documents related to the license.
When the research is complete or the license period has lapsed, the PPO must
close out the license officially with IES. The data must be returned or destroyed
under IES supervision and procedures.
Data security is taken very seriously. Willful disclosure of restricted-use data is a
felony. Penalties include fines of up to $250,000 and/or up to five years in prison.
Real-life example
I used a restricted-use data set for my dissertation. Remember that licenses are
only granted to organizations, so I was added to my graduate school’s restricted-
use license.
The process of being added was fairly straightforward. I needed to sign an affidavit
that I would follow the rules and was aware of the policies.
I needed a restricted-use license because I was using a multinomial logistic model
to examine the relationship between the formation of college aspirations and
concerns about college costs. I could not run a multinomial logit through publicly
available data sources.
Additionally, I needed the restricted data for my outcome variable. My dependent
variable had three categories:
Student plans to attend college;
Student does not plan to attend college because of financial concerns; or
Student does not plan to attend for another reason.
Ultimately, I think I could have changed the design of my study and probably come
to a similar conclusion, but the analytical rigor needed to satisfy my dissertation
committee was best achieved through the use of restricted-use data.
The main challenge I had with the data was that they had to stay in one physical
location for my analysis, and I took a job in a different state from my graduate
school. I planned ahead and did most of my analysis prior to
moving, but there were a few times that I needed to go back to finish my analysis,
and that process added time to completing my dissertation.
I found that working with NCES was straightforward and easy. I followed the rules
and they were quick to review my analysis. Essentially, they looked at the tables to
ensure that I had followed the statistical standards.
Q&A and closing (10 minutes)
In general, NCES does a really good job of making sure that data are available
through the public-use files. With some exploration, you’ll find that those data are
sufficient to answer a wide variety of questions.
But if not, it is possible to access more detailed data. IES wants to make sure that
researchers who need additional data are able to access them, while it also
maintains the security and privacy of the data it collects.
If you have additional questions about restricted data, we can take those now.
There are also a number of resources available online that you can access at the
URL you see on the screen.