Data Overview
Data held by the Michigan Education Data Center cover numerous topics of interest to education researchers. From early childhood through postsecondary and from individual-level to district-level data, many of these elements can be joined to establish a powerful longitudinal view of Michigan's education context. More about the data
Record Matching
Individual student records within educational datasets maintained by MEDC are labeled with a unique identifier, generated by the State of Michigan, which allows researchers looking to perform analysis across multiple datasets to be fairly certain which records correspond to the same individual. However, this limits the scope of any analysis to these internal datasets and the variables they contain. In an instance where a researcher has access to an external dataset, they would be unable to investigate any relations between that dataset and MEDC’s data.
In addition to educational research datasets, MEDC also maintains a dataset containing the personally identifiable information (PII) of a large proportion of Michigan’s K-12 student population, including full names, dates of birth, racial/ethnic status, and addresses, each of which is associated with the state’s unique identifier. With this dataset, MEDC has developed a probabilistic matching model that allows it to match MEDC data with external data in cases where the external data each dataset contains at least some personally identifiable information in common.
The following is an overview of the process by which MEDC performs a probabilistic match between any incoming dataset and the PII dataset maintained in house. We provide a broad overview of some of the major concepts relevant to record linkage, including data cleaning, blocking, performing field and record level comparisons, and evaluation metrics and techniques. Read more about the MEDC matching process (PDF).
Data Security Guidelines
Data made available through the Michigan Education Research Institute describe Michigan's children. It is critical that researchers keep data security at the forefront during every stage. Before submitting a research application, researchers should review these guidelines and work with their institution's IT and data security experts to make sure best practices are followed.
- In most cases, the use of cloud storage (e.g., Box, Dropbox) or work computers will not be approved for data storage. Talk with your institution's IT staff and ask for secured network storage.
- Data must be stored within the United States.
- Describe how account management will be used to ensure only approved users have access to the data. Group-based policies (vs. allowing access on a one-off basis) are preferred. Your institution should have a role in issuing accounts that requires personal information (e.g., date of birth, address) to confirm the identity.
- How often will data access be reviewed and updated?
- Describe how you plan to access and analyze the data. We recommend only using "work" computers or remote desktops that are monitored by your institution's IT staff.
- How will you access data from off-campus? Whatever the answer, it should include the use of a VPN or other means to ensure end-to-end encryption of data.
The following is an example of data security verbiage describing the infrastructure used by the Michigan Education Data Center and affiliated researchers at the University of Michigan.
Sensitive data housed by MEDC reside solely on file, database or computational servers hosted within U-M data centers in Washtenaw County, MI. These servers are highly secure and approved for use with FERPA, Export Controlled (ITAR, EAR), PII, HIPAA and Sensitive Human Subject Research data. All servers are monitored 24/7 for network and physical intrusion and regularly patched by U-M Information Technology Services. Group-based access controls ensure data access follows the principle of least privilege. Separate server instances are used to ensure identifiable and de-identified data are not co-mingled. Virtual data enclaves requiring a VPN connection and two-factor authentication allow approved researchers to analyze data without removing it from protected data centers.
Data Delivery Schedule
MEDC generally receives data for the most recently completed academic year 6-12 months after completion of the academic year. The schedule below shows roughly when we expect data from the preceding academic year to arrive.
Topic | Dataset | Start Year | Estimated Arrival |
---|---|---|---|
Early Learning | Early On Program Eligibility and Enrollment | 2013 | Aug |
Early Learning | Early Childhood Demographics and Program Enrollment | 2012 | Oct |
K12 Student | K-12 Student Demographic and Enrollment Data | 2003 | Aug |
K12 Student | K12 Graduation | 2007 | Mar |
K12 Student | K-12 Student Coursework | 2011 | Oct |
K12 Student | K-12 Student Infractions and Discipline | 2011 | Aug |
Assessment | K-12 Student Assessments | 2008 | Nov |
Assessment | K-12 Student Assessments - Accommodations | 2008 | Nov |
Assessment | K-12 Student Assessments - English Language Learner | 2008 | Nov |
Assessment | K-12 Student Assessments - Career and College Readiness | 2008 | Nov |
Infrastructure | K-12 District Finance | 2004 | Jul |
Infrastructure | K-12 Educational Institutions | 2006 | Aug |
Infrastructure | Postsecondary Educational Institutions | 2008 | Oct |
Staff | K-12 Staff Demographic, Employment and Assignment Data | 2004 | Sep |
Staff | K-12 Staff Education and Certification (Endorsements) | 2012 | Sep |
Staff | K-12 Staff Education and Certification (Professional Development) | 2004 | Jan |
Staff | K-12 Staff Education and Certification (Michigan Test for Teacher Certification) | 1992 | Nov |
Postsecondary | Postsecondary Student | 2010 | Feb |
Postsecondary | Postsecondary Student Awards | 2010 | Feb |
Postsecondary | Postsecondary Student Coursework | 2010 | Feb |