Accuracy: The correctness of evaluation findings. A key component of evaluation standards, accuracy helps ensure that an evaluation produces technically adequate information (e.g., clear descriptions of program and context, use of appropriate methods in data collection and analysis, and impartial reporting of results).

Advocacy: A wide range of activities that aim to influence decision makers at various levels. These can include traditional advocacy work like litigation, lobbying, and public education, as well as activities that promote change in systems, policies, or actions like organizational capacity building, networking, relationship building, communication, and leadership development.

Average: Also known as the mean, this is calculated by adding the values of all items in a data set, then dividing by the total number of items.
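As a quick illustration, the calculation can be written as a short Python sketch. The attendance figures are hypothetical and serve only to show the arithmetic (sum of the values divided by how many there are). Python's standard `statistics.mean` performs the same calculation.

```python
def mean(values):
    """Return the arithmetic mean: the sum of all values divided by their count."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


# Hypothetical data: attendance at five workshop sessions
attendance = [12, 15, 9, 20, 14]
print(mean(attendance))  # 70 / 5 = 14.0
```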

Bias: A systematic error in measurement due to factors such as how a sample is selected, who responds, the measurement tools used, the types of questions asked, and who collects the data. Although it is difficult to eliminate bias completely, it can be minimized by using good data collection tools and carefully training data collectors.

Case study: A descriptive, in-depth analysis of a program, organization, or activity over time. In program evaluation, case studies aim to answer questions like “What happened?” and “Why?”

CDC’s Framework for Program Evaluation: The predominant model used in public health, this is a practical, non-prescriptive tool summarizing the key steps and quality standards in an effective program evaluation. The framework was designed to encourage systematic, targeted, useful, and accurate evaluations that contribute to improved public health outcomes and program accountability. 

Cross-tabulation: A simple way to show how two or more variables are related by counting how often each combination of their values occurs.

For example, cross-tabulation can be used to create a table summarizing hair color by sex, with one row for each sex and columns for blonde, brown, and black hair plus a total.

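The hair-color-by-sex example can be sketched in Python using only the standard library. The survey records below are hypothetical, and the fixed list of three hair colors is an assumption taken from the example's column headings (records with other colors are simply not counted). For larger data sets, `pandas.crosstab` with `margins=True` produces a similar table with totals.

```python
from collections import Counter

# Columns of the example table (assumption: only these three colors are tracked)
HAIR_COLORS = ["Blonde", "Brown", "Black"]


def cross_tabulate(records):
    """Count (sex, hair color) pairs and return one row per sex, with a Total column."""
    counts = Counter(records)  # each record is a (sex, hair color) tuple
    sexes = sorted({sex for sex, _ in records})
    table = {}
    for sex in sexes:
        row = {color: counts[(sex, color)] for color in HAIR_COLORS}
        row["Total"] = sum(row.values())  # sum of the three color counts
        table[sex] = row
    return table


# Hypothetical survey records: (sex, hair color)
records = [
    ("Female", "Brown"), ("Male", "Black"), ("Female", "Blonde"),
    ("Male", "Brown"), ("Female", "Brown"), ("Male", "Brown"),
]

columns = HAIR_COLORS + ["Total"]
print(f"{'':8}" + "".join(f"{c:>8}" for c in columns))
for sex, row in cross_tabulate(records).items():
    print(f"{sex:8}" + "".join(f"{row[c]:>8}" for c in columns))
```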
Cultural competency: Understanding and awareness of the different beliefs, values and practices of diverse groups and how such differences may impact program delivery, data collection, and community engagement. Cultural competency means that interactions with people are appropriate to their culture, including communication style and language.

Document review: A data collection method in which information is obtained by reading and analyzing relevant literature or documents (e.g., medical charts, program administrative records, and previous evaluations). Depending on the material analyzed, document review may be considered either primary or secondary data analysis.

Evaluation: The systematic collection of information about the activities, characteristics, and results of a program or an organization’s work in order to make judgments, improve effectiveness, inform decisions, and/or increase understanding.

Evaluation plan: A document that outlines the major components of an evaluation. A good evaluation plan includes a program description or logic model, a statement of the evaluation’s purpose, evaluation questions and carefully selected indicators to answer them, data collection methods and sources, responsible parties assigned to various activities, a timeline for execution, and a budget, as well as plans for engaging stakeholders in planning, implementation, and dissemination of results.

Evaluation questions: The key overarching questions that an evaluation is intended to answer. Evaluation questions help focus your evaluation.

Evaluation report: A written report summarizing the findings of a completed evaluation. A good evaluation report includes descriptions of the evaluation’s purpose and design, the program, the data collection methods, and the results, as well as conclusions and recommendations. An evaluation report is a common way for an organization to share its experiences and lessons learned with clients and stakeholders.

Evaluation standards: A set of 30 standards adopted by the Joint Committee on Standards for Educational Evaluation, a coalition of professional associations concerned with evaluation quality. Widely adopted in the evaluation field, these standards help ensure that an evaluation plan is realistic, ethical, and likely to produce accurate and useful results.

Evaluative thinking: The practice of continually using systematically collected data to inform actions and decisions throughout an organization. Key components include asking meaningful questions, gathering and analyzing the appropriate data, sharing results, and developing strategies to act on the findings.

Feasibility: How practical and reasonable a plan is, which contributes to the likelihood of it being accomplished. In program evaluation, feasibility standards are about using practical, nondisruptive data collection methods; anticipating and acknowledging stakeholders’ differing interests; and prudently using resources to produce valuable findings.

Frequency distribution: A way to display the number of times (the frequency) that each value appears in the data. Frequency distributions are often presented as tables or graphs. For example, a frequency table for hair color lists each hair color alongside the number of times it appears in the data.

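A frequency table like the hair color example can be built with Python's `collections.Counter`. The responses below are hypothetical; the row of `#` characters is a simple text stand-in for a bar graph.

```python
from collections import Counter

# Hypothetical answers to a survey question about hair color
responses = ["Brown", "Black", "Brown", "Blonde", "Brown", "Black"]

# Counter tallies how many times each distinct value appears
frequencies = Counter(responses)

# Print a frequency table with a simple text bar, most common value first
print(f"{'Hair color':<12}{'Frequency':>10}")
for color, count in frequencies.most_common():
    print(f"{color:<12}{count:>10}  {'#' * count}")
```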
Focus group: A data collection method for obtaining in-depth information from a group through a facilitated discussion about their experiences and perceptions related to a specific issue.

Indicator: A specific and meaningful measure that can be used to evaluate a program. Indicators define the criteria by which a program will be judged, answering the question “If the outcome is achieved, how will you know it?”

Inputs: The resources needed to make a program run (e.g., staff, funding, facilities, materials, technical assistance, partner entities).

Interview: A data collection method that entails gathering information by speaking one-on-one with people who have special knowledge about an issue to understand their impressions and experiences in detail. Interviews may be structured or informal, but generally follow an interview protocol that outlines key topics and/or questions to be covered.

Institutional Review Board (IRB): A formally designated committee that reviews proposed studies to ensure that U.S. government regulations about protecting human research participants are met. IRBs are found at most universities, as well as many other large organizations that receive federal funding, such as hospitals, research institutes, and public health departments.

Legacy evaluation: An evaluation conducted at some point after a program has ended to determine the sustainability of its outcomes and its long-term impact.

Logic model: A systematic, visual way to present and share your understanding of the relationships among the resources you have to operate your program, the activities you plan, and the changes you hope to achieve.

Media tracking: A systematic way to collect information on media coverage of an issue by counting specific search terms or phrases appearing in targeted media outlets during specific periods of time and analyzing characteristics such as story content, framing, and placement.
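The counting step of media tracking can be sketched in Python. Everything here is hypothetical: the outlets, dates, article text, and search terms are invented for illustration, and the sketch counts the number of articles mentioning each term (not total occurrences). Analyzing framing and placement would still require reading the stories.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical articles from targeted outlets: (outlet, publication date, text)
articles = [
    ("Daily News", date(2023, 3, 2), "The city council debated the soda tax today."),
    ("Daily News", date(2023, 4, 10), "Soda tax supporters rallied downtown."),
    ("Metro Weekly", date(2023, 3, 15), "A new report examines sugary drink sales."),
]

# Hypothetical search terms for the issue being tracked
SEARCH_TERMS = ["soda tax", "sugary drink"]


def count_mentions(articles, terms, start, end):
    """Count articles per (outlet, term) published between start and end, inclusive."""
    counts = defaultdict(int)
    for outlet, published, text in articles:
        if not (start <= published <= end):
            continue  # outside the tracking period
        for term in terms:
            # Case-insensitive match of the whole phrase
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
                counts[(outlet, term)] += 1
    return dict(counts)


march = count_mentions(articles, SEARCH_TERMS, date(2023, 3, 1), date(2023, 3, 31))
for (outlet, term), n in sorted(march.items()):
    print(f"{outlet}: '{term}' in {n} article(s)")
```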

Mixed methods: The use of both quantitative and qualitative data collection methods in an evaluation.

Observation: A data collection method for directly collecting information about a program, its staff, program participants, or physical characteristics (e.g., setting, materials) as the program’s activities are underway.

Ordered responses: A set of possible survey item responses that are organized in a logical sequence along a continuum. For example, “strongly agree, somewhat agree, somewhat disagree, strongly disagree.”

Outcome evaluation: An evaluation focused on a program’s effectiveness, or the extent to which it accomplished what it was intended to.

Outcomes: What the program aims to accomplish, or the change that occurs as a result of the activities and investments. The timeframe varies depending upon the type of program, but outcomes are generally categorized as short-term, intermediate, and long-term.

Outputs: The specific activities and direct products of a program, as well as the people who will participate in or be reached by those activities.

Percentages: Calculations that express one quantity as a proportion of another, scaled to a base of 100 (e.g., 45 out of 60 is 75%). Percentages show relationships and make it easier to compare groups of different sizes.
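A minimal Python sketch of the calculation follows. The two sites and their completion counts are hypothetical. They show how raw counts that differ (30 vs. 45 completers) can represent the same share of each group once expressed as a percentage.

```python
def percentage(part, whole):
    """Express part as a percentage of whole: (part / whole) * 100."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return part / whole * 100


# Hypothetical example: two program sites of different sizes
site_a = percentage(30, 40)  # 30 of 40 participants completed the program
site_b = percentage(45, 60)  # 45 of 60 participants completed the program
print(f"Site A: {site_a:.1f}%, Site B: {site_b:.1f}%")
```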

Primary data: Data collected firsthand (e.g., via surveys, interviews, observations, administrative records) for a specific purpose.

Process evaluation: An evaluation of the input and output sections of a logic model to determine how a program is implemented and whether it is producing the expected outputs. A process evaluation can be used to determine how closely a program is following its intended plan, or to identify factors that may be influencing program delivery.

Program: Any set of coordinated activities an organization is undertaking in service to a goal. A program can include a variety of activities, such as direct services, community mobilization efforts, research initiatives, policy development activities, communication campaigns, infrastructure-building projects, training and educational services, and development of administrative systems.

Proxy indicator: An indicator that is related to your outcome but is not a direct measure of it. Proxy indicators are useful when you cannot directly measure the thing you want to measure and need a substitute indicator that will provide relevant data.

Propriety: The ethical correctness of your research methods. Propriety standards address using protocols like informed consent agreements; making sure human subjects are being treated respectfully; making sure findings are fully disclosed; and being up front about conflicts of interest.

Qualitative data: Data that are descriptive or expressed in words, rather than numbers or predetermined categories.

Quantitative data: Data that can be counted or expressed in numbers.

Rating scale: A type of ordered response, where options are ranked on a continuum. Rating scales allow a respondent to indicate the degree to which something is the case.

Reliability: The degree to which an assessment produces consistent results across repeated uses. Reliability is important to high-quality data collection.

Research: An organized, structured, and purposeful investigation to discover, interpret, and/or revise human knowledge. Research settings vary from the natural world to controlled laboratory environments.

Response rate: The proportion of intended respondents who actually complete an assessment as requested (e.g., the number of completed interviews divided by the number of people contacted).
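The response rate calculation can be sketched in Python as follows; the counts of people contacted and interviews completed are hypothetical.

```python
def response_rate(completed, contacted):
    """Proportion of intended respondents who completed the assessment."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    if completed > contacted:
        raise ValueError("completed cannot exceed contacted")
    return completed / contacted


# Hypothetical: 120 people contacted, 84 completed interviews
rate = response_rate(84, 120)
print(f"Response rate: {rate:.0%}")  # Response rate: 70%
```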

Sample: The subset of people or objects from a larger population that is included in a data collection effort, such as the individuals who are asked to complete a survey. The number of people or objects included is the sample size.

Secondary data: Data that were originally collected for a purpose other than your current needs. They can include data collected by other parties, previously published data, or data you collected yourself for another purpose.

Stakeholders: People or organizations invested in a program who are interested in the results of an evaluation and/or have a stake in what will be done with the results.

Survey: A data collection method for gathering uniform information quickly from a large number of people by asking them to respond to a series of questions.

Strategic plan: A guiding document describing the implementation of an organization’s long-term vision. A strategic plan lays out an organization’s overall direction as well as its components, such as programs, activities, resources, and so on.

Utility: The usefulness of something. Utility standards help ensure that an evaluation stays on target and provides important and relevant information. These standards address who will be impacted by the evaluation and guide the amount and type of information collected, how the findings are interpreted, and the clarity and timeliness of evaluation reports.

Validity: The degree to which an assessment actually measures what you intend it to measure. Validity is important to high-quality data collection.

Weighted average: A way to compare responses across items in a set when the items share the same response options. The calculation involves assigning a value, or “weight,” to each response option, then calculating an overall numeric value or average score for each item.
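A minimal Python sketch, assuming a hypothetical weighting scheme of 4 down to 1 for the four ordered responses “strongly agree” through “strongly disagree.” The response counts for the two items are also hypothetical. Each item's score is the sum of (weight × number of respondents choosing that option) divided by the total number of responses, so Item A scores 3.15 and Item B scores 2.50.

```python
# Hypothetical weights for each ordered response option (higher = more agreement)
WEIGHTS = {
    "strongly agree": 4,
    "somewhat agree": 3,
    "somewhat disagree": 2,
    "strongly disagree": 1,
}


def weighted_average(counts):
    """Average score for one item, given how many respondents chose each option."""
    total_responses = sum(counts.values())
    if total_responses == 0:
        raise ValueError("item has no responses")
    weighted_sum = sum(WEIGHTS[option] * n for option, n in counts.items())
    return weighted_sum / total_responses


# Hypothetical response counts for two items that share the same options
item_a = {"strongly agree": 10, "somewhat agree": 5,
          "somewhat disagree": 3, "strongly disagree": 2}
item_b = {"strongly agree": 4, "somewhat agree": 6,
          "somewhat disagree": 6, "strongly disagree": 4}

for name, counts in [("Item A", item_a), ("Item B", item_b)]:
    print(f"{name}: {weighted_average(counts):.2f}")
```

Because both items use the same options and weights, their scores can be compared directly.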