
Directorate for Education

and Human Resources

Division of Research,

Evaluation and Communication

National Science Foundation

The 2002 User-Friendly Handbook

for Project Evaluation


Prepared under Contract

REC 99-12175

by

Joy Frechtling

Westat

with a special section by

Henry Frierson

Stafford Hood

Gerunda Hughes

Conrad Katzenmeyer

Program Officer and COTR

Division of Research, Evaluation and Communication

National Science Foundation

NOTE: Any views, findings, conclusions, or recommendations expressed in this report are those of the authors, and do not necessarily

represent the official views, opinions, or policy of the National Science Foundation.

January 2002

The National Science Foundation

Directorate for Education & Human Resources

Division of Research, Evaluation, and Communication

TABLE OF CONTENTS

Section Page

Introduction ................................................................. 1

References ....................................................... 2

I Evaluation and Types of Evaluation ............................... 3

1. Reasons for Conducting Evaluations ........................ 3

2. Evaluation Prototypes ............................................. 6

The Different Kinds of Evaluation ........................... 7

Formative Evaluation........................................ 8

Summative Evaluation ...................................... 10

Evaluation Compared to Other Types of Data

Gathering......................................................... 11

Summary ............................................................... 13

References ....................................................... 13

II The Steps in Doing an Evaluation .................................. 15

3. The Evaluation Process—Getting Started................. 15

Develop a Conceptual Model of the Project and

Identify Key Evaluation Points .......................... 16

Develop Evaluation Questions and Define

Measurable Outcomes....................................... 20

Develop an Evaluation Design................................. 24

Selecting a Methodological Approach................ 24

Determining Who Will be Studied and When ..... 25

References ....................................................... 30

4. The Evaluation Process: Carrying Out the Study

and Reporting......................................................... 31

Conducting Data Collection .................................... 31

Analyzing the Data ................................................. 34

Reporting............................................................... 35

Background...................................................... 36

Evaluation Study Questions ............................... 36

Evaluation Procedures....................................... 36

Data Analysis ................................................... 37

TABLE OF CONTENTS (CONTINUED)

Section Page

Findings ........................................................... 37

Conclusions (and Recommendations)................. 38

Other Sections .................................................. 38

How Do You Develop an Evaluation Report ...... 38

Disseminating the Information................................. 41

References ....................................................... 42

III An Overview of Quantitative and Qualitative

Data Collection Methods ............................................... 43

5. Data Collection Methods: Some Tips and

Comparisons .......................................................... 43

Theoretical Issues................................................... 43

Value of the Data.............................................. 43

Scientific Rigor ................................................ 44

Philosophical Distinction .................................. 44

Practical Issues....................................................... 45

Credibility of Findings ...................................... 45

Staff Skills ....................................................... 45

Costs ............................................................... 46

Time Constraints .............................................. 46

Using the Mixed-Method Approach......................... 46

References ....................................................... 48

6. Review and Comparison of Selected Techniques ...... 49

Surveys .................................................................. 49

When to Use Surveys........................................ 50

Interviews .............................................................. 50

When to Use Interviews.................................... 51

Focus Groups ......................................................... 52

When to Use Focus Groups ............................... 53

TABLE OF CONTENTS (CONTINUED)

Section Page

Observations .......................................................... 53

When to Use Observations ................................ 55

Tests...................................................................... 55

When to Use Tests............................................ 56

Other Methods........................................................ 57

Document Studies............................................. 57

Key Informant.................................................. 59

Case Studies..................................................... 61

Summary ............................................................... 62

References ....................................................... 62

IV Strategies That Address Culturally Responsive

Evaluation.................................................................... 63

7. A Guide to Conducting Culturally Responsive

Evaluations ............................................................ 63

The Need for Culturally Responsive Evaluation........ 64

Preparing for the Evaluation.................................... 65

Engaging Stakeholders............................................ 65

Identifying the Purpose(s) and Intent of the

Evaluation........................................................ 66

Framing the Right Questions ................................... 67

Designing the Evaluation ........................................ 68

Selecting and Adapting Instrumentation ................... 68

Collecting the Data ................................................. 69

Analyzing the Data ................................................. 70

Disseminating and Utilizing the Data ....................... 71

References ....................................................... 72

Other Recommended Reading ....................................... 74

Glossary....................................................................... 77

Appendix A. Finding an Evaluator ................................ 84

TABLE OF CONTENTS (CONTINUED)

List of Exhibits

Exhibit Page

1 The project development/evaluation cycle ...................... 4

2 Levels of evaluation...................................................... 7

3 Types of evaluation ...................................................... 8

4 Types of data gathering activities................................... 12

5 Logic model................................................................. 16

6 Conceptual model for Local Systemic Change

Initiatives (LSCs) ......................................................... 18

7 Identifying key stakeholders.......................................... 21

8 Goal and objective writing worksheet ............................ 23

9 Three types of errors and their remedies......................... 26

10a Matrix showing crosswalk of study foci and data

collection activities ....................................................... 29

10b Crosswalk of study sample and data collection activities . 30

11 Formal report outline .................................................... 40

12 Example of mixed-methods design ................................ 47

13 Advantages and disadvantages of surveys....................... 50

14 Advantages and disadvantages of interviews .................. 52

15 Which to use: Focus groups or indepth interviews?........ 54

16 Advantages and disadvantages of observations ............... 55

17 Advantages and disadvantages of tests........................... 57

18 Advantages and disadvantages of document studies ........ 59

19 Advantages and disadvantages of using key informants... 60

20 Advantages and disadvantages of using case studies ....... 61

INTRODUCTION

This Handbook was developed to provide managers working

with the National Science Foundation (NSF) with a basic guide

for the evaluation of NSF’s educational programs. It is aimed at

people who need to learn more about both what evaluation can do and

how to do an evaluation, rather than those who already have a solid

base of experience in the field. It builds on firmly established

principles, blending technical knowledge and common sense to meet

the special needs of NSF and its stakeholders.

The Handbook discusses quantitative and qualitative evaluation

methods, suggesting ways in which they can be used as complements

in an evaluation strategy. As a result of reading this Handbook, it is

expected that program managers will increase their understanding of

the evaluation process and NSF’s requirements for evaluation, as well

as gain knowledge that will help them to communicate with

evaluators and manage the actual evaluation.

To develop this Handbook, we have drawn on the similar handbooks

and tools developed for the National Science Foundation (especially

the 1993 User-Friendly Handbook for Project Evaluation and the

1997 User-Friendly Handbook for Mixed-Method Evaluations) and

the National Aeronautics and Space Administration. However,

special attention has been given to aligning the Handbook to NSF’s

unique needs and experiences. In addition, several NSF program

areas have been selected to provide concrete examples of the

evaluation issues discussed. The Handbook is divided into four major

sections:

• Evaluation and types of evaluation

• The steps in doing an evaluation

• An overview of quantitative and qualitative data collection

methods

• Strategies that address culturally responsive evaluation

We have also provided a glossary of commonly used terms as well as

references for those who might wish to pursue some additional

readings. Appendix A presents some tips for finding an evaluator.

References

Frechtling, J., Stevens, F., Lawrenz, F., and Sharp, L. (1993). The

User-Friendly Handbook for Project Evaluation: Science,

Mathematics and Technology Education. NSF 93-152.

Arlington, VA: NSF.

Frechtling, J., and Sharp, L. (1997). The User-Friendly Handbook for

Mixed-Method Evaluations. NSF 97-153. Arlington, VA: NSF.

Section I. EVALUATION AND TYPES OF EVALUATION

1. REASONS FOR CONDUCTING EVALUATIONS

The notion of evaluation has been around a long time—in fact, the

Chinese had a large functional evaluation system in place for their

civil servants as long ago as 2000 B.C. In addition to its long history,

evaluation also has varied definitions and may mean different things

to different people. Evaluation can be seen as synonymous with tests,

descriptions, documents, or even management. Many definitions have

been developed, but a comprehensive definition presented by the Joint

Committee on Standards for Educational Evaluation (1994) holds that

evaluation is “systematic investigation of the worth or merit of an

object.”

This definition centers on the goal of using evaluation for a purpose.

Accordingly, evaluations should be conducted for action-related

reasons, and the information provided should facilitate deciding a

course of action.

Why should NSF grantees do evaluation? There are two very important answers to this question. First and foremost, evaluation provides information to help improve the project. Information on whether goals are being met and on how different aspects of a project are working is essential to a continuous improvement process. In addition, and equally important, evaluation frequently provides new insights or new information that was not anticipated. What are frequently called "unanticipated consequences" of a program are among the most useful outcomes of the assessment enterprise.

Over the years, evaluation has frequently been viewed as an adversarial process. Its main use has been to provide a "thumbs-up" or "thumbs-down" about a program or project. In this role, it has all too often been considered by program or project directors and coordinators as an external imposition that is threatening, disruptive, and not very helpful to project staff. While that may be true in some situations, evaluations need not be, and most often are not, conducted in an adversarial mode.

The current view of evaluation stresses the inherent interrelationships

between evaluation and program implementation. Evaluation is not

separate from, or added to, a project, but rather is part of it from the

beginning. Planning, evaluation, and implementation are all parts of a

whole, and they work best when they work together. Exhibit 1 shows

the interaction between evaluation and other aspects of your NSF

project.

Exhibit 1.—The project development/evaluation cycle

[Figure: a cycle linking project planning/modification, needs assessment and collection of baseline data, project implementation, and project evaluation.]

Second, evaluation provides information for communicating to a variety of stakeholders. It allows projects to better tell their story and prove their worth. It also gives managers the data they need to report "up the line," to inform senior decisionmakers about the outcomes of their investments. This notion of reporting on the outcomes of federal investments has received increased emphasis over the last several years with the establishment of the Government Performance and Results Act (GPRA). GPRA requires federal agencies to report annually on the accomplishments of their funded efforts. This requirement includes establishing broad goals or strategic outcomes, performance outcomes, and performance indicators against which progress will be assessed. GPRA goes beyond counts of who is funded or who is served, placing the focus instead on results or impacts of the federal investment. In response, NSF has chosen to focus on three general strategic outcomes:

• Developing a diverse, internationally competitive, and globally

engaged workforce of scientists, engineers, and well-prepared

citizens;

• Enabling discoveries across the frontiers of science and

engineering connected to learning, innovations, and service to

society; and

• Providing broadly accessible, state-of-the-art information bases

and shared research and education tools.

Projects will be asked to provide data on their accomplishments in

these areas, as relevant. Detailed requirements for the information to

be provided have been developed on a program-by-program basis.

1 NSF, FY 2002 GPRA Performance Plan, April 19, 2001, p. 2.

Project directors should keep GPRA and these strategic outcomes in mind in developing plans for project evaluation (more information on NSF's approach to GPRA can be found at www.nsf.gov/od/gpra/start.htm).

2. EVALUATION PROTOTYPES

The purpose of this chapter is to provide a grounding in evaluation

and to discuss the kinds of information evaluation can provide. We

start with the assumption that the term “evaluation” describes

different models or data collection strategies to gather information at

different stages in the life of a project. A major goal of this chapter is

to help project directors and principal investigators understand what

these are and how to use them.

As we undertake this discussion, it is important to recognize that

within NSF there are two basic levels of evaluation: program

evaluation and project evaluation. While this handbook is directed at

the latter, it is important to understand what is meant by both. Let’s

start by defining terms and showing how they relate.

A program is a coordinated approach to exploring a specific area

related to NSF’s mission of strengthening science, mathematics, and

technology. A project is a particular investigative or developmental

activity funded by that program. NSF initiates a program on the

assumption that an agency goal (such as increasing the strength and

diversity of the scientific workforce) can be attained by certain

educational activities and strategies (for example, providing supports

to selected groups of undergraduate students interested in science or

mathematics). The Foundation then funds a series of discrete projects

to explore the utility of these activities and strategies in specific

situations. Thus, a program consists of a collection of projects that

seek to meet a defined set of goals and objectives.

Now let’s turn to the terms “program evaluation” and “project

evaluation.” A program evaluation determines the value of this

collection of projects. It looks across projects, examining the utility of

the activities and strategies employed. Frequently, a full-blown

program evaluation may be deferred until the program is well

underway, but selected data on interim progress are collected on an

annual basis. Project evaluation, in contrast, focuses on an individual

project funded under the umbrella of the program. The evaluation

provides information to improve the project as it develops and

progresses. Information is collected to help determine whether the

project is proceeding as planned and whether it is meeting its stated

program goals and project objectives according to the proposed

timeline. Ideally, the evaluation design is part of the project proposal,

and data collection begins soon after the project is funded. Data are

examined on an ongoing basis to determine if current operations are

satisfactory or if some modifications might be needed.

Project evaluations might also include examination of specific critical

components, as shown in Exhibit 2. A component of a project may be

a specific teacher training approach, a classroom practice, or a

governance strategy. An evaluation of a component frequently looks

to see the extent to which its goals have been met (these goals are a

subset of the overall project goals), and to clarify the extent to which

the component contributes to the success or failure of the overall

project.

Exhibit 2.—Levels of evaluation

[Figure: a program comprises multiple projects, and each project comprises multiple components.]

The information in this Handbook has been developed primarily for

the use of project directors and principal investigators, although

project evaluators may also find it useful. Our aim is to provide tools

that will help those responsible for the examination of individual

projects gain the most from their evaluation efforts. Clearly, however,

these activities will also benefit program studies and the work of the

Foundation in general. The better the information is about each of

NSF’s projects, the more we can all learn.

The Different Kinds of Evaluation

Educators typically talk about two kinds or stages of evaluation—

formative evaluation and summative evaluation. The purpose of a

formative evaluation is to assess initial and ongoing project activities.

The purpose of a summative evaluation is to assess the quality and

impact of a fully implemented project (see Exhibit 3).

Exhibit 3.—Types of evaluation

[Figure: evaluation branches into formative evaluation (implementation and progress evaluation, conducted in the early stages) and summative evaluation (conducted in the later stages), arrayed along a timeline.]

Formative Evaluation

Formative evaluation begins during project development and continues throughout the life of the project. Its intent is to assess ongoing project activities and provide information to monitor and improve the project. It is done at several points in the developmental life of a project and its activities.

According to evaluation theorist Bob Stake,

“When the cook tastes the soup, that’s formative;

When the guests taste the soup, that’s summative.”

Formative evaluation has two components: implementation evaluation

and progress evaluation.

Implementation Evaluation. The purpose of implementation evaluation is to assess whether the project is being conducted as planned. This type of evaluation, sometimes called "process evaluation," may occur once or several times during the life of the program. The underlying principle is that before you can evaluate the outcomes or impact of a program, you must make sure the program and its components are really operating and, if they are, whether they are operating according to the proposed plan or description.

A series of implementation questions guides an implementation

evaluation. For example, questions that might be posed for the NSF

Louis Stokes Alliances for Minority Participation (LSAMP) are as

follows:

• Were appropriate students selected? Were students with deficits

in precollege preparation included as well as ones with stronger

records? Was the makeup of the participant group consistent

with NSF’s goal of developing a more diverse workforce?

• Were appropriate recruitment strategies used? Were students

identified early enough in their undergraduate careers to

provide the transitional supports needed?

• Do the activities and strategies match those described in the

plan? Were students given both academic and personal

supports? To what extent were meaningful opportunities to

conduct research provided?

• Was a solid project management plan developed and followed?

Sometimes the terms “implementation evaluation” and “monitoring

evaluation” are confused. They are not the same. An implementation

evaluation is an early check by the project staff, or the evaluator, to

see if all essential elements are in place and operating. Monitoring is

an external check. The monitor typically comes from the funding

agency and is responsible for determining progress and compliance on

a contract or grant for the project. Although the two differ,

implementation evaluation, if effective, can facilitate project

implementation and ensure that there are no unwelcome surprises

during monitoring.

Progress Evaluation. The purpose of a progress evaluation is to assess progress in meeting the goals of the program and the project. It involves collecting information to learn whether or not the benchmarks of participant progress were met and to point out unexpected developments. Progress evaluation collects information to determine what the impact of the activities and strategies is on participants, curriculum, or institutions at various stages of the intervention. By measuring progress, program staff can eliminate the risk of waiting until participants have experienced the entire program to assess likely outcomes. If the data collected as part of the progress evaluation fail to show expected changes, the information can be used to fine-tune the project. Data collected as part of a progress evaluation can also contribute to, or form the basis for, a summative evaluation conducted at some future date. In a progress evaluation of the LSAMP program, the following questions can be addressed:

• Are the participants moving toward the anticipated goals of the

project? Are they enhancing their academic skills? Are they

gaining confidence in themselves as successful learners? Are

they improving their understanding of the research process?

• Are the numbers of students reached increasing? How do

changes in project participation relate to changes in the overall

enrollments in mathematics, science, and technology areas at

their institutions? Are students being retained in their programs

at an increasing rate?

• Does student progress seem sufficient in light of the long range

goals of the program and project to increase the number of

traditionally underrepresented students who receive degrees in

science, mathematics, or technology?

Progress evaluation is useful throughout the life of the project, but is

most vital during the early stages when activities are piloted and their

individual effectiveness or articulation with other project components

is unknown.

Summative Evaluation

The purpose of summative evaluation is to assess a mature project's success in reaching its stated goals. Summative evaluation (sometimes referred to as impact or outcome evaluation) frequently addresses many of the same questions as a progress evaluation, but it takes place after the project has been established and the timeframe posited for change has occurred. A summative evaluation of an LSAMP project might address these basic questions:

• To what extent does the project meet the stated goals for

change or impact?

• Are greater numbers of students from diverse backgrounds

receiving bachelor's degrees in science and showing increased

interest in scientific careers?

• Are there any impacts on the schools participants attend? Are

there any changes in courses? Are there any impacts of the

LSAMP program on overall course offering and support

services offered by their institution(s)?

• Which components are the most effective? Which components

are in need of improvement?

• Were the results worth the program's cost?

• Can the program be sustained?

• Is the program replicable and transportable?

Summative evaluation collects information about outcomes and related

processes, strategies, and activities that have led to them. The evaluation

is an appraisal of worth, or merit. Usually this type of evaluation is

needed for decisionmaking. The decision alternatives may include the

following: disseminate the intervention to other sites or agencies;

continue funding; increase funding; continue on probationary status;

modify and try again; and discontinue.

In most situations, especially high-stakes situations or situations that are

politically charged, it is important to have an external evaluator who is

seen as objective and unbiased. Appendix A provides some tips for

finding an evaluator. If this is not possible, it is better to have an internal

evaluation than none at all. One compromise between the external and

the internal model is to conduct an internal evaluation and then hire an

outside agent to both review the design and assess the validity of the

findings and conclusions.

When conducting a summative evaluation, it is important to consider

unanticipated outcomes. These are findings that emerge during data

collection or data analyses that were never anticipated when the study

was first designed. For example, consider an NSF program providing

professional development activities for teacher leaders. An evaluation

intended to assess the extent to which participants share their new

knowledge and skills with their school-based colleagues might uncover a

relationship between professional development and attrition from the

teaching force. These results could suggest new requirements for

participants or cautions to bear in mind.

Evaluation Compared to Other Types of Data Gathering Activities

It is useful to understand how evaluation complements, but may differ from, other types of data collection activities that provide information on accountability for an NSF-funded project. Exhibit 4 shows various types of data collection activities, each of which provides somewhat different kinds of information and serves somewhat differing purposes. The continuum includes descriptive statistics, performance indicators, formative evaluation, summative evaluation, and research studies.

At the center of the effort is the project description, which provides

general information about a project. These data are commonly used to

monitor project activities (e.g., funding levels, total number of

participants), to describe specific project components (e.g., duration of

program activity, number of participants enrolled in each activity), and to

identify the types of individuals receiving services. Descriptive

information may be collected annually or even more frequently to

provide a basic overview of a project and its accomplishments. Obtaining

descriptive information usually is also part of each of the other data

gathering activities depicted. NSF has developed the FASTLANE system

as one vehicle for collecting such statistics.

FASTLANE allows for basic data to be collected across all programs in a

consistent and systematic fashion. In addition, some programs have

added program-specific modules aimed at collecting tailored data

elements.

Exhibit 4.—Types of data gathering activities

[Figure: a continuum of data gathering activities, from project description and performance indicators through formative and summative evaluation to basic research.]

Formative and summative evaluations are intended to gather information

to answer a limited number of questions. Evaluations include descriptive

information, but go well beyond that. Generally, formative and

summative evaluations include more indepth data collection activities,

are intended to support decisionmaking, and are more costly.

Performance indicators fall somewhere between general program

statistics and formative/summative evaluations. A performance indicator

system is a collection of statistics that can be used to monitor the

ongoing status of a program against a set of targets and metrics. Going

beyond descriptive statistics, performance indicators begin to provide

information that can be measured against a set of goals and objectives.

Indicator systems are typically used to focus policymakers, educators,

and the public on (1) key aspects of how an educational program is

operating, (2) whether progress is being made, and (3) where there are

problems (Blank, 1993). Because performance indicators focus on

tangible results, they often go beyond traditional reviews of program

expenditures and activity levels. In fact, the term “performance”

underscores the underlying purpose of indicator systems, i.e., to examine

a program’s accomplishments and measure progress toward specific

goals. Performance indicators provide a snapshot of accomplishments in

selected areas; however, in contrast to evaluations, the information is

limited and is unlikely to provide an explanation of why a project may

have succeeded or failed.

Research studies include descriptive information and provide targeted

indepth exploration of issues, but differ along other dimensions. Instead

of being intended for decisionmaking, research efforts typically are

designed to explore conceptual models and alternative explanations for

observed relationships.

Summary

The goal of evaluation is to determine the worth or merit of some

procedure, project, process, or product. Well-designed evaluations also

provide information that can help explain the findings that are observed.

In these days of reform, educators are continually faced with the

challenges of evaluating their innovations and determining whether

progress is being made or stated goals have, in fact, been reached. Both

common sense and accepted professional practice would suggest a

systematic approach to these evaluation challenges. The role that

evaluation may play will vary depending on the timing, the specific

questions to be addressed, and the resources available. It is best to think

of evaluation not as an event, but as a process. The goal should be to

provide an ongoing source of information that can aid decisionmaking at

various steps along the way.

References

Blank, R. (1993). Developing a System of Education Indicators: Selecting, Implementing, and Reporting Indicators. Educational Evaluation and Policy Analysis, 15(1, Spring): 65-80.

Section II. THE STEPS IN DOING AN EVALUATION

3. THE EVALUATION PROCESS—GETTING STARTED

In the preceding chapter, we outlined the types of evaluations that

should be considered for NSF’s programs. In this chapter, we talk

further about how to carry out an evaluation, expanding on the steps

in evaluation design and development. Our aim is to provide an

orientation to some of the basic language of evaluation, as well as to

share some hints about technical, practical, and political issues that

should be kept in mind when conducting evaluation studies.

Whether they are summative or formative, evaluations can be thought

of as having six phases:

• Develop a conceptual model of the program and identify key

evaluation points

• Develop evaluation questions and define measurable outcomes

• Develop an evaluation design

• Collect data

• Analyze data

• Provide information to interested audiences

Getting started right can have a major impact on the progress and utility of the evaluation all along the way. However, all six phases are critical to providing useful information. If the information gathered is not perceived as valuable or useful (the wrong questions were asked), or the information is not seen to be credible or convincing (the wrong techniques were used), or the report is presented too late or is not understandable (the teachable moment is past), then the evaluation will not contribute to the decisionmaking process.

In the sections below, we provide an overview of the first three

phases, which lay the groundwork for the evaluation activities that

will be undertaken. The remaining three phases are discussed in

Chapter 4.

Develop a Conceptual Model of the

Project and Identify Key Evaluation Points

Every proposed evaluation should start with a conceptual model to

which the design is applied. This conceptual model can be used both

to make sure that a common understanding about the project’s

structure, connections, and expected outcomes exists, and to assist in

focusing the evaluation design on the most critical program elements.

Exhibit 5 presents the shell for a particular kind of conceptual model, a "logic model." The model describes the pieces of the project and

expected connections among them. A typical model has four

categories of project elements that are connected by directional

arrows. These elements are:

• Project inputs

• Activities

• Short-term outcomes

• Long-term outcomes

Exhibit 5.—Logic model

Inputs → Activities → Short-Term Outcomes → Long-Term Outcomes

2 There are several different ways to show a logic model. The model presented here is one that

has been useful to the author.

Project inputs are the various funding sources and resource streams

that provide support to the project. Activities are the services,

materials, and actions that characterize the project’s thrusts. Short-

term outcomes are immediate results of these activities. Long-term

outcomes are the broader and more enduring impacts on the system.

These impacts will reflect NSF’s strategic outcomes discussed on

page 4. A logic model identifies these program elements and shows

expected connections among them. PIs and PDs may find this model

useful not only for evaluation but also for program management. It

provides a framework for monitoring the flow of work and checking

whether required activities are being put in place.

The first step in doing an evaluation is to describe the project in terms

of the logic model.

• One set of inputs is the funds that NSF provides. Other inputs

may come from other federal funding sources, local funding

sources, partnerships, and in-kind contributions.

• The activities depend on the focus of the project. Potential

activities include the development of curricula and materials,

provision of professional development, infrastructure

development, research experiences, mentoring by a senior

scientist, or public outreach, alone or in combinations.

• Short-term outcomes come in a variety of shapes and sizes. One type

of outcome is sometimes called an “output.” An output is an

accounting of the numbers of people, products, or institutions

reached. For example, an output of a professional development

program for teachers could be “200 teachers trained.” The

output of a research program could be “17 students received

mentoring from NSF scientists.” The other type of outcome

looks at short-term changes that result from the experience.

Such an outcome might be “reported sense of renewal” for a

teacher given professional development support or “an impact

on choice of major” for an undergraduate receiving a research

experience.

• Long-term outcomes are the changes that might not be

expected to emerge until some time after the experience with

the project. To continue with the examples provided above, a

long-term outcome of professional development could be

“changes in instructional practice reflective of a standards-

based approach.” For the undergraduate student, “selecting a

career in NSF-related research activity” would be a comparable

outcome.

The logic model shows a process that flows from inputs to long-term

outcomes. In developing a model for your project, it may be useful to

reverse this flow. That is, project teams frequently find it more useful

to "work backwards," starting from the long-term outcome desired and then determining critical conditions or events that will need to be established before these outcomes might be expected to occur. Exhibit 6 shows a preliminary conceptual model for one of NSF's major professional development programs, Local Systemic Change Initiatives (LSCs) projects.

Exhibit 6.—Conceptual model for Local Systemic Change Initiatives (LSCs)

Inputs: NSF funds; local and state funds; other professional development grants.

Activities: adoption of high-quality curricula and materials; formation of extended standards-based professional development; review of new policies.

Short-Term Outcomes: effective use of new materials and curricula; adoption of new pedagogies that encourage inquiry and problem solving; instruction tailored to the needs of diverse populations.

Long-Term Outcomes: institutionalization of challenging instruction; enhanced student learning and performances; improved student achievement.

Under “inputs,” we have listed three streams of funding:

• NSF funds

• Local and state funds

• Other professional development grants

For “activities,” we have highlighted:

• Adoption of high-quality curricula and materials

• Provision of extended standards-based professional development

• Review of new policies

The short-term outcomes are linked to, and flow from, the overall goals

of the LSCs. Thus, we would look for:

• Effective use of new materials and curricula

• Adoption of new pedagogies that encourage inquiry and problem

solving

• Instruction tailored to the individual needs of students from

diverse populations

Finally, over time, the LSCs should result in:

• Consistently challenging instruction for all students

• Enhanced student learning and performance

• Higher scores on assessments of student achievement
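To make the structure of a logic model concrete, the short sketch below shows one way a project team might record the LSC elements above as a simple data structure and review them "backwards," from long-term outcomes to inputs. This is only an illustration in Python; the class and field names are ours, not part of the Handbook or of NSF's requirements.

from dataclasses import dataclass, field
from typing import List

@dataclass
class LogicModel:
    """Container for the four categories of logic model elements."""
    inputs: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    short_term_outcomes: List[str] = field(default_factory=list)
    long_term_outcomes: List[str] = field(default_factory=list)

# The LSC example from Exhibit 6, entered as data.
lsc = LogicModel(
    inputs=["NSF funds", "Local and state funds",
            "Other professional development grants"],
    activities=["Adoption of high-quality curricula and materials",
                "Extended standards-based professional development",
                "Review of new policies"],
    short_term_outcomes=["Effective use of new materials and curricula",
                         "New pedagogies that encourage inquiry and problem solving",
                         "Instruction tailored to the needs of diverse populations"],
    long_term_outcomes=["Consistently challenging instruction for all students",
                        "Enhanced student learning and performance",
                        "Improved student achievement"])

# "Working backwards": list the elements from long-term outcomes to inputs.
for name in ("long_term_outcomes", "short_term_outcomes", "activities", "inputs"):
    print(name.replace("_", " "), "->", getattr(lsc, name))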

Once this logic model is developed and connections are established, the

next step is to clarify the timing for when the activities and impacts

would be expected to emerge. This is an area that should have been

addressed during the project’s planning phase, and determining expected

timeframes should be a revisiting of decisions rather than a set of new

considerations. However, either because some aspect was overlooked in

the initial discussions or some conditions have changed, it is important to

review the time schedule and make sure that the project is willing to be

held accountable for the target dates. Finally, the model can be used to

identify critical achievements as indicated by the logic model and

critical timeframes that need to be met. These provide the starting point for

the next step, developing the evaluation questions.

Develop Evaluation Questions and Define Measurable Outcomes

The development of evaluation questions builds on the conceptual model

and consists of several steps:

• Identifying key stakeholders and audiences

• Formulating potential evaluation questions of interest to the

stakeholders and audiences

• Defining outcomes in measurable terms

• Prioritizing and eliminating questions

While it is obvious that NSF program managers and the directors of

individual projects are key stakeholders in any project, it is important in

developing the evaluation design to go beyond these individuals and

consider other possible audiences and their needs for information. In all

projects, multiple audiences exist. Such audiences may include the

participants, would-be participants, community members, NSF scientists,

school administrators, parents, etc. Further, some of the audiences may

themselves be composed of diverse groups. For example, most

educational interventions address communities made up of families from

different backgrounds with different belief structures. Some are

committed to the status quo; others may be strong advocates for change.

In developing an evaluation, it is important to identify stakeholders early in the design phase and draw upon their knowledge as the project is shaped. A strong stakeholder group can be useful at various points in the project—shaping the questions addressed, identifying credible sources of evidence, and reviewing findings and assisting in their interpretation.

Although, in most cases, key stakeholders will share a number of

information needs (in a professional development program the impacts

on teaching quality will be of interest to all), there may be audience-

specific questions that also need to be considered. For example, while

exposure to the new technologies in an NSF lab may provide teachers

with important new skills, administrators may be concerned not only

with how the introduction of these skills may impact the existing

curriculum, but also with the long-term resource and support implications

for applying the new techniques. Depending on the situation and the

political context in which a project is being carried out, a judicious mix

of cross-cutting and audience-specific issues may need to be included.

Exhibit 7 presents a shell for organizing your approach to identifying

stakeholders and their specific needs or interests.

Exhibit 7.—Identifying key stakeholders

Column 1: List the audiences for your evaluation.

Column 2: Identify persons/spokespersons for each audience.

Column 3: Describe the particular values, interests, expectations, etc., that may play a key role as criteria in the analysis and interpretation stage of your evaluation.

The process of identifying potential information needs usually results in

many more questions than can be addressed in a single evaluation effort.

This comprehensive look at potential questions, however, makes all of

the possibilities explicit to the planners of the evaluation and allows them

to make an informed choice among evaluation questions. Each potential

question should be considered for inclusion on the basis of the following

criteria:

• The contribution of the information to the goals of NSF and the

projects’ local stakeholders

• Who would use the information

• Whether the answer to the question would provide information

that is not now available

• Whether the information is important to a major group or several

stakeholders

• Whether the information would be of continuing interest

• How the question can be translated into measurable terms

• How it would be possible to obtain the information, given

financial and human resources

These latter two points require some additional explanation. First, the

question of measurability. There are some evaluation questions that,

while clearly important, are very challenging to address because of the

difficulty of translating an important general goal into something that can

be measured in a reliable and valid way. For example, one of the goals of

a summer research experience for teachers might be generally stated “to

increase the extent to which teachers use standards-based instruction in

their science teaching.” To determine whether or not this goal is met, the

evaluation team would have to define an indicator or indicators of

standards-based instruction, establish a goal for movement on the part of

the teachers, and then set interim benchmarks for measuring success. A

variety of possible articulations exist. One could talk about the

percentage of teachers moving through various levels of proficiency in

standards-based instruction (once those levels were established); or the

outcome could be measured in terms of the percentage of time devoted to

different practices; or understanding, rather than actual practice, could be

examined. Each approach probably has strengths and weaknesses. The

critical thing, however, is determining a shared definition of what is

meant and what will be accepted as credible evidence of project success.

Exhibit 8 illustrates the steps in translating a general goal into a

measurable objective.

A particular challenge in developing measurable objectives is

determining the criteria for success. That is, deciding how much change

is enough to declare the result important or valuable. The classical

approach to this question is to look for changes that are statistically

significant, i.e., typically defined as unlikely to occur by chance in more

than 1 to 5 percent of the observations. While this criterion is important,

statistical significance may not be the only or even the best standard to

use. If samples are large enough, a very small change can be statistically

significant. When samples are very small, achieving statistical

significance may be close to impossible.

What are some ways of addressing this problem? First, for very large

samples, “effect size” is frequently used as a second standard against

which to measure the importance of an outcome. Using this approach,

the change is measured against the standard deviation, and only those

significant outcomes that result in a change that exceeds one-third of a

standard deviation are considered meaningful. Second, it may be

possible to use previous history as a way of determining the importance

of a statistically significant result. The history can provide a realistic

baseline against which the difference made by a project can be assessed.

Exhibit 8.—Goal and objective writing worksheet

GOAL AND OBJECTIVE WORKSHEET

1. Briefly describe the purpose of the project.

2. State the above in terms of a general goal:

3. State an objective to be evaluated as clearly as you can:

4. Can this objective be broken down further? Break it down to the smallest unit. It must be

clear what specifically you hope to see documented or changed.

5. Is this objective measurable (can indicators and standards be developed for it)?

If not, restate it.

6. Once you have completed the above steps, go back to #3 and write the next objective.
Continue with steps 4, 5, and 6.

Third, with or without establishing statistical significance, expert

judgment may be called on as a resource. This is a place where

stakeholder groups can again make a contribution. Using this approach,

standards are developed after consultation with differing stakeholder

groups to determine the amount of change each would need to see to find

the evidence of impact convincing.
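To illustrate how the statistical significance and effect size criteria discussed above might be checked in practice, here is a minimal sketch in Python using SciPy. The scores are invented for illustration only; they do not come from any actual project, and the one-third standard deviation threshold is simply the rule of thumb mentioned in the text.

import numpy as np
from scipy import stats

# Invented scores for a participant group and a comparison group.
project_scores = np.array([78, 85, 82, 90, 74, 88, 81, 79, 86, 83])
comparison_scores = np.array([75, 80, 77, 83, 72, 79, 76, 74, 81, 78])

# Statistical significance: a two-sample t-test, with the conventional
# criterion that the difference be unlikely to occur by chance in more
# than 5 percent of samples (p < 0.05).
t_stat, p_value = stats.ttest_ind(project_scores, comparison_scores)

# Effect size: the difference in means divided by the pooled standard
# deviation, compared against the one-third standard deviation threshold.
pooled_sd = np.sqrt((project_scores.var(ddof=1) +
                     comparison_scores.var(ddof=1)) / 2)
effect_size = (project_scores.mean() - comparison_scores.mean()) / pooled_sd

print(f"p-value = {p_value:.3f}")
print(f"effect size = {effect_size:.2f}")
print("statistically significant:", p_value < 0.05)
print("meets one-third SD criterion:", effect_size >= 1 / 3)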

There is also the issue of feasibility given resources. Three kinds of

resources need to be considered: time, money, and staff capability. The

presence or absence of any of these strongly influences whether or not a

particular question can be addressed in any given evaluation.

Specifically, there are some questions that may require specialized

expertise, extended time, or a large investment of resources. In some

cases, access to these resources may not be readily available. For

example, it might be considered useful conceptually to measure the

impact of a student’s research experience in terms of the scientific merit

of a project or presentation that the student completes before the end of a

summer program. However, unless the evaluation team includes

individuals with expertise in the particular content area in which the

student has worked, or can identify consultants with the expertise,

assessing scientific merit may be too much of a stretch. Under these

circumstances, it is best to eliminate the question or to substitute a

reasonable proxy, if one can be identified. In other cases, the evaluation

technique of choice may be too costly. For example, classroom observations are valuable if the question of interest is "How has the LSC affected classroom practices?" But observations are both time-consuming and expensive. If sufficient funds are not available to carry out observations, it may be necessary to reduce the sample size or use another data collection technique such as a survey. A general guideline is to allocate 5 to 10 percent of project cost for the evaluation.

Develop an Evaluation Design

The next step is developing an evaluation design. Developing the design

includes:

• Selecting a methodological approach and data collection

instruments

• Determining who will be studied and when

Selecting a Methodological Approach

In developing the design, two general methodological approaches—

quantitative and qualitative—frequently have been considered as

alternatives. Aside from the obvious distinction between numbers

(quantitative) and words (qualitative), the conventional wisdom among

evaluators is that quantitative and qualitative methods have different

strengths, weaknesses, and requirements that will affect evaluators’

decisions about which are best suited for their purposes.

In Chapter 5 we review the debate between the protagonists of each of

the methods and make a case for what we call a “mixed-method” design.

This is an approach that combines techniques traditionally labeled

“quantitative” with those traditionally labeled “qualitative” to develop a

full picture of why a project may or may not be having hoped-for results

and to document outcomes. There are a number of factors that need to be

considered in reaching a decision regarding the methodologies that will

be used. These include the questions being addressed, the timeframe

available, the skills of the existing or potential evaluators, and the type of

data that will be seen as credible by stakeholders and critical audiences.

Determining Who Will be Studied and When

Developing a design also requires considering factors such as sampling,

use of comparison groups, timing, sequencing, and frequency of data

collection.

Sampling. Except in rare cases when a project is very small and affects

only a few participants and staff members, it is necessary to deal with a

subset of sites and/or informants for budgetary and managerial reasons.

Sampling thus becomes an issue in the development of an evaluation

design. And the approach to sampling will frequently be influenced by

the type of data collection method that has been selected.

The preferred sampling methods for quantitative studies are those that

enable evaluators to make generalizations from the sample to the

universe, i.e., all project participants, all sites, all parents. Random

sampling is the appropriate method for this purpose. However, random

sampling is not always possible.

The most common misconception about sampling is that large samples are the best way of obtaining accurate findings. While it is true that larger samples will reduce sampling error (the probability that if another sample of the same size were drawn, different results might be obtained), sampling error is the smallest of the three components of error that affect the soundness of sample designs. Two other errors—sample bias (primarily due to loss of sample units) and response bias (responses or observations that do not reflect "true" behavior, characteristics, or attitudes)—are much more likely to jeopardize validity of findings (Sudman, 1976). When planning allocation of resources, evaluators should give priority to procedures that will reduce sample bias and response bias, rather than to the selection of larger samples.

Let’s talk a little more about sample and response bias. Sample bias

occurs most often because of nonresponse (selected respondents or units

are not available or refuse to participate, or some answers and

observations are incomplete). Response bias occurs because questions

are misunderstood or poorly formulated, or because respondents

deliberately equivocate (for example, to protect the project being

evaluated). In observations, the observer may misinterpret or miss what

is happening. Exhibit 9 describes each type of bias and suggests some

simple ways of minimizing them.

Exhibit 9.—Three types of errors and their remedies

Sampling Error
Cause: Using a sample, not the entire population to be studied.
Remedies: Larger samples—these reduce but do not eliminate sampling error.

Sample Bias
Cause: Some of those selected to participate did not do so or provided incomplete information.
Remedies: Repeated attempts to reach nonrespondents. Prompt and careful editing of completed instruments to obtain missing data; comparison of characteristics of nonrespondents with those of respondents to describe any suspected differences that may exist.

Response Bias
Cause: Responses do not reflect "true" opinions or behaviors because questions were misunderstood or respondents chose not to tell the truth.
Remedies: Careful pretesting of instruments to revise misunderstood, leading, or threatening questions. No remedy exists for deliberate equivocation in self-administered interviews, but it can be spotted by careful editing. In personal interviews, this bias can be reduced by a skilled interviewer.
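The brief sketch below illustrates the point in Exhibit 9 that larger samples reduce but do not eliminate sampling error: the standard error of a sample mean shrinks only with the square root of the sample size, so halving the error requires roughly a fourfold increase in sample size, while sample bias and response bias are unaffected by sample size altogether. The population standard deviation used here is an arbitrary illustrative value, not drawn from any actual study.

import math

# Arbitrary illustrative value for the population standard deviation
# of some outcome measure.
population_sd = 15.0

for n in (25, 100, 400, 1600):
    # Standard error of the mean falls with the square root of n.
    standard_error = population_sd / math.sqrt(n)
    print(f"n = {n:5d}   standard error of the mean = {standard_error:.2f}")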

Statistically valid generalizations are seldom a goal of qualitative

evaluation; rather, the qualitative investigation is primarily interested in

locating information-rich cases for study in depth. Purposeful sampling is

therefore practiced, and it may take many forms. Instead of studying a

random sample or a stratified sample of a project’s participants, an

evaluation may focus on the lowest achievers admitted to the program, or

those who have never participated in a similar program, or participants

from particular regions. In selecting classrooms for observation of

the implementation of an innovative practice, the evaluation may use

deviant-case sampling, choosing one classroom where the innovation is

reported as “most successfully” implemented and another where major

problems are reported. Depending on the evaluation questions to be

answered, many other sampling methods, including maximum variation

sampling, critical case sampling, or even typical case sampling, may be

appropriate (Patton, 1990). The appropriate size of the sample may also

differ when the different methodologies are adopted, with precision in

numbers based on statistical considerations playing a much larger role

for the quantitative approach.

In many evaluations, the design calls for studying a population at several

points in time, e.g., students in the 9th grade and then again in the 12th

grade. There are two ways to do this. In a longitudinal approach, data are

collected from the same individuals at designated time intervals; in a

cross-sectional approach, new samples are drawn for each successive

data collection. While longitudinal designs that require collecting

information from the same students or teachers at several points in time

are best in most cases, they are often difficult and expensive to carry out

both because students and teachers move and because linking

individuals’ responses over time is complicated. Furthermore, loss of

respondents because of failure to locate or to obtain cooperation from

some segments of the original sample is often a major problem.

Depending on the nature of the evaluation and the size of the population

studied, it may be possible to obtain good results with cross-sectional

designs.
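A minimal sketch of the practical difference between the two approaches, assuming hypothetical grade9.csv and grade12.csv files that share a stable student_id:

```python
import pandas as pd

grade9 = pd.read_csv("grade9.csv")    # columns: student_id, score, ...
grade12 = pd.read_csv("grade12.csv")

# Longitudinal: link the same individuals across waves; unmatched rows
# reveal attrition (movers, refusals, failures to locate).
linked = grade9.merge(grade12, on="student_id", how="inner",
                      suffixes=("_g9", "_g12"))
attrition_rate = 1 - len(linked) / len(grade9)

# Cross-sectional: treat the two waves as independent samples instead.
gain_longitudinal = (linked["score_g12"] - linked["score_g9"]).mean()
gain_cross_sectional = grade12["score"].mean() - grade9["score"].mean()
```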

Comparison Groups. In project evaluation, especially summative

evaluation, the objective is to determine whether or not a set of

experiences or interventions results in a set of expected outcomes. The

task is not only to show that the outcomes occurred, but to make the case

that the outcomes can be attributed to the intervention and not to some

other factors. In classical evaluation design, this problem of attribution is

addressed by creating treatment and control or comparison groups and

randomly assigning the potential pool of participants to these varying

conditions. In the ideal world, project evaluators would like to be able to

adopt this same approach and examine program impacts under well-

controlled experimental conditions. Unfortunately, in most real-world

applications and most NSF projects, these conditions simply cannot be

created.

There are two basic problems: first, there is self-selection. Teachers, students, and faculty participate in NSF efforts because they choose to, by and large. While there may be circumstances under which a participant is encouraged or even coerced into participating, that is likely to be the exception. Thus, there is reason to believe that those who volunteer or seek out programs are different from those who don't. Second, it is frequently difficult to identify a valid comparison group and obtain its cooperation with study efforts. The more elaborate and potentially intrusive the evaluation, the more difficult the task.

There is no perfect way to solve the problem, but in designing an

evaluation it is important to address, rather than ignore, the attribution

question. Sometimes this is possible by drawing a comparison group

from a waiting list (when one exists) and comparing those who

participated with those who self-selected but applied too late. Assuming

that the groups are found to be equivalent on critical variables that might

be associated with the outcome of interest, it is possible to relate

differences to differences in program experiences.
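One way to check this equivalence assumption is sketched below; the file applicants.csv and its columns (group, pretest, years_teaching, pct_free_lunch) are hypothetical, and both a significance test and a standardized difference are reported because small groups can hide real differences behind nonsignificant p-values.

```python
# Sketch of a baseline-equivalence check between participants and a
# waiting-list comparison group (hypothetical file and column names).
import pandas as pd
from scipy import stats

df = pd.read_csv("applicants.csv")
treated = df[df["group"] == "treated"]
waitlist = df[df["group"] == "waitlist"]

for var in ["pretest", "years_teaching", "pct_free_lunch"]:
    t, p = stats.ttest_ind(treated[var], waitlist[var], equal_var=False)
    # Standardized difference is often more informative than a p-value alone.
    std_diff = (treated[var].mean() - waitlist[var].mean()) / df[var].std()
    print(f"{var}: standardized difference={std_diff:+.2f}, p={p:.3f}")
```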

In other cases, it may be possible to use historical data as a benchmark

against which to measure change, such as comparing a school’s previous

test score history to test scores after some experience or intervention has

taken place. If the historical approach is adopted, it is important to rule

out other events occurring over time that might also account for any

changes noted. In dealing with student outcomes, it is also important to

make sure that the sample of students is sufficiently large to rule out

differences associated with different cohorts of students. To avoid what

might be called a “crop effect,” it is useful to compare average outcomes

over several cohorts before the intervention with average outcomes for

multiple cohorts after the intervention.
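The arithmetic of this multi-cohort comparison is simple; the sketch below uses invented yearly averages purely to show the calculation.

```python
# Sketch of guarding against a "crop effect": compare averages pooled over
# several cohorts before and after the intervention rather than relying on
# a single cohort on each side (hypothetical yearly mean test scores).
pre_cohorts = {2016: 71.2, 2017: 74.8, 2018: 72.5}    # before the intervention
post_cohorts = {2019: 75.9, 2020: 77.4, 2021: 76.1}   # after the intervention

pre_avg = sum(pre_cohorts.values()) / len(pre_cohorts)
post_avg = sum(post_cohorts.values()) / len(post_cohorts)
print(f"multi-cohort change: {post_avg - pre_avg:+.1f} points")

# A single-cohort comparison (2018 vs. 2019) could over- or understate the
# change simply because of cohort-to-cohort fluctuation.
print(f"single-cohort change: {post_cohorts[2019] - pre_cohorts[2018]:+.1f} points")
```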

A third alternative is to look for relationships between levels of

implementation of some program and the outcome variable(s) of interest

(Horizon and Westat, 2001). To some extent, a set of internal comparison

groups is created by drawing on actual implementation data or a

surrogate such as years in the program or level of treatment. For

example, in a teacher enhancement project where teachers received

different amounts of professional development, subgroups could be

created (derived from teacher surveys and/or classroom observation) to

categorize classrooms into high, medium, and low implementation status.

With this approach, the outcome of interest would be differences among

the project subgroups. It is assumed in this design that there is generally

a linear relationship between program exposure or implementation and

change along some outcome dimension. The evaluation thus examines

the extent to which differences in exposure or implementation relate to

changes in outcomes.
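A sketch of how such an internal comparison might be analyzed appears below; the file classrooms.csv, its columns (implementation_score, gain), and the three-way split into low, medium, and high implementation are hypothetical choices, not prescribed by the handbook.

```python
# Sketch of an internal-comparison-group analysis: classrooms grouped by
# implementation level, outcomes compared across the groups.
import pandas as pd
from scipy import stats

df = pd.read_csv("classrooms.csv")
df["level"] = pd.qcut(df["implementation_score"], 3,
                      labels=["low", "medium", "high"])

print(df.groupby("level")["gain"].agg(["count", "mean"]))

# One-way ANOVA across the three subgroups ...
groups = [g["gain"] for _, g in df.groupby("level")]
f, p = stats.f_oneway(*groups)

# ... and a check of the assumed (roughly linear) exposure-outcome relation.
r, p_r = stats.pearsonr(df["implementation_score"], df["gain"])
```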

Finally, checking the actual trajectory of change against the conceptual

trajectory, as envisioned in the logic model, often provides support for

the likelihood that impacts were in fact attributable to project activities.

Timing, Sequencing, Frequency of Data Collection, and Cost. The

evaluation questions and the analysis plan largely determine when data

should be collected and how often various data collections should be

scheduled. In mixed-method designs, when the findings of qualitative data collection affect the structuring of quantitative instruments (or vice versa), proper sequencing is crucial. As a general rule, project evaluations are strongest when data are collected at least at two points in time: before an innovation is first introduced, and after it has been in operation for a sizable period of time. Studies looking at program sustainability need at least one additional point of evidence: data on the program after it has been established and initial funding is completed.

All project directors find that both during the design phase, when plans

are being crafted, and later, when fieldwork gets underway, some

modifications and tradeoffs may become necessary. Budget limitations,

problems in accessing fieldwork sites and administrative records, and

difficulties in recruiting staff with appropriate skills are among the

recurring problems that should be anticipated as far ahead as possible

during the design phase, but that also may require modifying the design

at a later time.

What tradeoffs are least likely to impair the integrity and usefulness of an

evaluation, if the evaluation plan as designed cannot be fully

implemented? A good general rule for dealing with budget problems is to

sacrifice the number of cases or the number of questions to be explored

(this may mean ignoring the needs of some low-priority stakeholders),

but to preserve the depth necessary to fully and rigorously address the

issues targeted.

Once decisions are reached regarding the actual aspects of your

evaluation design, it is useful to summarize these decisions in a design

matrix. Exhibit 10 presents the shell for each matrix using the Minority

Research Fellowship Program as an illustrative example. This matrix is

also very useful later on when it is time to write a final report (see

Chapter 4).

Exhibit 10a.—Matrix showing crosswalk of study foci and data collection activities

Data collection activities (columns): Document review; Mail survey; Telephone interviews; Bibliometric measures; National data analysis. A check (✓) marks the activities used to address each study focus.

What did MRFP awardees do during their award period? In an extension, if granted? (✓ ✓ ✓)

Specifically, and as appropriate for postdoctoral scholars, to what extent have the individual research projects of the postdoctoral Fellows achieved their narrower and immediate scientific goals? To what extent is this reflected in the formal scientific record as publications and presentations? (✓ ✓ ✓ ✓)

How, if at all, did MRFP awardees use their experience to shape their career direction and development? (✓ ✓ ✓)

How do employment and activity patterns among MRFP awardees compare with patterns in national data on Ph.D. recipients who have been postdoctoral researchers? How does the NSF proposal and award history of MRFP awardees compare with that of other faculty members who received Ph.D.s in the fields and time period covered by the MRFP awardees? (✓ ✓ ✓)

Exhibit 10b.—Crosswalk of study sample and data collection activities

Data collection activities (columns): Document review; Mail survey; Telephone interviews; Bibliometric measures; National data analysis.

All MRFP awardees (n=157): ✓ ✓ ✓ ✓
Sample of MRFP awardees (n=30): ✓

References

Horizon and Westat. (2001). Revised Handbook for Studying the Effects

of the LSC on Students. Rockville, MD: Westat.

Patton, M.Q. (1990). Qualitative Evaluation and Research Methods, 2nd

Ed. Newbury Park, CA: Sage.

Sudman, S. (1976). Applied Sampling. New York: Academic Press.

4. THE EVALUATION PROCESS:

CARRYING OUT THE STUDY AND REPORTING

In this section we discuss the steps to be undertaken after a design has

been developed:

• Data collection

• Data analysis

• Reporting

• Dissemination

Conducting Data Collection

Once the appropriate information-gathering techniques have been

determined, the information must be gathered. Both technical and

political issues need to be addressed.

• Obtain necessary clearances and permission.

• Consider the needs and sensitivities of the respondents.

• Make sure your data collectors are adequately trained and will

operate in an objective, unbiased manner.

• Obtain data from as many members of your sample as possible.

• Cause as little disruption as possible to the ongoing effort.

First, before data are collected, the necessary clearances and permission must be obtained. Many groups, especially school systems, have a set of established procedures for gaining clearance to collect data on students, teachers, or projects. This may include identification of persons to receive/review a copy of the report, restrictions on when data can be collected, and procedures to safeguard the privacy of students or teachers. It is important to find out what these procedures are and to address them as early as possible, preferably as part of the initial proposal development. When seeking cooperation, it is always helpful to offer to provide information to the participants on what is learned, either through personal feedback or a workshop in which findings can be discussed. If this is too time-consuming, a copy of the report or executive summary may well do. The main idea here is to provide incentives for people or organizations to take the time to participate in your evaluation.

Second, the needs of the participants must be considered. Being part of

an evaluation can be very threatening to participants, and they should be

told clearly and honestly why the data are being collected and how the results will be used. On most survey type studies, assurances are provided that no personal repercussions will result from information presented to the evaluator and, if at all possible, individuals and their responses will not be publicly associated in any report. This guarantee of anonymity frequently makes the difference between a cooperative and a recalcitrant respondent.

There may, however, be some cases when identification of

the respondent is deemed necessary, perhaps to reinforce the

credibility of an assertion. In studies that use qualitative

methods, it may be more difficult to report all findings in ways that make

it impossible to identify a participant. The number of respondents is often

quite small, especially if one is looking at respondents with

characteristics that are of special interest in the analysis (for example,

older teachers, or teachers who hold graduate degrees). Thus, even if a

finding does not name the respondent, it may be possible for someone (a

colleague, an administrator) to identify a respondent who made a critical

or disparaging comment in an interview. In such cases, the evaluation

should include a step wherein consent is obtained before including such

information. Informed consent may also be advisable where a sensitive

comment is reported, despite the fact that the report itself includes no

names. Common sense is the key here. The American Evaluation

Association has a set of Guiding Principles for Evaluators (AEA, 1995)

that provide some very important tips in this area under the heading

“Respect for People.”

Third, data collectors must be carefully trained and supervised,

especially where multiple data collectors are used. This training should

include providing the data collectors with information about the culture

and rules of the community in which they will be interacting (especially

if the community differs from that of the data collector) as well as

technical skills. It is important that data collectors understand the idiom

of those with whom they will be interacting so that two-way

communication and understanding can be maximized.

The data collectors must be trained so that they all see things in the same way, ask the same questions, and use the same prompts. It is important to establish inter-rater reliability: when ratings or categorizations of data collectors for the same event are compared, an inter-rater reliability of 80 percent or more is desired. Periodic checks need to be conducted to make sure that well-trained data collectors do not "drift" away from the prescribed procedures over time. Training sessions should include performing the actual task (extracting information from a database, conducting an interview, performing an observation), role-playing (for interviews), and comparing observation records of the same event by different observers.
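For example, percent agreement against the 80 percent rule of thumb, along with a chance-corrected coefficient such as Cohen's kappa, can be computed as sketched below; the ten paired ratings of the same classroom events are invented.

```python
# Sketch of an inter-rater reliability check during training.
from collections import Counter

rater_a = ["high", "high", "medium", "low", "medium", "high", "low", "medium", "high", "low"]
rater_b = ["high", "medium", "medium", "low", "medium", "high", "low", "high", "high", "low"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability both raters pick the same category at random,
# given each rater's own distribution of ratings.
count_a, count_b = Counter(rater_a), Counter(rater_b)
expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2

kappa = (observed - expected) / (1 - expected)  # agreement corrected for chance
print(f"percent agreement: {observed:.0%}")
print(f"Cohen's kappa: {kappa:.2f}")
```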

When the project enters a new phase (for example, when a second round

of data collection starts), it is usually advisable to schedule another

training session, and to check inter-rater reliability again. If funds and

technical resources are available, other techniques (for example,

videotaping of personal interviews or recording of telephone interviews)

can also be used for training and quality control after permission has

been obtained from participants.

Evaluations need to include procedures to guard against possible

distortion of data because of well intended but inappropriate “coaching”

of respondents—an error frequently made by inexperienced or overly

enthusiastic staff. Data collectors must be warned against providing

value-laden feedback to respondents or engaging in discussions that

might well bias the results. One difficult but important task is

understanding one’s own biases and making sure that they do not

interfere with the work at hand. This is a problem all too often

encountered when dealing with volunteer data collectors, such as parents

in a school or teachers in a center. They volunteer because they are

interested in the project that is being evaluated or are advocates for or

critics of it. Unfortunately, the data they produce may reflect their own

perceptions of the project as much as or more than those of the

respondents, unless careful training is undertaken to avoid this

“pollution.” Bias or perceived bias may compromise the credibility of the

findings and the ultimate use to which they are put. An excellent source

of information on these issues is the section on accuracy standards in The

Program Evaluation Standards (Joint Committee on Standards for

Educational Evaluation, 1994).

Fourth, try to get data from as many members of your sample as possible. The validity of your findings depends not only on how you select your sample, but also on the extent to which you are successful in obtaining data from those you have selected for study. It is important to follow up with individuals who are nonresponsive to the initial contact to try to get them to participate. This can mean sending surveys out two to three times or rescheduling interviews or observations on multiple occasions. An ambitious rule of thumb for surveys is to try to gather data from at least 80 percent of those sampled. Wherever possible, it is advisable to assess whether there is some systematic difference between those who respond and those who do not. If differences are found, they should be noted, along with their impact on the generalizability of the findings.
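Such a check might look like the sketch below, assuming a hypothetical sample.csv that records, for everyone selected, whether they responded along with characteristics known from the sampling frame (school_size, years_experience).

```python
# Sketch of a response-rate and nonresponse-bias check.
import pandas as pd
from scipy import stats

sample = pd.read_csv("sample.csv")
response_rate = sample["responded"].mean()
print(f"response rate: {response_rate:.0%}  (aim for at least 80 percent)")

# Compare known characteristics of respondents and nonrespondents.
for var in ["school_size", "years_experience"]:
    resp = sample.loc[sample["responded"], var]
    nonresp = sample.loc[~sample["responded"], var]
    t, p = stats.ttest_ind(resp, nonresp, equal_var=False)
    print(f"{var}: respondents={resp.mean():.1f}, "
          f"nonrespondents={nonresp.mean():.1f}, p={p:.3f}")
```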

Finally, the data should be gathered, causing as little disruption as

possible. Among other things, this means being sensitive to the schedules

of the people or the project. It also may mean changing approaches as

situations come up. For example, instead of asking a respondent to

provide data on the characteristics of project participants—a task that

may require considerable time on the part of the respondent to pull the

data together and develop summary statistics—the data collector may

need to work from raw data, applications, and monthly reports, etc., and

personally do the compilation.

Analyzing the Data

Once the data are collected, they must be analyzed and interpreted. The

steps followed in preparing the data for analysis and interpretation differ,

depending on the type of data. The interpretation of qualitative data may

in some cases be limited to descriptive narratives, but other qualitative

data may lend themselves to systematic analyses through the use of

quantitative approaches such as thematic coding or content analysis.

Analysis includes several steps:

• Check the raw data and prepare them for analysis.

• Conduct initial analysis based on the evaluation plan.

• Conduct additional analyses based on the initial results.

• Integrate and synthesize findings.

The first step in quantitative data analysis is the checking of data for

responses that may be out of line or unlikely. Such instances include

selecting more than one answer when only one can be selected, always

choosing the third alternative on a multiple-choice test of science

concepts, reporting allocations of time that add up to more than

100 percent, giving inconsistent answers, etc. Where such problematic

responses are found, it may be necessary to eliminate the item or items

from the data to be analyzed.
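A first cleaning pass of this kind might be scripted as sketched below; the file survey.csv and its columns (pct_lecture, pct_labs, pct_groupwork, q1, years_teaching) are hypothetical.

```python
# Sketch of checking raw survey data for out-of-line or unlikely responses.
import pandas as pd

df = pd.read_csv("survey.csv")

time_cols = ["pct_lecture", "pct_labs", "pct_groupwork"]
bad_totals = df[df[time_cols].sum(axis=1) > 100]           # time allocations over 100 percent
multi_marked = df[df["q1"].astype(str).str.contains(",")]  # more than one answer chosen
out_of_range = df[~df["years_teaching"].between(0, 50)]    # implausible values

# Flag problem cases for review; eliminate items only as a last resort.
problem_ids = set(bad_totals.index) | set(multi_marked.index) | set(out_of_range.index)
print(f"{len(problem_ids)} records flagged for review")
```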

After this is done, the data are prepared for computer analysis; usually

this involves coding and entering (keying or scanning) the data with

verification and quality control procedures in place.

The next step is to carry out the data analysis specified in the evaluation

plan. While new information gained as the evaluation evolves may well

cause some analyses to be added or subtracted, it is a good idea to start

with the set of analyses that seemed originally to be of interest. Statistical

programs are available on easily accessible software that make the data

analysis task considerably easier today than it was 25 years ago. Analysts

still need to be careful, however, that the data sets they are using meet

the assumptions of the technique being used. For example, in the analysis

of quantitative data, different approaches may be used to analyze continuous data as opposed to categorical data. Using an incorrect technique can result in invalidation of the whole evaluation project. Recently, computerized systems for qualitative analysis have been developed and are becoming more widely used to manage large sets of narrative data. These provide support to the analyst and a way of managing large amounts of data that are typically collected (but do not eliminate the need for careful analysis and decisionmaking on the part of the evaluator). Two popular programs are Ethnograph and Nu*Dist.

It is very likely that the initial analyses will raise as many questions as

they answer. The next step, therefore, is conducting a second set of

analyses to address these further questions. If, for example, the first

analysis looked at overall teacher performance, a second analysis might

subdivide the total group into subunits of particular interest—i.e., more

experienced versus less experienced teachers; teachers rated very

successful by mentors versus teachers rated less successful—and

examine whether any significant differences were found between them.

These reanalysis cycles can go through several iterations as emerging

patterns of data suggest other interesting avenues to explore. Sometimes

the most intriguing of these results emerge from the data; they are ones

that were not anticipated or looked for. In the end, it becomes a matter of

balancing the time and money available against the inquisitive spirit in

deciding when the analysis task is completed.
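One such second-round analysis might be sketched as follows; the file teachers.csv, its columns (years_experience, performance_rating), and the 10-year cutoff separating more and less experienced teachers are hypothetical.

```python
# Sketch of a second-round subgroup analysis following an overall analysis.
import pandas as pd
from scipy import stats

teachers = pd.read_csv("teachers.csv")
teachers["experience_group"] = teachers["years_experience"].apply(
    lambda y: "more experienced" if y >= 10 else "less experienced")

print(teachers.groupby("experience_group")["performance_rating"]
      .agg(["count", "mean", "std"]))

more = teachers.loc[teachers["experience_group"] == "more experienced", "performance_rating"]
less = teachers.loc[teachers["experience_group"] == "less experienced", "performance_rating"]
t, p = stats.ttest_ind(more, less, equal_var=False)
print(f"difference in means: {more.mean() - less.mean():+.2f}, p={p:.3f}")
```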

It should be noted that we have not attempted to go into any detail on the

different statistical techniques that might be used for quantitative

analysis. Indeed, this discussion is the subject of many books and

textbooks. Suffice it to say that most evaluations rely on fairly simple

descriptive statistics—means, frequencies, etc. However, where more

complex analyses and causal modeling are desired, evaluators will need

to use analyses of variance, regression analysis, or even structural

equation modeling.
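As a simple instance of moving beyond descriptive statistics, the sketch below regresses invented score gains on hours of professional development using SciPy; it is illustrative only, not an analysis prescribed by the handbook.

```python
# Sketch of a simple regression relating exposure to an outcome.
from scipy import stats

pd_hours = [0, 10, 10, 20, 30, 30, 40, 50, 60, 80]        # hypothetical exposure
score_gain = [1.0, 2.5, 1.8, 3.1, 4.0, 3.6, 5.2, 5.0, 6.8, 8.1]  # hypothetical outcome

result = stats.linregress(pd_hours, score_gain)
print(f"slope={result.slope:.3f} points per hour, "
      f"r^2={result.rvalue**2:.2f}, p={result.pvalue:.4f}")
```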

The final task is to choose the analyses to be presented, to integrate the

separate analyses into an overall picture, and to develop conclusions

regarding what the data show. Sometimes this integration of findings

becomes very challenging as the different data sources do not yield

completely consistent findings. While it is preferable to be able to

produce a report that reconciles differences and explains the apparent

contradictions, sometimes the findings must simply be allowed to stand

as they are, unresolved and, it is hoped, thought provoking.

Reporting

The next stage of the project evaluation is reporting what has been found.

This requires pulling together the data collected, distilling the findings in

light of the questions the evaluation was originally designed to address,

and disseminating the findings.

Formal reports typically include six major sections:

• Background

• Evaluation study questions

• Evaluation procedures
• Data analysis
• Findings
• Conclusions (and recommendations)

