Publications: Judiciary
By Linn Hammergren
Evaluation: Participants in judicial reform programs seem remarkably resistant to evaluation. Courts and governments rarely evaluate their own efforts, NGOs seem to regard evaluations as an infringement of their independence, and even major donor agencies have been very lax in this area. It has been suggested that this is a consequence of a disciplinary bias. Lawyers dominate the programs, and law is not an experimental science. Lawyers are more inclined to argue than to test their positions. Moreover, because of the values involved (justice, rights, fundamental principles) and the politically charged nature of the topic, the notion of measuring or quantifying achievements is often regarded as somehow denigrating the importance of the theme. While several donor agencies have recently begun to evaluate their entire programs, observers are still wondering whether the results will be released and, if so, in what form. Preliminary reports from the evaluation teams suggest that their findings may not please agency leaders who have highlighted their dedication to promoting judicial reform. Aside from evaluation-phobia, another major disincentive is that evaluations add to costs. Like assessments, they are frequently seen as absorbing funds that might be put directly into reform. And since they come at the end, when funds are most likely to be scarce, they are still more obvious targets for elimination. However, in-house policies, including the funds initially allocated for evaluation and monitoring and decisions like USAID's to make evaluation optional [11], indicate that more than last-minute cost-cutting is at work.
Nonetheless, the growing impression that programs are beginning to repeat old errors, that strategic programming has if anything weakened over the last years, and that the number of objectives pursued has increased with no particular relationship to the content of programs demonstrates a serious failure to learn from and build on experience. Whatever the reason for the inattention to evaluation, it may be the largest single contributor to this situation and an obvious place to make immediate changes. A first lesson and recommendation is thus that systematic evaluations must be done, because without them there is little way of determining whether programs are producing results and how they might be improved. Results indicators, like those USAID has attempted to adopt and other agencies are also pursuing, are no substitute. They will only become one once we have a better internal understanding of how judiciaries and reforms operate. The further advantage of evaluation is that it looks not only at progress but also at the path taken to arrive there. That second kind of knowledge is essential for anyone attempting to replicate a presumed success. It does little good to know that country X has reduced the average time to disposition of cases by 25 percent if one does not know how it did this. Evaluation can also focus on additional consequences and contributing factors. Delay reduction could be achieved by simplifying proceedings, improving in-court management, automation, discouraging filings, doubling the number of judges, or dismissing more cases for lack of merit. Each of these mechanisms would shorten times to disposition, but they have different implications for financial or other costs (e.g. reduced access). Perhaps the reduction coincided with a substantial upturn in the economy, which is usually associated with a reduced recourse to court services.
This might mean that the program itself had no impact, as the reduced demand might have been sufficient to allow speedier processing of the remaining caseload. Evaluating individual projects is important, but the value of evaluation increases substantially when evaluations are compared. Comparison makes it easier to identify the causal relationships and the intervening variables that may affect them. Especially in an area as complex as judicial reform, where many programmatic interventions are introduced simultaneously and where exogenous variables may have a still stronger influence, it is all too easy to pick out spurious relationships. Was it the training, the higher salaries, or the internal monitoring that caused judges to speed up their handling of cases, change the patterns of their decisions, or become more resistant to bribes? Could one have achieved the same results with only one of the interventions, or achieved more by doubling one of the inputs? Or was the result a consequence of some external change with no relationship to the actual reform? These questions are not easily answered, but there is more opportunity to weed out the irrelevant or to define the conditions for relevancy when several examples are under investigation. Thus a second lesson and recommendation is an emphasis on comparisons of evaluations or even cross-project evaluations of single activities: training programs, automation, delay reduction, or changes in basic laws. As this latter effort approaches research, we can leave it for a moment and look further at the process of comparing evaluations themselves. For this to happen, the first obvious steps are that evaluations be done, that they be widely available, and that a program or incentives be introduced to encourage comparative study. A second need is for evaluations to be structured to allow comparison. This has generally not been the case.
Evaluations, whether done by in-house experts (expert either in evaluation or in judicial reform, generally not in both), contracted outsiders, or external agencies (a governmental body or an NGO), tend to be shaped by the project at hand and by the specific interests of the evaluators. Consequently, they are usually directly comparable only when the same evaluator does them, and that person's interests may still not jibe with those of the parties wishing to use the evaluation to improve future programming. Even when general protocols are established, they are usually so open ended as to allow the evaluators to pick and choose what they will review. The World Bank's standard evaluation format (prior situation, strategy adopted, what was done, results and lessons learned) is as good as any, but obviously leaves enormous leeway to the evaluator. More specific terms of reference tend to be determined by the project under review and thus often avoid some of the key questions (Was this project worth undertaking? Was training or law revision the best way of achieving the desired ends?) in favor of narrower ones, such as whether the inputs were provided as stipulated, whether the immediate outputs were achieved, and whether the quality of the individual activities could have been improved. In training programs, for example, attention often goes to the training methodologies, the content of individual courses, and the means for selecting students. A training expert, the usual candidate for this part of the evaluation, is hardly likely to question the value of any training at all, but will rather look to how what was delivered might have been improved. Where the evaluation is entrusted to one expert, he or she may ask some of those harder questions about less favored elements, but there again, the standards remain dependent on the evaluator's inherent preferences.
What this suggests is the need for a dual focus in evaluations: one part examining the project at hand (to determine how well it has carried out the proposed actions) and the other focusing on more general questions, both the adequacy of the strategy itself and the way it addressed certain common problems. For an individual agency doing many evaluations, it should be easy to adopt this format. It may be harder to implement given the inherent limitations of evaluators themselves: their own biases and preferences, a tendency to not want to be overly critical (an evaluator who finds himself declaring even a few strategies not worth following may not be an evaluator for long), and the difficulty of finding people with both specific knowledge and the ability to take an overview. Using generic evaluation experts is usually no solution because of the specialized, substantive knowledge required. USAID's now famous overview of its rule of law programs [12] came under heavy criticism from project managers for just this reason. It was claimed that the authors, coming from other disciplines, never understood the purposes of the projects, confounded their objectives with cause lawyering (the use of existing systems to reach immediate benefits for the poor), and misread the multiple and interactive aims of many of the common elements [13]. Participants have been equally critical of macro-economic analyses of judicial performance and reform impacts, and especially of their efforts to derive composite or unidimensional scores for both [14]. Thus, while good evaluation techniques need to be introduced, generic evaluation expertise may be better applied to setting the standards for the evaluation than to actually conducting it. The third need and recommendation is thus to make evaluation a little less free form, by establishing general standards (the work of the evaluation experts) and common themes and questions (the responsibility of those designing and overseeing programs).
The project-specific criteria will be set by the sponsoring agency (assuming there is one) and the project itself. This is the within-system evaluation, the one which focuses on whether the project followed its intended strategy, complied with any rules the sponsoring agency requires, and achieved its proposed results. These are important questions for evaluating any specific proposal, but they are less important in terms of furthering programmatic knowledge. For the latter, the evaluation will be expected to address the adequacy of solutions to certain common problems: delay reduction, combating corruption, ensuring the selection of the most capable judges, encouraging judges to make speedy, fair decisions, increasing access for marginalized groups, and so on. Once again, selecting these categories will be easier for assistance agencies doing multiple programs, but they are also the ones best situated to compare results.
____________________________
[10] A series of informal interviews with individuals charged with evaluating programs for the UNDP, IDB, USAID, and the World Bank made it clear that none of them had access to all the documentation that should have been available. I suspect, as all the work was commissioned by the respective agencies, that this reflects an information storage and retrieval problem, not a conscious effort to keep their evaluators in the dark. However, it also demonstrates an inadequate internal usage of the documents; were they being read and used, they would have been easier to locate.
[11] This is not only for judicial reform but for all programs and was motivated by the agency's shift to a management-by-results mode. Most internal and external observers regard the latter as a poor substitute, both because it lends itself to manipulation (participants select results they are sure will be achieved, regardless of their wider significance) and because of the problems, discussed above, in selecting performance indicators.
[12] Gary Hansen and Harry Blair, USAID, 1994.
[13] For example, training, which is often used to build support for reform, collect more information on common problems, and provoke judges to take a different view of their role or develop sensitivities to the needs of a wide variety of clients, was relegated to simple capacity building.
[14] Many of these analyses rely on surveys or expert opinions to assess quality. A common criticism has been that those surveyed often have a unidimensional view and that in any case, even knowledgeable informants tend to misjudge real operations. See Kritzer. In one recent study, lawyers from prestigious law firms were asked about the duration of debt collection cases. As several discussants noted, these lawyers probably rarely if ever did such work.