
The Nonprofit Fix
"The Nonprofit Fix" is a candid and insightful podcast that explores the challenges of the nonprofit sector and its potential to repair our broken world. Join hosts Pete York and Ken Berger, as they delve into the sector's issues, discussing solutions to foster a more effective and impactful nonprofit community.
Opinions expressed in this podcast are personal and not reflective of the hosts' employers.
Impact Measurement 3.0: From Impossible RCTs to Automated Causal Modeling
What if nonprofits could measure their impact without breaking the bank or succumbing to funder demands for costly trials? Join us on "The Nonprofit Fix" as we unravel the complexities of impact measurement, challenging the dominance of randomized control trials (RCTs) and exploring more accessible alternatives. With three decades of experience in program evaluation, we dissect the limitations of RCTs and the pitfalls of non-experimental studies, which often inflate success rates without truly reflecting causality. Our conversation doesn't stop there—we navigate the promising terrain of AI and machine learning, which holds the potential to revolutionize evaluation methods by leveraging existing data for more accurate and cost-effective assessments.
We then examine the paradigm shift in nonprofit data management, advocating for a transition from compliance-driven collection to harnessing data for real-time insights and program enhancement. Imagine reducing evaluation lag time while simultaneously fostering a dynamic, responsive service delivery—technology makes this possible. By repurposing administrative data and adopting machine learning, nonprofits can create feedback loops that enhance decision-making and program effectiveness. This approach not only transforms service delivery but also generates a culture of continuous improvement that benefits both practitioners and beneficiaries.
Finally, we delve into the nuances of differentiating raw data from genuine impact within the nonprofit sector. With a critical eye, we address the challenges of using simplistic surveys and the role of sophisticated program administrative data systems. The conversation extends to the potential of machine learning to build comprehensive models that reflect true program efficacy. We explore key resources like Project Evident and training opportunities that open new horizons for nonprofits seeking to refine their impact measurement. As we wrap up, we offer a glimpse into our podcast's evolution and tease future discussions on the effects of changes in federal administration on nonprofits.
Welcome to the Nonprofit Fix, a podcast about the nonprofit sector, where we talk openly and honestly about the many challenges that face the sector, and where we discuss current and future solutions to those challenges. A podcast where we explore how the nonprofit sector can have much more positive impact in the world.
Speaker 2:A podcast where we believe that once we fix the nonprofit sector, we can much more dramatically help to fix our broken world.
Speaker 1:Welcome again, everybody, to the Nonprofit Fix. This episode is something that we've alluded to in many other episodes, and one way I think about it is, well, I'll tell you. First of all, hi, Peter.
Speaker 2:Hey Ken, hey everybody listening.
Speaker 1:So, Peter, if I had a nickel for every time a funder has required us to describe what our impact is, and then we had to do a kabuki dance, as we often do, where we take what are typically defined as outputs and sort of cobble them together in a new form to make them look like outcomes and impact, because the ramifications of what they're asking for are so enormous, as hopefully will be apparent here, I would be a rich man. That's a long answer to that question of what's your impact.
Speaker 2:Yes, you would for sure, and you'd not be alone in that situation, along with a lot of other organizations, leaders, program folks and everybody writing proposals and grants; we're constantly under this pressure. So this episode, we really are going to zero in specifically on impact measurement: what it means, what the challenges are and, just like we always do, talk not only about the pain points but also about the solutions, and we do actually have some for this area. If you listen to our podcast, we speak to this a lot in many episodes, so not everything here is going to be new, but the point is it's going to be a focused conversation around the pain points. To summarize: in previous episodes we've talked about how important impact measurement is to all the different issues and challenges we've been talking about all along.
Speaker 2:In our exhausted sector series especially, we've always brought up measurement of outcomes. We've talked about the difference between outcomes and outputs. But today's episode is really going to focus and zero in on this in a deeper-dive way and try to really address some of this stuff. So from that standpoint, I am going to dive in and raise what I think are important things that we should be talking about, the kind of pain points that we want to share, and pain points that I know well myself: I've been an evaluator for 30 years, evaluating social programs through many different methods and approaches. I've spent 30 years doing traditional evaluation work where I've been brought in by funders, philanthropies, nonprofits and government agencies to evaluate programs, and there are a lot of traditional ways we typically evaluate the impact of a program. Traditionally, that's the way things happen: the evaluation community is sought out, and there are consultants and research firms and big organizations and academics who get called in to evaluate the impact. This is traditionally how it happens. What happens otherwise, we'll talk about in a moment. The reality is, not many can afford this.
Speaker 2:The first thing I want to talk about is that when we talk about measuring impact, the first thing folks go to is: can we really understand whether this program causes the outcome it says it causes? Is it really creating the results that we're hoping for? From that standpoint, I'm going to start with something that the field and everybody should know, almost like a little PSA: when the government and other philanthropies talk about rigorous evaluation, the kind of evaluation that we can believe will tell us if a program actually has the impact it says it does, they will point you to a gold standard methodology known as the randomized controlled trial. In these studies, you're basically assigning people to get the program or not get the program, and you're doing that randomly, so that we know the two groups are equally likely to have all kinds of similar backgrounds; we just randomize that away. Some get the program, some don't, and eventually we evaluate and calculate: hey, is there a difference in the outcome? If we measured the outcome, say academic achievement or going to college, do more go to college with the program than among those who were held back and didn't get the program? So, in a simplified way, these are experiments: randomized controlled trials.
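To make that comparison concrete, here is a minimal sketch, in Python, of the RCT logic just described: random assignment, then a simple comparison of outcome rates between the two arms. The participant count, base rate and program effect are invented numbers for illustration only, not figures from any study discussed here.

```python
# Minimal sketch of the RCT logic described above: random assignment, then a
# simple comparison of outcome rates. Participant count, base rate and program
# effect are invented numbers for illustration only.
import random
from statistics import mean

random.seed(42)

def run_rct(n_participants=1000, base_rate=0.40, program_effect=0.10):
    """Simulate a trial for a college-going outcome and return the two rates."""
    program_group, control_group = [], []
    for _ in range(n_participants):
        gets_program = random.random() < 0.5           # random assignment
        p_outcome = base_rate + (program_effect if gets_program else 0.0)
        went_to_college = random.random() < p_outcome  # simulated outcome
        (program_group if gets_program else control_group).append(went_to_college)
    return mean(program_group), mean(control_group)

program_rate, control_rate = run_rct()
print(f"Program group college rate: {program_rate:.1%}")
print(f"Control group college rate: {control_rate:.1%}")
print(f"Estimated effect of the program: {program_rate - control_rate:+.1%}")
```

In a real trial you would also report uncertainty, confidence intervals or a significance test, rather than the raw difference alone.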
Speaker 2:The problem is what we call a trilemma: cost, time and generalizability. In the nonprofit sector, it's a very rare program that gets that kind of resource. This is a type of study that can cost hundreds of thousands of dollars, if not millions of dollars, over the life of the study. It can cost 10, 15, 20, 30% of the entire program budget in order to be able to really do this kind of rigorous work, so there's a real cost to it. There's a time lag: it could take 18 months to get results, 24 months to get results. Early results can be preliminary, it can take very long, and by the time those results come out, the world may be very different for that program. Things have changed and, as a result, what can we do with those results except say, yep, in the past it worked, but will it be the same? And this brings us to the third problem of the trilemma. So: cost, time lag...
Speaker 2:And the third problem is generalizability. Most of these studies take place in particular situations, circumstances and communities, and if you take that program, find out in the RCT that it works, and then plant it in another part of the country or another community someplace else, we can't assume that it's going to work unless that study was actually studying a representation of all types of communities everywhere. So we have three problems: cost, the long time it takes to get findings that may no longer be relevant, and generalizability. Okay, so that's the piece there. That is really the big problem with the ideal. As a result, very few nonprofit organizations or programs in our sector are rigorously evaluated this way, and when they are, it's one and done and they leverage it in perpetuity. So what's left for us to do if we can't do these kinds of experimental studies? Well, a lot of it comes down to what we call correlational studies.
Speaker 1:Before you go on, I just want to make a quick comment on this.
Speaker 1:So I think what I've seen over the years, when it comes to this shiny, quite literally gold, randomized controlled trial notion, is that, because of the cost, there have only been a handful of these that I'm aware of.
Speaker 1:They were done during a period of time when, I guess, the whole notion of impact measurement was first coming to the fore and being sort of encouraged.
Speaker 1:Maybe it was what, like 10, 15, 20 years ago, something like that, and there are a few agencies that got that funding, and for a period of time they were held up as what we should all aspire to, called evidence-based, right. But then what I recall happening is that for a number of these agencies the funding was not sustained; it's just too costly. So, as often happens, the foundations and the other supporters provided that extra special funding for a temporary period of time, and then, with all the limitations you described on generalizability and the elapsed time and whatnot, it's really almost completely fallen out of favor in terms of its usefulness for the nonprofit sector. And so that's why I think, as you're segueing into, there needs to be what we've talked about: a quicker, better, cheaper way, so that nonprofits can actually prove their impact without going down this road of randomized controlled trials.
Speaker 2:Well, and a lot of times, too, there are a few other problems associated with them. Part of it is that you have to have a very well-developed, mature program in order to even be able to afford or think about doing an RCT, which has implications. And there's another part: there's an ethical dilemma. Do we really not give people that need services the program because they've been randomly assigned to not get it?
Speaker 2:We might say, well, eventually we'll give them that; they'll get priority after the study. And that brings another problem: do we really think that the people who are not getting the program are not getting elements of that program someplace else if they really need them? And if they are, that contaminates the study. So in the end, I am not dismissing the notion that randomized controlled trials and controlled experiments are hugely valuable to advancing knowledge, to understanding, you know, do programs work, et cetera. But I do think that we have to look at the cost, the time lag and some of these issues. So what's a nonprofit to do? What's a program to do that's realistic?
Speaker 2:Well, realistically, what's basically happening out there is that a lot of folks are doing their own studies, because grantmakers and funders and philanthropies are still asking the question: does your program make a difference? If you're doing it independently, you do the best you can, and if you can't set up these costly experiments that take time, you will often do non-experimental, correlational studies. So let me give you an example of the kind of study that typically happens. Say I were to take a job training program and evaluate it by measuring the employment rate before the program and the employment rate after the program, what we call a pre-post test. You're running a program, so you gather some baseline data about employment: are they employed or not? Then, because it's the same group, we look at and compare the same people after the program. This is a correlational study, and people think it's causal because we did pre and post and we're tracking the same people. So we might find that, hey, the job training program report found a 65% employment rate post-completion, after the program, versus 40% before the program.
Speaker 2:Okay, so the claim we make is that we caused a 25-point improvement. But this is not necessarily true. The reality is that participants were pre-screened for educational or work history, so in a lot of ways we were almost creaming, right? If everybody's coming in and you don't take everybody, you cream those that are more likely to find work anyway. That's one, and it happens.
Speaker 2:Secondly, we find out that there were all kinds of regional job growth opportunities. Maybe a big employer moved in, other things came in, so there were opportunities in the community, et cetera. And third, the dropouts. Let's say 30% drop out. A lot of times, when folks are in the real world doing evaluation on their own or with some outside independent help, they'll take those 30% that drop out and remove them from the study, so we're not including them. But those dropouts are also very likely to be the ones that didn't find employment, and so you can see how we end up assuming we made a 25-point improvement when that's not the case.
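Here is a small, hedged simulation of the pitfalls just described, showing how screening at intake ("creaming"), a background trend like regional job growth, and excluding dropouts can make a pre-post comparison look far better than the program's true effect. Every number in it is invented; the only point is the direction of the bias.

```python
# Toy simulation of the pre-post pitfalls described above: intake screening
# ("creaming"), a background trend such as regional job growth, and dropping
# dropouts from the analysis. All numbers are invented; the point is only
# that the naive pre-post gap can greatly overstate the program's effect.
import random
from statistics import mean

random.seed(0)

TRUE_PROGRAM_EFFECT = 0.05    # assume the program itself adds only 5 points
BACKGROUND_JOB_GROWTH = 0.10  # employment that would have risen anyway
BASELINE_RATE = 0.40

records = []
for _ in range(500):
    employability = random.random()        # latent likelihood of finding work
    if employability < 0.3:                # "creaming": hardest cases screened out
        continue
    employed_before = random.random() < BASELINE_RATE
    p_after = BASELINE_RATE + BACKGROUND_JOB_GROWTH + TRUE_PROGRAM_EFFECT + 0.2 * employability
    employed_after = random.random() < min(p_after, 1.0)
    # dropouts skew toward people who did not find work
    dropped_out = (not employed_after) and random.random() < 0.5
    records.append((employed_before, employed_after, dropped_out))

analyzed = [r for r in records if not r[2]]   # the common mistake: exclude dropouts
pre_rate = mean(r[0] for r in analyzed)
post_rate = mean(r[1] for r in analyzed)
print(f"Naive pre-post gain: {post_rate - pre_rate:+.0%} "
      f"(simulated true program effect: {TRUE_PROGRAM_EFFECT:+.0%})")
```

Running it typically shows a naive pre-post gain several times larger than the simulated true program effect, which is exactly the trap being described here.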
Speaker 2:So this is a lot of what I call non-experimental evaluation, which is really the impact measurement work that most of us are doing, can afford and can figure out how to do. We might be able to hire a graduate student, a consultant or somebody to really help, but these are not the same thing. So why does that matter? Because if we were to really do evaluation right, the experimental approach is the way we would need to do it. So what are our options when we don't want these correlational studies, which sometimes make us look great even though, if we looked at the data a little closer, we don't know that it was your program that caused the actual outcome? What are our options? What do we have to do?
Speaker 2:Well, we have an opportunity now, and that opportunity lies in all the data that we're collecting, and I'm going to share with you in a moment all the data we're collecting through our program administration data systems, all the different case management systems, all these kinds of things. We're collecting all this data all the time. There's data out there in the world that can help us really learn new techniques, and with AI and machine learning, we're getting to a place where we can do things we could never do before. That's a little tease for what we're about to go into. Okay, but before we jump into the solution, I want to acknowledge just a couple of other pain points really quickly, and the reason I bring up the data piece is that the opportunity is right now.
Speaker 2:We are gathering a lot of data. Like I said, we have an opportunity, and if we think about AI and ML, where can we go with this? That's where I'm going to be speaking to the solution side of this. But right now, what's happening is a lot of people are putting data in. You've got a lot of data gathering, asking questions. There have been studies showing that anywhere from 20 to 30 percent of a case manager's time is spent gathering and inputting data, right, through questionnaires and surveys and tracking and counting, documenting everything they do and reporting everything in their data system.
Speaker 1:So I often think of this as what I've called feeding the paper monster, in that a lot of the data that's collected is compliance-related, to make sure we get the funds. And I don't want to get too far ahead of things, but my impression from agencies that I've worked with that have gone down this road you're talking about is that the starting point for this work is using the existing data, even though that's presumably not where you end up over time. Because as these agencies begin to see how this data can be used to come up with some meaningful results, it begins to make them say, for the first time: oh, in addition to the data that we're currently compiling for somebody else, what do we think we should be compiling that actually matters more directly to the people that we serve, as opposed to what other people are telling us we have to do to justify the funding? And that's a fundamental paradigm shift.
Speaker 2:It is, because the pain point that we are speaking to is that there's a whole lot of data but not a lot of insights. We're data rich and insight poor, and so part of it is: how do we gather those insights? And if there's a way we can leverage this data, and there is now, we do have technology, machine learning tools, we can automate, there are ways to do this, even with some rigor. And imagine being able, to your point, to then look in that data and begin to see answers to questions that we've had: not just putting in data, but what's working, for whom, under what circumstances? Why is it not working?
Speaker 2:To your point, there may be questions we're not asking. Part of it is what our data can tell us and what it can't tell us. And we find that when you transform data into insights, that comes back, especially to the practitioners on the front lines, their managers and supervisors, program designers and developers. When that data comes back and you get that feedback mechanism telling you both how you're doing, are we really as effective as we can be, but also telling you why, what it is about your program, it both answers questions and allows you to ask more specific questions and learn. Those questions may be ones you investigate by going and talking to people, or they may be investigated by asking better questions in your data system. And what that does is create a learning loop, which is very different from an administrative task of just putting data in, right? It's like converting your data collection from an overhead cost to a program cost, because now it's valuable to your program, and they should fund it accordingly and not get caught up in overhead.
Speaker 1:But you know, the other thing, and this is my impression, I'm curious to know what your thoughts are: as this paradigm shift happens and people see the results and become excited about it, and they want to ask these additional questions to make certain that meaningful impact is being considered for each individual, do you find that incrementally, over time, that loop becomes even faster and they get more excited, so even more questions come up? That it gets to a point where they're so excited it's like, oh, let's think about this and think about this and consider this? Is that your impression over time?
Speaker 2:Yeah, absolutely, because that's the other thing. One of the other challenges, and I want to emphasize this, is the lag time of typical evaluations, both traditional experimental evaluations and some of the correlational, pre-post types of evaluations, where they're not doing a good job controlling for things but are just trying to answer some basic questions. That lag time is so critically important, because we need insights about how we're doing right now. We need feedback that we can ground in our current experience as practitioners and as beneficiaries. And the problem is, in traditional evaluation, everything gets rolled up and it benefits the funders, it benefits maybe policymakers, but what it doesn't do is automate and give real-time feedback and learning moments: both feeding that back in the moment when I need it to make a decision for a beneficiary, and letting us see in real time how we're doing overall, so that we can adapt in real time, ask questions, learn more and make some changes, even go as far as to experiment with how we might move a needle that's not moving with something we're not measuring or doing. That kind of feedback loop is something practitioners experience anyway; it's just that they're doing it anecdotally, maybe on their own, in isolation, talking to other practitioners, their bosses, their supervisors. They're doing this, but what they're not getting is the kind of feedback that derives from good data collection, which gives them a real, honest and more precise picture of where to focus their learning so they can improve a case, as well as the bigger question: how can we improve our program design to meet more people's needs? Those are things that, I think, if you don't have real-time feedback, if I had to wait two or three years, how relevant are those findings? If it's a snapshot that was actually taken two or three years ago, it becomes a very different type of information.
Speaker 2:In the evaluation space, there's a whole body of research and literature around evaluation utilization, and there's a lot of effort, qualitatively, and a lot of work done and advances in ways you can use evaluation methods, experimental and non-experimental, to really inform things in real time. But even there it takes time. Now add technology, AI and data that's already being collected, and you can begin to see how you can start to really convert that. That's where the opportunity is, and this isn't just speculation; we're now seeing it in use in a lot of work. This is a lot of the work, I will say, that we are doing here at BCT, myself included, really trying to move on it.
Speaker 1:So I have a question on that, because, getting back to this notion that a lot of agencies, before they go down this path, what they're compiling is more funder-requirement based. What I also find with a lot of that feeding-the-paper-monster data is that it can be quite superficial and not have a lot of substance to it. I even think of cases where they just cut and paste and say the same thing over and over again, because they're required for X period of time to have some kind of note, and the content is almost meaningless at times. It's like the funder just wants to see words on a page and doesn't even look at the depth or the quality of it. So in scenarios like that, I would assume you would have to rule out some of that kind of data?
Speaker 2:Yeah, so it's interesting. I have experience with organizations that I've worked with in implementing these kinds of real-time systems, and it's very interesting how we communicate our findings to our funders.
Speaker 2:It's kind of interesting because, in some ways, if you do really good evaluation, and I'm going to share with you this analytics approach that I think is bearing a lot of fruit in terms of being able to do this work and automate it, we end up finding out some truths that we may not like to see: our program may work very well for certain groups of our population, and we see we're successful, we're doing what we need to do, we're getting success, but there may be a whole lot of different subgroups that we're not actually as successful with. And the thing is, that type of insight is what comes with rigorous evaluation. See, randomized controlled trials oftentimes come to the conclusion that the program doesn't work, right?
Speaker 2:Quasi-experimental evaluations will often find it works, but only under certain circumstances. If we actually leverage the data that people are collecting, we often come to the same conclusion. There may be 12 different types of cases that you're working with, who come with different circumstances and situations. Therefore, we shouldn't be tuning our programs to everybody, but meeting them where they are. And we may find out that, out of those 12, our program is really effective for four of these subgroups.
Speaker 1:I guess, but... yeah, here's where I'm going.
Speaker 2:Let me finish, I'm so sorry. So here's the point: how do we then communicate that if we're pitching fundraising proposals and things like that? We run into this real challenge. Right now, there's actually an incentive for creating the kind of evaluation I described with the job training program, where we can say 25-point boost: we did a pre-post study, the change is a 25-point boost, and that sells the proposal. That communicates to the world that our program works.
Speaker 2:But really, the challenge we have is that it's not necessarily true for the beneficiary; you're raising money on a correlational study. When we actually look at this stuff, it's very important that we figure out how we're going to communicate it. And so I will tell you, I believe that if we actually get to the place where we are looking at true impact measurement, we're all in for some humility as to how well our programs work for different types of people and groups within populations. There's learning in that. But if we look at it through a fundraising lens, or being able to leverage it, like you said, there are people who have gotten positive RCT findings who will leverage that report over and over again for multiple years of funding: we've proven it, it's an evidence-based program.
Speaker 1:And I appreciate that, but I'm still talking about scenarios, and I've seen a lot of them, where the data is just crap, and so I don't know how you can discern anything. What I was trying to drive at is, in those instances, I would imagine that your data analytics would come up with a finding that, you know, the quality of this data is so substandard, because there's no discernment, you're just cutting and pasting and saying the same thing over and over again, that there's no way to make a determination. So the first finding might be: okay, we've got to revamp the data collection here, to get some quality in here, to have something more than just repeating the same information over and over again.
Speaker 2:Well, this is the other problem, and I run into an interesting and fascinating experience with it. So we're going to start to get to the solution, and part of the solution, I have to tell you, is the quality of the data. It all depends on the questions: when you ask the questions and how you ask the questions, right?
Speaker 2:This is one of the challenges with formal evaluation where you design your own survey that you ask at entry and at exit. You can shade those questions to get the answers you want, right? And even working with evaluators, there are ways to do that. What you really want, if you want to measure what's being delivered and how it's going, is other types of data. You need transactional data, like your program administrative data. These are actual counts of, say, how many minutes of therapy somebody actually sat in the room and received, if that's part of your transaction around program delivery. You're also documenting what methods and modalities you're using in these different moments when you're meeting with your client. This is actual behavioral tracking. Now, people always say to me, our administrative data is a mess, right, we should actually just go about using a particular instrument. And I get the desire to do that, and if they're designed well, with objectivity, it's really great to also have assessment tools as a part of your system. But you have to design them well.
Speaker 2:Too many organizations design their own program tools and assessments, and they kind of lead the witness. They ask questions that are not necessarily telling you anything. I'll give you an example: "I'm satisfied with the program," "I like the program." These are meaningless from the standpoint of what actually happened, as opposed to questions about whether you received very specific elements of a program, like "I spent a lot of time working on how to practice mindfulness," right? That's the kind of question we want. We want objective kinds of questions.
Speaker 2:But too often our assessments that are created for a lot of our correlational studies to satisfy the funder, are oftentimes the really crappy data. But it's funny because when I get in there, a lot of folks will tell me our administrative data is crappy data. Right, we actually need you to design new tools and gather new data, and while you can do that objectively well, I always like to say some of the best data is transactional because we know they showed up, we know how long they were here, we know the modalities you delivered, we can get a sense of, we can even put in there the clinical expertise of somebody. This was delivered by somebody with 20 years experience in CBT. These are all data points that are empirically easier to do. They are messy in the sense that there's a lot of missing data. You got to piece it all together.
Speaker 2:Administrative data is complicated in how it's structured, but I always like to tell people the quality of data is always better observationally, transactionally.
Speaker 2:Once you get into self-report, you start to get into the risky territory of not being objective. However, it's valuable. So the whole key here, when we're talking about impact measurement, is an idea I'm leaning into that people are not realizing: with machine learning, you can actually bring in categorical, ordinal and text data that gets structured using generative AI or other tools. All of the data, qualitative and quantitative, can now be brought together. And so, at the end of the day, if you combine the power of AI, data systems and machine learning, and you're able to bring all of that together with administrative transactional data, we're actually able to build models that are more accurate than if we were just to use primary data collection tools created subjectively by an evaluator or an assessor within the organization, or by somebody who wants to help the organization look good.
Speaker 2:So at the end of the day, we can actually become more rigorous, more behaviorally oriented, more transactionally oriented, and bring a whole lot more to this than we ever thought before. Imagine being able to not do a survey, but to interview somebody in their own words about what their experience was, and let that text data get thematically coded, analytically, through a generative AI model, rather than you coding it with some special agree-to-disagree kind of scale. We're getting to a much better place. I'm sorry, I'm getting very geeky here, but yeah, yeah, Ken.
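As a rough illustration of letting themes emerge from open-ended responses instead of forcing them onto a scale: the description above is of generative AI doing the thematic coding, but to keep this sketch self-contained and runnable without any external model or API, it uses classic topic modeling from scikit-learn as a simpler stand-in. The interview snippets are invented.

```python
# A rough stand-in for the thematic coding described above. Instead of a
# generative AI model, this sketch uses classic topic modeling
# (TF-IDF + NMF from scikit-learn) on a few invented interview snippets,
# which illustrates the same idea: themes come out of the respondents'
# own words rather than a pre-set agree/disagree scale.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

interviews = [
    "The bus passes meant I could actually get to my job interviews on time.",
    "My case manager helped me practice for interviews and fix my resume.",
    "Finding stable housing made it possible to focus on work again.",
    "Once we had an apartment, my kids could stay in the same school.",
    "Transportation was the biggest barrier until the program stepped in.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(interviews)

nmf = NMF(n_components=2, random_state=0)  # ask for two broad themes
nmf.fit(tfidf)

terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(nmf.components_):
    top_terms = [terms[j] for j in weights.argsort()[::-1][:4]]
    print(f"Theme {i + 1}: {', '.join(top_terms)}")
```

A generative model would go further, labeling themes in plain language and handling much messier text, but the workflow is the same: the structure is derived from the text itself.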
Speaker 1:So this is also really a big deal, I think, because for most agencies, if they're going to do some kind of proof of results, in addition to the typical outputs, before and after, you know, how many people went through and whatnot, it's surveys. And it sounds like, from what you're describing, in the vast majority of cases those before-and-after surveys are basically, as you said, getting the answers that you want, as opposed to getting really objective evidence of meaningful impact. So that's the sobering reality.
Speaker 1:I think you've just laid waste to about 95% of what goes on out there.
Speaker 2:Well, let me pause, and this is what I was going to say. Let me not disparage good, psychometrically developed scales, instrumentation and assessment tools that have been validated, that have been factor analyzed, run through all of the techniques for developing really good metrics used in the human and social sciences. Please don't hear me as disparaging them. The problem I'm having is that a lot of the time there are what I call survey hacks who basically come in. They've not been trained in how to build psychometrically sound scales, instruments and tools. They're not leveraging already-created tests and measures in the social sciences that have been tested and validated. They're creating their own surveys, and they're basically asking: are you happy? Do you think it was the program? And when somebody gets something that's subsidized or free, what do you think their answer is going to be? And then they report to everybody.
Speaker 2:We have 98% satisfaction, and 100% of them said we're moving the needle on this. And so at some point we definitely need to cut through the crap and recognize that that type of data and assessment is far too replete out there, and it is a mess. So please hear me: there are psychometrically sound ways, methodologies and tools out there you should be using. Bring them into your administrative data systems. But also get transactional data, get text data, bring other kinds of data into these systems, because we are in an era now, well into it, a few years now, where we can analyze all of it, put it together and build models that can help us understand what works for whom under what circumstances.
Speaker 1:Yeah, so I appreciate that, but I would stand by a rough statistic of about 90 to 95% that don't do the surveying, don't do the assessments, in the manner that you described. I think there's a lot of winging it, and part of it is also, as we've talked about in so many of the episodes, the scarcity, the limited resources, the limited knowledge, and just trying to do something to at least satisfy the funder, provide some evidence in theory, even though in practice that may not be the case. So let me ask you something else. One last thing.
Speaker 2:Let me just clarify your 90 to 95% point real quick. Yeah, I somewhat agree with you.
Speaker 2:I'm going to tell you that I believe the amazing thing right now is, I would hazard to guess, 10 to 15% of the nonprofit sector, the larger organizations, are using and implementing some type of program administrative data system. I'm talking direct service providers; I'm not talking policy advocacy organizations, things like that. But if you look at direct service providers, it's hard to reach some decent scale and not have a need to bring in some type of tool. There are so many vendors out there as evidence of this, including places like Salesforce and others that make these things available, and there are some great vendor tools out there. The reason being is that you are correct, but we have to say, for the larger organizations, if you're over $3, $4, $5 million and you're a direct service provider on the path to growth, you're very likely installing some type of system just to be able to manage the programmatic implementation work and to be able to share it with others, even if it's just output data.
Speaker 2:So I don't think they're wrong.
Speaker 1:We're not talking output data, yeah. Like, you know, it goes all the way back to the beginning, to our very first and second episodes, where we looked at the landscape of the sector. When you talk about 10, 15% of the sector, you're now getting into agencies that are, what, over a million or so, and look, I think, to get to the kind of sophistication you're talking about, the number of agencies that fit that profile is a very small number. But at any rate, I don't want to...
Speaker 2:Well, like I said, the other thing is, if you were to look at the hundreds of thousands that are filing a 990-EZ or, more importantly, a full 990 form, those are agencies, in my opinion, that, if they're direct service providers, are on the path to, or already at, the size where they're starting to get administrative data systems in place.
Speaker 1:Oh, I'm not saying that they're not putting data systems in place, but I'm saying in terms of their ability to show evidence of impact.
Speaker 1:Oh, we're not in disagreement there. I think the number's open. Yeah, one pathway that was much in vogue, and I suppose it still is, is this notion of what's called constituent voice, and a lot of that is about having surveys that are really hearing, listening to, the beneficiaries. One of the touch points that they identify as proving that this tool works is a book called the Ultimate Question, and the notion that there is this thing they call a net promoter score. The ultimate question that they identify is, if you ask someone, basically, would you recommend this program to your family or friends, and, statistically, if, you know, on that scale the answer is, I guess, somewhere between eight and 10 for the majority, they consistently identify that with high-performing agencies, and that that is a measure of showing that there's impact.
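For listeners who haven't run into it, here is a minimal sketch of how a net promoter score is conventionally computed from that 0-to-10 "would you recommend" question. The standard convention counts 9s and 10s as promoters and 0 through 6 as detractors, slightly different from the 8-to-10 range mentioned above; the responses below are invented.

```python
# Minimal sketch of the standard Net Promoter Score calculation.
# Responses are invented; in practice they'd come from the 0-10
# "would you recommend this program to family or friends?" question.
def net_promoter_score(responses):
    """Return NPS: percent promoters (9-10) minus percent detractors (0-6)."""
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    return 100.0 * (promoters - detractors) / len(responses)

sample = [10, 9, 9, 8, 7, 10, 6, 5, 9, 10, 8, 3]
print(f"NPS: {net_promoter_score(sample):+.0f}")
```

As the next response explains, this is best read as a program-quality or satisfaction signal, not as evidence of outcomes or impact.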
Speaker 2:I think the folks, the leaders, who are trying to really shine a light on the beneficiary voice and their experience of the programs, I do think there's huge value to that. We should be listening to the beneficiaries. I don't know that the net promoter score is necessarily it, though; I think that's more a measure of satisfaction with the program. It's not an impact, and I actually think some of the leaders in the field who are pushing on that are pretty clear it's not a measure of impact. I think what they're trying to do is bring another metric into the space that basically says: hey, listen, we do need to actually gather data from the beneficiaries, because we're always kind of inputting data from the practitioner, they said what they did, you know. But to your point, it's very important.
Speaker 2:I think those who are adopting these practices of beneficiary feedback, constituent voice and using Net Promoter continue to tell the story of those data points as if they're impact. But see, we have a real communication problem, and it's not just a matter of ignorance. You know, we've talked about the output and outcome issue, right; the issue is that Net Promoter actually measures program quality, not impact. These are the kinds of things where I often question: do people really not get the difference between an output, the number served, and an outcome, lives changed in some measurable way? And I don't know that I'm convinced. I think for some it's just ignorance of the difference. Okay, but just for the record, everybody: an outcome means there's some kind of behavioral, habitual, conditional or status change. Right? If you're in school, graduation is one, and actually matriculation into college; that may be a measure of your academic performance in high school and your program experience in high school.
Speaker 2:Just showing up and completing a course is not an outcome. Completing a course is an output. I did the work, I completed the course, even your grade, that may be a tangible measure of like a quick, short, very near-term outcome. It's like, okay, we can go with the grade as a near-term outcome, but eventually what we're really interested in is did you get the course, do well in the course, and does that predict some other kinds of outcomes, some kind of impact on your life, whether it's your next step in your educational process or whatever? So we really have to distinguish.
Speaker 2:But I think a lot of people know the difference between outputs and outcomes, and the problem in our sector comes back to something you and I have talked about on previous episodes, which is, you know, are funders really interested in funding outcomes, or do they want the stories? And what stories are they going to tell? Oftentimes, I think, they're okay; most people are kind of in agreement that we basically served people. We are a service sector, right? And so there's the idea that you wouldn't question whether a youth development program just implicitly gets outcomes; therefore, all we should have to share with you is how many kids we served, because we all know after-school programming is good, it has an impact. And I think for a lot of folks out there, and this is part of the challenge, we've talked about this on previous episodes, I don't know that there really is a desire to know for sure, to find out whether these outcomes are actually caused by the program. I think there's a real hesitancy. I think there's a nervousness around it. I think there's some resistance to it.
Speaker 2:All this is a long-winded way to go back to your point. I actually think it's hugely valuable that we have tools like Net Promoter, that we understand people's satisfaction, that we listen to that constituent voice, but let's not pretend that that is a measure of an outcome or an impact. It's not, and if we call it that, I think we're just wrong; we're not actually serving our beneficiaries. Constituent voice is another triangulated perspective that we should be getting if we're going to evaluate programs. No duh, we really need to, and so I think that's really important. So let's switch to the solution. Let me just encapsulate where I think we are now, which is really exciting. We've hit an exciting period, which is that now, and we can debate how many, nonprofits have program administrative data, longitudinal direct service program administrative data, that gathers intake information, all the transactional information about what people are getting in the program, and maybe some assessments, hopefully rigorous ones, instruments that have been well developed, and that all comes packaged in your administrative system.
Speaker 2:Whether you're providing homeless prevention services to kids aging out of foster care, or inpatient psychiatric care to kids or adults, whatever your program is, you have these administrative data systems. You're gathering all this data, right? In some cases you're gathering notes in there, case manager notes, doctor's notes, physician's notes, and that's text. And we have all the transactions: on this date they had this treatment, these are the prescriptions, and so on. It's all in that data. Once you have all that data, we now know how to basically go into it, and machine learning algorithms can go in, figure things out and find natural experiments. So what do I mean by that?
Speaker 2:We have successfully trained, and can train, algorithms to do this, and there's an article that I think is really good, put out by Project Evident, who are really working on this whole AI-for-evidence issue. I recommend everybody check out Project Evident. They published a paper called Unlocking Real-Time Evidence for Practitioners. It's a case study of two cases; I actually authored the paper, but they published it, and it was supported by the Gates Foundation. So I recommend going to Project Evident and taking a look at this paper and the many other resources they've pulled together around this idea of actionable evidence, and a framework for really understanding how we can start to do this work. It's an example, with two case studies in it, of how you basically train machine learning algorithms.
Speaker 2:We've trained machine learning algorithms to find, first of all, similar cases. Part of good research is that if you're going to do a study, you should study people who are similar in their situation and context. So the first thing you do is train algorithms to make sure we're not comparing apples and oranges. When I was a caseworker for the homeless, I knew that there were those who were newly homeless, sleeping in their cars, who had never been homeless before, and it was due to economic circumstances, versus another set of clients I worked with who had been on the street all their lives because they were addicted and also had mental health issues. Right? We know those two groups do not need, and should not get, the same things in order to achieve a similar outcome, which might be to get them permanently, safely, stably housed. My point being: the first thing we can do with all that data you have in your systems is that machine learning algorithms can find the groups that are very similar, that respond similarly to the programming and services you have. Once it does that, you take that group.
Speaker 2:Let me work this example where you have those who are homeless for the first time, sleeping in their cars, due to economic circumstances, and say I'm a caseworker for the homeless when they come in. Let's say, in the past three years of our data, we have served 150 cases like this. We've documented in our data when we've successfully placed them in a stable housing situation, or maybe helped them get a job that they themselves were able to leverage for the purposes of getting housed again, and we know we have that outcome in there. Now, what we can do is, we know those that did succeed and those that didn't. Let's say we have 150 cases and all kinds of services we provided for them, everything from food pantry referrals to support with finding work, workplace attire for interviews, we gave them an address, whatever the case may be. We have all these different services we've been logging in our data.
Speaker 2:The algorithm goes into that group and says, okay, sometimes this group had these sets of services but not those sets of services; sometimes, in this group, others had different mixes of services. These are like little counterfactual experiments. I had bus tickets that I could give out to help people get to job interviews and through the first few days of work, so that transportation was not a barrier. Let's say, for this group, there were times I would run out of my allotment. It was a city allotment for public transportation, and I would run out by about halfway through the month if I had a lot of cases that needed it. So let's say you're in this group: if you came in the first part of the month, I was able to give you transportation assistance, but in the last half of the month I was not. That's a natural experiment.
Speaker 2:So what happens is, the machine learning algorithms can study what happens to the outcome, stable housing or a stable job or something like that, when the same type of cases got or didn't get the transportation assistance. Did it help boost their likelihood of success? And so that's the second part: find similar cases in your data set and figure out what combinations get success. And now you can imagine two resulting products from this. One is, we know what works, so we can actually evaluate how many got what works.
Speaker 2:They need transportation plus this, plus this. How many in that group are successfully getting what works? That's our evaluation deliverable. And the second thing we can do is take that information and, at any moment in time, because it's part of the technology in the administrative data system, give you an update as to how that case is doing on getting what works. And if they haven't gotten something, we might be able to say to the caseworker: hey, listen, have you thought about providing more transportation assistance? Because that's going to help this case move the needle. So you get real-time recommenders. All of that can be put on top of your administrative data system and run in an automated fashion. That's an example of the opportunity that exists. That case study I shared with you is a great place to learn more.
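To make those two steps concrete, here is a toy sketch of the idea: group similar cases, compare outcome rates within a group for those who did and didn't get a service (the natural experiment), then flag open cases missing the service that appears to work. The records, profile labels and service names are all invented, and the grouping is a hand-assigned label rather than the trained machine-learning models described above; it illustrates the logic, not the actual method.

```python
# Toy sketch: within a group of similar cases, compare success rates for
# those who did vs. didn't get a service (a "natural experiment"), then flag
# open cases that are missing the service that appears to work.
# Records are invented; a real system would pull them from a case management
# or administrative data system and use ML to form the groups.
import pandas as pd

closed_cases = pd.DataFrame([
    # "profile" groups similar cases (in practice, learned by an algorithm)
    {"profile": "first_time_economic", "got_bus_tickets": True,  "stably_housed": True},
    {"profile": "first_time_economic", "got_bus_tickets": True,  "stably_housed": True},
    {"profile": "first_time_economic", "got_bus_tickets": True,  "stably_housed": False},
    {"profile": "first_time_economic", "got_bus_tickets": False, "stably_housed": False},
    {"profile": "first_time_economic", "got_bus_tickets": False, "stably_housed": True},
    {"profile": "first_time_economic", "got_bus_tickets": False, "stably_housed": False},
    {"profile": "chronic_street",      "got_bus_tickets": True,  "stably_housed": False},
    {"profile": "chronic_street",      "got_bus_tickets": False, "stably_housed": True},
])

# Step 1: within each profile, compare outcome rates with vs. without the service.
rates = (closed_cases
         .groupby(["profile", "got_bus_tickets"])["stably_housed"]
         .mean()
         .unstack())
print("Housing success rate by profile and service:\n", rates, "\n")

# Step 2: a crude recommender. Flag open cases in a profile where the service
# was associated with better outcomes but the case hasn't received it yet.
open_cases = pd.DataFrame([
    {"case_id": "A-101", "profile": "first_time_economic", "got_bus_tickets": False},
    {"case_id": "A-102", "profile": "chronic_street",      "got_bus_tickets": False},
])
for _, case in open_cases.iterrows():
    lift = rates.loc[case["profile"], True] - rates.loc[case["profile"], False]
    if lift > 0 and not case["got_bus_tickets"]:
        print(f"Case {case['case_id']}: consider transportation assistance "
              f"(+{lift:.0%} success among similar past cases)")
```

Note that in this invented data the same service shows a positive lift for the first-time, economically driven group and a negative one for the chronic street group, which echoes the bus-ticket story that comes up next.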
Speaker 1:Yeah, so you know, because I spent a long time, 20-odd years, working with the homeless, it's interesting, because the thought that came to my mind is that a surprising result might be that the people who got the tokens, because of the problem of substance abuse, might be more likely to use the tokens to, you know, barter and trade for drugs, and a certain irony would be that you might find there's more of a problem.
Speaker 2:It actually has a reverse effect. It has a reverse effect, yeah. Right, right. The other thing, actually, to that point, I want to exemplify that, because there were cases I worked with, to give you an example of why you need to find matched groups of cases. There was a group of cases, people who were drug addicted or alcoholic and had been living on the streets for years, where, if I had given them bus tickets, they would go hawk them for Colt 45.
Speaker 2:Or they could buy whatever they could buy to basically get their needs met. I understood that. So the point being, in that case, actually giving them bus tickets, counterintuitively perhaps, is a detriment to them, because I'm actually feeding the problem, right?
Speaker 1:That's right.
Speaker 2:So this is why we have a goldmine in these data sets: we can find natural experiments. This is quasi-experimental, so we're not avoiding rigor, but we're figuring out, more precisely, like precision medicine, what works for whom, and we can automate this.
Speaker 1:And the other thought that I had on this particular example is that the longevity of the placement is also important. Just because somebody is placed, you know, with all the challenges they face with substance abuse, mental health issues and whatnot, the longevity and the quality of that placement over the longer term matter, and that, I find, is often very hard for agencies to get data on: the longer-term results.
Speaker 2:Well, I'll give you an example. In the Unlocking Real-Time Evidence for Practitioners paper from Project Evident, one of the case studies is for GEMMA Services, which is a psychiatric residential program for children with severe mental health challenges. They actually have somebody who conducts three, six, nine and 12-month follow-ups to ask a few questions about how the kids are doing. Have they been re-hospitalized for any kind of event? Are they still living in their home? And we were able to use that data as part of the modeling, so we could discover even more. Now, some folks don't have that capacity or the resources to do the follow-up, so what you need to do is find the outcomes you control at the end of your program. But it makes you begin to think: what this also does is, with just a few tweaks, if your system hasn't quite got the metric you want, you can start to think strategically about what to put in there. That's very important. Now, there's a whole lot more under the hood of what I'm talking about, but the reality is we are now in a very different place. We can automate quasi-experimental evaluation, produce results in real time, and we're proving it. We have projects doing this work with a number of different funders, organizations, government agencies, state agencies and others, so you can get there.
Speaker 2:There is a cost, but it costs less, much less than a randomized controlled trial study, and, as you can hear, once you've built the model, it's actually installed on top of your system. It's almost like a brain on top of your system, and it's much lower cost from the standpoint of maintaining and keeping it going. It's nowhere near the cost you would pay for an ongoing, fully rigorous formative evaluation. So the opportunity is really big. It can transform the way we do things. We can actually get evaluation where we need it and address the pain points we talked about earlier on.
Speaker 2:But I think there are some really big benefits, and you and I have talked about this, Ken. Imagine a world where at least the organizations that have administrative data systems had these kinds of learning systems in there, where they're evaluating, creating recommendations for decision support on a case-by-case basis, monitoring, but with this kind of causal sophistication. You and I have had previous conversations about where we could go, right? Being able to more truly evaluate cost per outcome. You could have external reviews of those algorithms to make sure nobody's putting in subjective, pat-on-the-back metrics so that it looks good. There are all kinds of ways you can begin to think about how this could feed a very different marketplace, because if you're held accountable for outcomes but you have a feedback system that tells you how to fix things, there's fairness in the equation. If we just evaluate your outcomes and tell you you're not getting there, or your program doesn't work: an RCT may find your program doesn't work or barely makes a difference, but can it tell you specifically, with different types of cases, how to fix it and why, and do it in real time? That's the opportunity that I think is transformative: we can finally connect the accountability for outcomes to the answer to, okay, if I'm accountable for it and I find out what my results are, how do I get there? What do I fix? What do I do?
Speaker 2:And therein lies the opportunity for data, and the opportunity for AI and machine learning and everything else, and it's really taking us to a whole new level. Quicker, better, cheaper.
Speaker 2:Quicker, better, cheaper. It really does have that capability. I think we're still early on this, but we've been doing it for over a dozen years, and in the past two or three years, where generative AI has come into the mix, with the text analytics capabilities and the automation of qualitative analysis, some work we're doing with a very big federal funder around that is really showing that you can also use text data in ways we never thought we could before: very accurately, with good validity and reliability. So all data counts, and we can figure out what works and understand, in the text data, the story behind what works, because now we can actually get the voice. So think of it this way too: if we're doing constituent voice and we do surveys, we're entering a world where, why wouldn't I just let people tell us their story, and then we could analyze that to hear what themes are coming out of it, as opposed to forcing them into a particular category or scale?
Speaker 1:So cool stuff, transformative stuff. Finally getting to that place where the nonprofit sector will be able to have agencies show evidence of their impact in an objective fashion, and at long last, the trillion-plus dollars that are going into the largest nonprofit sector in world history will be able to prove that the money is well spent, and in some cases not well spent, and hopefully make it a more rational kind of sector where resources are allocated much more effectively and efficiently, and we can really do that nonprofit fix, man. Let's fix that nonprofit sector.
Speaker 2:Well, and let me put a plug in for a couple of things here too, if people would like to learn more about this. I mentioned the paper Unlocking Real-Time Evidence for Practitioners. It's on the Project Evident website; go to projectevident.org. I have to triple-check that.
Speaker 1:Actually, I just typed in the name of the article and it comes up and there's also a webinar, a video webinar, associated with that too.
Speaker 2:And additionally, if you're interested in learning more about what we're talking about, how AI and ML do this kind of quasi-experimental work, what we call precision causal modeling, there's a course, too, that I teach at the Evaluators Institute through Claremont Graduate University. It's a three-day course. If you go to the Evaluators Institute and look it up, it's coming up in March, coming up here in about three or four weeks. I'm going to be delivering a three-day course on how to leverage administrative data using AI. So there's a little bit more detail on that.
Speaker 2:I also teach a course, because I know we have some international listeners. I've taught it in the past, and I think I'll be doing it this summer as well, for the International Program for Development, Evaluation and Training, in Zurich, in July. So these are all opportunities to learn a little bit more about this. And also, we'll put in a link, I'll make sure we do this, Ken, to a folder of all kinds of publications on this kind of work that I can share through our website and the podcast site. So I know I did a lot of talking on this one, Ken, but I really appreciate your questions.
Speaker 1:You're the expert, brother, you're the expert.
Speaker 2:But this is one that we were counting as our final kind of bookend. We've talked about measurement all the way through our podcast series on the exhausted sector. A lot of the solutioning of the exhaustion is: imagine if we knew what worked for whom, under what conditions, and what the cost of that was. How would the implications of that ripple through all the exhausted elements, from governance to leadership to program, all this stuff we're talking about, staffing, you know, what it takes, understanding the complexities of the work that we actually do? So this kind of bookends it and brings it back, and we'll be moving on to another set of episodes, talking a little bit more about current events and other stuff in the nonprofit sector.
Speaker 1:And just for those of you who have been listening to us through the, what now, 15 episodes, you may have noticed that our rhythm has slowed a bit from the first year.
Speaker 1:We're now in year two and, just for transparency and full disclosure, with this episode, and Peter actually said bookend, which I think is quite appropriate, we are really thinking about putting this into a book format, with some rich supporting material and whatnot for it.
Speaker 1:So we're now sort of going between podcasts and working on the book. That's why things may have slowed down a bit, but we are committed to continuing to do these podcasts, and we're starting now to get back into the rhythm, where we intend to try to get back to a once-a-month-or-so podcast, if not more. That's part of the reason you may have seen this change: this episode brings us to the end of what, at least as we think of it at this point, we want to put into that book, on the exhausted sector, on the basic challenges to the sector and on some of the basics of short-term and long-term ways to fix the challenges we face. And, just a spoiler alert, our thinking for the next episode is that it's going to be on a question that I've been asked literally almost twice a day: what are the changes in the federal administration, what do they mean for the nonprofit sector and how do we handle it? So get ready for the next episode.
Speaker 2:That'll be a biggie. And please share our podcast with others. If you like our podcast, or even if you don't, give us a little shout-out here. Give us some five-star ratings, if you will, let us know, and share this with your friends, colleagues, et cetera. Our audience is growing and we want to continue growing the audience. So please, if you get a chance, just share the link with somebody and help them get oriented and introduced to the Nonprofit Fix. And thank you, everybody.
Speaker 1:Thank you, Ken. Thanks, thank you. All right, until next time.