RAND Function in Excel to Choose a Random Sample
Do you ever have to report on an M&E outcome indicator and give a percentage, for example, 70% of the population was impacted? How do you figure out what that percentage is? A lot of the time you’ll base that number on the results of a survey. But for your survey results to be valid, you need to choose your sample randomly. Then RAND function in Excel can help you do this!
This post is all about how to choose a random sample when you’re doing a survey (like a baseline or endline). It includes:
- How to use a “Sample Size Calculator”
- How to use the RAND function in Excel to help you choose your list of people to interview
- And how the “Bernoulli Distribution” can help
A Lot of People Don’t Do Random Sampling (and they probably don’t use the RAND function in Excel either)
I know this sounds boring, but you know what? A lot of people don’t do random sampling when they carry out an assessment. And it bugs me. Because when you choose a random sample, your assessment results are more reliable and more professional. As much as possible, you’ve removed the bias from your sample selection so that your survey produces good data.
Are you using random sampling already? Or could you start doing it in your next assessment?
The ONE mistake many people do when carrying out a “survey” is they don’t do a random sample…
…and yet they still report percentages as if their results apply to the whole population.
If you don’t do random sampling, your results are literally percentages that apply only to the people you interviewed. The results are not generalizable to the rest of the population you’re interested in.
I’ve made a video explaining how to choose a random sample using Excel. But first, please download the Excel template I use in the video so you can follow along and use the template when choosing the sample for your next survey.
For those of you who just want a written tutorial, I’ve broken it down into a few steps which I’ll summarize here:
Step 0 – What is your “population”?
Before you report on it, you have to define who exactly is in your population. The random sampling method I describe here assumes you have a complete list of all the people you need to report indicators on. For example:
- if you’re running a training programme, your “population” might be the entire list of all the trainees who have been through your programme this year. So your total population would be “total number of people trained”.
- if you’ve done household distributions of hygiene items, food, or cash, your population would be the entire list of households that have received a distribution. Your population would be “total number of households reached”.
- if you’ve done a community WASH programme, your population would be the entire list of all households in the village or villages you’ve reached. Your population would be “total number of households in the community(s)”.
But this methodology assumes you can make a list in Excel of every single household or ID number or name that’s in the whole entire “population” of your targeted beneficiaries.
Step 1 – What is your “sample size”?
To get a percentage for your M&E indicator, you need to do a survey such as a baseline or endline. You could go to every single household or person in your population and ask your survey questions. But this might be a waste of time, money, and resources. Instead, you could define a “sample” (or subset) of your population – and if this sample is chosen randomly, then the results from only asking some people will be similar to the results of having asked every single person.
To calculate a sample size, you should use an online Sample Size Calculator, like this one.
- It will ask you to plug in your “population size” (see Step 0). For my example, I’m using a pretend population of 300 households.
- Then it will ask you to choose a confidence interval. Typically, you’ll probably use a confidence interval of 90% or 95% (most commonly 95%).
- Then it will ask you to choose a margin of error that you’re okay with. Most of the time, you’ll probably choose 5%. If you’ve got really difficult access constraints (depending on your context) or a very low budget, then you might choose something like 10%.
When you press “calculate” it will then calculate a sample size for you. In my example with a population of 300, 95% confidence interval with a margin of error of 5%, my sample size would be 169 households to interview.
Step 2 – Prepare for Interviewees Who Aren’t Available for the Interview
What if you choose a sample to interview, but then someone isn’t available when you go to interview them? How do you prepare for this before you begin your survey?
The easiest way to do this is to assume that a certain percentage of people will be no-shows. This is all dependent on context. If you can ask around your team, colleagues, or other organisations who have worked in the area before, they might be able to give you an idea of the number of people who are might be unavailable on any given day.
For example, I usually assume 15% or 20% of people won’t be available on the day of interviews.
Then you just multiply your sample size by your percentage of no-shows. (so, 15% x 169 households = 25 households) Now, add that many people on top of your sample size to get the total number of people you need to select for an interview. (so, 25 + 169 households = 194 households to interview).
Step 3 – Calculate Probability of an Interview
The next quick little calculation you want to do is calculate the probability of a person being chosen for an interview. To do this, divide the number of people or households you need to interview (from step 2) by the total number of people in the whole population. In my example, this would be 194 divided by 300. This results in 64.783%. So, this means that any one person in the population has about a 65% chance of being selected for an interview.
Step 4 – Use the RAND Function in Excel to Randomly Select People to Interview
Because of a nice little statistical distribution called the “Bernoulli” distribution, you can now use the RAND function in Excel to randomly select people from your population list to interview.
- the RAND function in Excel produces a random number between 0 and 1. To see this in action, just type it into an Excel cell like this:
=rand()
- Because it produces numbers totally randomly, then about 50% of the time, it will produce numbers that are 0.5 or below. and 60% of the time it will produce numbers that are 0.6 or below. And 20% of the time, it will produce numbers that are 0.2 or below. Do you see the pattern?
- So we can use the probability we calculated in Step 3, combined with the Bernoulli distribution, to write the following formula next to each name in our population list:
=if(rand()<$B$2,"yes","no")
- in the cell B2, place the “probability” we calculated in step 3.
- This formula will check the random number generated by RAND(). If the random number is equal to our probability or lower, then it will say “yes” we should interview this person. Or else, if it’s higher than our probability, then it will say “no”, we shouldn’t interview this person.
- If you want to learn more about the “if” function and the nested if function, check out my other tutorial here.
Step 5 – Make Sure you Copy your RAND function in Excel and “Paste Special” as “Values”
Every time you refresh an Excel formula, or refresh a spreadsheet, all of the formulas recalculate, and all of the random numbers generated by RAND() will change. So it’s important to copy your random interview selection list and “paste special” as “values” so it doesn’t change.
Step 6 – Double Check your Total Number of Interviewees Using a Pivot Table
Because we’ve used a random number generator, we’re never sure exactly how many names from your population list it will select for you to interview. It should be very close to the number you calculated in Step 2, however it might be a little higher or a little lower. Use a Pivot Table to summarize how many “yes’s” it calculated to interview and how many “no’s” it calculated. Is the number of “yes’s” close to the number of people you need to interview? If so, great. If not, you might want to refresh the RAND() formula again and check the new calculation.
For example, I calculated that I needed to have about 194 people on my interview list in Step 2. Using the RAND() formula, Excel randomly selected 195 households for me to interview. Okay. Close enough. Or if it had selected 193 households to interview, fine. But perhaps if it only selected 188 or 205 households, then I might refresh the formula to get a selection closer to my target from Step 2.
Steps 7 & 8 – Produce the FINAL Interview List
Filter your household list, select all the people that had a “yes” to be interviewed. Copy that list to a final Excel spreadsheet and produce the FINAL list of people to interview.
These people were selected randomly, so the percentage results you get from them should apply to your ENTIRE population.
If you’re still reading, you’re amazing. This was a long and detailed tutorial. I really hope you watch the video explanation and download the Excel template. Use them to follow along, and maybe even use the Excel template for selecting interviewees for your next survey!
By Janna
Janna is an aid worker, an engineer, a mom, a wife, and a self-declared data-lover! Her mission is to connect with every field worker in the world to help the humanitarian sector use information management and technology to make aid faster and more accountable.