Sampling With Replacement vs. Without Replacement

Often in statistics we’re interested in collecting data so that we can answer some research question. For example, we might want to answer the following questions:

1. What is the median household income in Cincinnati, Ohio?
2. What is the mean weight of a certain population of turtles?
3. What percentage of residents in a certain county support a certain law?

In each scenario, we are interested in answering some question about a population, which represents every possible individual element that we’re interested in measuring.

However, instead of collecting data on every individual in a population, we typically just collect data on a sample of the population, which represents a portion of the population.

There are two different ways to collect samples: sampling with replacement and sampling without replacement.

This tutorial explains the difference between the two methods along with examples of when each is used in practice.

Sampling with Replacement

Suppose we have the names of 5 students in a hat:

Andy
Karl
Tyler
Becca
Jessica

Suppose we would like to take a sample of 2 students with replacement.

On the first random draw, we might select the name Tyler. We would then place his name back in the hat and draw again. On the second draw, we might select the name Tyler again. Thus our sample would be:

Tyler, Tyler

This is an example of obtaining a sample with replacement because we replace the name we choose after each random draw.

When we sample with replacement, the items in the sample are independent because the outcome of one random draw is not affected by the previous draw.

For example, the probability of choosing the name Tyler is 1/5 on the first draw and 1/5 again on the second draw. The outcome of the first draw does not affect the probability of the outcome on the second draw.
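The hat example can be checked with a short Python sketch. It is only an illustration: the seed and the variable names are our own choices. `random.choices` draws with replacement, and listing every ordered pair confirms that the chance of drawing Tyler twice is 1/5 × 1/5.

```python
import random
from fractions import Fraction
from itertools import product

names = ["Andy", "Karl", "Tyler", "Becca", "Jessica"]

# Draw 2 names with replacement: the same name may appear twice.
rng = random.Random(0)  # seeded only so the run is repeatable
sample = rng.choices(names, k=2)
print(sample)

# Every ordered pair is possible, including repeats such as (Tyler, Tyler).
outcomes = list(product(names, repeat=2))
p_tyler_twice = Fraction(outcomes.count(("Tyler", "Tyler")), len(outcomes))
print(len(outcomes), p_tyler_twice)  # 25 equally likely pairs, probability 1/25
```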

Sampling with replacement is used in many different scenarios in statistics and machine learning, including:

Bootstrapping
Bagging
Boosting
Random forests

In each of these methods, sampling with replacement is used because it allows us to use the same dataset multiple times to build models as opposed to going out and gathering new data, which can be time-consuming and expensive.
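As a rough illustration of reusing one dataset many times, here is a minimal bootstrapping sketch in Python. The data values, the seed, and the number of resamples are made up for the example. Each resample is drawn with replacement from the same observations, and the spread of the resampled means gives an estimate of the standard error of the mean.

```python
import random
import statistics

# Hypothetical observations; in practice this would be the collected sample.
data = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.8, 12.5]

rng = random.Random(42)  # seeded only so the run is repeatable
n_resamples = 1000       # assumed number of bootstrap resamples

boot_means = []
for _ in range(n_resamples):
    # Each resample has the same size as the data and is drawn with replacement.
    resample = rng.choices(data, k=len(data))
    boot_means.append(statistics.mean(resample))

# The spread of the resampled means estimates the standard error of the mean.
standard_error = statistics.stdev(boot_means)
print(round(statistics.mean(data), 2), round(standard_error, 2))
```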

Sampling without Replacement

Again, suppose we have the names of 5 students in a hat:

Andy
Karl
Tyler
Becca
Jessica

Suppose we would like to take a sample of 2 students without replacement.

On the first random draw, we might select the name Tyler. We would then leave his name out of the hat. On the second draw, we might select the name Andy. Thus our sample would be:

Tyler, Andy

This is an example of obtaining a sample without replacement because we do not replace the name we choose after each random draw.

When we sample without replacement, the items in the sample are dependent because the outcome of one random draw is affected by the previous draw.

For example, the probability of choosing the name Tyler is 1/5 on the first draw. Once Tyler’s name has been removed, only 4 names remain in the hat, so the probability of choosing the name Andy on the second draw is 1/4. The outcome of the first draw affects the probability of the outcome on the second draw.
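The same check can be sketched in Python for the case without replacement. Again, the seed and variable names are our own. `random.sample` never returns the same name twice, and listing every ordered pair of distinct names shows that the chance of drawing Tyler and then Andy is 1/5 × 1/4.

```python
import random
from fractions import Fraction
from itertools import permutations

names = ["Andy", "Karl", "Tyler", "Becca", "Jessica"]

# Draw 2 names without replacement: a name cannot be drawn twice.
rng = random.Random(0)  # seeded only so the run is repeatable
sample = rng.sample(names, k=2)
print(sample)

# Only ordered pairs of two different names are possible.
outcomes = list(permutations(names, 2))
p_tyler_then_andy = Fraction(outcomes.count(("Tyler", "Andy")), len(outcomes))
print(len(outcomes), p_tyler_then_andy)  # 20 equally likely pairs, probability 1/20
```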

Sampling without replacement is the method we typically use when we want to select a random sample from a population and each individual should appear in the sample at most once.

For example, suppose we want to estimate the median household income in Cincinnati, Ohio, and there are a total of 500,000 households.

We might collect a random sample of 2,000 households, but we don’t want the data for any given household to appear twice in the sample, so we would sample without replacement.

In other words, once we’ve chosen a certain household to be included in the sample we don’t want there to be any chance of selecting that household to be included again.
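A minimal Python sketch of this scenario follows. It assumes each household can be identified by a number from 0 to 499,999, which is a hypothetical ID scheme. Sampling without replacement guarantees 2,000 distinct households, while sampling with replacement can select the same household more than once.

```python
import random

n_households = 500_000  # population size from the example
sample_size = 2_000     # sample size from the example

rng = random.Random(2024)            # seeded only so the run is repeatable
household_ids = range(n_households)  # hypothetical ID scheme: 0 .. 499,999

# Without replacement: every selected household is distinct.
without_repl = rng.sample(household_ids, k=sample_size)

# With replacement: the same household may be selected more than once.
with_repl = rng.choices(household_ids, k=sample_size)

print(len(set(without_repl)))  # always 2000 distinct households
print(len(set(with_repl)))     # at most 2000, fewer if any household repeats
```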

Hey there. My name is Zach Bobbitt. I have a Masters of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike. My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.

2 Replies to “Sampling With Replacement vs. Without Replacement”

ardj says:

This is clear as far as it goes. But you do not explain how the probabilities for sampling without replacement are arrived at. Yes, intuitively it is obvious, but a rational description that covers other situations would be valuable. And of course there is no figure, let alone explanation, for the probability of sampling with replacement.

James Carmichael says:

Hi ardj, let’s break down the concept of sampling with and without replacement and how the probabilities are calculated for each scenario.

### **Sampling Without Replacement**

When you sample without replacement, each item selected is not put back into the pool, so the total number of items decreases with each selection. This affects the probability of selecting subsequent items.

#### **Example:**

Imagine you have a deck of 5 unique cards (A, B, C, D, E), and you want to draw 2 cards without replacement.

1. **First draw:**
   - The probability of drawing any specific card (say A) is \( \frac{1}{5} \) because there are 5 cards available.
2. **Second draw:**
   - After drawing the first card, there are only 4 cards left. If you drew A first, the probability of drawing any specific remaining card next (say B) is \( \frac{1}{4} \).

#### **General Formula:**

For the first item:
- \( P(\text{first item}) = \frac{1}{N} \), where \( N \) is the total number of items.

For the second item (without replacement):
- \( P(\text{second item}) = \frac{1}{N-1} \).

If you continue this process, the probability of selecting a specific sequence of \( k \) items without replacement is:
\[ P(\text{sequence}) = \frac{1}{N} \times \frac{1}{N-1} \times \cdots \times \frac{1}{N-k+1} \]

### **Sampling With Replacement**

When you sample with replacement, each item selected is returned to the pool, so the total number of items remains the same for each selection. This means the probability of selecting any particular item remains constant across all draws.

#### **Example:**

Using the same deck of 5 unique cards (A, B, C, D, E), if you want to draw 2 cards with replacement:

1. **First draw:**
   - The probability of drawing any specific card (say A) is \( \frac{1}{5} \).
2. **Second draw:**
   - After drawing the first card, you put it back in the deck, so the probability of drawing any specific card (say B) is still \( \frac{1}{5} \).

#### **General Formula:**

For each draw with replacement:
- \( P(\text{item}) = \frac{1}{N} \), where \( N \) is the total number of items.

The probability of selecting a specific sequence of \( k \) items with replacement is:
\[ P(\text{sequence}) = \left(\frac{1}{N}\right)^{k} \]

### **Comparison and Intuition:**

- **Without Replacement:** The probabilities change after each draw because the pool of items shrinks. An item that has already been drawn can no longer be selected, and the probability of drawing any specific remaining item rises as the draws continue.
- **With Replacement:** The probabilities remain constant because the item pool remains unchanged after each draw.

### **Visual Representation:**

Imagine a bag with 5 colored balls: Red, Blue, Green, Yellow, and Orange.

- **Without Replacement:** If you draw a Red ball first, there are only 4 balls left. The probability of drawing another Red ball is now zero (since it’s no longer in the bag), and the chance of drawing any other specific ball is higher (now \( \frac{1}{4} \) instead of \( \frac{1}{5} \)).
- **With Replacement:** After drawing the Red ball, you put it back in the bag. The probability of drawing Red again is still \( \frac{1}{5} \) because the total number of balls hasn’t changed.

### **Conclusion:**

The key difference between sampling with and without replacement lies in whether the pool of items changes after each draw. Without replacement, the probabilities change from draw to draw; with replacement, they stay the same. Understanding these principles allows you to apply them to various scenarios, such as probability distributions, Monte Carlo simulations, and real-world sampling problems.
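For readers who want to try other values of \( N \) and \( k \), the two sequence probabilities can be sketched in Python. The function names here are hypothetical, chosen only for this illustration, and the sketch assumes every item is equally likely on each draw. Exact fractions are used so the results match the hand calculations.

```python
from fractions import Fraction
from math import perm


def p_sequence_without_replacement(n_items, k):
    """Probability of one specific ordered sequence of k distinct items.

    Equals 1/N * 1/(N-1) * ... * 1/(N-k+1).
    """
    if k > n_items:
        return Fraction(0)  # cannot draw more distinct items than exist
    return Fraction(1, perm(n_items, k))


def p_sequence_with_replacement(n_items, k):
    """Probability of one specific ordered sequence of k items: (1/N)^k."""
    return Fraction(1, n_items) ** k


# The 5-card deck with 2 draws:
print(p_sequence_without_replacement(5, 2))  # 1/20
print(p_sequence_with_replacement(5, 2))     # 1/25
```

With 5 cards and 2 draws this gives 1/5 × 1/4 = 1/20 without replacement and (1/5)² = 1/25 with replacement.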