Midterm Study Guide
Session 1: Events and Probabilities of Events
This session is all about the fundamental building blocks of probability.
- The Gist: It defines the basic language of probability: what are all the possible outcomes, what is an event we care about, and how do we calculate basic chances.
- Analogy: Imagine a vending machine.
- Sample Space (Ω): This is the entire collection of items in the machine. It's every single possible thing you could get.
- Event (A): This is a specific outcome or a set of outcomes you're interested in. For example, the event "getting a soda" includes all the different types of soda in the machine. The event "getting item B4" is a single, specific outcome.
- Key Formula(s):
- For equally likely outcomes: P(A) = (Number of ways A can happen) / (Total number of possible outcomes) = #A / #Ω.
- Probability of "NOT A": P(Aᶜ) = 1 - P(A). The chance of something not happening is 1 minus the chance that it does happen.
- For any two events A and B: P(A ∪ B) = P(A) + P(B) - P(A ∩ B). The probability of A or B happening is the sum of their individual probabilities, minus the probability of them both happening (to avoid double-counting).
- Tutor's Note: Remember that probabilities are always between 0 (impossible) and 1 (certain). If you get an answer outside this range, you've made a mistake! (These rules are checked numerically in the sketch after this list.)
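A quick sanity check of these three rules, using a fair six-sided die as a hypothetical example (the die and the events are my illustration, not from the lecture):

```python
# Sanity-checking the Session 1 rules on a fair six-sided die.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # sample space Ω: every possible outcome
A = {2, 4, 6}                # event A: "roll an even number"
B = {4, 5, 6}                # event B: "roll more than 3"

def prob(event):
    """P(A) = #A / #Ω for equally likely outcomes."""
    return Fraction(len(event), len(omega))

print(prob(A))        # 1/2
print(1 - prob(A))    # complement rule: P(Aᶜ) = 1 - P(A)
# Inclusion-exclusion matches computing P(A ∪ B) directly on the sets:
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
print(prob(A | B))    # 2/3
```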
Session 2: Counting Methods
This session is about how to count the number of ways things can happen, which is crucial for calculating probabilities. The key is to figure out if the order of things matters.
- The Gist: Learning efficient ways to count arrangements and selections without listing them all out.
- Analogy: Think about choosing ice cream.
- Permutation (Order Matters): Imagine you're getting a two-scoop cone. Getting "chocolate then vanilla" is a different experience and arrangement than "vanilla then chocolate." Permutations are for counting when the sequence is important.
- Combination (Order Doesn't Matter): Now imagine you're getting two scoops in a cup. "Chocolate and vanilla" is the exact same combination as "vanilla and chocolate." It's just a group of two flavors. Combinations are for counting groups where the sequence of selection is irrelevant.
- Key Formula(s):
- Permutation: The number of ways to arrange k items from a set of n is P(n,k) = n! / (n-k)!.
- Combination: The number of ways to choose a group of k items from a set of n is C(n,k) = n! / (k!(n-k)!).
- Tutor's Note: The big question to ask yourself for any problem is: "Does the order matter?" If you're arranging things, forming a password, or assigning specific roles (1st, 2nd, 3rd place), it's a permutation. If you're just selecting a group, a committee, or a hand of cards, it's a combination. (Both formulas are verified in the sketch after this list.)
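A minimal sketch checking both formulas; the numbers n = 8 flavors and k = 3 scoops are my illustrative choice (note that `math.perm` and `math.comb` require Python 3.8+):

```python
# Checking P(n,k) and C(n,k) for n = 8 flavors, k = 3 scoops.
import math

n, k = 8, 3
perms = math.factorial(n) // math.factorial(n - k)                  # n!/(n-k)!
combs = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(perms)   # 336 ordered arrangements (two-scoop cone: order matters)
print(combs)   # 56 unordered groups (scoops in a cup: order doesn't)

# Every group of k items can be ordered in k! ways, so P(n,k) = C(n,k) * k!:
assert perms == combs * math.factorial(k)
# The standard library also computes these directly:
assert perms == math.perm(n, k) and combs == math.comb(n, k)
```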
Session 3: Conditional Probability and Bayes' Theorem
This is about how probabilities change when you get new information.
- The Gist: Updating the chance of an event happening based on the knowledge that another event has already occurred.
- Analogy:
- Conditional Probability: What's the probability that your friend is at home? Now, what's the probability your friend is at home given that their car is in the driveway? The new piece of information (the car) changes your original probability.
- Bayes' Theorem: This is a formula to flip conditional probabilities around. Imagine a medical test. You might know the probability of testing positive if you have the disease. But what you really want to know is the probability you have the disease if you test positive. Bayes' Theorem lets you use the information you have to find the information you need, like a detective using clues to find the probability of a specific culprit.
- Key Formula(s):
- Conditional Probability: The probability of A given B is P(A|B) = P(A ∩ B) / P(B).
- Multiplication Rule: P(A ∩ B) = P(A|B) * P(B).
- Bayes' Theorem: P(B|A) = [P(A|B) * P(B)] / P(A).
- Tutor's Note: A common trap is confusing P(A|B) with P(B|A). They are not the same! The probability of seeing clouds given that it's raining is high, but the probability of it raining given that you see clouds is not necessarily as high (since not all clouds produce rain). (The sketch after this list shows just how different the two can be.)
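Here is a small sketch of the medical-test analogy worked end to end. All the rates (1% prevalence, 95% sensitivity, 5% false-positive rate) are invented for illustration:

```python
# Bayes' Theorem for the medical-test analogy; all rates are invented.
p_disease = 0.01              # P(B): prior probability of having the disease
p_pos_given_disease = 0.95    # P(A|B): probability of testing positive if sick
p_pos_given_healthy = 0.05    # P(A|Bᶜ): false-positive rate

# Law of total probability: P(A) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' Theorem flips the conditional: P(B|A) = P(A|B) * P(B) / P(A)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # 0.161 — nowhere near P(A|B) = 0.95
```

This is the P(A|B)-versus-P(B|A) trap in action: the test catches 95% of sick patients, yet a positive result means only about a 16% chance of disease, because healthy people vastly outnumber sick ones.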
Session 4: Random Variables and Discrete Distributions
This session is about turning the outcomes of an experiment into numbers so we can do math with them.
- The Gist: Assigning numerical values to outcomes and then calculating the average and spread of those numbers.
- Analogy:
- Random Variable (X): It’s a rule that assigns a number to an outcome. In a game where you flip a coin, the outcomes are {Heads, Tails}. A random variable X could be "the money you win," so X(Heads) = $1 and X(Tails) = -$0.50. It's a way to quantify the results.
- Expectation (E[X] or Mean): This is the long-run average you'd expect if you repeated the experiment over and over. If you play the coin game a thousand times, what's your average winning per game? It's the "center of mass" or balance point of the distribution.
- Variance (Var[X]): This measures how spread out the numerical outcomes are. A high variance means the results are unpredictable and all over the place (a high-risk, high-reward game). A low variance means the results are very consistent and close to the average (a low-risk, low-reward game).
- Key Formula(s):
- Expectation: E[X] = Σ [x * P(X=x)] (the sum of each value times its probability).
- Variance: Var(X) = E[(X - E[X])²] = E[X²] - (E[X])². The second version is almost always easier for calculations.
- Tutor's Note: The "expected value" is not necessarily a value you expect to get in any single trial. For a single die roll, the expected value is 3.5, which is impossible to actually roll. It's a long-term average. (Both formulas are worked through for the coin game in the sketch after this list.)
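A short sketch computing E[X] and Var(X) for the coin game from the analogy above: win $1.00 on heads, lose $0.50 on tails (the fair-coin assumption is mine):

```python
# E[X] and Var(X) for the coin game: X(Heads) = $1.00, X(Tails) = -$0.50.
from fractions import Fraction

pmf = {
    Fraction(1): Fraction(1, 2),       # heads: win $1.00
    Fraction(-1, 2): Fraction(1, 2),   # tails: lose $0.50
}

e_x = sum(x * p for x, p in pmf.items())        # E[X]  = Σ x·P(X=x)
e_x2 = sum(x**2 * p for x, p in pmf.items())    # E[X²] = Σ x²·P(X=x)
var_x = e_x2 - e_x**2                           # the "easier" variance formula

print(e_x)     # 1/4  -> you average $0.25 per play in the long run
print(var_x)   # 9/16 -> the spread around that average
```

Note that the expected value of $0.25 is not a payout you can ever receive in a single play, exactly as the Tutor's Note warns.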
Session 5: Multivariate Distributions and Independence
This is about looking at the relationship between two or more random variables.
- The Gist: Understanding how two variables behave together and whether knowing one gives you information about the other.
- Analogy: Think about the relationship between daily temperature (X) and daily ice cream sales (Y).
- Joint Distribution: This tells you the probability of two things happening at once, like P(Temp=85°, Sales=200). It's a table or function that gives the probability for every possible pair of outcomes.
- Independence: Two variables are independent if they have nothing to do with each other. For example, ice cream sales and the number of traffic jams in another city are likely independent. Knowing one tells you nothing about the other.
- Covariance: This measures how two variables move together. For temperature and ice cream sales, the covariance would be positive, because when temperature goes up, sales tend to go up too. If covariance is negative, one tends to go up when the other goes down.
- Key Formula(s):
- Independence: X and Y are independent if P(X=x, Y=y) = P(X=x) * P(Y=y) for all x and y.
- Covariance: Cov(X,Y) = E[XY] - E[X]E[Y].
- Variance of a Sum: Var(X+Y) = Var(X) + Var(Y) + 2*Cov(X,Y). If X and Y are independent, their covariance is 0, so Var(X+Y) = Var(X) + Var(Y).
- Tutor's Note: Independence implies zero covariance. However, zero covariance does not imply independence. This is a classic exam question! You can have a strong non-linear relationship where the covariance is zero. (A numeric example of exactly this follows this list.)
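The sketch below builds the standard counterexample (my choice of example, not necessarily the one from lecture): X uniform on {-1, 0, 1} and Y = X². Their covariance is exactly zero even though Y is completely determined by X:

```python
# Zero covariance without independence: X uniform on {-1, 0, 1}, Y = X².
from fractions import Fraction

joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}   # joint pmf of (X, Y)

e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())

print(e_xy - e_x * e_y)   # Cov(X,Y) = E[XY] - E[X]E[Y] = 0

# Yet the product rule for independence fails:
# P(X=0, Y=0) = 1/3, but P(X=0) * P(Y=0) = (1/3) * (1/3) = 1/9.
```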
Session 6: Continuous Distributions
This session moves from countable outcomes (like 1, 2, 3 heads) to a continuous range of outcomes (like height, weight, or time).
- The Gist: How to handle probabilities when the outcome can be any value within a range.
- Analogy:
- Probability Density Function (PDF): Imagine spreading a kilogram of sand along a 1-meter line. The probability of the outcome being in a certain interval is the weight of the sand in that interval. The PDF is the density (height) of the sand at any given point. Where the sand pile is higher, outcomes are more likely. For a continuous variable, the probability of getting exactly one specific value is zero, just as a single, infinitely thin line of sand has zero weight. You can only measure the weight (probability) over an interval (area).
- Key Formula(s):
- The probability of X being between a and b is the area under the PDF curve: P(a ≤ X ≤ b) = ∫[a,b] f(x) dx.
- The total area under the PDF must equal 1: ∫[-∞,∞] f(x) dx = 1.
- Expectation: E[X] = ∫[-∞,∞] x * f(x) dx.
- Variance: Var(X) = E[X²] - (E[X])², where E[X²] = ∫[-∞,∞] x² * f(x) dx.
- Tutor's Note: Notice the parallel between discrete and continuous formulas! The summation sign Σ from the discrete case is simply replaced by the integral sign ∫ in the continuous case. The logic is the same; you are still summing up all the possibilities, just over a continuous range instead of distinct points. (The sketch after this list makes the parallel concrete with a fine Riemann sum.)
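To make the Σ → ∫ parallel concrete, the sketch below approximates the continuous formulas with a fine Riemann sum, i.e., a very dense discrete sum. The PDF f(x) = 2x on [0, 1] is my illustrative choice, not from the course:

```python
# Approximating the continuous formulas with a fine Riemann sum,
# for the illustrative PDF f(x) = 2x on [0, 1] (zero elsewhere).
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

n = 100_000                                  # number of thin slices
dx = 1.0 / n
xs = [(i + 0.5) * dx for i in range(n)]      # midpoint of each slice

total = sum(f(x) * dx for x in xs)           # ∫ f(x) dx     ≈ 1
e_x   = sum(x * f(x) * dx for x in xs)       # ∫ x·f(x) dx   ≈ 2/3
e_x2  = sum(x**2 * f(x) * dx for x in xs)    # ∫ x²·f(x) dx  ≈ 1/2

print(round(total, 4), round(e_x, 4), round(e_x2 - e_x**2, 4))
# -> 1.0 0.6667 0.0556   (exact values: 1, 2/3, and Var(X) = 1/18)
```

Each term `f(x) * dx` plays the role of a discrete probability P(X=x), which is exactly why the discrete and continuous formulas look so alike.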