Understanding Game Theory and Nash Equilibrium

Ziyi Zhu / March 10, 2025

In this blog post, we'll walk through the key concepts that form the backbone of strategic decision analysis, from the foundations of Game Theory and Nash Equilibrium to their practical applications in everyday scenarios. We'll examine classic examples like the Prisoner's Dilemma and the Goalkeeper Problem, analyze mixed strategies in competitive settings, and investigate how repeated interactions can transform conflict into cooperation.

Introduction to Game Theory

Game theory is a mathematical framework for analyzing strategic interactions among rational decision-makers. Developed primarily in the mid-20th century, it provides powerful tools for understanding situations where the outcome of one's choices depends not only on one's own decisions but also on the choices made by others.

At its core, game theory models conflicts and cooperation between intelligent, rational agents. These "games" can represent anything from business competition and military strategy to evolutionary biology and social dynamics. The framework helps us understand how individuals might behave when their success depends on anticipating the actions of others.

A "game" in this context has several key components:

  • Players: The decision-makers involved
  • Strategies: The possible actions each player can take
  • Payoffs: The rewards or penalties that result from different combinations of strategies
  • Information: What players know about the game and about each other

Nash Equilibrium: The Cornerstone of Game Theory

Named after mathematician John Nash (whose life was portrayed in the film "A Beautiful Mind"), a Nash Equilibrium represents a stable state in a game where no player can benefit by unilaterally changing their strategy, assuming other players maintain theirs.

In other words, a Nash Equilibrium is a set of strategies, one for each player, where each player's strategy is optimal given the strategies of all other players. Once reached, players have no incentive to deviate from their chosen strategy.

Definition: Formally, let $S_i$ be the set of all possible strategies for player $i$, where $i = 1, \ldots, N$. Let $s^* = (s_i^*, s_{-i}^*)$ be a strategy profile, a set consisting of one strategy for each player, where $s_{-i}^*$ denotes the $N-1$ strategies of all the players except $i$. Let $u_i(s_i, s_{-i}^*)$ be player $i$'s payoff as a function of the strategies. The strategy profile $s^*$ is a Nash equilibrium if:

$$u_i(s_i^*, s_{-i}^*) \geq u_i(s_i, s_{-i}^*) \quad \forall s_i \in S_i$$

The concept is powerful because it helps predict likely outcomes in strategic situations. When rational players interact, we can expect the outcome to be a Nash Equilibrium—even if that equilibrium doesn't maximize collective welfare.
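
The definition translates directly into code. Here is a minimal sketch (the function name, payoff-table representation, and example coordination game are my own, not from the post): a profile is a Nash equilibrium exactly when no player's unilateral deviation strictly raises that player's payoff.

```python
def is_nash(payoffs, profile):
    """Check whether `profile` is a Nash equilibrium.

    payoffs: dict mapping each strategy profile (a tuple, one strategy
             per player) to a tuple of payoffs, one per player.
    profile: the candidate strategy profile to test.
    """
    n = len(profile)
    # Recover each player's strategy set from the payoff table's keys.
    strategy_sets = [{key[i] for key in payoffs} for i in range(n)]
    for i in range(n):
        for s_i in strategy_sets[i]:
            deviation = profile[:i] + (s_i,) + profile[i + 1:]
            # Player i strictly gains by deviating alone: not an equilibrium.
            if payoffs[deviation][i] > payoffs[profile][i]:
                return False
    return True

# A simple two-player coordination game: the equilibria sit on the diagonal.
coordination = {
    ("A", "A"): (2, 2), ("A", "B"): (0, 0),
    ("B", "A"): (0, 0), ("B", "B"): (1, 1),
}
print(is_nash(coordination, ("A", "A")))  # True
print(is_nash(coordination, ("A", "B")))  # False
```

Note that $(B, B)$ also passes the check: a game can have several Nash equilibria, some better for everyone than others.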

The Prisoner's Dilemma: Game Theory in Action

The Prisoner's Dilemma is perhaps the most famous illustration of game theory concepts. The classic scenario involves two suspects being interrogated separately, each facing a choice: cooperate with their accomplice by staying silent or defect by betraying their partner.

The Setup

Two suspects, A and B, are arrested and placed in separate cells with no means of communication. The prosecutors offer each prisoner the same deal:

  • If one confesses (defects) while the other remains silent (cooperates), the confessor goes free while the silent one gets 3 years in prison.
  • If both confess (defect), each receives 2 years.
  • If both remain silent (cooperate), each receives only 1 year on lesser charges.

For each suspect $i \in \{A, B\}$, the strategy set is $S_i = \{\text{Defect}, \text{Cooperate}\}$.

Payoff Matrix

We can represent this scenario with a payoff matrix, where the numbers represent years in prison (negative utility):

                              B Cooperates (Stays Silent)   B Defects (Confesses)
A Cooperates (Stays Silent)   (-1, -1)                      (-3, 0)
A Defects (Confesses)         (0, -3)                       (-2, -2)

Finding the Nash Equilibrium

To identify the Nash Equilibrium, we analyze each player's best response to the other's strategy:

  1. If B cooperates (stays silent), A's best response is to defect (confess) and get 0 years instead of 1 year.
  2. If B defects (confesses), A's best response is still to defect and get 2 years instead of 3 years.

The same logic applies to B's decisions. Therefore, regardless of what the other player does, defecting is always the dominant strategy for both players.

The Nash Equilibrium in this game is $(s_A^*, s_B^*) = (\text{Defect}, \text{Defect})$, resulting in both players getting 2 years in prison. This is the stable outcome, despite the fact that if both had cooperated, they would have received only 1 year each—a better outcome for both.

This paradox illustrates why individually rational decisions can lead to collectively suboptimal outcomes, a phenomenon known as a Pareto inefficient Nash equilibrium—crucial in understanding market failures, common resource dilemmas, and various social coordination problems.
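
The best-response reasoning above is mechanical enough to automate. A minimal sketch (function and variable names are mine) that enumerates all four pure-strategy profiles of the payoff matrix and keeps only those where neither suspect gains by deviating alone:

```python
from itertools import product

# Payoff matrix from the table above: (A's payoff, B's payoff),
# measured in negative years of prison.
payoffs = {
    ("Cooperate", "Cooperate"): (-1, -1),
    ("Cooperate", "Defect"):    (-3,  0),
    ("Defect",    "Cooperate"): ( 0, -3),
    ("Defect",    "Defect"):    (-2, -2),
}
strategies = ["Cooperate", "Defect"]

def pure_nash_equilibria(payoffs, strategies):
    """Return all profiles where each player's move is a best response."""
    equilibria = []
    for a, b in product(strategies, repeat=2):
        a_best = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0]
                     for alt in strategies)
        b_best = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1]
                     for alt in strategies)
        if a_best and b_best:
            equilibria.append((a, b))
    return equilibria

print(pure_nash_equilibria(payoffs, strategies))  # [('Defect', 'Defect')]
```

Only mutual defection survives the deviation check, confirming the analysis above.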

The Goalkeeper Problem: Mixed Strategies in Action

The goalkeeper problem provides an excellent illustration of situations where no pure strategy Nash Equilibrium exists, necessitating the use of mixed strategies—probability distributions over the set of pure strategies.

The Setup

Consider a penalty kick in soccer. The striker can kick left or right, and the goalkeeper can dive left or right. Both players must choose simultaneously without knowing the other's choice.

For the striker and goalkeeper, the strategy sets are $S_S = S_G = \{\text{Left}, \text{Right}\}$.

Let's define the payoffs as:

  • $u_S(s_S, s_G)$ = payoff to the striker (probability of scoring)
  • $u_G(s_S, s_G)$ = payoff to the goalkeeper (probability of saving)

Note that this is a constant-sum game (strategically equivalent to a zero-sum game): $u_G(s_S, s_G) = 1 - u_S(s_S, s_G)$, since the probability of the goalkeeper saving is the complement of the striker scoring.

Payoff Matrix

We can represent this as a payoff matrix:

                 Goalkeeper Left   Goalkeeper Right
Striker Left     (0.4, 0.6)        (0.9, 0.1)
Striker Right    (0.9, 0.1)        (0.4, 0.6)

No Pure Strategy Equilibrium

To determine if a pure strategy Nash Equilibrium exists, we check each strategy combination:

  1. $(s_S, s_G) = (\text{Left}, \text{Left})$: Payoff is $u_S(\text{Left}, \text{Left}) = 0.4$

    • If the striker deviates to Right: new payoff would be $u_S(\text{Right}, \text{Left}) = 0.9$
    • Since $0.9 > 0.4$, the striker has an incentive to deviate
    • Not an equilibrium
  2. $(s_S, s_G) = (\text{Left}, \text{Right})$: Payoff is $u_S(\text{Left}, \text{Right}) = 0.9$

    • If the goalkeeper deviates to Left: new payoff would be $u_G(\text{Left}, \text{Left}) = 0.6$
    • Since $0.6 > 0.1$, the goalkeeper has an incentive to deviate
    • Not an equilibrium
  3. $(s_S, s_G) = (\text{Right}, \text{Left})$: Payoff is $u_S(\text{Right}, \text{Left}) = 0.9$

    • If the goalkeeper deviates to Right: new payoff would be $u_G(\text{Right}, \text{Right}) = 0.6$
    • Since $0.6 > 0.1$, the goalkeeper has an incentive to deviate
    • Not an equilibrium
  4. $(s_S, s_G) = (\text{Right}, \text{Right})$: Payoff is $u_S(\text{Right}, \text{Right}) = 0.4$

    • If the striker deviates to Left: new payoff would be $u_S(\text{Left}, \text{Right}) = 0.9$
    • Since $0.9 > 0.4$, the striker has an incentive to deviate
    • Not an equilibrium

We've shown that no pure strategy combination is stable, which matches the real-world observation that penalty takers and goalkeepers must be unpredictable to succeed.
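
The same deviation check can be run over the goalkeeper game's payoff matrix; as expected, the list of pure-strategy equilibria comes back empty (variable names below are illustrative):

```python
from itertools import product

# Payoffs from the matrix above: (striker's P(score), goalkeeper's P(save)).
payoffs = {
    ("Left",  "Left"):  (0.4, 0.6),
    ("Left",  "Right"): (0.9, 0.1),
    ("Right", "Left"):  (0.9, 0.1),
    ("Right", "Right"): (0.4, 0.6),
}
moves = ["Left", "Right"]

# Keep a profile only if neither player can gain by a unilateral deviation.
pure_equilibria = [
    (s, g) for s, g in product(moves, repeat=2)
    if all(payoffs[(s, g)][0] >= payoffs[(alt, g)][0] for alt in moves)
    and all(payoffs[(s, g)][1] >= payoffs[(s, alt)][1] for alt in moves)
]
print(pure_equilibria)  # [] (no pure strategy equilibrium exists)
```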

Mixed Strategy Nash Equilibrium

When no pure strategy equilibrium exists, players adopt mixed strategies—probability distributions over their possible actions. Let's define:

  • $p$ = probability the striker kicks left
  • $1 - p$ = probability the striker kicks right
  • $q$ = probability the goalkeeper dives left
  • $1 - q$ = probability the goalkeeper dives right

Expected Payoff Calculations

The striker's expected payoff uS(p,q)u_S(p,q) is:

$$u_S(p,q) = pq \cdot 0.4 + p(1-q) \cdot 0.9 + (1-p)q \cdot 0.9 + (1-p)(1-q) \cdot 0.4$$

Simplifying:

$$u_S(p,q) = -pq + 0.5p + 0.5q + 0.4$$

Optimal Strategy for the Striker

In a mixed strategy equilibrium, the striker must be indifferent between kicking left and right. To find the $q$ that makes the striker indifferent, we apply the indifference principle:

$$\frac{\partial u_S(p,q)}{\partial p} = -q + 0.5 = 0$$

Solving for $q$:

$$q = 0.5$$

Optimal Strategy for the Goalkeeper

Similarly, the goalkeeper must be indifferent between diving left or right:

$$\frac{\partial u_S(p,q)}{\partial q} = -p + 0.5 = 0$$

Solving for $p$:

$$p = 0.5$$

Nash Equilibrium Solution

Therefore, the mixed strategy Nash Equilibrium in this game is:

  • Striker: $p = 0.5$ (kick left with 50% probability, right with 50% probability)
  • Goalkeeper: $q = 0.5$ (dive left with 50% probability, right with 50% probability)

The expected payoff of the game for the striker at equilibrium is:

$$u_S(0.5, 0.5) = -0.5 \cdot 0.5 + 0.5 \cdot 0.5 + 0.5 \cdot 0.5 + 0.4 = 0.65$$

This means that when both players play optimally, the striker scores with 65% probability and the goalkeeper saves with 35% probability—a result consistent with empirical observations of professional soccer matches.
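
A quick numerical check of the derivation (the function name is my own): at $q = 0.5$ the striker's payoff is the same whether they always kick left ($p = 1$) or always right ($p = 0$), and the value of the game is 0.65.

```python
# Striker's expected scoring probability, from the simplified expression above.
def u_striker(p, q):
    return -p * q + 0.5 * p + 0.5 * q + 0.4

q = 0.5  # goalkeeper's equilibrium mix

# At q = 0.5 the striker is indifferent between pure Left (p = 1)
# and pure Right (p = 0), so any mix of the two is a best response.
assert abs(u_striker(1, q) - u_striker(0, q)) < 1e-12

# Value of the game at the equilibrium (p, q) = (0.5, 0.5).
print(round(u_striker(0.5, 0.5), 2))  # 0.65
```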

Repeated Games and the Evolution of Cooperation

While the one-shot Prisoner's Dilemma leads to mutual defection, the dynamics change dramatically when the game is played repeatedly with the same partner. In repeated interactions, the future casts a shadow on the present, creating incentives for cooperation.

The Iterated Prisoner's Dilemma

When the Prisoner's Dilemma is played repeatedly:

  • Players can condition their current choices on past behavior
  • Cooperation becomes possible through reciprocity and reputation
  • Long-term relationships can overcome short-term temptations to defect

The folk theorem of repeated games formalizes this intuition: any feasible, individually rational outcome can be sustained as an equilibrium of an infinitely repeated game if players are sufficiently patient. In the Prisoner's Dilemma, for example, mutual cooperation can be sustained by trigger strategies whenever the discount factor $\delta$ satisfies:

$$\delta \geq \frac{T-R}{T-P}$$

Where:

  • $T$ = Temptation payoff (defect while the other cooperates)
  • $R$ = Reward payoff (mutual cooperation)
  • $P$ = Punishment payoff (mutual defection)
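
Plugging in the payoffs from the Prisoner's Dilemma above ($T = 0$, $R = -1$, $P = -2$, measured in negative years of prison), cooperation becomes sustainable once players weight future rounds at $\delta \geq 0.5$:

```python
# Prisoner's Dilemma payoffs from the matrix above, as negative years in prison.
T = 0    # temptation: defect while the other cooperates (go free)
R = -1   # reward: mutual cooperation
P = -2   # punishment: mutual defection

# Minimum discount factor at which cooperation can be sustained.
delta_min = (T - R) / (T - P)
print(delta_min)  # 0.5
```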

Tit-for-Tat: A Brilliantly Simple Strategy

In the early 1980s, political scientist Robert Axelrod organized tournaments where various strategies competed in an iterated Prisoner's Dilemma. The surprising winner was the simplest strategy submitted: Tit-for-Tat, developed by mathematician Anatol Rapoport.

Tit-for-Tat follows four principles:

  1. Start cooperative: Begin by cooperating
  2. Retaliation: If the opponent defects, defect in the next round
  3. Forgiveness: If the opponent returns to cooperation, immediately cooperate in the next round
  4. Clarity: The strategy is transparent and easily understood by opponents

The strategy's success stems from its combination of being "nice" (never the first to defect), "retaliatory" (punishing defection), "forgiving" (willing to return to cooperation), and "clear" (allowing opponents to adapt to it).
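
To see these properties in action, here is a small simulation sketch (the payoffs reuse the Prisoner's Dilemma matrix from earlier; the strategy and function names are mine). Tit-for-Tat sustains cooperation against itself, and against an unconditional defector it is exploited only in the first round:

```python
def tit_for_tat(my_history, opp_history):
    # Cooperate first, then mirror the opponent's previous move.
    return "C" if not opp_history else opp_history[-1]

def always_defect(my_history, opp_history):
    return "D"

# Stage-game payoffs (negative years of prison) from the matrix above.
PAYOFF = {("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
          ("D", "C"): (0, -3), ("D", "D"): (-2, -2)}

def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated Prisoner's Dilemma and return total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (-10, -10): sustained cooperation
print(play(tit_for_tat, always_defect))  # (-21, -18): exploited only once
```

Note that Tit-for-Tat "loses" the second match yet still scores far better than mutual defection would over a long horizon, illustrating the "don't be envious" lesson below.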

Key Lessons from Tit-for-Tat

Axelrod's analysis revealed several profound insights about successful strategies in repeated interactions:

  1. Don't be envious: Focus on maximizing your own score, not on outperforming your opponent.
  2. Be nice: Never be the first to defect.
  3. Retaliate appropriately: Respond to defection to discourage exploitation.
  4. Forgive quickly: Don't hold grudges; be willing to restore cooperation.
  5. Keep it simple: Clear strategies allow others to recognize and adapt to your pattern.

These principles extend far beyond game theory, providing insights into international relations, business partnerships, and evolutionary biology.

Beyond Tit-for-Tat

Further research has identified variations that can outperform the original Tit-for-Tat in certain environments:

  • Generous Tit-for-Tat: Occasionally forgives defections, responding to an opponent's defection with cooperation with probability $\epsilon$
  • Pavlov/Win-Stay-Lose-Shift: Repeats its previous move after a good outcome (payoff $T$ or $R$) and switches after a bad one ($P$ or $S$); equivalently, it cooperates exactly when both players chose the same move in the previous round
  • Gradual: Escalates retaliation, punishing the opponent's $n$-th defection with $n$ consecutive defections, followed by a period of renewed cooperation

These variations highlight the importance of context in determining optimal strategies for repeated interactions.
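
As a sketch of how two of these variants differ from plain Tit-for-Tat (the default $\epsilon$ value and the function names are illustrative):

```python
import random

def generous_tit_for_tat(my_history, opp_history, epsilon=0.1):
    """Tit-for-Tat that forgives a defection with probability epsilon."""
    if not opp_history or opp_history[-1] == "C":
        return "C"
    return "C" if random.random() < epsilon else "D"

def pavlov(my_history, opp_history):
    """Win-Stay-Lose-Shift: repeat the last move after a good outcome
    (the opponent cooperated), switch after a bad one. Equivalently,
    cooperate exactly when both players chose the same move last round."""
    if not my_history:
        return "C"
    return "C" if my_history[-1] == opp_history[-1] else "D"
```

One notable consequence: after a round of mutual defection, Pavlov returns to cooperation on its own, which lets two Pavlov players recover from noise without an external forgiveness mechanism.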

The Enduring Relevance of Game Theory

Game theory provides rigorous mathematical tools for analyzing strategic interactions across numerous domains. The Nash Equilibrium concept elucidates stable outcomes in non-cooperative settings, while the Prisoner's Dilemma exemplifies the tension between individual rationality and collective optimality. Mixed strategy equilibria, as demonstrated in the goalkeeper problem, reveal how randomization emerges as an optimal response under certain competitive conditions. The evolution of cooperation in repeated games illustrates how temporal extension of interactions fundamentally alters strategic incentives, potentially reconciling self-interest with socially beneficial outcomes through mechanisms of reciprocity, reputation, and conditional strategies.