Let's now speak briefly about stochastic games. This is a topic that lends itself to a very long discussion, and it's quite a complicated one, but we'll touch on the main points and position it in the landscape of topics we're discussing.

And so, the starting point is repeated games. As we know, a repeated game is simply a game in normal form, for example, that we repeat over and over again. For example, we play Prisoner's Dilemma once, twice, three times, maybe a finite number of times, maybe an infinite number of times. Then, we accumulate all the rewards over time to obtain an overall reward. A stochastic game is really a generalization of that, where we play a game repeatedly, but not necessarily the same game. So, we play a game, let's say Prisoner's Dilemma, and we each get some payoff. But depending on how we play the game, we also probabilistically transition to some other game, play that in turn, and the process continues. A graphical way to look at it is here. If this here

is a repeated game, where you play the same game over and over again, then here you play the game, and if you happen to play this, you'll transition to this game. If you happen to play this, you'll maybe transition to the same game. If you play this, you'll transition here. If you play this, maybe you'll play this game again, and so on. From each game, you transition probabilistically to other games. This is a stochastic game. Formally speaking, it's the

following tuple. It's a lot of notation, but the concept is exactly what we just saw. We have a finite set of states Q. We have a set of players. We have a set of actions, where A sub i is the set of actions available to player i. And then, we have two functions. We have the transition probability function: depending on the state we're in and on the actions we took, we move to any of the other states, or the very same state, with a certain probability, as governed by this probability distribution.
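To make the transition function concrete, here is a minimal Python sketch; the two stage-game names and all of the probabilities in it are invented purely for illustration.

```python
import random

# Hypothetical stochastic game: two stage games (states); the names and
# probabilities below are made up for illustration only.
Q = ["PD", "Coordination"]            # finite set of states Q
A = {1: ["C", "D"], 2: ["C", "D"]}    # A_i: actions available to player i

# Transition probability function: for each (state, action profile),
# a probability distribution over next states.
P = {
    ("PD", ("C", "C")): {"PD": 0.9, "Coordination": 0.1},
    ("PD", ("C", "D")): {"PD": 0.5, "Coordination": 0.5},
    ("PD", ("D", "C")): {"PD": 0.5, "Coordination": 0.5},
    ("PD", ("D", "D")): {"PD": 0.2, "Coordination": 0.8},
    ("Coordination", ("C", "C")): {"PD": 0.0, "Coordination": 1.0},
    ("Coordination", ("C", "D")): {"PD": 1.0, "Coordination": 0.0},
    ("Coordination", ("D", "C")): {"PD": 1.0, "Coordination": 0.0},
    ("Coordination", ("D", "D")): {"PD": 0.0, "Coordination": 1.0},
}

def next_state(q, joint_action):
    """Sample the next stage game given the current state and action profile."""
    dist = P[(q, joint_action)]
    states = list(dist)
    weights = [dist[s] for s in states]
    return random.choices(states, weights=weights)[0]

print(next_state("PD", ("D", "D")))  # either "PD" or "Coordination"
```

Each row of P here is a probability distribution over Q, which is exactly the object the formal definition calls the transition probability function.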

And similarly, we have the reward function, which tells us that if, in a certain state, a certain action profile was taken by the agents, then this is the reward to each of the agents. So, r sub i is the reward to agent i. That's the formal definition. Notice that we implicitly assume here that you have the same action spaces in every game, but you could define it otherwise; it would simply involve more notation. There's nothing inherently important about the action spaces being the same in the different games within the stochastic game. So, just a few final comments on it. First of all,

as we saw, this obviously generalizes the notion of a repeated game.
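As a concrete instance of that special case, here is a short sketch of a finitely repeated Prisoner's Dilemma; the payoff numbers are the usual textbook values, chosen for illustration.

```python
# Standard Prisoner's Dilemma payoffs (row player, column player).
PD = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3, 0),
    ("D", "C"): (0, -3),
    ("D", "D"): (-2, -2),
}

def repeated_payoff(history):
    """Sum each player's stage payoffs over a finite sequence of action profiles."""
    totals = [0, 0]
    for profile in history:
        r1, r2 = PD[profile]
        totals[0] += r1
        totals[1] += r2
    return tuple(totals)

# Both defect for three rounds: each player accumulates -2 per round.
print(repeated_payoff([("D", "D")] * 3))  # (-6, -6)
```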

But it also generalizes the notion of an MDP, or Markov Decision Process. If a repeated game is a stochastic game with only one game, a Markov Decision Process, or MDP, is a stochastic game with only one player. And so, you have states there where the agent takes an action, receives a reward, and probabilistically moves to some other state.
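Under that view, here is a minimal one-player sketch: with a single agent, transitions and rewards depend only on that agent's own action. The two states, the actions, and all the numbers are invented for illustration.

```python
import random

# A tiny hypothetical MDP: a stochastic game with exactly one player.
# P[(s, a)] is a distribution over next states; r[(s, a)] is the reward.
P = {
    ("s0", "stay"): {"s0": 1.0, "s1": 0.0},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.0, "s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}
r = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "move"): 0.5}

def step(s, a):
    """The single agent acts, receives a reward, and moves probabilistically."""
    dist = P[(s, a)]
    states = list(dist)
    nxt = random.choices(states, weights=[dist[q] for q in states])[0]
    return nxt, r[(s, a)]

# Accumulate reward along one short trajectory.
s, total = "s0", 0.0
for a in ["move", "stay", "stay"]:
    s, reward = step(s, a)
    total += reward
print(s, total)
```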

The only difference is that the agent is the only actor in the setting. I mention this because MDPs have been studied substantially in a variety of disciplines, from optimization to computer science to pure math and beyond. These two perspectives, as a generalization of repeated games and as a generalization of MDPs, give you a sense of the theory of and the investigations into stochastic games. So, from repeated games,

we inherit the definitions of different ways of aggregating rewards over time: you can have limit-average rewards or future discounted rewards. Whereas, from the literature on optimization and on MDPs, we get notions such as stationary and Markovian strategies. We also have a notion of reachability, which has to do with the structure of the underlying transition probabilities.
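Those two aggregation criteria can be written down directly: the average criterion takes the mean of a long finite prefix of the reward stream, while discounting weights the reward at time t by gamma to the power t. The reward stream below is made up for illustration.

```python
def average_reward(stream):
    """Finite-horizon average; the limit-average criterion is its limit as the horizon grows."""
    return sum(stream) / len(stream)

def discounted_reward(stream, gamma):
    """Sum of gamma**t * r_t, the future-discounted criterion (0 < gamma < 1)."""
    return sum((gamma ** t) * reward for t, reward in enumerate(stream))

rewards = [1.0, 0.0, 1.0, 0.0]  # an invented per-stage reward stream
print(average_reward(rewards))          # 0.5
print(discounted_reward(rewards, 0.9))  # 1.0 + 0.9**2, roughly 1.81
```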

And so again, these are issues that we won't get into in more detail in this lecture, but at least we've flagged their existence.