CS121

Introduction to Artificial Intelligence

Winter 2004

 

 

 

Homework #7

(Adversarial Search, Decision Making under Uncertainty)

Out: 2/25/04 — Due: 3/3/04

 

 

How to complete this HW: First copy this file; then insert your answers in the file immediately below each question; finally submit the completed file to the submission webpage https://www.stanford.edu/class/cs121/hwsub/hw7.fft before 3/3 at midnight.

 

Note on Honor Code: You must NOT look at previously published solutions of any of these problems in preparing your answers. You may discuss these problems with other students in the class (in fact, you are encouraged to do so) and/or look into other documents (books, web sites), with the exception of published solutions, without taking any written or electronic notes. If you have discussed any of the problems with other students, indicate their name(s) here:

………………………………………………………………………………………………

Any intentional transgression of these rules will be considered an honor code violation.

 

General information: In the following, whenever an explanation is required (“Why?”), no full credit will be given without one. Keep explanations short and to the point; excessive verbosity will be penalized. If you have any doubt about how to interpret a question, either tell us in advance, so that we can help you understand it, or state your interpretation in your returned solution.

 

Grading:

 

Problem #     Max. grade     Your grade
I             5
II            5
III           5
IV            5
Total         20

 

 

 

I. (5 points) Consider a two-player game with observable states and no randomness, like tic-tac-toe. The two players are MAX and MIN. MIN chooses his moves with a classical minimax algorithm that constructs a game tree to a fixed depth d and uses the evaluation function emin. MAX knows this, i.e., he knows both d and emin, but he does not know how MIN breaks ties, if any. He also has greater computational power than MIN, so when it is his turn to play, he constructs a game tree of depth d+1. MAX has an evaluation function emax that he believes is better than emin. Propose a modification of the minimax algorithm that MAX could use to exploit both his greater computational power and his knowledge of how MIN chooses his own moves. Briefly justify the modification.
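For reference, here is a minimal sketch of the classical depth-limited minimax procedure the problem refers to. The Python framing and the helpers successors(state) and is_terminal(state) are our own illustration, not part of the problem statement.

# Sketch of classical depth-limited minimax, the procedure MIN is described
# as using with depth d and evaluation function e_min.  The helpers
# successors(state) and is_terminal(state) are hypothetical placeholders.

def minimax_value(state, depth, maximizing, evaluate):
    # Depth-limited minimax value of `state` for the player to move.
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    child_values = [minimax_value(s, depth - 1, not maximizing, evaluate)
                    for s in successors(state)]
    return max(child_values) if maximizing else min(child_values)

# MIN, as described, would compute
#   minimax_value(current_state, d, maximizing=False, evaluate=e_min)
# and pick a move achieving that value.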

 

II. (5 points) Consider a three-player game with observable states, no randomness, and no alliance between any two players. How should the minimax algorithm be modified to be used by a player? [Hint: First indicate how the evaluation function should be modified.]

 

III. (5 points) A box contains six balls: three red ones and three blue ones. Three of the six balls are selected at random and transferred to a second box, which was originally empty. A ball B is then picked at random from the second box and found to be blue. Knowing this result, what is the posterior (conditional) probability that two red balls and one blue ball were transferred from the first box to the second box? In other words, we want to compute:

P(2 red balls and 1 blue ball in the second box | ball B is blue).
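The question asks for an analytic computation (Bayes’ rule over the possible compositions of the second box). As a purely optional aid, here is a small Monte Carlo sketch one could use to sanity-check an answer; the function name is ours.

# Monte Carlo sanity check for the posterior asked for in Problem III.
# It estimates P(2 red and 1 blue transferred | drawn ball B is blue)
# by simulating the transfer and the draw many times.

import random

def estimate_posterior(trials=200000):
    favorable = blue_draws = 0
    for _ in range(trials):
        box1 = ['R', 'R', 'R', 'B', 'B', 'B']
        transferred = random.sample(box1, 3)   # 3 balls moved to box 2
        drawn = random.choice(transferred)     # ball B picked from box 2
        if drawn == 'B':
            blue_draws += 1
            if transferred.count('R') == 2:
                favorable += 1
    return favorable / blue_draws

# estimate_posterior() should be close to the analytic value you derive.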

 

IV. (5 points) The state space of an agent consists of 6 states {1, 2, …, 6}. The agent is initially in state 1. The states 4, 5, and 6 are terminal states.

 

The agent can perform 4 actions: A, B, C, and D. There is uncertainty in these actions, so each action may have several possible outcomes. The agent’s sensors are perfect, so the agent always knows the state it is in.

 

Actions A and B can be performed only when the agent is in state 1. If A is executed, the agent moves to state 2 or 3, with probabilities 0.8 and 0.2, respectively. If B is executed, the agent moves to state 2 or 3, with probabilities 0.4 and 0.6, respectively.

 

We represent these transition rules by:

A: 1 → {2(0.8), 3(0.2)}

B: 1 → {2(0.4), 3(0.6)}

 

In the same way, the transition rules for C and D are defined as follows:

C: 2 → {4(0.8), 5(0.2)}

D: 2 → {4(0.6), 6(0.4)}

C: 3 → {6(1)}

D: 3 → {5(0.6), 6(0.4)}

Note that actions C and D are executable in either state 2 or 3.
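For readers who want to experiment, the transition rules above can be written as a table mapping (state, action) pairs to lists of (next state, probability) pairs. The Python encoding below is just one possible representation of the rules already given.

# The transition model of Problem IV, written as a dictionary.
# Keys are (state, action); values list (next_state, probability) pairs.
TRANSITIONS = {
    (1, 'A'): [(2, 0.8), (3, 0.2)],
    (1, 'B'): [(2, 0.4), (3, 0.6)],
    (2, 'C'): [(4, 0.8), (5, 0.2)],
    (2, 'D'): [(4, 0.6), (6, 0.4)],
    (3, 'C'): [(6, 1.0)],
    (3, 'D'): [(5, 0.6), (6, 0.4)],
}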

 

The agent must move to a terminal state, but the three terminal states provide different rewards: 100 for state 4, 200 for state 5, and 50 for state 6. In addition, there is no reward for being in a non-terminal state, and actions have no cost. The utility of a state is defined recursively as follows:

- The utility of a terminal state is the reward provided by that state.

- The utility of performing an action a in a non-terminal state s is the expected utility of the state reached.

- The utility of a non-terminal state s is the maximum utility of performing any action in s.
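Written as equations, these three definitions read as follows, where R(s) denotes the terminal reward and P(s' | s, a) the transition probabilities given earlier:

    U(s)    = R(s)                                   if s is a terminal state
    U(s, a) = Σ_s'  P(s' | s, a) · U(s')             (expected utility of performing a in s)
    U(s)    = max over actions a executable in s of U(s, a)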

 

1.      Assume that the agent has first executed action A and is now in state 2. What is the utility of action C? What is the utility of D?

 

2.      What are the utilities of states 2, 3, and 1?

 

3.      A policy is a complete mapping of every non-terminal state into an action executable in that state. For example, P = {1(A), 2(C), 3(C)} is a policy that tells the agent to perform A in state 1 and C in states 2 and 3 (a small evaluation sketch is given after question 4). What is the optimal policy P* for the above problem?

 

4.      For each of the following statements, say whether it is true or false (no explanation is needed):

a)      If the agent performs the following once: “Start at state 1; in each non-terminal state, choose the action to perform according to P*”, then the agent is guaranteed to reach the terminal state that gives the greatest reward.

b)      If the agent performs the following many times: “Start at state 1; in each non-terminal state, choose the action to perform according to P*”, then the average of the rewards collected at the terminal states successively reached by the agent is maximal.

c)      Let s be any one of the non-terminal states (1, 2, or 3). If the agent performs the following many times: “Start at state s; in each non-terminal state, choose the action to perform according to P*”, then the average of the rewards collected at the terminal states successively reached by the agent is maximal.
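As mentioned in question 3, here is a small sketch of how the utility of a state under a fixed policy could be computed from the transition rules. It uses the illustrative TRANSITIONS dictionary defined earlier and the rewards stated in the problem; it shows what a policy is and how it is evaluated, and is not a substitute for your own answers.

# Evaluating a fixed policy for Problem IV, using the illustrative
# TRANSITIONS dictionary defined earlier.  Rewards are from the problem.

REWARDS = {4: 100, 5: 200, 6: 50}     # terminal rewards
TERMINAL = set(REWARDS)

def utility_under_policy(state, policy):
    # Expected reward of starting in `state` and then following `policy`.
    if state in TERMINAL:
        return REWARDS[state]
    action = policy[state]
    return sum(p * utility_under_policy(next_state, policy)
               for next_state, p in TRANSITIONS[(state, action)])

example_policy = {1: 'A', 2: 'C', 3: 'C'}   # the policy P from question 3
# utility_under_policy(1, example_policy) is its expected reward from state 1.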