CS121

Introduction to Artificial Intelligence

Winter 2004

 

 

 

Homework #8

(Probabilistic Inference and Inductive Learning)

Out: 3/3/04 — Due: 3/10/04

 

 

How to complete this HW: First copy this file; then insert your answers in the file immediately below each question; finally submit the completed file to the submission webpage https://www.stanford.edu/class/cs121/hwsub/hw8.fft before 3/10 at midnight.

 

Note on Honor Code: You must NOT look at previously published solutions of any of these problems in preparing your answers. You may discuss these problems with other students in the class (in fact, you are encouraged to do so) and/or look into other documents (books, web sites), with the exception of published solutions, without taking any written or electronic notes. If you have discussed any of the problems with other students, indicate their name(s) here:

………………………………………………………………………………………………

Any intentional transgression of these rules will be considered an honor code violation.

 

General information: In the following, whenever an explanation is required (“Why?”), no full credit will be given without an explanation. Keep explanations short and to the point. Excessive verbosity will be penalized. If you have any doubt about how to interpret a question, either tell us in advance, so that we can help you understand the question, or tell us how you understand it in your returned solution.

 

Grading:

 

Problem #    Max. grade    Your grade

I            5

II           5

III          5

IV           5

Total        20

 

 

I. (5 points) Logic sentence A ⇒ B (read “A implies B”) is defined as the equivalent of ¬A ∨ B (read “not A or B”). So, A ⇒ B is True if and only if either A is False or both A and B are True.
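
For concreteness, this equivalence can be checked mechanically by enumerating the four truth assignments. The short Python sketch below is purely illustrative and is not part of the assignment.

from itertools import product

# Illustrative only: check that A => B agrees with (not A) or B on every assignment.
for A, B in product([False, True], repeat=2):
    by_definition = (not A) or B            # the definition used in this problem
    by_cases = (not A) or (A and B)         # A is False, or A and B are both True
    assert by_definition == by_cases
print("A => B is equivalent to (not A) or B on all four assignments")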

 

Consider the following three sentences: A ⇒ B, B ⇒ C, and A ⇒ C. Given a sentence S, Pr(S) denotes the probability that S is True. We are told that: Pr(A ⇒ B) = 0.7 and Pr(B ⇒ C) = 0.8. Given this information, we want to determine Pr(A ⇒ C). In fact, Pr(A ⇒ C) is not uniquely determined by the given information. Instead, it can only range over a sub-interval of [0,1]. The goal of this problem is to determine this interval.

 

1. We define abc, ab, ac, …, as follows:

abc = Pr(A ∧ B ∧ C)
ab = Pr(A ∧ B ∧ ¬C)
ac = Pr(A ∧ ¬B ∧ C)
...
a = Pr(A ∧ ¬B ∧ ¬C)
x = Pr(¬A ∧ ¬B ∧ ¬C)

Express Pr(A ⇒ B), Pr(B ⇒ C), and Pr(A ⇒ C) using abc, ab, ac, bc, a, b, c, and x.

 

2. Express all constraints on abc, ab, ac, bc, a, b, c, and x, using the axioms of probability and the given information.

 

3. Derive from these constraints the range of values of Pr(A ⇒ C).
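
If you want a numeric sanity check on your hand derivation (entirely optional), the eight atomic probabilities can be treated as the unknowns of a small linear program. The Python sketch below is only illustrative, assumes SciPy is available, and mirrors the abc, ab, …, x notation above.

from itertools import product
from scipy.optimize import linprog

# Optional sanity check only -- the assignment asks for a derivation by hand.
atoms = list(product([False, True], repeat=3))      # all assignments to (A, B, C)
implies = lambda p, q: (not p) or q

w_ab = [1.0 if implies(A, B) else 0.0 for A, B, C in atoms]   # indicator of A => B
w_bc = [1.0 if implies(B, C) else 0.0 for A, B, C in atoms]   # indicator of B => C
w_ac = [1.0 if implies(A, C) else 0.0 for A, B, C in atoms]   # indicator of A => C

A_eq = [[1.0] * len(atoms), w_ab, w_bc]   # total probability, Pr(A => B), Pr(B => C)
b_eq = [1.0, 0.7, 0.8]
bounds = [(0.0, 1.0)] * len(atoms)        # each atomic probability lies in [0, 1]

low = linprog(c=w_ac, A_eq=A_eq, b_eq=b_eq, bounds=bounds)                 # minimize Pr(A => C)
high = linprog(c=[-w for w in w_ac], A_eq=A_eq, b_eq=b_eq, bounds=bounds)  # maximize Pr(A => C)
print("Pr(A => C) lies in [%.2f, %.2f]" % (low.fun, -high.fun))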

 

 

II. (5 points) Suppose we generate a training set from a decision tree and then apply decision-tree learning to that training set. Is it the case that the learning algorithm will eventually return the correct tree as the size of the training set goes to infinity? Why or why not?
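
To make the setup concrete, generating a training set from a decision tree simply means drawing attribute vectors and labeling each one with the tree. The Python sketch below only illustrates that process (the target tree and the sampling distribution are made up for this example); it is not a hint toward the answer.

import random

# Illustration only: a made-up "true" tree over two Boolean attributes.
def true_tree(a1, a2):
    if a1:
        return '+' if a2 else '-'
    return '-'

def generate_training_set(n, seed=0):
    """Draw n attribute vectors uniformly at random and label them with the tree."""
    rng = random.Random(seed)
    return [((a1, a2), true_tree(a1, a2))
            for a1, a2 in ((rng.random() < 0.5, rng.random() < 0.5) for _ in range(n))]

print(generate_training_set(5))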

 

III. (5 points) Most likely, at the beginning of an exam, you try to predict whether each problem is easy or difficult (D = + if it is difficult and - if it is easy). Let us assume that you use two observable problem attributes (predicates):

- The text length L of the problem (L = 1 if it is long, 0 otherwise)
- The amount M of math in the text (M = 1 if there is a lot of math, 0 otherwise)

For training data, assume that you have examined 12 problems from previous homeworks and have collected the following data:

 

L    M    D    #

0    0    -    4

0    0    +    1

0    1    -    0

0    1    +    3

1    0    -    1

1    0    +    2

1    1    -    1

1    1    +    0

 

The first line of this table reads as follows: 4 problems for which L = 0 and M = 0 were not difficult (D = -). The second line says: 1 problem for which L = 0 and M = 0 was difficult (D = +). Etc… Note that you observed no problem for which L = 0 and M = 1 and D = -, or L = 1 and M = 1 and D = +.

 

Based on this training data, you want to compute a representation of a difficult problem (D) in the form of a decision tree using the two binary attributes L and M.

 

1. Construct the best decision tree you can for the training data. [As presented in class, choose the best attribute to test next by minimizing the number of misclassified examples in the training set. In addition, use no stopping criterion or pruning, so that the tree is grown until you run out of attributes; then, if needed, use the majority rule. An illustrative counting sketch is given after this problem.]

 

2. Suppose we have 3 test examples, whose correct classifications we know, as follows:

 

L    M    D

0    0    -

1    1    +

0    1    +

 

Does the decision tree from part 1 agree with each of these?
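
For the splitting criterion in part 1, here is a small Python sketch (illustrative only; the assignment asks you to build the tree by hand) showing how one might tally the training counts and compare the number of misclassified examples when splitting first on L versus on M.

# Illustrative only: training data from the table above, as (L, M, D, count) rows.
data = [(0, 0, '-', 4), (0, 0, '+', 1),
        (0, 1, '-', 0), (0, 1, '+', 3),
        (1, 0, '-', 1), (1, 0, '+', 2),
        (1, 1, '-', 1), (1, 1, '+', 0)]

def misclassified_after_split(attr_index):
    """Split on one attribute, label each branch by majority vote, count the errors."""
    errors = 0
    for value in (0, 1):
        counts = {'+': 0, '-': 0}
        for row in data:
            if row[attr_index] == value:
                counts[row[2]] += row[3]
        errors += min(counts.values())   # the minority class in each branch is misclassified
    return errors

print("splitting on L misclassifies", misclassified_after_split(0), "examples")
print("splitting on M misclassifies", misclassified_after_split(1), "examples")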

 

IV. (5 points) A program uses the version space method to determine the concept of a rewarded card. It is assumed that the concept to be learned is of the form:

REWARDED(x)   ⇔   r(x) ∧ s(x)

where r is a rank predicate and s a suit predicate.

 

A rank predicate is one of the following: 1, 2, …, 10, J, Q, K, NUM (1, 2, …, 10), FACE (J, Q, K), ODD (1, 3, 5, …, 9, J, K), EVEN (2, 4, …, 10, Q), HONOR (1, J, Q, K), TEN-VALUE (10, J, Q, K), ANY-RANK (1, …, K).

 

A suit predicate is one of the following: ♣, ♠, ♦, ♥, BLACK, RED, MAJOR (that is, ♠ or ♥), MINOR (that is, ♣ or ♦), ANY-SUIT.

 

For simplicity, we represent the concept REWARDED by expressions such as (FACE, BLACK), (TEN-VALUE, MAJOR), or (1, ANY-SUIT).
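
As an aid for the questions below, each predicate can be encoded as the set of rank or suit values it covers, so that one predicate is more general than another exactly when its set is a superset. The short Python sketch below is purely illustrative; the groupings simply transcribe the lists above.

# Illustrative only: predicates as sets of the values they cover.
RANKS = {'1', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K'}
rank_pred = {r: {r} for r in RANKS}                    # each single rank covers itself
rank_pred.update({
    'NUM':       {'1', '2', '3', '4', '5', '6', '7', '8', '9', '10'},
    'FACE':      {'J', 'Q', 'K'},
    'ODD':       {'1', '3', '5', '7', '9', 'J', 'K'},
    'EVEN':      {'2', '4', '6', '8', '10', 'Q'},
    'HONOR':     {'1', 'J', 'Q', 'K'},
    'TEN-VALUE': {'10', 'J', 'Q', 'K'},
    'ANY-RANK':  set(RANKS),
})

SUITS = {'♣', '♠', '♦', '♥'}
suit_pred = {s: {s} for s in SUITS}                    # each single suit covers itself
suit_pred.update({
    'BLACK': {'♣', '♠'}, 'RED': {'♦', '♥'},
    'MAJOR': {'♠', '♥'}, 'MINOR': {'♣', '♦'},
    'ANY-SUIT': set(SUITS),
})

def more_general(preds, p1, p2):
    """True iff predicate p1 covers every value that predicate p2 covers."""
    return preds[p2] <= preds[p1]

print(more_general(rank_pred, 'ANY-RANK', 'FACE'))     # True: ANY-RANK sits above FACE
print(more_general(suit_pred, 'MAJOR', '♣'))           # False: ♣ is not a major suit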

1.      Show the graphs (one for ranks and one for suits) that represent the hierarchy of generality of the predicates.

2.      Give the specific and general boundaries (respectively, the S-boundary and the G-boundary) of the version space after each of the following three series of examples (each example is a card followed by “+” if the card is a rewarded card and “-” otherwise):

a. (K♣ +)(10♠ -)(8♣ +)

b. (8♣ +)(J♥ +)

c. (8♦ -)(3♥ -)(2♠ -)(9♣ -)(10♥ -)(1♠ -)(J♥ +)(Q♠ +)