Course notes for CS154

This file will keep track of the material covered so far, and list the handouts distributed to date.

Preliminaries

Chapter 0.1 of the book tells you where this course will be about. Read it before taking this class, together with the first section of the preface. The rest of Chapter 0 consist of preliminaries that you should have encountered in CS103 or other courses. Please leaf through this chapter to check that you are acquainted with its contents. Ask us for help if necessary, preferably prior to the first class. During the first lecture, homework will be assigned on this material.

Wednesday, April 3

I presented an overview of the course, also treating this handout. Then I made a start with the first of the four subjects of this course, namely regular languages. I introduced DFAs, NFAa and regular expressions and told how they define languages. Next week I will show that these three methods are equally powerful, i.e. define the same class of languages, the regular ones. In terms of Sipser, I treated Section 1.1 until page 44, Section 1.2 until page 54 and 1.3 until page 66. It helps if you reread the same material from the book. Homework is on the web.

Monday, April 8

Today I showed the equivalence of the three approaches to defining regular languages. I also showed that regular languages are closed under intersection (besides being closed under the regular operators). This finishes Section 1.3 of Sipser, and everything before that.

For the readers of Hopcroft & Ullman (HU): We have covered Chapters 1 and 2 now, while skipping 2.6 - 2.8. (The applications in Section 2.8 may be good to read though.) Wednesday Chapter 3.

For the readers of Hopcroft, Motwani & Ullman (HMU): We have covered Chapters 1, 2 and 3, while skipping Sections 2.4, 3.3 and 3.4. (The applications in Sections 2.4 and 3.3 may be good to read though.) Wednesday Chapter 4.

Wednesday, April 10

Today I presented two tools for proving languages not to be regular: the pumping lemma, and closure properties. Subsequently I showed how to make sure that a DFA has as few states as possible, by calculating which states are equivalent and identifying equivalent states. This subject (minimization) is covered in HMU, but not in Sipser. Therefore I hereby present this handout. Finally I showed how three basic problems that are unsolvable in some other classes of languages, can be solved for regular languages, namely the question whether a given string belongs to a given regular language, whether a regular language is empty, and whether two regular languages are equal. In all the cases the regular languages may be given as DFAs, NFAs or regular expressions.

Covered material on regular languages: Sipser Chapter 1 (all of it) and minimization of DFAs (not in Sipser). Also the easy algorithms below.
This corresponds to HMU Chapter 2 (except 2.4), 3 (except 3.3-4), and 4 (except 4.2.3-4),
and to HU Chapter 2 (except 2.7-8) and 3 (except Theorems 3.9-10).
Important topics:

DFAs and NFAs recognize, and regular expressions generate, the same class of languages, called the regular languages.
These languages are closed under intersection, union, concatenation, star, complement and reversal.
The pumping lemma and the closure properties are the main tools for showing languages to be non-regular.
There are easy algorithms for

Friday, April 12

Aaron gave the first discussion session today. Here are his notes: page 1, page 2 & page 3.

Monday, April 15

I presented context-free languages by means of context-free grammars as well as pushdown automata. Wednesday I will show that these methods are equally powerful. I showed that regular languages are context-free. I also treated parsing, ambiguity, and "repairing" ambiguous grammars.

Wednesday, April 17

Today I showed that any CFG can be converted into a PDA accepting the same language, and, vice versa, any PDA can be converted into a CFG generating the same language. I also explained that PDAs that accept by empty stack are equally powerful as PDAs accepting by final state. Deterministic CFLs are languages accepted, by final state, by deterministic pushdown automata. These are pushdown automata that in any state, for any possible input and stack content, can do at most follow one possible transition. For deterministic PDAs acceptance by empty stack would not be powerful enough. The regular languages are seen to be a subclass of the deterministic context-free languages, which are a subclass of the context-free languages.

I also showed how any CFG can be converted into Chomsky normal form. This yields an algorithm to decide whether a given string is generated by a given context-free grammar. And I introduced the concept of an inherently ambiguous context-free language.

Friday, April 19

Arun gave the second discussion session today. Here is his handout..

Aaron produced a handout that shows a conversion from a PDA to a CFG, available as dvi, ps and pdf.

Monday, April 22

I finished the treatment of context-free languages by explaining the pumping lemma, closure properties, and some easy algorithms and the absence thereof (see below, and compare with the ones for regular languages).

Covered material on context-free languages: Sipser Chapter 2 (all of it) and the concept of useless variables (not in Sipser). Also the easy algorithms below. Furthermore deterministic pushdown automata, and the notion of acceptance by empty stack.
This corresponds to HMU Chapters 5, 6 and 7, except Sections 5.3, 6.4.4, 7.3.1, 7.3.2, 7.3.5 and 7.4 (but the very beginning of 7.4.4 and 7.4.5. is treated),
and to HU Chapters 4, 5 and 6, except Sections 4.6-7, Theorems 6.2-3 and the end of Section 6.3.
Important topics:

Context-free grammars generate, and pushdown automata recognize, the same class of languages, called the context-free languages.
These languages are closed under union, concatenation, star and reversal, as well as intersection with a regular language. They are not closed under complementation and intersection.
The pumping lemma and the closure properties are the main tools for showing languages to be non-regular.
Parsing, ambiguity, inherently ambiguous languages.
There are easy algorithms for
- eliminating epsilon-productions and unit rules, and converting a CFG into Chomsky normal form,
- eliminating useless variables from a CFG,
- checking whether a pushdown automaton or CFG accepts a given string,
- and checking whether the language accepted by a given pushdown automaton or CFG is empty.
On the other hand, there are no algorithms whatsoever for
- checking whether two CFGs generate the same language,
- checking whether a given CFG is ambiguous,
- checking whether the intersection of two context-free languages (given by CFGs or PDAs) is empty,
- and checking whether the language accepted by a given pushdown automaton or CFG consists of all strings over the given alphabet (i.e. whether its complement is empty).

The midterm exam covers all the material on regular and context-free languages mentioned above.

CS154N

This part of the course deals with Parts Two and Three of the book.

Wednesday, April 24

I started off by introducing the concept of decidability without using Turing machines and Church's thesis, following this handout. Please read the handout carefully before the next lecture.

Friday, April 26

Elena gave the third discussion session today. Here are her notes.

Monday, April 29

I presented Turing machines, and their connection (through Church's thesis) with decidability. By the Church-Turing thesis a language L is recursive enumerable (called Turing-recognizable in Sipser) if and only if it is semi-decidable (or recognizable, or enumerable). In other words: if there is any algorithm that when given a string w as input accepts exactly when w is in L, then there is a Turing machine with that property. Likewise, a language L is recursive if and only if it is decidable. In other words: if there is any algorithm that when given a string w as input accepts exactly when w is in L and rejects otherwise, then there is a Turing machine with that property. Church's thesis is not a mathematical theorem, but a reflection on the inability of computer scientists since 1936 to come up with any algorithm whatsoever that cannot be translated into a Turing machine.

I showed that one-tape Turing machines are equivalent to multitape Turing machines, and mentioned that there are many variations of Turing machines of this nature.

Wednesday, May 2

I showed that nondeterministic Turing machines are equally expressive as deterministic ones. Thereby Chapter 3 of Sipser is covered entirely.

I also introduced Grammars, which are generalizations of context-free grammars, not covered in Sipser. A language is generated by a grammar if and only if it is recursive enumerable.
Chomsky hierarchy

Regular languages Context-free languages Context sensitive languages Recursive enumerable languages

Regular expressions Context-free grammars Context sensitive grammars unrestricted grammars

NFA PDA LBA nondeterministic Turing machines

DFA (DPDA are weaker) (DLBA are weaker) Turing machines

I completed the treatment of the Chomsky hierarchy (depicted above, and originally proposed as four candidates for modeling natural languages) by defining the context sensitive languages. These are generated by context sensitive grammars, and can be recognized by nondeterministic Turing machines that use no more space than the length of the input string. Such machines are called Linear Bounded Automata.

I presented the concept of an enumerator and showed that a language is enumerable if and only if it is semi-decidable (see last week's handout).

I defined the notion of cardinality and showed that there are equally many rational numbers (fractions) as natural numbers (non-negative integers): the rational numbers are countable. However, there are more real numbers than natural numbers: the reals are uncountable. This I showed with the method of diagonalization. The same method also yields that there are countably many algorithms, but uncountably many languages. Thus there must be languages whose membership cannot be semi-decided by means of any algorithm. In fact one such language was constructed last week already, in the handout on decidability.

I touched upon the notion of reducing one problem to another, and used it to prove that the acceptance problem is undecidable. In terms of general algorithms this is treated in my decidability handout; in terms of Turing machines in the textbooks. Chapter 4 of Sipser is now considered completely covered, as well as the very beginning of Chapter 5.

Thursday, May 3

In view of the midterm exam on Friday afternoon, there will be no discussion session on Friday morning. Instead, there will be a midterm review session on Thursday morning, presented by Aaron, from 9:30 to 10:20am in McCullough 115, live on channel E1.

Monday, May 6

Fix a suitable alphabet

, and consider languages over alphabet

. The acceptance problem for Turing machines, AP_TM is the language

AP_TM = { (<M>,w) | Turning machine M accepts input string w}

where <M> denotes the description of M as a string over alphabet

. Because this is such an important issue I once more reviewed the undecidability of AP_TM. The language

SELF-REJECT_TM = { <M> | Turning machine M does not accept input string <M>}

can be seen to be not Turing-recognizable by means of a direct paradoxical contradiction, or by means of a diagonalization argument. Hence the language

SELF-ACCEPT_TM = { <M> | Turning machine M accepts input string <M>}

must be undecidable, using closure of decidability under complementation and intersection, and Church's thesis. Now AP_TM is undecidable because the language SELF-ACCEPT_TM can be reduced to AP_TM, by substituting <M> for w.

It is important to apply reductions in the right direction (see above)! I showed that it is undecidable whether a given Turing machine accepts the empty language, by reducing the acceptance problem (which is known to be undecidable) to it. This is a special case of Rice's theorem, which says that any nontrivial property about the language recognized by a Turing machine is undecidable. I proves Rice's theorem, which is not covered in Sipser, using this handout.

Wednesday, May 8

Today I proved the undecidability of the emptiness problem for linear bounded automata, and of the Post correspondence problem. I didn't finish the latter, but you should read the rest of it in the book.

Friday, May 10

Aaron gave today's section. Here are his notes: page 1, page 2, page 3, page 4, page 5 & page 6.
Or smaller versions if they don't fit your screen: page 1, page 2, page 3, page 4, page 5 & page 6.

Monday, May 13

I finished the treatment of the Post Correspondence Problem. Then I introduced computable (or recursive) and partial recursive functions and explained the difference between mapping reducibility and Turing reducibility.

Subsequently I'll treated the undecidability of the empty intersection problem for context-free grammars (not covered in Sipser). Consult this handout for the undecidability of the ambiguity problem for CFGs.

During this course I plan to cover also a few more advanced topics in computability (or decidability) theory; however, these will be postponed until after the treatment of complexity theory. This because it is very important that you get the homeworks on complexity theory back well before the final exam.

Covered material on computability theory: Sipser Chapters 3, 4, 5 (except for the proof of theorem 5.10 (but the statement is covered)), and 6.3, as well as the handouts linked below.
This corresponds to HMU Chapters 8 and 9, except Sections 8.5.3, 8.5.4 and 9.5.3, plus the handouts.
Important topics:

Decidable, recognizable and enumerable languages; recursive, Turing-recognizable and recursive enumerable languages; the relationships between these classes; Church's thesis.
The decidable languages are closed under union, intersection and complementation; r.e. languages under union and intersection, but not complementation.
The existence of unrecognizable languages, and of recognizable but undecidable languages.
Diagonalization. Countability.
Turing machines; configurations; multitape TMs; the equivalence of deterministic and nondeterministic TMs.
Unrestricted and Context-sensitive grammars. Linear Bounded Automata .
Computable (or recursive) and partial recursive functions.
The method of reduction. Mapping reducibility and Turing reducibility.
Undecidability of the acceptance problem for TMs, the halting problem, whether an algorithm halts on all inputs, the emptiness problem for TMs, the emptiness problem for LBAs, Post Correspondence Problem, ambiguity of CFGs, and empty intersection of CFGs.
Rice's theorem.

Important topics to be treated later on:

n-adic and n-ary representation of natural numbers.
The undecidability of provability in first order logic and truth in arithmetic.
Goedel's incompleteness theorem and the class of arithmetical languages.

Monday, May 13 & Wednesday, May 15

I introduced, for every function f(n) on the length n of the input string, the complexity classes TIME(f(n)), NTIME(f(n)), SPACE(f(n)) and NSPACE(f(n)). These give worst case upperbounds for the time (measured in steps) and the space (measured in tape cells) it takes to solve a problem on a deterministic, respectively nondeterministic, Turing machine. I also discussed the relations between these classes, and defined the classes P, NP, PSPACE and NPSPACE. Details are in the handout complexity classes.

Monday, May 20

Giving an example of a polynomial-time algorithm, I showed that the problem PATH is in P (even though the most simpleminded algorithm is not in P).

I explained that the class NP can be seen as the class of problems with polynomial time verifiers, and showed that the problem CLIQUE is in NP. Other examples of problems in NP can (and should) be found in the book. It is a famous open problem whether P = NP. Most computer scientists think this is not the case.

I introduced the concept of a polynomial time reduction. Then I explained the notion of NP-completeness and mentioned that SAT and even 3SAT is an NP-complete problem (the first one ever found, by Cook). Using this, I showed that also CLIQUE is NP-complete, by providing a polynomial time reduction from 3SAT to CLIQUE. In the book there are many more NP-complete problems, which are assigned reading (although you do not have to know hard proofs like the NP-completeness of HAMILTONIAN PATH). Showing NP-completeness is a skill that is a vital part of this course, and you should practice on your own to get the necessary experience.

Wednesday, May 22

I showed that the class of context-free languages is included in P (Sipser Thm. 7.14).

Then I proved Savitch's theorem, saying that if f(n) log(n) then NSPACE(f(n)) SPACE(f(n)²). I also recalled that if a language is in NTIME(f(n)), then there is a constant d such that it is in TIME(d^f(n)). Thus, whereas converting nondeterministic machines into deterministic ones costs us an exponential in time, it costs only a square in space. It follows that the classes PSPACE and NPSPACE coincide.

I discussed the hierarchy of complexity classes

L NL P NP PSPACE = NPSPACE EXPTIME EXPSPACE.

The concept of a polynomial-time reduction can also be used to define PSPACE-completeness, EXPSPACE-completeness, etc., and likewise NP-hardness, PSPACE-hardness, etc. (A problem is X-complete iff it is X-hard and in class X.)

Friday, May 25

Arun gave today's section. Here is his handout.

Monday, May 27 is labor day: no class

Wednesday, May 29

I showed why SAT and 3SAT are NP-complete. Then I presented the TQBF (truth of quantified boolean formulas) problem, which is complete for PSPACE. I also showed how the generalized geography game can be proven PSPACE-complete by means of a polynomial time reduction from TQBF. Then I introduced log-space reductions, and told that the PATH problem is NL-complete.

Friday, May 31

Elena gave today's section. Here are her notes.

Monday, June 3

Today I showed the NL-completeness of PATH. Subsequently I reviewed the hierarchy of complexity classes. That NL

P follows because PATH is in P, and is NL-complete, and any log-space reduction is a polynomial time reduction.

For any class X there is a class co-X of complements of problems in X. When X is defined in terms of deterministic Turing machines, trivially co-X = X. It is an open problem whether NP = co-NP. Surprisingly, co-NL = NL. This follows because PATH is in NL (the proof of which has been skipped).

Subsequently I presented, without proof, the time and space hierarchy theorems that say that for time constructible functions such that one is larger than the other by more than a logarithm, the corresponding time complexity classes are different. Likewise, for space constructible functions such that one is larger than the other by more than a constant factor, the corresponding space complexity classes are different.

Finally, I argued that through a suitable encoding of natural numbers as strings, the concepts of decidability and enumerability apply not only to languages, but also to sets of natural numbers. It doesn't matter which encoding is chosen, because there must be an algorithm to convert one encoding into an other. When the encodings are reasonable, such as the standard n-ary or n-adic representation of numbers as strings, one encoding can be retrieved from another in polynomial time (or better), so even most complexity classes are well-defined for sets of natural numbers.

I explained that the n-adic representations yields a bijective correspondence between strings over an n-letter alphabet and natural numbers, and pointed out the difference between n-ary and n-adic representations.

Covered material on complexity theory: Sipser Chapters 7 (except for Theorem 7.13 and the proof of Theorem 7.35), 8 (except for the proofs of Theorem 8.8 and 8.22 (so the statements of these theorems are covered)) and 9.1 (except for the proofs of Theorems 9.3, 9.10, and 9.15), as well as the handout on complexity classes.
This corresponds to HMU Chapters 10 and 11.1-3, plus the handout. The material on the classes L and NL is not in HMU; students who are not in the possession of Sipser should collect a copy of those parts.
Important topics:

The complexity classes TIME(f(n)), NTIME(f(n)), SPACE(f(n)) and NSPACE(f(n)), and their relationships.
The classes L, NL, coNL, P, NP, coNP, PSPACE, NPSPACE, EXPTIME and EXPSPACE and their relationships.
The relationship between the classes of regular, context-free, and context-sensitive languages and the classes above.
Polynomial time reductions and the notions of NP-completeness, PSPACE-completeness and EXPSPACE-completeness.
Log-space reductions and the notion of NL-completeness.
The problems SAT, 3SAT, Clique, directed and undirected Hamiltonian path, subset-sum, and node-cover (or vertex-cover) are NP-complete; PATH is NL-complete and in P. TQBF (even in prenex conjunctive normal form) and the geography game are PSPACE-complete.
The time and space hierarchy theorems.

Wednesday, June 5

Today I introduced a class of languages that is even larger than the recursive enumerable languages, and includes all languages that have been mentioned as examples or counterexamples at any time in this course so far. I would have liked to introduce a class of definable languages, such that any language that can be described in words is definable. The Post Correspondence Problem (and language) for instance is not recursive enumerable, but surely definable, because the definition is in the book. In fact it is impossible to mention an example of an undefinable language, as any way of mentioning it involves or constitutes a definition. Nevertheless, as definitions can be put into words, there can only be countably many definable languages, whereas the set of all languages is uncountable. Thus most languages are not definable, even if we cannot point at any example.

Sadly, the above plan is bogus, because the concept of definability is ill-defined. The best one can do is to define particular tools for defining languages, and consider classes of languages that are definable with those tools. An extremely fruitful tool is first-order arithmetic. One defines the arithmetical languages as those that can be defined by arithmetical means. This turns out to be a very large class that surely contains every language that has been contemplated in this course so far.

I proved that the class of arithmetical languages at least contains all recursive enumerable languages, by showing that the behavior of any Turing machine can be encoded in terms of first-order arithmetic. This also yielded the undecidability of arithmetic, and Goedel's incompleteness theorem.

Thursday, June 6

There was a final exam review session, presented by Aaron, on Thursday morning from 10:00 to 10:50am in McCullough 115, live on channel E4. Do read his notes with tips on how to prepare for the exam: page 1, page 2, page 3, page 4, page 5 & page 6. Here are a few exam-level reduction problems (also as ps and pdf) for you to try.

Rob van Glabbeek

rvg@cs.stanford.edu

Regular languages	Context-free languages	Context sensitive languages	Recursive enumerable languages
Regular expressions	Context-free grammars	Context sensitive grammars	unrestricted grammars
NFA	PDA	LBA	nondeterministic Turing machines
DFA	(DPDA are weaker)	(DLBA are weaker)	Turing machines