CS154, a class about "languages"

This class is about languages, where a language is simply a finite or infinite set of strings, or words, and every word is a finite sequence of letters, taken from a finite alphabet.

Example: Take as alphabet all symbols that appear on one of the webpages in this directory "handouts", including spaces and linebreaks. Each webpage in this directory can now be regarded as a word, and the set of all those words (representing the webpages in this directory) form a language. This languages is rather small, as it has only about 10 words in it. However, most of those words are hundreds of characters long.

Example: An arithmetic equation is something like 3x+4xy=7y3+5. The precise definition is not important here. A solution of a such an equation is an assignment of natural numbers (= non-negative integers) to the variables, such that the equation holds. I have no idea whether the equation above has a solution or not (do you?). However, we can conceptually distinguish the equations that have a solution, from those that don't. This enables us to define the language L of all those arithmetic equations that have a solution. The alphabet to be used here must be rich enough to express arithmetic equations. As a number like 4 can be expressed as (1+1+1+1), it suffices to have only the symbols 0, 1, +, *, (, ), and = in the alphabet. The way we normally write arithmetic moreover allows to suppress the multiplication symbol *. Now an arithmetic equation is surely a word over this alphabet, and the set of all arithmetic equations that have solutions is a language. It is in fact an infinite language, as there are surely infinitely many arithmetic equations with a solution. It may be a difficult problem to find out whether a given word is in this language or not.

Example: Another language is the set of prime numbers, written in decimal notation. Here the alphabet consists of the 10 digits of the decimal system, and each number is a word over that alphabet.

In this class we shall investigate whether it is possible at all for a given language to find out if a given word belongs to it or not, and if it is possible how hard it will be. These constitute the fields of decidability theory and complexity theory, respectively.

What we really want to do is to find out which problems can be solved in general, and for those problems that can be solved, how hard it is to solve them. In order to make these questions more precise, we encode problems as languages.

In this course we concentrate on problems that are parameterized questions with yes/no solutions, like "is a given number prime?". Questions that have an essay solution, like "how much is 4+7?" can be treated in the same way, but for simplicity we focus on yes/no questions here.

We deal with parameterized questions, such as "given a number, is it prime?". The parameter is the given number. A parameterized question can be encoded as a language by considering the language of all values of the parameter for which the question has a positive answer.

Unparameterized questions illustrate a philosophical problem that we will not solve in this class. Consider the question "is the number 546021461420847567 prime?". All we ask in this class is whether it is possible to give the answer to the question, and if so how hard that it. In this course we would measure the hardness of a question by how long a suitable computer would need to calculate to find the answer, or how much memory it would use. In this example, we can immediately conclude that it is extremely easy to give the answer. For we could carve a suitable computer out of a boulder by displaying to word "no" on it (or "yes" if the number turned out to be prime). The resulting boulder would by the definitions of this course count as a perfectly acceptable computer that is guaranteed to find the right answer to that very question. Moreover, it would use 0 units of time and 0 units of memory to obtain the correct answer each time the question would be asked to it (although it would perform poorly on many other questions). Thus we are bound to conclude that the question is ultimately easy to answer.

Naturally that judgment is deemed objectionable, but in the setup of decidability theory and complexity theory unavoidable. If we would like to express that the question above is in fact more difficult that the boulder-computer suggests, one of the arguments that could be convincing is that that computer knew the answer in advance, as witnessed by the fact that it cannot solve a similar question with a different number. But this amounts to changing the problem into a parameterized question, that asks whether a given number is prime without actually giving the number. A computer that can solve that question would need considerably more time and memory to find the right answer than the boulder above. And that amount of time and memory would be a better measure of the complexity of the original question as stated above. It is for this reason that we consider parameterized questions only.


Rob van Glabbeek rvg@cs.stanford.edu