WrittenAssignment2Solns

This page is ready. It contains solutions for each problem as well as explanations and grading standards. The standards for each problem are at the end of the solution for that problem. Overall grading information is at the end of the file. Ideally, your peer grading assignment should be turned in to the bin outside AC312, inside an envelope addressed to LAS, by noon Monday.

Contents

Problem 1: Scheme
Problem 2: Sort analysis
Problem 3: Asymptotic analysis
Problem 4: Hashing
Problem 5: Finite State Automata
Notes on grading

Problem 1: Scheme

There are variants to these solutions, but not so many. The test, of course, is to see whether your code runs on the pset examples. Be careful to preserve parentheses when you type these in.... (Graders should feel free to run scheme to help themselves, but be careful that you type exactly what you've been given.)

Write a scheme procedure make-multiplier that takes one number as an argument and returns a procedure that multiplies its argument by the original number. The clearly spelled out clean solution to this problem looks like this:
```
(define make-multiplier
  (lambda (x)        ;; this is the lambda that IS the make-multiplier procedure
    (lambda (y)        ;;this is the lambda that is RETURNED when you invoke (call) the make-multiplier procedure
      (* x y))))          ;; it doesn't matter whether you (* x y) or (* y x), of course!
```
Or, using the syntactically sugared define/lambda abbreviation, here's the same solution
```
(define (make-multiplier x)   ;; this one hides the lambda in the sugared define
  (lambda (y)        ;; but the lambda that is RETURNED is not optional
    (* x y)))          ;; and the multiplication is only done when that second lambda is called
```
There may be other forms that are correct, but most of them are minor variants on the above. Be very careful about opening parens -- lisp parens make a big difference, so it's important to have them right -- but it's OK to be a bit more lenient on close parens as it's harder to count these in handwritten work. Extra lambdas or defines are probably a good indication that something's wrong here....
Write a scheme procedure, apply-to-5, that takes one argument and returns the result of applying that argument to 5. Again, beginning without syntactic sugar:
```
(define apply-to-5        ;; apply-to-5 is a
  (lambda (proc)          ;; procedure that takes one argument
    (proc 5)))           ;; and applies its argument to 5
```
Or, syntactically sugared,
```
(define (apply-to-5 proc) (proc 5))   ;; the argument had better be something that can be applied to 5!
```
Again, there aren't too many variants that are correct here. Again, watch (open) parens carefully. Run scheme if you really want to be sure.
What should the result of (apply-to-5 (apply-to-5 make-multiplier)) be? If we did our work correctly, the answer can be found by running these procedures:
```
> (apply-to-5 (apply-to-5 make-multiplier))
25
```
Of course, we could also figure it out by thinking about it: (apply-to-5 make-multiplier) should be 5-times-er procedure (I'm sure that there's an obscure Latin word for that), and applying a 5-times-er-procedure to 5 should yield 5², or 25.

Grading standard: There is one grade for the three parts of problem one combined. All three parts correct earns +. One wrong is . All wrong is -. Other cases use your best guess or leave comments explaining that it doesn't fit this pattern and where you think it should go (or that it's on the borderline and why). More importantly, make sure that the person whose work you're grading understands what is wrong and why.

Problem 2: Sort analysis

This problem mostly asks for formulae and only somewhat for justification. Of course, a good justification can help obtain credit for even an incorrect answer...

Consider an optimization of merge sort in which the original list of length n is divided into disjoint sublists of length k, these lists are sorted using insertion sort, and then the n/k sublists of length k are merged (pairwise, successively, as in merge-sort) until the final sorted list of length n is achieved.
1. How many steps will it take to sort each of the sublists of length k? Give your answer in asymptotic (Theta, like big-Oh) notation. Indicate whether your analysis is best, worst, or average case.
  Insertion sort is (worst and average cases) Theta(n²). SInce our input is of size k, this is Theta(k²). (Best case, i.e., assuming the sublists were magically sorted, would be Theta(k) to verify. But it ain't gonna happen.)
  NB either worst or average case anaysis is sufficient. Use of O rather than Theta should be noted but not penalized.
2. How many steps will it take to sort all of the sublists of length k? Again, use Theta and indicate what you're analyzing.
  Saying that sorting a sublist is Theta(k²) means that there is some constant c for which c k² is a good estimate of the number of steps. (Well, technically it means that there are two constants, c₁ and c₂, and the actual number of steps steps is bounded by
  - c₁ k² <= steps <= c₂ k²
  but I'm going to cheat and just use c so I don't have to keep writing the double bounds.)
  Since there are n/k sublists, each taking c k² steps, this is n/k * c k² = n c k² / k = c n k, which is Theta(nk). (For those who want the gorey details, this argument can be made twice with c₁, <=, c₂, and >= to really prove Theta. I will be satisfied with any reasonable approximation of the derivation, including waving hands madly in the obviously right direction or stating the correct answer by fiat.) This is again a worst or average case. (They're the same here, as the difference is just in the expected constant if the list isn't pessimally organized.)
3. Once the n/k sublists of length k are individually sorted, how long will it take to merge them all back together? Again, Theta and case....
  Now that the sublists are sorted, we do a standard merge on them. This will take log₂ n/k passes, each of size n, for a bound of n log₂ n/k. That is, the (worst, average) case is Theta(n log (n/k)).
  To see this, consider the merging problem: We start with n/k little lists and merge them pairwise, generating n/2k somewhat larger lists. This progression is building up the binary tree from the little lists at the leaves to the whole big merged list at the root (which is of course on top because this is an upside-down computational tree...). We know that this tree has depth log₂ (n/k), so there are log₂ (n/k) passes. Each pass touches each of the n elements exactly once, when it's being put into place in its own merge-list, so each pass is Theta(n). Theta(log (n/k) ) passes of size Theta(n) yields a running time for this phase of Theta(n log (n/k)).
4. Given your analysis and what you know about merge sort, how should we choose k (as a function of n) so that this algorithm is asymptotically (Theta-notation) comparable to merge sort?
  This would be a good time to remind ourselves that merge sort is Theta(n log n). Before we answer this question, we need to take a minute to put the algorithm back together. The total algorithm has two basic steps:
  1. Sort the little lists.
  2. Merge the sorted little lists into one big sorted list.
  The time for this algorithm is Theta(nk) + Theta(n log (n/k)). Note that, without knowing what k is, we don't know which term dominates.
  For example, if we set k=1, this is Theta(n*1) + Theta(n log (n/1)) = Theta(n) + Theta(n log n) = Theta(n log n), i.e., the n log n term dominates. This is asymptotically comparable to merge sort, which isn't particularly surprising because it is merge sort.
  If on the other hand, we choose k=n, we get Theta(n*n) + Theta(n log (n/n)) = Theta(n²) + Theta(n) = Theta(n²), with the nk (now n²) term dominating.
  The question, then, is how to pick k so that we get the most improvement -- make the n log (n/k) term small -- without doing worse than merge sort. If we don't worry about constants -- which is the next question (and also this one specifically says "asymptotically comparable") -- we can observe that the Theta(n log (n/k)) term will never do worse than Theta(n log n), which means that we can safely ignore it in picking k. That means that we just need to to ensure that the other term -- Theta(nk) is better than Theta(n log n).
  Happily, Theta(nk) <= Theta(n log n) will be true as long as k <= log n.
5. Merge sort typically has a higher constant factor on its (asymptotic) running time than insertion sort. How could this be taken into account in selecting k? Oops, here come those pesky constants.
  What the above analysis really means is that there are some constants (again, I'm only going to use one for each Theta rather than constant above and constant below) for which our new algorithm runs in c_insert n k + c_merge n log (n/k) steps. The question points out that c_merge >> c_insert, i.e., merge sort has a much bigger constant.
  Any time that we can make k bigger, it helps the c_merge n log (n/k) term, since it reduces the number of merges. And making k bigger isn't so unreasonable in the first term if c_insert << c_merge. We want
  - c_insert n k < c_merge n log n
  i.e., c_insert k < c_merge log n
  or k < (c_merge/c_insert) log n
  As long as we adhere to this constraint, we're going to be improving the running time of mergesort by using this algorithm. Since c_merge >> c_insert, this could be a large constant multiplier on the value of k.

Grading standard: These problems can't as easily be graded separately. A + answer to this question would have all of the values correct in parts a through c and some reasonable (mis)interpretation of Theta, O, best/worst/average case. It would also give a correct or extremely reasonable if different analysis of the value setting for k in parts d and e. (The analysis in d should where the leverage comes in and/or the relationship between choices of k and relative costs of the two phases. The analysis in e need only say that the high mergesort constant gives you additional leverage, though for a + it'd be nice to see insight into why.) A answer gets the general gist of the problem and the different components of the new algorithm, i.e., it needs to say something about Theta(nk + n log(n/k)). A - answer is one in which the analysis is difficult to connect to the algorithms as presented here or in which substantial parts of the problem are left blank. Also, note that a best case analysis would be acceptable but should come to a total running time of Theta(n + n log(n/k)), which argues for maximizing k, i.e. using insertion sort alone. This is unrealistic, but a reasonable misinterpretation of the problem.

Problem 3: Asymptotic analysis

Argh! I had to type these in after all!

Notation notes:''

sqrt(n) is square root of n
lg n is log₂(n), but any reasonable assumption here is fine. The base turns out not to matter....'
log n means the base doesn't matter. Again, any reasonable assumption is fine.
<= is less than or equal to; >= is greater than or equal to.
min(x,y) is whichever of x and y is smaller. For example, min(3,n) is n for all n < 3 and 3 otherwise.

The formula for big-Oh (sometimes spelled big-O, but I hear that's an anime character) is:

f(n) = O(g(n)) iff there exists a constant c such that for all n > n₀ f(n) <= c * g(n)

The formula for Theta is:

f(n) = Theta(g(n)) iff there exists constants c₁ and c₂ such that for all n > n₀ c₁ * g(n) <= f(n) <= c₂ * g(n)

Here we go...

Which of the following are true and which are false, and why?
1. sqrt(n⁵) is O(n ²)
  False. sqrt(n⁵) is n^5/2 which grows faster than n². In more detail:
  What is the ratio between sqrt(n⁵) and n²?
  n^5/2 / n² = n^{5/2 - 2} = n^1/2 = sqrt(n), i.e., sqrt(n⁵) = sqrt(n) * n².
  But sqrt(n) grows faster than any constant c. So there can be no constant c for which sqrt( n⁵ ) <= c n²; c would have to be bigger than sqrt(n) for all n.
  n-technicality: Actually, this should read "c would have to be bigger than sqrt(n) for all n > some n₀." But since sqrt(n) increases as n does, ignoring small values isn't going to help.
2. lg(n³) is O(n log n)
  True. log₂(n³) = 3 log₂(n), which is clearly less than n log₂n for all n > 3.
  Since we're allowed to ignore small values of n (the definition has to hold for all n > n₀, where we get to pick the n₀), this works.
  Of course, the right hand side doesn't say O(n log₂n); it says O(n log n). This turns out to mean that the base doesn't matter, but let's show this by assuming it actually means O(n log₁₀n):
  Since log₁₀n is smaller than log₂n, we have to show that this doesn't cause the argument above to fail.
  By the properties of logarithms, log₁₀n = log₂n / log₂10. This means that there's a constant -- log₂10 -- that I can use to make the formula go through. Specifically, I'm going to use this as the c in the formula for big-Oh:
  log₂(n³) <= log₂10 * n log₁₀n for all n > 3.
3. sqrt(n) lg sqrt(n) is O(n)
  True. sqrt(n) * sqrt(n) would certainly be <= n. But lg sqrt(n) = log₂sqrt(n) < sqrt(n) for all n > 1. So sqrt(n) * lg sqrt (n) < n.
Which of the following are true and which are false, and why?
1. 2/n + 4/n² is Theta(1/n) True. For Theta proofs, there are two inequalities -- the bound above, or big-Oh, and the bound below, or Omega -- that we have to prove. Let's start with big-Oh:
  To see how 2/n + 4/n² grows relative to 1/n, we can divide the first by the second:
  (2/n + 4/n²) / (1/n) = n * (2/n + 4/n²) = 2 + 4/n < 3 as long as n > 4. So the difference in growth is bounded by a constant factor and the big-O side goes through with n₀ = 4 and c = 3:
  for all n > 4, 2/n + 4/n² <= 3 * 1/n
  Also, as long as n > 1, 1/n < 2/n + 4/n². 1/n < 2 * 1/n = 2/n, and since 4/n² > 0, 2/n < 2/n + 4/n². This proves the bound below (Omega) -- we can use n₀ = 4, as above, and c₁ = 1 -- and we're done.
2. n log₁₀n is Theta(n log₂n)
  True. Actually, we sort-of did this in 22.b., above, but here's a full explanation. Again, for Theta we need to go through a bound above -- big-O -- and a bound below -- Omega. We'll look at n > 1 (i.e., take 1 as our n₀) just to keep things simple.
  n log₁₀n = n * (log₂n / log₂10) = 1/log₂10 * n log₂n
  So taking c₁ < 1/log₂10 -- say 1/4 -- and c₂ > 1/log₂10 -- say 2 -- we get
  for all n > 1, 1/4 n log₂n <= n log₁₀n <= 2 n log₂n
3. log₂ sqrt(n) is Theta (log n)
  True. This is because log₂ sqrt(n) = log₂ n^1/2 = 1/2 log₂ n, i.e., the two functions are directly related by a constant factor. (We are taking advantage of the unspecified base in Theta (log n), but we know that we can change to any base using the constant factor trick in the previous problem.
4. sqrt(log₂ n) is Theta (log n) False, thereby demonstrating that the order of function composition matters.
  Before beginning the analysis, we observe that log n = c log₂n, i.e., no matter what base logarithm we start with, we can switch to base 2 by multiplying by the appropriate constant. (If we start with log_kn, c = 1/log_k2 = log_k2) Now, we use the division trick to find the growth ratio between these two functions:
  (log n) / (sqrt(log₂ n)) = c (log₂ n) / (sqrt(log₂ n)) = c sqrt(log₂ n)
  This means that log n grows faster than sqrt(log₂ n) by a factor of c sqrt(log₂ n). This factor is not constant; it increases as n increases. So we will never find a constant that we can use to keep log n smaller than (a constant multiple of) sqrt(log₂ n); it just grows too fast.
  Note that this does not prevent us from saying that sqrt(log₂ n) is O(log n); in fact, log n is a nice, big, even too-big upper bound for sqrt(log₂ n). But Theta requires that it also be a bound below, and we just can't fit log n below sqrt(log₂ n).
5. min(700, n²) is Theta (1)
  True. Perhaps the easiest way to show this is to consider n₀ = sqrt(700):
  for all n > sqrt(700), min(700, n²) = 700, so 1 <= min(700, n²) <= 700 * 1
  We actually don't need to use such a large n₀, of course, because
  for all n, min(700, n²) <= 700 -- that's the definition of min -- so as long as we can find a constant smaller than n² for the lower bound (and that's easy as long as we only consider n >=1), we're all set.
Prove that if f is O(g) and g is O(h) then f is O(h). What are the n₀ and c for f and h, in terms of those for f and g, and g and h?
- We have f is O(g), i.e.,
  - there exists c_fg such that for all n > n_0fg, f(n) <= c_fg g(n)
  Similarly, since g is O(h), we know that
  - there exists c_gh such that for all n > n_0gh, g(n) <= c_gh h(n)
  This means that for sufficiently large n,
  - f(n) <= c_fg g(n) <= c_fg * c_gh h(n)
  giving us a c_fh = c_fg * c_gh. But what is n_0fh, i.e., how large does n need to be? It simply needs to be at least as large as n_0fg and n_0gh. So n_0fh = max(n_0fg, n_0gh)
  NB I misadvised someone on n_0fh. I know who it was, I apologize, and I will take that into account.

Grading standard: Ultimately, all of Problem 3 should receive only one grade, but you may find it easier to grade individual pieces and then look at the overall picture. A + solution for a particular question has the right (true/false) answer and a reasonable justification, i.e., you understand and believe the argument. It does NOT need to be spelled out in as much detail as the solutions I wrote. A solution either has the right answer but no/minimal/you can't figure it out or don't believe the argument justification OR the wrong answer but a plausible argument (that presumably contains a small error). A - solution doesn't have the right answer and doesn't have a reasonable justification for it OR is simply missing. For the proof, a + answer should look a lot like mine (more words and a little less formal is OK, but it should definitely look like a proof); a answer should have the basics correct but might err on the constants, etc., or be an explanation, not a proof) and a - solution is one that doesn't have the basics correct. For the problem as a whole, a + is earned by earning a + on at least seven of the pieces and no - s; a - is earned by having no more than one + and at least five - s, and otherwise give it a . Again, the most important thing is that the student whose work you are grading understand why/where you're having problems with the answer given.

Problem 4: Hashing

I can't believe I'm drawing wiki-pictures!

Draw a picture of a hash table with 8 buckets using closed hashing and the hashing function mod 8 after inserting the following data: 1, 3, 9, 5, 18, 41

(0)

(1)

-->

1

->

9

->

41

(2)

-->

18

(3)

-->

3

(4)

(5)

-->

5

(6)

(7)

The bin numbers (in bold italics and parentheses) are neither necessary nor traditional, but I was worried enough about how this would come out on the wiki to include them. Given how poorly the table renders, this may be a Good Thing. (Any reasonable linked list notation is fine.)
Draw a picture of a hash table with 8 buckets using naive open hashing and the hashing function mod 8 after inserting the following data: 1, 3, 9, 5, 18, 41

(0)

(1)

1

(2)

9

(3)

3

(4)

18

(5)

5

(6)

41

(7)

Again, the bucket numbes are neither necessary nor traditional but are provided to make the picture legible. The existence of all eight buckets, though, is actually a Good Thing.
Leaving the pun aside, would the hash function mod n be a clever choice if we want to build a hashtable of Olin roommate pairs (i.e., using your room numbers as keys and storing two student data at each node? If so, for what value(s) of n and why? If not, suggest a better hash function. The answer that I was expecting goes something like this:
- Because Olin room numbers (outside the mod) use the first digit as a room number and the remainder repeats from floor to floor, you're going to have a lot of collisions no matter what number you use. (You'd expect 4 roommate pairs in each bin, just from the rooms stacked on top of one another in East Hall.)
The ideal hash function maps the range of keys into a table size that will hold the number of items comfortably. The number of items in a hashtable divided by its capacity is called its load factor. In an open hashing algorithm, the load factor should ideally be between .5 and .8; you want the table to be more than half full, but not so full that every new insertion causes a collision. In a closed hashing scheme, the load factor should be over 1 -- every bucket should be in use -- but not by too much; keeping it under 2 is a good idea.
At Olin, there are currently about 120 occupied rooms, so we'd like to use a table with between 60 and 240 buckets (60-120 or so for closed hashing, 150-240 for open hashing). The highest numbered room is 430-something. (Note to self: Next time get room numbers first!) This means that we're going to want a hash function that collapses the range of room numbers considerably. At the same time, the Olin room number scheme is not uniform over its range -- rooms have low numbers mod 100, and the high numbers mod 100 are unused -- and we want our function to spread it out. One possibility is to collapse the range, making the 2d floor numbers follow the 1st floor numbers immediately, etc. A function that would do this is
- (roomNumber mod 100) + ( 40 * floor ( roomNumber / 100) )
with an optional mod hashtablesize thrown in for the open hashing case. Another possibility is to use some function totally uncorrelated to the room number.

Grading standard: The first two parts of this question are fairly straightforward and, with the exception of mixing up which is open and which closed hashing, a + answer would be expected to have these completely correct. The third part was a bit of a wild card and a + answer should really demonstrate good understanding of the need for a hash function to spread your data out. My answer (the thing in italics) would be fine except that it lacks a suggestion. Any reasonable suggestion (i.e. something that fixes the issues) will do. My long explanation of hashing background is way more than I expect from a student answer. Also, although the expected answer is no, I might give a + to a "yes" answer if it were cleverly argued, but it would have to demonstrate real understanding. Blindly asserting that powers of two are bad or primes are good is more of a answer as those facts are true in general of hashtables but aren't particularly relevant here. A answer needs to get close on the first two parts and have a reasonably relevant answer to the third. - is reserved for those answers that are substantially incomplete or indicate significant gaps in understanding of hashtables.

Problem 5: Finite State Automata

I am including links to JFLAP until I find a tool that lets me treat these as images. Note that my solutions are generally minimal, i.e., you can't have a correct answer with fewer states. You can, of course, have a correct answer with more states....

Draw state diagrams for deterministic finite automata accepting each of the following languages:

{ w in {a,b}* : each a in w is immediately preceded by a b}
DFSM and an equally correct variant DFSM that doesn't show the fail state explicitly
In this FSM, the empty string is acceptable, so the start state is also an accepting state. From the start state, it's OK to see a b but not an a. (a takes us to q3, the fail state.) We might call q0 "waiting for b". b has to take us out of q0, though, because after b we can accept either a or b, and q0 goes to fail on a. Once we're in q1, we can be done -- q1 is also an accepting state -- or we can handle either a or b, so q1 might be called "seen b, anything's OK". The b leaves us where we are, but the a has to put us in a can't accept a state. Fortunately, we have one already set up for this: q0.
- state
  
  nickname
  
  q0
  
  waiting for b
  
  q1
  
  seen b, anything's OK
This FSM must not allow a or any strings that contain aa as a substring. It must allow the empty string as well as any strings containing arbitrary numbers of bs in any position and strings containing multiple as as long as they are separated by bs. This FSM should accept strings beginning with a or b and strings ending with a or b. It must be deterministic.

{ w in {a,b}* : w has abab as a substring}

DFSM

This FSM should allow abab and any prefixes or suffixes. It basically works by progressing from q0 to q4 as it sees the four symbols in the required substring. This is the forward path that zig zags through the center of the FSM. At any point along this path, if it doesn't see the correct next symbol from the substring, it goes back. If it sees a b in the wrong place, it goes back to the start state -- q0 -- but if it sees an a out of sequence, it goes back only to q1 as that amight be the beginning of the abab that it's looking for.

state	nickname
q0	start; havent seen (a plausible) a yet
q1	just got an a
q2	just got ab
q3	just got aba
q4	got abab, happy and don't care what I see now

I don't think that there are too many weird cases, but I'm sure that some of your solutions will prove me wrong. (Actually, going back to q0 on a is a pretty straightforward error to make, though not a + error.) The most important thing to here may be that the FSM is deterministic.

{ w in {a,b}* : w has neither aa nor bb as a substring }
DFSM
This FSM accepts any string with alternating as and bs, including the empty string, a, and b. Anything else should fail. The fail state, q3, (and the failing transitions) need not be included explicitly. The FSM should be deterministic, i.e., except for fail transitions it should include exactly one transition out of each state on each symbol and no epsilon (lambda) transitions.
Several students produced a (correct) solution that looks like this: variant solution
Since the symbol that is acceptable depends on which symbol you've just seen, the FSM must distinguish q1 -- just saw a -- from q2 -- just saw b. Neither of them can be the start state, though, because at the beginning (and never afterwards) either symbol is acceptable. This means that there are three accepting states, one for the start (q0) and one for each of the "just saw" states.
- state
  
  nickname
  
  q0
  
  start
  
  q1
  
  just saw a
  
  q2
  
  just saw b
  
  q3
  
  fail
q3 can be omitted (i.e., made implicit), but nothing should ever come out of q3 once it's gotten there.
{ w in {a,b}* : w has an odd number of as and an even number of bs}
DFSM
This FSM must have (at least) four states; it's not possible to solve with fewer. Essentially, these states form a 2x2 grid that keeps track of the parity (oddness or evenness) of as and bs. Exactly one of the four is accepting; it's also important to start in the right one.
- state
  
  nickname
  
  q0
  
  have seen an even number of as and an even number of bs
  
  q1
  
  odd as, even bs
  
  q2
  
  even as, odd bs
  
  q3
  
  odd as, odd bs
Once you have this parity map worked out, transitions should be straightforward. q0 is the start state (0 is definitely even as when you divide it between two children, there isn't anything left over); the problem specifies that q1 is the accepting state.

{ w in {a,b}* : w has both ab and ba as substrings}

DFSM

The idea behind this automaton is that the language is all strings that contain at least one transition from a to b and at least one transition from b to a. So as soon as we see our first symbol, we know which way our first transition will need to occur and we know whether we're on the upper (a-to-b-to-a) or lower (b-to-a-to-b) path. We stay in the first state on the respective paths -- q1 or q4 -- until we see the transition, then switch to q2 or q5. Now we're waiting for the other transition. Once that happens, we're in q3 and don't care whether we see as or bs or some combination because we've already satisfied the acceptance condition.

state	nickname
q0	start
q1	got a, waiting for transition to b
q2	got transition to b, waiting for transition to a
q3	got transitions a to b to a, am happy!
q4	got b, waiting for transition to a
q5	got transition to a, waiting for transition to b

The automaton as drawn is minimal. There are other automata that have more states and do the same work. Be sure that they don't include or exclude other strings, though. One particular not-quite-correct answer that you might see looks like this:
Not quite correct...
The problem with this FSM is that it doesn't accept "aba" or "bab", which are actually strings in the language. I wouldn't really penalize much for this, though, because the question is easily misread. (This might be one of those cases where I'd give a + for a beautiful but incorrect answer.)

Grading standard: Again, I think it's probably easier to grade each FSM separately but the problem will ultimately receive only one grade. A + FSM is one that is correct on all input; it need not be minimal, but shouldn't have a lot of extraneous states. It must be deterministic. It need not explicitly include a fail state, although this turns out mostly to be relevant for 5a. A really nice FSM that contains one minor omission or inclusion might be OK as a +, too. A FSM either leaves out some cases, includes some things it shouldn't or is nondeterministic, i.e., includes epsilon (lambda) transitions or multiple transitions out of a state on the same symbol. A - FSM is one that breaks lots of rules (i.e., isn't an FSM) or that really solves a different problem (or that is missing/substantially incomplete). Overall, four + s and no - s earns a +; two - s and no + s earns a -.

Notes on grading

The solutions provided in this file are often more detailed than I would expect from a student solution. A student solution should include enough information that the question is actually answered -- if it asks "why", there should be a justification, etc. -- but it need not be a rigorous and thorough examination of all possibilities. This is particularly true for the Asymptotic Analysis problems above, where I tried to give an explanation that would satisfy most or all puzzled students, while a student solution only has to demonstrate to me (or you) that the student understands why the given answer is correct.
Also, there are often multiple ways to solve a problem, especially when the problem involves justification or variant notations. I have not tried to be exhaustive in listing the correct answers, which means that you may run into an answer that isn't in here but is still correct. Do your best to ascertain whether that answer is correct, but it's OK to say that you honestly can't figure it out. (With FSMs, you could get JFLAP to minimize them for you, but this may or may not be fun.)
The grading standard is (./), +, - for each of the five problems on the pset, with occasional finer gradations included (and explicitly allowed if you really can't provide the right feedback otherwise!). The rough standard is:
A solution that is perfect or extremely close -- has no significant errors -- receives a +. I like to think that most readers will learn something from a + solution, but simply being correct is also enough.
A solution that has some definite errors, but captures the main ideas of the correct answer, receives a . Most answers to most questions are answers.
A solution that has significant errors or otherwise misses a substantial part of the main idea receives a - .
A missing solution receives a 0
For peer feedback, I am more concerned with your pointing out what's unclear and why or explaining errors that you find than with the utter accuracy of your grading. In particular, I'm not at all concerned whether you can distinguish borderline cases of grading decisions. (No one will suffer from -- or benefit from -- any grading inaccuracies that you introduce.) I'm more concerned with whether you can recognize a(n in)correct solution and figure out (and explain) why.
Your grading of your peer's problem set will itself be evaluated. The criteria will emphasize your feedback over your precision in reading my mind and determining exactly where the line between, e.g., and + might lie.

(0)
(1)	-->	1	->	9	->	41
(2)	-->	18
(3)	-->	3
(4)
(5)	-->	5
(6)
(7)

state	nickname
q0	have seen an even number of as and an even number of bs
q1	odd as, even bs
q2	even as, odd bs
q3	odd as, odd bs