[FrontPage] [TitleIndex] [WordIndex

Notes for September 21

Return to the main StudentFrontPage.

Ball Comparisons

Themes of Today's Class

Finding the Minimum/Maximum of a Collection

Aside: Formality and Expected Knowledge

Finding the Ith element in the list (Quicksort)

WikiPedia entry on Quicksort

  1. Pick an element in the list (randomly). This is the pivot.

  2. Divide the list into two chunks, those numbers that are larger (in value) than the selected value and those that are less. (this will take n-1 comparisons to split)
  3. Hopefully this gives us about half and half in each group. This is like the dictionary example, halving the size of the thing for which we are searching. This should end up taking about log(n) passes. (If the random selection of the pivot gets the smallest value in that list, it takes n comparisons and doesn't effectively split the group - every value will be larger than the pivot and so we only end up with another list that's just n-1 long. If this happens every time, this method will suck.)

    Taking a random pivot could take as long as n2, but it could also end up a lot better than n2.

    best case


    (1 pass, get lucky)

    average case


    2 = 1 + 1/2 + 1/4 + ... (We would say this is O(2n), except that order is the upper bound, so O(2n) = O(n))

    worst case


    see above justification

    It turns out we can do a better job of avoiding the O(n^2) worst case scenario. Take the collection and divide it into groups of five. Find the median of each of these groups. This can be done in six comparisons. Now we have a whole bunch of elements about which we know that there are two things greater than the middle element and two things less than the middle element.
  4. Take the collection and separate into n/5 groups of five. (O(n), maybe faster, but def. proportional to n)
  5. Find the median of each group (2 above, 2 below, 1 median) (n/5 * 6, O(n) - some discussion about the actual cost of finding the median. might be 7?)
  6. find the median of the medians. (f(n/5))
  7. Use the median of the medians as a pivot.
  8. Recurse on what's left. (do it like we did it above, but always use this algorithm to select the pivot.) It turns out that if you do it this way, eliminating about a 1/4 for each cycle, this is still O(n). It also has a huge constant. This means that it's not a good way to find the ith element of a small collection - small collections should be sorted. What's cool is that it's a guaranteed linear way to find the ith element. It's also neat that the number 5 appears, and is crucial to making the math work out. It's especially slick because it isn't a power of 2.

o o o o o|o o o o o
o o o o o|o o o o o
_______  |  
o o o o[o]o o o o o
       |  ---------
o o o o|o o o o o o
o o o o|o o o o o o

Each column is a set of five, with the third row being the medians. The center one (bracketed) is the median of the medians. The ones within the boxes (up-right and bottom-left) are eliminated as being greater than or less than the selected pivot.

Aside: Orders of Growth

Sorting Reprise: Doing better than n log(n)

Hash Sort

Example: The Faculty Members at Olin and the classrooms where they teach.

We want something like an array that's order 1 access to retrieve items.

So we define a hash function. For example take the 1st letter of the faculty member's name mod 10 (a = 0). In our example (Stein, 318). This says: put the data (318) in the 8th slot. For Martello, put it in 2 (12 mod 10). To retrieve values, we ask "what is Martello's hash number?" and then ask that array location for its values. This makes it constant time to both put things in _and_ get things out. This is cool.

There are potential glitches, though. What if we want to add somerville? His has resolves to the same as Stein. Collision! There are two ways to solve this problem:

Take Aways

2013-07-17 10:43