Frances' Notes for Friday, September 17th, 2004
Contents
Intro Math Trick
- Lynn will guess which page of the OED a word is on, and steve will tell her if her guess is too low or too high. Lynn's guesses:
- 1200 too high
- 600 too low
- 900 too low
- 750 too high
- 825 too high
- 787 too high
- 769 too low
- 778 too high
- 774 too high
- 772 too high
- 771
[ ] - / \ | [ ] [ ] | / \ / \ | Depth 3 = log_{2}^{2} [ ] [ ] [ ] [ ] | / \ / \ / \ / \ | [] [] [] [] [] [] [] [] _ 2^n on bottom level - total nodes = 2^{n+1} - 1
For orders of growth, people don't think about the base of the logirithm, because it's just a constant. The fact that the tree N can be transversed from top to bottom in Log N is a fact we can use for good or evil. Computers represent numbers as strings of bits. In n bits, there are 2^{n} possibilities[ ] - 0/ \1 | [ ] [ ] | 0/ 1\ 0/ 1\ | Depth 3 = log_{2}^{2} [ ] [ ] [ ] [ ] | 0/ \1 0/ \1 0/ \1 0/ \1 | [0][1][2][3] [4][5][6][7] _ 2^n on bottom level - total nodes = 2^{n+1} - 1
Trees in Lisp
- Node datum = car
- left child = (car(cdr...))
- right child (car (cdr (cdr...))) Because we know how much space a ballanced tree needs, we know exactly how big it's going to be, which means we can use an array represenatation to represent the tree. We can stick the first thing in Zero. Tree left will go in [2i + 1] and the tree right in [2i + 2]. Remember, to store the next level you will need the same amount of space as you needed for all previous entries. These trees can be unwieldy though, because it's very difficult to change the structure.
The children of
go in
0
1 and 2
1
3 and 4
2
5 and 6
Different Sorting Mechanisms
- Bubble sort is very inefficent, but easy to code and understand. Insertion sort can be very efficent if the list is mostly inorder to start with. We can use trees to make insertion sort more efficent.
- === Insertion Sort Example ===
Sort: 777, 42, 420, 9, 17, 41, 7, 7, 4, 28, 23, 16, 5 First Pass: 17, 28, 777 Second Pass: 17, 23, 28, 41, 777, 42 Third Pass: 17, 23, 7, 28, 41, 16, 777, 42, 420 Fourth Pass: 17, 23, 7, 4, 28, 41, 16, 5 , 777, 42, 420, 9
We have broken it up into groups and this can be done recursively. It puts things closer to where it should be. It is called shell sort and it is more efficent than regular insertion sort. Breaking things into smaller pieces makes them run faster.
x > parent x The idea is that the root, the parent is always going to be the next element that I want. It will always be the smallest element. It keeps the order of the smallest element on top. We don't though care about the right or left is bigger. We dont' care about the order of the children (as we would in a binary tree). This is a priority queue. We insert at the next free position.
Verifying/heapifying = O(log n)0
1
2
3
4
5
Place
1
Insert 1 in the first free place
1
8
Insert 8 in the first free place, it's parent is 1
1
8
7
Insert 7 in the first free place, it's parent is 1
1
8
7
4
Insert 4 in the first free place, it's parent is 8
1
4
7
8
Swap 8 and 4, because 4 is smaller than 8
- ==== Selection using heap ====
- Remove top element - O(n)
- reheapify by moving the smaller child up (recursively) - O(log(n))
- Break data in half log(n) times.
- Sort each subsection (O(1))
- Put back together merge Divide and conquor computational intensity:
1 * n/2 + 3 * n/4 + 7 * n/8 There are log(n) steps, because each time we double the list. Each step we step up is log(n) merges. The end efficency is O(n log n).
- === Insertion Sort Example ===
Things you should know
- Bubble Sort - you should be able to program this n^{2}
- Insertion Sort -| You should be able to do one of these. n^{2}
- Selection Sort -| n^{2}
- You should be able to see why it is n^{2}. You should know why being in order to start with helps or hurts.
- You should know how to do a heap sort - build and check the heap properties and know why it's n log n
- You should know merge sort and explain why it's n log n. n^{2} sorts generally aren't used unless the lists are small, or mostly in order to start with. Bubble sorts are the worst sorts. Quick sort [read about it] - but it is very expensive, because there is a lot of guessing. Finding the right abstraction is a major theme of this course, but another theme is the power of log(n).
To Do:
- Read Paul Graham article and "Data Structures and Their Algorithms"