This is the algorithms item. An algorithm is essentially a technique for doing something: for instance, sorting an array of integers or a list of items. Algorithms are fundamental to computing and form a basic part of any programmer's education. They are also often independent of any particular language or environment. Discuss them here.
13 responses total.
Okay, I'm not drawing a clear distinction between the worst-case scenarios for Binary Search Trees and QuickSort. I know both of these algorithms can be represented as a tree. I think the correct mathematical term is an acyclic graph. I would look this up, but the CR math book is holding up my bedroom window. Anyhow, I can see the Binary Search Tree turning into a linked list in the worst case; in that case it would be linear time (vs. logarithmic). However, for the Quicksort algorithm, I don't see how it can be quadratic time in the worst case if the tree also degenerates to a linked list? I know this can be calculated using mathematical induction, but I totally forgot how to do that kind of shit.
er *CRC Standard Mathematical Tables and Formulae*
Regarding #1: Quicksort doesn't turn anything into a tree, though some textbooks use a graphic that looks like a tree to represent how the algorithm recursively splits up the array as it operates (they often do this when describing the algorithm's space complexity). This may be confusing you....

With a basic tree, yes, the worst case for insertions is when the data is pre-sorted, in which case everything goes in on one side or the other of the tree (depending on how it's sorted), the tree turns into a glorified linked list, and access times are O(n) (i.e., bounded by some constant times n).

Quicksort just operates on some linear data structure, sorting the elements in place, and it does this recursively. Basically, in each invocation, the algorithm picks the middle element of the array, calls it the "pivot", and walks over the array. If, during its walk, it finds an element on the left side of the pivot that's greater than or equal to the pivot, it tries to find an element on the right side of the pivot that has a value less than that of the pivot and swaps the two. Note that the pivot might move around in the array during this process. But by the time you get to the pivot from both sides, you're guaranteed that everything on the left side of the pivot is strictly less than the pivot value, and everything on the right side is greater than or equal to it. Then you call quicksort twice on subarrays: once to the left of the pivot value, and once to the right.

Now, this clearly splits the array in half each time, right? So one would think it would take on the order of O(n lg n) time (where lg is logarithm base 2), since I'm recursing O(lg n) times and each time I'm walking over O(n) elements of the array, for a total time complexity of O(n) * O(lg n) = O(n lg n).

But what happens if I pick a bad pivot? Say nothing is ever greater than the pivot, so *every other element* lies to the left of the pivot value. Then the pivot sort of moves over to the far right-hand side of the array, and when I recurse, I walk over n - 1 elements on the left. Suppose that every time I recurse, I wind up with a similarly bad pivot, with n - k elements on the left for the k'th recursion. Then it's not hard to see that I'm ending up with O(n) recursions instead of O(lg n), and on each one I'm dealing with O(n) elements, for a total time complexity of O(n) * O(n) = O(n^2). So, for a poor choice of pivot, I get pathological behavior that's bounded by a term that is quadratic in the size of the input.
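To make that concrete, here's a minimal quicksort sketch in C. It's an illustration only (a Hoare-style in-place partition around the middle element, not any particular library's qsort): if the pivot value keeps landing near the minimum or maximum of each subarray, one side of every partition is nearly empty, the recursion depth grows to O(n), and the total work becomes the O(n^2) case above.

#include <stdio.h>

/* Minimal quicksort sketch: in-place partition around the middle element. */
static void quicksort(int a[], int lo, int hi)
{
    if (lo >= hi)
        return;

    int pivot = a[lo + (hi - lo) / 2];   /* middle element as the pivot value */
    int i = lo, j = hi;

    /* Walk in from both ends, swapping elements that sit on the wrong
     * side of the pivot value. */
    while (i <= j) {
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            i++; j--;
        }
    }

    quicksort(a, lo, j);   /* left part: values <= pivot */
    quicksort(a, i, hi);   /* right part: values >= pivot */
}

int main(void)
{
    int a[] = { 5, 2, 9, 1, 7, 3 };
    int n = (int)(sizeof a / sizeof a[0]);

    quicksort(a, 0, n - 1);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}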
Okay, cool. That clears up a lot of shit.
Yeah. The secret to understanding the worst-case running time for quicksort is the choice of pivot: you don't *have* to pick the middle element, but if the pivot is chosen poorly, the running time degrades toward that quadratic worst case.
There are techniques for choosing pivots which minimize (though not entirely eliminate) such worst cases. Moreover, the qsort() that is supplied with glibc is not a pure quicksort: when the size of a subarray, after a few recursions, gets smaller than a particular value, it invokes insertion sort on it.
Yes, there are many techniques for choosing a pivot to minimize the chances of pathological behavior. McIlroy and Bentley wrote an SP&E paper on it a few years back; their sort is similar if not identical to what comes with BSD and maybe some other operating systems. Let me see if I can find a link to that paper.... Here's one, though it's not where I'd expected it: http://www.enseignement.polytechnique.fr/profs/informatique/Luc.Maranget/421/09/bentley93engineering.pdf

The optimization where you stop quicksort once you have a sufficiently small subarray and instead employ something else (e.g., insertion sort) is based on the observation that while quicksort nominally has O(n lg n) behavior, in practice there are fairly large constant factors that bound the run time. Insertion sort, which runs in O(n^2), has smaller constant factors and thus can be much faster on small arrays than repeated quicksort recursions. Further, for arrays that are `close' to sorted, insertion sort runs in near linear time. In general, there's some cutoff point at which it's actually cheaper to use the slower algorithm. I believe the idea is due to Sedgewick, but it's there to optimize the constant portion of the runtime of the algorithm (the stuff you don't see inside the big O), not so much to minimize the effects of choosing a bad pivot....
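Roughly, the combined idea looks something like this in C. This is a sketch only: the CUTOFF value and the plain median-of-three pivot rule are stand-ins for illustration, not the actual tuning used by glibc's qsort or the Bentley/McIlroy sort.

#define CUTOFF 16   /* arbitrary illustrative threshold */

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Insertion sort: O(n^2) in general, but tiny constant factors and
 * near-linear on nearly-sorted input, so it wins on small subarrays. */
static void insertion_sort(int a[], int lo, int hi)
{
    for (int i = lo + 1; i <= hi; i++) {
        int v = a[i], j = i - 1;
        while (j >= lo && a[j] > v) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = v;
    }
}

/* Hybrid quicksort sketch: median-of-three pivot to make pathological
 * partitions unlikely, plus a switch to insertion sort once a subarray
 * drops below CUTOFF.  Recursing on the smaller side and looping on the
 * larger side keeps the stack depth at O(lg n). */
static void hybrid_quicksort(int a[], int lo, int hi)
{
    while (hi - lo >= CUTOFF) {
        int mid = lo + (hi - lo) / 2;

        /* median-of-three: order a[lo], a[mid], a[hi]; the median ends up at mid */
        if (a[mid] < a[lo])  swap_int(&a[mid], &a[lo]);
        if (a[hi]  < a[lo])  swap_int(&a[hi],  &a[lo]);
        if (a[hi]  < a[mid]) swap_int(&a[hi],  &a[mid]);

        int pivot = a[mid];
        int i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) { swap_int(&a[i], &a[j]); i++; j--; }
        }

        if (j - lo < hi - i) {          /* recurse on the smaller side */
            hybrid_quicksort(a, lo, j);
            lo = i;                     /* iterate on the larger side */
        } else {
            hybrid_quicksort(a, i, hi);
            hi = j;
        }
    }
    insertion_sort(a, lo, hi);          /* finish off the small remainder */
}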
I'm a heapsort fan, mostly because the whole heap concept is so cool once you finally get it. Also, it doesn't have the O(n^2) worst-case performance problem that quicksort has.
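For what it's worth, the core of it is just the "sift-down" operation on an array treated as an implicit binary tree. A bare-bones C sketch (an illustration, not the heapsort(3) that ships with BSD):

/* Heapsort sketch: build a max-heap in the array itself, then repeatedly
 * swap the root (the largest remaining element) to the end and shrink
 * the heap.  Worst case is O(n lg n) regardless of the input order. */

/* Restore the max-heap property for the subtree rooted at i, considering
 * only the first n elements of a. */
static void sift_down(int a[], int n, int i)
{
    for (;;) {
        int largest = i;
        int left = 2 * i + 1, right = 2 * i + 2;

        if (left < n && a[left] > a[largest]) largest = left;
        if (right < n && a[right] > a[largest]) largest = right;
        if (largest == i) break;

        int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
        i = largest;                     /* keep sifting down */
    }
}

static void heap_sort(int a[], int n)
{
    /* heapify: sift down every internal node, bottom up */
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(a, n, i);

    /* repeatedly move the current maximum to the end of the unsorted region */
    for (int end = n - 1; end > 0; end--) {
        int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
        sift_down(a, end, 0);
    }
}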
That's true, though in the non-pathological case, heapsort tends to be a bit slower than quicksort. Still, it's good if you need bounded O(n lg n) running time. Mergesort is also a good one; I remember sitting in a CS class once where the professor used the standard ``you have a lot of data on tape that won't fit into core...'' example as the rationale for mergesort. The guy sitting next to me (whom I knew socially) looked at me and said with a sort of grimace, ``tape?!'' To which I replied, ``cache.''
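For completeness, a bare-bones top-down mergesort sketch in C (in-memory only; a real external sort merges sorted runs streamed from tape or disk, but the merge step is the same idea):

#include <string.h>

/* Mergesort sketch: split, sort each half, merge into a scratch buffer,
 * copy back.  Always O(n lg n), but needs O(n) extra space.  Call as
 * merge_sort(a, tmp, 0, n), where tmp has room for n ints; the range
 * [lo, hi) is half-open. */
static void merge_sort(int a[], int tmp[], int lo, int hi)
{
    if (hi - lo < 2)
        return;

    int mid = lo + (hi - lo) / 2;
    merge_sort(a, tmp, lo, mid);
    merge_sort(a, tmp, mid, hi);

    /* merge the two sorted halves into tmp, then copy back */
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];

    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof a[0]);
}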
Moreover, data in business is usually somewhat sorted; e.g., cheques are usually encashed more or less in serial-number order.
Yeah. Often, as a design choice, I've found it best not to worry about sorting data, but rather to store it in some sort of pre-sorted container (like a balanced tree structure or something). Then the cost of sorting the data is amortized over the total execution time of the program.
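A tiny illustration of that, using POSIX tsearch()/twalk() from <search.h> (glibc's implementation happens to keep the tree balanced, though POSIX itself doesn't require that): the elements stay in order as they're inserted, so there's never a separate sort step.

#include <stdio.h>
#include <search.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* twalk() visits internal nodes three times; processing them on the
 * "postorder" visit (and leaves once) gives an in-order traversal. */
static void print_in_order(const void *nodep, VISIT which, int depth)
{
    (void)depth;
    if (which == postorder || which == leaf)
        printf("%d ", *(*(int *const *)nodep));
}

int main(void)
{
    static int values[] = { 42, 7, 19, 3, 88 };
    void *root = NULL;

    /* insert each value as it "arrives"; the tree keeps them ordered */
    for (int i = 0; i < (int)(sizeof values / sizeof values[0]); i++)
        tsearch(&values[i], &root, cmp_int);

    twalk(root, print_in_order);   /* prints: 3 7 19 42 88 */
    printf("\n");
    return 0;
}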
Does anybody know where I can look to learn about different algorithms for integration? It'd be great if there were some info on complexity too. I found some info on the Risch algorithm, which is used by Maple, but no mention of its complexity or method. The usual haunts (Wikipedia, Google, Wolfram) have failed me. :(
The Numerical Recipes book has a lot of information on such things.