

Grex Systems Item 29: The Algorithms item
Entered by cross on Sat Sep 16 23:45:31 UTC 2006:

This is the algorithms item.  An algorithm is essentially a technique for
doing something; for instance, sorting an array of integers or list of items.
Algorithms are fundamental to computing, and form a basic part of any
programmer's education.  They are also often independent of any particular
language or environment.  Discuss them here.

13 responses total.



#1 of 13 by herasleftnut on Sat Oct 7 17:43:48 2006:

Okay, I'm not drawing a clear distinction between the worst case scenarios
of Binary Search Trees and Quicksort. I know both of these algorithms
can be represented as a tree. I think the correct mathematical term is an
acyclic graph. I would look this up, but like the CR math book is holding up
my bedroom window.

Anyhow, I can see the Binary Search Tree turning into a linked list in the
worst case scenario. In this case it would be linear time (vs. logarithmic).
However, for the Quicksort algorithm, I don't see how it can be quadratic time
in the worst case scenario if the tree also degenerates into a linked list. I
know this can be calculated using mathematical induction, but I totally forgot
how to do that kind of shit.



#2 of 13 by herasleftnut on Sat Oct 7 17:45:06 2006:

er *CRC Standard Mathematical Tables and Formulae*


#3 of 13 by cross on Sat Oct 7 18:19:15 2006:

Regarding #1: Quicksort doesn't turn anything into a tree, though some
textbooks use a graphic that looks like a tree to represent how the
algorithm recursively splits up the array as it operates (they often do this
when describing the algorithm's space complexity).  This may be confusing
you....

With a basic tree, yes, the worst case for insertions is if the data is
pre-sorted, in which case everything goes in on one side or the other of the
tree (depending on how it's sorted), and the tree turns into a glorified
linked list and access times are O(n) (i.e., bounded by some constant times
n).
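
A quick toy sketch in Python (my own throwaway code, nobody's library) makes
the degeneration concrete: feed pre-sorted keys to a plain unbalanced BST and
the height grows to n, so a lookup can walk every node.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Standard unbalanced BST insertion.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    # About lg n for a balanced tree; exactly n for pre-sorted input.
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

root = None
for k in range(1, 11):       # pre-sorted keys: the worst case
    root = insert(root, k)
# Every key went to the right child, so the "tree" is a
# glorified linked list of height 10.
```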

Quicksort just operates on some linear data structure, sorting the elements
in place.  It does this recursively.  Basically, in each invocation, the
algorithm picks the middle element of the array, calls it the "pivot", and
walks over the array.  If, during its walk, it finds an element on the left
side of the pivot that's greater than or equal to the pivot, it tries to
find an element on the right side of the pivot that has a value less than
that of the pivot and swaps the two.  Note that the pivot might move around
in the array during this process.  But, by the time you get to the pivot
from both sides, you're guaranteed that everything on the left side of the
pivot is strictly less than the pivot value, and everything on the right
side is greater than or equal to the pivot value.  Then you call quicksort
twice on subarrays: once to the left of the pivot value, and once to the
right.
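
Here's roughly what that looks like in Python (a toy sketch of the scheme I
just described, using a Hoare-style partition with the middle element as the
pivot; real library qsorts are fancier):

```python
def quicksort(a, lo=0, hi=None):
    # In-place quicksort, middle element as pivot.
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        # Walk in from the left until we find something >= pivot...
        while a[i] < pivot:
            i += 1
        # ...and in from the right until we find something <= pivot...
        while a[j] > pivot:
            j -= 1
        # ...then swap the pair (the pivot itself may move).
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    # Recurse on the two sides of the split point.
    quicksort(a, lo, j)
    quicksort(a, i, hi)

data = [5, 3, 8, 1, 9, 2]
quicksort(data)
```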

Now, this clearly splits the array in half each time, right?  So one would
think that it would take on the order of O(n lg n) time (where lg is
logarithm base 2), since I'm recursing O(lg n) times and each time I'm
walking over O(n) elements of the array, for a total time complexity of
O(n)*O(lg n) = O(n lg n).  But what happens if I pick a bad pivot?  Say,
nothing is ever greater than the pivot, so *every other element* lies to the
left of the pivot value.  Then, the pivot sort of moves over to the far
right hand side of the array, and when I recurse, I walk over n - 1 elements
on the left.  Suppose that, every time I recurse, I wind up with a
similarly bad pivot with n - k elements on the left (for the k'th
recursion).  Then it's not hard to see that I'm ending up with O(n)
recursions instead of O(lg n), and on each, I'm dealing with O(n) elements
for a total time complexity of O(n)*O(n) = O(n^2).

So, for a poor choice of pivot, I get pathological behavior that's bounded by
a term that is quadratic in the size of the input.
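
To put numbers on it, here's a toy calculation (my own sketch) of the work
done when every pivot is the minimum, so each recursion peels off just one
element:

```python
def count_work(n_elems):
    # Total elements walked when each partition splits off only the
    # pivot: n + (n - 1) + ... + 2, which is about n^2 / 2.
    work = 0
    remaining = n_elems
    while remaining > 1:
        work += remaining      # one O(k) walk over the k-element subarray
        remaining -= 1         # the bad pivot removes a single element
    return work
```

For n = 100 that's 5049 element visits, versus roughly n lg n, or about 664,
when the pivots split things evenly.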


#4 of 13 by herasleftnut on Sat Oct 7 21:52:07 2006:

Okay, cool. That clears up a lot of shit. 


#5 of 13 by cross on Sat Oct 7 22:00:29 2006:

Yeah.  The secret to understanding the worst case scenario for quicksort
running time is the choice of pivot: you don't *have* to pick the middle
element, but if the pivot is chosen poorly, the time will be bad.


#6 of 13 by sholmes on Sun Oct 8 00:00:29 2006:

There are techniques for choosing pivots which minimize (though not entirely
negate) such worst cases.  Moreover, the qsort() that is supplied with glibc
is not a pure quicksort.  When the size of a subarray after a few recursions
gets smaller than a particular value, it invokes insertion sort on it.
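
The classic pivot-choosing example is median-of-three; a little Python sketch
of the idea (my own, and the function name is made up):

```python
def median_of_three(a, lo, hi):
    # Pick the median of the first, middle, and last elements.
    # Sorted or reverse-sorted input then yields the true middle
    # value, defeating the easy worst case (though carefully
    # constructed adversarial inputs can still exist).
    mid = (lo + hi) // 2
    return sorted([a[lo], a[mid], a[hi]])[1]
```

On pre-sorted input the first element would be the worst possible pivot, but
the median of the ends and middle is the ideal one.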


#7 of 13 by cross on Sun Oct 8 03:17:51 2006:

Yes, there are many techniques for choosing a pivot to minimize the chances
of pathological behavior.  McIlroy and Bentley wrote an SP&E on it a few
years back; their sort is similar if not identical to what comes with BSD
and maybe some other operating systems.  Let me see if I can find a link to
that paper....  Here's one, though it's not where I'd expected it:

http://www.enseignement.polytechnique.fr/profs/informatique/Luc.Maranget/421/09/bentley93engineering.pdf

The optimization where you stop quicksort once you have a sufficiently small
subarray and instead employ something else (e.g., insertion sort) is based
on the observation that while quicksort nominally has O(n lg n) behavior, in
practice there are fairly large constant factors that bound the run time.
Insertion sort, which runs in O(n^2), has smaller constant factors and thus
can be much faster than further quicksort recursion on small arrays.
Further, for arrays that are `close' to sorted, insertion sort runs in near
linear time.  In general, there's some cutoff point at which it's actually
cheaper to use the slower algorithm.  I believe the idea is due to
Sedgewick, but it's to optimize the constant portion of the runtime of the
algorithm (the stuff you don't see inside the big O), not so much to
minimize the effects of choosing a bad pivot....
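
A sketch of the hybrid (my own Python toy; CUTOFF is a made-up threshold, and
real libraries tune theirs empirically):

```python
import random

CUTOFF = 16   # hypothetical cutoff; below this, insertion sort wins

def insertion_sort(a, lo, hi):
    # O(n^2) worst case, but tiny constants, and near linear
    # on nearly sorted input.
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def hybrid_sort(a, lo=0, hi=None):
    # Quicksort that hands small subarrays to insertion sort,
    # in the spirit of the Sedgewick optimization mentioned above.
    if hi is None:
        hi = len(a) - 1
    if hi - lo < CUTOFF:
        insertion_sort(a, lo, hi)
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_sort(a, lo, j)
    hybrid_sort(a, i, hi)

data = random.sample(range(1000), 200)
expected = sorted(data)
hybrid_sort(data)
```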


#8 of 13 by mcnally on Sun Oct 8 03:34:53 2006:

 I'm a heapsort fan, mostly because the whole heap concept is so cool
 once you finally get it.  Also, it doesn't have the O(n^2) worst-case
 performance problem that quicksort has.
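
For what it's worth, Python's standard library makes the heap part trivial;
a sketch (heapq is real, the wrapper function is my own):

```python
import heapq

def heapsort(items):
    # Build a min-heap in O(n), then pop the minimum n times at
    # O(lg n) apiece: O(n lg n) in the worst case, every time.
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```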


#9 of 13 by cross on Sun Oct 8 03:54:45 2006:

That's true, though in the non-pathological case, heapsort tends to be a bit
slower than quicksort.  Still, it's good if you need bounded O(n lg n) running
time.  Mergesort is also a good one; I remember sitting in a CS class once
where the professor used the standard ``you have a lot of data on tape that
won't fit into core...'' example for the rationale for mergesort.  The guy
sitting next to me (who I knew socially) looked at me and said with a sort
of grimace, ``tape?!''  To which I replied, ``cache.''


#10 of 13 by sholmes on Sun Oct 8 12:36:16 2006:

Moreover, data in business are usually somewhat sorted; e.g., cheques are
usually encashed more or less in serial number order.


#11 of 13 by cross on Sun Oct 8 16:36:00 2006:

Yeah.  Often, as a design choice, I've found it best to not worry about
sorting data, but rather, to store it in some sort of pre-sorted container
(like a balanced tree structure or something).  Then, the cost of sorting
the data is amortized over the total execution time of the program.
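
Something like this (a Python sketch; `SortedBag' is a made-up name, and a
plain list with bisect stands in for a real balanced tree):

```python
import bisect

class SortedBag:
    # Keeps its contents sorted at all times, so there is never a
    # big sort to pay for later.
    def __init__(self):
        self._items = []

    def add(self, x):
        # Binary search for the spot, then insert there.
        bisect.insort(self._items, x)

    def items(self):
        return list(self._items)

bag = SortedBag()
for x in [5, 1, 4, 2, 3]:
    bag.add(x)
```

Each insert here still costs O(n) for the list shift; a balanced tree gets
that down to O(lg n), but the design point is the same: the data is never
unsorted, so the cost is spread across every insertion.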


#12 of 13 by easlern on Fri Dec 8 15:34:43 2006:

Does anybody know where I can look to learn about different algorithms for
integration? It'd be great if there were some info on complexity too. I found
some info on the Risch algorithm, which is used by Maple, but no mention of
its complexity or method. The usual haunts (Wikipedia, Google, Wolfram) have
failed me.  :(


#13 of 13 by cross on Fri Dec 8 16:23:55 2006:

The Numerical Recipes book has a lot of information on such things.



- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss