 
Author Message
maus
Performance question   Mar 21 04:22 UTC 2007

Presuming identical hardware and a platform-independent language (that
is sadly not explicitly parallelizable without modification to the
interpreter itself), does anyone know how performance stacks up for
running a batch job on Linux vs OBSD vs FBSD vs Solaris 10 vs Cygwin?
Specifically, I will be running iterations of large, nasty network
simulations using ns2, which uses a derivative of Object-TCL to
describe the network (and which does not provide a facility to
parallelize the task), and I have a machine that should be sufficiently
butch to handle the load. I intend to create the simulation and leave
it crunching overnight, but does anyone know how performance changes as
a function of the number of nodes and the rate of traffic generated?
Are the constraining factors for these simulations something that can
be compensated for by choosing an optimal operating environment? I've
run much smaller, simpler simulations on a time-sharing Solaris server
(E3500) and on my laptop (AMD Sempron, Cygwin under XP Pro), and even
with a more performant machine on which to run the simulation (mirrored
10kRPM U320 drives, dual-core AMD processor, 4 GBytes RAM), I am not
optimistic about the speed of the runs, or about being able to
iteratively tweak large simulations (probably around 1M nodes, many
linked to more than one other node) in any sane amount of time. If the
simulator were parallelizable, I would rent time on a cluster (or,
rather, try to barter to get someone to rent time on my behalf if the
price were out of my range). 

Thanks and sorry for the rambling question
5 responses total.
mcnally
response 1 of 5:   Mar 21 17:15 UTC 2007

 Will you be doing mostly computation?  Disk I/O?  Network I/O?
 Memory allocation and freeing?  Process and/or thread setup and
 control?  

 Not that I know the answer to your question anyway, but *nobody*
 could begin to answer it without knowing more about what the 
 performance chokepoints in your program are likely to be.
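
 One cheap first cut, if you can run a scaled-down job: wrap it and
 read the rusage counters afterward. A rough sketch, assuming a
 Unix-ish system with Python handy (the thresholds for "heavy" are
 yours to judge, not anything this prints):

```python
# Run a (small) batch job and summarize where it spent its time:
# heavy user time => computation; heavy sys time / major faults =>
# disk or memory pressure; many switches => scheduling overhead.
import resource
import subprocess
import sys

def profile(cmd):
    subprocess.run(cmd, check=True)
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "user_s": ru.ru_utime,          # CPU time in user mode
        "sys_s": ru.ru_stime,           # CPU time in the kernel
        "max_rss_kb": ru.ru_maxrss,     # peak resident set (KB on Linux)
        "major_faults": ru.ru_majflt,   # page faults that hit the disk
        "vol_switches": ru.ru_nvcsw,    # voluntary context switches
        "invol_switches": ru.ru_nivcsw, # involuntary context switches
    }

if __name__ == "__main__":
    stats = profile([sys.executable, "-c", "pass"])
    for key, value in stats.items():
        print("%-15s %s" % (key, value))
```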
maus
response 2 of 5:   Mar 21 17:54 UTC 2007

I realize I had a badly formulated and incomplete question. Part of the
problem is that I do not know what the chokepoints are for this
software: searching via Google and a couple of other resources turns up
lots of information about using ns2 to find chokepoints in networks,
and those results drown out anything about chokepoints in the
simulation software itself. From what I know about the underlying
system, there will be a large amount of disk access, though it will all
be to a pair of files, so I presume that on a fresh install of the OS
these files should be fairly contiguous, taking out most of the seek
overhead. Additionally, a lot of threads will be spawned for creating
the topology, with the actual simulation being run in-order. I know
memory consumption will also be huge (my understanding is that it keeps
the entire topology in memory, along with a route-map for each node
that performs packet-switching duties, at least at the beginning). In
short, I guess this will be slow in every way except for use of the
network (since the simulation runs entirely on one machine). 
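
For a rough sense of scale, a back-of-envelope sketch (every per-entry
byte count here is a pure guess for illustration, not anything measured
from ns2):

```python
# Back-of-envelope memory estimate for a topology kept entirely
# in core. All sizes below are guesses, chosen only to show the
# arithmetic; plug in measured numbers once the markers exist.
NODES = 1_000_000        # planned simulation size
BYTES_PER_NODE = 512     # guessed per-node bookkeeping
AVG_ROUTE_ENTRIES = 50   # guessed routing-table size per switching node
BYTES_PER_ENTRY = 64     # guessed per-route-entry cost
SWITCHING_FRACTION = 0.1 # guessed share of nodes doing packet switching

node_bytes = NODES * BYTES_PER_NODE
route_bytes = NODES * SWITCHING_FRACTION * AVG_ROUTE_ENTRIES * BYTES_PER_ENTRY
total_gb = (node_bytes + route_bytes) / 2**30

print("nodes: %.2f GB, routes: %.2f GB, total: %.2f GB"
      % (node_bytes / 2**30, route_bytes / 2**30, total_gb))
```

Even with guesses this small, the total lands within the box's 4 GBytes
only because most nodes are assumed not to switch packets; crank the
switching fraction up and it blows past physical RAM quickly.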

Thanks for forcing me to think about the question in a more sane way. 
maus
response 3 of 5:   Mar 21 18:09 UTC 2007

Looking deeper into the archives of the mailing list for the software
turned up some information. Namely, since each node represents a
stand-alone device, each node keeps as much information about the
network as would be kept in a kernel routing table, so when the
simulation is running, a lot of information is kept in memory. I'll try
to put some markers into the early simulations to see if they can tell
me more about where the bottlenecks are, and will re-ask the question
in a more sane way, with more/better information. 
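
Something like this is what I have in mind for the markers (just a
sketch I'd call from a wrapper script around each phase; the names are
my own, nothing from ns2):

```python
# Tiny phase-marker helper: record wall time and peak RSS at each
# checkpoint so the slow, memory-hungry phases stand out afterwards.
import resource
import time

class Markers:
    def __init__(self):
        self.start = time.perf_counter()
        self.points = []

    def mark(self, label):
        elapsed = time.perf_counter() - self.start
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        self.points.append((label, elapsed, peak_kb))

    def report(self):
        prev = 0.0
        for label, elapsed, peak_kb in self.points:
            print("%-20s +%8.2fs  peak rss %d KB"
                  % (label, elapsed - prev, peak_kb))
            prev = elapsed

m = Markers()
m.mark("topology built")
# ... run the simulation phase between markers ...
m.mark("simulation done")
m.report()
```

(ru_maxrss is KB on Linux but bytes on some BSD-derived systems, so
the units need checking per platform.)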
arthurp
response 4 of 5:   Mar 22 09:18 UTC 2007

Sounds like memory will be huge.  Did I gather correctly that there
will be a thread for each node?  If things haven't changed since I last
looked, the Sun should be way faster at the context switches involved
in all those threads.  I'd look at the real clock time for context
switches on each of your candidate systems, both under normal loading
and in the pathological case where the number of contexts greatly
exceeds the number of hardware context frames.

I'd also be interested in the amount of work done on any one thread
before switching to another, WRT memory locality and cache hit rates. 
Specifics of the cache hardware and of cache clearing on context
switches may play a big part.
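
A ping-pong test gives a ballpark for that switch cost (a crude sketch
with two processes trading a byte over pipes; each round trip forces at
least two switches, though the pipe overhead is bundled in too):

```python
# Crude context-switch microbenchmark: parent and child ping-pong a
# single byte over a pair of pipes; per-switch cost is roughly the
# round-trip time divided by two.
import os
import time

def switch_benchmark(rounds=10000):
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child: echo each byte back
        for _ in range(rounds):
            os.read(r1, 1)
            os.write(w2, b"x")
        os._exit(0)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(w1, b"x")
        os.read(r2, 1)
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    return elapsed / (2 * rounds)     # rough seconds per switch

if __name__ == "__main__":
    print("approx %.1f us per switch" % (switch_benchmark() * 1e6))
```

Run it under normal load and again with the box saturated to see the
pathological case.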

There's a good chance that hardware will play as much role as OS.
maus
response 5 of 5:   Mar 23 03:13 UTC 2007

Memory will be huge when I run the *really big* simulations. The box has
4 GBytes of physical RAM and will probably wind up having as many swap
files as the OS allows (I believe Linux allows 8, not sure about
UNIX/BSD, and if I go with Cygwin, I'll get Windows's pathologically
enthusiastic paging setup). 

I am pretty sure a separate thread is created for each node during
setup, and then a single thread runs the whole network during the
actual simulation, though I could be wrong. 

The truly hardcore, deep system-level performance analysis (memory
locality, cache hit rates, context switching) is out of my depth. I will
read up on it, though. 

If the programme were more parallelizable and would benefit from the
Sun architecture, I would set up the three Netra clones, the Netra, and
the Ultra5 as a compute cluster. If I thought I would be done before
graduation, I would just run this on the Sun at school (I forget if it
is an E3000 or an E3500, but it is big and butch and spiffy). 

Anyone have a spare v880 that they don't want? I'll take it off your
hands and give it a good home doing scientific computing (and never need
a heater in the winter). 

 

- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss