I am curious to know whether people here have worked with scaling a database on UNIX/Linux. For the time being this is a gedanken experiment, as I am not sure it would fly with the powers-that-ain't, and there is a possibility that I may get myself purposely fired over a contract change that will probably hurt all of the employees (I can't give details, per my lawyer's instructions). Presuming I still exist at the company, I am toying with the idea of setting up a large IDS for a server farm that has a huge exposure footprint.

The design is as follows: each access switch gets a SPAN port with a SNORT sensor hanging off of it. Each sensor feeds a shared pool of aggregators, which in turn feed the data into a database. The data is then made available through a couple of slave databases: one feeding an application server running BASE or a similar analysis tool, and one replica reserved only for trend analysis and data mining. Since all of the data coming from the probes would be of the same type, in the same format, concerning the same issue, the database does not look divisible, or at least not easily divisible.

I know that on a database server with sufficient RAM, performance is I/O bound, and I am wondering how to make this scale. I would presume that putting the NIC (or possibly NICs) on a separate bus from the disks would be a start (either by using a lot of PCIe or by getting a system with multiple buses), along with giving the disk array a large, fast write cache, so the host can write to that and not have to wait for the order-of-magnitude-slower writes to the actual drives.

My questions:

1. On a database server that is taking a *LOT* of incoming data (lots of INSERT queries -- the server farm is pretty fucking huge and virtually every server is fully internet-facing) and writing it all to a table, is the primary slowdown going to be getting the data in (from the network) or getting the data out (to the tables/drives)? Can the slower of the two be given a higher preemption priority so that the two end up running at roughly the same speed?

2. Is there a formula or methodology or even a rule of thumb for approximating the number of INSERT queries a database can do in a second, given the number of processors, the amount of RAM, and other factors?

3. Do the major databases differ greatly in the number of queries per second they can process? For example, would there be a major difference between an untuned PGSQL vs MySQL vs Oracle? (We don't have a mainframe, so DB2 would probably not be an option; besides, we have a lot of UNIX, BSD, and Linux knowledge, and only two people who know mainframes.) Same question for each of those databases tuned by an expert.

4. Can a database have multiple listeners (perhaps one per node in the aggregator pool)? Would the listeners wind up preempting one another? Could each listener be tied to a separate SMT/SMP core (so two dual-core, hyperthreading processors could simultaneously service eight threads, for example)?

5. Is there a risk of simultaneous INSERT queries failing or stalling because one holds a lock on the table, or can a database lock something smaller than a table (such as just the row, or even just the cell)?

6. Would it substantially matter whether this was done on UNIX or BSD or Linux?

7. Am I over-thinking this?

A rough sketch of what I have in mind for the aggregator-to-database path follows below. Thanks.
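To make the questions concrete, here is roughly what I imagine each aggregator node doing: collecting events from its sensors and pushing them to the master in batches rather than issuing one INSERT per event. This is only a sketch under my own assumptions -- PostgreSQL with the psycopg2 driver, and the table and column names (ids_events and so on) are placeholders I made up, not a real schema.

```python
# Sketch of one aggregator node batching sensor events into the master
# database.  Assumes PostgreSQL + psycopg2; table/column names are
# placeholders, not a real schema.
import time
import psycopg2
from psycopg2.extras import execute_values

BATCH_SIZE = 500        # flush after this many events...
FLUSH_INTERVAL = 1.0    # ...or after this many seconds, whichever comes first

def run_aggregator(event_source, dsn="dbname=ids user=ids_writer"):
    conn = psycopg2.connect(dsn)
    conn.autocommit = False          # commit once per batch, not per row
    cur = conn.cursor()
    batch, last_flush = [], time.time()

    # event_source yields (sensor_id, ts, src_ip, dst_ip, signature, payload)
    for event in event_source:
        batch.append(event)
        if len(batch) >= BATCH_SIZE or time.time() - last_flush >= FLUSH_INTERVAL:
            # One multi-row INSERT per batch: far fewer round trips and
            # commits than one INSERT per event.  With row-level locking
            # (PostgreSQL, InnoDB, Oracle), concurrent aggregators appending
            # to the same table should not block one another.
            execute_values(
                cur,
                "INSERT INTO ids_events"
                " (sensor_id, ts, src_ip, dst_ip, signature, payload)"
                " VALUES %s",
                batch,
            )
            conn.commit()
            batch, last_flush = [], time.time()
```

My working assumption is that with batching like this, the per-row cost is dominated by the commit/fsync and the write path to the disks (which is where the array's write cache would come in) rather than by the network, but I would appreciate hearing whether that matches other people's experience.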