If you were to build a database, containing thousands of gigabytes of information in various libraries, simultaneously accessible by thousands of users around the country -- and capable of lightning-fast, highly detailed, customized searches of all the documents, no matter how many users were on-line -- how would you go about doing it? How much would it cost?
14 responses total.
In order to get reasonable speed, I'd use fiber connections world-wide. Bandwidth isn't a problem with copper, but delay from here to the Pacific basin is a problem with copper. I'd use satellite connections as a backup to fiber, and the whole shebang would be IP routers.

Personally, I'd use a large mainframe capable of managing terabytes of data at high speed, and run a mainframe Unix system with distributed database and X-term capabilities. I'd select an Amdahl mainframe and their UTS operating system, which is Unix System V plus about 2 million lines of code to support large-system activities. I'd connect 10 or 12 of their 6100 Storage Processors to the system; each of them can easily manage 240 GB of data while doing 16 simultaneous data transfer operations. I'd probably add 128 MB of cache to each one to begin with, then as the DB application required I'd bump them up towards 512 MB or so. I'd also add, regardless, about 16 MB of non-volatile storage to each 6100 so that "fast write" operations could take place asynchronously. The disks themselves would be the Amdahl 6390s. Data transfer is 4.2 MB/sec, and 60 GB with four paths fits easily in a cabinet you can lean on. High speed, high tech.

I'd use Informix or Oracle as the DB, perhaps Tuxedo for non-GUI or non-distributed (dumb terminal) transactions.

I'd probably want to reduce disk cost by going to a hierarchical storage system. Perhaps an Amdahl 6110 High Performance Storage Subsystem for the really high-speed requirements, say about a GB; then the 10 or 12 6100s as described above; then the StorageTek Tape Library for the low-use, cheap mass storage. I'd need to use UniTree software for managing the hierarchy automatically.

Cost? Now it's time to get impersonal. Well, what I'd do is draw up a plan like this in greater detail. If you're buying workstations and including them, stick them in here too. Need a building to house the whole thing? Add it in. Whatever you need. Do you need the application developed? Spec it out also. Wrap the whole thing up in a written "book" called a Request For Information (RFI), have 30 or 40 copies printed, then send it to all the vendors and systems integrators you can think of who might want to get involved in your project. Be sure to write a one- or two-page "executive" cover letter that the salesmen can read, asking them to be as innovative as possible, to consider how they'd accomplish some of these things, and noting that you have a budget under consideration of about $80,000,000 that you hope is enough.

In the RFI, I wouldn't mention specific products (hardware/software) like I did up above. You'd piss off some vendors who would think that you'd pre-selected another vendor. And you might end up in court with a losing vendor if you specified someone else's product in your eventual Request For Proposal (RFP) that all the vendors hope you send out. Your cover letter should state clearly that the vendors chosen to compete for the RFP will be selected by examining the responses received to the RFI. That means that those who don't respond to the RFI don't get a crack at your 80 million bucks. Those who don't give you a GREAT response also don't get a crack at it.

------

Since this is all hypothetical anyway, what is the hypothetical application you want to run here?
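For a rough sense of what that disk tier adds up to, here is a quick back-of-envelope sketch in Python, using only the figures quoted above (12 controllers at 240 GB each, 16 paths at 4.2 MB/sec, 60 GB per 6390 cabinet); the totals are illustrative, not a quote from any vendor.

    # Back-of-envelope sketch of the 6100/6390 storage tier described above.
    # Every figure here is taken from the post; treat the results as illustrative.
    controllers       = 12     # Amdahl 6100 storage processors
    gb_per_controller = 240    # GB each can manage
    paths_per_ctrl    = 16     # simultaneous data transfer operations
    mb_per_sec_path   = 4.2    # 6390 transfer rate per path, MB/sec
    gb_per_cabinet    = 60     # 6390 capacity per cabinet

    total_gb    = controllers * gb_per_controller                  # 2,880 GB
    cabinets    = total_gb / gb_per_cabinet                        # 48 cabinets
    peak_mb_sec = controllers * paths_per_ctrl * mb_per_sec_path   # ~806 MB/sec

    print(f"Total disk:     {total_gb:,} GB ({cabinets:.0f} cabinets)")
    print(f"Peak transfer:  {peak_mb_sec:,.0f} MB/sec with every path busy")

Even before the tape library, that is nearly three terabytes of spinning disk, which is exactly why the hierarchical storage layer matters for cost.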
For a start, you could lease CompuServe's network connections. Many networks do that today.
I'd stay away from Oracle, from what I've seen of it (only the system administration end of it, admittedly), but those sound like good suggestions.
re #1: Actually, I have used various mega-databases (such as Mead's Lexis/Nexis/Medis, and West Publishing's Westlaw), and wondered what was involved. Only $80 million? A bargain.
Mead and West Publishing are both customers of ours; neither is using an open systems solution at this time.
Open systems solution? (Forgive my lack of familiarity with the terms of the trade.)
Open Systems = unix based. Many vendors, no hardware technology lock-in.
Why wouldn't someone want to use an open systems solution? Isn't the software (or performance) available?
There are many reasons, but you hit the nail on the head with the software. Many subsystems aren't there yet, or not there for large-system solutions. The pressure *is* on for scalable solutions; being able to get both scalability and interoperability (eech, I hate that word) is a strong incentive to look for open systems solutions. Some products designed for small workstations aren't scalable -- so I would say that not all the pieces are in place yet, but they're definitely on their way.
To continue: proprietary systems that have robust subsystems are desired simply because of their longevity and stability, and because a sole (or nearly sole) source means that standards are likely to exist and be adhered to. (In open systems, the joke goes, "standards are very popular... that's why there are so many of them.")
This might help:

Open Systems Defined -

"Software environments consisting of products and technologies in accordance with 'standards', established and de facto, that are vendor independent and commonly available." - X/Open

"Systems which allow unimpeded access to the information required to do one's job." - User Alliance for Open Systems, January, 1991.

Why Open Systems? -

Portability: The ability to use or migrate application software across different hardware from multiple vendors.

Scalability: The ability to move an application onto systems with a variety of performance characteristics.

Interoperability: A capability for applications in different parts of a computing environment to share processes and data.

Additional definitions -

Monolithic Solution: Single platform, with multiple databases, applications, and users. Example: MVS running on the System/370 hardware platform.

Distributed Computing: Single platforms plus network, with a single database, single application, and multiple users. Example: VMS running on multiple VAXes.

Personal Computing: Single platform, single application, single user, with optional shared disk and peripherals via a server LAN. Example: MS-DOS.

Desktop Computing: Standard platform, multiple vendors, multiple databases, portable applications. Example: desktop and server Unix environments.

--------

The monolithic solutions typically offered by Information Systems departments have not kept up with the needs of the enterprise. Many organizations have had other solutions (distributed, desktop, personal, or a mix) installed to meet those pressing needs. Unfortunately, these separate solutions often add further problems to managing information throughout the enterprise. The problems these departmental solutions cause include: data at risk, personnel costs, integration of processes, loss of flexibility, loss of competitiveness, and loss of opportunity. The burden on executive management is cost control and protection of mission-critical data in these environments. Monolithic solutions in the "glass house" often do not cohesively interoperate with these departmental "glass suburbs." An open systems solution that provides interoperation, scalability, true distribution, and data control throughout the enterprise would solve the problems caused by departmental "glass suburb" solutions that don't meet enterprise-wide data and application management criteria.
Actually, the interesting phrase here is "lightning fast searches". That implies a pretty respectable database package, which among other things probably devotes a lot more space to indexing the documents than it provides for the documents themselves. That implies a lot of custom coding, and might also rule out Oracle or Informix -- it depends on just how demanding the database requirements are, and how well those products scale today. Most of the systems doing that to date have gone with fully custom code, suggesting the off-the-shelf products weren't up to it.

I assume an Amdahl 6390 is more or less equivalent to an IBM 3390 -- if you really wanted to stick to 370 and compatible architectures, you'd certainly want to avoid any dependencies on either Amdahl or IBM. Instead, you'd always ask both vendors (and any others if possible) to bid, and you'd play each off the other to get the best possible price. This sounds like a large enough order that both vendors ought to be quite eager to cut special deals for you. (Besides, both employ "American" salespeople -- just like stereos and automobiles, nobody pays list if they can help it...)

But sticking with the dinosaur may not be the best bet. You can buy a micro from IBM -- the RS/6000 -- that makes for some real interesting comparisons with their big mainframe, the ES/9000. The speed of light being what it is, smaller = faster, and someday all mainframes will be single-chip micros. In sheer CPU speed, the 6000 is really comparable to a 9000 processor (which one is faster depends on what you do and on which model). In terms of I/O bandwidth, the 9000 should win. On the other hand, disk is a lot cheaper on the 6000, and so is the CPU. So what if the disk is smaller -- you can buy a lot more of them -- and if you can partition up your task in some convenient manner, then you might find it very attractive to have 20 or 30 RS/6000s each talking to a slow SCSI disk, rather than one ES/9000 talking to many fast disks.

Hmm. You can buy a rather nice RS/6000 for, what, $50K? Not sure how much a gigabyte SCSI drive is these days -- another few thousand, I'd guess. So, for about 70 million, you could have about a thousand RS/6000s, each with its own dedicated gigabyte of storage, making up a terabyte total. 'Course, you'd have to worry about tapes, backups, software, networks, and a lot of other stuff -- for which you'd have to be a lot more specific about the application (number of users, write operations and updates, and more) -- to pin those other pesky numbers down. Still, I think that ocean of a thousand RS/6000s gives you about a hundred times the CPU power you could get with a mainframe solution, and a lot more flexibility. The RS/6000 is merely one of many -- MIPS and Sun also offer interesting RISC chip systems. You still get to play vendors off against each other, although it's not quite as trivial to switch.

Oh yes -- if I remember right, neither electricity nor light travels at "the speed of light" in solid media. I think electricity in copper moves at about 60% of the speed of light in a vacuum -- I forget the exact figure. Light slows down also -- it all depends on the index of refraction, but that 60% figure doesn't sound far wrong. Fiber might be faster, but it probably won't be a whole lot faster. The big advantage fiber has is that it takes up a lot less space: one skinny bundle instead of many fat bundles of copper. Microwaves ought to be faster than either, since air is a lot closer to being a vacuum.
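To check the arithmetic in that response, here is a small sketch using the poster's ballpark prices ($50K per RS/6000, a few thousand dollars per 1 GB SCSI drive) and the roughly two-thirds-of-c propagation figure; the disk price and the 4,000 km example distance are assumptions, not quotes.

    # Sanity check on the "ocean of a thousand RS/6000s" arithmetic above.
    # Prices are the ballpark guesses from the post, not real vendor quotes.
    nodes      = 1000       # RS/6000 workstations
    node_price = 50_000     # dollars each, per the post's guess
    disk_price = 3_000      # dollars per 1 GB SCSI drive (assumed)

    hardware_cost = nodes * (node_price + disk_price)   # $53,000,000
    total_storage = nodes * 1                            # ~1 terabyte of 1 GB drives
    print(f"Hardware: ${hardware_cost:,}   Storage: {total_storage:,} GB")

    # Propagation aside: signals in copper or fiber travel at roughly 2/3 the
    # speed of light in vacuum, so distance dominates, not the medium.
    c_km_s     = 300_000    # km/sec in vacuum, rounded
    distance   = 4_000      # km, an assumed cross-country run
    one_way_ms = distance / (0.66 * c_km_s) * 1000
    print(f"One-way delay over {distance:,} km at 0.66c: {one_way_ms:.1f} ms")

The gap between the $53 million of raw hardware and the $70 million figure is roughly what the post says it is: tapes, backups, software, networks, and the rest.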
A satellite repeater is going to be a lot slower though, unless it's one of many in low earth orbit, and not in geosynchronous orbit way far away.
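Putting rough numbers on that: a geosynchronous satellite sits about 35,786 km up, so a ground-to-ground round trip through one hop is close to half a second, while a low-earth-orbit repeater at a few hundred kilometers adds only milliseconds. A sketch, using textbook altitudes (real links add switching and ground-segment delay):

    # Rough round-trip delay for one satellite hop, ground station to ground
    # station and back. Altitudes are textbook values, not a specific system.
    c_km_s  = 299_792          # speed of light in vacuum, km/sec
    geo_alt = 35_786           # geosynchronous altitude, km
    leo_alt = 780              # a typical low-earth-orbit altitude, km (assumed)

    def hop_rtt_ms(altitude_km):
        # up to the satellite, down to the far station, and the same path back
        return 4 * altitude_km / c_km_s * 1000

    print(f"GEO hop round trip: ~{hop_rtt_ms(geo_alt):.0f} ms")   # ~477 ms
    print(f"LEO hop round trip: ~{hop_rtt_ms(leo_alt):.0f} ms")   # ~10 ms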
Does lightning travel at the speed of light?
No, not even close -- at least if you mean how fast the bolt travels. The current within the lightning bolt travels at a significant fraction of the speed of light, but the bolt itself travels slowly enough that you can catch it moving with a movie camera and a few gizmos set up to help.