|
Grex > Agora46 > #67: Buggy code has already sent people to jail! | |
|
| Author |
Message |
russ
|
|
Buggy code has already sent people to jail!
|
Jul 9 02:15 UTC 2003 |
...
Some malfunctions in SEVIS have distorted students' files,
mysteriously replacing or deleting various pieces of information.
Students can be deported if their records aren't in order. Already,
some have been held overnight in jails or kept from re-entering the
United States because their records were misplaced or because the
system garbled them.
...
"Oh, you're not going to believe what it's doing now," said Catheryn
Cotten, the director of the International Office at Duke University,
to the SEVIS help desk technician on the phone as she stared in amazement
at her computer screen. Cotten had logged into the system to work on
some foreign exchange students' records. But when she clicked a link
to access them, SEVIS told her there were 1,800 exchange students
still awaiting entry. Cotten was stunned.
"I don't have that many [full-time foreign students] at Duke, let
alone in my exchange program," she says, recounting the event.
Somehow, Cotten had accessed the records of a summer camp program
at another school that trains foreigners to be counselors. The
help desk operator wasn't surprised. This had happened before; so
many times, in fact, that the glitch had been given a name: "bleeding."
Several schools have seen their student records mixed up with those
of other institutions in the past few months. A student record from
Michigan State University, for example, was routed to a printer at
a college in Arizona. In other instances, pieces of data are jumbled.
Officials at the University of Texas in Austin reported that a student’s
last name was listed as an address, one that didn’t belong to any student
in the school’s records.
http://www.govexec.com/dailyfed/0703/070303h1.htm
|
| 26 responses total. |
xi
|
|
response 1 of 26:
|
Jul 9 03:59 UTC 2003 |
hm.. i think a couple of guys from my old job probably wrote that piece of
shit /if the story is true, of course/
|
flem
|
|
response 2 of 26:
|
Jul 9 15:14 UTC 2003 |
Heh. My current job consists mainly of converting data from a wide variety
of database sources into a format compatible with my company's database. I
get to look at raw data in lots of database schema. I'm constantly amazed
by just how screwed up these databases can get, even when the design is
fundamentally sound (rare), simply by the garbage that users put in
them.
Thank god I don't have any important parts of my life controlled by
databases. </sarcasm>
|
janc
|
|
response 3 of 26:
|
Jul 9 16:10 UTC 2003 |
I've had similar jobs, where some client comes with a database that they have
been maintaining manually with something like Microsoft Access and they want
to put it into an on-line database. I've never once had a database that
wasn't corrupt to the point of being virtually useless. Mostly it ends up
being a job that has to be done manually, line-by-line.
|
other
|
|
response 4 of 26:
|
Jul 9 20:02 UTC 2003 |
Timely item. I was just talking with a non-profit organization's staff
about long range planning regarding their database.
Right now, they are running a FileMaker database with about 30,000
records, being served from a machine running Mac OS X Server and
optimized as a database server.
My primary suggestion was that they run two development processes
concurrently. One process is the ongoing implementation of small scale
changes which improve the functionality of the current system on an as-
needed and as-available-to-do-it basis. The other process is the setting
aside of funds on an annual basis for development and implementation of a
whole new system designed to meet their worst-case (heaviest) usage needs
for at least five years beyond implementation.
The second process requires some research in order to determine several
things: precise needs staff have for database functionality; the
differences between those needs and the needs the current system can
meet; what products best suit their circumstances and identified needs --
with room to grow -- and; what costs are likely to be required to employ
those products.
Can some of you with expertise in database development and implementation
suggest resources for the latter elements of this research?
Key considerations: The office desktops currently run Mac OS 9, and
probably will not be changing that for at least a year or two. When they
do change, the likely next step will be Mac OS 10.x (x = whatever is
current at the time). The database server will not likely change to a
different OS, just upgrade to newer versions of the current one.
|
gull
|
|
response 5 of 26:
|
Jul 10 13:50 UTC 2003 |
I had to move an Access database to a MySQL server, a while back. It
was the company's parts database, used in generating bills of material
for construction jobs. It was a mess...they'd been running it as an
Access shared file data base for years until it finally got to the point
where it would become corrupted within hours of being restored from
backup and couldn't be repaired in a way that would last. Fortunately
it was a fairly simple database structure, and I was able to salvage
most of the data from the backup copy without too much manual work. The
hard part was sorting out a bunch of duplicate part numbers that had
somehow sneaked in, even though Access was set to disallow them. Oh,
and reworking all the forms and reports, since Access allows identifiers
with spaces and MySQL doesn't.
|
polygon
|
|
response 6 of 26:
|
Jul 10 14:34 UTC 2003 |
The insidious data corruption problem is the biggest reason I'm glad I'm
not using Microsoft Access.
|
janc
|
|
response 7 of 26:
|
Jul 10 15:36 UTC 2003 |
The specific problems I've mostly had resulted from the fact that they wanted
to change the structure of the database. For example, you'd have a list of
contacts - names of people at various companies or government agencies with
addresses, phone numbers and fax numbers. This would be set up as a single
table, one row for each person, with most of the fields being just plain text
fields.
Now they want to merge it into a more "grown up" system that is used for lots
of other things too. You might have a separate tables for companies, and
people would be linked to their companies. Simple enough in theory.
In practice when you look at the database, you find that hardly any
company name was spelled the same twice. So "Greenfield", "Greenfield
Industries", "Greenfield Ind.", "Greenfield Industries, Inc." and "GI"
are all in the database, and by the way, did you know that "RTW" and
"Kennametal Industrial Products Group" are the same company as "Greenfield
Industries"? There are multiple duplicates for what appear to be the same
person with different contact info. Addresses give the state name as "MI"
"Mich" "Mich." "Michigan" or "Micigan". Phone numbers are formated in
dozens of different ways, some missing area codes.
This isn't database corruption in the technical sense - the database software
is doing a perfectly good job of keeping the data that was put into it.
But it makes for a substantially less useful database. You can't readily
call up all contacts for one company, or all contacts in one state.
There is no way to clean up a database like that short of having some person
go through manaully and fix up each record. Some of it can be done by any
random dweeb with common sense, but much would be better done by someone
with a lot of knowledge of the folks in the database - which usually means
someone who's time is scarce and expensive.
Really, the way to have a good database is to start enforcing as much
consistancy on it as you can from the get go. But that never happens in the
real world. When organization are getting going they have other things on
their minds.
|
tod
|
|
response 8 of 26:
|
Jul 10 15:57 UTC 2003 |
This response has been erased.
|
cross
|
|
response 9 of 26:
|
Jul 10 16:41 UTC 2003 |
This response has been erased.
|
tod
|
|
response 10 of 26:
|
Jul 10 17:10 UTC 2003 |
This response has been erased.
|
flem
|
|
response 11 of 26:
|
Jul 10 17:54 UTC 2003 |
It's been a long time since I read the definitions, but ISTR that most real
world databases stick to level 3 or 4 normalization. Higher than that is just
more PITA than it's worth.
SQL is a query language. There is an ANSI standard. Like all languages,
it has as many dialects as implementations. MS Access uses a dialect
called MS Jet SQL. It has some interesting features, and some pretty lame
features, but it generally seems pretty feature-complete to me. I assume
Tod is talking about either MySQL or MS SQL Server, both of which use
substantially different dialects of SQL.
Jan is dead on target about the practical problems that plague databases.
I'd also add the fact taht as databases evolve and as employees come
and go, the way they get used changes substantially. Maiden names are an
example; for some reason people always want to know what a woman's maiden
name is but never put a field in their schema for it. So one user might
enter it as part of their last name, like "Jones(Smith)". Someone else
might just stuff it into the middle name field. A third person might come
along and get fed up enough to add a maiden name field to some table, and
start using it, but never get around to telling the first two people.
|
jazz
|
|
response 12 of 26:
|
Jul 10 18:12 UTC 2003 |
The big problem I see is that databases aren't properly engineered
from the start, and in order to fix that problem, they aren't re-engineered,
but rather patched.
|
flem
|
|
response 13 of 26:
|
Jul 10 19:15 UTC 2003 |
Yeah, that's certainly a problem, but it's not quite the same problem as the
fact that users are more interested in having their data be convenient for
them than in whether or not it's in the correct field in the right table.
There's no engineering process in the world that can deal with the fact that
if Joe User wants to see the Part Number on the Inventory screen instead
of clicking through to the Parts Definition screen, he'll cheerfully get
used to typing it into the Part Name field instead of where
it belongs, rather than submitting a change request and waiting some
indefinite length of time for it to be "fixed". Maybe, if he's
consciencious, he'll type it into both fields, but he'll probably spell
it wrong 10% of the time.
|
jazz
|
|
response 14 of 26:
|
Jul 10 21:43 UTC 2003 |
Pre-entry validation works pretty well for that, assuming the numbers
are sufficiently different in scope.
|
slestak
|
|
response 15 of 26:
|
Jul 11 06:01 UTC 2003 |
An MS Access database can be fairly easily migrated to MySQL by converting
the Access DB into a comma delimited file from within MS Access. Once you have
a comma delimited text file,(pun) you can use a text editor to remove pesky
space problems without risking damage to the original DB. Once these steps
are completed, you can import the comma delimited file into MySQL using an
import command: "shell> mysqlimport [options] database textfile1 [textfile2
...]" This command can be used for each table in your new MySQL DB. If running
Mac OS X, MySQL,Apache2 and PHP can be installed and configured with minor
head aches. Once these components are in place, drop a *.php file containing
"<php? phpinfo() ?> into the root directory of the Mac OS X webserver and call
the file up in a web browser. If you get a long purple page with the operating
system finger print,MySQL info,Apache2 installation details etc, your on your
way. PHP allows for rapid development of web page interfaces to MySQL and
other DB's. You can serve up your DB to users on the lan, localhost, or the
internet via Apache2. If your in a big hurry, you can use several "form
generators",written in PHP to create test forms that look nasty and allow you
to test and develop further without having to recode a bunch. I used these
methods to create a MySQL DB containing 485,000 / 20+ table entries with funky
old FAA MS Access garbage DB's. The end result was up and serving the web in
one day. I apologize to all for the rip and slash posting...I gotta run...If
you want help, I'm bored and you can email me on Grex. I dig Grex.
|
scott
|
|
response 16 of 26:
|
Jul 11 06:39 UTC 2003 |
I suspect the biggest problem is the one mentioned a couple times above, the
lack of data authentication on the original ad-hoc application. A good system
will have a separate table for "company", and when you add a new contact you
then select a company from a drop-down or something. Otherwise there's little
or no control over what sorts of gibberish people might enter under the
"company" field.
|
jmsaul
|
|
response 17 of 26:
|
Jul 11 12:35 UTC 2003 |
If any of you are Access experts, can you tell me how to change the behavior
of the wheel on the mouse, either within a specific form or within Access
itself?
|
polytarp
|
|
response 18 of 26:
|
Jul 11 12:46 UTC 2003 |
Wrong item, Saul.
|
gull
|
|
response 19 of 26:
|
Jul 11 14:15 UTC 2003 |
Re #6: Access is sneaky that way. It works all right as a shared file
database, using page locking, until you get more than a few people using
it concurrently. Then you get people getting locked out, and data
corruption. The sneaky part is this means it fails at right about the
time it starts to become really vital to the company. The idea is that
you'll then have to go out and buy Microsoft SQL Server as a back end.
Re #7: Oh, I get it. Our Commence RM sales database has the same data
consistancy problems. Some day it will have to be dealt with, but I
hate to think about it. It's also an absolute horror from a
normalization standpoint -- for example, whoever designed it put in not
only a "first name" and "last name" field, but also a "last,first"
field. Then they made the "last,first" field the "name" field for the
category (somewhat analogous to a primary key in an SQL database.) This
isn't an issue for manual entry, since the form fills out that field
after you enter the last name and first name, but it's a real pain when
they ask me to import stuff from CSV files. I have to write a script to
create a file with the extra field in it.
Re #10: We're using Access as the front end, still, with linked tables
to the MySQL server. That has its own painful aspects but it avoids
retraining our users and lets us keep the same functionality.
Eventually we may migrate to a web-based front end, but it's unlikely
because it'd be slower and more awkward for the users.
|
cross
|
|
response 20 of 26:
|
Jul 11 15:35 UTC 2003 |
This response has been erased.
|
jazz
|
|
response 21 of 26:
|
Jul 11 15:48 UTC 2003 |
Fortunately, it sounds like none of those databases had any binary
objects in them, that's where it gets really hairy, combined with things like
PHP's inclusion of a pg_escape_bytea function, but not a pg_unescape_bytea
function.
|
flem
|
|
response 22 of 26:
|
Jul 11 15:57 UTC 2003 |
re 17: Dealing with the mouse wheel in Access seems to be hard. Here's a
couple of links; googling gives plenty more.
http://support.microsoft.com/search/preview.aspx?scid=kb;en-us;Q278379
http://www.worship.co.za/BlackandWhiteInc/MouseWheel.htm
The second one seems more promising.
|
jmsaul
|
|
response 23 of 26:
|
Jul 11 22:24 UTC 2003 |
Cool, thanks!
|
russ
|
|
response 24 of 26:
|
Jul 12 06:27 UTC 2003 |
(I *love* drift that basically says "Microsoft is for losers".)
|