Grex Agora46 Conference

Item 67: Buggy code has already sent people to jail!

Entered by russ on Wed Jul 9 02:15:11 2003:

...
Some malfunctions in SEVIS have distorted students' files,
mysteriously replacing or deleting various pieces of information.
Students can be deported if their records aren't in order.  Already,
some have been held overnight in jails or kept from re-entering the
United States because their records were misplaced or because the
system garbled them.

...

"Oh, you're not going to believe what it's doing now," said Catheryn
Cotten, the director of the International Office at Duke University,
to the SEVIS help desk technician on the phone as she stared in amazement
at her computer screen.  Cotten had logged into the system to work on
some foreign exchange students' records.  But when she clicked a link
to access them, SEVIS told her there were 1,800 exchange students
still awaiting entry.  Cotten was stunned. 


"I don't have that many [full-time foreign students] at Duke, let
alone in my exchange program," she says, recounting the event.
Somehow, Cotten had accessed the records of a summer camp program
at another school that trains foreigners to be counselors.  The
help desk operator wasn't surprised.  This had happened before; so
many times, in fact, that the glitch had been given a name: "bleeding."
Several schools have seen their student records mixed up with those
of other institutions in the past few months.  A student record from
Michigan State University, for example, was routed to a printer at
a college in Arizona.  In other instances, pieces of data are jumbled.
Officials at the University of Texas in Austin reported that a student’s
last name was listed as an address, one that didn’t belong to any student
in the school’s records.

http://www.govexec.com/dailyfed/0703/070303h1.htm
26 responses total.

#1 of 26 by xi on Wed Jul 9 03:59:37 2003:

hm.. i think a couple of guys from my old job probably wrote that piece of
shit /if the story is true, of course/


#2 of 26 by flem on Wed Jul 9 15:14:43 2003:

Heh.  My current job consists mainly of converting data from a wide variety
of database sources into a format compatible with my company's database.  I
get to look at raw data in lots of database schema.  I'm constantly amazed
by just how screwed up these databases can get, even when the design is
fundamentally sound (rare), simply by the garbage that users put in 
them.  

Thank god I don't have any important parts of my life controlled by 
databases.  </sarcasm>


#3 of 26 by janc on Wed Jul 9 16:10:25 2003:

I've had similar jobs, where some client comes with a database that they have
been maintaining manually with something like Microsoft Access and they want
to put it into an on-line database.  I've never once had a database that
wasn't corrupt to the point of being virtually useless.  Mostly it ends up
being a job that has to be done manually, line-by-line.


#4 of 26 by other on Wed Jul 9 20:02:57 2003:

Timely item.  I was just talking with a non-profit organization's staff 
about long range planning regarding their database.

Right now, they are running a FileMaker database with about 30,000 
records, being served from a machine running Mac OS X Server and 
optimized as a database server.

My primary suggestion was that they run two development processes 
concurrently.  One process is the ongoing implementation of small scale 
changes which improve the functionality of the current system on an as-
needed and as-available-to-do-it basis.  The other process is the setting 
aside of funds on an annual basis for development and implementation of a 
whole new system designed to meet their worst-case (heaviest) usage needs 
for at least five years beyond implementation.

The second process requires some research in order to determine several 
things:  precise needs staff have for database functionality; the 
differences between those needs and the needs the current system can 
meet; what products best suit their circumstances and identified needs -- 
with room to grow -- and; what costs are likely to be required to employ 
those products.

Can some of you with expertise in database development and implementation 
suggest resources for the latter elements of this research?

Key considerations:  The office desktops currently run Mac OS 9, and 
probably will not be changing that for at least a year or two.  When they 
do change, the likely next step will be Mac OS 10.x (x = whatever is 
current at the time).  The database server will not likely change to a 
different OS, just upgrade to newer versions of the current one.


#5 of 26 by gull on Thu Jul 10 13:50:43 2003:

I had to move an Access database to a MySQL server, a while back.  It
was the company's parts database, used in generating bills of material
for construction jobs.  It was a mess...they'd been running it as an
Access shared file data base for years until it finally got to the point
where it would become corrupted within hours of being restored from
backup and couldn't be repaired in a way that would last.  Fortunately
it was a fairly simple database structure, and I was able to salvage
most of the data from the backup copy without too much manual work.  The
hard part was sorting out a bunch of duplicate part numbers that had
somehow sneaked in, even though Access was set to disallow them.  Oh,
and reworking all the forms and reports, since Access allows identifiers
with spaces and MySQL doesn't.


#6 of 26 by polygon on Thu Jul 10 14:34:29 2003:

The insidious data corruption problem is the biggest reason I'm glad I'm
not using Microsoft Access.


#7 of 26 by janc on Thu Jul 10 15:36:12 2003:

The specific problems I've mostly had resulted from the fact that they wanted
to change the structure of the database.  For example, you'd have a list of
contacts - names of people at various companies or government agencies with
addresses, phone numbers and fax numbers.  This would be set up as a single
table, one row for each person, with most of the fields being just plain text
fields.

Now they want to merge it into a more "grown up" system that is used for lots
of other things too.  You might have a separate tables for companies, and
people would be linked to their companies.  Simple enough in theory.

In practice when you look at the database, you find that hardly any
company name was spelled the same twice.  So "Greenfield", "Greenfield
Industries", "Greenfield Ind.", "Greenfield Industries, Inc." and "GI"
are all in the database, and by the way, did you know that "RTW" and
"Kennametal Industrial Products Group" are the same company as "Greenfield
Industries"?  There are multiple duplicates for what appear to be the same
person with different contact info.  Addresses give the state name as "MI"
"Mich" "Mich."  "Michigan" or "Micigan".  Phone numbers are formated in
dozens of different ways, some missing area codes.

This isn't database corruption in the technical sense - the database software
is doing a perfectly good job of keeping the data that was put into it.
But it makes for a substantially less useful database.  You can't readily
call up all contacts for one company, or all contacts in one state.

There is no way to clean up a database like that short of having some person
go through manaully and fix up each record.  Some of it can be done by any
random dweeb with common sense, but much would be better done by someone
with a lot of knowledge of the folks in the database - which usually means
someone who's time is scarce and expensive.

Really, the way to have a good database is to start enforcing as much
consistancy on it as you can from the get go.  But that never happens in the
real world.  When organization are getting going they have other things on
their minds.


#8 of 26 by tod on Thu Jul 10 15:57:41 2003:

This response has been erased.



#9 of 26 by cross on Thu Jul 10 16:41:37 2003:

This response has been erased.



#10 of 26 by tod on Thu Jul 10 17:10:59 2003:

This response has been erased.



#11 of 26 by flem on Thu Jul 10 17:54:20 2003:

It's been a long time since I read the definitions, but ISTR that most real
world databases stick to level 3 or 4 normalization.  Higher than that is just
more PITA than it's worth.  

SQL is a query language.  There is an ANSI standard.  Like all languages, 
it has as many dialects as implementations.  MS Access uses a dialect 
called MS Jet SQL.  It has some interesting features, and some pretty lame 
features, but it generally seems pretty feature-complete to me.  I assume
Tod is talking about either MySQL or MS SQL Server, both of which use 
substantially different dialects of SQL.  

Jan is dead on target about the practical problems that plague databases.
I'd also add the fact taht as databases evolve and as employees come 
and go, the way they get used changes substantially.  Maiden names are an
example; for some reason people always want to know what a woman's maiden
name is but never put a field in their schema for it.  So one user might
enter it as part of their last name, like "Jones(Smith)".  Someone else
might just stuff it into the middle name field.  A third person might come
along and get fed up enough to add a maiden name field to some table, and
start using it, but never get around to telling the first two people.  


#12 of 26 by jazz on Thu Jul 10 18:12:00 2003:

        The big problem I see is that databases aren't properly engineered
from the start, and in order to fix that problem, they aren't re-engineered,
but rather patched.


#13 of 26 by flem on Thu Jul 10 19:15:22 2003:

Yeah, that's certainly a problem, but it's not quite the same problem as the
fact that users are more interested in having their data be convenient for
them than in whether or not it's in the correct field in the right table. 
There's no engineering process in the world that can deal with the fact that
if Joe User wants to see the Part Number on the Inventory screen instead
of clicking through to the Parts Definition screen, he'll cheerfully get 
used to typing it into the Part Name field instead of where
it belongs, rather than submitting a change request and waiting some 
indefinite length of time for it to be "fixed".  Maybe, if he's 
consciencious, he'll type it into both fields, but he'll probably spell 
it wrong 10% of the time.  


#14 of 26 by jazz on Thu Jul 10 21:43:59 2003:

        Pre-entry validation works pretty well for that, assuming the numbers
are sufficiently different in scope.


#15 of 26 by slestak on Fri Jul 11 06:01:06 2003:

An MS Access database can be fairly easily migrated to MySQL by converting
the Access DB into a comma delimited file from within MS Access. Once you have
a comma delimited text file,(pun) you can use a text editor to remove pesky
space problems without risking damage to the original DB. Once these steps
are completed, you can import the comma delimited file into MySQL using an
import command:  "shell> mysqlimport [options] database textfile1 [textfile2
...]" This command can be used for each table in your new MySQL DB. If running
Mac OS X, MySQL,Apache2 and PHP can be installed and configured with minor
head aches. Once these components are in place, drop a *.php file containing
"<php? phpinfo() ?> into the root directory of the Mac OS X webserver and call
the file up in a web browser. If you get a long purple page with the operating
system finger print,MySQL info,Apache2 installation details etc, your on your
way. PHP allows for rapid development of web page interfaces to MySQL and
other DB's. You can serve up your DB to users on the lan, localhost, or the
internet via Apache2. If your in a big hurry, you can use several "form
generators",written in PHP to create test forms that look nasty and allow you
to test and develop further without having to recode a bunch. I used these
methods to create a MySQL DB containing 485,000 / 20+ table entries with funky
old FAA MS Access garbage DB's. The end result was up and serving the web in
one day. I apologize to all for the rip and slash posting...I gotta run...If
you want help, I'm bored and you can email me on Grex. I dig Grex.


#16 of 26 by scott on Fri Jul 11 06:39:59 2003:

I suspect the biggest problem is the one mentioned a couple times above, the
lack of data authentication on the original ad-hoc application.  A good system
will have a separate table for "company", and when you add a new contact you
then select a company from a drop-down or something.  Otherwise there's little
or no control over what sorts of gibberish people might enter under the
"company" field.


#17 of 26 by jmsaul on Fri Jul 11 12:35:22 2003:

If any of you are Access experts, can you tell me how to change the behavior
of the wheel on the mouse, either within a specific form or within Access
itself?


#18 of 26 by polytarp on Fri Jul 11 12:46:23 2003:

Wrong item, Saul.


#19 of 26 by gull on Fri Jul 11 14:15:03 2003:

Re #6: Access is sneaky that way.  It works all right as a shared file
database, using page locking, until you get more than a few people using
it concurrently.  Then you get people getting locked out, and data
corruption.  The sneaky part is this means it fails at right about the
time it starts to become really vital to the company.  The idea is that
you'll then have to go out and buy Microsoft SQL Server as a back end.

Re #7: Oh, I get it.  Our Commence RM sales database has the same data
consistancy problems.  Some day it will have to be dealt with, but I
hate to think about it.  It's also an absolute horror from a
normalization standpoint -- for example, whoever designed it put in not
only a "first name" and "last name" field, but also a "last,first"
field.  Then they made the "last,first" field the "name" field for the
category (somewhat analogous to a primary key in an SQL database.)  This
isn't an issue for manual entry, since the form fills out that field
after you enter the last name and first name, but it's a real pain when
they ask me to import stuff from CSV files.  I have to write a script to
create a file with the extra field in it.

Re #10: We're using Access as the front end, still, with linked tables
to the MySQL server.  That has its own painful aspects but it avoids
retraining our users and lets us keep the same functionality. 
Eventually we may migrate to a web-based front end, but it's unlikely
because it'd be slower and more awkward for the users.


#20 of 26 by cross on Fri Jul 11 15:35:04 2003:

This response has been erased.



#21 of 26 by jazz on Fri Jul 11 15:48:43 2003:

        Fortunately, it sounds like none of those databases had any binary
objects in them, that's where it gets really hairy, combined with things like
PHP's inclusion of a pg_escape_bytea function, but not a pg_unescape_bytea
function.


#22 of 26 by flem on Fri Jul 11 15:57:22 2003:

re 17:  Dealing with the mouse wheel in Access seems to be hard.  Here's a
couple of links; googling gives plenty more.  

http://support.microsoft.com/search/preview.aspx?scid=kb;en-us;Q278379
http://www.worship.co.za/BlackandWhiteInc/MouseWheel.htm

The second one seems more promising.  


#23 of 26 by jmsaul on Fri Jul 11 22:24:42 2003:

Cool, thanks!


#24 of 26 by russ on Sat Jul 12 06:27:21 2003:

(I *love* drift that basically says "Microsoft is for losers".)


#25 of 26 by slestak on Sun Jul 13 00:50:07 2003:

Being an absolute beginner in all of the languages I am attempting to learn
including *DB* syntax and scripting,one reality always appears at the worst
possible moment,"coding standards".Standardization seems to be a dirty word
to big proprietary software companies.The evidence being in their careless
cretion of their "own standards" for everything they build.As said above,this
behavior leads to the purchase of additional products that will supposedly
fix a problem or limitation in previous products or releases.The almighty
dollar has spoken in the Microsoft antitrust cases now settled.One penalty
being that Microsoft must give away hardware and software to schools at
reduced cost or for "free".This "penalty" will create even more "need" for
MS products and further a bigger MS monopoly.

Perhaps because of the continuous migration of "hard copy" information to
electronic data dump,we should consider better standards for electing
lawmakers.Lawmakers with the capacity for and interest in these electronic
solutions we are placing such faith in.Maybe a special "technology awareness
test"?I feel as though I might sleep a bit better at night knowing that nobody
will be passing a new "superbill" limiting our right to communicate or read
words.

I suppose boycotting and refusing to work on projects requiring the use of
certain products is the only recourse.That may not make a difference either.

Oh well *DB to *DB fixer-upper-tools....big money comming right up.


#26 of 26 by pvn on Tue Jul 15 07:25:39 2003:

http://c:\aux

I rest my case.



There are no more items selected.

You have several choices: