You are not logged in. Login Now
 0-6          
 
Author Message
remmers
Superficial C/Java Comparison Mark Unseen   Oct 18 15:56 UTC 1999

 I'm bored, so I think I'll post the following: 

 Here's a C program to count the number of whitespace characters in
 standard input:
 
        #include <stdio.h>
        #include <ctype.h>
        
        int main(void)
        {
          int c;
          int sp = 0;
          while ((c = getchar()) != EOF)
            if (isspace(c)) ++sp;
          printf("spaces: %d\n", sp);
        }
 
 
 Here's a Java program to do the same:
 
        import java.io.*;
        class JavaWhite {
            public static void main(String[] args)
            throws IOException {
                BufferedInputStream in = 
                    new BufferedInputStream(System.in, 1024);
                int c;
                int sp = 0;
                while ((c = in.read()) != -1)
                    if (Character.isWhitespace((char)c)) ++sp;
                System.out.println("spaces: " + sp);
            }
        }
 
 Running the C program on a 1Mb file and timing it with /bin/time gives
 the result:
 
 real        0.1
 user        0.1
 sys         0.0
 
 Reading the same input file, the Java program's times are:
 
 real        0.9
 user        0.8
 sys         0.1
 
 Comments?

6 responses total.
scott
response 1 of 6: Mark Unseen   Oct 18 19:32 UTC 1999

Interesting item for the Consumer conference.

Is this a compiled Java program?  
i
response 2 of 6: Mark Unseen   Oct 19 00:57 UTC 1999

My first thought was that the c program should enjoy a much larger speed
advantage over the java.  A peek at the c code shows why it's "slow" - 
it's optimized for source code compactness & readability.  (Though 
counting "whitespace characters" (there's several of 'em) instead of
just spaces makes it much slower on 'most any general-purpose CPU.)
remmers
response 3 of 6: Mark Unseen   Oct 20 18:52 UTC 1999

(Opps!!  I meant to post this in "jellyware", not "consumer".
Well, feel free to comment anyway. You might turn this into a
discussion the relative benefits of C and Java for consumers...)

<remmers blushes>

Out of curiosity, how would you make the C version faster?
i
response 4 of 6: Mark Unseen   Oct 23 04:38 UTC 1999

<i peeks at a few things & plays around>
It looks like the local version of C takes care of the first three
things i suspected might drag down your code's speed - overhead of
calling getchar(), overhead of calling isspace(), and non-lookup-
table logic in isspace().  That leaves the overhead of reading one
character at a time from stdin and (minor) bit-fiddling in the local
isspace()'s lookup table.  

Running your code against /vmunix, i got times of 1.3/1.2/0.1 by 
compiling with gcc -O2.  Hacking it to take input in large blocks
(with fread), use its own isspace() lookup table, and unrolling the 
innermost loop a bit, i got times of 0.7/0.5/0.1 (same gcc -O2).

Though my hacks double the speed, it's still managing less than
3MB/sec.  I'm not really impressed with the assembly code that 
gcc puts out, but i doubt that even hand-crafted assembly code
would get much more than another 2x speed-up.  Oh, well - i was
(wrongly) assuming grex's CPU's rated a few more MIPS than that. 
remmers
response 5 of 6: Mark Unseen   Oct 27 17:38 UTC 1999

Hm, that was some nice optimizing you did.

I entered a version of this item in the place I originally intended
for it -- the "jellyware" conference.  Discussion there is heading
off in a different direction, though.
i
response 6 of 6: Mark Unseen   Sep 3 02:44 UTC 2001

I briefly played with this again.  My idea was to build an array

unsigned char is16space[256*256],

initialize it to 

is16space[i] =  2 if isspace[i>>8] and isspace[i&0xFF],
                1 if      "        xor      "         ,
                0 otherwise,

and use that to sum over the input buffer twice as fast (by doing the
sum 2 bytes at a time vs. 1 at a time).  (Yes, may be *slower* if the
effective L1 data cache size isn't fairly large.)  My quick & dirty 
attempt abused typing more than gcc could stand, however, and i'm not
feeling interested enough tonight spend more time. 

One of these years i'll learn enough of the local assembly language to
do this "right".... 
 0-6          
Response Not Possible: You are Not Logged In
 

- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss