XUM32 and its implementation.
by Ilya O. Levin

This document briefly describes XUM32 hash calculation method. The main purpose of implementing XUM32 is replacement of CRC32 somewhere CRC32 is wrongly using as a MDC. CRC32 is an error detection function and cannot protect against non-accidental file tamper. You can find all the necessary details about it in [1].

XUM32 is a combination of two 32-bits functions CRC32 and ELF with a data size confusion; a final result of XUM32 is also a 32-bits value.

CRC32 is one of well-known hash algorithm and it is no needs to explain and describe it here. To whom it may concern however there is a document [2] available. The table driven implementation of CRC32 in ANSI C is look like below.
unsigned long crc32tbl[256]={0x0L, 0x77073096L, /* the rest of table skipped */}
#define INIT_CRC32(crc)   {crc = 0xFFFFFFFFL;}
#define CALC_CRC32(crc,y) {crc = ((crc>>8)&0x00FFFFFFL)^crc32tbl[(crc^y)&0xFF];}
#define DONE_CRC32(crc)   {crc = crc^0xFFFFFFFFL;}
:
int i;
unsigned long crc;  
char buf[9216];    // data buffer
:
INIT_CRC(crc)
for (i=0;i<data_size;i++) CALC_CRC(crc, buf[i])
DONE_CRC(crc)
You can obtain a complete source of CRC32 implementation as a C include-file at http://www.nattyware.com/bin/crc32h.zip.

ELF is a hash function used in the UNIX ELF format for object files. Its ANSI C implementation is here as it were defined in [3]:
unsigned long ElfHash ( const unsigned char *name )
{
    unsigned long   h = 0, g;
    while ( *name )
    {
        h = ( h << 4 ) + *name++;
        if ( g = h & 0xF0000000 )
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}
Computing of XUM32 contains just two stages and is not a rocket science at all. The first one is computing both CRC32 and ELF hash values:
#define HASHSIZE 997
static int M = HASHSIZE;
unsigned long I,elf=0, h=0, g=0;
:
INIT_CRC32(crc)
for(i=0; i<data_size; i++)
{  
    CALC_CRC32(crc, buf[i])
    h = ( h << 4 ) + buf[i];
    if (g = h & 0xF0000000L) h ^= g >> 24;
    h &= ~g;
}
elf=(h % M)
DONE_CRC32(crc)
The second and final stage is combining these hash values and data size together. The first step of this stage is rotating a data size value like this:
#define SWAPLONG(x)  x=(x<<16)+(x>>16)
SWAPLONG(data_size);
This rotation is to be sure a data size value remains in 32-bits for a large data amount.

The next step is doing a confusion. The original XUM32 implementation is using a XOR operation, however ADD if necessary may replace it.
xum32=data_size; 
xum32^=crc; 
xum32^=elf;
The whole implementation of XUM32 in ANSI C will look like follows:
#include <fcntl.h>
#include <sys/stat.h>
#include <stdlib.h>

unsigned long crc32tbl[256]={0x0L, 0x77073096L, /* the rest of table skipped */}
#define INIT_CRC32(crc)   {crc = 0xFFFFFFFFL;}
#define CALC_CRC32(crc,y) {crc = ((crc>>8)&0x00FFFFFFL)^crc32tbl[(crc^y)&0xFF];}
#define DONE_CRC32(crc)   {crc = crc^0xFFFFFFFFL;}
#define SWAPLONG(x)  x=(x<<16)+(x>>16)
#define HASHSIZE 997

int main(int argc, char *argv[])
{
  char buf[32000];
  long rc, fhn;
  unsigned long crc, i, xum32=0, elf=0, h=0, g=0;
  static int M = HASHSIZE;

  if (argc<2) return (1);

  fhn=open(argv[1], O_RDONLY|O_BINARY,0);
  if (fhn!=-1) 
  { 
    printf("%s - ", argv[1]);
    INIT_CRC32(crc);
    while (!eof(fhn))
    { 
       rc=read(fhn, &buf, sizeof(buf));
      for(i=0; i<rc; i++)
      { CALC_CRC32(crc, buf[i]);
        h = ( h << 4 ) + buf[i];
        if (g = h & 0xF0000000L) h ^= g >> 24;
        h &= ~g;
      }
      elf+=(h % M);
      xum32+=rc;
    }
    DONE_CRC32(crc);
    close(fhn);
    SWAPLONG(xum32);
    xum32^=crc;
    xum32^=elf;
    printf("%08X\n", xum32);
  }
  return 0;
}
Quite simply, isn't it? This is the original XUM32 implementation as it were developed. You may vary a XUM32 sensibility by changing a buffer size from 32000 bytes to a smaller one or by replacing confusing XOR with ADD. However please note after it you'll receive different results against an original implementation.

As you can see XUM32 can easily be used instead of CRC32 because it requires a minor modification of source code. It does not requires a whole re-design of algorithm as you should to do shall you replace CRC32 to other one such as MD5, etc. So, if you're using CRC32 as modification detection functions then stop it right now. Use XUM32 instead - it'll cost you nothing but better reliable validation and safety.


References:
[1] "CRC and how to reverse it." Anarchriz/DREAD
[2] "A painless guide to CRC error detection algorithm." Ross N. Williams
[3] "Hashing Rehashed." Andrew Binstock, Dr. Dobb's Journal, APR96
  Home
Products:
     Pixie
     WinJanitor
     XUM32
Support
Contacts

 Copyright © Nattyware, 2000-2004  All rights reserved.