|
XUM32 and its implementation.
by Ilya O. Levin
This document briefly describes XUM32 hash calculation method. The main
purpose of implementing XUM32 is replacement of CRC32 somewhere CRC32 is
wrongly using as a MDC. CRC32 is an error detection function and cannot
protect against non-accidental file tamper. You can find all the necessary
details about it in [1].
XUM32 is a combination of two 32-bits functions CRC32 and ELF with a
data size confusion; a final result of XUM32 is also a 32-bits value.
CRC32 is one of well-known hash algorithm and it is no needs to explain and
describe it here. To whom it may concern however there is a document [2]
available. The table driven implementation of CRC32 in ANSI C is look like
below.
unsigned long crc32tbl[256]={0x0L, 0x77073096L, /* the rest of table skipped */}
#define INIT_CRC32(crc) {crc = 0xFFFFFFFFL;}
#define CALC_CRC32(crc,y) {crc = ((crc>>8)&0x00FFFFFFL)^crc32tbl[(crc^y)&0xFF];}
#define DONE_CRC32(crc) {crc = crc^0xFFFFFFFFL;}
:
int i;
unsigned long crc;
char buf[9216]; // data buffer
:
INIT_CRC(crc)
for (i=0;i<data_size;i++) CALC_CRC(crc, buf[i])
DONE_CRC(crc)
You can obtain a complete source of CRC32 implementation as a C include-file
at http://www.nattyware.com/bin/crc32h.zip.
ELF is a hash function used in the UNIX ELF format for object files. Its
ANSI C implementation is here as it were defined in [3]:
unsigned long ElfHash ( const unsigned char *name )
{
unsigned long h = 0, g;
while ( *name )
{
h = ( h << 4 ) + *name++;
if ( g = h & 0xF0000000 )
h ^= g >> 24;
h &= ~g;
}
return h;
}
Computing of XUM32 contains just two stages and is not a rocket science at all.
The first one is computing both CRC32 and ELF hash values:
#define HASHSIZE 997
static int M = HASHSIZE;
unsigned long I,elf=0, h=0, g=0;
:
INIT_CRC32(crc)
for(i=0; i<data_size; i++)
{
CALC_CRC32(crc, buf[i])
h = ( h << 4 ) + buf[i];
if (g = h & 0xF0000000L) h ^= g >> 24;
h &= ~g;
}
elf=(h % M)
DONE_CRC32(crc)
The second and final stage is combining these hash values and data size
together. The first step of this stage is rotating a data size value like this:
#define SWAPLONG(x) x=(x<<16)+(x>>16)
SWAPLONG(data_size);
This rotation is to be sure a data size value remains in 32-bits
for a large data amount.
The next step is doing a confusion. The original XUM32 implementation is using
a XOR operation, however ADD if necessary may replace it.
xum32=data_size;
xum32^=crc;
xum32^=elf;
The whole implementation of XUM32 in ANSI C will look like follows:
#include <fcntl.h>
#include <sys/stat.h>
#include <stdlib.h>
unsigned long crc32tbl[256]={0x0L, 0x77073096L, /* the rest of table skipped */}
#define INIT_CRC32(crc) {crc = 0xFFFFFFFFL;}
#define CALC_CRC32(crc,y) {crc = ((crc>>8)&0x00FFFFFFL)^crc32tbl[(crc^y)&0xFF];}
#define DONE_CRC32(crc) {crc = crc^0xFFFFFFFFL;}
#define SWAPLONG(x) x=(x<<16)+(x>>16)
#define HASHSIZE 997
int main(int argc, char *argv[])
{
char buf[32000];
long rc, fhn;
unsigned long crc, i, xum32=0, elf=0, h=0, g=0;
static int M = HASHSIZE;
if (argc<2) return (1);
fhn=open(argv[1], O_RDONLY|O_BINARY,0);
if (fhn!=-1)
{
printf("%s - ", argv[1]);
INIT_CRC32(crc);
while (!eof(fhn))
{
rc=read(fhn, &buf, sizeof(buf));
for(i=0; i<rc; i++)
{ CALC_CRC32(crc, buf[i]);
h = ( h << 4 ) + buf[i];
if (g = h & 0xF0000000L) h ^= g >> 24;
h &= ~g;
}
elf+=(h % M);
xum32+=rc;
}
DONE_CRC32(crc);
close(fhn);
SWAPLONG(xum32);
xum32^=crc;
xum32^=elf;
printf("%08X\n", xum32);
}
return 0;
}
Quite simply, isn't it? This is the original XUM32 implementation as it were
developed. You may vary a XUM32 sensibility by changing a buffer size
from 32000 bytes to a smaller one or by replacing confusing XOR with ADD.
However please note after it you'll receive different results against
an original implementation.
As you can see XUM32 can easily be used instead of CRC32 because it requires
a minor modification of source code. It does not requires a whole re-design
of algorithm as you should to do shall you replace CRC32 to other one such
as MD5, etc. So, if you're using CRC32 as modification detection functions
then stop it right now. Use XUM32 instead - it'll cost you nothing but
better reliable validation and safety.
References:
[1] "CRC and how to reverse it." Anarchriz/DREAD
[2] "A painless guide to CRC error detection algorithm." Ross N. Williams
[3] "Hashing Rehashed." Andrew Binstock, Dr. Dobb's Journal, APR96
|
|
Home
Products:
Pixie
WinJanitor
XUM32
Support
Contacts
|