Maresware computer forensics, extracting data article

DATABASE RECORD EXTRACTION

Detective Bill Moylan,
Nassau County, NY Police Dept
Computer Crime Section

In December 1998, I posted a question to the IACIS list requesting help with sorting and extracting data from a large database. I received several suggestions that unfortunately did not work out, and I struggled with SQL queries until I was cross-eyed. The timing of my posting was faulty, however, because it was right around that time that Dan Mares was busy retiring from Civil Service and entering the private arena, and he missed my first request for assistance. When I had occasion to speak to Dan directly, he was able to describe a procedure using some Maresware programs that accomplished exactly what I wanted to do. I have used the procedure several times since, and decided to write it down because I cannot easily call it to mind when I need it (Alzheimer’s, I guess).

The problem involved an 8-mb master database file, containing business records. I wanted to extract only the records of transactions in a set of 1100 zip codes. The database contained 7,000 records, and the zip codes were not contiguous, so that any kind of Excel sort was not helpful. I was able to export the database to a DBASE type file, which is a FIXED-LENGTH RECORD type database file, which is necessary for this procedure.

The first step was to isolate the data, by removing the first part of the file, which contained the database header, leaving the actual data records in the file. This was accomplished using Maresware Filsplit. To extract the data, Filsplit is run using parameters for the offset value (-d) (number of bytes to pass before the start of data), the record length, (-r) and the number of bytes to copy (This parameter is not exact, the value in this case should exceed the total number of bytes in the file). In order to find the record length, I used Maresware Hexdump to display the data in HEX mode, and determined the structure of each record, including the record length, and also the position within the record of the zip code field

Filsplit input.fil output.fil –r 1545 -d 286 –D 9000000

Where

–r = record length
–d = displacement to start of data
-D = number of bytes to copy

Once the data is isolated, it must be sorted on any one field. In this case I used the first field, which contained the customer number. I imported the data into Excel, and sorted on the first field, saving the file as a text file. This file MUST be properly sorted in order for the remaining processes to work correctly. In order to check that the file is properly sorted, I ran Maresware Sortchek, inputting the record length, the position in the record of the first character of the sort field and the number of characters to check.

Sortchek input.fil –r 1545 –p 0 –l 6

Where

–r = record length
-p = pass this many characters before comparing (in this case, 0 bytes)
-l = number of characters to compare (in this case, 6 character customer number)

When executed this program will display the number of records being checked, and will show if the records are properly sorted. It will display the location of the first missorted record, if any.

Once the file is properly sorted. The actual extraction can be done. The program used for the extraction is Maresware Search, and is executed using a parameter file to hold the option values for the extraction. In this case a parameter file "param.txt" was prepared using the following values. (They must be entered in this order)

15450 (number of bytes to work on at a time, it is a multiple of record length)
1545 (record length)
265 (location in record of first character to compare{zip code field})
5 (length of field to be compared {zip code=5)
11101
12802
Another
Another
Another
Once the parameter file is properly prepared the command line is simply:

SEARCH input.fil output.fil param.fil
The parameter file contained the above values and the 1100+ zip codes (sorted ascending) to be searched for. This program jumped to the location in each record to check, compared the next 5 bytes against each of the 1100+ zip codes, and if a match was found, exported the entire record to the output file. The result was a database file containing only the records pertaining to the zip codes that I was interested in(about 750 records). Once this file was extracted, the Filsplit program was used once again, this time to copy only the first 286 bytes of data containing the header information. That resulting 286 byte file was replaced on the new data file using DOS

COPY hdr.fil + data.fil final.fil.

Now the extracted file was in DBASE format and could be imported back into the original program, and could be viewed, and records and invoices printed in their original format.

This procedure is only one of the ways that these programs can be used to manipulate large data records. The .pdf files that accompany Maresware will explain each of these (and other) program utilities in detail. I have dealt recently with files that are just to large to be handled by any spreadsheet or other kinds of Windows programs, and must be handled by specialized tools in a DOS environment. I have had some of these Maresware programs in my possession for years, and never realized how powerful these tools are. I have purchased the new commercial version of the Maresware utilities, and recommend that they be made a part of everyone’s software toolbox.

My thanks to Dan Mares, who patiently explained to me (several times) the concepts and procedures needed to solve this problem. I have used the same procedure several times since he first explained it to me and each time have been impressed with the speed and accuracy of the results.