DBF Tools

At one point I dealt with DBF files. A lot of them. In order to process them faster, I wrote a couple tools that others may enjoy. They are GPL v3 and I don't really support them any longer. I may help out if you ask nicely though!


Download: dbfagg.zip

This program's misison in life is to aggregate data so you have fewer records to parse. It can sum, produce an average, and do much more. You pass it the input file (or it can read from standard input), the name of the script to use, and the result database file that should be created.

The magic is primarily in the script file. You list the different fields in the result dbf file, and only certain data types are allowed for some functions. The script file is not case sensitive. It also does not do a lot of checking of field names, so make sure to not use the same name twice in your results.

This list is a complete list of functions that you can use in your script file. The function name is followed by what data types it works with. For more information, see the scripts.txt file in the archive.

KEY (C, N)
When any key field changes, one record is written to the results. For best results, you will want your key fields grouped together in the database. You must have at least one KEY in your script. All of the other functions are optional.
This acts like another grouping, similar to key, but the groups are not written out to disk when a SUBKEY changes. The information just keeps gathering until a KEY field changes. Multiple SUBKEYS are allowed.
Stores the largest value encountered for the group. Numeric fields only.
Stores the smallest value encountered for the group. Numeric fields only.
Stores the summed values for the group. Numeric fields only.
Stores the median value (summed values divided by the number of values). This works only with numeric fields.
Stores the value that was encountered the most times in the group.
Returns the number of records that were grouped together as a new field in the resulting table. Creates a N(9) field, so have no more than 999,999,999 records in a single group (but you can specify a different size if you like).

The aggregation process can rename and resize fields. Each line in the script file should follow one of these formats:

The name of the function, as detailed above.
The name of the field in the input table
OUTPUT_FIELD (optional)
The name that the resulting field should be called in the output file. If not specified, it defaults to INPUT_FIELD
The size the field should be in the resulting table. If you are summing values and you might need more digits, you should specify this value.
The new precision of the output field. Useful for averaging and showing additional digits. If you make OUTPUT_FIELD_PREC larger, you should probably make OUTPUT_FIELD_LEN longer as well. Only use this with numeric fields.


Download: dbfcat.zip

I had to aggregate several databases. They were so large that they were first broken into chunks, then aggregated. The down side is that I didn't have a tool to combine them together for further aggregation. That is, I didn't have a tool until I wrote dbfcat.

dbfcat will open and read several dbf files and write out a single, combined dbf file. You can easily go beyond the 2 gb limit that lots of dbf software has, so keep that in mind.

dbfcat is also linked against libz, letting you read directly from a gzipped dbf file (*.dbf.gz), saving you disk space and actually speeding up the process. It's faster because you are not waiting for the disk reads and potentially the network traffic in order to get the information.

The merged .dbf file is written to stdout, so you can pipe that into a file or into gzip to recompress it and save the file somewhere.

The term 'disc jockey' was coined in Variety back in 1937, describing radio announcers who stayed up all night riding records. Tyler Akins <>
Legal Info