At one point I dealt with DBF files. A lot of them. In order to process them faster, I wrote a couple tools that others may enjoy. They are GPL v3 and I don't really support them any longer. I may help out if you ask nicely though!
This program's mission in life is to aggregate data so you have fewer records to parse. It can sum, produce an average, and do much more. You pass it the input file (or it can read from standard input), the name of the script to use, and the result database file that should be created.
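To make the idea concrete, here is a minimal sketch of the kind of group-and-reduce pass the tool performs. The real program is driven by the script file; the function name, the group-by key, and the `_avg` suffix here are all my own illustrative choices, not the tool's actual behavior.

```python
from collections import defaultdict

def aggregate(records, key, fields):
    """Group records by `key`, summing and averaging the named numeric fields.

    Illustrative only: the real tool reads its field list and functions
    from a script file rather than taking them as arguments.
    """
    sums = defaultdict(lambda: dict.fromkeys(fields, 0))
    counts = defaultdict(int)
    for rec in records:
        k = rec[key]
        counts[k] += 1
        for f in fields:
            sums[k][f] += rec[f]
    # One output record per group: far fewer records to parse downstream.
    return [
        {key: k,
         **{f: s[f] for f in fields},
         **{f + "_avg": s[f] / counts[k] for f in fields}}
        for k, s in sums.items()
    ]
```

Feeding in three input records with two distinct keys yields two output records, each carrying the group's sum and average.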
The magic is primarily in the script file. You list the different fields in the result dbf file, and only certain data types are allowed for some functions. The script file is not case sensitive. It also does not do a lot of checking of field names, so make sure not to use the same field name twice in your results.
The following is the complete list of functions you can use in your script file. Each function name is followed by the data types it works with. For more information, see the scripts.txt file in the archive.
The aggregation process can rename and resize fields. Each line in the script file should follow one of these formats:
I had to aggregate several databases. They were so large that they were first broken into chunks, then aggregated. The downside is that I didn't have a tool to combine them together for further aggregation. That is, I didn't have a tool until I wrote dbfcat.
dbfcat will open and read several dbf files and write out a single, combined dbf file. You can easily go beyond the 2 GB limit that a lot of dbf software has, so keep that in mind.
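The concatenation idea itself is simple, because the DBF format puts the record count in a fixed spot in the header: copy the first file's header and field descriptors, sum the record counts across all inputs, and append every file's record area. Here is a sketch of that idea in Python, assuming the inputs share an identical field layout (dbfcat itself is a C program; this function name and structure are mine, not its actual code):

```python
import struct

def dbf_cat(datas):
    """Given the raw bytes of several DBF files with identical field
    layouts, return the bytes of one combined DBF file.

    DBF header layout used here: a little-endian uint32 record count at
    offset 4, a uint16 header length at offset 8, and a uint16 record
    length at offset 10.
    """
    total = 0
    header = None
    chunks = []
    for data in datas:
        count, hdr_len, rec_len = struct.unpack_from("<IHH", data, 4)
        if header is None:
            # Keep the first file's header and field descriptors verbatim.
            header = bytearray(data[:hdr_len])
        total += count
        # Copy only the record area; this drops any trailing 0x1A EOF byte.
        chunks.append(data[hdr_len:hdr_len + count * rec_len])
    struct.pack_into("<I", header, 4, total)  # patch the combined record count
    return bytes(header) + b"".join(chunks) + b"\x1a"  # re-add the EOF marker
```

Note that this sketch buffers inputs in memory for simplicity; a streaming version would need to know the combined record count before emitting the header, since stdout is not seekable.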
dbfcat is also linked against libz, letting you read directly from a gzipped dbf file (*.dbf.gz), saving you disk space and actually speeding up the process. It's faster because decompressing in memory beats waiting on disk reads and, potentially, network traffic to get the same information.
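The transparent-gzip trick looks something like this. dbfcat does it in C via libz; this sketch uses Python's gzip module instead, and the function name is illustrative:

```python
import gzip

def open_dbf(path):
    """Open a .dbf or .dbf.gz file for binary reading.

    Sketch of dbfcat's transparent gzip support: if the file looks
    gzipped, decompress on the fly; otherwise read it as-is.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")
```

The caller reads the same byte stream either way, so the rest of the program never needs to know whether the input was compressed.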
The merged .dbf file is written to stdout, so you can pipe that into a file or into gzip to recompress it and save the file somewhere.