Database Format

This page will give you a detailed, extra geeky definition of the format of the Palm OS database that D&D Helper uses. If you are not a computer nut, turn back!

General Database Format

Below is the format for the general database. Records are numbered sequentially, starting at 0. The database begins with a "header" record (#0), followed by one or more sections that contain the data.

  • Record 0: Database header
    • Version [2 bytes]: Version number of the database. Currently 1. Version 0 is deprecated.
    • Flags [2 bytes]: Flags for the database.
      • 0x0001 : Generate multiple entries (good for word generators)
    • Section Record Counts [2 bytes per section]: Number of records for each section. Minimum of 1.
  • Records 1-n: Sections (one or more of the following)
    • Random Entry
    • Letter Pairing
    • Phrase-Structure Rule Grammar
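A minimal Python sketch of reading the header record and working out which record numbers belong to each section. It assumes 2-byte values are stored big-endian (the Motorola 68k byte order used by Palm OS), which this page does not state, and the function names are hypothetical.

```python
import struct

def parse_db_header(record0: bytes):
    """Split record 0 into (version, flags, per-section record counts).

    ASSUMPTION: 2-byte fields are big-endian; the page does not say.
    """
    if len(record0) < 4 or len(record0) % 2:
        raise ValueError("header must be 4 bytes plus 2 bytes per section")
    version, flags = struct.unpack_from(">HH", record0, 0)
    n_sections = (len(record0) - 4) // 2
    counts = list(struct.unpack_from(f">{n_sections}H", record0, 4))
    return version, flags, counts

def section_record_ranges(counts):
    """Return (first, last) record numbers per section; sections start at record 1."""
    ranges, first = [], 1
    for count in counts:
        ranges.append((first, first + count - 1))
        first += count
    return ranges
```

For example, a header of 00 01 00 01 00 03 00 05 describes version 1 with the "multiple entries" flag set and two sections: records 1-3 and records 4-8.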

Random Entry section

This type of section is a "pick one of the following" type of list. Good for lists of riddles, random situations, etc.

  • First Record: Header
    • Type [2 bytes]: 0
    • Flags [2 bytes]:
      • 0x0001 : Use random chances (see Chances below)
    • Items [2 bytes]: Number of items in this section. This will not match the number of data records unless each record holds exactly one entry.
  • All the rest in the section: Data records
    • Number of entries in this record [2 bytes]: Multiple entries can be combined into a single record, saving space and making the database much faster to transfer to the Palm. This helps most with small entries, such as word lists, but costs nothing with larger ones.
    • Chance and Offset Array [2 or 4 bytes per entry]:
      • Chance [2 bytes]: The chance for this entry. If chances are not used, these two bytes are omitted from the record. Entries are sorted in ascending chance order. (see Chances below)
      • Offset [2 bytes]: The offset from the beginning of the record where this entry starts. The entry must be terminated with a NULL.
    • Entries: Each entry is a null-terminated string. This contains the riddle, situation, word, or whatever data the entry contains. Max size is unknown, but is probably limited by the OS to whatever you can fit in a single record, which is a little under 64k.
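The layout of a Random Entry data record can be read with a short Python sketch like the one below. It assumes big-endian 2-byte fields and decodes text as Latin-1; both are assumptions, and parse_random_entry_record is a hypothetical name. Whether chances are present comes from flag 0x0001 in the section header.

```python
import struct

def parse_random_entry_record(record: bytes, use_chances: bool):
    """Return a list of (chance, text) pairs from one Random Entry data record.

    chance is None when the section header does not set flag 0x0001.
    ASSUMPTIONS: big-endian 2-byte fields and Latin-1 text (neither is stated).
    """
    (count,) = struct.unpack_from(">H", record, 0)
    pos, entries = 2, []
    for _ in range(count):
        chance = None
        if use_chances:  # the 2-byte chance only exists when chances are enabled
            (chance,) = struct.unpack_from(">H", record, pos)
            pos += 2
        (offset,) = struct.unpack_from(">H", record, pos)
        pos += 2
        end = record.index(b"\x00", offset)  # each entry is null-terminated
        entries.append((chance, record[offset:end].decode("latin-1")))
    return entries
```

With chances enabled, each array slot is 4 bytes, so two entries put the first string at offset 2 + 2 × 4 = 10.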

Letter Pairs section

One method of building words is to analyze a list of words from a language and figure out which two letters start words. Then you see which letters can follow those two letters and add a third letter. You repeat this, each time looking at the last two letters of the word, until the end of the word is randomly picked or a maximum word length is reached.
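The word-building loop can be sketched in Python as below. The in-memory tables (starts and follows) are hypothetical stand-ins for a decoded Letter Pairs section, the sketch uses the chance variant with the selection rule described under Chances, and max_length plays the role of the Maximum Length field in the section header.

```python
import random

END = "\x00"  # a 0x00 "letter" marks the end of the word

def pick(options, roll):
    """Return the first option whose chance >= roll, or None."""
    for chance, value in options:
        if chance >= roll:
            return value
    return None

def make_word(starts, follows, max_length, rng=random):
    """Grow a word one letter at a time from hypothetical in-memory tables.

    starts:  list of (chance, two-letter string), sorted by ascending chance
    follows: dict mapping a two-letter string to a list of (chance, next letter)
    """
    word = pick(starts, rng.randint(0, 0xFFFF))
    if word is None:
        return ""
    while len(word) < max_length:
        options = follows.get(word[-2:], [])  # only the last two letters matter
        nxt = pick(options, rng.randint(0, 0xFFFF))
        if nxt is None or nxt == END:  # no chance high enough, or explicit end
            break
        word += nxt
    return word
```

Passing a seeded random.Random instance as rng makes the output repeatable, which helps when checking a new table.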

It is strongly suggested that you use random chances with this method of word creation. A good example of why comes from English, where the letter Q is almost always followed by U. Let's assume that our rules have Q always followed by U. With chances enabled, you'll see a U after every Q unless the word has grown too long and had to be stopped. Without chances in the database, U and the end of the word are equally likely, so U has only a 50% chance of showing up, and if it is not picked, the word ends. This is not exactly what people would like. Adding chance data does increase the database size a lot (more than doubling it), but it produces much better results.

  • First Record: Header
    • Type [2 bytes]: 1
    • Flags [2 bytes]:
      • 0x0001 : Use random chances (see Chances below)
    • Maximum Length: The maximum length of the word. Note: I'd like the length to taper off instead of just being cut off, as it is right now.
    • Starting Pairs: (one or more of these)
      • Chance [2 bytes]: The chance that this letter pair is used. If not using random chances, these two bytes are omitted from the record. (see Chances below)
      • Starting pair [2 bytes]: Two letters that words could start with.
  • All the rest in the section: Data records
    • Starting Letter [1 byte]: The first letter of the pair.
    • Entries [1 byte]: Number of second letters that can follow the starting letter.
    • Second Letter Definition: (one per entry)
      • Second letter [1 byte]: The second letter of the pair
      • Third letter options [1 byte]: Number of potential third letters that follow the letter pair.
    • Third Letter Definition: (one per third-letter option of each entry):
      • Chance [2 bytes]: The chance for this next letter. If chances are not used, these two bytes are omitted from the record. If chances are used and no chance is high enough, the word is considered finished. This has the same effect as adding an option with chance 0xFFFF and character 0x00, but saves 3 bytes. (see Chances below)
      • Next Letter [1 byte]: A letter that could follow the letter pairs. If not using chances, you should use 0x00 to signify that the next "letter" is actually the end of the word.
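One data record can be decoded as below. The field order for the second- and third-letter definitions is an assumption: the page does not say whether each Second Letter Definition is immediately followed by its third letters or whether all second letters come first. This sketch assumes the latter, along with big-endian chances; parse_letter_pair_record is a hypothetical name.

```python
import struct

def parse_letter_pair_record(record: bytes, use_chances: bool):
    """Return {two-letter pair: [(chance, next letter), ...]} for one data record.

    ASSUMPTIONS (not stated on the page): all Second Letter Definitions come
    first, followed by the Third Letter Definitions grouped in the same order;
    2-byte chances are big-endian; letters are single Latin-1 bytes.
    """
    first = chr(record[0])
    entries = record[1]
    seconds, pos = [], 2
    for _ in range(entries):
        # (second letter, number of third-letter options)
        seconds.append((chr(record[pos]), record[pos + 1]))
        pos += 2
    table = {}
    for second, n_options in seconds:
        options = []
        for _ in range(n_options):
            chance = None
            if use_chances:
                (chance,) = struct.unpack_from(">H", record, pos)
                pos += 2
            options.append((chance, chr(record[pos])))  # "\x00" means end of word
            pos += 1
        table[first + second] = options
    return table
```

Each third-letter option costs 1 byte without chances and 3 bytes with them, which is why chance data can more than double the section's size.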

Phrase-Structure Rule Grammar section

Based on phrase-structure rules, this section has the potential to create words, sentences, phrases, spell names, and darn near everything else and can be a lot smaller than the letter pairing section or the random entry section.

This is based on trees, where one rule can expand into multiple rules. Search for phrase structure rules examples on Google. For a good description of what PSR is and how I use it, take a look at my PSR Format page; it has examples and describes how to make a file that I can read. If you are not a good programmer and you have generated a PSR tree that you want made into a database for this application, I'd be happy to do that for you. If even that seems daunting, just give me one or more tables that a DM would roll on to see what happens, and I can easily convert them to PSR. If you want to expand it a little, I'm sure we can work things out.

If you are trying to create a database from scratch or with your own application (not using the format illustrated on the PSR Format page and not using the PHP class), then you will need to get into these gory details. Sorry, but it is hard to describe this format without some really bad examples.

Imagine that W is a rule that is defined to generate a word, such as "desk" or "book." Now, let's say that we want to generate a compound word with rule C. The rule C would expand to WW, which means to place two words together. This could generate useful words like homework, bookend, and freemason. Unfortunately, it could just as easily generate diskdisk, bubbleslime, and phonetube. It all depends on how you make the rules.

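Let's say that we want to generate last names. Rule L (for Last name) could expand to WW, just like C did, but we would want the first letter capitalized. Also, let's say that W is the 9th rule in this section. The expanded rule for L would then be (in hex) 01 03 09 01 01 09 00. Here 01 03 09 expands rule 9 with flags 0x03 (0x01, which must always be set, plus 0x02 to capitalize the first letter), 01 01 09 expands rule 9 again without capitalizing, and the final 00 is the null terminator. Confusing? Probably.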
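The last-name rule can be assembled byte by byte as a check on the hex above. The constant and function names are hypothetical; only the byte values come from this page.

```python
EXPAND_RULE = 0x01      # command: expand a rule in the current section
FLAG_REQUIRED = 0x01    # flags byte: this bit must always be set
FLAG_CAPITALIZE = 0x02  # flags byte: capitalize the first generated letter

def expand_rule(rule_number: int, capitalize: bool = False) -> bytes:
    """Encode a 3-byte 'expand rule N' command (rule numbers start at 1)."""
    if not 1 <= rule_number <= 0xFF:
        raise ValueError("rule numbers must fit in one non-zero data byte")
    flags = FLAG_REQUIRED | (FLAG_CAPITALIZE if capitalize else 0)
    return bytes([EXPAND_RULE, flags, rule_number])

# Rule L: two copies of rule W (#9), the first one capitalized, then the null.
last_name = expand_rule(9, capitalize=True) + expand_rule(9) + b"\x00"
print(last_name.hex(" "))  # 01 03 09 01 01 09 00
```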

The main rule, which is always the one expanded, is the first data record after the section header (rule #1).

  • First Record: Header
    • Type [2 bytes]: 2
    • Flags [2 bytes]: (none defined yet)
  • All the rest in the section: Data records
    • Flags [2 bytes]:
      • 0x0001 : Use random chances for this rule (see Chances below)
    • Items [2 bytes]: Number of possible rules to expand to
    • Expanded Rules: (one or more of the following)
      • Chance [2 bytes]: The chance for this rule. If chances are not used, these two bytes are omitted from the record. (see Chances below)
      • Expanded rule: Null-terminated string of data. Command characters may be embedded in the text. Please see Command Codes.
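A PSR data record can be split into its options as sketched below, assuming big-endian 2-byte fields (not stated on this page); parse_psr_rule is a hypothetical name. Scanning for the 0x00 terminator is safe because command data never contains a zero byte: rule and section numbers start at 1, the flags byte always has 0x01 set, and a 0x08 literal is always non-null.

```python
import struct

USE_CHANCES = 0x0001  # per-rule flag from the data record

def parse_psr_rule(record: bytes):
    """Return [(chance, expansion bytes), ...] for one PSR data record.

    Expansions stay as raw bytes because they may contain command codes.
    ASSUMPTION: 2-byte fields are big-endian.
    """
    flags, items = struct.unpack_from(">HH", record, 0)
    pos, options = 4, []
    for _ in range(items):
        chance = None
        if flags & USE_CHANCES:
            (chance,) = struct.unpack_from(">H", record, pos)
            pos += 2
        end = record.index(b"\x00", pos)  # each expansion is null-terminated
        options.append((chance, record[pos:end]))
        pos = end + 1
    return options
```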

Chances

Chances let you make some entries more likely than others. For example, if you are generating a word and have just placed a Q, you would most likely want a U after it and would rarely want the word to end with just the Q.

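A random number is generated (an unsigned 16-bit integer from 0x0000 to 0xFFFF), and the entries are scanned in order with the following algorithm. The first rule or option should have the lowest chance number and the last one should have the highest (otherwise you'll always get the first one). In other words, each chance number is a cumulative upper bound: an entry is picked when the random number is greater than the previous entry's chance and no greater than its own, so giving the last entry 0xFFFF guarantees that something is always picked.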

  • Is this entry's chance number >= my random chance number?
    • Yes: Use this entry.
    • No: Continue on. If there is nothing else possible, end word/phrase generation.

(For speed purposes, I use a binary search, but the above algorithm does the exact same job.)
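The selection rule and the binary-search shortcut can be sketched in Python as follows. bisect_left returns the first position whose value is >= the roll, which is exactly the entry the linear scan picks. cumulative_chances is a hypothetical helper for turning relative weights into ascending chance numbers; it is one reasonable way to build them, not necessarily how the D&D Helper tools do it.

```python
import bisect

def pick_linear(chances, roll):
    """Index of the first entry whose chance >= roll, or None (generation ends)."""
    for i, chance in enumerate(chances):
        if chance >= roll:
            return i
    return None

def pick_binary(chances, roll):
    """Same answer as pick_linear, found by binary search on the ascending list."""
    i = bisect.bisect_left(chances, roll)
    return i if i < len(chances) else None

def cumulative_chances(weights):
    """Turn positive relative weights into ascending chances ending at 0xFFFF.

    Hypothetical helper: entry i is chosen for rolls in (chances[i-1], chances[i]].
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total, running, chances = sum(weights), 0, []
    for w in weights:
        running += w
        chances.append(running * 0x10000 // total - 1)
    return chances
```

For a Q that should be followed by U 19 times out of 20, cumulative_chances([19, 1]) gives [62258, 65535], so rolls 0 through 62258 pick U.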

Command Codes

These codes can be embedded in the strings of some record types (see above). They tell the program to pull in data from elsewhere or to expand further rules.

If you are a programmer, you'll notice that indexes start with 1 instead of 0. This is because the end of string code is a null, and I want to be able to easily count the number of records by merely counting the nulls.

  • Command char [1 byte]
    • 0x01 - Data contains a single byte, which is the number of a rule to expand in the current section. Rules are numbered starting at 1 for the first rule after the header record.
    • 0x02 - Data contains a single byte, which is the number of a section to expand. Sections are numbered starting at 1 for the first section in the file.
    • 0x03 through 0x06 - Reserved.
    • 0x07 - There is a flags byte, but no data bytes. Don't use this in your rules. It is used internally only to help carry the flags through the transformations.
    • 0x08 - There is no flags byte. The next character (non-null) is used literally. This is only needed if you absolutely must put a 0x01 - 0x08 character in your rules.
  • Flags [1 byte]:
    • 0x01 - Must be set.
    • 0x02 - Capitalize first letter of generated data
  • Data [varies]: Defined by the command character. Currently just one byte or none at all.
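A hypothetical decoder that splits an expanded rule into literal text and command tuples follows. It is a sketch of how the codes above fit together, not the application's own parser, and it treats each non-command byte as a Latin-1 character (an assumption).

```python
def decode_expansion(data: bytes):
    """Split an expanded-rule string into literal text and command tuples.

    Items are either str (literal text) or (command, flags, argument) tuples;
    argument is None for 0x07. Decoding stops at the first 0x00 terminator.
    """
    out, text, i = [], [], 0
    while i < len(data) and data[i] != 0x00:
        code = data[i]
        if code == 0x08:                  # literal escape: no flags byte
            text.append(chr(data[i + 1]))
            i += 2
        elif code in (0x01, 0x02, 0x07):
            if text:                      # flush pending literal text
                out.append("".join(text))
                text = []
            flags = data[i + 1]
            if code == 0x07:              # internal: flags only, no data
                out.append((code, flags, None))
                i += 2
            else:                         # one data byte: rule or section number
                out.append((code, flags, data[i + 2]))
                i += 3
        elif 0x03 <= code <= 0x06:
            raise ValueError(f"reserved command code 0x{code:02x}")
        else:
            text.append(chr(code))        # ordinary character
            i += 1
    if text:
        out.append("".join(text))
    return out
```

For the last-name rule, decode_expansion(bytes([1, 3, 9, 1, 1, 9, 0])) yields two (1, flags, 9) commands: the first with flags 0x03 (capitalize) and the second with flags 0x01.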
Tyler Akins