Bthash Database Library

Copying The Bthash Library

This document covers the btree subroutines within the bthash library. If you want to read about hash subroutines, there is a separate document for the hash subroutines.

Key Sequence

All data in the bthash database is binary. For this reason, negative numbers sort after positive numbers. You will need to reverse the sign of numeric data before storing it in a key and after retrieving it from a key.

Binary floating point numbers won't sort properly in a bthash database key, because the exponent is usually at the head of the number.

Deleting a Btree Record

bthash uses a modified delete for efficiency. The tradeoff was between using a more efficient delete with occasional reorganization versus a more expensive and risky delete with reorganization in every delete.

When you initialize a database, you specify the location in the record of a delete byte. This byte must never have the value of hex FF. It can have any other value and be used for any other purpose, except for being part of the key.

If you insert a record with the delete byte set to hex FF, the record is rejected. If you update a record with the delete byte set to hex FF, the record is rejected.

After you delete a record with the btdlet call, the delete byte will be set in the record, only if it is an index node. If the record is a leaf node, the record will disappear and the space will be reused.

It is possible to have empty leaf nodes, when all the records in the block have been deleted. Yet an index node will always point to the leaf node for less than or greater than key comparisons. As new records are added, the leaf nodes fill up again and eventually split.

The amount of space used by leaf nodes is at least half of the database. The higher the records per node in your database the more frequently the delete will occur in a leaf node. For example, if you have 11 records per node, the ratio of index to leaf nodes is roughly 1 to 10.

The formula is:

      (1/nodesz) + (1/nodesz)**2 + (1/nodesz)**3 ...

The btdlet program is efficient, because the database is not rearranged during the delete. The only inefficiency is due to the extra space occupied by deleted index nodes. Until the database is reorganized, these deleted index nodes are used for finding records.

The btdlets subroutine counts the number of index nodes that have been deleted. When this number gets large enough, it will be time to reorganize the database.

Database Reorganization

To reorganize the database, call btreorg.

Sometimes the reorganized database is larger then the original database. This sounds contradictory, but the process of splitting blocks can't be predicted. All the added space is available for inserts. The database is reorganized in the sense that the deleted index nodes have been removed.

Database Integrity

The btdbck subroutine checks every pointer in the database to make sure it isn't duplicated by another pointer. A good bthash database has one and only one pointer to each block. Another way to state this rule is to say that a child member in a one to many relationship has one and only one parent.

Unicode

Since all data in bthash is binary, you can store Unicode text in a bthash database.

Encrypted Data

Documentation

The following documents provide information about configuring and using the bthash library.