I came across an article in Linux Journal that discussed an
interesting method of making a doubly-linked list while only using one
pointer. In Issue
129, Prokash Sinha discussed the method in detail. I had been bouncing
around a similar idea for binary search trees, but never could make anything
useful. I suppose that's because I was starting in the wrong area.
On this page, I will show two blocks of code side by side. On the left
is the code for a normal doubly-linked list. On the right is a modified
version that is more memory efficient.
The premise is simple. Instead of storing the next and
prev memory addresses, you merely store the XOR difference between
them. See the article (and subscribe today to Linux Journal)
for an explanation that is more in-depth ... or just leave me a comment and
pester me and I will explain it more in here.
You can cut your header information in half, since your struct only needs
to store the difference between the pointers. Let me show you what I
mean, assuming that we are storing integers in our linked list.
typedef struct node {
struct node *prev;
struct node *next;
int data;
} node;
|
| | typedef struct node {
struct node *diff;
int data;
} node;
|
|
Yep, that's right. A doubly-linked list with just the space for one
pointer. How do you do it? Well, it's not that hard, but first let's get
into the "Why".
I program for Palm OS. I like to code in C and tweak things until they
are faster, smaller, and better. I also would like to see perhaps a kernel
option to have memory-efficient doubly linked lists (among other things) if
possible because I want to get my Aquapad
running a lean, mean, optimized kernel. For these reasons, at times memory
is a concern. For those times, this version of a doubly-linked list could
be beneficial. Most of the time it will not be the smartest, especially if
you absolutely need the most speed you can squeeze out of the machine. That
isn't to say that the memory efficient technique is slow; it just consumes a
tiny bit of more CPU cycles. Tiny. How many? Let's find out.
The best way to judge how much time an algorithm takes is to do an
analysis of the code. I'll try to do that for you. The easiest way is to
torture test a reference implementation. I'll certainly do that for you too.
To do either, I will need a few functionions. Keep in mind that my code
isn't optimized for speed or size at this point. It is just a good working
implementation.
// sizeof((node *)) == 8
// sizeof(int) == 8
#define PTRTYPE int
#define XOR(a,b) ((node *) ((PTRTYPE)(a)^(PTRTYPE)(b)))
#define XOR3(a,b,c) ((node *) ((PTRTYPE)(a)^(PTRTYPE)(b)^(PTRTYPE)(c)))
|
|
These special #define macros will help me out. On my system, the size of
the pointer is the same as the 'int' data type. You may need to use
unsigned long, long long, short, or something else. The XOR() and XOR3()
macros just save me tons of typing. I could have made an actual function
for that, but it would have been silly to do that because the lines are so
darn small; setting up the call to the function, passing the parameters, and
passing back the return value would have chewed up more code than I would
have liked.
Basically the XOR() macro just does a bitwise XOR of two pointers. If
you recall from your binary math class, 01010000 xor 11000011 results in
10010011. Xor just flips bits in the first value as specified in the
second. So, if you start with 000 and xor it by 010, you get 010 as your
result. If you xor it again by 010, you return back to your original 000
value. Enough of that – let's see some code!
unsigned long countNodes()
{
unsigned long nodes = 0;
node *curr;
curr = nodeHead;
while (curr)
{
nodes ++;
curr = curr->next;
}
return nodes;
}
|
| | unsigned long countNodes()
{
unsigned long nodes = 0;
node *curr, *prev, *temp;
prev = NULL;
curr = nodeHead;
while (curr)
{
nodes ++;
temp = curr;
curr = XOR(prev, temp->diff);
prev = temp;
}
return nodes;
}
|
|
Aha! A simple function that just shows you how to count the number of
nodes in the doubly linked list. On the left, if you recall, is the normal
version. The right contains the modified version, which is a bit longer.
There are two extra variables and an extra assignment before the while
loop. Negligable performance hit. Inside the while loop, there are two
more assignments and one xor.
void insertValue(node *newNode)
{
node *curr, *prev;
curr = nodeHead; // nodeHead is a global
prev = NULL;
// Scan for place to insert (sorted numerically)
while (curr && curr->data < newNode->data)
{
prev = curr;
curr = curr->next;
}
// Update pointers
newNode->next = curr;
newNode->prev = prev;
if (prev == NULL)
nodeHead = newNode;
else
prev->next = newNode;
if (curr == NULL)
nodeTail = newNode;
else
curr->prev = newNode;
}
|
| | void insertValue(node *newNode)
{
node *prev, *curr, *temp; // Extra pointers here
curr = nodeHead; // nodeHead is a global
prev = NULL;
// Scan for place to insert (sorted numerically)
while (curr && curr->data < newNode->data)
{
temp = XOR(prev, curr->diff);
prev = curr;
curr = temp;
}
// Update pointers
newNode->diff = XOR(prev, curr);
if (curr != NULL)
curr->diff = XOR3(curr->diff, prev, newNode);
else
nodeTail = newNode; // Last in list
if (prev != NULL)
prev->diff = XOR3(prev->diff, curr, newNode);
else
nodeHead = newNode; // First in list
}
|
|
The code really isn't all that different for the two styles of linked
lists. There is one extra line inside the while loop that contains an XOR,
a dereference, and an assignment. That may, in my best guess, nearly double
the amount of code in the loop. People who like assembler might want to
argue on this point. Updating the pointers takes almost the same amount of
code – there are merely a couple more XOR commands.
temp = prev ^ curr->diff;
prev = curr;
curr = temp;
|
| | prev ^= curr ^ curr->diff;
curr ^= prev;
prev ^= curr;
|
|
If you don't want the 'temp' pointer ever to be used, you can instead
use something like this neat little trick. Same number of lines, and nearly
the same number of compiled commands, but the end result may not justify
the untidy code unless you are only being graded on bytes or interesting
techniques instead of readabiliy.
void deleteNode(unsigned long i)
{
node *prev, *curr, *temp;
// Find the node
curr = nodeHead;
while (curr && i--)
{
curr = curr->next;
}
// Might have fallen off the list
if (curr == NULL)
return;
// Erase the node
prev = curr->prev;
temp = curr;
curr = curr->next;
free(temp);
// Update links
if (prev != NULL)
prev->next = curr;
else
nodeHead = curr;
if (curr != NULL)
curr->prev = prev;
else
nodeTail = prev;
}
|
| | void deleteNode(unsigned long i)
{
node *prev, *curr, *temp;
// Find the node
prev = NULL;
curr = nodeHead;
while (curr && i --)
{
temp = XOR(prev, curr->diff);
prev = curr;
curr = temp;
}
// Might have fallen off the list
if (curr == NULL)
return;
// Erase the node
temp = curr;
curr = XOR(prev, temp->diff);
free(temp);
// Update links
if (prev != NULL)
prev->diff = XOR3(prev->diff, temp, curr);
else
nodeHead = curr;
if (curr != NULL)
curr->diff = XOR3(curr->diff, temp, prev);
else
nodeTail = prev;
}
|
|
This function just deletes the Nth node in the list. It can easily be
adapted to deleting a specific value, if you so desire. There's one more
assignment in the beginning two extra assignments in the while loop and that
XOR in there too. Likely, it will be 1/2 the speed, just like above, merely
because we have to pass double the amount of assembler instructions around
(depending on how it gets compiled). The rest of the function just contains
an extra XOR here and there, nothing to really worry about.
Now comes the fun part. Testing. I wrote a tiny little program that is
identical for each list structure. I've coded the algorithms in separate
files. When I compile them and strip them, this is the sizes that I see:
for F in countnodes deletenode insertvalue; do (
gcc -Os -Wall -o ${F}.o ${F}.c; strip ${F}.o; ); done
ls -l normal_version/*.o
-rw-r--r-- 1 root root 496 Jan 22 21:01 countnodes.o
-rw-r--r-- 1 root root 560 Jan 22 21:01 deletenode.o
-rw-r--r-- 1 root root 548 Jan 22 21:01 insertvalue.o
ls -l xor_version/*.o
-rw-r--r-- 1 root root 504 Jan 22 21:01 countnodes.o
-rw-r--r-- 1 root root 580 Jan 22 21:01 deletenode.o
-rw-r--r-- 1 root root 564 Jan 22 21:01 insertvalue.o
|
|
As you can see, only very minor changes in size, the largest is a
difference of 20 bytes. However it all depends on where those bytes are,
because if they are in the time-consuming while loops, then those
extra instructions could consume a lot of time.
I ran my test program five times on each list, altertating between
them, and tallied the amount of user time each one took. I have two tests.
The insert test just seeds the random number generator with a static value,
make a node with a random number, and insert the node into the ordered list
with the insert function above. The delete function just inserts a bunch of
numerically descending values (very fast to insert them) and deletes the Nth
node, determined "randomly" using the statically seeded random number
generator. I ran each test for both versions with 10,000, 20,000, and
30,000 inserts and deletes. The average times are listed below.
ITERATIONS INSERT DELETE
10,000 0.306 0.382
20,000 3.566 3.834
30,000 13.988 14.092
|
| | ITERATIONS INSERT DELETE
10,000 0.350 0.484
20,000 3.244 4.312
30,000 14.282 14.124
|
|
The normal linked list should be faster than the XOR version, but since I
tested the program on my web server (the nicest machine I have available to
me for testing), and it could be doing stuff in the background, I tried to
alternate the tests and then run them multiple times to average things out.
I don't know why the insertion of 20,000 values for the normal list takes
less time than with the XOR version. Let's just mark that up to a fluke.
The rest all expectedly bigger, but not a standard percentage – it
ranges from 0.2% to 26.7% longer for the XOR version. Maybe all of the data
I've collected is a fluke?
Anyway, this is a very interesting idea and if you can handle the minor
drawback of the extra time, it can save a lot of space. For about 10%
longer, you can have a list of 20,000 entries save you 160k in memory (8
bytes per pointer, 20,000 pointers). It could also make allocation easier
for the OS since you need smaller structures.
There is nothing bad about trying something out to see if it suits your
needs better. I'm not saying this is the end-all best solution. It just
can help out a lot.
- Download the Code - Code for all of the above
examples, the testing program, and list definitions. Just change to the
appropriate directory and issue a "make" command. This was compiled under
Linux, but most other platforms should be able to handle it with minor (if
any) changes.
- Linux Journal
Article - Where the idea was described, but since it requires
subscription or something to get the code .... Besides, it didn't do a good
enough job in my opinion. Everyone needs more code ... more code ...
MORE CODE!