1 / \ 2 4 / \ 3 5
Approach #1: Adjacency list
The idea here is simple, you store each node's parent in a table:
table: nodes | |
id | parent_id |
1 | null |
2 | 1 |
3 | 2 |
4 | 1 |
5 | 2 |
This is trivial to implement, but hierarchical queries become hard. In order to query for all nodes under a given branch, you have to recurse through its children. If you don't have too many nodes, you can just read the entire table into memory and cache it -- which is sufficient for most web site navigation structures, for example.
Approach #2: Store the Path as a string
Here, the idea is that each node stores its path as string. For example, a node might have a path of "1_8_13". Thus, you could find the children of node "8" by querying for all nodes with a path of "1_8_%".
table: nodes | |
id | path |
1 | "1" |
2 | "1_2" |
3 | "1_2_3" |
4 | "1_4" |
5 | "1_2_5" |
This gives you the benefit of hierarchical queries, but only if you add an index on the "path" column, forcing SQL to do the heavy lifting. And, since it's a string column, your performance will not be as fast as if it were integer-based.
Approach #3: Nested subsets
The idea here is that each subtree is kept within a range of IDs, and the range of its subtree is stored in the node. In the example, the subtree of 1 is (obviously) within the range of 1..5. However, you'll notice the subtree of 2 is NOT within the range of 3..5 because node 4 violates that rule. As a result, we need a mutable ID in order to maintain the subset.
table: nodes | |||
id | mutable_id | min_mutable_id | max_mutable_id |
1 | 1 | 1 | 5 |
2 | 2 | 3 | 4 |
3 | 3 | 3 | 3 |
4 | 5 | 5 | 5 |
5 | 4 | 4 | 4 |
Note how we had to swap the IDs of 4 and 5, so that node 2 could have a valid nested subset range of 3..4. This can easily happen on insertions as well and force us to recompute large parts of the table if shifting is required. However, hierarchical reads are fairly inexpensive, as they just become numerical range queries.
Approach #4: Expanded tree
The idea here is that you store the normal adjacency list, but maintain another table of the tree already recursively expanded-out:
|
|
So, to summarize:
Pros | Cons | |
Adjacency list |
|
|
Path substrings |
|
|
Nested subsets |
|
|
Expanded tree |
|
|
Later, I hope to implement and benchmark each approach against each other. Any other algorithms worth investigating?
0 comments:
Post a Comment