Treap Study

Problem #380

Tags: popular-algorithm data-structures c-1

Who solved this?

No translations... yet

Treap is a data-structure, a kind of binary tree which we already have seen, but with very curious property - it uses randomization to optimize internal data distribution. This means that building Treap from the same data produces different actual tree shape and placement of values inside.

Let's study this interesting concept. Particularly we'll try to see how much it is affected by poor choice of randomizer.

How it Works

Naive binary search tree, like one we met in Tree Builder has obvious issue - in unlucky case the data could be inserted in such an order that the whole structure is not well-balanced. In worst case (think of adding 1, 2, 3, ... n in order) it will become a string of nodes of length n.

Many ideas were proposed to automatically re-balance the data. In case of Treap idea is the following:

It may be regarded as a Tree regarding X and min-Heap regarding Y - hence the name (Tree + Heap).

Actually that's all! If you have seen Cartesian Tree, the only more details you need are about mentioned insertion steps. It is like this: firstly insert the new node according to its X field as in normal binary tree (i.e. descend in left-right fashion to some empty leaf and dangle it here). Now we see that Y of this node could be in wrong relation to its parent (we remember in Cartesian Tree all Y should be ordered from top to bottom). In this case it is necessary to swap (or more exactly, "rotate") the node with its parent. This involves possible detaching of some child of that parent and attaching it to our new node.

This process is repeated until the new node "bubbles up" to proper place according to its Y (while "rotations" preserve X ordering). Let's regard some example of rotation:

          600                    600
          /                      /
        300                    500
        / \                    / \
      100 500                300 550
          / \                / \
        400 550            100 400

Suppose we want 500 to rotate with its parent 300 (their Y are not shown, but suppose their order needs this rotation). This means that:

Rotation is always about changing 3 links between nodes (which probably are double-links), though some of them may be pointing at nulls.

We can cheat by first putting all the nodes into array and sorting them, then building a tree from array as in Cartesian Tree problem - but rotations are used in other trees also so it's better to do things properly and practice this useful trick.

Problem Statement

We shall use simple shift register randomizer. Generate N sequential random values to be inserted into the tree (as X-s), starting with seed S.

Now conduct M tests for which we are given M values of seed for randomizer. On every test re-initialize randomizer with one of these seeds and create the tree anew (e.g. always with the same X-s but different Y-s every time).

For every run calculate average depth (height) across all nodes. Find which run gives the best and the worst average. (ideally we want randomizer which usually gives similar results, but this one is not ideal).

For example, here is a tree with X:Y values specified using colon while depth is shown in parentheses. It is generated using seed S = 44257 and test-seed 17753.

17753:7306 (1)
  left:
    5479:29226 (2)
      right:
        10958:58453 (3)
  right:
    22128:8876 (2)
      left:
        21916:51371 (3)
      right:
        35507:14613 (3)
          right:
            43832:37206 (4)

You see, the root has height 1, it's children 2 and so on. Average over 7 elements is

(1 + 2 + 3 + 2 + 3 + 3 + 4) / 7 = 18 / 7 = 2.5714...

Randomizer

Use the following formula (it creates sequence with period 65535 and comes from the example in Wikipedia):

# calculate next value for 'seed'
bit = (seed ^ (seed >> 2) ^ (seed >> 3) ^ (seed >> 5)) & 1
seed = (seed >> 1) | (bit << 15)

For example, seed of ACE1 is followed by values 5670, AB38, 559C, 2ACE, 1567, 8AB3, 4559 (all hexadecimals).

Input data contain three values in the first line: N is the size of the tree, S is the starting seed for generating X values for the tree, and M is the amount of test-seeds, following in the second line.

Answer should give 4 values - best test-seed, best average, worst test-seed, worst average.

Example

input data:
30 44257 7
57401 6876 16061 17524 40078 41354 34145

answer
34145 4.4666667 41354 6.1666667

Note: if you need to review tree structure for larger example, here is one for N=30, S=44257 and TEST_SEED=41354. Here are also references to parent nodes after "depth" in parentheses.

You need to login to get test data and submit solution.