Sliding Window Search
The part of LZ77 compression algorithm is about searching of a substring in the text directly preceeding it.
Let us see an example with the following text:
in the town where i was born lived a man who sailed to sea and he told us of his life in a land of submarines
starting from the words
and he told we can discover the following repeating fragments:
"an"was encountered before in the word
"d "(letter with the following space) was encountered inside words
"he to"was encountered within
"l"(just a single letter) is found inside
"d "(again the letter with space) is something we have already seen...
This exercise asks you to find the longest of preceeding matches for several positions in the text. Note that while for real compression algorithm it is crucial to implement this efficiently, for the given exercise it is not needed - it is only important to get the idea of encoding with back-references.
Preceeding match should be sought for only within the block of preceeding
4096 bytes and the match of no more than
15 bytes is required. This will allow us to encode the position of the match with
12-bit value and the length of
the match with
4-bit value - i.e. only two bytes in total.
Position (or rather offset) of the match should be calculated relatively backward to current position,
so that the first preceeding symbol has the index
0, the next
1 and so on up to
4095 (i.e. it is the difference
of positions of the current substring and its previous match decreased by
Returning to our example above:
string off-len an 20-2 d_ 10-2 he to 58-5 l 19-1 d_ 7-2 u 0-0
In the last case the length
0 shows that even substring of the length
0 could not be found.
If more than one prefix of the same length could be found, please choose the smallest offset.
As a source text to fetch input data from please use the sample data file:
It is not large so it is recommended to store it in a single long byte array in your program to handle easily.
Input data have the number of testcases
N in the first line.
Second line contains exactly such amount of positions
i=1..N for which the search should be done.
Answer to each case should be
2-byte hexadecimal with its lowest
12 bits (or
3 digits) holding the offset
and the highest
4 bits (one digit) storing the length (so that offset
20 and length
2 are encoded as
Example (with real
input data 13 22255 22947 36587 9700 31780 21915 4676 25611 34708 13155 7630 46121 20855 10D7 59A4 3167 3007 301E 6AE7 3334 4721 3C77 7AD1 7F21 214A 3B04
Here the substring from position
kable and no better match could be found before rather than
k at position
The substring from position
led to listen and match of
5 letters is found at position
inside the phrase
disentang[led t]he most.
You may trace other matches manually to get the idea better.