[141] Sliding Window Search substring positions

PatrickQuirk     2021-12-11 02:53:01

Hello.

I am having an issue with Sliding Window Search (Problem 141). When I look through the text file for the example substrings in the problem, I do not find them where the problem says their position is. For example, the problem says that the substring "kable" appears at position 22255, but when I looked at that position the substring was not there. Instead, it was at position 21722. This discrepancy appears to be because of new line characters, as there are exactly 22255 - 21722 = 533 new line characters in the text before "kable" in position 21722. I had similar results with the other example of "led to listen". Does anyone know what is going on here?

For reference, I right-clicked the link, clicked "Save Link As", and then saved the text as a .txt file. I then read the data file into Python. I am using Mozilla Firefox on a Windows machine.

Rodion (admin)     2021-12-11 13:22:12

Patrick, Hi!

Interesting situation. I suppose your guess about newline conversion is correct, so let's see: I don't think the browser converts the file when downloading. Unless you then opened and saved it with some editor, it should remain the same when you read it with Python.

Now, how do you read it in Python? Probably the most correct way is to read it as a single large string, in binary mode.

# python3
>>> f = open('doyle.txt', 'rb')
>>> data = f.read(1000000)
>>> len(data)
47434
>>> data[22255:22260]
b'kable'

PatrickQuirk     2021-12-12 16:22:30

I had used 'r' instead of 'rb' for the mode. I did not know about the 'rb' mode; I will have to look into that.

The positions do seem to work properly in this mode.

Thanks, Rodion!

Rodion (admin)     2021-12-13 12:42:01

I'm no Python guru, but b probably means binary (as in most other languages), i.e. it still reads the file, but without any text conversions. I hope it helps :)

PatrickQuirk     2021-12-13 17:36:02

From what I can tell, that is exactly what the 'b' means. It seems that if I just had 'r' as the mode, Python would automatically convert all new line sequences to its default of "\n". Since the new line sequences were originally "\r\n", this decreases the number of characters in each new line sequence by 1.
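For anyone who wants to see the effect in isolation, here is a minimal sketch. It assumes a small made-up sample with "\r\n" line endings written to a temporary file (not the real doyle.txt), and the names raw, text_pos, byte_pos and breaks_before are only for illustration. It compares the offset of "kable" in text mode and in binary mode, and also shows that passing newline="" to open() keeps text mode from translating line endings.

```python
import os
import tempfile

# Hypothetical sample with Windows-style line endings (not the real doyle.txt).
raw = b"first line\r\nsecond line\r\nremarkable\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Text mode: each "\r\n" is translated to a single "\n" on reading.
    with open(path, "r", encoding="ascii") as f:
        text = f.read()

    # Binary mode: the bytes come back exactly as stored in the file.
    with open(path, "rb") as f:
        data = f.read()

    # Text mode with newline="": decoding happens, newline translation does not.
    with open(path, "r", encoding="ascii", newline="") as f:
        untranslated = f.read()

text_pos = text.find("kable")
byte_pos = data.find(b"kable")

# Every line break before the match costs one character in text mode.
breaks_before = text.count("\n", 0, text_pos)

print(text_pos, byte_pos, breaks_before)
```

On this sample the two offsets differ by exactly the number of line breaks before the match, which is the same pattern as 22255 - 21722 = 533 on the real file.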

Using 'rb' did help, and I was able to solve the problem.

Thanks again!