Most Frequent Word (Task 70)

Back to Problem Solutions forum

BlaineThePain     2015-09-19 21:30:41

I'm working with Java and I'm having some problem with Task 70, possibly because I don't fully grasp the potentialities of the HashMap structure.

As far as I understand HashMaps work great at storing (.put()) and retrieving (.get()) entrances because of the whole hashing process, which makes size appear as irrelevant.

My problems, though, start very early: I can't create the list of random words because of the sheer size of it. I thought about using something like this:

public static void setWords() {
    words = new HashMap<>();              //words is a class variable
    for (int i = 1; i < 900000; i++) {
        String word = generateWord(6);    //generates a word of 6 characters using linear congruence
        if (words.get(word) != null) {
            words.put(word, words.get(word) + 1);
        } else {
            words.put(word, 1);
        }
    }
}

but the program runs for minutes and minutes to no avail. I thought the problem could be in the if block, but eveng cutting it down (for test purposes) to:

public static void setWords() {
    words = new HashMap<>();
    for (int i = 1; i < 900000; i++) {
        String word = generateWord(6);
            words.put(word, 1);
    }
}

makes no difference. Can someone point me what I'm doing wrong? Or is it normal for the list to take lots of time to generate? I feel like I'm banging my head against a wall and I'm clueless about how to approach this.

Guy Gervais     2015-09-19 22:24:25
User avatar

Could we see your "generateWord" function?

The logic shown seems ok; I'm basically doing the same thing (except on .Net) and generating 900,000 words takes less than a second.

BlaineThePain     2015-09-19 23:17:57

Sure, I'm now almost positive that the problem lies beneath it (I've tried adding dummy strings and it works just fine), but I can't figure out what's wrong. Or better, I do realise that there are a lot of operations being performed at every iteration, but I wouldn't know how to avoid that. Here's my code, as taken from the solution of other tasks:

//seed, count, consonants and vowels are class variables

public static String generateWord(int length) {
    String word = "";
    for (int i = 0; i < length; i++) {
        int random = linCongGen(445, 700001, 2097152, seed, count + i + 1); 
        int index;
        if (i % 2 == 0) {
            index = random % 19;
            word += consonants.charAt(index);
        } else {
            index = random % 5;
            word += vowels.charAt(index);
        }
    }
    count += length;
    return word;
}

public static int linCongGen(int a, int c, int m, int x0, int n) {
    for (int i = 0; i < n; i++) {
        x0 = (a * x0 + c) % m;
    }
    return x0;
}
Guy Gervais     2015-09-20 00:00:50
User avatar

First problem I see is that your LCG is being re-run from the start at every iteration. "count" eventually gets very large (900,000 x 6 = 54,000,000) and your LCG will have to be executed 1+2+3+4...+54,000,000 times. That's way too much.

You need to modify your LCG so that it keeps "internal state" between each call. It should keep the "x0" value between call and only do "x0 = (a * x0 + c) % m;" once every time it is called.

That should reduce your time by quite a bit.

BlaineThePain     2015-09-20 10:15:37

Thanks for the tip, looking at this not late at night made me realise my mistake, which was possibly due to the fact that I had been using copy-pasted code from old solutions without checking it... Definitely bad practice.

Running the LCG just once per cycle, storing its last value and updating it worked liked a charm, thank you again!

Guy Gervais     2015-09-20 10:54:20
User avatar

No problem, glad I could help.

Vadim Tukaev     2015-09-21 09:19:13

Also, do not concatenate strings, instead use StringBuilder.

BlaineThePain     2015-09-21 10:12:37

Thanks, I didn't know of that class before. The rule of thumb should be using it inside loops, right? In that way the String isn't created over and over again and this positively affects performance and run-time.

Guy Gervais     2015-09-21 10:44:20
User avatar

For Java and .Net, Strings are "immutable"; so if you concatenate "abc" with "123", the runtime creates a new string of length 6 and copies "abc" and "123" to it. It does this everytime you concatenate strings together.

As usual, if you're not sure, try it both ways and measure.

stanislav.zholnin     2016-09-22 19:56:25

Guys, why is everybody using strings and hashmaps for this problem?

There are only 19 * 5 * 19 * 5 * 19 * 5 = 857375 possible words. I though it is obvious that you can just make array of 857375 numbers and generate number after number (instead of actual strings) and count how many time you generated each one.

After you have identify the most frequent number, just convert it into string with some mods and divisions.

Must be much faster. Or maybe it is just my perverted mind.

P.S. Actually I suspect that 900,000 limit for number of words in problem's conditions is a hint at this solution - first round number after 857375.

Please login and solve 5 problems to be able to post at forum