Challenge Problem Idea


Vadim Pelyushenko     2022-03-07 05:19:37

By challenge I mean of course one of these: https://www.codeabbey.com/index/wiki/challenges

Basically the idea is https://www.codeabbey.com/index/task_view/dungeons-and-dragons-dice but...

  • With more test cases.
  • Don't allow viewing of test data (otherwise it's possible to guess and check).
    • I think it could still be possible to deduce the correct answers with enough submissions, and to manipulate the execution to produce exactly those answers... but this seems unavoidable. One could argue that anyone clever enough to manipulate execution without seeing the inputs or the per-test-case scores deserves it, but eh.
  • Maybe more diverse possible dice sets.
  • Maybe allow programs to give a confidence level for possible answers, so they can get partial credit on test inputs where the answer isn't clear-cut.
    • Moreover, this is of course an inherently probabilistic thing.
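For instance, the confidence-level idea could be scored with log-loss. This is just a sketch of one possible scheme (the function, candidate names, and numbers here are all made up for illustration, not anything the site implements):

```python
import math

# Sketch of confidence-based partial credit via log-loss.
# A solver reports a probability for each candidate dice set instead of a
# single guess; confident correct answers score near 0 (best), while
# confident wrong answers are penalized heavily.

def log_loss_score(confidences, true_answer):
    """confidences: dict mapping candidate answer -> probability."""
    eps = 1e-12  # floor so an answer given zero confidence doesn't hit log(0)
    p = confidences.get(true_answer, 0.0)
    return -math.log(max(p, eps))

# Example: a solver 70% sure the dice were 2d6 and 30% sure they were 1d12.
score = log_loss_score({"2d6": 0.7, "1d12": 0.3}, "2d6")
```

Summing such scores over many test cases would reward honest uncertainty: hedging on genuinely ambiguous inputs beats guessing confidently and being wrong.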

If we do add such a challenge problem... maybe it would be a good idea to hide solutions from others for https://www.codeabbey.com/index/task_view/dungeons-and-dragons-dice

Vadim Pelyushenko     2022-03-07 05:30:09

Hmmm. Actually, one possible counter-measure against the execution-manipulation trick is to occasionally replace some test cases with newly generated ones... This would introduce some variability into the rankings, and it would certainly require your servers to do more work...

Vadim Pelyushenko     2022-03-07 05:47:38

Or... the test inputs could just be different each time. I think that with enough test cases, the probability of a run being lucky enough to make a difference in the rankings could be made low enough to be satisfactory. Haven't done the math on that yet.

Rodion (admin)     2022-03-08 07:08:35

Vadim, Hi!

Glad to see you - and glad to say that your suggestions are always a kind of curious puzzle themselves :) They give plenty of ground for thought - but no immediate clue on how to make the idea viable :)

Don't allow viewing of test data (otherwise it's possible to guess and check)

Yep, that is a limitation of "challenges" in the way we have them. And still some try-and-guessing is possible - I remember a curious situation with Micro-Life - note how different the results are between the few top solvers and everyone else...

I got your idea - to have a challenge with multiple test cases and the result aggregated over them. But after some thinking I feel this specific problem is probably not a good fit:

Really, what are the test cases? Hidden parameters of the dice - and published results of throws. But how do we decide whether a guess is correct? Perhaps the throws actually fit a different dice configuration better, and the test case is simply bad due to poor random chance.

Perhaps a game like "Bulls and Cows" (especially the word variant) would work better! I need to ponder this a bit.

the test inputs could just be different each time

Yep, we can try this, but it may lead to frustration - some person may get a very lucky input, and later neither that person nor any rival can repeat the feat :)

There is another approach - seen in the Hamurabi task - using the solutions themselves to produce the answer on the server (in which case the user doesn't know the answer and can hardly guess anything about the input). Regretfully, there is currently no implementation for running solutions in arbitrary languages (and there is not much chance people would be eager to code in Scheme or assembly or some other small interpreter we might add). Still, it seems the best direction for me to work on...

P.S. I remember there are a couple of your threads not yet answered by me - really sorry - I shall try to return to them soon!

Vadim Pelyushenko     2022-03-08 08:52:59

But how do we decide whether the guess is correct?

My solution to the original dungeons-and-dragons problem actually computes the probability that a sequence could have been generated by any given dice set. I'm fairly confident in its correctness, but you shouldn't take my word alone for that. So anyway, if you want to make sure that every test case's correct answer matches what was actually used in the simulation, you can just throw away the test cases where they don't match. If the test cases are large enough, probably none of them will be thrown away.
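For reference, the core of such a check can be sketched like this (an illustrative reconstruction based on my reading of the task, not my actual submission or the site's checker). A test case is "good" when the dice set that generated it is also the maximum-likelihood one:

```python
from collections import Counter
import math

def sum_distribution(n_dice, faces):
    """Probability of each total when rolling n_dice fair dice with `faces` sides."""
    dist = {0: 1.0}
    for _ in range(n_dice):
        nxt = Counter()
        for total, p in dist.items():
            for f in range(1, faces + 1):
                nxt[total + f] += p / faces
        dist = nxt
    return dist

def log_likelihood(throws, n_dice, faces):
    """Log-probability of the observed throw totals under a given dice set."""
    dist = sum_distribution(n_dice, faces)
    ll = 0.0
    for t in throws:
        p = dist.get(t, 0.0)
        if p == 0.0:
            return float("-inf")  # total impossible for this dice set
        ll += math.log(p)
    return ll

def best_dice_set(throws, candidates):
    """Maximum-likelihood guess among candidate (n_dice, faces) pairs."""
    return max(candidates, key=lambda nf: log_likelihood(throws, *nf))
```

With this in hand, "throwing away bad test cases" just means regenerating any case where `best_dice_set` disagrees with the dice set that actually produced the throws.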

but this may lead to frustration - some person may got very lucky input

I think it is possible to set up the test cases and scoring so that this is unlikely enough (e.g. 1 in a billion), and even when it did happen, the difference would be negligible.
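To put a rough number on that (the figures below are illustrative, not tuned to the actual task): if a blind guesser gets any single test case right with probability p, the chance of at least m correct out of k independent cases is a binomial tail, which shrinks very fast as k grows.

```python
import math

def binom_tail(k, m, p):
    """P(at least m successes in k independent trials, success prob p each)."""
    return sum(math.comb(k, i) * p**i * (1 - p)**(k - i) for i in range(m, k + 1))

# e.g. 50 freshly generated cases, a 1-in-10 chance to guess each one,
# needing 25 right to affect the rankings: the tail is far below 1e-9.
lucky = binom_tail(50, 25, 0.1)
```

So with regenerated inputs and enough cases per submission, a lucky run that actually moves the leaderboard can be made about as improbable as one likes.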

P.S. I remember there are a couple of your threads not yet answered by me - really sorry - I shall try to return to them soon!

No worries
