This is a repost of a post I made to reddit on June 13th 2017. I found it again recently and wanted to document it here.
The Problem (what's the optimal encoding?)
You have a known list of n candidates on a ballot. To vote, the voter must number each candidate from 1 to n, using each number once. The voter is thus choosing one particular permutation of the candidate list. What is the minimum number of bits needed to describe any such ordering exactly, and what encoding achieves it?
My naive approach is to count (from the left) the number of 'jumps' each candidate must take to arrive in their ranked position, skipping any positions already filled by previously ranked candidates.
So if we had 3 candidates: cs = ['c1', 'c2', 'c3'] and a list of ranks rs = [2, 3, 1], we'd do this (where zip(cs, rs) matches candidates to ranks):
- First, 'c1' should be in the 2nd position, which is 1 move from the leftmost open spot. Since 'c1' could have moved 0, 1 or 2 positions, we need 2 bits: record 01.
- Second, 'c2' needs to be in the 3rd position, which is also 1 move from the leftmost open spot (since we skip over the position filled in step 1). Because 'c2' could move at most 1 jump, we only need 1 bit: record 1.
- Third, 'c3' trivially fits into the last space, so we need 0 bits.
Our final result is 011.
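The steps above can be sketched in Python roughly as follows. This is only an illustration of the jump-counting idea: the function name encode_jumps is made up for this example, and it uses (m - 1).bit_length() as an integer-only stand-in for ceil(log(2, m)).

```python
def encode_jumps(ranks):
    """Encode 1-based ranks (each used once) as a string of jump-count bits."""
    n = len(ranks)
    open_slots = list(range(1, n + 1))   # positions still free, left to right
    bits = []
    for rank in ranks:
        jumps = open_slots.index(rank)   # free spots skipped from the left
        # Bits needed to write 0..len(open_slots)-1, i.e. ceil(log2(len))
        width = (len(open_slots) - 1).bit_length()
        if width:
            bits.append(format(jumps, f"0{width}b"))
        open_slots.remove(rank)
    return "".join(bits)

print(encode_jumps([2, 3, 1]))  # 011
```

The last candidate always contributes zero bits, because only one open spot is left.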
To decode the permutation, read the bits in groups of ceil(log(2, n)), ceil(log(2, n-1)), ceil(log(2, n-2)), ..., ceil(log(2, 1)) bits, where log(2, x) means the base-2 logarithm of x. In our case the group sizes are [2, 1, 0]. Use those groups (in this case ['01', '1']) to move each candidate into open spots.
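A matching decoder might look like the sketch below. The name decode_jumps is hypothetical, and the code assumes the bit string was produced by the scheme described above; it does no validation beyond what list indexing gives for free.

```python
def decode_jumps(bits, n):
    """Rebuild the list of 1-based ranks from a jump-count bit string."""
    open_slots = list(range(1, n + 1))   # positions still free, left to right
    ranks, pos = [], 0
    while open_slots:
        # Group size for this candidate: ceil(log2(number of open spots))
        width = (len(open_slots) - 1).bit_length()
        jumps = int(bits[pos:pos + width], 2) if width else 0
        pos += width
        ranks.append(open_slots.pop(jumps))
    return ranks

print(decode_jumps("011", 3))  # [2, 3, 1]
```

Note that some bit strings are not valid votes (for 3 candidates, a first group of 11 would ask for 3 jumps and raise an IndexError here). Those unused codes are the space this encoding wastes.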
Numerically, the bit count of this method fits O(n log n) space complexity very closely (total_bits = 0.89 * n * log(2, n) seems to describe the pattern well).
The exact number of bits used will be t = Sigma(i=2, n)( ceil(log(2, i)) ).
It occurs to me that simply listing the positions is O(n log n) too, but the exact number of bits required is t = ceil(log(2, n))*n, which is more than my method.
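To compare the two counts for concrete n, here is a small sketch. The helper names jump_bits and position_bits are invented for this example, and (i - 1).bit_length() again stands in for ceil(log(2, i)).

```python
def jump_bits(n):
    # Sigma(i=2, n) ceil(log(2, i)): bits used by the jump-counting method
    return sum((i - 1).bit_length() for i in range(2, n + 1))

def position_bits(n):
    # ceil(log(2, n)) * n: every candidate's position written at full width
    return (n - 1).bit_length() * n

for n in (3, 10, 100):
    print(n, jump_bits(n), position_bits(n))
```

For n = 3 this gives 3 bits against 6, and for n = 10 it gives 25 bits against 40.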
Reddit user tilrman pointed out:
With n candidates, there are P(n) = n! permutations. Each voter selects one of these permutations, so a vote can be represented by an integer [0, n!-1].
Two more comments followed in that branch before I posted my solution.
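tilrman's observation gives a floor for any encoding: you need enough bits to write the largest index, n!-1. A quick way to compute that, assuming Python's arbitrary-precision integers (the name min_bits is just for illustration):

```python
from math import factorial

def min_bits(n):
    # Bits needed to write the largest vote index, n! - 1
    return (factorial(n) - 1).bit_length()

for n in (3, 5, 10):
    print(n, min_bits(n))
```

For 3 candidates this is 3 bits, the same as the jump-counting method, but for 10 candidates it is 22 bits rather than 25.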
My Solution
Clarification: in "base factorial" I'm using "base" in the same way as in base 10 (our number system) or base 2 (binary numbers).
The basic idea is to use factoradics, sort of like "base factorial", where the kth digit (counting from the right, starting at 1) is in [0, k], and to convert back to base 10 you'd multiply each digit by k! and add up the results. E.g. 3210 is 0*1! + 1*2! + 2*3! + 3*4! = 86. Also 4321 + 1 = 10000, which is 1*5! = 120 in base 10.
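As a sketch of that conversion, the function below follows the convention used in this post, where the rightmost digit has weight 1! (the more common presentation of factoradics adds a trailing 0! digit that is always 0). The name factoradic_to_int is made up for the example.

```python
from math import factorial

def factoradic_to_int(digits):
    """Convert digits (most significant first) to an ordinary integer."""
    k = len(digits)
    total = 0
    for i, digit in enumerate(digits):
        place = k - i                    # 1-based place, counted from the right
        if not 0 <= digit <= place:
            raise ValueError("the digit in place k must lie in [0, k]")
        total += digit * factorial(place)
    return total

print(factoradic_to_int([3, 2, 1, 0]))  # 86
print(factoradic_to_int([4, 3, 2, 1]))  # 119, i.e. 5! - 1
```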
These numbers uniquely enumerate the permutations of a list of n unique elements and allow one to 'read' the permutation from the number. E.g. [a, b, c, d, e] in permutation 2310 leads to [d, c, a, e, b]: put a in the (2+1)th free slot, put b in the (3+1)th free slot, put c in the (1+1)th free slot, put d in the (0+1)th free slot, and put e in the last remaining slot.
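Reading a permutation off the digits can be sketched like this. The function name apply_factoradic is hypothetical, and it assumes one digit per item except the last, which simply takes whatever slot is left.

```python
def apply_factoradic(items, digits):
    """Put items[i] into the (digits[i] + 1)-th free slot, left to right."""
    result = [None] * len(items)
    free = list(range(len(items)))       # indices of still-empty slots
    # The final item has no digit of its own: it skips 0 free slots.
    for item, skip in zip(items, list(digits) + [0]):
        result[free.pop(skip)] = item
    return result

print(apply_factoradic(list("abcde"), [2, 3, 1, 0]))
# ['d', 'c', 'a', 'e', 'b']
```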
This means we can map every integer in [0, n!-1] to a unique permutation (via modulo-ing and subtracting successive factorials), and back again, quickly and efficiently. Arbitrarily sized integers (e.g. in Python or Haskell) make this much easier than fixed-width integers: 150! is about 110 bytes long as a raw integer.
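Going from a plain integer to the digits could look like the sketch below. It uses divmod by successive factorials, which amounts to the modulo-and-subtract procedure mentioned above; int_to_factoradic is an invented name, and the rightmost digit has weight 1! as elsewhere in this post.

```python
from math import factorial

def int_to_factoradic(value, n):
    """Return the n - 1 digits (most significant first) for a list of n items."""
    if not 0 <= value < factorial(n):
        raise ValueError("value must lie in [0, n! - 1]")
    digits = []
    for place in range(n - 1, 0, -1):    # weights (n-1)!, ..., 2!, 1!
        digit, value = divmod(value, factorial(place))
        digits.append(digit)
    return digits

print(int_to_factoradic(68, 5))   # [2, 3, 1, 0]
print(int_to_factoradic(119, 5))  # [4, 3, 2, 1]

# Size of the largest vote index on a 150-candidate ballot, in whole bytes
largest_vote = factorial(150) - 1
print((largest_vote.bit_length() + 7) // 8)  # 110
```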
Finally, a sanity check: for a list of k elements we have a maximum count of (k-1) (k-2) ... (k-k+1), where each bracketed expression is a digit. If we add 1 we end up with 1 followed by k-1 zeros (as in 4321 + 1 = 10000 for k = 5), which corresponds to k!, and so our maximum integer is 1 less, at k!-1, which is what we expect :)
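The same check can be run numerically for small k. This is just a brute-force confirmation rather than a proof, and max_factoradic_value is a name invented for the example.

```python
from math import factorial

def max_factoradic_value(k):
    # Digits (k-1)(k-2)...(1): the digit in place j equals j and has weight j!
    return sum(j * factorial(j) for j in range(1, k))

for k in range(1, 8):
    assert max_factoradic_value(k) == factorial(k) - 1

print(max_factoradic_value(5))  # 119
```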