Thursday, January 8, 2015

One way to calculate C(n, k) mod p.

I happen to meet with a way to calculate C(n, k) mod p for large n, when trying to solve http://discuss.codechef.com/questions/32706/gerald05-editorial.

The question can be described as below:
For a large number n, and a number k (0 <= k << n), calculate C(n, k) % p, p is a prime number, C(n, k) is the choice count to select k items from n total items.
For example, n <= 1000,000,000,  k <= 1000, p = 1000,000,007.

C(n, 0) = 1
C(n, k) = n!/((n - k)! * k!) = n * (n - 1) * ... * (n - k + 1) / k!

What we need to compute is C(n, k) % p.

The trouble is the multiplication value of C(n, k) is so large that It's difficult to compute it.

Can we solve like below:
long long result = 1;
while (int i = 1; i <= k; ++i) {
  result = (result * (n - i + 1)) / i % p;
}

That is the error I am making at first time. It is absolutely wrong. Because we can't guarantee (result * (n - i + 1) % i == 0) for each time, especially result is mod by p each time.

One naive solution is to calculate prime factors for n, (n - 1), ..., (n - k + 1), minus the prime factors of k!, then we can get the result of C(n, k) % p. It takes about O(n^0.5) to calculate prime factors for n, totally 2 * k value, so O(2 * k * n^0.5) = O(k * n^0.5). It seems fine if you don't need to calculate much C(n, k).

The other solution is to take advantage of  Fermat's little theorem. I am terrible with formula, so just remember the conclusion. a^(p - 1) === 1 (mod p). p is a prime number, a is not dividable by p.

The key idea of the solution is to change division to multiplication is n * (n - 1) *... * (n - k + 1) / k!.
Because we can mod p for each factor of multiplication, but not for division.

suppose n * (n - 1) * ... * (n - k + 1) / k! = x. As it is dividable, n * (n - 1) * ... * (n - k + 1) = x * k!

Multiply 1^(p - 2) * 2^(p - 2) * ... * k^(p - 2) on each side, we get:
n * (n - 1) * ... * (n - k + 1) * 1^(p - 2) * 2^(p - 2) * ... * k^(p - 2) = x * 1^(p - 1) * 2^(p - 1) * ... * k^(p -1).

Mod p on each side, interesting thing happens on the right side:
n * (n - 1) * ... * (n - k + 1) * 1^(p - 2) * 2^(p - 2) * ... * k^(p - 2) % p = x % p.

So we have successfully changed division to multiplication.
C(n, k) % p = n * (n - 1) * ... * (n - k + 1) * 1^(p - 2) * 2^(p - 2) * ... * k^(p -2) % p.

We can mod p for each factor, in the middle of multiplication. The time to calculate k^(p - 2) % p is O(log p), so the time complexity is O(k + k * log(p)) = O(k * log(p)). It is almost O(k), about 10^5 faster than the first solution, right?

Let me show how to calculate a^b % p at last. It is a quick calculate which deserves to remember.

long long modpow(long long a, long long b, long long p) {
  long long result = 1;
  while (b != 0) {
    if (b & 1) {
      result = (result * b) % p;
    }
    a = (a * a) % p;
    b >>= 1;
  }
  return result;
}

I can write result * b and a * a directly here because know it will not overflow in advance. Take care of the value range, and check if it will overflow for multiplication or addition whenever possible.
No need to compare this O(logb) version of modpow with O(b) version.

One use of the content in this page is in http://discuss.codechef.com/questions/32706/gerald05-editorial.

No comments:

Post a Comment