Monday, June 15, 2015

Working with an experimental git branch

After several month's experience, I find it very helpful to control source code with git. I think it forces me to write small, standalone pieces of code, which is very good for reading and keeping trace of progress. However, when I am trying to write something new, I don't know what actually should be added, and the feature may not be able to work until I add several bunches of code. So these codes are experimental and may be too ugly to appear in places other than my local machine. A good way to deal with this situation is to create a separate experimental branch, adding whatever code I like. After experimenting and changing, the code becomes stable, and I can split the code in experimental branch into pieces, and rewrite them in another branch, and submit them. After that, I can remove the experimental branch as all its features have been submitted.
In one sentence, I think an experimental branch may be a good choice if the code is bigger than one commit or you are not very sure of what you want.

Sunday, June 14, 2015

How many ways to make a n-node connected graph with k roads?

The problem is as the title. And each node is distinct. Connected graph means each node in the graph can reach any other node in the graph. And there is one road allowed between two nodes. Each of the k roads should be used in the graph. For example, there are 3 ways to connect 3-node graph with 2 roads: A-B-C, A-C-B, B-A-C. There are 1 way to connect 3-node graph with 3 roads. There are 0 ways to connect 3-node graph with 4 or more roads.
The problem seems easy at the first glance. But it is complex to remove the duplications. I think there are two ways to consider the solution. One ways is to add up all ways can be used to connect n-node graph with k roads, the other way is to use C(n*(n-1)/2, k) to subtract all ways that cannot connect n-node graph with k roads. I find the solution in the latter case. So how many ways are there to not connect a n-node graph with k roads? Well, the graph can be made up of two sub-connected graphs, or three sub-connected graphs, or even more. I am sure no one wants to enumerate that after trying it myself. No matter how many sub-connected graphs there are? there must have a sub-connected graph that containing the n'th node. That graph may contains 1 to n-1 nodes, using 0 to k roads, and should be connected. It is in fact a sub problem of the original one. And beside this special sub-connected graph, we don't care about how other connected sub-graphs are connected. Whatever they are, they forms a way to build a non-connected n-node graph with k roads.
So here comes the solution, I can solve it by dynamic programming. let dp[n][k] means how many ways to build a n-node connected graph with k roads.
dp[n][k] = C(n*(n-1)/2, k) - {any n1, k1 to build a sub connected graph containing node n, 1 <= n1 < n, 0 <= k1 <= k, dp[n1][k1] * C((n-n1)*(n-n1-1)/2, k-k1)}.

Below is the code in java:

import java.util.*;
import java.math.BigInteger;

public class Temp {
  BigInteger bigIntAdd(BigInteger a, BigInteger b) {
    if (a == null) {
      return b;
    }
    if (b == null) {
      return a;
    }
    return a.add(b);
  }

  BigInteger[][] buildSelection(int n) {
    BigInteger[][] C = new BigInteger[n+1][n+1];
    C[0][0] = new BigInteger("1");
    for (int i = 1; i <= n; ++i) {
      C[i][0] = C[i][i] = new BigInteger("1");
      for (int j = 1; j < i; ++j) {
        C[i][j] = bigIntAdd(C[i-1][j], C[i-1][j-1]);
      }
    }
    return C;
  }

  BigInteger bigIntMul(BigInteger a, BigInteger b) {
    if (a == null || b == null) {
      return new BigInteger("0");
    }
    return a.multiply(b);
  }

  BigInteger bigIntSub(BigInteger a, BigInteger b) {
    return a.subtract(b);
  }

  public static String answer(int N, int K) {
    Temp instance = new Temp();
    int maxN = Math.max(N * (N - 1) / 2, N);
    BigInteger[][] C = instance.buildSelection(maxN);

    // dp[i][j] means the number of ways to connect i nodes with j roads.
    BigInteger[][] dp = new BigInteger[N + 1][K + 1];
    dp[1][0] = new BigInteger("1");
    for (int i = 2; i <= N; ++i) {
      int minJ = i - 1;
      int maxJ = i * (i - 1) / 2;
      for (int j = minJ; j <= maxJ && j <= K; ++j) {
        // dp[i][j], change i to bitmask 2^(i)-1,
        // dp[2^i-1][j] = C[i*(i-1)/2][j] - dp[mask1][k1] * C[(i-n1)(i-n1-1)/2][j-k1].
        // mask1 means the connected part containing node i-1.
        BigInteger value = C[i*(i-1)/2][j];
        for (int n1 = 1; n1 < i; ++n1) {
          BigInteger countOfMask1 = C[i-1][n1-1];
          int minK1 = n1 - 1;
          int maxK1 = Math.min(n1 * (n1-1) / 2, j);
          BigInteger sum = new BigInteger("0");
          for (int k1 = minK1; k1 <= maxK1; ++k1) {
            BigInteger mulRes = instance.bigIntMul(dp[n1][k1], C[(i-n1)*(i-n1-1)/2][j-k1]);
            sum = instance.bigIntAdd(sum, mulRes);
          }
          sum = instance.bigIntMul(sum, countOfMask1);
          value = instance.bigIntSub(value, sum);
        }
        dp[i][j] = value;
      }
    }
    return dp[N][K].toString();
  }

  public static void main(String[] args) {
    System.out.printf("answer(%d, %d) -> %s\n", 2, 1, answer(2, 1));
    System.out.printf("answer(%d, %d) -> %s\n", 4, 3, answer(4, 3));
    System.out.printf("answer(%d, %d) -> %s\n", 3, 2, answer(3, 2));
  }
}

Thursday, May 21, 2015

Use git to read linux kernel

The linux kernel is very big, and hard to understand. Even if we are looking at just one component, the component can still be very complex. What's more, the component is not stand-alone, and we usually can't find how it works without finding its connections with other components. But thanks to the git version control tool used by the linux kernel, we can trace the history of how a component was added into the kernel, and changed gradually. Even a big component is developed from a very simple version, and code is added piece by piece. What's more, each commit of code usually has one central idea from the author. So it should be much easier to understand the first simple version, and catch each following code commits.
I just tried this method. Firstly, I download the linux kernel using `git clone` command. Then, I make a little change of a core file in the component I want to understand. The change can make me easier to trace down the first code commit. Then, use gitk to find the first commit. In the gitk GUI interface, you can see what is changed in a commit, by right click on a line that is changed, I can jump to the commit this line is added. By continuously jumping like this, I can finally arrive to the first commit of this component, or the first commit of the linux kernel when it starts using gitk. Whatever, the earliest commit you can find should be much easier to understand. Of course, if you can understand the component in the current linux kernel version, that's great, and you don't need to find how it arrives here. But if you are not smart enough, and have a great pain trying to understand something in the linux kernel like me, this article should provide a simpler way to climb the linux kernel code mountain!

Monday, March 2, 2015

Build and run android linux kernel in emulator

If we want to learn android linux kernel, do something about it is the best way. Here records the steps to build and run android linux kernel in emulator. It is a good start for customizing the kernel.

Some steps may have alternative choices, but the following steps are proved by my own practice.
0. We need a linux-x86/linux-x86_64 machine.

1. Download aosp sources. We need the prebuilt gcc tool chain. From https://source.android.com/source/downloading.html.
   $cd aosp_master
   $repo init -u https://android.googlesource.com/platform/manifest
   $repo sync -c -j256
   $export PATH=$PATH:`pwd`/prebuilts/gcc/linux-x86/arm/arm-eabi-4.8/bin

2. Download and build goldfish kernel source. It's the kernel used by android emulator. From http://source.android.com/source/building-kernels.html.
   $git clone https://android.googlesource.com/kernel/goldfish.git
   $cd goldfish
   # We can use git branch -r to find which remote branch we want to checkout.
   $git checkout -t origin/android-goldfish-3.4 -b goldfish3.4
   $export ARCH=arm
   $export SUBARCH=arm
   $export CROSS_COMPILE=arm-eabi-
   $make goldfish_armv7_defconfig
   $make
   Now we get the kernel zImage in arch/arm/boot/zImage

3. Download android sdk. From http://developer.android.com/sdk/installing/index.html and http://developer.android.com/sdk/installing/adding-packages.html.
   Download android sdk from http://developer.android.com/sdk/installing/index.html.
   Run tools/android, install packages (In my experience, the default option to install is enough for this experiment).
   Create an android virtual device for arm_v7. You can see a list of available targets by `tools/android list targets` command. In my enviroment, I create a avd by the following command:
   $./android create avd --name myavd --target 1 --abi default/armeabi-v7a
   You can see what avds you have by:
   $./emulator -list-avds

4. Good, now its time to run emulator with the kernel.
   $./emulator -avd myavd -kernel "path to the arch/arm/boot/zImage"
   We can verify what kernel is used by add some options in emulator.
   $./emulator -avd myavd -kernel "path to the arch/arm/boot/zImage" -show-kernel -verbose 2>&1 | tee emulator.log

Now we can have fun, looking around to see what we can do with the android linux kernel!


Monday, January 12, 2015

calculate factor count of k in n!

Define a function Z(n, k), Z(n, k) is the factor count of k exist in n!.
For example, Z(4, 2) = 3, as there are three factors of 2 in 4!.

In a naive calculation, the time complexity to calculate Z(n, k) is O(nlogn).

int Z(int n, int k) {
  int result = 0;
  for (int i = 2; i <= n; ++i) {
    int temp = i;
    while (temp % k == 0) {
       temp /= k;
       ++result;
    }
  }
  return result;
}

But there is a simple way to generate the result. in 1...n, there are n/k numbers which are dividable by k, there are n/(k^2) numbers which are dividable by k^2, and so on. So we can generate Z(n, k) like below, in O(log(n)) time.

int Z(int n, int k) {
  int result = 0;
  int temp = k;
  while (n / temp >= k) { // Take care temp shouldn't become negative or overflow.
     result += n / temp;
     temp *= k;
  }
  result += n / temp;
  return result;
}


Thursday, January 8, 2015

One way to calculate C(n, k) mod p.

I happen to meet with a way to calculate C(n, k) mod p for large n, when trying to solve http://discuss.codechef.com/questions/32706/gerald05-editorial.

The question can be described as below:
For a large number n, and a number k (0 <= k << n), calculate C(n, k) % p, p is a prime number, C(n, k) is the choice count to select k items from n total items.
For example, n <= 1000,000,000,  k <= 1000, p = 1000,000,007.

C(n, 0) = 1
C(n, k) = n!/((n - k)! * k!) = n * (n - 1) * ... * (n - k + 1) / k!

What we need to compute is C(n, k) % p.

The trouble is the multiplication value of C(n, k) is so large that It's difficult to compute it.

Can we solve like below:
long long result = 1;
while (int i = 1; i <= k; ++i) {
  result = (result * (n - i + 1)) / i % p;
}

That is the error I am making at first time. It is absolutely wrong. Because we can't guarantee (result * (n - i + 1) % i == 0) for each time, especially result is mod by p each time.

One naive solution is to calculate prime factors for n, (n - 1), ..., (n - k + 1), minus the prime factors of k!, then we can get the result of C(n, k) % p. It takes about O(n^0.5) to calculate prime factors for n, totally 2 * k value, so O(2 * k * n^0.5) = O(k * n^0.5). It seems fine if you don't need to calculate much C(n, k).

The other solution is to take advantage of  Fermat's little theorem. I am terrible with formula, so just remember the conclusion. a^(p - 1) === 1 (mod p). p is a prime number, a is not dividable by p.

The key idea of the solution is to change division to multiplication is n * (n - 1) *... * (n - k + 1) / k!.
Because we can mod p for each factor of multiplication, but not for division.

suppose n * (n - 1) * ... * (n - k + 1) / k! = x. As it is dividable, n * (n - 1) * ... * (n - k + 1) = x * k!

Multiply 1^(p - 2) * 2^(p - 2) * ... * k^(p - 2) on each side, we get:
n * (n - 1) * ... * (n - k + 1) * 1^(p - 2) * 2^(p - 2) * ... * k^(p - 2) = x * 1^(p - 1) * 2^(p - 1) * ... * k^(p -1).

Mod p on each side, interesting thing happens on the right side:
n * (n - 1) * ... * (n - k + 1) * 1^(p - 2) * 2^(p - 2) * ... * k^(p - 2) % p = x % p.

So we have successfully changed division to multiplication.
C(n, k) % p = n * (n - 1) * ... * (n - k + 1) * 1^(p - 2) * 2^(p - 2) * ... * k^(p -2) % p.

We can mod p for each factor, in the middle of multiplication. The time to calculate k^(p - 2) % p is O(log p), so the time complexity is O(k + k * log(p)) = O(k * log(p)). It is almost O(k), about 10^5 faster than the first solution, right?

Let me show how to calculate a^b % p at last. It is a quick calculate which deserves to remember.

long long modpow(long long a, long long b, long long p) {
  long long result = 1;
  while (b != 0) {
    if (b & 1) {
      result = (result * b) % p;
    }
    a = (a * a) % p;
    b >>= 1;
  }
  return result;
}

I can write result * b and a * a directly here because know it will not overflow in advance. Take care of the value range, and check if it will overflow for multiplication or addition whenever possible.
No need to compare this O(logb) version of modpow with O(b) version.

One use of the content in this page is in http://discuss.codechef.com/questions/32706/gerald05-editorial.