Hello world

Thursday, May 21, 2015

Use git to read linux kernel

The linux kernel is very big, and hard to understand. Even if we are looking at just one component, the component can still be very complex. What's more, the component is not stand-alone, and we usually can't find how it works without finding its connections with other components. But thanks to the git version control tool used by the linux kernel, we can trace the history of how a component was added into the kernel, and changed gradually. Even a big component is developed from a very simple version, and code is added piece by piece. What's more, each commit of code usually has one central idea from the author. So it should be much easier to understand the first simple version, and catch each following code commits.
I just tried this method. Firstly, I download the linux kernel using `git clone` command. Then, I make a little change of a core file in the component I want to understand. The change can make me easier to trace down the first code commit. Then, use gitk to find the first commit. In the gitk GUI interface, you can see what is changed in a commit, by right click on a line that is changed, I can jump to the commit this line is added. By continuously jumping like this, I can finally arrive to the first commit of this component, or the first commit of the linux kernel when it starts using gitk. Whatever, the earliest commit you can find should be much easier to understand. Of course, if you can understand the component in the current linux kernel version, that's great, and you don't need to find how it arrives here. But if you are not smart enough, and have a great pain trying to understand something in the linux kernel like me, this article should provide a simpler way to climb the linux kernel code mountain!

Monday, March 2, 2015

Build and run android linux kernel in emulator

If we want to learn android linux kernel, do something about it is the best way. Here records the steps to build and run android linux kernel in emulator. It is a good start for customizing the kernel.

Some steps may have alternative choices, but the following steps are proved by my own practice.
0. We need a linux-x86/linux-x86_64 machine.

1. Download aosp sources. We need the prebuilt gcc tool chain. From https://source.android.com/source/downloading.html.
$cd aosp_master
$repo init -u https://android.googlesource.com/platform/manifest
$repo sync -c -j256
$export PATH=$PATH:`pwd`/prebuilts/gcc/linux-x86/arm/arm-eabi-4.8/bin

2. Download and build goldfish kernel source. It's the kernel used by android emulator. From http://source.android.com/source/building-kernels.html.
$git clone https://android.googlesource.com/kernel/goldfish.git
$cd goldfish
# We can use git branch -r to find which remote branch we want to checkout.
$git checkout -t origin/android-goldfish-3.4 -b goldfish3.4
$export ARCH=arm
$export SUBARCH=arm
$export CROSS_COMPILE=arm-eabi-
$make goldfish_armv7_defconfig
$make
Now we get the kernel zImage in arch/arm/boot/zImage

3. Download android sdk. From http://developer.android.com/sdk/installing/index.html and http://developer.android.com/sdk/installing/adding-packages.html.
Download android sdk from http://developer.android.com/sdk/installing/index.html.
Run tools/android, install packages (In my experience, the default option to install is enough for this experiment).
Create an android virtual device for arm_v7. You can see a list of available targets by `tools/android list targets` command. In my enviroment, I create a avd by the following command:
$./android create avd --name myavd --target 1 --abi default/armeabi-v7a
You can see what avds you have by:
$./emulator -list-avds

4. Good, now its time to run emulator with the kernel.
$./emulator -avd myavd -kernel "path to the arch/arm/boot/zImage"
We can verify what kernel is used by add some options in emulator.
$./emulator -avd myavd -kernel "path to the arch/arm/boot/zImage" -show-kernel -verbose 2>&1 | tee emulator.log

Now we can have fun, looking around to see what we can do with the android linux kernel!

Monday, January 12, 2015

calculate factor count of k in n!

Define a function Z(n, k), Z(n, k) is the factor count of k exist in n!.
For example, Z(4, 2) = 3, as there are three factors of 2 in 4!.

In a naive calculation, the time complexity to calculate Z(n, k) is O(nlogn).

int Z(int n, int k) {
int result = 0;
for (int i = 2; i <= n; ++i) {
int temp = i;
while (temp % k == 0) {
temp /= k;
++result;
}
}
return result;
}

But there is a simple way to generate the result. in 1...n, there are n/k numbers which are dividable by k, there are n/(k^2) numbers which are dividable by k^2, and so on. So we can generate Z(n, k) like below, in O(log(n)) time.

int Z(int n, int k) {
int result = 0;
int temp = k;
while (n / temp >= k) { // Take care temp shouldn't become negative or overflow.
result += n / temp;
temp *= k;
}
result += n / temp;
return result;
}

Thursday, January 8, 2015

One way to calculate C(n, k) mod p.

I happen to meet with a way to calculate C(n, k) mod p for large n, when trying to solve http://discuss.codechef.com/questions/32706/gerald05-editorial.

The question can be described as below:
For a large number n, and a number k (0 <= k << n), calculate C(n, k) % p, p is a prime number, C(n, k) is the choice count to select k items from n total items.
For example, n <= 1000,000,000, k <= 1000, p = 1000,000,007.

C(n, 0) = 1
C(n, k) = n!/((n - k)! * k!) = n * (n - 1) * ... * (n - k + 1) / k!

What we need to compute is C(n, k) % p.

The trouble is the multiplication value of C(n, k) is so large that It's difficult to compute it.

Can we solve like below:
long long result = 1;
while (int i = 1; i <= k; ++i) {
result = (result * (n - i + 1)) / i % p;
}

That is the error I am making at first time. It is absolutely wrong. Because we can't guarantee (result * (n - i + 1) % i == 0) for each time, especially result is mod by p each time.

One naive solution is to calculate prime factors for n, (n - 1), ..., (n - k + 1), minus the prime factors of k!, then we can get the result of C(n, k) % p. It takes about O(n^0.5) to calculate prime factors for n, totally 2 * k value, so O(2 * k * n^0.5) = O(k * n^0.5). It seems fine if you don't need to calculate much C(n, k).

The other solution is to take advantage of Fermat's little theorem. I am terrible with formula, so just remember the conclusion. a^(p - 1) === 1 (mod p). p is a prime number, a is not dividable by p.

The key idea of the solution is to change division to multiplication is n * (n - 1) *... * (n - k + 1) / k!.
Because we can mod p for each factor of multiplication, but not for division.

suppose n * (n - 1) * ... * (n - k + 1) / k! = x. As it is dividable, n * (n - 1) * ... * (n - k + 1) = x * k!

Multiply 1^(p - 2) * 2^(p - 2) * ... * k^(p - 2) on each side, we get:
n * (n - 1) * ... * (n - k + 1) * 1^(p - 2) * 2^(p - 2) * ... * k^(p - 2) = x * 1^(p - 1) * 2^(p - 1) * ... * k^(p -1).

Mod p on each side, interesting thing happens on the right side:
n * (n - 1) * ... * (n - k + 1) * 1^(p - 2) * 2^(p - 2) * ... * k^(p - 2) % p = x % p.

So we have successfully changed division to multiplication.
C(n, k) % p = n * (n - 1) * ... * (n - k + 1) * 1^(p - 2) * 2^(p - 2) * ... * k^(p -2) % p.

We can mod p for each factor, in the middle of multiplication. The time to calculate k^(p - 2) % p is O(log p), so the time complexity is O(k + k * log(p)) = O(k * log(p)). It is almost O(k), about 10^5 faster than the first solution, right?

Let me show how to calculate a^b % p at last. It is a quick calculate which deserves to remember.

long long modpow(long long a, long long b, long long p) {
long long result = 1;
while (b != 0) {
if (b & 1) {
result = (result * b) % p;
}
a = (a * a) % p;
b >>= 1;
}
return result;
}

I can write result * b and a * a directly here because know it will not overflow in advance. Take care of the value range, and check if it will overflow for multiplication or addition whenever possible.
No need to compare this O(logb) version of modpow with O(b) version.

One use of the content in this page is in http://discuss.codechef.com/questions/32706/gerald05-editorial.

Sunday, December 14, 2014

how to prevent output from getting interleaved

If an application only has one process and one thread, that is fine, there will be no competition to print information, which goes to stdout.
But what if there is one process but multiple threads? It is fine to use a single printf to print whatever you want, and that will not get interleaved with information printed by other threads. Below is an example.

void ThreadFn() {
while (true) {
printf("This is first info\nThis is second info\n");
printf("This is third info\n");
}
}

For the example above, you can always see a first info is followed by a second info, but may not followed by a third info. As far as we use a single printf, we are safe to guarantee that it is in integrity. However, information printed by more that one printf is vulnerable to get interleaved. The reason is in the implementation of printf, when it want to write to stdout, there is a line buffer(of course you can set it to unbuffered mode, but that is unusual). Multiple threads will use the same buffer for the same file, like stdout. To make sure the access to the buffer is thread-safe, it is considered by libraries to be a critical section which is protected by a mutex. Only up to one thread can access the buffer at a time. So the content written by a single printf is copied to the buffer in a single access to the buffer. But for two printf, the thread should apply for the mutex and access buffer more than once, there may be other thread accessing the buffer between two printf.

Then what about multiple processes? There is no synchronization mechanism between multiple processes for printing. When a process calls printf, the content goes to a buffer, when the buffer is full, it is dumped to the terminal/file by write/writev system call. There is no guarantee of the buffer size or how the buffer content will be dumped. To dump the buffer, some libraries may dump constant size at a time, others may dump all at once. So to guarantee integrity of information printed, we'd better call write/writev system call directly when multiple processes is getting involved. Below is an example.

void ProccessFn() {
...
int nwrite = strlen(buf);
char* p = buf;
while (nwrite > 0) {
int written = write(fileno(stdout), p, nwrite);
if (written == -1) {
if (errno != EINTR) {
perror("write");
break;
}
} else {
nwrite -= written;
p += written;
}
}
}

Thursday, December 11, 2014

use popen to implement single direction communication between two processes

popen() is a posix api to implement single direction communication between two processes. It is suitable for the following conditions: 1. create a subprocess and read its standard output. 2. create a subprocess and write to its standard input.
I think it is something between system() and pipe() plus fork/exec. With system(), you can only start a child process and wait for the termination of child process. With pipe/fork/exec, as it is a little complex, I will show the typical usage below:

MainProcessFn() {
int pipe_fd[2];
pipe(pipe_fd);
int pid = fork();
if (pid == 0) {
close(pipe_fd[0]); // close read side
dup2(pipe_fd[1], 1); // dup to stdout
close(pipe_fd[1]);
exec("some command");
} else {
close(pipe_fd[1]) // close write side
FILE *fp = fdopen(pipe_fd[0], "r");
// read from child process's standard output.
}
}

As we can see, we can do bidirectional communication between two processes with pipe. We can only do unidirectional communication with popen, while the benefit is simpler code.

MainProcessFn() {
FILE* fp = popen("some command", "r");
// read from child process's standard output.
pclose(fp);
}
With pclose, we can wait for the termination of child process, and get the exit status. In fact, popen/pclose is just a library api implemented based on system calls like pipe/fork/exec/waitpid, but just more convenient to use.

Tuesday, December 9, 2014

One way to realize parallelism

When we want to take benefits from multiprocessors, or just multitasking operating systems, we have to write multi-threads or multi-processes system. I met with such a condition before.
What I need to do is to spawn many processes in one process, each process does a small piece of job, gives feedback to the main process. There are about 1000 processes I need to spawn, so I may not spawn them at once as it makes heavy burden in system.
The first solution I can think up of is to use multi-threads to spawn processes. For example, I first create 10 threads in the main process, and in each thread, it spawns a process and waits for its finish, then it spawns and waits another process. This progress goes on until all jobs are finished. And I can make sure at any time there will not be more than 11 processes and 10 more threads. But the problem is I use pthread to create thread, and use fork to create process. It is very dangerous to use thread and fork at the same time. When you call fork in one thread, the new process will only have one thread, other threads are just stopped running, but the resources (like memory/lock) they occupy are not freed for reuse. For example, when one thread calls fork, another is using printf to print something to stdout. It seems fine, but in printf implementation, it may uses mutex lock to prevent different threads accessing the print buffer at the same time (that is the reason why we have clean output in multi-threads). If the thread using printf has just got the mutex lock when fork happens, the mutex lock will not be freed, so the result is the new forked process can not print anything to stdout (Because it will get the mutex lock for stdout which will never be freed).
After taught a lesson by pthread plus fork, I should find another way to realize parallelism. The idea is simple that if I can't spawn 1000 processes at once, what about 10 at a time? I can keep an array of spawned processes, once one child process is finished, I can spawn another one. By using an array of 10, I can make up to 10 spawned processes running at the same time. The work process is like below:

while (not all work finished) {
spawn processes until the running process array is full.
wait for some process finish, collect the result.
}

It is even simpler than the multi-thread one, but it works better. So I think it deserves to record it :-)