Tuesday, December 9, 2014

One way to realize parallelism

When we want to take advantage of multiprocessors, or simply of multitasking operating systems, we have to write multi-threaded or multi-process programs. I ran into such a situation before.
What I needed to do was spawn many processes from one process; each child does a small piece of the job and reports its result back to the main process. There were about 1000 processes to spawn, so I could not spawn them all at once, because that would put a heavy burden on the system.
The first solution I could think of was to use multiple threads to spawn the processes. For example, the main process first creates 10 threads; each thread spawns a process and waits for it to finish, then spawns the next one and waits for it, and so on until all jobs are done. That way there are never more than 11 processes (the main process plus 10 children) and 10 extra threads at any time.
The problem is that I used pthread to create the threads and fork to create the processes, and it is very dangerous to mix threads and fork. When one thread calls fork, the new process contains only one thread, a copy of the caller. The other threads simply do not exist in the child, but the resources they held (memory, locks) are copied as-is and are never released. For example, suppose one thread calls fork while another thread is using printf to write to stdout. That looks harmless, but a printf implementation may use a mutex to keep different threads from touching the output buffer at the same time (that is why output from multiple threads stays clean). If the printing thread holds that mutex at the moment of fork, the child inherits a locked mutex that no thread will ever unlock, so the child can no longer print anything to stdout: it blocks forever waiting for a lock that will never be freed.
Having learned my lesson from pthread plus fork, I had to find another way to achieve parallelism. The idea is simple: if I can't spawn 1000 processes at once, what about 10 at a time? I keep an array of spawned processes, and whenever one child process finishes, I spawn another one. With an array of size 10, up to 10 child processes run at the same time. The workflow looks like this:

   while (not all work finished) {
      spawn processes until the running process array is full.
      wait for some process to finish, collect its result.
   }

It is even simpler than the multi-threaded version, and it works better, so I think it deserves to be recorded :-)