Thursday, November 27, 2014

How to write a binary search

Binary search is a simple and widely used search algorithm for sorted arrays. However, it is also error-prone. When I write binary search code, I usually check each case explicitly to make sure everything is handled, like this:

int BinarySearch(int s[], int len, int key) {
  int low = 0, high = len - 1;
  while (low <= high) {
    int mid = low + (high - low) / 2;  // Avoids the overflow that (low + high) / 2 can cause.
    if (s[mid] > key) {
      high = mid - 1;
    } else if (s[mid] == key) {
      return mid;
    } else {
      low = mid + 1;
    }
  }
  return -1;
}

This is correct, but it needs two comparisons in every iteration of the loop. What if we use only one?

int BinarySearch(int s[], int len, int key) {
  int low = 0, high = len - 1;
  while (low + 1 < high) {
    int mid = low + (high - low) / 2;
    if (s[mid] > key) {
      high = mid - 1;
    } else {
      low = mid;
    }
  }
  // The loop ends with high - low <= 1, so the key (if present) is at low or high.
  if (low < len && s[low] == key)
    return low;
  if (high >= 0 && s[high] == key)
    return high;
  return -1;
}

Then I read the following binary search code in the book <Beautiful Code>:

int BinarySearch(int s[], int len, int key) {
  int low = -1, high = len;
  while (low + 1 < high) {
    int mid = low + (high - low) / 2;
    if (s[mid] > key) {
      high = mid;
    } else {
      low = mid;
    }
  }
  // Invariant: s[low] <= key < s[high], treating s[-1] as -infinity and
  // s[len] as +infinity. So only low can hold the key.
  if (low != -1 && s[low] == key) {
    return low;
  }
  return -1;
}

Of course, all three versions are correct and differ only slightly from one another. Still, it is fun to compare them.

Monday, November 24, 2014

Reading "Inside the C++ Object Model"

While it is easy to imagine how C code is translated into assembly and the corresponding machine code, things become more complicated in C++.

C++ programming styles

C++ supports at least four programming styles: procedural programming, object-based programming, object-oriented programming and generic programming.

1. The procedural programming style is kept for compatibility with C. It lets us keep a bunch of data, apply operations to the input data like a pipeline, and produce whatever data we want. While this style is clear in itself, the real world is complex: as the data and operations grow more complex, the code becomes harder and harder to maintain and extend. That is one reason object-oriented programming languages were invented.

2. The object-based programming style introduces the concept of an object, which binds data and the operations on that data together. In procedural programming, you mostly work with built-in types like int, float and char. In the object-based style, you can build more powerful types called abstract data types (ADTs). You define the data inside an ADT and the operations it supports, and you can build ADTs out of other ADTs. ADTs are easier to use and maintain than raw built-in types because they provide an important concept in computer science: an abstraction layer. Encapsulation and interfaces come naturally with it.

3. The object-oriented programming style is even more powerful than object-based programming because it supports inheritance and polymorphism. A derived class can implement an operation differently from its base class, so the same code can work with both base-class and derived-class objects. This makes code easier to extend and maintain, and it helps code reuse. Much of the interesting machinery inside C++ exists because of inheritance and polymorphism.

4. The generic programming style is regarded as taking code reuse to the extreme. You can define a template for a class or a function without deciding which actual data type it works on until instantiation time. While it is less familiar to many C++ programmers, it is used extensively to implement the C++ standard library.


How is C++ implemented?

Well, how to implement C++ is a big question, best left to the C++ standard committee and compiler vendors. However, we C++ programmers can't help wondering what is going on underneath the language. How are constructors and destructors called? How are inheritance and multiple inheritance implemented? How does polymorphism work? How is exception handling implemented? There is no magic here: the C++ compiler, the linker/loader and the runtime library work together to provide every feature C++ supports.
The book <Inside the C++ Object Model>, written as early as 1996, gives us insight into how compilers implement C++ features. I was astonished by its content. It explains things like how the compiler synthesizes a default constructor, or how it records the data needed to support inheritance. Implementing all of these features is a big challenge for C++ compilers and runtime libraries, but they are very smart and do a lot of work behind the scenes. So I think it will be fun to study the details.
As time goes on, C++ gains more powerful features and compilers become smarter, so many of the techniques described in the book may be out of date. I should therefore verify the implementation details carefully.
I plan to split the study into several chapters:
Chapter 1: How is a class implemented? Here the class has only data members and member functions.
Chapter 2: How is class inheritance implemented? We study three kinds: single inheritance, multiple inheritance and virtual inheritance.
Chapter 3: How is polymorphism implemented? We focus on how virtual functions are dynamically bound.
More chapters may be added later, depending on how the study goes.