Stroke Width Transform algorithm for Python

11 August 2017 marrrcin python , computer-vision , swt , c

Stroke Width Transform (SWT) is a computer vision algorithm (actually, it's an image operator) which can be used in the task of detecting text in images. This is a non-trivial task, especially for camera pictures, but SWT performs pretty well in this field.

TL;DR

How to wrap the libccv computer vision library so that SWT can be used from Python. The wrapper is built with SWIG.

Intro

My use case for SWT was the following: split a big dataset of unknown images into ones that contain text (most likely documents / scans) and others. Easy, right? Yes, as long as you have a fast implementation of this algorithm. Unfortunately, I needed to use Python, because the image filtering was only a part of a bigger solution. I searched here and there and, of course, there are some implementations in Python, but they are not ready for production. One of them, for example, took 5 minutes to process a 2000px x 3000px image and allocated gigabytes of memory. For just a single image. NOPE for that.

So... I searched further, but in the end I didn't find anything that was either working or performant. I decided to look for an SWT implementation in any other language and found libccv. A quick look at the repository: it's written in C. OK, not my kind of language, but I gave it a try, since it would surely be fast. I compiled the example application, which took an image as input and produced output in the form of 4-element tuples (x, y, width, height), each describing a rectangle that contains text in the image. It produced satisfying results, so I decided to use it from Python. How? Let's dig into it!

Prerequisites

  • Python 2.7.* on Linux (tested on Ubuntu 16.04 x64)
  • Python packages: scikit-image, numpy, matplotlib (only for preview)
  • gcc compiler
  • SWIG
  • basic knowledge of C

SWT example breakdown from libccv

For starters, take a look at the code from https://github.com/liuliu/ccv/blob/07fc691c5344940751011c3af96d0ab202b1b4e6/bin/swtdetect.c. The code comes from the libccv repository and is an example of using SWT text detection. It is a console application which takes a file name as an argument, runs the algorithm and writes the coordinates of the rectangles containing text to standard output.

The problems with it are the following:

  • it is a console application
  • it takes file name as an input
  • it outputs to stdout

My requirements for the function (which I wanted to call from Python) were the following:

  • it is a standalone library / module
  • it takes blob of an image as an input (do not write anything to disk)
  • it returns an array-like object

SWT function wrapper in C

My requirements translate pretty easily into a function header:

int* swt(char *bytes, int array_length, int width, int height);

Compiling the code above to plain English: take an array of bytes, its length (welcome to the C world!), the width and height of an image, do some magic and output an array of numbers.

In order to implement and compile the wrapper without issues, just check out libccv from its repository and put your files in the libccv/lib folder. This approach is suggested by the libccv author.

Implementation of SWT for Python wrapper:

#include "ccv.h"
#include <jpeglib.h>
#include "io/_ccv_io_libjpeg.c"
#include <sys/time.h>
#include <ctype.h>
#define SUCCESS 1
#define FAILURE 0

/* Returns a malloc'ed array. Element 0 holds the number of values that
 * follow (4 per detected word: x, y, width, height). The caller frees it.
 * width and height are accepted but not used by this implementation. */
int* swt(char *bytes, int array_length, int width, int height){
    int *result_array = NULL;
    int status = FAILURE;
    ccv_dense_matrix_t* image = 0;
    FILE *stream;

    ccv_enable_default_cache();

    /* Expose the in-memory buffer as a read-only FILE* stream. */
    stream = fmemopen(bytes, array_length, "r");
    if(stream != NULL){
        /* Decode the JPEG as an 8-bit grayscale matrix. */
        int type = CCV_IO_JPEG_FILE | CCV_IO_GRAY;
        int ctype = (type & 0xF00) ? CCV_8U | ((type & 0xF00) >> 8) : 0;
        _ccv_read_jpeg_fd(stream, &image, ctype);
        /* Release the stream; the byte buffer stays owned by the caller. */
        fclose(stream);

        if (image != 0)
        {
            ccv_array_t* words = ccv_swt_detect_words(image, ccv_swt_default_params);
            if (words)
            {
                int i;
                int result_idx = 1;
                /* One extra slot at index 0 stores the length of the payload. */
                result_array = (int*)malloc((4 * words->rnum + 1) * sizeof(int));
                result_array[0] = 4 * words->rnum;
                for (i = 0; i < words->rnum; i++)
                {
                    ccv_rect_t* rect = (ccv_rect_t*)ccv_array_get(words, i);
                    result_array[result_idx++] = rect->x;
                    result_array[result_idx++] = rect->y;
                    result_array[result_idx++] = rect->width;
                    result_array[result_idx++] = rect->height;
                }
                ccv_array_free(words);
                status = SUCCESS;
            }
            ccv_matrix_free(image);
        }
        ccv_drain_cache();
    }

    /* On any failure, return a 1-element array that announces an empty payload. */
    if(status != SUCCESS){
        result_array = (int*)malloc(1 * sizeof(int));
        result_array[0] = 0;
    }
    return result_array;
}

Explanation

There are some hacky things in there. I will explain them one by one.

FILE *stream;
stream = fmemopen(bytes, array_length, "r");

My function takes an array of bytes as input, but none of the ccv_read function variants accepts this kind of data. I found the great fmemopen function (it comes from POSIX rather than from standard C), which turns a char* buffer into a FILE* handle. But... none of the ccv_read function variants accepts this kind of data either (AGAIN!). Yeah, but they read the JPEGs from files somehow!

That led me to the next part:

#include <jpeglib.h>
#include "io/_ccv_io_libjpeg.c"

In the _ccv_io_libjpeg.c file, there's a function which accepts a FILE* handle and reads a JPEG image into the libccv image format. I wanted to use it.

int type = CCV_IO_JPEG_FILE | CCV_IO_GRAY;
int ctype = (type & 0xF00) ? CCV_8U | ((type & 0xF00) >> 8) : 0;
_ccv_read_jpeg_fd(stream, &image, ctype);

Libccv needs a grayscale image as input for SWT, hence some bitwise magic to set the proper flags.
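
To make that bitwise magic less magical, here is a small Python sketch of the same arithmetic. It is only an illustration: the constant values below are assumptions written down from my reading of ccv.h and may differ in your checkout, so verify them there. The helper name matrix_ctype is hypothetical.

```python
# Assumed constant values (verify against ccv.h in your libccv checkout).
CCV_IO_GRAY = 0x100        # "convert to grayscale" modifier
CCV_IO_RGB_COLOR = 0x300   # "convert to color" modifier
CCV_IO_JPEG_FILE = 0x022   # file format; sits outside the 0xF00 mask, so it does not affect the result
CCV_8U = 0x01000           # 8-bit unsigned samples


def matrix_ctype(io_type):
    """Mirror of the C expression that derives the matrix type from the IO flags."""
    channels = (io_type & 0xF00) >> 8  # the color modifier doubles as a channel count
    return (CCV_8U | channels) if channels else 0


gray = matrix_ctype(CCV_IO_JPEG_FILE | CCV_IO_GRAY)
color = matrix_ctype(CCV_IO_JPEG_FILE | CCV_IO_RGB_COLOR)
print(hex(gray), hex(color))
```

Under these assumed values, the grayscale modifier ends up as a channel count of 1 combined with the 8-bit sample flag, which is the matrix type that the JPEG reader is asked to produce.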

int *result_array;
result_array = (int*)malloc((4 * words->rnum + 1) * sizeof(int));
result_array[0] = 4 * words->rnum;

Dynamic arrays were (and still are) such a pain in C. SWT outputs 4 numbers for every match, so the array needs 4 times as many elements as there are matches. The +1 is my hack to output everything in a single array and read it in Python. I used this one additional index to store the number of values that follow (I will drop this value in the upcoming SWIG wrapper, wait for it!).
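
The layout of that array is easier to see in a few lines of Python. This is just a model of what the C loop writes, not code used by the wrapper; pack_rects is a hypothetical name and the rectangles are made-up sample values.

```python
def pack_rects(rects):
    """Flatten (x, y, width, height) tuples into [count, x, y, w, h, x, y, w, h, ...].

    count is the number of values that follow, i.e. 4 * len(rects),
    matching result_array[0] = 4 * words->rnum in the C code.
    """
    flat = [value for rect in rects for value in rect]
    return [len(flat)] + flat


# Two made-up detections, only to show the layout.
packed = pack_rects([(10, 20, 100, 30), (15, 80, 60, 25)])
print(packed)
```

The first slot tells the reader how many numbers to consume, which is exactly the information C cannot attach to a plain int pointer.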

if(status != SUCCESS){
    result_array = (int*)malloc(1 * sizeof(int));
    result_array[0] = 0;
}

When something goes wrong, I wanted to output something anyway, so I just create a 1-element array containing 0, meaning there is nothing to push to Python from the wrapper.
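
One consequence of this design is that the Python side cannot tell "no text found" apart from "the image could not be decoded", because both come back as an empty result. Since the C function only reads JPEG data, a cheap guard is to check the JPEG start-of-image marker before calling it. The sketch below assumes hypothetical helper names (looks_like_jpeg, detect_or_none) and takes the detection function as a parameter, so it runs without the compiled wrapper.

```python
JPEG_SOI = b"\xff\xd8"  # every JPEG stream begins with the Start Of Image marker


def looks_like_jpeg(blob):
    """Cheap check on the first two bytes; it does not validate the whole file."""
    return blob[:2] == JPEG_SOI


def detect_or_none(blob, swt_func, width, height):
    """Return None for non-JPEG input, otherwise whatever swt_func returns.

    swt_func is expected to have the same signature as the wrapped C function:
    swt_func(blob, length, width, height).
    """
    if not looks_like_jpeg(blob):
        return None
    return swt_func(blob, len(blob), width, height)
```

With the real module, the call would look like detect_or_none(blob, ccvwrapper.swt, 1024, 1360); a None result then means "not a JPEG", while an empty list means that the wrapper had nothing to report.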

Python wrapper for SWT using SWIG

SWIG (http://www.swig.org/) seems to be really old, but it does the job pretty well (with a really small amount of code). Plus, it's still active in 2017. It lets you wrap any C code into a Python module really fast - that's exactly what I wanted. You can install SWIG using the following command:

sudo apt-get install swig

In order to build the wrapper, you need to create a SWIG interface file (I named it ccvwrapper.i):

%module ccvwrapper

%{
#include "ccvwrapper.h"
%}

%typemap(out) int* swt {
	int i;
	$result = PyList_New($1[0]);
	for(i = 0; i < $1[0]; i++){
		PyObject *o = PyInt_FromLong($1[i+1]);
		PyList_SetItem($result, i, o);
	}
	free($1);
}

%include "ccvwrapper.h"

The most important part here is the typemap declaration. This mapping converts my dynamic C array into a Python list and frees the C array afterwards. When you want to return an array from C code and use it in Python, you need to write this kind of code (this does not apply to basic types - they are automatically mapped to Python types). The header file ccvwrapper.h contains only my swt function declaration.
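
For readers who do not speak the Python C API, the typemap can be modeled in plain Python. This is a sketch of the logic only (the real conversion happens in the generated C code); typemap_out and as_rects are hypothetical names, and the sample array is made up.

```python
def typemap_out(c_array):
    """Model of the typemap: read the count from slot 0, copy the values after it."""
    count = c_array[0]
    return [c_array[i + 1] for i in range(count)]


def as_rects(flat):
    """Group a flat list of numbers into (x, y, width, height) tuples."""
    return [tuple(flat[i:i + 4]) for i in range(0, len(flat), 4)]


# Made-up array in the length-prefixed layout returned by the C function.
flat = typemap_out([8, 10, 20, 100, 30, 15, 80, 60, 25])
print(as_rects(flat))
```

The length slot never reaches Python: it is consumed while building the list, so the caller only sees the rectangle values.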

Building the wrapper

Just a simple bash script, following the convention from SWIG and libccv documentation. NOTE: remember to prefix output C library name with underscore _ ! SWIG requires it for building Python modules.

#!/bin/bash
swig -python ccvwrapper.i
gcc -fpic -c ccvwrapper.c ccvwrapper_wrap.c ccv_algebra.c ccv_basic.c ccv_cache.c ccv_classic.c ccv_io.c ccv_memory.c ccv_output.c ccv_resample.c ccv_sift.c ccv_swt.c ccv_transform.c ccv_util.c ./3rdparty/sha1/sha1.c -I/usr/include/python2.7

ld -shared ccvwrapper.o ccvwrapper_wrap.o ccv_algebra.o ccv_basic.o ccv_cache.o ccv_classic.o ccv_io.o ccv_memory.o ccv_output.o ccv_resample.o ccv_sift.o ccv_swt.o ccv_transform.o ccv_util.o sha1.o -ljpeg -o _ccvwrapper.so

cp ccvwrapper* ~/PycharmProjects/path_to_your_project
cp _ccvwrapper* ~/PycharmProjects/path_to_your_project

The build script uses the swig command to compile the interface file into two wrapper files:

  • ccvwrapper_wrap.c
  • ccvwrapper.py

The C file needs to be compiled along with the custom code and its dependencies. After compilation, we need to link everything into a shared library named _ccvwrapper.so. For convenience, I added two cp commands just to copy the wrapper into my project directory.
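
If the import fails later on, the usual cause is that one of the two files Python needs at import time did not make it into the project directory. A tiny check like the one below can save some head scratching; it is only a convenience sketch, and missing_build_outputs is a hypothetical helper name.

```python
import os

# Files Python needs at import time: the generated module and the shared library.
REQUIRED_OUTPUTS = ("ccvwrapper.py", "_ccvwrapper.so")


def missing_build_outputs(project_dir):
    """Return the names of the build outputs that are not present in project_dir."""
    return [name for name in REQUIRED_OUTPUTS
            if not os.path.isfile(os.path.join(project_dir, name))]


print(missing_build_outputs("."))
```

An empty list means that both files are in place.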

Using SWT Python wrapper

The wrapper is ready to be used. After putting all of the outputs from the build process into the project folder, you will be able to use the wrapper without any problems. Example code:

from __future__ import print_function
import ccvwrapper
import numpy as np
from skimage import draw
from skimage.io import imread, imshow, imsave
from matplotlib import pyplot as plt


def rectangle_perimeter(r0, c0, height, width, shape=None, clip=False):
    # Corners as (row, column) pairs: rows span the height, columns span the width.
    rr, cc = [r0, r0 + height, r0 + height, r0], [c0, c0, c0 + width, c0 + width]

    return draw.polygon_perimeter(rr, cc, shape=shape, clip=clip)


if __name__ == "__main__":
    image_name = "test_input.jpg"
    # Read the raw JPEG bytes; the name image_bytes avoids shadowing the built-in bytes.
    with open(image_name, "rb") as image_file:
        image_bytes = image_file.read()
    swt_result_raw = ccvwrapper.swt(image_bytes, len(image_bytes), 1024, 1360)
    # Flat list of numbers -> one (x, y, width, height) row per detected word.
    swt_result = np.reshape(swt_result_raw, (len(swt_result_raw) // 4, 4))

    image = imread(image_name, as_grey=False)
    for x, y, width, height in swt_result:
        for i in xrange(0, 3): # just to make lines thicker
            rr, cc = rectangle_perimeter(y + i, x + i, height, width, shape=image.shape, clip=True)
            image[rr, cc] = (255, 0, 0)

    imshow(image)
    imsave("result.jpg", image)
    plt.show()
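
Coming back to the original use case (separating images with text from the rest), the rectangles still have to be turned into a yes/no decision. The sketch below shows one possible rule; the function names and both thresholds (min_boxes, min_coverage) are hypothetical and would need tuning on real data.

```python
def text_coverage(rects, image_width, image_height):
    """Fraction of the image area covered by word boxes (overlapping boxes are counted twice)."""
    if image_width <= 0 or image_height <= 0:
        return 0.0
    boxed_area = sum(w * h for _, _, w, h in rects)
    return float(boxed_area) / (image_width * image_height)


def contains_text(rects, image_width, image_height, min_boxes=3, min_coverage=0.01):
    """Hypothetical decision rule: enough boxes AND enough covered area."""
    enough_boxes = len(rects) >= min_boxes
    enough_area = text_coverage(rects, image_width, image_height) >= min_coverage
    return enough_boxes and enough_area


# Made-up rectangles in (x, y, width, height) form, as returned by the wrapper.
sample = [(0, 0, 50, 10), (0, 20, 50, 10), (0, 40, 50, 10)]
print(contains_text(sample, 200, 100))
```

Requiring both a minimum number of boxes and a minimum coverage is meant to filter out photos in which SWT finds one or two stray rectangles.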

Results

Input
SWT input image (compressed)
Output
SWT output image (compressed)

You can download uncompressed input and output images here:

Summary

Seems pretty easy, right? With a little bit of work, I was able to leverage the power of the SWT algorithm from Python with the speed of low-level C code.

Please comment if you encounter any problem and share if you like it!
