@lexi.ch home, code, stuff

Language array processing speed comparison with 'selectionsort' 29.12.2015

Recently I implemented the very simple Selection Sort algorithm for solving some unrelated problem in PHP. After doing some simple tests to check if the algorithm was working, I gave it something to bite on: I produced a list with 100'000 integer numbers, which I fed the algorithm, and started the script.

After 10 minutes I decided to stop the still running script, as I thought something must be wrong with my algorithm. So I re-checked the code again, and came to the conclusion that the code must be correct. So what was happening here? Sorting some random 100'000 numbers with Selection Sort should surely not take that much time, even if the algorithm's performance is O(n2)?

Selection Sort is spending most of its time with array access: fetching an element at index, swap element X with element Y: It searches for the smallest element on each array loop and swaps it with the end of the sorted part of the list. So this algorithm's speed depends highly on the array index access speed. For programming languages that support native, in-line memory arrays (e.g. C, C++, Java), this should be a piece of cake. But what about the interpreted languages such as PHP, Python, Ruby?

So I decided to implement the very same algorithm in different programming languages, just to check if it was really the speed of the PHP script. The result was somewhat surprising: Not that I hadn't expected different results, but the scale of the differences is frappant. I never guessed that script/language speed would be THAT different!

I have collected the test results below. My test setup was as follows:

The test results

for executing selectionsort only on a list with 100'000 pre-generated int numbers

Language Compiler version sort time compiler flags
C++ Apple LLVM version 6.0 (clang-600.0.57) 3.54s g++ -O2
java javac 1.7.0_71 4.02s  
nodejs 0.12.0 4.05s  
groovy 2.4.1 127.71s groovyc -> jvm bytecode
ruby 2.2.1p85 276.91s  
python 3.4.3 382.77s  
php 5.5.30 441.08s  
php, SplFixedArray 5.5.30 407.88s  
php 7 7.0.1 160.78s  
php 7, SplFixedArray 7.0.1 171.92s  

Clearly, C++ is the winner, followed directly by native Java. What really surprised me was the result of nodejs: It seems that nodejs (V8) is highly effective in compiling javascript into native code.

I was a bit disappointed by groovy, as groovy also compiles to JVM bytecode, and has support for native int-Arrays (instead of the ArrayList Java collection). I didn't do any investigation in that fact, anyway.

ruby and python are in a range I expect from an interpreted script language: As they do a lot of type checking and array index checking, the array access is not as fast as in native languages.

And then there is PHP. Whoa, what happened to PHP? PHP is the only language that do not differ between numeric arrays and hash list: in PHP, every array is stored as hash list internally. So array access like $arr[1000] needs travsersing a hash list, instead of directly accessing the addressed memory. But it is still a very bad outcome: 125 times slower than the native C++ code. Ugly.

PHP also knows a "proper" array implementation, which is part of the SPL library: SplFixedArray allows integer-based arrays. I did a 2nd implementation with this data structure to compare it to the standard PHP array. To my disappointment, this was only about 10% faster than the native implementation.

So my last recent addition to this test was PHP 7, my last hope for PHP :-) And indeed, PHP 7 is more than twice as fast as its older companion, PHP 5.5. A new hopeTM for PHP! In fact, the native array implementation was slightly faster than the SplFixedArray implementation.

The code

All the code can be found here on github.

Creating the random numbers

Every test should have the same prerequisites, so I started by generating a sample file with random numbers. Each test runs on the same number sample, so that the results are comparable. Also, reading and storing into an array was not part of the measurement.

For generating the numbers I used a very simple C++ program:


//  main.cpp
//  create_random_numbers
//
//  Created by Alexander Schenkel on 18.03.15.
//  Copyright (c) 2015 Alexander Schenkel. All rights reserved.

#include 
#include 
#include 
#include 

using namespace std;

int main(int argc, const char * argv[]) {
    srand(time(NULL));
    int size = 100000;
    string filename = "/tmp/random_numbers.txt";
    
    // Fill file with random numbers:
    ofstream file;
    file.open(filename);
    for (int i = 0; i < size; i++) {
        int nr = rand() % size;
        file << nr << endl;
    }
    file.close();
    cout << "written file." <;>

The original PHP algorithm

The original PHP algorithm (only the selectoin sort part) looked as follows. This also works for an SplFixedArray implementation, by the way:

function selectionsort(&$arr) {
    $sIndex = 0;
    $smallestIndex = 0;
    $tmp = null;
    $count = count($arr);
    while ($sIndex < $count) {
        $smallestIndex = $sIndex;
        for ($i = $sIndex; $i < $count; $i++) {
            if ($arr[$i] < $arr[$smallestIndex]) {
                $smallestIndex = $i;
            }
        }

        if ($sIndex < $smallestIndex) {
            $tmp = $arr[$smallestIndex];
            $arr[$smallestIndex] = $arr[$sIndex];
            $arr[$sIndex] = $tmp;
        }
        $sIndex++;
    }
}

The same in C++

The C++ algorithm uses native C arrays instead of std::array: This way, I made sure to avoid framework overhead:

void selectionsort_arr(int arr[], size_t size) {
    size_t sIndex = 0;
    size_t smallestIndex = 0;
    int tmp;
    while (sIndex < size) {
        smallestIndex = sIndex;
        for(size_t i = sIndex; i < size;i++) {
            if (arr[i] < arr[smallestIndex]) {
                smallestIndex = i;
            }
        }
        if (sIndex < smallestIndex) {
            tmp = arr[smallestIndex];
            arr[smallestIndex] = arr[sIndex];
            arr[sIndex] = tmp;
        }
        sIndex++;
    }
}

Java

Also in Java, I used a plain int array instead of ArrayList or Vector:

public void selsort(int[] arr) {
        int sIndex = 0;
        int smallestIndex = 0;
        Integer tmp = null;
        int i = 0;
        int count = arr.length;

        while (sIndex < count) {
            smallestIndex = sIndex;
            for (i = sIndex; i < count; i++) {
                if (arr[i] < arr[smallestIndex]) {
                    smallestIndex = i;
                }
            }
            if (sIndex < smallestIndex) {
                tmp = arr[smallestIndex];
                arr[smallestIndex] = arr[sIndex];
                arr[sIndex] = tmp;
            }
            sIndex++;
        }
    }

Groovy

Groovy adds some nice features like type inference for Java: It makes Java more act like a Scripting Language, which compiles to JVM Bytecode. It turned out that it is not that easy to really get a native Java array, and the results show that still some more work is done on the array by Groovy:

def selectionsort(int[] arr) {
    int sIndex = 0;
    int smallestIndex = 0;
    int tmp;
    int i = 0;
    int count = arr.length;

    while (sIndex < count) {
        smallestIndex = sIndex;
        for (i = sIndex; i < count; i++) {
            if (arr[i] < arr[smallestIndex]) {
                smallestIndex = i;
            }
        }
        if (sIndex < smallestIndex) {
            tmp = arr[smallestIndex];
            arr[smallestIndex] = arr[sIndex];
            arr[sIndex] = tmp;
        }
        sIndex++;
    }
}

Ruby

def selectionsort(arr)
     sIndex = 0;
     smallestIndex = 0;
     tmp = nil;
     i = 0;
     count = arr.length;
    while (sIndex < count) do
        smallestIndex = sIndex;
        for i in sIndex...count do
            if (arr[i] < arr[smallestIndex])
                smallestIndex = i;
            end
        end
        if (sIndex < smallestIndex)
            tmp = arr[smallestIndex];
            arr[smallestIndex] = arr[sIndex];
            arr[sIndex] = tmp;
        end
        sIndex = sIndex+1;
    end
end

Python

def selectionsort(arr):
    sIndex = 0;
    smallestIndex = 0;
    tmp = None;
    i = 0;
    count = len(arr);
    while sIndex < count:
        smallestIndex = sIndex;
        for i in range(sIndex,count):
            if arr[i] < arr[smallestIndex]:
                smallestIndex = i;
        if sIndex < smallestIndex:
            tmp = arr[smallestIndex];
            arr[smallestIndex] = arr[sIndex];
            arr[sIndex] = tmp;
        sIndex = sIndex+1;