2011-05-17

Parallel Collections in Scala 2.9.0

I just played around with Scala 2.9.0's parallel collections and so far I am really happy with them. I did not spend more than an hour scratching the surface, but here is a small usage example and a comparison between parallel and non-parallel collections.


First I needed a timer function that can measure the time of each run and also do some JIT warmup.


  def time(name: String, func: => Unit): Long = {
    func // JIT warmup
    val start = System.currentTimeMillis
    func
    val timeDiff = System.currentTimeMillis - start
    printf("time for %-8s: %5dms\n", name, timeDiff)
    timeDiff
  }



This higher-order function takes the name of the run and a function to execute. The function is passed by name, takes no parameters and returns no value. The first call is just a JIT warmup to get a fairer comparison between the runs; only the second call is timed.


I also needed a test case to try it out, so I thought that regular expressions would be a nice fit. There is a file called words on many Unix/Linux systems which contains a lot of dictionary words, and that file was perfect for this test. I read all the lines from this file using the scala.io.Source class.

  val lines = Source.fromFile("/usr/share/dict/words").getLines.toList

Now to the good stuff. To get a parallel version of this list of lines it is enough to just type:

   val par = lines.par

The result is actually not a List but a ParSeq, but that is not important for this specific test. Then it comes down to running the test with the timer function and the regular expression.

  val scaleRe = """.*scala.*""".r
  val normalTime = time("normal", { lines.foreach(scaleRe.findFirstIn(_)) })

  val parTime    = time("parallel", { par.foreach(scaleRe.findFirstIn(_)) })

The results from the test on my computer (Intel(R) Xeon(R) CPU W3530 @ 2.80GHz) look like this:

98569 lines
time for normal  :    93ms
time for parallel:    24ms
parallel is 3.88 times faster


Not surprisingly the parallel version is much faster. Below is the full source of the test.

package test 

import scala.io.Source

object ParallelTest {

  def time(name: String, func: => Unit): Long = {
    func // JIT warmup
    val start = System.currentTimeMillis
    func
    val timeDiff = System.currentTimeMillis - start
    printf("time for %-8s: %5dms\n", name, timeDiff)
    timeDiff
  }
 
  def main(args: Array[String]): Unit = {

    val lines = Source.fromFile("/usr/share/dict/words").getLines.toList
    val par   = lines.par
   
    val scaleRe = """.*scala.*""".r
   
    printf("%d lines\n", lines.size)

    val normalTime = time("normal", { lines.foreach(scaleRe.findFirstIn(_)) })
    val parTime    = time("parallel", { par.foreach(scaleRe.findFirstIn(_)) })
   
    printf("parallel is %.2f times faster\n", normalTime.toDouble / parTime )
  }
  
}

2011-05-09

Protecting your users' passwords no matter what type of service

The recent attacks on the Sony PlayStation Network showed that even the biggest companies fail to uphold high security and protect their users' information.

The most important part of an account is the username and password, since they allow you to log in to that account and make purchases or other monetary transactions. It is also very important that the password is kept safe, because it is very likely that the user is reusing the password for other sites and services.

If the hacker gets hold of the e-mail address along with the password, it is probably not rare that she can use the same password to log in to the user's e-mail account. With a couple of searches it is then possible to get access to other accounts and financial information.

So even if you are building a small system with limited functionality, and you think that your system won't be damaged much if a hacker breaches its security, think again. If the hacker can get hold of just a couple of passwords, she can probably use that information to get access to other services. If this is an internal corporate service you cannot take any risks either, since many attacks come from the inside.

Security-related algorithms are constantly replaced by newer, more secure ones, so when you are building a new service or web page that captures users' passwords, make sure that it uses the latest and greatest algorithm.

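At the time of writing SHA-256 seems to be the recommended hashing algorithm, but there seems to be a new one called SHA-3 just around the corner. When you hash your users' passwords, do not forget to put some salt in: a random value, unique per user, that is hashed together with the password so that identical passwords do not produce identical hashes.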
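
To make the salting step concrete, here is a minimal sketch using ordinary command-line tools. It assumes GNU coreutils (sha256sum, od, head) and /dev/urandom are available, and the names make_salt and hash_password are made up for this illustration. It only shows the idea of hashing a salt together with the password; a real service would normally use a vetted library from its own platform instead of shell commands.

```shell
#!/bin/sh
# Sketch only: make_salt and hash_password are hypothetical helper names.

# Generate a random 16-byte salt, printed as 32 hex characters.
make_salt() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Hash the salt followed by the password with SHA-256 and print the hex digest.
# Usage: hash_password SALT PASSWORD
hash_password() {
    printf '%s%s' "$1" "$2" | sha256sum | cut -d ' ' -f 1
}

salt=$(make_salt)
digest=$(hash_password "$salt" "correct horse battery staple")

# Store the salt next to the digest; both are needed to verify a login later.
printf '%s:%s\n' "$salt" "$digest"
```

To check a login attempt, the stored salt is read back, hash_password is run again with the password the user typed, and the result is compared with the stored digest.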

2011-04-20

Compiling GCC 4.6 in Ubuntu 10.10 for C++11 Development

GCC 4.6 was released recently and it has a lot of new features. It is especially nice to see all the improvements in the C++11 (C++0x) area. This short guide will show you how you can download the source code and compile GCC 4.6 on your Ubuntu 10.10 installation. If you have another distribution you can probably still use some of the information in this guide.

1. Installing Prerequisites

The following packages are required to be able to compile GCC 4.6.

  • libgmp3c2 libgmp3-dev - Library for arbitrary precision arithmetic
  • libmpfr-dev - Multiple precision floating-point computation development tools
  • libmpc-dev - Multiple precision complex floating-point library development package
  • flex - A fast lexical analyzer generator
  • bison - A parser generator that is compatible with YACC

To install them just open a terminal window and type:

$ sudo apt-get install libgmp3-dev libgmp3c2 libmpfr-dev libmpc-dev flex bison

You can also use Synaptic or the Ubuntu Software Center.

2. Obtaining The Source Code

Go to the GCC mirrors page, http://gcc.gnu.org/mirrors.html, and select the mirror closest to you. Enter the releases/gcc-4.6.0 directory and download the gcc-4.6.0.tar.bz2 file.

Example:
$ wget ftp://ftp.fu-berlin.de/unix/languages/gcc/releases/gcc-4.6.0/gcc-4.6.0.tar.bz2

Unpack the source code and go into that directory with a terminal. I have chosen to use /opt/gcc-4.6.0 as the installation directory, which is the prefix passed to configure in the next step.

3. Compiling The Source Code

The first thing you have to do is to configure the installation.

$ ./configure --prefix=/opt/gcc-4.6.0 --disable-bootstrap CFLAGS="-g3 -O0" --enable-languages=c,c++ --disable-multilib --program-suffix=-4.6

Run make with the -j argument, which sets the number of parallel jobs, so that the compilation makes better use of all your CPUs, cores and hyperthreads.

$ make -j6
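
The 6 in the command above is only an example value. As a sketch, the job count can instead be derived from the number of processing units the system reports. This assumes nproc from GNU coreutils or, failing that, getconf is installed; job_count is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the number of available processing units,
# falling back to 1 if neither tool can be queried.
job_count() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Show the command that would be run; inside the GCC source directory you
# would run make -j"$(job_count)" directly instead of echoing it.
echo "make -j$(job_count)"
```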

Now it is time to run make install to install it to the directory that you specified as prefix during the configure step.

$ make install


Now everything should be ready to test.

4. Testing GCC 4.6 With A C++11 Program

Enter the following source code into a file called threading.cc. It is a simple test of C++11's threading and lambda functionality. It is a good way to see if everything works, since it uses the new language features and also the new parts of the standard library.

#include <thread>
#include <iostream>

using namespace std;

int main()
{
    // Run a lambda on a new thread and wait for it to finish.
    thread th([] { cout << "Hello, World!\n"; });
    th.join();

    return 0;
}

Compile with the following command:

$ ./c++-4.6 --std=c++0x -g -Wl,-rpath -Wl,/opt/gcc-4.6.0/lib64 -L/opt/gcc-4.6.0/lib64 -lpthread threading.cc -o threading

Basically, this tells the compiler to use c++0x as the standard and to use the standard library that you just compiled together with GCC: the -L option points the linker at it, and the -rpath option makes the finished program look for it there at run time. If you want the commands to be available in any directory, you have to add the bin directory to the PATH variable. It is, however, important to make sure that the compiled libraries are used instead of the ones installed with your distribution. There might be a better way to handle the libraries, but I tried to do this without messing up my current GCC installation.
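
A minimal sketch of such a setup follows. It assumes the /opt/gcc-4.6.0 prefix from the configure step and a 64-bit system where the libraries end up in lib64; GCC_PREFIX is a variable name chosen for this example. Setting LD_LIBRARY_PATH is an alternative to the -rpath flags: it tells the dynamic loader where to look at run time, but only for the current shell session.

```shell
#!/bin/sh
# Assumed install prefix, taken from the --prefix option used above.
GCC_PREFIX=/opt/gcc-4.6.0

# Put the new compiler's bin directory first so c++-4.6 and gcc-4.6 are found.
PATH="$GCC_PREFIX/bin:$PATH"
export PATH

# Let the dynamic loader find the freshly built libstdc++ at run time.
# The ${VAR:+...} form avoids a trailing colon when the variable was empty.
LD_LIBRARY_PATH="$GCC_PREFIX/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

echo "$PATH"
echo "$LD_LIBRARY_PATH"
```

These lines can be typed into the shell before compiling, or added to a startup file such as ~/.bashrc if the new compiler should always be available.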