Perl, threading, locking and semaphores - Ed's journal
sobrique
Perl, threading, locking and semaphores
So, being a fan - as I am - of Perl, I've had a reason to take a bit of a look at threaded code.
Threading is one possible implementation of parallel code and - for my purposes - is quite useful when you've got multiple things going on, each of which spends most of its time waiting for 'something else' to respond.

For example, if you've got to log in to a lot of different servers to perform a simple task, the 'login' takes far more real time than 'processing time' - so by threading, you end up with the whole task being accomplished faster.

Perl is actually quite easy to 'thread' with - the only really hard part is that your perl interpreter must be compiled to support threading - and if it wasn't, then it's a bit of a hassle to recompile it. The good news is that current versions of perl seem to support it by default (in my small sample of 'a couple of systems', I didn't need to rebuild).
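
If you'd rather check than guess, something like this should tell you whether your perl was built with thread support. It's only a sketch: it relies on the standard Config module, and $has_threads is just a name I've picked for the example.

```perl
use strict;
use warnings;
use Config;

# $Config{useithreads} is set when the interpreter was built with ithreads
my $has_threads = $Config{useithreads} ? 1 : 0;

print $has_threads
    ? "This perl supports threads\n"
    : "This perl was built without thread support\n";
```

If it reports no thread support, 'use threads;' will fail at compile time, and you're into rebuild territory.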

The basics of how to do it, can be found in 'perldoc perlthrtut' - but it goes a bit like this:

Add 'use threads;' to the start of your code.

In your perl code, create a subroutine that will run as a thread.
sub thread_test_subroutine
{
  my ( $arg_1, $arg_2 ) = @_;
  print $arg_1;
  sleep ( rand(10) );
  print $arg_2;
  return $arg_1 + $arg_2;
}

When you call that subroutine, rather than doing so in the normal fashion, do so using
threads -> create.

E.g.
my $thread = threads -> create ( \&thread_test_subroutine, ( $first_arg, $second_arg ) );

And for the sake of neatness, you need to 'join' the thread (joining is perlish for 'wait for it to finish, and get any return values').

my $result = $thread -> join();

And it's just like that.
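
Put together as one small script, the steps above might look roughly like this. It's a minimal sketch: add_slowly and the numbers 2 and 3 are made up for the example, and the sleep stands in for whatever slow thing you're really waiting on.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: pretends to be slow, then returns a value
sub add_slowly {
    my ( $arg_1, $arg_2 ) = @_;
    sleep 1;    # stand-in for slow work such as a remote login
    return $arg_1 + $arg_2;
}

# Assigning the thread object to a scalar creates the thread in scalar
# context, so join() hands back the single return value.
my $thread = threads->create( \&add_slowly, 2, 3 );
my $sum    = $thread->join();

print "Thread returned $sum\n";
```
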
Because I'm a smartarse, I wanted to get a bit more clever - you can extend this idea for creating several threads, to all run in parallel.

So for example:
for ( my $count=0; $count < 10; $count++ )
{
  #assigning the result gives the thread scalar context, so join() can return its value
  my $thread = threads -> create ( \&thread_test_subroutine, ( $first_arg, $second_arg ) );
}

foreach my $thread ( threads -> list() )
{
  if ( $thread -> tid() ) 
  {
    my $result = $thread -> join();
    print $thread -> tid(), " returned ", $result, "\n";
  }
}


Which creates 10 threads, then joins each one in turn, waiting for it to 'do its thing' if it hasn't finished yet. Hopefully you can see where that gets a bit handy, if you've got a lot of networked devices to do stuff with.
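
For the 'lots of servers' case, a rough sketch might keep one thread per host and collect the results into a hash. Nothing here touches the network: check_host is a hypothetical stand-in that just sleeps and returns a dummy value, and the host names are placeholders.

```perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for a slow remote task; no network access happens
sub check_host {
    my ($host) = @_;
    sleep 1;                 # pretend the login takes a second
    return length $host;     # dummy result
}

my @server_list = ( "one", "two", "three", "four" );

# Storing each thread in a hash slot creates it in scalar context
my %thread_for;
$thread_for{$_} = threads->create( \&check_host, $_ ) for @server_list;

# join() waits for each thread and returns what check_host returned
my %result_for;
$result_for{$_} = $thread_for{$_}->join() for @server_list;

print "$_ => $result_for{$_}\n" for @server_list;
```

All four 'logins' overlap, so the whole thing takes about as long as the slowest one, rather than the sum of all of them.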

But the hard part when messing with threads is things like communicating between them. There are two libraries that I've been looking at today for that purpose.

You see, when I create a 'load' of threads, the last thing I want to do is to do so in an open ended fashion - 100 threads for 100 servers might be ok.
10,000 threads for 10,000 servers might cause a bit of a problem.

That's where Thread::Semaphore comes in, and - in case Thread::Semaphore isn't available on your system - I've also been looking at threads::shared.

Thread::Semaphore lets you create a 'shared' counter (which defaults to 1).
There are two bits to it - 'down()' and 'up()'.
down() decreases the semaphore and - if it's already zero - waits until it can do so.
up() increases the semaphore.

So if you insert above:

my $resource_limit = Thread::Semaphore -> new ( 5 );

And within each subroutine, use:

$resource_limit -> down();
before the work, and
$resource_limit -> up();
when done, you'd end up with a scenario where - with 10 threads 'existing' - you'd only actually have 5 'running' at any time.
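
To convince yourself the limit is really being applied, here's a sketch that counts how many threads are inside the down()/up() section at once. The $running and $peak counters and the limited_task name are my own additions for the demonstration; they need threads::shared so that every thread sees the same values.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;

my $resource_limit = Thread::Semaphore->new(5);

# Shared counters, used only to observe how many threads overlap
my $running :shared = 0;
my $peak    :shared = 0;

sub limited_task {
    $resource_limit->down();    # blocks while 5 threads already hold a slot
    {
        lock($running);
        $running++;
        $peak = $running if $running > $peak;
    }
    sleep 1;                    # stand-in for the real work
    {
        lock($running);
        $running--;
    }
    $resource_limit->up();      # hand the slot back
    return;
}

my @threads = map { threads->create( \&limited_task ) } 1 .. 10;
$_->join() for @threads;

print "Peak concurrency: $peak\n";
```

The peak should never go above 5, however many threads you create.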

You can also create your resource limit locally, within a loop, and hand it to each thread:
sub thread_test_subroutine
{
  my ( $semaphore, $host, $arg_1, $arg_2 ) = @_;
  print "$host";
  print $arg_1;
  #check there's a resource available to do this bit
  $semaphore -> down();
  sleep ( rand(10) );
  $semaphore -> up();
  print $arg_2;
  return $arg_1 + $arg_2;
}

my @server_list = ( "one", "two", "three", "four" );

foreach my $host ( @server_list )
{
  my $resources_per_host = Thread::Semaphore -> new ( 2 );
  for ( my $count=0; $count < 10; $count++ )
  {
    threads -> create ( \&thread_test_subroutine, ( $resources_per_host, $host, $first_arg, $second_arg ) );
  }
}
#wait for every thread to finish before the program exits
$_ -> join() for threads -> list();

Which passes your 'semaphore' into each thread (the object is already a shared reference, so you pass it as it is), such that you'll only have two 'active' threads per server in your list.

If you don't have Thread::Semaphore available, you can 'fake it' by using a (thread) shared variable, and a lock.

E.g.

sub thread_test_subroutine
{
  my ( $semaphore, $host, $arg_1, $arg_2 ) = @_;
  print "$host";
  print $arg_1;
  #check there's a resource available to do this bit
  {
    lock ( $semaphore );
    sleep ( rand(10) );
  }
  #lock is released, because it's now out of scope. 
  print $arg_2;
  return $arg_1 + $arg_2;
}

my @server_list = ( "one", "two", "three", "four" );

foreach my $host ( @server_list )
{
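  #needs 'use threads::shared;' at the top of your code
  my $resources_per_host : shared;
  for ( my $count=0; $count < 10; $count++ )
  {
    threads -> create ( \&thread_test_subroutine, ( \$resources_per_host, $host, $first_arg, $second_arg ) );
  }
}
#wait for every thread to finish before the program exits
$_ -> join() for threads -> list();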


Not quite as good, as you can only have two states - 'in use' or 'not' - per host, so only one thread per host gets to run the locked section at a time. But it does still allow for a bit of crude throttling.
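
If you want more than one slot without Thread::Semaphore, one option is a shared counter combined with cond_wait() and cond_signal() from threads::shared. This is only a sketch of that idea, not something from the snippets above: acquire_slot, release_slot, the limit of 2 and the $peak counter are all names and values I've invented for illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $limit = 2;                      # hypothetical: two slots at a time
my $free_slots :shared = $limit;
my $peak       :shared = 0;         # only here to observe the overlap

sub acquire_slot {
    lock($free_slots);
    # cond_wait releases the lock while waiting and re-takes it on wake-up
    cond_wait($free_slots) until $free_slots > 0;
    $free_slots--;
    my $in_use = $limit - $free_slots;
    $peak = $in_use if $in_use > $peak;
    return;
}

sub release_slot {
    lock($free_slots);
    $free_slots++;
    cond_signal($free_slots);       # wake one waiting thread, if any
    return;
}

sub worker {
    acquire_slot();
    sleep 1;                        # stand-in for the real work
    release_slot();
    return;
}

my @threads = map { threads->create( \&worker ) } 1 .. 6;
$_->join() for @threads;

print "Peak concurrency: $peak\n";
```

Unlike the plain lock, the lock here is only held while the counter is being changed, not for the whole of the slow part.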

But it was somewhat easier than I thought to do something practically useful with threaded perl. These snippets are here for my own reference, rather than being practically useful as they stand, and bear in mind - if you're playing with parallel programs, you can end up with all kinds of exciting and interesting things going on, if you're not careful.

Comments
warmage From: warmage Date: January 27th, 2012 01:31 pm (UTC) (Link)
Another paradigm along these lines is to have one thread that's a message pump, but I suspect that the join() method is doing its best to prevent you from dealing with all the interlocking that goes on (and of course can go wrong) by having your op threads updating your msg thread. Whee!
sobrique From: sobrique Date: January 27th, 2012 06:04 pm (UTC) (Link)
Perl does support queues. It's a (thread safe, shared) FIFO data structure. That's something I could use instead of semaphores, or locks in order to implement resource control, but ... it's more or less the same thing, from a different angle.
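
As a rough sketch of the queue approach, using Thread::Queue: a fixed number of worker threads pull host names off a shared queue, so the number of workers is the resource limit. The worker sub, the two-worker limit and the "done" strings are invented for the example, and the undef values are just sentinels telling each worker to stop.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work_queue   = Thread::Queue->new();
my $result_queue = Thread::Queue->new();

sub worker {
    # dequeue() blocks until an item is available; undef is our stop signal
    while ( defined( my $host = $work_queue->dequeue() ) ) {
        $result_queue->enqueue("$host done");
    }
    return;
}

my $worker_count = 2;    # hypothetical limit: two hosts in progress at once
my @workers = map { threads->create( \&worker ) } 1 .. $worker_count;

$work_queue->enqueue( "one", "two", "three", "four" );
$work_queue->enqueue( (undef) x $worker_count );    # one sentinel per worker

$_->join() for @workers;

my $finished = $result_queue->pending();
print "$finished hosts processed\n";
```
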