One common trick in the Wide World of Windows (also known as the WWW, not to be confused with the World Wide Web or Weasley’s Wizard Wheezes) is to prevent the user from running more than one instance of an application.  Try it sometime: open up something like Microsoft Word, then try to open it again.  You’ll find that it doesn’t start a second copy; it simply refocuses the previous session.  It’s a pretty useful trick, and Windows gives you lots of ways to make it happen.

However, on a real multi-user operating system like Linux or Unix, such behavior is shunned.  You expect multiple copies of everything to be open at once, since you may have multiple users running the program simultaneously.  Every now and then, however, it’s advantageous to get similar behavior.  I ran into a case in point this week.  A user was running ezViz, my pride and joy, on one of the supercomputers.  He submitted several jobs, each one running several instances of ezViz serially.  Unfortunately, they all wound up on the same node, and he ended up running 8 copies simultaneously, each wanting 3.5G on a 16G machine.  It wasn’t pretty.

After thinking about it for a while, I thought it might be helpful to give the user a way to specify a maximum number of simultaneous runs.  Runs beyond that number would simply wait their turn.  My first attempt was to use something like ‘ps’ to check the running processes and count the ezViz instances.  Not only is this difficult, but there are a lot of race conditions when multiple copies try to check simultaneously.

I did some research, and a friend of mine suggested using a shared memory block (via shmget) in which each process would ‘register’ itself.  Each process could then see how many other processes were going and decide whether to wait or continue.  While that’s definitely an option, a similar but far better method is semaphores.  Come on inside for more...
[tag:linux][tag:unix][tag:source][tag:semaphore]

Semaphores are a common construct in parallel processing.  You can use a semaphore as a type of exclusive barrier, usually to indicate the status of a resource that can’t be shared.  For example, if you had several processes that all wanted to use the sound card, but the sound card could only play one sound at a time, you could guard it with a semaphore.  Each process would request the semaphore, and the one that got it would continue on its merry way until it gave it back.  All the others would wait (at 0% CPU) until the semaphore was available again, and so on and so forth.  It’s kinda like a lending library: just a way of keeping track of a finite resource.

In my case, it would be a little more complicated than simply "Available" or "In Use".  I would need to let the user specify how many instances of ezViz they felt were ok, and then block any beyond that.  Luckily, most semaphore implementations allow such things.  As an example, here’s a trivial program that blocks when more than 3 instances are run at a time.

#include <sys/types.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/stat.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <pwd.h>
#include <fcntl.h>
#include <limits.h>

void Block(void) {
    key_t semkey;
    int semid;
    struct sembuf sbuf;
    struct passwd *pw;

    /* Look up the current user's home directory; it seeds the key. */
    pw = getpwuid(getuid());
    if (pw == NULL) {
      fprintf(stderr, "Could not look up the current user.\n"); exit(1);
    }
    printf("HomeDir: %s\n", pw->pw_dir);
    if ((semkey = ftok(pw->pw_dir, 'a')) == (key_t) -1) {
      perror("IPC error: ftok"); exit(1);
    }

    /* Get semaphore ID associated with this key. */
    if ((semid = semget(semkey, 0, 0)) == -1) {

      /* Semaphore does not exist – Create. */
      if ((semid = semget(semkey, 1, IPC_CREAT | IPC_EXCL | S_IRUSR |
          S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH)) != -1)
      {
          printf("Creating a new semaphore...\n");
          /* Initialize the semaphore. */
          sbuf.sem_num = 0;
          sbuf.sem_op = 3;  /* This is the number of runs
                               without queuing. */
          sbuf.sem_flg = 0;
          if (semop(semid, &sbuf, 1) == -1) {
              perror("IPC error: semop"); exit(1);
          }
      }
      else if (errno == EEXIST) {
          if ((semid = semget(semkey, 0, 0)) == -1) {
              perror("IPC error 1: semget"); exit(1);
          }
      }
      else {
          perror("IPC error 2: semget"); exit(1);
      }
    }

    printf("Got the semaphore...\n");
    sbuf.sem_num = 0;
    sbuf.sem_op = -1;
    sbuf.sem_flg = SEM_UNDO;
    if (semop(semid, &sbuf, 1) == -1) {
          perror("IPC error: semop"); exit(1);
    }
}

int main(void) {
    int i;
    Block();   /* returns only once a slot is free */
    printf("Ok, I'm at the end of the app...\n");
    for(i=0; i<10; i++) {
        printf("%i...\n", i);
        sleep(1);
    }
    return 0;
}

The "main" function simply calls Block, then counts to 10 over 10 seconds.  The "Block" function handles creating the semaphore (if it doesn’t exist yet) and acquiring it.  We initialize its value to 3, meaning that 3 processes can run simultaneously and the 4th will block until one of them exits.  Using the SEM_UNDO flag means the kernel reverses our decrement when the process exits, so the slot is released automatically even if the application crashes.
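If you want to watch the counter move while a few copies are running, semctl can report it.  Here’s a minimal sketch; the helper names free_slots and waiting_count are my own invention for illustration, and they expect the semid that semget returned.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical helper: current value of semaphore 0 in the set,
 * i.e. how many more instances could start right now (-1 on error). */
int free_slots(int semid) {
    return semctl(semid, 0, GETVAL);
}

/* Hypothetical helper: number of processes currently blocked
 * waiting for the value to increase (-1 on error). */
int waiting_count(int semid) {
    return semctl(semid, 0, GETNCNT);
}
```

With the limit of 3 above and two copies running, free_slots should report 1.  From the shell, ipcs -s lists the semaphore sets that currently exist.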

A few other interesting things of note:

  • Semaphores are referenced via their "key", which is typically created with the ftok function.  The ftok function takes a filesystem path and an ID number to generate the key.  By passing it the user’s home-directory (retrieved via pw->pw_dir), each user has their own semaphore.  For a global semaphore, we could simply pass it / or /tmp.
  • The semaphore functions (semget, semop) set errno to fairly descriptive values when they fail, so perror is a great tool.
  • At no point do we release the semaphore.  We let that happen automatically with the SEM_UNDO flag.
  • At no point do we destroy the semaphore.

This last one is important.  Since we never destroy the semaphore, we can only really set the limit once.  System V semaphores outlive the processes that created them, so after that we’d have to remove it by hand (with the ipcrm command, for instance) or reboot the machine to clear it.  In this case, the semaphore identified as ‘a’ against our home directory is stuck with a limit of 3.

That is, unless we provide a way to destroy them.  That’s next.

So, once we have our code working with a semaphore, we really need to provide a way for the user to reset this number or delete the semaphore altogether.  This covers the (unlikely) case of a semaphore being "stuck" (the SEM_UNDO flag didn’t properly release it), as well as the case where the user simply wants to change the number of simultaneous runs.

The following code can erase a semaphore:

#include <sys/types.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/stat.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <pwd.h>
#include <fcntl.h>
#include <limits.h>

int main(void) {
    key_t semkey;
    int semid;
    struct sembuf sbuf;
    struct passwd *pw;

    /* Same key as the first program: the user's home directory plus 'a'. */
    pw = getpwuid(getuid());
    if (pw == NULL) {
      fprintf(stderr, "Could not look up the current user.\n"); exit(1);
    }
    if ((semkey = ftok(pw->pw_dir, 'a')) == (key_t) -1) {
      perror("IPC error: ftok"); exit(1);
    }

    /* Get semaphore ID associated with this key. */
    if ((semid = semget(semkey, 0, 0)) == -1) {
      printf("Specified Semaphore doesn't exist.\n"); exit(1);
    }

    printf("Waiting for semaphore...\n");
    sbuf.sem_num = 0;
    sbuf.sem_op = -1;
    sbuf.sem_flg = SEM_UNDO;
    if (semop(semid, &sbuf, 1) == -1) {
          perror("IPC error: semop"); exit(1);
    }

    if (semctl(semid, 0, IPC_RMID) == -1) {
        perror("IPC error: semctl"); exit(1);
    }
    printf("Semaphore Removed.\n");
}

A lot of this will seem familiar from the first program.  The only real differences are as follows:

  • We don’t bother to create the semaphore if it doesn’t exist.  That would be silly, since all we plan to do is destroy it anyway.
  • At the end, we use "semctl" with "IPC_RMID" to remove the semaphore from the system.

Now, this piece of code will wait until the semaphore is properly released and then remove it.  So the next time the previous code is run, it will recreate the semaphore, reinitializing it with the desired number.  I’ve tried it here, and it works beautifully.
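Removing and recreating the semaphore is one way to change the limit.  If only the number needs to change, semctl with SETVAL can overwrite it in place.  This is a rough sketch rather than something I’ve wired into ezViz: reset_limit is a made-up name, and the union semun definition assumes Linux, where the caller has to declare it (some other systems already declare it in sys/sem.h).

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: Linux, where the program must define union semun itself. */
union semun {
    int val;                 /* value for SETVAL */
    struct semid_ds *buf;    /* buffer for IPC_STAT, IPC_SET */
    unsigned short *array;   /* array for GETALL, SETALL */
};

/* Hypothetical helper: overwrite the value of semaphore 0 with a new limit.
 * Returns 0 on success, -1 on error (errno is set by semctl). */
int reset_limit(int semid, int limit) {
    union semun arg;
    arg.val = limit;
    return semctl(semid, 0, SETVAL, arg);
}
```

One caveat: on Linux, SETVAL also clears the SEM_UNDO adjustments that other processes hold for that semaphore, so instances that are already running won’t hand their slot back when they exit.  It’s probably safest to reset the limit only while nothing is running.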

In fact, you can even use the removal program to perform some other interesting operations.  If you take out the semop call that acquires the semaphore before it’s destroyed, semctl will still destroy it just fine.  As an interesting side effect, though, you’ll see all the processes that were blocked waiting for it die.  Their semop call returns -1 with errno set to EIDRM, indicating that the semaphore was removed.  In the first program, this leads to a quick error message and an exit(1), so it effectively kills every process currently queued up while letting the running ones finish.
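If you’d rather have queued processes notice the removal and decide for themselves what to do, the waiting side can check errno.  The sketch below is only an illustration: acquire_slot and demo_removed_while_waiting are hypothetical names, and the demo uses a private throwaway semaphore (IPC_PRIVATE) instead of the home-directory key.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: take one slot.
 * Returns 0 on success, 1 if the semaphore is gone, -1 on any other error. */
int acquire_slot(int semid) {
    struct sembuf op;
    op.sem_num = 0;
    op.sem_op = -1;
    op.sem_flg = SEM_UNDO;
    for (;;) {
        if (semop(semid, &op, 1) == 0)
            return 0;
        if (errno == EINTR)
            continue;       /* interrupted by a signal: just retry */
        if (errno == EIDRM || errno == EINVAL)
            return 1;       /* removed while we waited, or already gone */
        return -1;
    }
}

/* Demo: a child blocks on an empty semaphore and the parent removes it.
 * Returns what acquire_slot() reported in the child (1 is expected),
 * or -1 if the setup failed. */
int demo_removed_while_waiting(void) {
    int semid, status, r;
    pid_t pid;

    /* Linux starts a new semaphore at 0, so the child will block.
     * POSIX leaves the initial value unspecified. */
    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    if (pid == 0) {
        r = acquire_slot(semid);
        _exit(r == 0 ? 0 : (r == 1 ? 1 : 2));
    }

    sleep(1);                               /* let the child start waiting */
    if (semctl(semid, 0, IPC_RMID) == -1) {
        kill(pid, SIGKILL);                 /* don't leave the child hanging */
        waitpid(pid, &status, 0);
        return -1;
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A real program could use that return value to exit quietly, or to fall back to running without a limit, instead of treating the removal as a fatal error.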

So, is any of this particularly useful?  Probably not.  The right solution (in my case) is to simply talk to the user and show him a better way to get his work done.  But adding this to ezViz does give me an interesting method of simply kicking off 128 processes with a limit of 5, and watching them all finish naturally (without having to manage a controller script to make it happen).
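For the curious, the "kick off a bunch of processes and let the semaphore throttle them" idea can be sketched in a single file.  This is not the ezViz code; run_limited and fake_work are invented for the example, the semaphore is a private one shared with forked children, and the union semun definition again assumes Linux.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Assumption: Linux, where the program must define union semun itself. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Stand-in for real work: sleep for 20 milliseconds. */
static void fake_work(void) {
    struct timespec ts;
    ts.tv_sec = 0;
    ts.tv_nsec = 20000000L;
    nanosleep(&ts, NULL);
}

/* Hypothetical driver: start nprocs children, let at most `limit` of them
 * work at once, and return how many finished cleanly (-1 on setup error). */
int run_limited(int nprocs, int limit) {
    union semun arg;
    struct sembuf take;
    int semid, i, status, launched = 0, done = 0;
    pid_t pid;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;
    arg.val = limit;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    take.sem_num = 0;
    take.sem_op = -1;
    take.sem_flg = SEM_UNDO;   /* the slot is handed back when the child exits */

    for (i = 0; i < nprocs; i++) {
        pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            if (semop(semid, &take, 1) == -1)
                _exit(1);
            fake_work();
            _exit(0);
        }
        launched++;
    }

    /* Reap exactly the children we started. */
    for (i = 0; i < launched; i++) {
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            done++;
    }
    semctl(semid, 0, IPC_RMID);
    return done;
}
```

Calling run_limited(128, 5) would mirror the scenario above: all 128 children start immediately, but only 5 get past semop at any moment while the rest sleep in the kernel.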
