pthread_create parameter copying

Today, our professor in CSCI 5451 - Introduction to Parallel Computing lectured about POSIX Threads Programming, better known as Pthreads. This programming interface allows for parallel computation within a process (hence the same address space) and provides functionality for spawning, synchronizing, and joining the separate streams of instructions (threads).

To spawn a new thread, you call the pthread_create function. It takes a pointer to a pthread_t in which the new thread's ID is stored, thread attributes, a start routine, and an argument for the start routine. It returns a status code, which is 0 on success. For full reference, see the pthread_create manpage.

During the class, a question was raised whether the argument sent to the start routine was copied or not. Instead of reading the documentation, I decided to test this myself. I started out with the following C program:

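#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>

#define NUM_THREADS 8

void *
thread_report(void *s)
{
  int *arg = s;
  sleep(1); /* give main time to spawn every thread first */
  /* prints the low 32 bits of the address; use %p for the full pointer */
  printf("Address of argument: %x\n", (unsigned int)(uintptr_t)arg);
  printf("Value of argument: %i\n", *arg);
  return NULL;
}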

int  
main(int argc, char **argv)  
{
  int i, j;
  pthread_t threads[NUM_THREADS];

  /* fork */
  for (i = 0; i < NUM_THREADS; ++i) {
    pthread_create(&threads[i], NULL, thread_report, &i); 
  }

  /* join */
  for (j = 0; j < NUM_THREADS; ++j) {
    pthread_join(threads[j], NULL);
  }
  pthread_exit(NULL);
}

This program spawns a total of eight new threads, passing the address of the loop counter i to the thread_report function. If the value were copied, every thread would print a different address and value. Each thread sleeps for one second before printing so that, in practice, all of the threads have been spawned before any of them reports.

A separate integer j was used for joining the threads to avoid changing the i variable after the threads were spawned.

Running the program gave the following output:

Address of argument: ee5dd678  
Value of argument: 8  
Address of argument: ee5dd678  
Value of argument: 8  
Address of argument: ee5dd678  
Value of argument: 8  
Address of argument: ee5dd678  
Value of argument: 8  
Address of argument: ee5dd678  
Value of argument: 8  
Address of argument: ee5dd678  
Value of argument: 8  
Address of argument: ee5dd678  
Value of argument: 8  
Address of argument: ee5dd678  
Value of argument: 8  

In other words, the value is not copied. All threads refer to the same memory location, which is the stack-allocated i in main. By the time the threads wake up, the loop has finished and i is 8, so that is what every thread prints. This also means that if i goes out of scope while the threads are running, reading through the pointer is undefined behavior.

This makes perfect sense. The start routine argument is a void *, so the only thing pthread_create copies is the pointer itself. To copy the value it points to, pthread_create would need to know the type of the argument, or more specifically the size of that type, in order to know how much memory to copy. A void * carries no such information, so neither the C language nor the compiler can do this for you.

To give each spawned thread its own copy of the loop counter, I allocated an integer array to hold the start routine arguments. The main function now looks like this:

int  
main(int argc, char **argv)  
{
  int i, j;
  pthread_t threads[NUM_THREADS];
  int thread_arguments[NUM_THREADS];

  /* fork */
  for (i = 0; i < NUM_THREADS; ++i) {
    thread_arguments[i] = i;
    pthread_create(&threads[i], NULL, thread_report, &thread_arguments[i]); 
  }

  /* join */
  for (j = 0; j < NUM_THREADS; ++j) {
    pthread_join(threads[j], NULL);
  }
  pthread_exit(NULL);
}

The output of this altered program now reads:

Address of argument: c4ceaef4  
Value of argument: 1  
Address of argument: c4ceaf00  
Value of argument: 4  
Address of argument: c4ceaef8  
Value of argument: 2  
Address of argument: c4ceaefc  
Value of argument: 3  
Address of argument: c4ceaef0  
Value of argument: 0  
Address of argument: c4ceaf04  
Value of argument: 5  
Address of argument: c4ceaf08  
Value of argument: 6  
Address of argument: c4ceaf0c  
Value of argument: 7  

As you can see, every thread now has its own argument address and value.

This is a simple problem, and many will probably think it's obvious that Pthreads passes the argument as a pointer rather than copying what it points to. However, the consequences of these semantics are a little different than in serial code. As you can see, we depend on thread_arguments staying alive for as long as the threads run, which would not have been a concern in serial code.

For full source code, please see https://github.com/hawkaa/csci5451examples/tree/master/pthreadvariable_copy.