CSE 30 -- Lecture 13 -- Nov 12

Lecture

Threads are virtual CPUs. The idea is similar to that of virtual memory: you can have more virtual CPUs than you have real, physical CPUs. The different threads can be running ``simultaneously''; because each virtual CPU has its own register set, and thus its own program counter, different threads can be running different functions in the program.

For example, Netscape uses multiple threads. One is used to update the graphics, another is used to handle the network I/O (fetching web documents), etc.

Kernel versus user threads

Kernel threads are implemented by the operating system. The kernel takes care of multiplexing the threads on top of the physical CPUs. The advantage is that if you are running on a multiprocessor system, the kernel can actually make use of the CPUs: a 32-CPU system can run 32 threads simultaneously. If you have fewer threads than CPUs, then of course some CPUs will be idle. Conversely, if you have more threads than CPUs, then the kernel will share the physical CPUs among the threads by timeslicing.

User threads (aka coroutines or cooperative threads) run on a single processor. Context switching among these threads typically occurs cooperatively: the running thread must explicitly call a function (in our case, thr_yield) to give up the processor to the next runnable thread. This means that coroutine threads can't take advantage of a multiprocessor system.
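
As a rough illustration of cooperative switching, here is a small C sketch. It is not the assignment's MIPS code: it assumes a POSIX system that provides <ucontext.h> (getcontext, makecontext, swapcontext), and the names yield_demo, worker and run_coroutines are made up for this example. Each thread records a letter and then yields; nothing else forces it to give up the processor.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define NTHREADS    2
#define STACK_BYTES (64 * 1024)

static ucontext_t sched_ctx;             /* the scheduler's saved registers */
static ucontext_t thr_ctx[NTHREADS];     /* one saved register set per thread */
static char       stacks[NTHREADS][STACK_BYTES];
static int        finished[NTHREADS];
static int        current;               /* index of the running thread */
static char       trace[32];             /* records who ran, in order */
static int        trace_len;

/* Give up the processor: save our registers, resume the scheduler. */
static void yield_demo(void)
{
    swapcontext(&thr_ctx[current], &sched_ctx);
}

/* Thread body (hypothetical): three steps, yielding after each one. */
static void worker(void)
{
    int id = current;

    for (int i = 0; i < 3; i++) {
        trace[trace_len++] = (char) ('A' + id);
        yield_demo();
    }
    finished[id] = 1;   /* returning resumes uc_link, i.e. the scheduler */
}

/* Run every thread round-robin until all have finished; return the trace. */
const char *run_coroutines(void)
{
    int remaining = NTHREADS;

    trace_len = 0;
    memset(finished, 0, sizeof finished);
    for (int id = 0; id < NTHREADS; id++) {
        getcontext(&thr_ctx[id]);
        thr_ctx[id].uc_stack.ss_sp = stacks[id];
        thr_ctx[id].uc_stack.ss_size = sizeof stacks[id];
        thr_ctx[id].uc_link = &sched_ctx;
        makecontext(&thr_ctx[id], worker, 0);
    }
    while (remaining > 0) {
        for (current = 0; current < NTHREADS; current++) {
            if (finished[current])
                continue;
            swapcontext(&sched_ctx, &thr_ctx[current]);
            if (finished[current])
                remaining--;
        }
    }
    assert(trace_len < (int) sizeof trace);
    trace[trace_len] = '\0';
    return trace;
}
```

Calling run_coroutines() from a main function returns the string of letters in the order the threads ran. If worker never called yield_demo, the first thread would run to completion before the second got any time at all.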

You should know which is more efficient in terms of context switch costs and set up, and which works better on multiprocessors.

Locking

Kernel threads require the use of locks to prevent simultaneous access to shared data structures. An update may leave a data structure temporarily inconsistent, so if a second (or third or fourth...) thread examines the data structure during the update, that thread sees the inconsistent state and can become confused.
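
To make "temporarily inconsistent" concrete, here is a small single-threaded C sketch. It does not use real threads: the callback passed to transfer merely stands in for a second thread that happens to run in the middle of the update. The struct accounts type and all function names are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Invariant when no update is in progress: a + b is constant. */
struct accounts {
    int a;
    int b;
};

/* Called in the middle of an update; stands in for a second thread
   that happens to run at exactly that moment. */
typedef void (*observer_fn)(const struct accounts *);

static int observed_total;

/* What the "other thread" computes when it looks at the structure. */
void record_total(const struct accounts *acc)
{
    observed_total = acc->a + acc->b;
}

int last_observed_total(void)
{
    return observed_total;
}

/* Move `amount` from a to b.  Between the two stores the structure is
   temporarily inconsistent: the amount has left a but not reached b. */
void transfer(struct accounts *acc, int amount, observer_fn peek)
{
    assert(acc != NULL);
    acc->a -= amount;
    if (peek != NULL)
        peek(acc);          /* the "other thread" looks here */
    acc->b += amount;
}
```

Starting from a = 100 and b = 50, a transfer of 30 with record_total as the observer makes the observer see a total of 120, even though the total is 150 both before and after the update. A lock held across the two stores would keep the observer out until the structure is consistent again.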

Locking is easier with user threads. This is because a thread that is in the midst of updating a shared data structure can simply not call thr_yield -- this prevents any other threads from running, so nobody else will see the data structure before it's ready.

The disadvantage of this approach is that if a data structure update requires a long time, then we lose concurrency -- this is important if, for example, the other threads that are being locked out include ones that update the GUI. Perhaps worse yet, if the update requires calling common routines, then those common routines are also required to not ever call thr_yield: this means knowledge of whether it's okay to yield has to be distributed throughout the program.

To mitigate these problems, we could allow yields at arbitrary points, and require locking. Here, coroutine locking would simply mean that a thread attempting to take a lock that's already taken by another thread must block and wait until the lock is released. In the implementation, this translates to remembering which lock a thread is trying to get, and having the scheduler bypass threads that are waiting for a lock that is still taken.
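
The bookkeeping described above might be sketched in C roughly as follows. This is only a model of the scheduling decision, not part of the assignment's thread package: struct lock, struct thread, lock_acquire, lock_release and next_runnable are all hypothetical names, and no actual context switching happens here.

```c
#include <assert.h>
#include <stddef.h>

struct lock {
    int holder;                 /* -1 when free, else id of the holder */
};

struct thread {
    struct lock *waiting_on;    /* NULL when not blocked on a lock */
};

/* Try to take the lock for thread `id`.  Returns 1 on success.  On
   failure, remember which lock the thread wants and return 0; the
   caller would then yield. */
int lock_acquire(struct lock *l, struct thread *threads, int id)
{
    if (l->holder == -1) {
        l->holder = id;
        threads[id].waiting_on = NULL;
        return 1;
    }
    threads[id].waiting_on = l;
    return 0;
}

void lock_release(struct lock *l, int id)
{
    assert(l->holder == id);    /* only the holder may release */
    l->holder = -1;
}

/* Round-robin choice of the thread to run after `current`, bypassing
   threads that are waiting for a lock that is still taken.  Returns -1
   if no thread is runnable. */
int next_runnable(const struct thread *threads, int n, int current)
{
    for (int step = 1; step <= n; step++) {
        int cand = (current + step) % n;
        const struct lock *want = threads[cand].waiting_on;

        if (want == NULL || want->holder == -1)
            return cand;
    }
    return -1;
}
```

With three threads, if thread 0 holds the lock and thread 1 has tried and failed to take it, next_runnable skips thread 1 and picks thread 2. Once thread 0 releases the lock, thread 1 becomes eligible again and retries lock_acquire when it next runs.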

Scheduling

Kernel threads typically have some kernel-provided mechanisms for specifying the priority of the threads. Differing priorities allow the programmer to give one thread a higher fraction of the CPU time (when there are more threads than CPUs).

Fancier scheduling policies for user threads are also possible: we could mark a thread as a high priority thread, in which case the thread package might, for example, make it run twice as ``often'' as other threads. This is not particularly good without preemption, of course, since threads can still hog the CPU by not yielding. (``Often'' may be defined as the number of executions between yields, or be based on real-time clock measurements.)
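
One possible way to make a high priority thread run twice as ``often'' is a weighted round-robin, sketched below in C. Here ``often'' is taken to mean the number of times a thread is chosen to run, which is only one of the definitions mentioned above. The struct wthread type and the pick_next function are invented for this sketch and are not part of the thread package.

```c
#include <assert.h>

struct wthread {
    int weight;     /* turns per round: 1 = normal, 2 = high priority */
    int credit;     /* turns left in the current round */
};

/* Choose the next thread to run.  Each thread gets `weight` turns per
   round; when every credit is used up, a new round begins.  `cursor`
   remembers where the round-robin scan resumes.  Returns the chosen
   thread's index, or -1 if every weight is zero. */
int pick_next(struct wthread *t, int n, int *cursor)
{
    assert(n > 0);
    for (int pass = 0; pass < 2; pass++) {
        for (int step = 0; step < n; step++) {
            int cand = (*cursor + step) % n;

            if (t[cand].credit > 0) {
                t[cand].credit--;
                *cursor = (cand + 1) % n;
                return cand;
            }
        }
        /* nobody has credit left: start a new round */
        for (int i = 0; i < n; i++)
            t[i].credit = t[i].weight;
    }
    return -1;
}
```

With weights 2, 1 and 1, eight consecutive calls choose thread 0 four times and each of the other threads twice. Without preemption this only controls how often a thread is chosen, not how long it keeps the processor before yielding.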

Assignment 5

You are to complete the implementation of a coroutine threads package. The entry points are:

void thr_init(struct thr_state *tsp,
              void             (*fn)(void *),
              void             *fnarg,
              int              *stack,
              int              stack_size);

void thr_go(void);

void thr_yield(void);

This is, for example, used by user code in the following way:

void tmain(void *arg)
{
	int	thread_id = (int) arg;
	int	i = 0;

	for (i = 0; i < 10; i++) {
		printf("Thread %d: %d\n",thread_id,i);
		thr_yield();
	}
}

void foo(int	who,
	 int	num)
{
	printf("I am thread %d, and this is %d times through\n",who,num);
	thr_yield();
}

void t2main(void *arg)
{
	int	thread_id = (int) arg;
	int	i;

	for (i = 0; i < 10; i++) {
		foo(thread_id,i);
		printf("[ %d:%d ]\n",thread_id,i);
		thr_yield();
	}
}

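int main(void)
{
	struct thr_state	t0, t1, t2;
	int			t0stk[512], t1stk[512], t2stk[512];

	/* the thread argument is a void *, so cast the integer thread ids */
	thr_init(&t0,tmain,(void *) 0,t0stk,sizeof t0stk);
	thr_init(&t1,tmain,(void *) 1,t1stk,4 * 512); /* equiv, with 4-byte ints */
	thr_init(&t2,t2main,(void *) 2,t2stk,2048);

	thr_go();
	printf("All done!\n");
	return 0;
}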

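The output should be the following; the scheduling of the threads is round-robin -- t0 runs until it yields, then t1 runs until it yields, and then t2 runs until yielding, and then we start all over again. Note that t2 yields twice per loop iteration (once inside foo and once in t2main), which is why it is still running after t0 and t1 have finished. Make sure you understand why the output is interleaved the way it is.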

Thread 0: 0
Thread 1: 0
I am thread 2, and this is 0 times through
Thread 0: 1
Thread 1: 1
[ 2:0 ]
Thread 0: 2
Thread 1: 2
I am thread 2, and this is 1 times through
Thread 0: 3
Thread 1: 3
[ 2:1 ]
Thread 0: 4
Thread 1: 4
I am thread 2, and this is 2 times through
Thread 0: 5
Thread 1: 5
[ 2:2 ]
Thread 0: 6
Thread 1: 6
I am thread 2, and this is 3 times through
Thread 0: 7
Thread 1: 7
[ 2:3 ]
Thread 0: 8
Thread 1: 8
I am thread 2, and this is 4 times through
Thread 0: 9
Thread 1: 9
[ 2:4 ]
I am thread 2, and this is 5 times through
[ 2:5 ]
I am thread 2, and this is 6 times through
[ 2:6 ]
I am thread 2, and this is 7 times through
[ 2:7 ]
I am thread 2, and this is 8 times through
[ 2:8 ]
I am thread 2, and this is 9 times through
[ 2:9 ]
All done!

I'll provide thr_init and thr_go, and the basic infrastructure. You'll write thr_yield. This will be due on 11/24 before class.

The code for the parts that I'll supply is in the public directory (~/../public/assn5/). The file header.mips contains the MIPS assembly-time constants used in the rest of the code. The file driver.mips is a test driver, as is driver2.mips; driver2.mips is just like driver.mips except that the $s registers are loaded with values, so that when you step through the code while testing in xspim, the register value display will show these values and make the current thread more-or-less apparent.

You must create a thr.mips file to complete the coroutine threads package. I have a working version -- not particularly optimized -- which is 177 lines of MIPS code and comments. You may wish to start with student.mips; it contains the thr_init and thr_go code from my version of the thread package. This is 142 lines, so I've deleted only about 35 lines of MIPS assembly!

Note: I updated the student.mips file on Monday to add a single instruction that makes it possible to re-run the program when testing. Even so, it's always a good idea to clear all memory and registers and reload the program as you're debugging, so that the simulated machine is always in a known state. This prevents problems that occur when a memory location contains a value, left over from an earlier run, that is different from the initial value from the load.

By the way, instead of doing 3 loads, you should do

% cat header.mips driver.mips thr.mips > all.mips
% xspim -file all.mips &

When you edit the program, re-run the cat command in your shell window by typing

% !cat

and then do the reload in the xspim window.

Anyway, the added instruction is the

			sw	$zero, num_threads

instruction in the thr_go routine. To make it easy to find, the code immediately around it is:

thr_go_loop:            jal     thr_yield
                        lw      $t0, num_threads
                        bgt     $t0, 1, thr_go_loop
                        sw      $zero, num_threads
                        lw      $ra, 4($sp)             # we are alone!
                        addu    $sp, $sp, 4
                        jr      $ra

bsy+www@cs.ucsd.edu, last updated Mon Nov 30 21:53:18 PST 1998.
