CSE 30 -- Lecture 15 -- Nov 19

In this lecture, I went over function pointers (e.g., those used in the 3rd sample solution to assignment 4), processes and threads, and assignment 6.

Function Pointers

In C or C++, you can declare functions:

int	pop4(void);
void	push4(int	val);
int	pop32(void);
void	push32(int	val);

and then variables:

int	(*pop)(void);
void	(*push)(int);

so that your mode switch routine would contain the fragment

	pop = pop4;
	push = push4;

When you want to push a value, you use the function pointers:

	(*push)((*pop)() * (*pop)());

and the function pointer would encapsulate which mode the calculator is in.
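
The fragments above can be put together into a small self-contained sketch. The operand stack, the set_mode and multiply routines, and the reading of "4" and "32" as the number of bits kept per value are assumptions made for illustration; they are not taken from the assignment 4 sample solution.

```c
#include <assert.h>

#define STACK_MAX 64

static int stack[STACK_MAX];    /* hypothetical operand stack */
static int depth = 0;

/* 32-bit mode: values are stored and returned unchanged. */
int pop32(void)       { assert(depth > 0); return stack[--depth]; }
void push32(int val)  { assert(depth < STACK_MAX); stack[depth++] = val; }

/* 4-bit mode (assumed meaning): only the low 4 bits are kept. */
int pop4(void)        { return pop32() & 0xF; }
void push4(int val)   { push32(val & 0xF); }

/* The current mode lives in these two function pointers. */
int  (*pop)(void) = pop32;
void (*push)(int) = push32;

/* Mode switch routine: only the pointers change. */
void set_mode(int bits)
{
    if (bits == 4) {
        pop = pop4;
        push = push4;
    } else {
        pop = pop32;
        push = push32;
    }
}

/* Callers never test the mode; they just call through the pointers. */
void multiply(void)
{
    int a = (*pop)();
    int b = (*pop)();
    (*push)(a * b);
}
```

With this sketch, pushing 6 and 7 and calling multiply leaves 42 on the stack in 32-bit mode and 10 (the low 4 bits of 42) in 4-bit mode, and multiply itself is identical in both cases.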

In assembly, this would be:

	.data
pop:	.word 0
push:	.word 0
	.text
main:	...
	la $t0,pop4
	sw $t0,pop
	la $t0,push4
	sw $t0,push
	...
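mult:	sub $sp,$sp,12		# allocate three words of stack
	sw $ra,4($sp)		# save return address
	sw $a0,8($sp)		# save caller's $a0
	lw $t0,pop		# load address of the current pop routine
	jalr $t0		# call it: first operand returned in $v0
	sw $v0,12($sp)		# keep first operand across the next call
	lw $t0,pop
	jalr $t0		# second operand returned in $v0
	lw $t0,12($sp)		# reload first operand
	mul $a0,$v0,$t0		# product becomes the argument to push
	lw $t0,push		# load address of the current push routine
	jalr $t0		# call it
	lw $a0,8($sp)		# restore $a0
	lw $ra,4($sp)		# restore return address
	add $sp,$sp,12		# release the three words
	jr $ra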

Processes and Threads

A process is an address space, at least one set of CPU registers (a virtual processor), and access rights (such as files). A traditional process has only one set of CPU registers associated with it. A multi-threaded process would have more than one.

Each set of CPU registers -- all the registers, including $pc and $sp -- comprises a thread of execution, or a thread. Threads execute independently of each other, and communicate only through the shared memory. (Another abstraction you'll learn about in OS is shared memory; the difference is subtle: the threads of a multithreaded program share all access rights and other kernel state such as timers, resource limits, etc., whereas multiple processes that share memory do not -- and they may have some regions of memory that are not shared.)

Kernel threads are known to the scheduler. If one thread blocks due to an I/O syscall, the OS kernel may run another thread from the same process. (It may also decide to run another process altogether.) Processes and threads block when they can't progress any further, typically due to some I/O event (e.g., reading from an empty pipe -- the process has to wait until the writer puts data into it).

User-level threads, or coroutine threads, are not known to the scheduler. Thus, if one thread invokes a syscall that blocks, all threads are stopped. User threads are more efficient, however, since context switching among user threads only involves swapping the CPU registers; switching between kernel threads requires a full context switch into kernel mode and back, typically triggered by a timer interrupt. This is not just swapping CPU register contents twice, but also involves changing the CPU's mode from user to supervisor (aka kernel) mode and back, and this involves the MMU making kernel memory pages accessible, etc.
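
As a rough illustration of coroutine-style threads, the sketch below uses the older POSIX <ucontext.h> interface (getcontext, makecontext, swapcontext), which glibc still provides. Each ucontext_t holds one saved register set and each thread has its own stack; a thread gives up the processor only by calling swapcontext explicitly. The names thread_a, thread_b, record and run_coroutines are hypothetical, and this is not how any particular thread library is implemented.

```c
#include <assert.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, ctx_a, ctx_b;   /* one saved register set each */
static char stack_a[STACK_SIZE], stack_b[STACK_SIZE];

static char trace[16];      /* records the order in which code ran */
static int trace_len = 0;

static void record(char c)
{
    assert(trace_len < (int)sizeof trace - 1);
    trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

static void thread_a(void)
{
    record('a');
    swapcontext(&ctx_a, &ctx_b);    /* yield: save A's registers, load B's */
    record('A');
    swapcontext(&ctx_a, &ctx_b);    /* yield again; A is not resumed after this */
}

static void thread_b(void)
{
    record('b');
    swapcontext(&ctx_b, &ctx_a);    /* yield back to A */
    record('B');
    /* returning from here resumes uc_link, which is main_ctx */
}

static void init_thread(ucontext_t *ctx, char *stack, void (*entry)(void))
{
    getcontext(ctx);                /* start from a valid context */
    ctx->uc_stack.ss_sp = stack;    /* give the thread its own stack */
    ctx->uc_stack.ss_size = STACK_SIZE;
    ctx->uc_link = &main_ctx;       /* where to go when entry() returns */
    makecontext(ctx, entry, 0);
}

/* Runs the two threads and returns the recorded execution order. */
const char *run_coroutines(void)
{
    trace_len = 0;
    trace[0] = '\0';
    init_thread(&ctx_a, stack_a, thread_a);
    init_thread(&ctx_b, stack_b, thread_b);
    swapcontext(&main_ctx, &ctx_a); /* start A; returns when B finishes */
    return trace;
}
```

The recorded order is "abAB": control moves between the two threads only at the explicit swapcontext calls, so nothing between two yields can be interleaved with the other thread.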

Kernel threads have the advantage that, absent lock contention, automatic speedup occurs (almost linearly) when the program is run on a multiprocessor, since the real processors can each run a separate thread. Coroutine threads are easier to program with, since often less locking is needed: because coroutine threads are not preemptive, as long as there are no interrupt handlers (signal handlers) that access shared structures, keeping multiple threads from entering a critical region at the same time is simply a matter of not volunteering to yield the processor while inside it. With kernel threads, no explicit yielding is needed, since the threads may indeed run on separate processors and in any case the kernel manages the thread switching. This means that locks (or monitors, or mutexes and condition variables, or P/V operations -- you'll learn more about these things in an OS class) are needed to control access: updates to shared data structures (the example that I used was adding an element to the head of a linked list) should be fully complete before another thread is allowed access.
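
The linked-list example can be sketched with POSIX threads, which are scheduled by the kernel on most current systems. The names list_push, worker and run_workers, and the thread and iteration counts, are hypothetical; depending on the toolchain, the -pthread compiler flag may be needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 8
#define PER_THREAD  1000

struct node {
    int value;
    struct node *next;
};

static struct node *head = NULL;    /* shared by all threads */
static pthread_mutex_t head_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add an element at the head of the list; returns 0 on success. */
int list_push(int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;

    /* Critical region: both stores must finish before another
       thread is allowed to read or change head. */
    pthread_mutex_lock(&head_lock);
    n->next = head;
    head = n;
    pthread_mutex_unlock(&head_lock);
    return 0;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < PER_THREAD; i++)
        list_push(i);
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns the list length.
   Nodes are deliberately not freed in this sketch. */
int run_workers(int nthreads)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    assert(nthreads >= 0 && nthreads <= MAX_THREADS);
    while (started < nthreads &&
           pthread_create(&tid[started], NULL, worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    int count = 0;
    for (struct node *n = head; n != NULL; n = n->next)
        count++;
    return count;
}
```

Without the lock, two threads could both read the same old head, and one of the two new nodes would be lost when the second thread overwrites head.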

Assignment 6

Due Dec 3rd, before class as per usual. This is an optimization assignment; you do not have to follow the standard register usage / calling conventions. As a matter of fact, we will not read your code, just your documentation. You will be given a C program -- available here before Friday 4:40pm -- which you must hand translate to assembly. You should document what techniques you used and what design choices you made; this is worth 20 points. Your code must give the correct answer, of course -- this is worth 30 points; we will use test cases to verify this. The remaining 50 points are given based on the number of instructions that your code requires to run to completion. Efficiency is important -- you may use large tables if needed, up to the spim emulator's limitations. Unlike the calculator, space is not at a premium, but speed is.

The assignment is now available in C form.

We are San Diego Software Design, and we are writing a screensaver product that will be used for desktop machines, palmtops, and set top boxes (TV/cable boxes) -- all using the MIPS CPU. The portion that you are working on is the Conway's Life engine. The video model is as follows: there are two pages of memory mapped video memory that can be accessed; animation is done by changing which is the current page being displayed. (This is a technique called ``page flipping'' -- by changing pages in hardware, animation image flicker can be eliminated.) The video memory for our initial version is black-and-white, so pixels are bit-mapped: rows of pixels are packed bits in contiguous words, and each row starts on a fresh word boundary. A virtual screen library also allows the use of the same bit-mapped memory model for display within windows. (See the C code if this is unclear.)
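
To make the bit-mapped layout concrete, here is a minimal sketch of addressing one pixel in such a page. The function names, the parameter order, and the choice that bit 0 of a word is its leftmost pixel are all assumptions made for illustration; the provided life.h defines the real layout and should be consulted instead.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_WORD 32

/* Words per row, rounded up so that every row starts on a word boundary. */
static unsigned words_per_row(unsigned width)
{
    return (width + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Return 1 if the pixel at (row, col) is set, 0 otherwise. */
static int get_pixel(const uint32_t *page, unsigned width,
                     unsigned row, unsigned col)
{
    assert(col < width);
    const uint32_t *word = page + row * words_per_row(width)
                                + col / BITS_PER_WORD;
    /* Assumed bit order: bit 0 is the first pixel in the word. */
    return (int)((*word >> (col % BITS_PER_WORD)) & 1u);
}

/* Set (on != 0) or clear (on == 0) the pixel at (row, col). */
static void put_pixel(uint32_t *page, unsigned width,
                      unsigned row, unsigned col, int on)
{
    assert(col < width);
    uint32_t *word = page + row * words_per_row(width)
                          + col / BITS_PER_WORD;
    uint32_t mask = (uint32_t)1 << (col % BITS_PER_WORD);
    if (on)
        *word |= mask;
    else
        *word &= ~mask;
}
```

For a 40-pixel-wide screen each row takes two words, so the pixel at row 1, column 33 lives in word 3 of the page, at bit 1 under the assumed bit order; the last 24 bits of each row's second word are padding.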

You have been provided with the following C code: life.h, driver.c, and speed.c. This code works, but not fast enough on many of the company's targeted platforms. (These files are also available in the ~/../public directory. The assembly language test driver will appear there in a little while.)

Your job, as the assembly language hacker of the project team, is to speed up the code in the speed.c file. It contains the core Life transform that the project manager has determined to be the bottleneck (from profiling the code). Good luck.
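
As one example of trading space for speed, the sketch below replaces the usual chain of comparisons in the Life rule with a table indexed by the cell's state and its neighbor count, and writes each generation into the page that is not being displayed before flipping. It is not the provided speed.c: it uses one byte per cell instead of packed bits, a fixed 5x5 grid, and treats cells beyond the edge as dead, all of which are simplifying assumptions, and life_step is a hypothetical name.

```c
#include <assert.h>

#define ROWS 5
#define COLS 5

/* next_state[alive][neighbors]: Conway's rule as a lookup table. */
static const unsigned char next_state[2][9] = {
    { 0, 0, 0, 1, 0, 0, 0, 0, 0 },  /* dead cell: born with exactly 3 */
    { 0, 0, 1, 1, 0, 0, 0, 0, 0 }   /* live cell: survives with 2 or 3 */
};

static unsigned char pages[2][ROWS][COLS];  /* two video pages */
static int current = 0;                     /* page being displayed */

/* Cell value, with everything outside the grid counted as dead. */
static int cell(unsigned char g[ROWS][COLS], int r, int c)
{
    return (r >= 0 && r < ROWS && c >= 0 && c < COLS) ? g[r][c] : 0;
}

/* Compute one generation into the hidden page, then flip pages. */
void life_step(void)
{
    unsigned char (*src)[COLS] = pages[current];
    unsigned char (*dst)[COLS] = pages[1 - current];

    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            int n = cell(src, r - 1, c - 1) + cell(src, r - 1, c)
                  + cell(src, r - 1, c + 1) + cell(src, r, c - 1)
                  + cell(src, r, c + 1)     + cell(src, r + 1, c - 1)
                  + cell(src, r + 1, c)     + cell(src, r + 1, c + 1);
            assert(n >= 0 && n <= 8);
            dst[r][c] = next_state[src[r][c]][n];   /* no branches on the rule */
        }
    }
    current = 1 - current;  /* page flip: the new generation is now shown */
}
```

Starting from a horizontal row of three live cells in the middle of the grid, one call to life_step produces a vertical row of three and a second call restores the original pattern. A larger table indexed by several packed cells at once would follow the same idea.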

A compiled version of the life code is now in ~/../public/life. To see it work, run as ~/../public/life -x -s .

Note: you must follow the standard calling and register usage conventions between main and life_xform, since the code in main is written by other members of the product team -- but within life_xform itself and any other subroutines that you write to be called by life_xform, you may do as you wish. In particular, you may wish to write your own versions of cell_alive and set_cell, and there you can cheerfully ignore the calling / register usage conventions.

You'll be provided with test scaffolding that corresponds to driver.c.

bsy@cse.ucsd.edu, last updated Wed Dec 3 20:00:23 PST 1997.
