    int pop4(void);
    void push4(int val);
    int pop32(void);
    void push32(int val);

and then variables:

    int (*pop)(void);
    void (*push)(int);

so that your mode switch routine would contain the fragment

    pop = pop4;
    push = push4;

When you want to operate on the stack, for example to multiply the top two values, you call through the function pointers:

    (*push)((*pop)() * (*pop)());

and the function pointers encapsulate which mode the calculator is in.
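To see the whole pattern in one place, here is a small self-contained C sketch. The stack array, its depth counter, the set_mode4/set_mode32 helpers, and what the two modes actually do are hypothetical, chosen only for illustration: here "4" mode keeps just the low 4 bits of each value, which may not be what the real calculator's modes mean. Only the pop/push names and the call through the pointers come from the text above.

```c
#define STACK_MAX 64
static int stack[STACK_MAX];    /* hypothetical operand stack (no bounds checks) */
static int depth = 0;           /* number of values currently on the stack */

/* Hypothetical mode semantics: mode 4 keeps only the low 4 bits,
   mode 32 keeps the full int. */
int pop4(void)       { return stack[--depth] & 0xF; }
void push4(int val)  { stack[depth++] = val & 0xF; }
int pop32(void)      { return stack[--depth]; }
void push32(int val) { stack[depth++] = val; }

/* The current mode is captured entirely by these two pointers. */
int (*pop)(void) = pop32;
void (*push)(int) = push32;

/* Hypothetical mode switch routines. */
void set_mode4(void)  { pop = pop4;  push = push4;  }
void set_mode32(void) { pop = pop32; push = push32; }

/* Multiply the top two values and push the product in the current mode.
   C leaves the order of the two pop calls unspecified, which is harmless
   here because multiplication commutes. */
void mult(void) { (*push)((*pop)() * (*pop)()); }
```

Switching modes costs two pointer assignments; mult itself never tests which mode is active.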
In assembly, this would be:
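        .data
    pop:    .word 0             # address of the current pop routine
    push:   .word 0             # address of the current push routine

        .text
    main:
        ...
        la   $t0,pop4           # select mode 4: pop = pop4
        sw   $t0,pop
        la   $t0,push4          #                push = push4
        sw   $t0,push
        ...

    mult:                       # push(pop() * pop())
        sub  $sp,$sp,12
        sw   $ra,4($sp)
        sw   $a0,8($sp)         # preserve the caller's $a0
        lw   $t0,pop
        jalr $t0                # first operand returned in $v0
        sw   $v0,12($sp)        # save it across the next call
        lw   $t0,pop
        jalr $t0                # second operand returned in $v0
        lw   $t0,12($sp)
        mul  $a0,$v0,$t0        # product becomes push's argument
        lw   $t0,push
        jalr $t0
        lw   $a0,8($sp)
        lw   $ra,4($sp)
        add  $sp,$sp,12
        jr   $ra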
Each set of CPU registers (all the registers, including $pc and $sp) comprises a thread of execution, or simply a thread. Threads execute independently of each other and communicate only through shared memory. (In an OS class you'll also learn about processes that share memory. The difference is subtle: the threads of a multithreaded program share all access rights and other kernel state such as timers and resource limits, whereas multiple processes that share memory do not, and they may also have some regions of memory that are not shared.)
Kernel threads are known to the scheduler. If one thread blocks due to an I/O syscall, the OS kernel may run another thread from the same process. (It may also decide to run another process altogether.) Processes and threads block when they can't progress any further, typically because they are waiting for some I/O event (e.g., reading from an empty pipe: the process has to wait until the writer puts data into it).
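As a sketch of blocking in action, the following C code (assuming POSIX threads and pipes; the names reader, pipe_demo, and received are hypothetical) starts a kernel thread that reads from an empty pipe. The read blocks until the main thread writes, and while the reader is blocked the kernel is free to schedule other threads. The 0.1-second sleep only makes it likely that the reader is already blocked when the write happens; the result is the same either way.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static int fds[2];              /* fds[0] = read end, fds[1] = write end */
static char received[32];

/* Reader thread: read() blocks while the pipe is empty. */
static void *reader(void *arg) {
    (void)arg;
    ssize_t n = read(fds[0], received, sizeof received - 1);
    if (n > 0)
        received[n] = '\0';
    return NULL;
}

/* Starts the reader, gives it time to block, then writes.
   Returns 0 on success, -1 on any setup failure. */
int pipe_demo(void) {
    pthread_t t;
    struct timespec delay = { 0, 100000000L };   /* 0.1 s */

    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&t, NULL, reader, NULL) != 0)
        return -1;
    nanosleep(&delay, NULL);                     /* reader is (very likely) blocked now */
    if (write(fds[1], "hello", 5) != 5)
        return -1;
    pthread_join(t, NULL);                       /* reader wakes up and finishes */
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```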
User-level threads, or coroutine threads, are not known to the scheduler. Thus, if one thread invokes a syscall that blocks, all threads are stopped. User threads are more efficient, however, since context switching among user threads involves only swapping the CPU registers; switching kernel threads requires a full context switch into kernel mode and back, typically due to a timer interrupt. This is not just swapping CPU register contents twice: it also involves changing the CPU from user mode to supervisor (aka kernel) mode and back, which in turn involves the MMU making kernel memory pages accessible, etc.
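A user-level context switch can be sketched in C with the ucontext calls getcontext, makecontext, and swapcontext, which glibc still provides even though POSIX has since withdrawn them. The coroutine, its stack size, and the trace array are hypothetical. Each swapcontext saves one register set and loads another, which is the user-level switch described above; note, though, that glibc's swapcontext also saves and restores the signal mask with a system call, so a hand-written switch that touches only the registers would be cheaper still.

```c
#define _XOPEN_SOURCE 700   /* ucontext is an XSI interface on some libcs */
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];    /* the coroutine's private stack */
static int trace[8];                /* records the order in which code runs */
static int ntrace = 0;

static void coroutine(void) {
    trace[ntrace++] = 1;
    swapcontext(&co_ctx, &main_ctx);   /* voluntarily yield back to main */
    trace[ntrace++] = 3;
}                                      /* returning resumes co_ctx.uc_link */

int coroutine_demo(void) {
    if (getcontext(&co_ctx) != 0)
        return -1;
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;        /* where to go when coroutine returns */
    makecontext(&co_ctx, coroutine, 0);

    swapcontext(&main_ctx, &co_ctx);   /* run the coroutine until it yields */
    trace[ntrace++] = 2;
    swapcontext(&main_ctx, &co_ctx);   /* resume it; it finishes and returns here */
    trace[ntrace++] = 4;
    return ntrace;
}
```

The trace ends up as 1, 2, 3, 4: control passes between the two contexts only at the explicit swapcontext calls, never preemptively.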
Kernel threads have the advantage that, absent lock contention, a program automatically speeds up (almost linearly) when it is run on a multiprocessor, since each real processor can run a separate thread. Coroutine threads are easier to program with, since less locking is often needed: coroutine threads are not preemptive, so as long as no interrupt handlers (signal handlers) access shared structures, keeping multiple threads out of a critical region at the same time is simply a matter of not volunteering to yield the processor while inside it.

With kernel threads, thread switches happen without any explicit yield: the threads may really be running on separate processors, and in any case the kernel decides when to switch. This means that locks (or monitors, or mutexes and condition variables, or P/V operations; you'll learn more about these in an OS class) are needed to control access: an update to a shared data structure (the example that I used was adding an element to the head of a linked list) must be fully complete before another thread is allowed access.
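The linked-list example can be sketched with POSIX threads and a mutex. The node type, list_push, run_workers, and the thread and iteration counts are hypothetical. Without the mutex, two threads could both read the same old head before either stores its new node, and one of the insertions would be lost; holding the lock across both the link step and the publish step makes each head insertion appear atomic to the other threads.

```c
#include <pthread.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

static struct node *head = NULL;
static pthread_mutex_t head_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert at the head of the shared list. Returns 0, or -1 if out of memory. */
int list_push(int value) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    pthread_mutex_lock(&head_lock);
    n->next = head;      /* link to the current head ...            */
    head = n;            /* ... and publish, with no thread in between */
    pthread_mutex_unlock(&head_lock);
    return 0;
}

#define NTHREADS   4
#define PER_THREAD 10000

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < PER_THREAD; i++)
        list_push(i);
    return NULL;
}

/* Runs NTHREADS workers concurrently, then returns the list length. */
long run_workers(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    long count = 0;
    for (struct node *p = head; p != NULL; p = p->next)
        count++;
    return count;
}
```

With the lock in place, run_workers() reports NTHREADS * PER_THREAD nodes every time.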
The assignment is now available in C form.
We are San Diego Software Design, and we are writing a screensaver product that will be used on desktop machines, palmtops, and set-top boxes (TV/cable boxes), all using the MIPS CPU. The portion that you are working on is the Conway's Life engine. The video model is as follows: there are two pages of memory-mapped video memory that can be accessed, and animation is done by changing which page is currently being displayed. (This technique is called "page flipping": because the pages are switched in hardware, flicker in the animation can be eliminated.) The video memory for our initial version is black-and-white, so pixels are bit-mapped: each pixel is a single bit, the pixels of a row are packed into contiguous words, and each row starts on a fresh word boundary. A virtual screen library also allows the same bit-mapped memory model to be used for display within windows. (See the C code if this is unclear.)
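The bit-mapped layout and page flipping can be sketched in C. The screen size, the use of uint32_t words, the function names, and especially the bit order within a word (pixel x stored in bit x % 32, counting from the least significant bit) are assumptions; life.h defines the real layout. The two arrays merely stand in for the memory-mapped pages, and on real hardware flip() would instead write whatever register selects the displayed page.

```c
#include <stdint.h>

#define WIDTH  100   /* hypothetical screen size in pixels */
#define HEIGHT 50
/* Each row starts on a fresh word boundary: round the width up to whole words. */
#define WORDS_PER_ROW ((WIDTH + 31) / 32)

typedef uint32_t page_t[HEIGHT * WORDS_PER_ROW];

static page_t pages[2];   /* stand-ins for the two video pages */
static int shown = 0;     /* index of the page being displayed */

/* ASSUMED bit order: pixel x lives in bit (x % 32) of word (x / 32) of its row. */
int get_pixel(const uint32_t *page, int x, int y) {
    return (page[y * WORDS_PER_ROW + x / 32] >> (x % 32)) & 1u;
}

void set_pixel(uint32_t *page, int x, int y, int on) {
    uint32_t mask = (uint32_t)1 << (x % 32);
    uint32_t *w = &page[y * WORDS_PER_ROW + x / 32];
    if (on)
        *w |= mask;
    else
        *w &= ~mask;
}

/* Page flipping: draw into the hidden page, then make it the visible one. */
uint32_t *back_page(void) { return pages[1 - shown]; }
void flip(void) { shown = 1 - shown; }
```

With a 100-pixel width, each row occupies 4 words, so the last 28 bits of every row are padding.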
You have been provided with the following C code: life.h, driver.c, and speed.c. This code works, but it is not fast enough on many of the company's targeted platforms. (These files are also available in the ~/../public directory. The assembly language test driver will appear there in a little while.)
Your job, as the assembly language hacker of the project team, is to speed up the code in the speed.c file. It contains the core Life transform, which the project manager has determined (by profiling the code) to be the bottleneck. Good luck.
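For reference, the Life rule itself can be written as a straightforward, deliberately unoptimized C step function over the same kind of word-aligned bitmap: a live cell survives with 2 or 3 live neighbors, and a dead cell becomes live with exactly 3. The names cell_get, cell_put, and life_step, the grid size, the bit order, and treating cells beyond the edge as dead are all assumptions; this is not the code in speed.c, and its signature need not match life_xform's.

```c
#include <stdint.h>
#include <string.h>

#define W   40
#define H   20
#define WPR ((W + 31) / 32)   /* words per row; rows are word-aligned */

typedef uint32_t grid_t[H * WPR];

/* Hypothetical helpers (not speed.c's cell_alive/set_cell).
   Cells outside the grid are treated as dead. */
static int cell_get(const uint32_t *g, int x, int y) {
    if (x < 0 || x >= W || y < 0 || y >= H)
        return 0;
    return (g[y * WPR + x / 32] >> (x % 32)) & 1u;
}

static void cell_put(uint32_t *g, int x, int y) {
    g[y * WPR + x / 32] |= (uint32_t)1 << (x % 32);
}

/* One generation of Conway's Life, reading src and writing dst. */
void life_step(const uint32_t *src, uint32_t *dst) {
    memset(dst, 0, sizeof(grid_t));
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            int n = 0;                       /* live neighbors of (x, y) */
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    if (dx != 0 || dy != 0)
                        n += cell_get(src, x + dx, y + dy);
            if (n == 3 || (n == 2 && cell_get(src, x, y)))
                cell_put(dst, x, y);
        }
    }
}
```

Testing every cell bit by bit through a helper, nine reads per cell, is the kind of overhead a hand-tuned assembly version can attack, for example by keeping the neighboring row words in registers.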
A compiled version of the life code is now in ~/../public/life. To see it work, run as ~/../public/life -x -s .
Note: you must follow the standard calling and register usage conventions between main and life_xform, since the code in main is written by other members of the product team. Within life_xform itself, and within any other subroutines that you write to be called by life_xform, you may do as you wish. In particular, you may wish to write your own versions of cell_alive and set_cell, and there you can cheerfully ignore the calling and register usage conventions.
You'll be provided with test scaffolding that corresponds to driver.c.
bsy@cse.ucsd.edu, last updated