CSE 30 -- Lecture 10 -- Nov 3


Read: Chapter 6.

From last time, we had:

for (i = 0; i < N; i++) {
	sq[i] = i*i;
}
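Translated into MIPS assembly (with i in $t0, N in $a0, and the address of sq in $a1), it is:
		instructions		cycles
		li $t0, 0		1
		b test			1
bod:		mult $t0,$t0		7 + ? (depends on multiplier)
		mflo $t1		1
		sll $t2,$t0,2		1
		add $t2,$t2,$a1		1
		sw $t1,0($t2)		1 (if write buffer) + ??
		add $t0,$t0,1		1
test:		blt $t0,$a0,bod		1 (ignoring pipeline issues for now)

The runtime of this code is 3 + (13 + ?) N: the two setup instructions plus the final failing test cost 3 cycles, and each of the N iterations costs 13 cycles plus any multiplier and store stalls.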

Strength reduction of squaring:

for (isq = i = 0; i < N; i++) {
	sq[i] = isq;
	isq = isq + 2 * i + 1;
}
Translated into MIPS assembly:
		instructions		cycles
		li $t0, 0		1
		li $t1, 0		1
		b test			1
bod:		sll $t2,$t0,2		1
		add $t2,$t2,$a1		1
		sw $t1,0($t2)		1 (if write buffer) + ??
		sll $t3,$t0,1		1
		add $t1,$t1,$t3		1
		add $t1,$t1,1		1
		add $t0,$t0,1		1
test:		blt $t0,$a0,bod		1 (ignoring pipeline issues for now)

The run time is 4 + (8 + ?) N, or roughly two-thirds the runtime of the original for large N.

We can also strength-reduce the array index calculation, replacing the shift and add with a pointer increment, and save one more cycle per iteration:

int	*sqp = sq;
for (isq = i = 0; i < N; i++) {
	*sqp++ = isq;
	isq = isq + 2 * i + 1;
}
Translated into MIPS assembly:
		instructions		cycles
		li $t0, 0		1
		li $t1, 0		1
		move $t4,$a1		1
		b test			1
bod:		sw $t1,0($t4)		1 (if write buffer) + ??
		add $t4,$t4,4		1
		sll $t3,$t0,1		1
		add $t1,$t1,$t3		1
		add $t1,$t1,1		1
		add $t0,$t0,1		1
test:		blt $t0,$a0,bod		1 (ignoring pipeline issues for now)

And this run time is 5 + (7 + ??) N, which for large values of N takes about 12.5% less time than the previous version (7 versus 8 cycles per iteration, ignoring store stalls).

Pipelines and Hazards

We talked about the five stage pipeline for the MIPS R2000. Other implementations of the MIPS architecture have different pipeline depths.

I talked about data hazards and control hazards and how they stall the pipeline or introduce "bubbles" into it. I also talked about how bypass circuitry is used to eliminate some of the bubbles for data hazards, and how the branch delay slot in the MIPS architecture eliminates bubbles for control hazards.

The trend in processor architecture design is not to use branch delay slots, since there are inherent problems with them (what are they?). Instead, branch prediction is used to guess the direction that a conditional branch will take, and the processor speculatively executes along the predicted path. If the guess is right, everything is okay; if the guess is wrong, the partially executed instructions from the wrong path are flushed from the pipeline. (Results are held as "pending" and committed to the register file only once the prediction is known to be correct.)



bsy+www@cs.ucsd.edu, last updated Tue Nov 3 17:20:33 PST 1998.
