The computation of two_dwords from r.second_bytes has at times
shown up on a critical path. Instead of computing it there, add a
'last_dword' flag to the reg_stage_t record which tells us directly
whether a valid flag coming in from dcache means that the
instruction is done. This shortens the path to the busy output
back to execute1.

This also simplifies some of the trim_ctl logic. In the
two_dwords = 0 case, use_second(i) can never be 1 for any of the
bytes being transferred, so "not use_second(i)" is always 1 there.
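
As a rough sanity check of that last claim, here is a small Python
model, not the VHDL itself. It assumes that second_bytes is the upper
half of a 16-bit byte select built from the access length and the low
three address bits, and that use_second(i) is set when byte i of the
access falls at or beyond byte 8. Byte reversal is ignored, and the
function names are illustrative only.

```python
def second_bytes(length, offset):
    """Byte-select bits that spill into the second doubleword (assumed model)."""
    sel = ((1 << length) - 1) << offset  # 16-bit select spanning two dwords
    return (sel >> 8) & 0xFF


def use_second(i, offset):
    """True if byte i of the access comes from the second doubleword (assumed model)."""
    return (i + offset) >= 8


def crossing_cases():
    """All (length, offset) pairs for which two_dwords would be 1."""
    return [(length, offset)
            for length in (1, 2, 4, 8)
            for offset in range(8)
            if second_bytes(length, offset) != 0]


def check_single_dword_never_uses_second():
    """Exhaustively check: two_dwords = 0 implies use_second(i) = 0 for i < length."""
    for length in (1, 2, 4, 8):
        for offset in range(8):
            two_dwords = second_bytes(length, offset) != 0
            if not two_dwords:
                for i in range(length):
                    assert not use_second(i, offset)
    return True
```

Under these assumptions the check passes for every length and offset,
which is why the single-dword case does not need to consult
use_second(i) at all.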

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>