This adds combinatorial logic that does 32-bit and 64-bit count
leading and trailing zeroes in one unit, and consolidates the
four instructions under a single OP_CNTZ opcode.
This saves 84 slice LUTs on the Arty A7-100.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>