open-isa.org has given RISC-V enthusiasts a hardware testbed in the form of the Vega board. The user manual runs to 4413 pages in about 50 chapters. Vega carries four processor cores (Fig. 3, left), arranged as two comparable pairs from the Arm and RISC-V worlds: (Cortex-M4, RI5CY) and (Cortex-M0+, ZERO-RISCY).
According to [12], RI5CY (Fig. 3, top right) implements the standard extensions RIVE and RIVM as well as some interesting custom extensions:
Post-incrementing load and store - performs a load/store operation and updates the base address in the same instruction. This removes some of the separate address-update instructions that standard access patterns otherwise need.
Accelerated multiplication - RI5CY uses a single-cycle 32-bit x 32-bit multiplier with a 32-bit result. Additional instructions speed up multiplication further.
ALU extensions - RI5CY adds ALU instructions that each perform a fixed sequence of standard instructions in a single instruction.
Hardware loops - a sufficiently small loop can run on RI5CY directly in hardware, which reduces the software overhead. The start address, the end address and a loop counter are set in hardware. For debugging, these values are mapped into CSRs.
The cryptography accelerator CAU3 (Fig. 3, bottom right) supports, among others, DES, 3DES, AES-{128,192,256}, SHA-{1,256,512} and ECC. It is controlled by dedicated firmware.
Entropy generator - an independent hardware block generates 512 bits of entropy, which serve as input to random number functions.
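To make the benefit of the post-incrementing accesses and the hardware loops more concrete, the following is a minimal sketch in plain C, not RI5CY code. The names dot_product, hwloop_t and hwloop_next_pc are hypothetical and chosen only for illustration. The loop model is a simplification that assumes instruction indices instead of byte addresses and does not reproduce the actual RI5CY register layout. Whether a compiler really maps the C loop onto the extensions depends on the toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain C dot product. On a core without the extensions, every iteration
 * costs two loads, two pointer updates, a multiply, an add and the loop
 * bookkeeping (counter update, compare, branch). Post-incrementing loads
 * could absorb the pointer updates, and a hardware loop could absorb the
 * bookkeeping. */
int32_t dot_product(const int32_t *a, const int32_t *b, uint32_t n)
{
    int32_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        acc += (*a++) * (*b++);   /* load, then advance each pointer */
    }
    return acc;
}

/* Simplified, hypothetical model of the three hardware-loop values
 * named in the text: start, end and loop counter. */
typedef struct {
    uint32_t start;  /* index of the first instruction of the loop body */
    uint32_t end;    /* index of the last instruction of the loop body  */
    uint32_t count;  /* iterations still to run, including the current  */
} hwloop_t;

/* Return the index of the instruction that follows pc. When pc is the
 * last body instruction and iterations remain, the model jumps back to
 * start without executing any branch instruction. */
uint32_t hwloop_next_pc(hwloop_t *lp, uint32_t pc)
{
    assert(lp != NULL);
    if (pc == lp->end && lp->count > 1) {
        lp->count--;
        return lp->start;
    }
    return pc + 1;
}
```

In this model, a six-instruction program whose body spans indices 2 to 4 with a count of 3 executes twelve instructions in total: the body runs three times, and no instruction is spent on decrementing, comparing or branching.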
The presentation [11] shows generic assembly code for some of these extensions. The Vega board has typical IoT peripherals: radio standards such as Bluetooth, integrated sensors, Arduino-style GPIO and low-power operating modes for the cores.
Debug interfaces
The hardware debug interface on the Arm side is CoreSight trace via an SWD connector; on the RISC-V side it is a JTAG interface. The RISC-V side has no trace function of its own, so an ad-hoc performance comparison between Arm and RISC-V is not possible. Under certain circumstances, in a multi-core scenario, the RISC-V side can be accessed from the Arm side. A naive solution, writing to and reading from external memory (flash or RAM), would significantly influence the control flow [10] and should therefore be rejected.
It cannot be ruled out that the inter-core communication via the MU [10] could carry at least part of the trace data after a firmware adjustment.