Because the two specifications [1, 2] define only the semantic level, it is difficult to estimate what an extension scenario looks like and how much implementation effort it can entail.
In principle, a semiconductor manufacturer could even bring this extension mechanism down to the user level, where the free machine-instruction encodings left by the standard extensions address a programmable logic block that can implement arbitrary assembler instructions.
Until such a product appears on the market, developers will need to introduce their proprietary RISC-V extension at the chip-design level. The DVCon Europe tutorial [3] shows two design flows: integrating the extension into a simulator, and implementing it in hardware described in VHDL.
The simulation is based on the Imperas platform, which provides a virtual model of an RV32IM core connected to a memory. The simulator itself is driven from the terminal of the host operating system. As an example, ChaCha20 encryption is implemented as a separate instruction. It consists of four logical rotations, each built from elementary operations such as XOR and ROL. Using an API, these operations can be combined into an execution block, given a timing, and assigned to a new assembler mnemonic and machine instruction. Whether the physical implementation can really fuse these elementary operations is a different matter entirely: ideally, the control unit can process the elementary operations with a single instruction; otherwise, the result has to be produced by a different physical implementation.
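To make the structure of the example concrete, the following is a minimal Python sketch of the ChaCha20 quarter round, written so that each XOR followed by a rotate-left appears as one fused step. The function names (`rol32`, `xor_rol`, `quarter_round`) are chosen here for illustration and are not taken from the tutorial or from the Imperas API.

```python
MASK32 = 0xFFFFFFFF  # ChaCha20 operates on 32-bit words


def rol32(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def xor_rol(a, b, n):
    """One fused step: XOR two words, then rotate the result left by n.

    This is the candidate for a single custom instruction.
    """
    return rol32(a ^ b, n)


def quarter_round(a, b, c, d):
    """ChaCha20 quarter round: four add / XOR-rotate steps."""
    a = (a + b) & MASK32
    d = xor_rol(d, a, 16)
    c = (c + d) & MASK32
    b = xor_rol(b, c, 12)
    a = (a + b) & MASK32
    d = xor_rol(d, a, 8)
    c = (c + d) & MASK32
    b = xor_rol(b, c, 7)
    return a, b, c, d
```

The sketch can be checked against the quarter-round test vector in RFC 8439, section 2.1.1: the inputs `0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567` yield `0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb`.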
Figure 2 shows a possible development flow for extending the virtual platform with new instructions. The desired application should first be executed as a C program on the virtual platform. From the application's point of view alone, it is not clear whether the hardware is functionally suitable.
The simulation should be instruction-accurate so that the application can be analyzed realistically via a debug/trace interface. Standard metrics are performance in compute cycles and timing. In the ChaCha20 example, the recursive C definition of the four logical rotations from a function archetype is computationally inefficient.
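As a rough illustration of the cycle metric, the instruction count of the rotation steps can be estimated by hand. The sketch below rests on simplifying assumptions that are not from the tutorial: one cycle per instruction, all state held in registers, and only the ALU instructions of the quarter round counted (no loads, stores, or loop overhead). It uses the fact that the base RV32IM instruction set has no rotate instruction, so a rotation by a constant takes a shift left, a shift right, and an OR.

```python
# Assumed costs in instructions (one cycle each in this simplified model).
ADD = 1
XOR = 1
ROL_BASE = 3            # slli + srli + or, since RV32IM has no rotate
STEPS_PER_QR = 4        # add / XOR-rotate steps per quarter round
QR_PER_BLOCK = 20 * 4   # 20 rounds, four quarter rounds each


def quarter_round_cost(fused):
    """ALU instructions per quarter round, with or without a fused XOR-rotate."""
    xor_rol = 1 if fused else XOR + ROL_BASE
    return STEPS_PER_QR * (ADD + xor_rol)


def block_cost(fused):
    """ALU instructions for all quarter rounds of one ChaCha20 block."""
    return QR_PER_BLOCK * quarter_round_cost(fused)


if __name__ == "__main__":
    print("base ISA :", block_cost(fused=False))
    print("extended :", block_cost(fused=True))
```

Under these assumptions a quarter round drops from 20 to 8 ALU instructions. An instruction-accurate simulation of the real application would be needed to confirm how much of that saving survives register pressure and memory traffic.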
In the next step, the desired instructions must be characterized, and their behavior and timing must be added to the virtual platform. In the example, it is more efficient to implement the four logical rotations directly as assembler instructions. Because each instruction corresponds to one operation, multiple operations can be encapsulated in the model using the API and assigned to a single extended machine instruction.
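What "adding behavior and timing" means can be sketched independently of any particular simulator. The Python fragment below is not the Imperas API. It is a hypothetical toy model that decodes an R-type instruction word in the custom-0 opcode space (major opcode `0x0B`, which the RISC-V specification reserves for custom extensions), executes a fused XOR-rotate, and reports an assumed latency. The mapping of `funct3` values to rotation amounts and the one-cycle latency are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF
OPCODE_CUSTOM0 = 0x0B  # custom-0 major opcode of the RISC-V base encoding

# Hypothetical assignment: funct3 selects the rotation amount.
ROT_BY_FUNCT3 = {0: 16, 1: 12, 2: 8, 3: 7}
ASSUMED_LATENCY = 1  # cycles; a placeholder, not a measured value


def encode_rtype(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the fields of an R-type instruction into a 32-bit word."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def decode_rtype(word):
    """Split a 32-bit instruction word into its R-type fields."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
        "funct7": (word >> 25) & 0x7F,
    }


def rol32(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def execute(word, regs):
    """Run one custom instruction on the register file.

    Returns the assumed cycle count, or None if the word is not one of
    the hypothetical custom instructions.
    """
    f = decode_rtype(word)
    if (f["opcode"] != OPCODE_CUSTOM0 or f["funct7"] != 0
            or f["funct3"] not in ROT_BY_FUNCT3):
        return None
    result = rol32(regs[f["rs1"]] ^ regs[f["rs2"]], ROT_BY_FUNCT3[f["funct3"]])
    if f["rd"] != 0:  # x0 is hard-wired to zero
        regs[f["rd"]] = result
    return ASSUMED_LATENCY
```

A call such as `execute(encode_rtype(0, 6, 5, 0, 7, OPCODE_CUSTOM0), regs)` writes the XOR of `x5` and `x6`, rotated left by 16, into `x7`. Keeping decode, behavior, and timing in separate places makes it easier to change the encoding or the latency estimate without touching the functional model.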
The extended instruction set must be re-verified and re-analyzed in the application by instruction-accurate simulation. If a bug is detected, the instruction set or its virtual implementation must be modified.
If the new instruction works in the application, the RISC-V model can be optimized and its performance used as a reference for VHDL verification. Automated tests with pseudo-random code are particularly useful for demonstrating RISC-V compliance.
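The idea of checking an implementation against a reference model with pseudo-random stimulus can be sketched as follows. This is a deliberately reduced illustration: real compliance flows generate random instruction streams and compare architectural state, whereas this sketch only feeds random operands to a single fused XOR-rotate operation. Both `reference_xor_rol` and `dut_xor_rol` are hypothetical stand-ins. In practice the second would be replaced by results read back from the VHDL simulation.

```python
import random

MASK32 = 0xFFFFFFFF


def reference_xor_rol(a, b, n):
    """Reference model: XOR followed by a 32-bit rotate-left."""
    x = (a ^ b) & MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def dut_xor_rol(a, b, n):
    """Stand-in for the device under test: same function, computed bit by bit."""
    x = (a ^ b) & MASK32
    out = 0
    for i in range(32):
        bit = (x >> i) & 1
        out |= bit << ((i + n) % 32)
    return out


def run_random_tests(count, seed=0, dut=dut_xor_rol):
    """Compare the DUT with the reference on pseudo-random operands.

    A fixed seed keeps failing cases reproducible. Returns a list of
    (a, b, n, expected, got) tuples, one per mismatch.
    """
    rng = random.Random(seed)
    mismatches = []
    for _ in range(count):
        a = rng.getrandbits(32)
        b = rng.getrandbits(32)
        n = rng.choice((16, 12, 8, 7))  # ChaCha20 rotation amounts
        expected = reference_xor_rol(a, b, n)
        got = dut(a, b, n)
        if expected != got:
            mismatches.append((a, b, n, expected, got))
    return mismatches
```

An empty result from `run_random_tests(1000)` indicates that the two models agree on the sampled inputs. It does not prove equivalence, which is why such tests are usually combined with directed corner cases.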