New Hybrid Chip with Cortex-A and -M: ST Microelectronics merges MCU and MPU

ST Microelectronics has been the market leader for 32-bit Cortex-M microcontrollers for more than a decade. In the MPU business, by contrast, the leader is NXP with its i.MX SoCs. With the new STM32MP1, ST now wants to win a piece of this lucrative market as well.

ST Microelectronics' success story in Arm Cortex-M microcontrollers has resulted in a market share of over 50% in this fast-growing market. ST introduced its first Cortex-M3 MCU in 2007, followed by M4 and M0+ devices and its first Cortex-M7-based chip in 2014. ST's competitor NXP has a similar success story with its i.MX MPU family. In this segment of Cortex-A application processors, which in many cases are supplemented by a Cortex-M4 MCU for latency-critical tasks, ST could only watch the dollars roll into Austin, Texas.

The STM32MP1 is manufactured in 40 nm and contains one or two Arm Cortex-A7 CPUs clocked at 650 MHz, a Cortex-M4 clocked at 209 MHz and a GPU clocked at 533 MHz (we assume a Vivante GPU of the GC-Nano family; a data sheet was unfortunately not available to us at the time of writing). With it, ST is targeting applications similar to those of NXP's i.MX8 DualX, which is the closest match to the STM32MP1, at least in its full configuration, although it uses the newer 64-bit Cortex-A35 CPUs instead of the 32-bit Cortex-A7. Application examples include industrial automation and control, HMI, robotics and health/wellness.
In addition to the variant described, which is marketed under the name STM32MP157, there will also be an STM32MP153 derivative without a GPU and an STM32MP151 that lacks both the GPU and the CAN-FD interface. Derivatives with a single Cortex-A7 core are also said to be available. On the NXP side, the STM32MP153 can be compared with the i.MX7, which likewise has Cortex-A7 CPUs but no GPU.
Figure 1 shows the block diagram of the STM32MP157. What is noticeable at first glance is the comparatively small 256 KB L2 cache shared between the A7 cores (NXP implements 512 KB). The Cortex-M4, by contrast, has even more on-chip memory at its disposal: 448 KB of SRAM, of which 384 KB are system RAM and 64 KB retention RAM. DDR3/3L, LPDDR2 or LPDDR3 can be connected as external memory via the DDR controller and its 16/32-bit interface clocked at 533 MHz. A dual Quad-SPI interface is available for external flash.
The Vivante GPU is clocked at 533 MHz and delivers 26 million triangles/s and 133 megapixels/s. It supports OpenGL ES 2.0, but not OpenGL ES 3.0 or 3.1. Two MIPI-DSI lanes with transfer rates of 1 Gb/s are available for display output, as is an LCD-TFT controller for WXGA resolution at 60 frames/s with 24-bit RGB color.
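To put the display figures into proportion, the following rough calculation compares the raw pixel stream of a WXGA panel with the quoted GPU fill rate and MIPI-DSI rate. It is only a sketch under stated assumptions: WXGA is taken as 1280 x 800, the 1 Gb/s is read as a per-lane figure with two lanes in total, and blanking intervals and protocol overhead are ignored.

```python
# Raw pixel data rate of a WXGA panel versus the GPU and MIPI-DSI figures.
# Assumptions (not from the article): WXGA = 1280 x 800, 1 Gb/s is per lane,
# blanking intervals and protocol overhead are ignored.
WIDTH, HEIGHT = 1280, 800
FRAMES_PER_SECOND = 60
BITS_PER_PIXEL = 24

GPU_FILL_RATE = 133e6      # pixels/s, as quoted in the text
DSI_LANE_RATE = 1e9        # bit/s, as quoted in the text
DSI_LANES = 2

# Pixels and bits the panel consumes every second.
pixel_rate = WIDTH * HEIGHT * FRAMES_PER_SECOND
bit_rate = pixel_rate * BITS_PER_PIXEL

# How often the GPU could touch every visible pixel per frame.
overdraw_budget = GPU_FILL_RATE / pixel_rate

# Share of the aggregate DSI rate taken up by raw pixel data.
dsi_utilisation = bit_rate / (DSI_LANE_RATE * DSI_LANES)

print(f"panel pixel rate : {pixel_rate / 1e6:.2f} Mpixel/s")
print(f"raw pixel stream : {bit_rate / 1e9:.3f} Gb/s")
print(f"overdraw budget  : {overdraw_budget:.2f}x")
print(f"DSI utilisation  : {dsi_utilisation:.0%}")
```

Under these assumptions the raw stream already exceeds what a single 1 Gb/s lane could carry, and the fill rate leaves room for each pixel to be drawn roughly twice per frame.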
Of the rich peripherals, 1 Gbit Ethernet, two CAN-FD interfaces and two 16-bit A/D converters as well as two 12-bit D/A converters are worth mentioning. Of course, all the usual suspects such as USB, SPI, I2C, UART and a camera interface are also available.
ST's familiar encryption engine can be ordered as an option. Besides a random number generator, it contains an AES hardware accelerator that both encrypts and decrypts data using 128- or 256-bit keys. The implementation of the AES (Rijndael) block cipher is compliant with NIST FIPS 197.
The hash processor is a fully compliant implementation of the secure hash algorithms SHA-1, SHA-224 and SHA-256, the MD5 (message-digest algorithm 5) hash algorithm and the HMAC (keyed-hash message authentication code) algorithm, which makes it suitable for a variety of applications.
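As an illustration of what an HMAC unit computes, here is a software sketch of HMAC with SHA-256 using only the Python standard library. It does not use the ST hardware or any ST API; the key and message are hypothetical placeholders. The manual construction follows the usual two-pass definition with inner and outer padding and is checked against the library routine.

```python
import hashlib
import hmac

SHA256_BLOCK_SIZE = 64  # bytes


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 spelled out step by step."""
    # Keys longer than one hash block are hashed first, then zero-padded.
    if len(key) > SHA256_BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(SHA256_BLOCK_SIZE, b"\x00")

    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)

    inner_digest = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner_digest).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time comparison of a received tag with the expected one."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


# Hypothetical key and payload, for illustration only.
device_key = b"hypothetical-device-key"
frame = b"sensor frame 0001"

tag = hmac_sha256_manual(device_key, frame)
print(tag.hex())
print(verify(device_key, frame, tag))
```

A dedicated hash processor performs the same computation, but offloads the two hash passes from the CPU.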
The Public Key Accelerator (PKA) is intended for public-key cryptographic operations, especially those related to RSA, Diffie-Hellman or ECC (elliptic curve cryptography) over GF(p) (Galois fields). To achieve a high throughput at reasonable cost, these operations are performed in the Montgomery domain.
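The idea behind working "in the Montgomery domain" can be shown in a few lines: operands are mapped to a scaled representation in which modular reduction needs only shifts, masks and multiplications instead of a division by the modulus. The class below is a minimal software sketch of textbook Montgomery reduction for an odd modulus; the class and method names are made up for illustration and say nothing about how ST's PKA is implemented internally.

```python
class Montgomery:
    """Textbook Montgomery arithmetic for an odd modulus n (illustrative)."""

    def __init__(self, modulus: int):
        if modulus < 3 or modulus % 2 == 0:
            raise ValueError("modulus must be odd and >= 3")
        self.n = modulus
        self.k = modulus.bit_length()      # R = 2**k > n
        self.r = 1 << self.k
        self.mask = self.r - 1
        # n' satisfies n * n' == -1 (mod R); pow(x, -1, m) needs Python 3.8+.
        self.n_prime = (-pow(modulus, -1, self.r)) % self.r
        self.r2 = (self.r * self.r) % modulus

    def reduce(self, t: int) -> int:
        """Return t * R^-1 mod n for 0 <= t < n * R, without dividing by n."""
        m = ((t & self.mask) * self.n_prime) & self.mask
        u = (t + m * self.n) >> self.k
        return u - self.n if u >= self.n else u

    def to_mont(self, a: int) -> int:
        return self.reduce((a % self.n) * self.r2)

    def from_mont(self, a: int) -> int:
        return self.reduce(a)

    def mul(self, a: int, b: int) -> int:
        """Multiply two values that are already in Montgomery form."""
        return self.reduce(a * b)

    def pow(self, base: int, exponent: int) -> int:
        """Square-and-multiply exponentiation, as used in RSA or DH."""
        result = self.to_mont(1)
        x = self.to_mont(base)
        while exponent:
            if exponent & 1:
                result = self.mul(result, x)
            x = self.mul(x, x)
            exponent >>= 1
        return self.from_mont(result)


p = 2**127 - 1                      # an odd (prime) modulus for the demo
mont = Montgomery(p)
print(mont.pow(5, 65537) == pow(5, 65537, p))
```

The conversion into and out of the Montgomery domain costs two extra reductions, which is why the technique pays off for long chains of multiplications such as modular exponentiation.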

The embedded OTFDEC decrypts in real time the encrypted content stored in external OctoSPI memories used in memory mapped mode. The OTFDEC uses the AES-128 algorithm in counter mode (CTR). Code execution on external OctoSPI memories can be protected against error injection thanks to STMicroelectronics proprietary minicipherals.
In addition to fifteen 16-bit timers, there are also two 32-bit timers and two 16-bit timers with digital PWM outputs specially designed for motor control.
If both Cortex-A7 cores are clocked at 650 MHz and the Cortex-M4 additionally at 209 MHz, the chip (without peripherals and GPU) draws 353 mW; with only one Cortex-A7 the value drops to 275 mW. The Cortex-M4 alone (e.g. when acquiring sensor data or controlling a motor with the Cortex-A7 switched off) draws 92 mW. All values are valid for a supply voltage of 3.3 V at 25 °C and a core supply voltage of 1.2 V.
In standby mode, from which the chip can "wake up" to the Linux console in 1 s and to a 3D graphics application (i.e. GPU operation) in 3 s, the chip draws 36 µW; in VBAT mode with active real-time clock and tamper protection it draws 4.5 µW.
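The power figures above lend themselves to a quick budget calculation. The sketch below derives the incremental cost of each Cortex-A7 core and the average power of a duty-cycled design that wakes only the Cortex-M4 and otherwise sits in standby. It assumes that the quoted figures can simply be subtracted from one another and it ignores the energy spent on the wake-up transitions; the 1% duty cycle is an arbitrary example value.

```python
# Power figures quoted in the text, in milliwatts.
P_DUAL_A7_PLUS_M4_MW = 353.0
P_SINGLE_A7_PLUS_M4_MW = 275.0
P_M4_ONLY_MW = 92.0
P_STANDBY_MW = 0.036  # 36 microwatts

# Assumption: the operating points differ only by the cores that are running.
second_a7_cost_mw = P_DUAL_A7_PLUS_M4_MW - P_SINGLE_A7_PLUS_M4_MW
first_a7_cost_mw = P_SINGLE_A7_PLUS_M4_MW - P_M4_ONLY_MW


def average_power_mw(active_mw: float, idle_mw: float, duty_cycle: float) -> float:
    """Time-weighted average power; wake-up transitions are ignored."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * idle_mw


# Example: Cortex-M4 active 1% of the time, standby otherwise.
example_mw = average_power_mw(P_M4_ONLY_MW, P_STANDBY_MW, 0.01)

print(f"second Cortex-A7 adds about {second_a7_cost_mw:.0f} mW")
print(f"first Cortex-A7 adds about  {first_a7_cost_mw:.0f} mW")
print(f"1% duty cycle averages      {example_mw:.3f} mW")
```

With these numbers the active phase dominates the average even at a 1% duty cycle, so shortening the active time matters far more than the standby figure.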
Last but not least, the power supply chip STPMIC1 should be mentioned, which not only contains the linear regulator and DC/DC converter for the STM32MP1 itself, but can also supply any display, external memory and other external components, thus allowing a small PCB footprint and a reduced BOM for typical target applications.

NXP's i.MX 8 DualX vs. STM32MP1

With the STM32MP1, ST Microelectronics is stepping into a wasps' nest in which dozens of derivatives of competitor NXP's i.MX family are already buzzing around.
Given the configuration of the STM32MP157 with GPU, the i.MX8 DualX, manufactured by Samsung in a 28 nm FDSOI process, is probably the most comparable part, although it uses the modern 64-bit Cortex-A35 processor instead of the Cortex-A7. Unlike the Cortex-A7, the Cortex-A35 offers all the new features of Armv8-A, such as 64-bit support and thirty-one 64-bit general-purpose registers instead of just fifteen 32-bit registers. There are also some improvements in the NEON unit for applications such as machine learning and computer vision.
The Cortex-A35 also delivers up to 40% more processing power than the Cortex-A7 at the same clock frequency while consuming 10% less power.
Since the Cortex-A35 in the NXP chip is moreover clocked at 1.2 GHz, the absolute computing power on the MPU side is not comparable: on paper, the i.MX outperforms the ST chip in a dual-core configuration by at least a factor of 4. The Cortex-M4 is clocked at 264 MHz in the NXP chip, i.e. only moderately higher than in the ST chip. The Vivante GC7000UltraLite GPU is clocked at 372 MHz at NXP, but the pixel rate is not specified in the provisional data sheet currently available.
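The clock rates and the per-clock advantage quoted in the text can be combined into a rough per-core estimate. The sketch below does only that: it multiplies the clock ratio by the "up to 40%" figure and ignores everything else (caches, memory bandwidth, 64-bit and NEON gains). On this basis alone the result stays below the factor of 4 quoted above, so that figure has to rest on further factors that are not quantified in this article.

```python
# Figures quoted in the text.
A7_CLOCK_MHZ = 650.0        # Cortex-A7 in the STM32MP1
A35_CLOCK_MHZ = 1200.0      # Cortex-A35 in the i.MX8 DualX
A35_PER_CLOCK_GAIN = 1.40   # "up to 40% more processing power" at equal clock
M4_CLOCK_ST_MHZ = 209.0
M4_CLOCK_NXP_MHZ = 264.0

# Assumption: performance scales linearly with clock frequency.
clock_ratio = A35_CLOCK_MHZ / A7_CLOCK_MHZ
per_core_ratio = clock_ratio * A35_PER_CLOCK_GAIN
m4_ratio = M4_CLOCK_NXP_MHZ / M4_CLOCK_ST_MHZ

print(f"A35 vs. A7 clock ratio     : {clock_ratio:.2f}x")
print(f"A35 vs. A7 per-core, paper : {per_core_ratio:.2f}x")
print(f"Cortex-M4 clock ratio      : {m4_ratio:.2f}x")
```

Because both chips are compared in a dual-core configuration, the per-core ratio is also the ratio for the whole CPU cluster under these assumptions.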
The i.MX 8 also integrates a Tensilica HiFi 4 DSP for audio pre- and post-processing, which is clocked at 640 MHz. The ST chip has no equivalent. NXP's CAAM (Cryptographic Accelerator and Assurance Module) implements various encryption and hashing functions, a runtime integrity checker and a pseudo-random number generator (PRNG). CAAM also implements a secure memory mechanism, for which 64 KB of memory are provided.
In view of this comparison, it is a fair guess that the STM32MP157 will find its market wherever the i.MX8 DualX would be overkill in computing power and functionality (e.g. the DSP). The STM32MP1 will certainly be priced well below the i.MX.

Development boards

ST Microelectronics offers a fully equipped evaluation board for 399 dollars. Those who want something cheaper can buy a demo board for 69 dollars, or a demo board with integrated WVGA display and a WiFi/Bluetooth combination module for 99 dollars.
The entire Arm and ST Microelectronics ecosystem of development tools is available to customers, which should make migrating to the new chip very easy, especially for the numerous STM32 MCU customers.
In addition, the SoC is already supported by Linux version 4.19 with long-term maintenance (LTS, Long Term Support), which of course makes it much easier to develop on both application processors.
OP-TEE, which provides a secure execution environment based on Arm's TrustZone technology, can also run on the Cortex-A7 alongside Linux. On the Cortex-M4 you can, for example, run an RTOS just as on the STM32 MCUs. Three developer software packages allow users to choose the support option that best suits their needs in each project phase:

  • Starter Package (STM32MP1Starter) for quick and easy startup with any STM32MP1 microprocessor
  • Developer Package (STM32MP1Dev) for adding own developments in addition to the STM32MP1 Embedded Software Distribution
  • Distribution Package (STM32MP1Distrib) to create your own Linux distribution, starter and developer packages.

ST's commitment to rich software support, including the STM32CubeMP1 firmware package, is a special feature of the STM32MP1 series. STM32CubeMX facilitates the software and hardware configuration of the Cortex-A7 and Cortex-M4 cores. It handles the generation of C code for the M4 core and the configuration of the DDR SDRAM interface, including a tuning tool, and can also generate Linux device trees. ST also supports customers with a selection of community boards and third-party system-on-module (SoM) boards.

In summary, with its design and its extensive software support, the SoC is a promising entry into the MPU/MCU world.
It offers sufficient computing power for the target applications combined with high energy efficiency. If the price tag is right, there is nothing to prevent the STM32MP1 from being seamlessly integrated into the successful STM32 universe.