Preview of embedded world 2024

Inference Can Be as Compute-Intensive as Training

5 February 2024, 10:00 a.m. | Joachim Kroll
Dr. Salil Raje is Vice President at AMD and will give the keynote on 9 April, the first day of the embedded world Conference 2024.
© Advanced Micro Devices

With CPUs, GPUs, and FPGAs, AMD offers all of the architectures used for AI. embedded world keynote speaker Dr. Salil Raje shares his view on AI.

AI brings many functions that improve the user experience or productivity, such as generating images, creating documents, or interpreting voice commands. But in which sectors is AI being adopted to address technical and social challenges?

AI is transforming the fabric of everyday life across many sectors, and it is not just about training and inferencing in the cloud: AI is also happening at the edge and on endpoints.

In healthcare, AI can lead to breakthroughs in drug discovery and medical research, and even improve medical diagnoses and treatment. AMD adaptive computing customers, Clarius and Topcon, for example, are already using AI to help doctors diagnose physical injuries and eye disease, respectively, and Japan’s Hiroshima University uses AMD-powered AI to help doctors diagnose certain types of cancer. AI can also help speed up drug testing and make drugs safer by modeling their effects on the human body.

In automotive, AI is helping drive advanced safety systems by enabling cars to recognize various types of hazards and guiding drivers to safety. AI is also used for driver monitoring and passenger detection systems.

AI will also have a profound impact on the manufacturing sector. With Industry 5.0 automation, products can be made more cost effectively using AI-powered robots, with a reduced risk of human injury. Additionally, AI can help streamline product testing to enable even faster time-to-market for companies to scale products for mass production.

And on some of the world’s fastest, exascale-level supercomputers like Frontier, LUMI, and the upcoming El Capitan, AI will enable researchers to study climate change, conduct medical research, and explore potential new sources of clean energy.

Dr. Salil Raje
As the leader of AMD’s Adaptive and Embedded Computing Group (AECG), Salil Raje is responsible for all aspects of strategy, business management, engineering, and sales for FPGAs, adaptive SoCs, embedded processors, and core markets. Raje joined AMD in 2022 from Xilinx, as part of the largest acquisition in semiconductor history. Raje holds a Bachelor of Technology in Electrical Engineering from the Indian Institute of Technology, Madras, and Master of Science and Doctorate degrees in Computer Science from Northwestern University. He holds eight patents in electronic design tools, ASIC, and FPGA designs, and has written more than 15 industry-recognized research papers.

 

How will AI increase the computing power and energy requirements of intelligent systems? Is training more demanding than inferencing?

Generative AI inferencing can be as compute-intensive as AI training. Both demand high levels of compute processing and energy, depending on the use case and where it is deployed. Most new technology advances are bound by how much compute you can fit into the constraints of the platform (size, cost, thermal budget, etc.). Over the years, the industry has done a good job of finding ways to mitigate these challenges so they do not become showstoppers, for example by creating more power-efficient technologies that meet the unique requirements of the edge. We are looking at a number of solutions, ranging from hardware and software to data types and fine-tuned models, to help developers innovate with AI at the edge.
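
One of the levers mentioned here, data types, is easy to illustrate. The sketch below is not AMD tooling, just a generic example using PyTorch post-training dynamic quantization on a hypothetical model: moving Linear-layer weights from float32 to int8 cuts the compute and memory-bandwidth cost of inference, which is the kind of saving that matters most within an edge power budget.

# A minimal sketch, not AMD's stack: reducing inference cost by changing
# data types via PyTorch post-training dynamic quantization.
import torch
import torch.nn as nn

# Hypothetical float32 model standing in for a real edge workload.
model_fp32 = nn.Sequential(
    nn.Linear(256, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Convert the Linear layers to int8 weights; activations are quantized
# dynamically at run time, trading a little accuracy for less compute
# and memory bandwidth.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
print(model_int8(x).shape)  # torch.Size([1, 10])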

Listen to Salil's Keynote at embedded world 2024
embedded world 2024 takes place from April 9 to 11 in Nuremberg.
Dr. Salil Raje will give his keynote as part of the embedded world Conference on 9 April at 10 a.m.
For the complete conference programme and registration see www.embedded-world.eu

 

Is it fair to say that GPUs are the best choice for processing in the cloud, while dedicated AI accelerators or FPGAs provide the best performance at the edge?

We believe in a heterogeneous processing approach that targets the right tasks to the right processor (e.g., GPU, CPU, adaptive SoC) to optimize compute efficiency, energy efficiency, and memory bandwidth. GPUs are often a good choice for the cloud, but some service providers may choose a dedicated or specialized AI accelerator for certain tasks that are performed repetitively and at huge scale, such as search.

Edge workloads usually come with constraints around latency for real-time inferencing, smaller form factors, and lower power. Adaptive SoCs based on a heterogeneous mix of programmable logic, AI engines, CPUs, and GPUs are well suited to these requirements and provide an optimized mix of resources for processing at the edge.

In which cases does inferencing in the cloud make sense? Examples?

We envision a seamless, hybrid computing approach across cloud, edge, and endpoints that brings together the benefits of cloud computing with power-efficient, real-time inferencing. In many cases, an adaptable and scalable AI processing approach with both cloud and local processing working together can provide optimized experiences for each workload.

Inferencing in the cloud makes sense in cases where system devices are either so compact or so restricted in terms of battery power or cost that the better strategy is to send data to the cloud to do the inferencing. One case in point could be a security system. The system’s sensors must be small, cost-effective, and power-efficient, so processing captured video, images, and forced-entry data in the cloud makes sense. In agriculture, drones used for monitoring crops could extend battery life by streaming sensor data to the cloud for processing and inferencing, rather than performing these tasks locally on the drone.
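
As a rough illustration of this offload pattern (hypothetical endpoint and response format, not a specific AMD or customer design), the code on a constrained device can be reduced to capturing data and uploading it, leaving the model entirely in the cloud:

# A minimal sketch of cloud offload; the URL, payload, and response
# schema below are illustrative, not a real service.
import requests

ENDPOINT = "https://inference.example.com/v1/classify"  # hypothetical service

def classify_frame_in_cloud(jpeg_bytes: bytes) -> dict:
    """Send one camera frame to the cloud and return the model's verdict."""
    resp = requests.post(
        ENDPOINT,
        files={"frame": ("frame.jpg", jpeg_bytes, "image/jpeg")},
        timeout=5.0,  # a battery-powered device should fail fast and buffer locally
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"label": "person", "score": 0.93}

# On the drone or security sensor, the local loop stays tiny:
# capture -> compress -> upload, with no neural network on the device itself.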

Workloads that are decidedly better processed at the edge include latency-sensitive workloads, as well as applications like healthcare, where limiting the storage or transmission of information to the edge device can help address information privacy and security concerns.

Microcontrollers and processors are universal devices that are well known to developers. What does it mean for them if they need special hardware for certain tasks? (This applies not only to AI, but also to security, video encoding, etc. Please address the problem of ever-growing complexity.)

At AMD, we are addressing the challenges of increasing design complexity with hardware abstraction via a unified software stack and an open-source community and ecosystem. What this means is that developers can use the languages and tools they are familiar with to target different device types (GPUs, CPUs, adaptive SoCs) with the same or similar code. And, when necessary, developers have the option to optimize portions of the code to improve compute and power efficiency.
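
As a generic illustration of what "same code, different targets" can look like (an ONNX Runtime sketch under assumed provider availability, not a description of AMD's unified stack), the application code stays identical and only the execution-provider preference changes per platform:

import numpy as np
import onnxruntime as ort

# Prefer an adaptive-SoC or GPU back end where the installed onnxruntime
# build provides one, and fall back to the CPU otherwise.
preferred = [
    "VitisAIExecutionProvider",  # adaptive SoC / FPGA back end, if present
    "ROCMExecutionProvider",     # AMD GPU back end, if present
    "CPUExecutionProvider",      # always-available fallback
]
available = ort.get_available_providers()
providers = [p for p in preferred if p in available] or ["CPUExecutionProvider"]

# "model.onnx" is a placeholder for any exported network.
session = ort.InferenceSession("model.onnx", providers=providers)
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # example image tensor
print(session.run(None, {input_name: x})[0].shape)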

Our vision is to build an open, proven, and ready ecosystem to help accelerate software development and reduce hardware complexity. We are driving toward a unified AI stack across the AMD portfolio where whatever you develop in the cloud can seamlessly move to the edge via heterogeneous deployment. While there are benefits to building with specialized architectures, the downside is the learning curve. We can help minimize this through integration with open frameworks and an open and broad ecosystem.

Does the trend towards heterogeneous computing also have to do with the fact that advances in semiconductor manufacturing no longer bring as much improvement in performance and energy savings?

The trend toward heterogeneous computing exists because no single processor is optimized for all types of processing. For instance, certain processors have always been more efficient at particular arithmetic functions than others. Heterogeneous computing allows developers to target specific workloads to the processors that deliver the best performance and efficiency for that task.

As workloads get more domain-specific, we can deliver significant improvements with each generation through architectural changes. Additionally, scalable adaptive architectures like AMD's Versal architecture provide performance and energy-efficiency improvements that allow multiple AI workloads to run simultaneously in real time. Another advantage comes from our advanced packaging and chiplet innovations, which enable us to drive significant improvements in power efficiency with each new generation.

