
Research on Design Method of Weakly-Coupled Coprocessor: A Case Study of Artificial Intelligence Application

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2024, Issue 03
Page:
112-121
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
Research on Design Method of Weakly-Coupled Coprocessor: A Case Study of Artificial Intelligence Application
Author(s):
Yan Xinkai (1, 2), Chen Fangyuan (3)
(1. Polytechnic Institute, Zhejiang University, Hangzhou 310000, China)
(2. Jiangsu Vocational College of Information Technology, Wuxi 214153, China)
(3. Jiangnan Institute of Computing Technology, Wuxi 214000, China)
Keywords:
coprocessor; DSA; weak-coupled; RISC-V; AI
PACS:
TP332
DOI:
10.3969/j.issn.1001-4616.2024.03.014
Abstract:
In recent years, the rapid development of applications such as artificial intelligence, big data, and the metaverse, together with slowing progress in semiconductor technology, has opened a large gap between the computing demands of software applications and the performance that hardware delivers. As a solution, domain-specific architecture (DSA) based on software-hardware co-design has attracted wide attention and recognition from academia and industry. It is therefore of great significance to design dedicated coprocessors for the core requirements of specific applications, and to study coprocessor design methods, in order to improve the performance and efficiency of software applications and the efficiency of hardware design. This paper analyzes the coprocessor design space across different degrees of coupling and different load requirements, and focuses on the design method of weakly coupled coprocessors, including the design of a coprocessor instruction architecture based on RISC-V custom instructions, and the control interaction interface, access interface, and design framework of weakly coupled coprocessors in different application scenarios. The general requirements of artificial intelligence applications and the research status of artificial intelligence coprocessors are summarized. Finally, two design cases of weakly coupled coprocessors for different AI application scenarios are described, providing effective support for improving coprocessor design efficiency.


Last Update: 2024-09-15