[1] Yan Xinkai, Chen Fangyuan. Research on Design Method of Weakly-Coupled Coprocessor: A Case Study of Artificial Intelligence Application [J]. Journal of Nanjing Normal University (Natural Science Edition), 2024, (03): 112-121. [doi:10.3969/j.issn.1001-4616.2024.03.014]

Research on Design Method of Weakly-Coupled Coprocessor: A Case Study of Artificial Intelligence Application

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
Issue:
2024, No. 03
Pages:
112-121
Column:
Computer Science and Technology
Publication date:
2024-09-15

Article Info

Title:
Research on Design Method of Weakly-Coupled Coprocessor: A Case Study of Artificial Intelligence Application
Article ID:
1001-4616(2024)03-0112-10
Author(s):
Yan Xinkai 1,2, Chen Fangyuan 3
(1. Polytechnic Institute, Zhejiang University, Hangzhou 310000, Zhejiang, China)
(2. Jiangsu Vocational College of Information Technology, Wuxi 214153, Jiangsu, China)
(3. Jiangnan Institute of Computing Technology, Wuxi 214000, Jiangsu, China)
Keywords:
coprocessor; domain-specific architecture (DSA); weakly coupled; RISC-V; artificial intelligence (AI)
CLC number:
TP332
DOI:
10.3969/j.issn.1001-4616.2024.03.014
Document code:
A
Abstract:
In recent years, with the rapid development of applications such as artificial intelligence, big data, and the metaverse, and with the slowdown of progress in semiconductor process technology, a large computing-power gap has opened between software applications and hardware performance. Domain-specific architectures built through software-hardware co-design have drawn wide attention and recognition from academia and industry as a response. Designing dedicated coprocessors for the core requirements of domain-specific applications, and studying how to design them, is therefore of great significance for improving the performance and efficiency of software applications and the efficiency of hardware design. This paper analyzes the coprocessor design space across different degrees of coupling and different workload requirements, and focuses on design methods for weakly coupled coprocessors, including the design of a coprocessor instruction architecture based on RISC-V custom instructions, and the control interaction interface, memory access interface, and design framework of weakly coupled coprocessors in different application scenarios. It also summarizes the common requirements of artificial intelligence applications and the state of research on artificial intelligence coprocessors, and presents two design examples of weakly coupled coprocessors for different artificial intelligence application scenarios, which provide effective support for improving coprocessor design efficiency.

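Memo

Received: 2023-05-22.
Funding: Major Project of Basic Science (Natural Science) Research in Higher Education Institutions of Jiangsu Province (24KJA510005).
Corresponding author: Yan Xinkai, Ph.D., lecturer; research interests: computer architecture and chip design. E-mail: yanxinkai@zju.edu.cn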
Last Update: 2024-09-15