References:
[1]HENNESSY J L,PATTERSON D A. A new golden age for computer architecture[J]. Communications of the ACM,2019,62(2):48-60.
[2]SUGGS D,SUBRAMONY M,BOUVIER D. The AMD “Zen 2” processor[J]. IEEE Micro,2020,40(2):45-52.
[3]SKILLMAN A,EDSO T. A technical overview of Cortex-M55 and Ethos-U55:Arm's most capable processors for endpoint AI[C]//2020 IEEE Hot Chips 32 Symposium(HCS). Palo Alto:IEEE,2020:1-20.
[4]JOUPPI N P,YOUNG C,PATIL N,et al. In-datacenter performance analysis of a tensor processing unit[J]. ACM SIGARCH Computer Architecture News,2017,45(2):1-12.
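[5]JIA X,WU G M,XIE X H,et al. Research on double-precision floating-point matrix multiplication coprocessor[J]. Journal of Computer Research and Development,2019,56(2):410-420.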
[6]LI Y,PEDRAM A. CATERPILLAR:Coarse grain reconfigurable architecture for accelerating the training of deep neural networks[C]//2017 IEEE 28th International Conference on Application-specific Systems,Architectures and Processors(ASAP). Seattle:IEEE,2017:1-10.
[7]STARKE W J,THOMPTO B W,STUECHELI J,et al. IBM's POWER10 processor[J]. IEEE Micro,2021,41(2):7-14.
[8]IBM. Power instruction set architecture,version 3.1B[EB/OL]. [2023-05-28]. https://openpowerfoundation.org/specifications/isa/.
[9]TALPES E,WILLIAMS D,DAS SARMA D,et al. DOJO:The microarchitecture of Tesla's exa-scale computer[C]//2022 IEEE Hot Chips 34 Symposium(HCS). Cupertino:IEEE,2022:1-28.
[10]PEROTTI M,CAVALCANTE M,WISTOFF N,et al. A “New Ara” for vector computing:An open source highly efficient RISC-V V 1.0 vector processor design[C]//2022 IEEE 33rd International Conference on Application-specific Systems,Architectures and Processors(ASAP). Gothenburg:IEEE,2022:43-51.
[11]SCHMIDT C,WRIGHT J,WANG Z,et al. 4.3 An eight-core 1.44 GHz RISC-V vector machine in 16 nm FinFET[C]//2021 IEEE International Solid-State Circuits Conference(ISSCC). San Francisco:IEEE,2021:58-60.
[12]ASANOVIC K,AVIZIENIS R,BACHRACH J,et al. The rocket chip generator,UCB/EECS-2016-17[R]. Berkeley,CA:EECS Department,University of California,2016.
[13]NUCLEI SYSTEM TECHNOLOGY. Hummingbirdv2 E203 Core and SoC v0.2.1[EB/OL]. [2023-05-28]. https://doc.nucleisys.com/hbirdv2/core/core.html#nice.
[14]VASWANI A,SHAZEER N,PARMAR N,et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30(NIPS 2017). Long Beach:Curran Associates,2017:5998-6008.
[15]NVIDIA. NVIDIA H100 tensor core GPU architecture[EB/OL]. [2023-05-28]. https://nvdam.widen.net/s/9bz6dw7dqr/gtc22-whitepaper-hopper,2022.
[16]JOUPPI N P,KURIAN G,LI S,et al. TPU v4:An optically reconfigurable supercomputer for machine learning with hardware support for embeddings[J]. arXiv:2304.01433,2023.
[17]JOUPPI N P,YOON D H,ASHCRAFT M,et al. Ten lessons from three generations shaped Google's TPUv4i:Industrial product[C]//2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture(ISCA). Valencia:ACM,2021:1-14.
[18]CAMBRICON. Cambricon MLU370 chip[EB/OL]. [2023-05-28]. https://www.cambricon.com/index.php?m=content&c=index&a=lists&catid=360,2023.
[19]OUYANG J,DU X,MA Y,et al. 3.3 Kunlun:A 14 nm high-performance AI processor for diversified workloads[C]//2021 IEEE International Solid-State Circuits Conference(ISSCC). San Francisco:IEEE,2021:50-51.
[20]AUFRANC J L. Kendryte K510 tri-core RISC-V AI processor delivers up to 3 TOPS[EB/OL]. [2023-05-28]. https://www.cnx-software.com/2021/07/09/kendryte-k510-tri-core-risc-v-ai-processor-3-tops/,2021.
[21]SHILOV A. Tachyum teases 128-Core CPU:5.7 GHz,950W,16 DDR5 channels[EB/OL]. [2023-05-28]. https://www.tomshardware.com/news/tachyum-teases-128-core-cpu-57-ghz-950w-16-ddr5-channels,2022.