GPU Kernel Programming

Toward Programmable Observability: Building eBPF-Style Performance Probes in GPU Kernels

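This article traces the author's learning path and walks readers through the technical evolution of GPU kernel performance analysis from the macro level to the micro level. Introduction: As an engineer who uses eBPF for CPU performance analysis, I kept wondering, once I turned to GPU performance optimization and analysis, whether any technology on the GPU also supports user-defined, probe-style performance analysis. Studying NVIDIA Nsight ...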

The Next Platform

Inside The Programming Evolution of GPU Computing

Back in 2000, Ian Buck and a small computer graphics team at Stanford University were watching the steady evolution of computer graphics processors for gaming and thinking about how such devices could ...

Sina (新浪网)

A 32B Model Overtakes GPT-5.2: StitchCUDA, the First End-to-End GPU Programming Agent Framework, Arrives

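The authors of this paper include Li Shiyang (李世阳, co-first author), Zhang Zijian (张子健, co-first author), Winson Chen, Luo Yuebo (罗越波), Hong Mingyi (洪明毅), and Ding Caiwen (丁才文) of the University of Minnesota. Most existing LLM-based automated CUDA methods can only optimize a single kernel and tend to break down when faced with a complete end-to-end GPU program (such as an entire VisionTransformer inference). In this paper ...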

The Next Platform

Unified Memory: The Final Piece Of The GPU Programming Puzzle

Support for unified memory across CPUs and GPUs in accelerated computing systems is the final piece of a programming puzzle that we have been assembling for about ten years now. Unified memory has a ...

Nature

Performance Tuning and Auto-Tuning of Algorithms for GPU Kernels

The optimisation of GPU kernels through performance tuning and auto-tuning approaches has become essential in maximising computational efficiency on modern heterogeneous architectures. Researchers ...

Electronic Design

Programming The CUDA Architecture: A Look At GPU Computing

Graphics processing units (GPUs) were originally designed to perform the highly parallel computations required for graphics rendering. But over the last couple of years, they’ve proven to be powerful ...

InfoWorld

Java plan would support GPUs and other foreign programming models

Project Babylon would extend the reach of Java to foreign programming models such as machine learning models, GPUs, SQL, and differential programming. Java would be extended to foreign programming ...
