Basic information
File name: NVIDIA LLM 全栈式方案使用和优化最佳实践.pdf
File size: 1.68 MB
Total pages: 35
Last updated: 2024-12-20
Total length: approx. 40,700 characters
Document summary

Best Practices for Using and Optimizing the NVIDIA LLM Full-Stack Solution

周国峰 (Chandler), Technical R&D Manager, NVIDIA

GTC 2024 China AI Day, Mar. 19, 2024


Agenda

• NVIDIA Full-Stack Solution for LLM

• Best Practices of NVIDIA Megatron-Core for LLM Training

• Best Practices of NVIDIA TensorRT-LLM for LLM Inference

• Best Practices of NVIDIA Triton Inference Server for LLM Deployment

• Conclusion and Prospect

NVIDIA Full-Stack Solution for LLM

From Training, Inference to Deployment

NVIDIA Megatron-Core (M-core) for LLM Training

• An open-source library of GPU-optimized techniques for LLM training, intended for customers building their own custom LLM frameworks.

NVIDIA TensorRT-LLM for LLM Inference

• An open-source library that accelerates and optimizes inference performance of the latest large language models (LLMs)

NVIDIA Triton Inference Server for LLM Deployment

• An open-source library that standardizes AI model deployment and execution across every workload

TensorRT-LLM + Triton Inference Server for deployment

• The suggested way to deploy LLM-based services on NVIDIA AI platform

• SOTA performance and rich functionalities

• TensorRT-LLM backend. The Triton backend for TensorRT-LLM, including in-flight batching, paged KV cache and more.
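
The slide names in-flight batching and paged KV cache without showing what they do. The toy Python simulation below is a sketch of the general idea only; it is not TensorRT-LLM or Triton code, and every name in it (Request, PagedKVCache, run_inflight_batching, block_size, max_batch) is hypothetical. It assumes each running request produces one token per step, that KV memory is handed out in fixed-size blocks, and that a request which cannot get a block is an error rather than being preempted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    rid: str
    prompt_len: int        # tokens in the prompt
    max_new_tokens: int    # tokens to generate before finishing
    generated: int = 0
    blocks: list = field(default_factory=list)  # ids of KV blocks held

    @property
    def num_tokens(self):
        return self.prompt_len + self.generated


class PagedKVCache:
    """Toy allocator: KV memory is split into fixed-size blocks (pages)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = deque(range(num_blocks))

    def blocks_needed(self, num_tokens):
        return -(-num_tokens // self.block_size)  # ceiling division

    def grow(self, req, num_tokens):
        """Give req enough blocks for num_tokens; return False if memory is short."""
        extra = self.blocks_needed(num_tokens) - len(req.blocks)
        if extra > len(self.free):
            return False
        req.blocks.extend(self.free.popleft() for _ in range(extra))
        return True

    def release(self, req):
        self.free.extend(req.blocks)
        req.blocks.clear()


def run_inflight_batching(requests, cache, max_batch):
    """Simulate decoding; return {request id: step at which it finished}."""
    waiting, running, finished, step = deque(requests), [], {}, 0
    while waiting or running:
        # Admit new requests as soon as a batch slot and KV blocks are free,
        # instead of waiting for the whole batch to drain.
        while waiting and len(running) < max_batch:
            if not cache.grow(waiting[0], waiting[0].prompt_len):
                break
            running.append(waiting.popleft())
        if not running:
            raise RuntimeError("next request does not fit in the KV cache")
        step += 1
        still_running = []
        for req in running:
            # One decode step: reserve room for one more token, then emit it.
            if not cache.grow(req, req.num_tokens + 1):
                raise RuntimeError("out of KV blocks (a real system would preempt)")
            req.generated += 1
            if req.generated == req.max_new_tokens:
                cache.release(req)          # blocks return to the pool at once
                finished[req.rid] = step
            else:
                still_running.append(req)
        running = still_running
    return finished


cache = PagedKVCache(num_blocks=8, block_size=4)
reqs = [Request("A", 4, 2), Request("B", 4, 5), Request("C", 4, 2)]
finished = run_inflight_batching(reqs, cache, max_batch=2)
print(finished)  # {'A': 2, 'C': 4, 'B': 5}
```

In this run, request C takes over A's batch slot at step 3 and finishes at step 4, while B is still decoding. With static batching, C would have to wait until B finished at step 5. Blocks are allocated only as a sequence grows, so no request reserves memory for its maximum length up front.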