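Basic information
File name: Spark Introduction: Project Goals, Components, and Extension Methods.pptx
File size: 488.25 KB
Total pages: 39
Last updated: 2026-04-01
Word count: about 7.69 thousand
Document summary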

Matei Zaharia, UC Berkeley
Introduction to Spark Internals

Outline
  Project goals
  Components
  Life of a job
  Extending Spark
  How to contribute

Project Goals
  Generality: diverse workloads, operators, job sizes
  Low latency: sub-second
  Fault tolerance: faults shouldn’t be special case
  Simplicity: often comes from generality

Codebase Size
  Spark: 20,000 LOC
  Hadoop 1.0: 90,000 LOC
  Hadoop 2.0: 220,000 LOC
  (non-test, non-example sources)

Codebase Details
  Hadoop I/O: 400 LOC
  Mesos backend: 700 LOC
  Standalone backend: 1700 LOC
  Interpreter: 3300 LOC
  Spark core: 16,000 LOC
    Operators: 2000
    Block manager: 2700
    Scheduler: