Big Data Platform at Pinterest

Slides and text of this presentation
Slide 1
Slide description:
Big Data Platform at Pinterest
 Mao Ye


Slide 2
Slide description:

Slide 3
Slide description:

Slide 4
Slide description:
Data at Pinterest

Slide 5
Slide description:
Pinterest Data Architecture

Slide 6
Slide description:
Pinterest Data Architecture

Slide 7
Slide description:
Pinterest Data Architecture

Slide 8
Slide description:
Pinterest Data Architecture

Slide 9
Slide description:

Slide 10
Slide description:
Hadoop Platform Requirements
 Ephemeral clusters
 Access control layer
 Shared data store
 Easy deployment

Slide 11
Slide description:
Decoupling compute & storage

Slide 12
Slide description:
Centralized Hive Metastore

Slide 13
Slide description:
Multi-layered Packaging

Slide 14
Slide description:
Executor Abstraction Layer

Slide 15
Slide description:
Why Qubole?
 API for simplified executor abstraction
 Advanced support for spot instances
 Baked AMI customization

Slide 16
Slide description:
Pinball for Workflow Management

Slide 17
Slide description:
Scale of Processing
 Scale:
  60 billion Pins
  Hundreds of workflows
  Thousands of jobs
  500+ jobs in a workflow
  3 petabytes processed daily
 Support:
  Hadoop, Cascading, Hive, Spark …

Slide 18
Slide description:
Why Pinball?
 Requirements:
  Simple abstractions
  Extensible in future
  Reliable stateless computing
  Easy to debug
  Scales horizontally
  Can be upgraded without aborting workflows
  Rich features like auto-retries, per-job emails, overrun policies…
 Options:
  Apache Oozie, Azkaban, Luigi

Slide 19
Slide description:
Pinball Design

Slide 20
Slide description:
Workflow Model
 Workflow:
  A directed graph of nodes called jobs
 Edge:
  A run-after dependency
 Node:
  A job
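
The workflow model above (jobs as nodes, run-after dependencies as edges) can be illustrated with a small sketch. This is not Pinball's code: the function `run_order`, the job names, and the dictionary layout are assumptions chosen for illustration. It computes one order in which the jobs could run so that every job starts only after the jobs it depends on.

```python
from collections import deque


def run_order(jobs, run_after):
    """Return one valid execution order for a workflow.

    jobs: list of job names (the nodes of the graph).
    run_after: dict mapping a job to the jobs it must run after (the edges).
    """
    jobs = list(jobs)
    # Number of unfinished dependencies per job.
    pending = {job: len(run_after.get(job, ())) for job in jobs}
    # Reverse edges: which jobs are waiting on a given job.
    dependents = {job: [] for job in jobs}
    for job, parents in run_after.items():
        for parent in parents:
            dependents[parent].append(job)

    ready = deque(job for job in jobs if pending[job] == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in dependents[job]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(jobs):
        raise ValueError("workflow contains a dependency cycle")
    return order


# Hypothetical four-job workflow.
workflow_jobs = ["extract", "clean", "aggregate", "report"]
workflow_edges = {
    "clean": ["extract"],
    "aggregate": ["clean"],
    "report": ["clean", "aggregate"],
}
order = run_order(workflow_jobs, workflow_edges)
```

A cycle in the run-after edges leaves some jobs permanently blocked, which the sketch reports as an error instead of returning a partial order.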

Slide 21
Slide description:
Job State
 Job state is captured in a token
 Tokens are named hierarchically
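
One way to picture hierarchical token names is as slash-separated paths, so that related tokens can be found by prefix. The sketch below is only an illustration: the `Token` class, the exact path layout in `job_token_name`, and the state labels `waiting` and `runnable` are assumptions, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class Token:
    name: str       # hierarchical, slash-separated name (assumed layout)
    data: str = ""  # opaque payload describing the job


def job_token_name(workflow, instance, state, job):
    """Build a hypothetical hierarchical name for a job token."""
    return f"/workflow/{workflow}/{instance}/job/{state}/{job}"


def tokens_with_prefix(tokens, prefix):
    """Return the sorted names of all tokens under a name prefix."""
    return sorted(t.name for t in tokens if t.name.startswith(prefix))


tokens = [
    Token(job_token_name("daily_report", "2015-06-01", "waiting", "aggregate")),
    Token(job_token_name("daily_report", "2015-06-01", "runnable", "extract")),
    Token(job_token_name("hourly_stats", "2015-06-01", "runnable", "extract")),
]

# All waiting jobs of one workflow instance.
waiting = tokens_with_prefix(
    tokens, "/workflow/daily_report/2015-06-01/job/waiting/"
)
# Every token that belongs to the daily_report workflow.
all_daily = tokens_with_prefix(tokens, "/workflow/daily_report/")
```

Ending each prefix with a slash keeps a query for `daily_report` from also matching a workflow whose name merely starts with the same characters.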

Slide 22
Slide description:
Job State Machine

Slide 23
Slide description:
Master Worker Interaction
 Master keeps the state
 Workers claim and execute tasks
 Horizontally scalable
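
The claim-and-execute interaction above can be sketched as follows. This is a simplified, in-process illustration under stated assumptions: `TokenStore` stands in for the master, and its `post`, `claim`, and `finish` methods are hypothetical names, not Pinball's API. The point is that the store owns all state and a worker keeps none of its own, so more workers can be added to scale out.

```python
class TokenStore:
    """Stand-in for the master: holds every token and its current owner."""

    def __init__(self):
        self._owners = {}  # token name -> worker id, or None if unclaimed

    def post(self, name):
        self._owners[name] = None

    def claim(self, worker_id):
        """Give the first unclaimed token to worker_id, or return None."""
        for name in sorted(self._owners):
            if self._owners[name] is None:
                self._owners[name] = worker_id
                return name
        return None

    def finish(self, name, worker_id):
        """Remove a token once its owner has executed it."""
        if self._owners.get(name) != worker_id:
            raise ValueError(f"{worker_id} does not own {name}")
        del self._owners[name]


def run_worker(store, worker_id, execute):
    """Claim and execute tokens until none are left; return what was done."""
    done = []
    while True:
        name = store.claim(worker_id)
        if name is None:
            return done
        execute(name)
        store.finish(name, worker_id)
        done.append(name)


store = TokenStore()
for job in ("a", "b", "c"):
    store.post(f"/job/{job}")

# Two workers never receive the same token.
first = store.claim("worker-1")
second = store.claim("worker-2")
store.finish(first, "worker-1")
store.finish(second, "worker-2")

executed = []
rest = run_worker(store, "worker-1", executed.append)
```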

Slide 24
Slide description:
Master
 Entire state is kept in memory
 Each state update is synchronously persisted before master replies to client
 Master runs on a single thread, so there are no concurrency issues
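
The persist-before-reply rule can be sketched with an append-only log file. The slides do not say which storage the master uses, so the file-based log, the `PersistentMaster` class, and its method names are assumptions for illustration only. Each update is written and flushed to disk before the in-memory state changes and before the caller gets a reply, so a restarted master can rebuild its state by replaying the log.

```python
import json
import os
import tempfile


class PersistentMaster:
    """Keeps all state in memory; persists every update before replying."""

    def __init__(self, log_path):
        self._state = {}
        self._log_path = log_path
        # Rebuild the in-memory state by replaying earlier updates.
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    self._state[record["name"]] = record["data"]

    def update(self, name, data):
        record = json.dumps({"name": name, "data": data})
        with open(self._log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # on disk before we acknowledge
        self._state[name] = data
        return "ok"  # the reply is sent only after the write is durable

    def get(self, name):
        return self._state.get(name)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "master.log")
    master = PersistentMaster(path)
    master.update("/workflow/demo/job/a", "RUNNABLE")
    master.update("/workflow/demo/job/a", "RUNNING")

    # Simulate a restart: a new master replays the log.
    restarted = PersistentMaster(path)
    recovered = restarted.get("/workflow/demo/job/a")
```

The sketch uses no locks because it assumes, as the slide states for the real master, that a single thread handles updates one at a time.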

Slide 25
Slide description:
Worker

Slide 26
Slide description:
Open Source
 Git repo: https://github.com/pinterest/pinball
 Mailing list: https://groups.google.com/forum/#!forum/pinball-users

Slide 27
Slide description:
Thank You

