Hadoop: Just the Basics for Big Data Rookies (presentation)
Contents
- 2. Agenda Hadoop Overview HDFS Architecture Hadoop MapReduce Hadoop Ecosystem MapReduce Primer
- 3. Hadoop Overview
- 4. Hadoop Core Open-source Apache project out of Yahoo! in 2006 Distributed
- 5. Why? Bottom line: Flexible Scalable Inexpensive
- 6. Overview Great at Reliable storage for multi-petabyte data sets Batch queries
- 7. Data Structure Bytes! No more ETL necessary Store data now, process
- 8. Versioning Version 0.20.x, 0.21.x, 0.22.x, 1.x.x Two main MR packages: org.apache.hadoop.mapred
- 9. HDFS Architecture
- 10. HDFS Overview Hierarchical UNIX-like file system for data storage sort of
- 11. NameNode Single master service for HDFS Single point of failure (HDFS
- 12. Checkpoint Node (Secondary NN) Performs checkpoints of the NameNode’s namespace and
- 13. DataNode Stores blocks on local disk Sends frequent heartbeats to NameNode
- 14. How HDFS Works - Writes
- 15. How HDFS Works - Writes
- 16. How HDFS Works - Reads
- 17. How HDFS Works - Failure
- 18. Block Replication Default of three replicas Rack-aware system One block on
- 19. HDFS 2.0 Features NameNode High-Availability (HA) Two redundant NameNodes in active/passive
- 20. Hadoop MapReduce
- 21. Hadoop MapReduce 1.x Moves the code to the data JobTracker Master
- 22. JobTracker Monitors job and task progress Issues task attempts to TaskTrackers
- 23. TaskTrackers Runs on same node as DataNode service Sends heartbeats and
- 24. Exploiting Data Locality JobTracker will schedule task on a TaskTracker that
- 25. How MapReduce Works
- 26. How MapReduce Works - Failure
- 27. YARN Abstract framework for distributed application development Split functionality of JobTracker
- 28. MapReduce 2.x on YARN MapReduce API has not changed Rebuild required
- 29. Hadoop Ecosystem
- 30. Hadoop Ecosystem Core Technologies Hadoop Distributed File System Hadoop MapReduce Many
- 31. Moving Data Sqoop Moving data between RDBMS and HDFS Say, migrating
- 32. Flume Architecture
- 33. Higher Level APIs Pig Data-flow language – aptly named PigLatin --
- 34. Pig Word Count A = LOAD '$input'; B = FOREACH
- 35. Key/Value Stores HBase Accumulo Implementations of Google’s Big Table for HDFS
- 36. HBase Architecture
- 37. Data Structure Avro Data serialization system designed for the Hadoop ecosystem
- 38. Scalable Machine Learning Mahout Library for scalable machine learning written in
- 39. Workflow Management Oozie Scheduling system for Hadoop Jobs Support for: Java
- 40. Real-time Stream Processing Storm Open-source project which runs a streaming of
- 41. Distributed Application Coordination ZooKeeper An effort to develop and maintain an
- 42. ZooKeeper Architecture
- 43. Hadoop Streaming Write MapReduce mappers and reducers using stdin and stdout
- 44. SQL on Hadoop Apache Drill Cloudera Impala Hive Stinger Pivotal HAWQ
- 45. HAWQ Architecture
- 46. That’s a lot of projects I am likely missing several (Sorry,
- 47. Sample Architecture
- 48. MapReduce Primer
- 49. MapReduce Paradigm Data processing system with two key phases Map Perform
- 51. Hadoop MapReduce Components Map Phase Input Format Record Reader Mapper Combiner
- 52. Writable Interfaces public interface Writable { void write(DataOutput out); void readFields(DataInput
- 53. InputFormat public abstract class InputFormat<K, V> { public abstract List<InputSplit> getSplits(JobContext
- 54. RecordReader public abstract class RecordReader<KEYIN, VALUEIN> implements Closeable { public abstract
- 55. Mapper public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> { protected void setup(Context
- 56. Partitioner public abstract class Partitioner<KEY, VALUE> { public abstract int getPartition(KEY
- 57. Reducer public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> { protected void setup(Context
- 58. OutputFormat public abstract class OutputFormat<K, V> { public abstract RecordWriter<K,
- 59. RecordWriter public abstract class RecordWriter<K, V> { public abstract void write(K
- 60. Word Count Example
- 61. Problem Count the number of times each word is used in
- 62. Mapper Code public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
- 63. Shuffle and Sort
- 64. Reducer Code public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
- 65. So what’s so hard about it?
- 66. So what’s so hard about it? MapReduce is a limitation Entirely
- 67. So what does this mean for you? Hadoop is written primarily
- 68. Resources, Wrap-up, etc. http://hadoop.apache.org Very supportive community Strata + Hadoop World
- 69. Getting Started Pivotal HD Single-Node VM and Community Edition http://gopivotal.com/pivotal-products/data/pivotal-hd For
- 70. Acknowledgements Apache Hadoop, the Hadoop elephant logo, HDFS, Accumulo, Avro, Drill,
- 71. Learn More. Stay Connected. Talk to us on Twitter: @springcentral Find
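
The word count problem listed in the outline (map, shuffle and sort, reduce) can be sketched without a cluster. The following is a minimal, single-process illustration in plain Python, not the deck's Java code and not the Hadoop API. The function names `map_words`, `shuffle_and_sort`, `reduce_counts`, and `word_count` are hypothetical, and lowercasing tokens is an assumption made here for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map phase: emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.split():
        # Lowercasing is an assumption of this sketch, not taken from the slides.
        yield word.lower(), 1


def shuffle_and_sort(pairs):
    """Shuffle and sort: order pairs by key so equal keys become adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reduce_counts(sorted_pairs):
    """Reduce phase: sum the counts for each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(shuffle_and_sort(pairs)))


if __name__ == "__main__":
    for word, count in sorted(word_count(["the quick fox", "The lazy dog"]).items()):
        print(word, count)
```

In a real Hadoop job the framework performs the shuffle and sort between the mapper and reducer across machines; here a single `sorted` call stands in for that step.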
Slides and text of this presentation