Data Science in the Real World: Making a Difference (presentation)




Slides and text of this presentation
Slide 1
Slide description:
Data Science in the Real World: Making a Difference
Srinath Perera, Director, Research, WSO2; Apache Member (@srinath_perera)
srinath@wso2.com
StatDay 2015 @ University of Colombo


Slide 2
Slide description:
Outline
- Making Sense of the World’s Data
- Building Data Systems
- Changing Dynamics of Data Analysis with Big Data (Sensor Data)
- Challenges and Open Problems

Slide 3
Slide description:
Michael Stonebraker
“But then, out of nowhere, some marketing guys started talking about ‘big data.’ That’s when I realized that I’d been studying this thing for the better part of my academic life.”

Slide 5
Slide description:
A Day in Your Life
Think about a day in your life.
- What is the best road to take?
- Will there be any bad weather?
- How should I invest my money?
- How is my health?
There are many decisions that you could make better if only you could access the data and process it.

Slide 6
Slide description:

Slide 7
Slide description:
What Can We Do with Data?
- Optimize (the world is inefficient)
  - 30% of food is wasted from farm to plate
  - GE Save 1% initiative (http://goo.gl/eYC0QE)
    - Trains => 2B/year
    - US healthcare => 20B/year
- Save lives
  - Weather, disease identification, personalized treatment
- Technology advancement
  - Most high-tech research is done via simulations

Slide 8
Slide description:
Building Data Processing Systems

Slide 9
Slide description:
Data Science Architecture

Slide 10
Slide description:
Data Processing Technologies Landscape

Slide 11
Slide description:
Batch Processing
- Store and process
- Slow (> 5 minutes to get results for a reasonable use case)
- The programming model is MapReduce
  - Apache Hadoop
  - Spark
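
To make the MapReduce programming model named on this slide concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop or Spark code. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are illustrative assumptions, chosen only to show the map, group-by-key, and reduce steps.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data big systems", "data science"])
```

In a real batch system the map and reduce calls run in parallel on many machines and the shuffle moves data between them; the structure of the computation is the same.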

Slide 12
Slide description:
Use Case: Big Data for Development
- Done using CDR (call detail record) data
- People density, noon vs. midnight (red => increased, blue => decreased)
- Urban planning
  - People distribution
  - Mobility
  - Waste management
- E.g. see http://goo.gl/jPujmM
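
The noon vs. midnight comparison could be computed roughly as follows. This is a sketch under strong assumptions: the row layout `(subscriber_id, cell_id, hour_of_day)` and the sample rows are made up for illustration, and real CDR data is far larger and messier.

```python
from collections import Counter

# Hypothetical, simplified CDR rows: (subscriber_id, cell_id, hour_of_day).
cdr_rows = [
    ("u1", "cell_A", 12), ("u2", "cell_A", 12), ("u3", "cell_B", 12),
    ("u1", "cell_B", 0), ("u2", "cell_B", 0), ("u3", "cell_B", 0),
]


def density_at(rows, hour):
    """Count distinct subscribers seen in each cell during the given hour."""
    seen = {(user, cell) for user, cell, h in rows if h == hour}
    return Counter(cell for _, cell in seen)


def density_change(rows, hour_a=12, hour_b=0):
    """Per-cell difference: density at hour_a minus density at hour_b."""
    a, b = density_at(rows, hour_a), density_at(rows, hour_b)
    return {cell: a[cell] - b[cell] for cell in set(a) | set(b)}


change = density_change(cdr_rows)
```

A positive value means more people were seen in that cell at noon than at midnight, a negative value means fewer.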

Slide 13
Slide description:
The Value of Some Insights Degrades Fast!
- For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring), the value of insights degrades very quickly with time.
- E.g. stock markets and the speed of light

Slide 14
Slide description:
Complex Event Processing

Slide 15
Slide description:
Predictive Analytics
- If we know how to solve a problem, that is, if we know a finite set of rules, then we can program it.
- For some problems (e.g. driving a car, character recognition), we do not know a finite, fixed rule set.
- Instead of programming, we give the computer a lot of examples and ask it to learn (often called machine learning).
- Lots of tools
  - R (statistical language)
  - scikit-learn (Python)
  - Apache Spark’s MLBase and Apache Mahout (Java)
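
As a small illustration of "give examples and ask the computer to learn", the sketch below trains a nearest-centroid classifier in plain Python. The choice of algorithm, the function names `train` and `predict`, and the toy data are all assumptions made for this example; the slide does not prescribe any particular method.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one centroid (mean feature vector) per label from labeled examples."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Made-up training data: (height_cm, weight_kg) -> label.
examples = [
    ((30, 4), "cat"), ((35, 5), "cat"),
    ((60, 25), "dog"), ((70, 30), "dog"),
]
model = train(examples)
```

No rule such as "dogs are heavier than 15 kg" was written by hand; the boundary comes from the examples. Calling `predict(model, (33, 6))` would classify a new, unseen animal.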

Slide 16
Slide description:
Use Case: Predictive Maintenance
- The idea is to fix the problem before something breaks, avoiding expensive downtime
  - Airplanes, turbines, windmills
  - Construction equipment
  - Cars, golf carts
- How?
  - Build a model of normal operation and compare deviations from it
  - Match against known error patterns
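
One simple way to "build a model of normal operation and compare deviations" is a mean and standard deviation with a z-score threshold, sketched below. The three-sigma threshold, the function names, and the vibration readings are illustrative assumptions, not values from the talk.

```python
from statistics import mean, stdev


def fit_normal_model(readings):
    """Summarize normal operation by the mean and standard deviation of a sensor."""
    return mean(readings), stdev(readings)


def is_anomalous(model, reading, threshold=3.0):
    """Flag a reading that deviates from normal by more than `threshold` sigmas."""
    mu, sigma = model
    return abs(reading - mu) > threshold * sigma


# Hypothetical vibration readings collected while the machine was healthy.
normal_vibration = [0.50, 0.52, 0.49, 0.51, 0.48, 0.50, 0.53, 0.47]
model = fit_normal_model(normal_vibration)
```

Production systems typically model many sensors jointly and account for operating conditions; this sketch only shows the compare-against-normal idea on a single signal.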

Slide 17
Slide description:
Communicate: Dashboards
- The idea is to give the “overall idea” at a glance (e.g. a car dashboard)
- Support for personalization: you can build your own dashboard
- Also the entry point for drill-down
- How to build?
  - Expose data via JSON
  - Build the dashboard via Google Gadget and the content via HTML5 + JavaScript (use charting libraries like Vega or D3)
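
The "expose data via JSON" step might look like the sketch below, which reduces raw readings to a small summary that a charting front end could fetch. The field names (`count`, `latest`, `average`, `max`) and the `dashboard_payload` helper are hypothetical; serving the string over HTTP is left out.

```python
import json
from statistics import mean


def dashboard_payload(readings):
    """Reduce raw readings to the few summary numbers a dashboard widget needs."""
    values = [r["value"] for r in readings]
    return json.dumps({
        "count": len(values),
        "latest": values[-1],
        "average": round(mean(values), 2),
        "max": max(values),
    })


# Made-up readings; "ts" is a timestamp and "value" is the measurement.
readings = [
    {"ts": 1, "value": 10.0},
    {"ts": 2, "value": 14.0},
    {"ts": 3, "value": 12.0},
]
payload = dashboard_payload(readings)
```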

Slide 18
Slide description:
Communicate: Alerts and Triggers
- Detecting conditions can be done via an event processing system (e.g. CEP)
- The key is the “last mile”
  - Email
  - SMS
  - Push notifications to a UI
  - Pager
  - Trigger a physical alarm
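
The sketch below imitates, in plain Python, what an event processing rule does: it keeps a sliding window over incoming events and calls a notifier when a condition holds. It is not the API of any CEP engine. The `ThresholdAlert` class, the window size, and the limit are assumptions for illustration, and the notifier is a stand-in for the "last mile" channel.

```python
from collections import deque


class ThresholdAlert:
    """Call `notify` when the average of the last `size` events exceeds `limit`."""

    def __init__(self, size, limit, notify):
        self.window = deque(maxlen=size)  # oldest events fall out automatically
        self.limit = limit
        self.notify = notify  # the "last mile": email, SMS, push, ...

    def on_event(self, value):
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return  # wait until the window is full
        average = sum(self.window) / len(self.window)
        if average > self.limit:
            self.notify(f"average {average:.1f} exceeds {self.limit}")


sent = []  # stand-in for a real delivery channel
alert = ThresholdAlert(size=3, limit=80, notify=sent.append)
for temperature in [70, 75, 78, 90, 95]:
    alert.on_event(temperature)
```

Passing the notifier in as a function keeps detection separate from delivery, so the same rule can drive an email, an SMS, or a physical alarm.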

Slide 19
Slide description:
Case Study: Realtime Soccer Analysis

Slide 20
Slide description:
Changing Dynamics

Slide 21
Slide description:
Large Observational Datasets
- Stats are easy with designed experiments
  - You get to select a representative set
  - You have a control group
- You have lots and lots of data and lots and lots of computing power (compared to what you had)

Slide 22
Slide description:
“It is better to be roughly right than precisely wrong.” (John Keynes)
In the long run, we are all dead!!

Slide 23
Slide description:
Challenges: Causality
- Correlation does not imply causality!! (the “send a book home” example [1])
- To establish causality
  - Repeat the experiment with identical tests
  - If you can’t, do a randomized test (A/B test)
  - With big data we cannot do either
- Option 1: We can act on a correlation if we can verify the guess or if correctness is not critical (starting an investigation, checking for a disease, marketing)
- Option 2: We verify correlations using A/B testing or propensity analysis
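
For Option 2, one common way to evaluate an A/B test is a two-proportion z-test, sketched below with the standard library. The slide does not say which test to use, so this choice is an assumption, as are the function name `ab_test_p_value` and the conversion counts.

```python
from math import sqrt
from statistics import NormalDist


def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a two-proportion z-test on conversion counts."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up counts: 100 of 1000 users converted in group A, 150 of 1000 in group B.
p_value = ab_test_p_value(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
```

A small p-value suggests the difference between the groups is unlikely to be chance alone. Because users were assigned at random, it also supports a causal reading, which a correlation found in observational data does not.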

Slide 24
Slide description:
Curious Case of Missing Data

Slide 25
Slide description:
More Data Beats a Clever Algorithm
- Observed by large internet companies
- Also seen in Kaggle competitions
- E.g. SVM vs. logistic regression
- Read “A Few Useful Things to Know about Machine Learning” (Pedro Domingos)

Slide 26
Slide description:
Challenges: Feature Engineering
- In ML, feature engineering is the key [1].
- You need features to form a kernel.
- Then you can solve the problem with less data.
- Deep learning can learn the best features (or feature combinations) via semi-supervised or unsupervised learning [2]

Slide 27
Slide description:
Challenges: Taking Decisions (Context)

Slide 28
Slide description:
Challenges: Updating Models
- Incorporate more data
  - We get more data over time
  - We get feedback about the effectiveness of decisions (e.g. accuracy of fraud detection)
  - Trends change
- Track and update the model
  - Generate models in batch mode and update
  - Streaming (online) ML, which is an active research topic
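
The streaming (online) option can be illustrated with a model that is updated one example at a time instead of being retrained in batch. The sketch below uses a one-feature linear model with stochastic gradient steps. The `OnlineLinearModel` class, the learning rate, and the simulated stream are assumptions for illustration.

```python
class OnlineLinearModel:
    """One-feature linear model y ~ w*x + b, updated one example at a time."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Take a single stochastic gradient step on the squared error for (x, y)."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


model = OnlineLinearModel()
# Simulated stream following y = 2x + 1; a real stream would arrive over time.
for _ in range(2000):
    for x in (0.0, 0.5, 1.0, 1.5, 2.0):
        model.update(x, 2 * x + 1)
```

Each update touches only the current example, so the model never needs the full history. If the underlying trend changes, later examples gradually pull the parameters toward the new relationship.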

Slide 29
Slide description:
Challenges: Lack of Labeled Data

Slide 30
Slide description:
Two Takeaways
- Do your data processing as part of a bigger system
  - Think systems, automate, make a difference
  - Realtime vs. batch
  - Use tools (do not reinvent the wheel)
- Think about how the dynamics are changing (uncontrolled experiments, lots of data)
  - Do not be a data pessimist
  - However, do not do stupid things either

Slide 31
Slide description:
Questions?

