Understanding Feature Space in Machine Learning (presentation)

Slides and text of this presentation
Slide 1
Slide description:
Understanding Feature Space in Machine Learning Alice Zheng, Dato September 9, 2015


Slide 2
My journey so far

Slide 3
Why machine learning?

Slide 4
The machine learning pipeline

Slide 5
Feature = numeric representation of raw data

Slide 6
Representing natural text

Slide 7
Representing natural text

Slide 8
Representing images

Slide 9
Representing images

Slide 10
Feature space in machine learning
- Raw data → high-dimensional vectors
- Collection of data points → point cloud in feature space
- Model = geometric summary of point cloud
- Feature engineering = creating features of the appropriate granularity for the task
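The first mapping on this slide can be sketched concretely for text, using the bag-of-words representation the later slides discuss. A minimal pure-Python sketch (the function name and example documents are illustrative, not from the talk) that turns each document into a point in a space with one dimension per vocabulary word:

```python
from collections import Counter

def bag_of_words(docs):
    """Map raw text documents to high-dimensional count vectors."""
    # One dimension per distinct word across the whole collection.
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        # Each document becomes a point in a len(vocab)-dimensional space.
        vectors.append([counts.get(w, 0) for w in vocab])
    return vocab, vectors

docs = ["the cat sat", "the dog sat on the mat"]
vocab, vecs = bag_of_words(docs)
```

Here the two short documents become points in a 6-dimensional space; a real corpus would yield the hundreds to millions of dimensions slide 19 mentions.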

Slide 11
Crudely speaking, mathematicians fall into two categories: the algebraists, who find it easiest to reduce all problems to sets of numbers and variables, and the geometers, who understand the world through shapes. -- Masha Gessen, “Perfect Rigor”

Slide 12
Algebra vs. Geometry

Slide 13
Visualizing a sphere in 2D

Slide 14
Visualizing a sphere in 3D

Slide 15
Visualizing a sphere in 4D

Slide 16
Why are we looking at spheres?

Slide 17
The power of higher dimensions
- A sphere in 4D can model the birth and death process of physical objects
- Point clouds = approximate geometric shapes
- High dimensional features can model many things

Slide 18
Visualizing Feature Space

Slide 19
The challenge of high dimension geometry
- Feature space can have hundreds to millions of dimensions
- In high dimensions, our geometric imagination is limited
- Algebra comes to our aid

Slide 20
Visualizing bag-of-words

Slide 21
Visualizing bag-of-words

Slide 22
Document point cloud

Slide 23
What is a model?
- Model = mathematical “summary” of data
- What’s a summary? A geometric shape

Slide 24
Classification model

Slide 25
Clustering model

Slide 26
Regression model

Slide 27
Visualizing Feature Engineering

Slide 28
When does bag-of-words fail?

Slide 29
Improving on bag-of-words
- Idea: “normalize” word counts so that popular words are discounted
- Term frequency (tf) = number of times a term appears in a document
- Inverse document frequency (idf) = log(N / number of documents containing the word), where N = total number of documents
- Tf-idf count = tf × idf
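The tf-idf recipe on this slide can be sketched in a few lines. The slide's exact idf formula was lost in transcription, so this sketch assumes the common variant idf(w) = log(N / df(w)), where df(w) is the number of documents containing w; names and example documents are illustrative:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf-idf vectors: word counts discounted by how many
    documents each word appears in."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    # df(w): number of documents containing word w.
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocab}
    # Words in every document get idf = log(n/n) = 0 -- fully discounted.
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors

docs = ["the cat sat", "the dog sat on the mat"]
vocab, vecs = tf_idf(docs)
```

Note how "the" and "sat", which occur in both documents, are zeroed out entirely, while rarer words keep their weight; this is the "popular words are discounted" idea made literal.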

Slide 30
From BOW to tf-idf

Slide 31
From BOW to tf-idf

Slide 32
Entry points of feature engineering
- Start from data and task: What’s the best text representation for classification?
- Start from modeling method: What kind of features does k-means assume? What does linear regression assume about the data?
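As one concrete instance of the second entry point: k-means assigns each point to its nearest centroid under Euclidean distance, which implicitly assumes numeric features on comparable scales (a feature measured in thousands would dominate one measured in fractions). A minimal sketch of that assignment step (illustrative, not code from the talk):

```python
def nearest_centroid(point, centroids):
    """k-means assignment step: pick the centroid closest in
    Euclidean distance. Meaningful only when features are numeric
    and on comparable scales -- the assumption the slide asks about."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

centroids = [(0.0, 0.0), (10.0, 10.0)]
label = nearest_centroid((1.0, 2.0), centroids)
```

Rescaling one feature changes the distances and hence the assignments, which is why feature normalization (mentioned on the final slide) matters for k-means.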

Slide 33
That’s not all, folks! There’s a lot more to feature engineering:
- Feature normalization
- Feature transformations
- “Regularizing” models
- Learning the right features
Dato is hiring! [email protected]


