Understanding Feature Space in Machine Learning (presentation)

Slides and text of this presentation
Slide 1
Slide description:
Understanding Feature Space in Machine Learning Alice Zheng, Dato September 9, 2015


Slide 2
My journey so far

Slide 3
Why machine learning?

Slide 4
The machine learning pipeline

Slide 5
Feature = numeric representation of raw data

Slide 6
Representing natural text

Slide 7
Representing natural text

Slide 8
Representing images

Slide 9
Representing images

Slide 10
Feature space in machine learning
- Raw data → high-dimensional vectors
- Collection of data points → point cloud in feature space
- Model = geometric summary of point cloud
- Feature engineering = creating features of the appropriate granularity for the task
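The first mapping on this slide can be sketched concretely for text, using the bag-of-words representation the later slides discuss. A minimal pure-Python sketch (the function name and example documents are illustrative, not from the talk) that turns each document into a point in a space with one dimension per vocabulary word:

```python
from collections import Counter

def bag_of_words(docs):
    """Map raw text documents to high-dimensional count vectors."""
    # One dimension per distinct word across the whole collection.
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        # Each document becomes a point in a len(vocab)-dimensional space.
        vectors.append([counts.get(w, 0) for w in vocab])
    return vocab, vectors

docs = ["the cat sat", "the dog sat on the mat"]
vocab, vecs = bag_of_words(docs)
```

Here the two short documents become points in a 6-dimensional space; a real corpus would yield the hundreds to millions of dimensions slide 19 mentions.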

Slide 11
Crudely speaking, mathematicians fall into two categories: the algebraists, who find it easiest to reduce all problems to sets of numbers and variables, and the geometers, who understand the world through shapes. -- Masha Gessen, “Perfect Rigor”

Slide 12
Algebra vs. Geometry

Slide 13
Visualizing a sphere in 2D

Slide 14
Visualizing a sphere in 3D

Slide 15
Visualizing a sphere in 4D

Slide 16
Why are we looking at spheres?

Slide 17
The power of higher dimensions
- A sphere in 4D can model the birth and death process of physical objects
- Point clouds = approximate geometric shapes
- High dimensional features can model many things

Slide 18
Visualizing Feature Space

Slide 19
The challenge of high dimension geometry
- Feature space can have hundreds to millions of dimensions
- In high dimensions, our geometric imagination is limited
- Algebra comes to our aid

Slide 20
Visualizing bag-of-words

Slide 21
Visualizing bag-of-words

Slide 22
Document point cloud

Slide 23
What is a model?
- Model = mathematical “summary” of data
- What’s a summary? A geometric shape

Slide 24
Classification model

Slide 25
Clustering model

Slide 26
Regression model

Slide 27
Visualizing Feature Engineering

Slide 28
When does bag-of-words fail?

Slide 29
Improving on bag-of-words
- Idea: “normalize” word counts so that popular words are discounted
- Term frequency (tf) = number of times a term appears in a document
- Inverse document frequency (idf) = log(N / number of documents containing the word), where N = total number of documents
- Tf-idf count = tf × idf
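The tf-idf recipe on this slide can be sketched in a few lines. The slide's exact idf formula was lost in transcription, so this sketch assumes the common variant idf(w) = log(N / df(w)), where df(w) is the number of documents containing w; names and example documents are illustrative:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf-idf vectors: word counts discounted by how many
    documents each word appears in."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    # df(w): number of documents containing word w.
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocab}
    # Words in every document get idf = log(n/n) = 0 -- fully discounted.
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors

docs = ["the cat sat", "the dog sat on the mat"]
vocab, vecs = tf_idf(docs)
```

Note how "the" and "sat", which occur in both documents, are zeroed out entirely, while rarer words keep their weight; this is the "popular words are discounted" idea made literal.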

Slide 30
From BOW to tf-idf

Slide 31
From BOW to tf-idf

Slide 32
Entry points of feature engineering
- Start from data and task: What’s the best text representation for classification?
- Start from modeling method: What kind of features does k-means assume? What does linear regression assume about the data?
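As one concrete instance of the second entry point: k-means assigns each point to its nearest centroid under Euclidean distance, which implicitly assumes numeric features on comparable scales (a feature measured in thousands would dominate one measured in fractions). A minimal sketch of that assignment step (illustrative, not code from the talk):

```python
def nearest_centroid(point, centroids):
    """k-means assignment step: pick the centroid closest in
    Euclidean distance. Meaningful only when features are numeric
    and on comparable scales -- the assumption the slide asks about."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

centroids = [(0.0, 0.0), (10.0, 10.0)]
label = nearest_centroid((1.0, 2.0), centroids)
```

Rescaling one feature changes the distances and hence the assignments, which is why feature normalization (mentioned on the final slide) matters for k-means.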

Slide 33
That’s not all, folks! There’s a lot more to feature engineering:
- Feature normalization
- Feature transformations
- “Regularizing” models
- Learning the right features
Dato is hiring! [email protected]


