About this Course
4.6
1,657 ratings
284 reviews

Course 4 of 4 in the Specialization

100% online

Start instantly and learn at your own schedule.

Flexible deadlines

Reset deadlines to fit your schedule.

Approx. 48 hours to complete

Suggested: 6 weeks of study, 5-8 hours/week...

English

Subtitles: English, Korean, Arabic

Skills you will gain

Data Clustering Algorithms, K-Means Clustering, Machine Learning, K-D Tree

Syllabus - What you will learn in this course

1
1 hour to complete

Welcome

Clustering and retrieval are some of the most high-impact machine learning tools out there. Retrieval is used in almost every application and device we interact with, such as providing a set of products related to the one a shopper is currently considering, or a list of people you might want to connect with on a social media platform. Clustering can be used to aid retrieval, but is a more broadly useful tool for automatically discovering structure in data, like uncovering groups of similar patients.

This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have...
4 videos (Total 25 min), 4 readings
4 videos
Course overview3m
Module-by-module topics covered8m
Assumed background6m
4 readings
Important Update regarding the Machine Learning Specialization10m
Slides presented in this module10m
Software tools you'll need for this course10m
A big week ahead!10m
2
4 hours to complete

Nearest Neighbor Search

We start the course by considering a retrieval task of fetching a document similar to one someone is currently reading. We cast this problem as one of nearest neighbor search, a concept we have seen in the Foundations and Regression courses. Here, however, you will take a deep dive into two critical components of the algorithms: the data representation and the metric used to measure similarity between pairs of datapoints. You will examine the computational burden of the naive nearest neighbor search algorithm, and instead implement scalable alternatives using KD-trees for handling large datasets and locality sensitive hashing (LSH) for providing approximate nearest neighbors, even in high-dimensional spaces. You will explore all of these ideas on a Wikipedia dataset, comparing and contrasting the impact of the various choices you can make on the nearest neighbor results produced...
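The contrast between brute-force search and a KD-tree can be sketched in a few lines. The example below is a toy illustration (random vectors via NumPy and SciPy's `cKDTree`, not the course's Wikipedia data or assignment code): both methods find the same nearest neighbor, but the tree's build cost is amortized over many queries.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
docs = rng.random((1000, 8))   # toy stand-ins for document vectors
query = rng.random(8)

# Naive brute force: compute all n distances, O(n * d) per query.
dists = np.linalg.norm(docs - query, axis=1)
naive_idx = int(np.argmin(dists))

# KD-tree: build once, then each query prunes most of the data
# (efficient in low dimensions; degrades as d grows, motivating LSH).
tree = cKDTree(docs)
dist, tree_idx = tree.query(query, k=1)
```

Both `naive_idx` and `tree_idx` point to the same datapoint; the tree pays off when the build is reused across many queries.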
22 videos (Total 137 min), 4 readings, 5 quizzes
22 videos
1-NN algorithm2m
k-NN algorithm6m
Document representation5m
Distance metrics: Euclidean and scaled Euclidean6m
Writing (scaled) Euclidean distance using (weighted) inner products4m
Distance metrics: Cosine similarity9m
To normalize or not and other distance considerations6m
Complexity of brute force search1m
KD-tree representation9m
NN search with KD-trees7m
Complexity of NN search with KD-trees5m
Visualizing scaling behavior of KD-trees4m
Approximate k-NN search using KD-trees7m
Limitations of KD-trees3m
LSH as an alternative to KD-trees4m
Using random lines to partition points5m
Defining more bins3m
Searching neighboring bins8m
LSH in higher dimensions4m
(OPTIONAL) Improving efficiency through multiple tables22m
A brief recap2m
4 readings
Slides presented in this module10m
Choosing features and metrics for nearest neighbor search10m
(OPTIONAL) A worked-out example for KD-trees10m
Implementing Locality Sensitive Hashing from scratch10m
5 practice exercises
Representations and metrics12m
Choosing features and metrics for nearest neighbor search10m
KD-trees10m
Locality Sensitive Hashing10m
Implementing Locality Sensitive Hashing from scratch10m
3
2 hours to complete

Clustering with k-means

In clustering, our goal is to group the datapoints in our dataset into disjoint sets. Motivated by our document analysis case study, you will use clustering to discover thematic groups of articles by "topic". These topics are not provided in this unsupervised learning task; rather, the idea is to output cluster labels that can be post-facto associated with known topics like "Science", "World News", etc. Even without such post-facto labels, you will examine how the clustering output can provide insights into the relationships between datapoints in the dataset. The first clustering algorithm you will implement is k-means, which is the most widely used clustering algorithm out there. To scale up k-means, you will learn about the general MapReduce framework for parallelizing and distributing computations, and then how the iterations of k-means can utilize this framework. You will show that k-means can provide an interpretable grouping of Wikipedia articles when appropriately tuned...
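The two alternating steps of k-means can be sketched directly in NumPy. This is a minimal toy version (random 2-D blobs, centers initialized from random datapoints), not the course's assignment implementation or its MapReduce variant:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct datapoints.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, labels

# Two well-separated Gaussian blobs as a toy dataset.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
centers, labels = kmeans(X, k=2)
```

Each iteration decreases (or leaves unchanged) the sum of squared distances to the assigned centers, which is why the algorithm can be viewed as coordinate descent, as the lectures discuss.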
13 videos (Total 79 min), 2 readings, 3 quizzes
13 videos
An unsupervised task6m
Hope for unsupervised learning, and some challenge cases4m
The k-means algorithm7m
k-means as coordinate descent6m
Smart initialization via k-means++4m
Assessing the quality and choosing the number of clusters9m
Motivating MapReduce8m
The general MapReduce abstraction5m
MapReduce execution overview and combiners6m
MapReduce for k-means7m
Other applications of clustering7m
A brief recap1m
2 readings
Slides presented in this module10m
Clustering text data with k-means10m
3 practice exercises
k-means18m
Clustering text data with K-means16m
MapReduce for k-means10m
4
3 hours to complete

Mixture Models

In k-means, observations are each hard-assigned to a single cluster, and these assignments are based just on the cluster centers, rather than also incorporating shape information. In our second module on clustering, you will perform probabilistic model-based clustering that (1) provides a more descriptive notion of a "cluster" and (2) accounts for uncertainty in assignments of datapoints to clusters via "soft assignments". You will explore and implement a broadly useful algorithm called expectation maximization (EM) for inferring these soft assignments, as well as the model parameters. To gain intuition, you will first consider a visually appealing image clustering task. You will then cluster Wikipedia articles, handling the high dimensionality of the tf-idf document representation considered...
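The E-step/M-step alternation can be sketched for a 1-D mixture of two Gaussians. This is a toy illustration, not the assignment code: the component means are initialized at the data extremes for simplicity, and the high-dimensional tf-idf issues discussed in the module are ignored.

```python
import numpy as np

def em_gmm_1d(x, n_iters=50):
    # Initialize two component means at the data extremes (a simple
    # deterministic choice), with unit variances and uniform weights.
    mu = np.array([x.min(), x.max()])
    var = np.ones(2)
    pi = np.full(2, 0.5)
    for _ in range(n_iters):
        # E-step: soft assignments (responsibilities) via Bayes' rule.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from soft counts.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return mu, var, pi, resp

# Toy data: two well-separated 1-D Gaussian components.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200)])
mu, var, pi, resp = em_gmm_1d(x)
```

Each row of `resp` sums to one: rather than a hard cluster label, every datapoint carries a probability of belonging to each component, which is exactly the "soft assignment" the module contrasts with k-means.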
15 videos (Total 91 min), 4 readings, 3 quizzes
15 videos
Aggregating over unknown classes in an image dataset6m
Univariate Gaussian distributions2m
Bivariate and multivariate Gaussians7m
Mixture of Gaussians6m
Interpreting the mixture of Gaussian terms5m
Scaling mixtures of Gaussians for document clustering5m
Computing soft assignments from known cluster parameters7m
(OPTIONAL) Responsibilities as Bayes' rule5m
Estimating cluster parameters from known cluster assignments6m
Estimating cluster parameters from soft assignments8m
EM iterates in equations and pictures6m
Convergence, initialization, and overfitting of EM9m
Relationship to k-means3m
A brief recap1m
4 readings
Slides presented in this module10m
(OPTIONAL) A worked-out example for EM10m
Implementing EM for Gaussian mixtures10m
Clustering text data with Gaussian mixtures10m
3 practice exercises
EM for Gaussian mixtures18m
Implementing EM for Gaussian mixtures12m
Clustering text data with Gaussian mixtures8m
4.6
284 reviews

35%

started a new career after completing this course

36%

got a tangible career benefit from this course

Top reviews

by BK, Aug 25th 2016

Excellent material! It would be nice, however, to mention some reading material, books or articles, for those interested in the details and the theories behind the concepts presented in the course.

by JM, Jan 17th 2017

Excellent course, well thought out lectures and problem sets. The programming assignments offer an appropriate amount of guidance that allows the students to work through the material on their own.

Instructors

Emily Fox

Amazon Professor of Machine Learning
Statistics

Carlos Guestrin

Amazon Professor of Machine Learning
Computer Science and Engineering

About the University of Washington

Founded in 1861, the University of Washington is one of the oldest state-supported institutions of higher education on the West Coast and is one of the preeminent research universities in the world....

About the Machine Learning Specialization

This Specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data....
Machine Learning

Frequently Asked Questions

  • Once you enroll in the course, you get access to all of the videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page, from which you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.