Analyze Text Data with Yellowbrick

4.4 stars
80 ratings
Offered by:
Coursera Project Network
4,755 already enrolled.
In this guided project, you will:

Use visual diagnostic tools from Yellowbrick to steer your machine learning workflow

Vectorize text data using TF-IDF

Cluster documents using embedding techniques and appropriate metrics
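
As a rough illustration of the TF-IDF objective above, here is a minimal pure-Python sketch. It assumes whitespace tokenization and a plain, unsmoothed IDF of log(N / df); the function name `tfidf` and the three toy documents are hypothetical and not taken from the course. scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its numbers will differ from these.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(tok for toks in tokenized for tok in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = []
        for term in vocab:
            tf = counts[term] / len(toks)      # relative term frequency
            idf = math.log(n_docs / df[term])  # unsmoothed IDF (an assumption)
            row.append(tf * idf)
        rows.append(row)
    return vocab, rows


# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cats and the dogs",
]
vocab, rows = tfidf(docs)
```

Because "the" occurs in every document, its IDF is log(1) = 0 and it contributes nothing to any row, while a term found in only one document, such as "cat", receives the largest IDF.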

2 hours
Intermediate
No download needed
Split-screen video
English
Desktop only

Welcome to this project-based course on Analyzing Text Data with Yellowbrick. Tasks such as assessing document similarity, topic modelling, and other text mining work depend on a notion of "closeness" or "similarity" between documents. In this course, we define several distance metrics (e.g. Euclidean, Hamming, cosine, Manhattan) and examine their merits and shortcomings as measures of document similarity. We apply these metrics to documents within a specific corpus and visualize the results. By the end of this course, you will be able to use visual diagnostic tools from Yellowbrick to steer your machine learning workflow, vectorize text data using TF-IDF, and cluster documents using embedding techniques and appropriate metrics.

This course runs on Rhyme, Coursera's hands-on project platform. On Rhyme, you work through projects in your browser, with instant access to pre-configured cloud desktops containing all of the software and data you need. For this project, the cloud desktop comes with Python, Jupyter, Yellowbrick, and scikit-learn pre-installed.

Notes:

  • You can access the cloud desktop 5 times. You can access the instruction videos as many times as you want.
  • This course works best for learners based in North America. We're currently working on providing the same experience in other regions.
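
To make the distance metrics named above concrete, the following pure-Python sketch computes them on small term-count vectors. The vectors and function names are hypothetical, chosen only to show one known trade-off: two documents with identical word proportions but different lengths are far apart under Euclidean or Manhattan distance, yet have a cosine distance of zero. Hamming distance is written here as the fraction of differing positions, which is one common convention.

```python
import math


def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u, v):
    return math.sqrt(squared_euclidean(u, v))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    # 1 - cosine similarity; undefined for all-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def hamming(u, v):
    # Fraction of positions whose values differ.
    return sum(a != b for a, b in zip(u, v)) / len(u)


# Hypothetical term-count vectors over a four-word vocabulary.
short_doc = [1, 2, 0, 0]
long_doc = [3, 6, 0, 0]   # same word proportions, three times longer
other_doc = [0, 0, 2, 1]  # shares no words with the other two

for name, metric in [("euclidean", euclidean), ("manhattan", manhattan),
                     ("cosine", cosine_distance), ("hamming", hamming)]:
    print(name, metric(short_doc, long_doc), metric(short_doc, other_doc))
```

Here cosine distance treats `short_doc` and `long_doc` as essentially identical, while the magnitude-sensitive metrics do not. Sensitivity to document length is one of the things to weigh when choosing a metric for text.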

Skills you will develop

  • Data Science
  • Natural Language Processing
  • Machine Learning
  • Python Programming
  • Data Visualization (DataViz)

Learn step by step

In a video that plays in a split screen with your work area, your instructor will walk you through these steps:

  1. Introduction and Loading the Corpus

  2. Vectorizing the Documents

  3. Clustering Similar Documents with Squared Euclidean Distance and Euclidean Distance

  4. Manhattan (aka “Taxicab” or “City Block”) Distance

  5. Bray-Curtis Dissimilarity and Canberra Distance

  6. Cosine Distance

  7. What Metrics Not to Use

  8. Omitting Class Labels - Using KMeans Clustering
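
Steps 5 and 8 above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the course's code: the function names, toy vectors, and the choice to seed k-means with the first k points are all hypothetical. Bray-Curtis dissimilarity and Canberra distance follow their standard definitions (SciPy offers equivalents in `scipy.spatial.distance`), and the k-means routine is a bare version of Lloyd's algorithm using squared Euclidean distance.

```python
def bray_curtis(u, v):
    # Sum of absolute differences divided by sum of absolute sums.
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den


def canberra(u, v):
    # Per-coordinate |a - b| / (|a| + |b|); 0/0 terms are skipped.
    total = 0.0
    for a, b in zip(u, v):
        den = abs(a) + abs(b)
        if den:
            total += abs(a - b) / den
    return total


def kmeans(points, k, n_iter=10):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical term-count vectors.
u = [1, 2, 0, 0]
v = [3, 6, 0, 0]
print(bray_curtis(u, v), canberra(u, v))

# Hypothetical 2-D document embeddings, with no class labels supplied.
points = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.8]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The cluster labels come entirely from the geometry of the vectors, which is the point of omitting class labels: the grouping you get depends on how distance is measured.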

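How Guided Projects work

Your workspace is a cloud desktop that loads right in your browser; no download is required.

In a split-screen video, your instructor guides you through the project step by step.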
