About this Course
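10,267 recent views

Course 4 of 5 in the following Specialization:

100% online

Start right away and learn on your own schedule.

Flexible deadlines

Reset deadlines according to your schedule.

Approx. 12 hours to complete

Suggested: 3 hours/week...

English

Subtitles: English

Skills you will gain

Data Analysis, Python Programming, Machine Learning, Exploratory Data Analysis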

Syllabus - What you will learn from this course

1
5 hours to complete

Decision Trees

In this session, you will learn about decision trees, a type of data mining algorithm that can select, from among a large number of variables, those variables and interactions that are most important in predicting the target (response) variable. Decision trees create segmentations, or subgroups, in the data by applying a series of simple rules or criteria over and over again, each chosen to identify the combination of variables that best predicts the target variable.
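The rule-selection step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the course's SAS or scikit-learn code: it assumes Gini impurity as the splitting criterion, and the toy data, the column meanings, and the function names `gini` and `best_split` are hypothetical. It searches every variable and every cut point for the single rule that produces the purest pair of subgroups; a full tree would repeat the same search inside each subgroup.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) for the rule
    'feature <= threshold' that gives the purest pair of subgroups."""
    n = len(rows)
    best = (None, None, gini(labels))  # start from the unsplit impurity
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # a rule must actually divide the data
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, threshold, score)
    return best


# Hypothetical toy data: columns are (age, weekly hours of exercise).
rows = [(15, 1), (16, 5), (17, 2), (18, 6), (15, 7), (18, 0)]
labels = [0, 1, 0, 1, 1, 0]

feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)
```

On this toy data the search picks the second column with a cut at 2, because that rule separates the two classes completely while no cut on the first column does.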

7 videos (Total 40 min), 15 readings, 1 quiz
7 videos
Machine Learning and the Bias Variance Trade-Off (6 min)
What Is a Decision Tree? (5 min)
What is the Process of Growing a Decision Tree? (4 min)
Building a Decision Tree with SAS (9 min)
Strengths and Weaknesses of Decision Trees in SAS (4 min)
Building a Decision Tree with Python (9 min)
15 readings
Some Guidance for Learners New to the Specialization (10 min)
SAS or Python - Which to Choose? (10 min)
Getting Started with SAS (10 min)
Getting Started with Python (10 min)
Course Codebooks (10 min)
Course Data Sets (10 min)
Uploading Your Own Data to SAS (10 min)
Data Set for Decision Tree Videos (tree_addhealth.csv) (10 min)
SAS Code: Decision Trees (10 min)
CART Paper - Prevention Science (10 min)
Python Code: Decision Trees (10 min)
Installing Graphviz and pydotplus (10 min)
Getting Set up for Assignments (10 min)
Tumblr Instructions (10 min)
Assignment Example (10 min)
2
3 hours to complete

Random Forests

In this session, you will learn about random forests, a type of data mining algorithm that can select, from among a large number of variables, those that are most important in determining the target (response) variable. Unlike single decision trees, random forests tend to generalize well to new data.

4 videos (Total 25 min), 4 readings, 1 quiz
4 videos
Building a Random Forest with SAS (7 min)
Building a Random Forest with Python (6 min)
Validation and Cross-Validation (7 min)
4 readings
SAS code: Random Forests (10 min)
The HPForest Procedure in SAS (10 min)
Python Code: Random Forests (10 min)
Assignment Example (10 min)
3
3 hours to complete

Lasso Regression

Lasso regression analysis is a shrinkage and variable selection method for linear regression models. The goal of lasso regression is to obtain the subset of predictors that minimizes prediction error for a quantitative response variable. The lasso does this by imposing a constraint on the model parameters that causes the regression coefficients for some variables to shrink toward zero. Variables with a regression coefficient equal to zero after the shrinkage process are excluded from the model. Variables with non-zero regression coefficients are most strongly associated with the response variable. Explanatory variables can be quantitative, categorical, or both.

In this session, you will apply and interpret a lasso regression analysis. You will also develop experience using k-fold cross-validation to select the best-fitting model and obtain a more accurate estimate of your model’s test error rate.

To test a lasso regression model, you will need to identify a quantitative response variable from your data set if you haven’t already done so, and choose a few additional quantitative and categorical predictor (i.e., explanatory) variables to develop a larger pool of predictors. Having a larger pool of predictors to test will maximize your experience with lasso regression analysis. Remember that lasso regression is a machine learning method, so your choice of additional predictors does not necessarily need to depend on a research hypothesis or theory. Take some chances, and try some new variables. The lasso regression analysis will help you determine which of your predictors are most important.

Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets. The cross-validation method you apply is designed to eliminate the need to split your data when you have a limited number of observations.
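To make the shrinkage idea concrete, here is a small sketch of lasso fitting by coordinate descent with soft-thresholding, written in plain Python. It is an illustration under stated assumptions, not the procedure used in the course videos: the penalized objective is assumed to be (1/2n)·RSS + alpha·Σ|b_j|, the predictors and response are assumed to be centered already (so no intercept is fitted), and the data and the names `soft_threshold` and `lasso_coordinate_descent` are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n)) * sum of squared residuals + alpha * sum(|coef|).
    Assumes centered columns and response, so no intercept is estimated."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # covariance of predictor j with the partial residual
            norm = 0.0  # sum of squares of predictor j
            for i in range(n):
                others = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            if norm == 0.0:
                continue  # a constant-zero column carries no information
            coef[j] = soft_threshold(rho / n, alpha) / (norm / n)
    return coef


# Hypothetical centered data: y depends strongly on column 0, weakly on column 1.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-5.8, -3.2, 0.0, 2.8, 6.2]

unpenalized = lasso_coordinate_descent(X, y, alpha=0.0)
penalized = lasso_coordinate_descent(X, y, alpha=0.5)
print(unpenalized, penalized)
```

With no penalty the two coefficients come out near 3 and 0.2. With alpha set to 0.5, the first shrinks to about 2.75 and the second becomes exactly zero, which is the variable-selection behavior described above. In practice the penalty value would be chosen by k-fold cross-validation rather than fixed by hand.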

5 videos (Total 32 min), 3 readings, 1 quiz
5 videos
Testing a Lasso Regression with SAS (10 min)
Data Management for Lasso Regression in Python (3 min)
Testing a Lasso Regression Model in Python (10 min)
Lasso Regression Limitations (2 min)
3 readings
SAS Code: Lasso Regression (10 min)
Python Code: Lasso Regression (10 min)
Assignment Example (10 min)
4
3 hours to complete

K-Means Cluster Analysis

Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based on their similarity of responses on multiple variables. Clustering variables should be primarily quantitative variables, but binary variables may also be included.

In this session, we will show you how to use k-means cluster analysis to identify clusters of observations in your data set. You will gain experience in interpreting cluster analysis results by using graphing methods to help you determine the number of clusters to interpret, and examining clustering variable means to evaluate the cluster profiles. Finally, you will get the opportunity to validate your cluster solution by examining differences between clusters on a variable not included in your cluster analysis.

You can use the same variables that you have used in past weeks as clustering variables. If most or all of your previous explanatory variables are categorical, you should identify some additional quantitative clustering variables from your data set. Ideally, most of your clustering variables will be quantitative, although you may also include some binary variables. In addition, you will need to identify a quantitative or binary response variable from your data set that you will not include in your cluster analysis. You will use this variable to validate your clusters by evaluating whether your clusters differ significantly on this response variable using statistical methods, such as analysis of variance or chi-square analysis, which you learned about in Course 2 of the specialization (Data Analysis Tools).

Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets.
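The partitioning idea behind k-means can be sketched as a short assign-and-update loop. The version below is a plain-Python illustration, not the scikit-learn or SAS code used in the course: it assumes Euclidean distance, starts from the first k observations as centroids (a simplification chosen to keep the result reproducible), and uses hypothetical toy data and function names. It requires Python 3.8 or later for `math.dist`.

```python
import math


def assign(points, centroids):
    """Give each point the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
        for point in points
    ]


def k_means(points, k, n_iter=20):
    """Alternate assignment and centroid updates until labels stop changing."""
    centroids = [list(point) for point in points[:k]]  # simplistic fixed start
    labels = []
    for _ in range(n_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # converged: no observation changed cluster
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


# Hypothetical observations measured on two quantitative clustering variables.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]

labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

Each observation ends up in exactly one cluster, and the final centroids are the clustering variable means that would be examined when profiling the clusters. Real analyses usually standardize the variables first and try several random starts, which this sketch omits.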

6 videos (Total 42 min), 3 readings, 1 quiz
6 videos
Running a k-Means Cluster Analysis in SAS, pt. 1 (8 min)
Running a k-Means Cluster Analysis in SAS, pt. 2 (6 min)
Running a k-Means Cluster Analysis in Python, pt. 1 (8 min)
Running a k-Means Cluster Analysis in Python, pt. 2 (10 min)
k-Means Cluster Analysis Limitations (2 min)
3 readings
SAS Code: k-Means Cluster Analysis (10 min)
Python Code: k-Means Cluster Analysis (10 min)
Assignment Example (10 min)
4.2
49 reviews

33%

started a new career after completing this course

22%

got a tangible career benefit from this course

Top reviews from Machine Learning for Data Analysis

By MG, Jan 16th 2019

A good introduction to Machine Learning. Makes me curious to know about the methods that are available outside of this course. Great material as usual.

By BC, Oct 5th 2016

Very good course. I recommend to anyone who's interested in data analysis and machine learning.

Instructors

Jen Rose

Research Professor
Psychology

Lisa Dierker

Professor
Psychology

About Wesleyan University

At Wesleyan, distinguished scholar-teachers work closely with students, taking advantage of fluidity among disciplines to explore the world with a variety of tools. The university seeks to build a diverse, energetic community of students, faculty, and staff who think critically and creatively and who value independence of mind and generosity of spirit. ...

About the Data Analysis and Interpretation Specialization

Learn SAS or Python programming, expand your knowledge of analytical methods and applications, and conduct original research to inform complex decisions. The Data Analysis and Interpretation Specialization takes you from data novice to data expert in just four project-based courses. You will apply basic data science tools, including data management and visualization, modeling, and machine learning using your choice of either SAS or Python, including pandas and Scikit-learn. Throughout the Specialization, you will analyze a research question of your choice and summarize your insights. In the Capstone Project, you will use real data to address an important issue in society, and report your findings in a professional-quality report. You will have the opportunity to work with our industry partners, DRIVENDATA and The Connection. Help DRIVENDATA solve some of the world's biggest social challenges by joining one of their competitions, or help The Connection better understand recidivism risk for people on parole in substance use treatment. Regular feedback from peers will provide you a chance to reshape your question. This Specialization is designed to help you whether you are considering a career in data, work in a context where supervisors are looking to you for data insights, or you just have some burning questions you want to explore. No prior experience is required. By the end you will have mastered statistical methods to conduct original research to inform complex decisions....

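Frequently Asked Questions

  • Once you enroll in the course, you get immediate access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once the session has begun. If you explore the course without purchasing it, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all courses in the Specialization, and you earn a certificate when you complete the work. Your electronic certificate will be added to your Accomplishments page, where you can print it or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.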