Reinforcement Learning Specialization
Master the concepts of Reinforcement Learning. Implement a complete RL solution and understand how to apply AI tools to solve real-world problems.
About this Specialization
What you will learn
Build a Reinforcement Learning system for sequential decision making.
Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradients, Dyna, and more).
Understand how to formalize your task as a Reinforcement Learning problem, and how to begin implementing a solution.
Understand how RL fits under the broader umbrella of machine learning, and how it complements deep learning, supervised and unsupervised learning.
Skills you will gain
100% online courses
Flexible schedule
Intermediate level
Probabilities & Expectations, basic linear algebra, basic calculus, Python 3.0 (at least 1 year), implementing algorithms from pseudocode
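Approximately 2 months to complete
English
How the Specialization works
Take courses
A Coursera Specialization is a series of courses that helps you master a skill. To begin, enroll in the Specialization directly, or review its courses and choose the one you would like to start with. When you enroll in a course that is part of a Specialization, you are automatically enrolled in the full Specialization. You may complete just one course, and you can pause your learning or end your subscription at any time. Visit your learner dashboard to track your course enrollments and your progress.
Hands-on project
Every Specialization includes a hands-on project. You must successfully finish the project to complete the Specialization and earn your certificate. If the Specialization includes a separate course for the hands-on project, you must finish all of the other courses before you can start it.
Earn a certificate
When you finish every course and complete the hands-on project, you will earn a certificate that you can share with prospective employers and your professional network.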

There are 4 courses in this Specialization.
Fundamentals of Reinforcement Learning
Reinforcement Learning is a subfield of Machine Learning, but it is also a general-purpose formalism for automated decision-making and AI. This course introduces you to statistical learning techniques where an agent explicitly takes actions and interacts with the world. Understanding the promise and challenges of learning agents that make decisions is vital today, with more and more companies interested in interactive agents and intelligent decision-making. This course introduces you to the fundamentals of Reinforcement Learning.
When you finish this course, you will:
- Formalize problems as Markov Decision Processes
- Understand basic exploration methods and the exploration/exploitation tradeoff
- Understand value functions as a general-purpose tool for optimal decision-making
- Know how to implement dynamic programming as an efficient solution approach to an industrial control problem
This course teaches you the key concepts of Reinforcement Learning that underlie classic and modern algorithms in RL. After completing this course, you will be able to start using RL for real problems where you have, or can specify, the MDP. This is the first course of the Reinforcement Learning Specialization.
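To give a feel for what "value functions" and "dynamic programming" mean when the MDP is fully specified, here is a minimal sketch of value iteration. It is not course material: the two-state MDP, its rewards, the discount factor, and the function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical 2-state, 2-action MDP (illustrative only, not from the course).
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def action_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma, tol=1e-8):
    """Sweep the Bellman optimality backup until values change by less than tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(action_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(P, V, gamma):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {s: max(P[s], key=lambda a: action_value(P, V, s, a, gamma)) for s in P}


V = value_iteration(P, GAMMA)
policy = greedy_policy(P, V, GAMMA)
```

In this toy problem the greedy policy moves to state 1 and stays there, because the repeated reward of 2 outweighs anything available in state 0.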
Sample-based Learning Methods
In this course, you will learn about several algorithms that can learn near-optimal policies based on trial-and-error interaction with the environment, that is, learning from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal difference learning methods including Q-learning. We will wrap up this course investigating how we can get the best of both worlds: algorithms that can combine model-based planning (similar to dynamic programming) and temporal difference updates to radically accelerate learning.
By the end of this course you will be able to:
- Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
- Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
- Understand the connections between Monte Carlo, Dynamic Programming, and TD
- Implement and apply the TD algorithm for estimating value functions
- Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
- Understand the difference between on-policy and off-policy control
- Understand planning with simulated experience (as opposed to classic planning strategies)
- Implement a model-based approach to RL, called Dyna, which uses simulated experience
- Conduct an empirical study to see the improvements in sample efficiency when using Dyna
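The difference between Q-learning and Expected Sarsa comes down to the target each one bootstraps from. The sketch below is an illustration under stated assumptions, not the course's implementation: the function names are hypothetical, the action values are a plain list, and the policy is assumed to be epsilon-greedy.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy over q_values."""
    n = len(q_values)
    probs = [epsilon / n] * n
    best = max(range(n), key=lambda a: q_values[a])
    probs[best] += 1.0 - epsilon
    return probs


def q_learning_target(reward, next_q, gamma):
    """Off-policy target: bootstrap from the greedy (max) next action value."""
    return reward + gamma * max(next_q)


def expected_sarsa_target(reward, next_q, gamma, epsilon):
    """Target that averages next action values under the epsilon-greedy policy."""
    probs = epsilon_greedy_probs(next_q, epsilon)
    return reward + gamma * sum(p * q for p, q in zip(probs, next_q))


def td_update(q_sa, target, alpha):
    """Move the current estimate a fraction alpha toward the target."""
    return q_sa + alpha * (target - q_sa)
```

With `epsilon` set to 0 the two targets coincide, since a purely greedy policy puts all of its probability on the maximizing action.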
Prediction and Control with Function Approximation
In this course, you will learn how to solve problems with large, high-dimensional, and potentially infinite state spaces. You will see that estimating value functions can be cast as a supervised learning problem (function approximation), allowing you to build agents that carefully balance generalization and discrimination in order to maximize reward. We will begin this journey by investigating how our policy evaluation or prediction methods like Monte Carlo and TD can be extended to the function approximation setting. You will learn about feature construction techniques for RL, and representation learning via neural networks and backprop. We conclude this course with a deep dive into policy gradient methods, a way to learn policies directly without learning a value function. In this course you will solve two continuous-state control tasks and investigate the benefits of policy gradient methods in a continuous-action environment.
Prerequisites: This course strongly builds on the fundamentals of Courses 1 and 2, and learners should have completed these before starting this course. Learners should also be comfortable with probabilities & expectations, basic linear algebra, basic calculus, Python 3.0 (at least 1 year), and implementing algorithms from pseudocode.
By the end of this course, you will be able to:
- Understand how to use supervised learning approaches to approximate value functions
- Understand objectives for prediction (value estimation) under function approximation
- Implement TD with function approximation (state aggregation) on an environment with an infinite state space (continuous state space)
- Understand fixed basis and neural network approaches to feature construction
- Implement TD with neural network function approximation in a continuous state environment
- Understand new difficulties in exploration when moving to function approximation
- Contrast discounted problem formulations for control versus an average reward problem formulation
- Implement Expected Sarsa and Q-learning with function approximation on a continuous state control task
- Understand objectives for directly estimating policies (policy gradient objectives)
- Implement a policy gradient method (called Actor-Critic) on a discrete state environment
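As a rough picture of what "TD with function approximation (state aggregation)" can look like, the sketch below groups a one-dimensional continuous state into equal-width bins and keeps one weight per bin. It is a simplified illustration, not the course's assignment code: the helper names, the one-dimensional state, and the bin layout are all assumptions.

```python
def make_aggregator(low, high, num_bins):
    """Return a function mapping a continuous state in [low, high] to a bin index."""
    width = (high - low) / num_bins

    def bin_index(state):
        idx = int((state - low) / width)
        # Clip so out-of-range states and the upper edge fall into a valid bin.
        return min(max(idx, 0), num_bins - 1)

    return bin_index


def semi_gradient_td0_update(weights, bin_index, s, r, s_next, done, alpha, gamma):
    """One semi-gradient TD(0) step for a state-aggregated value estimate.

    With state aggregation the feature vector is one-hot, so the gradient of
    the estimate touches only the weight of the bin that contains s.
    """
    i = bin_index(s)
    v_next = 0.0 if done else weights[bin_index(s_next)]
    td_error = r + gamma * v_next - weights[i]
    weights[i] += alpha * td_error
    return td_error
```

All states that fall in the same bin share one value estimate, which is the generalization/discrimination tradeoff in its simplest form: fewer bins generalize more, more bins discriminate more.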
A Complete Reinforcement Learning System (Capstone)
In this final course, you will put together your knowledge from Courses 1, 2 and 3 to implement a complete RL solution to a problem. This capstone will let you see how each component (problem formulation, algorithm selection, parameter selection and representation design) fits together into a complete solution, and how to make appropriate choices when deploying RL in the real world. This project will require you to implement both the environment to simulate your problem, and a control agent with Neural Network function approximation. In addition, you will conduct a scientific study of your learning system to develop your ability to assess the robustness of RL agents. To use RL in the real world, it is critical to (a) appropriately formalize the problem as an MDP, (b) select appropriate algorithms, (c) identify what choices in your implementation will have large impacts on performance and (d) validate the expected behaviour of your algorithms. This capstone is valuable for anyone who is planning on using RL to solve real problems. To be successful in this course, you will need to have completed Courses 1, 2, and 3 of this Specialization or the equivalent. By the end of this course, you will be able to complete an RL solution to a problem, from problem formulation through appropriate algorithm selection and implementation to an empirical study of the effectiveness of the solution.
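About University of Alberta
About Alberta Machine Intelligence Institute
Frequently Asked Questions
What is the refund policy?
Can I enroll in just one course?
Yes! To get started, click the course card that interests you and enroll. You can enroll and complete the course to earn a shareable certificate, or you can audit it to view the course materials for free. When you enroll in a course that is part of a Specialization, you are enrolled in the full Specialization. You can track your progress from your learner dashboard.
Is financial aid available?
Can I take the course for free?
Is this course really 100% online? Do I need to attend any classes in person?
This course is 100% online, so you do not need to attend a classroom in person. You can access your lectures, readings, and assignments anytime and anywhere via the web or your mobile device.
How long does it take to complete the Specialization?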
It is recommended that learners take 4-6 months to complete the Specialization.
What background knowledge is necessary?
It is recommended that learners have at least one year of undergraduate computer science or 2-3 years of professional experience in software development. Experience and comfort with programming in Python are required, and learners must be comfortable converting algorithms and pseudocode into Python. A basic understanding of concepts from statistics (distributions, sampling, expected values), linear algebra (vectors and matrices), and calculus (computing derivatives) is also expected.
Do I need to take the courses in a specific order?
Yes, it is recommended that courses are taken sequentially.
Will I earn university credit for completing the Specialization?
Learners who complete the Specialization will earn a Coursera Specialization certificate signed by the professors of record, not University of Alberta credit.
What will I be able to do upon completing the Specialization?
By the end of this Specialization, you will be able to:
- Build a Reinforcement Learning system for sequential decision making.
- Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradients, Dyna, and more).
- Understand how to formalize your task as a Reinforcement Learning problem, and how to begin implementing a solution.
- Understand how RL fits under the broader umbrella of machine learning, and how it complements deep learning, supervised and unsupervised learning.
More questions? Visit the Learner Help Center.






