
Learner Reviews and Feedback for Getting and Cleaning Data by Johns Hopkins University

4.6
stars
7,430 ratings
1,189 reviews

About the Course

Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data....

Top reviews

HS

May 03, 2020

This course provides an introduction to some important concepts and tools for a very important aspect of data science: cleaning and organizing data before any analysis. A must for any data scientist.

BE

Oct 26, 2016

This course is really challenging and compulsory for anyone who wants to be a data scientist or work with any sort of data. It teaches you how to make a very palatable data set from messy data.

Reviews 951-975 of 1,149 for Getting and Cleaning Data

By Greg B G

Oct 03, 2016

very good

By Khobindra N C

May 18, 2016

Excellent

By Luiz C

Sep 16, 2017

Useful

By Nithya M

Jun 23, 2017

Great!

By Shreya S

Mar 20, 2017

great

By Veena M

Feb 21, 2016

good.

By Chuang M

Feb 07, 2016

Great

By Marcos A

May 12, 2019

Good

By Nithin K G

Nov 16, 2018

Good

By Ashish S

Feb 06, 2017

good

By shipra g

Aug 08, 2016

Good

By Vivekananda R C

Feb 08, 2016

Good

By Dimitri d

Feb 23, 2017

.

By Sergio R

Nov 27, 2016

4

By sugyoo

Oct 23, 2016

A

By Borja C

May 07, 2016

.

By Raw N

Apr 07, 2017

Would have preferred if there were programming assignments that incorporated reading from data sources on the web.

For those planning to take the course, note the following:

The course covers reading data from a myriad of sources, but largely in passing and in superficial detail. These sources include XML files, MySQL databases, HDF5 files, CSV files, txt files in various formats (for example, fixed-width files), JSON objects, and web APIs.

However, the course project only involves reading data from several txt files and combining them into a single R dataset.

Course topic order: In the first two weeks of the course, a lot of information is glossed over in passing; this information involves reading from the various file formats mentioned above. Week 3 involves subsetting, sorting, reshaping and merging data. Some of this may be review for you if you've taken the R programming course or the "R Programming Environment" course in the "Mastering Software Development in R" specialization. Week 4 involves string manipulation, regular expressions and working with dates. A lot of this is covered in Roger Peng's ebooks "R Programming for Data Science" and "Mastering Software Development in R" (both are freely available; google them).

Assessments: The only assessments in the course are 4 quizzes, each of which involves about 5 short programming exercises, and a final project which only involves topics from weeks 3 and 4 (specifically, subsetting data, sorting data, reshaping data, and working with regular expressions). So you can do the course project without understanding anything covered in weeks 1 and 2 of the course.

Mentor David Hood is fantastic for providing valuable resources to aid you with each assessment and so is Xing Su for providing a complete set of course notes. USE THE DISCUSSION FORUMS IF YOU GET STUCK!

By Miguel C

Mar 25, 2020

I really enjoyed and learned a lot in this course.

I feel a lot more comfortable with looking for and reading data. I learned how to clean data and get it ready for further analysis. I think the course project was particularly good for completely understanding the process of tidying data and all the aspects it involves, such as writing a code book and a README file to accompany it.

Furthermore, I believe I further developed my R programming skills, by learning how to code new things or things I already knew but in a more efficient way, by using new packages and techniques.

Moreover, I found Professor Jeffrey Leek quite engaging, very easy to understand and I had complete confidence in his knowledge on this subject.

However, I believe the course is slightly outdated. I was often disheartened and frustrated by not being able to replicate what was being done in the lecture videos. For example, there were many links that did not work anymore and sometimes information that simply wasn't correct anymore. I found the discussion forums and many mentors' responses to be very helpful. I think this can easily be fixed by writing up an errata list or updating the lecture videos.

By Tomasz J

Dec 11, 2017

The course teaches you some principles of tidy data and cleaning data, but it's very messy.

There is no systematic approach to the plyr and dplyr libraries. The teachers pick some functions from one library and some from the other, but without any clear principle. It looks like Prof. Leek and Prof. Peng are presenting their favorite functions without consulting each other. What they do is correct, though very confusing. On the other hand, loading data is approached very encyclopedically.

Assignments not only check what was taught in the videos but also sometimes require new skills and going through Stack Overflow etc. (e.g. code to read Fortran files). This is not how you construct good courses.

Additionally, some instructions for the final assignment are provided in the submission part! (Expected names of the files should be provided in the assignment description, not on the submission page.)

Prof. Peng and Prof. Leek are very skilled and know their job, but they don't know how to teach efficiently. Nonetheless, if you are motivated to learn, this course may be very helpful.

By Guillermo A G

Oct 23, 2017

This course was quite challenging in comparison with the first two. I felt that the material provided by the instructors was not enough to approach the quizzes and assignments, so it's necessary to spend a lot of time researching on your own in other sources. I struggled with the Course Project Assignment because I didn't understand what I was supposed to do exactly. Fortunately, the forum threads were really helpful. Nevertheless, the course's intention is very valuable, and if you are patient and go all the way through it you will improve your data science skills and learn very useful techniques and habits, especially if you're a beginner. But I strongly suggest that the instructors make the course contents more explicit and helpful.

By John Y

Aug 05, 2017

Great class for an important piece of data analytics and data science. One issue I've been noticing with R compared to using Anaconda/Python is that a lot of the libraries required for the class aren't explicitly mentioned. That's fine if you're experienced with these environments and able to read error messages with familiarity. It's a minor annoyance to me when I run a script and realize I don't have a library installed.

I'd imagine, though, that it's extremely frustrating for beginners who may have written perfectly good code but haven't figured out that they simply need to install certain packages to answer quiz or homework questions. Perhaps having a full library or package list in Course 1 of this series would be helpful.

By Joshua S

Jul 10, 2020

Lecture content is very drab and filled with "...and here's another thing you can do...". I think it would be a lot more effective with more problems and various solutions. There should be a project every week along with the quizzes.

I also found the peer review process for the course project to be sub-par. I personally put a LOT of work into the course project, and put together what was a really thoughtful and well-written README and CODEBOOK just to have my project downgraded because someone wasn't sure the run_analysis did what it was supposed to. It's not a big deal but I think if an instructor saw it they would've found it very thoughtful, thorough and accurate.

By Anton K

Aug 08, 2019

This is a very brief course, and many of the topics deserve a much more thorough explanation. This part of data analysis (i.e. data cleaning and acquisition) is in fact a complex subject, and parts of it are not covered in this course. There were also technical issues. For instance, the audio quality of the lectures by Prof. Jeff Leek is very poor. The other major problem that I had with this course is the ambiguity of the requirements, although it wasn't difficult to finish. But if you are planning on taking this course, be ready to spend a considerable amount of time on understanding the structure of the final submission's materials.

By Harris W

May 27, 2020

Like all courses in this specialization, there is an incredible lack of practice and application for a large amount of the skills taught in lecture. I would say that only about 60 percent, or maybe less, of the content in the lecture is assessed in the quizzes and assignments. Additionally, the peer review process is wildly flawed for the final project. I did not receive any constructive criticism from anyone, and I doubt they even truly looked at my code to make sure it worked. I don't blame them, they have little incentive. Rather I place blame on the system of grading and lack of feedback.

By Rashaad J

Jul 24, 2017

I have 2 key concerns with this course. First, I don't feel like the material presented adequately prepares you for the quizzes. For at least 2 of the 4 quizzes, I had to spend a substantial amount of time locating and reviewing other resources to answer the questions. My second concern is that for the final course assignment, there is a lack of specificity with the instructions. Not being able to answer a question is vastly different from not understanding what the question is asking and I found myself spending more time doing the latter (which is wasteful) and less time with the former.