This course introduces some important concepts and tools for a very important aspect of data science: cleaning and organizing data before any analysis. A must for any data scientist.
Easy, mostly instructive course. The assignments and quizzes are quite good and illustrate the lessons very well. Watch the videos for a general presentation, but spend your energy on the exercises.
By Narin P
The course is very helpful when it comes to exploring commonly used R packages and learning certain best practices involved in data cleaning. I'd definitely recommend it to any data science enthusiast. One area with slight scope for improvement could be the final project. The instructions are quite open to interpretation, which means that the final grade you get via peer review is always going to be debatable. Other than that, I have no complaints whatsoever :)
By Bantwale D E
This course is really challenging, and essential for anyone who wants to be a data scientist or work with any sort of data. It teaches you how to make a very palatable dataset from messy data.
By Anton
Great course. It requires a little bit of programming background, though nothing rigidly specific.
By Alessandro V
I found this course very useful for my learning needs; nevertheless, I have one remark. The time estimates provided for each section are quite inaccurate. For instance, 3h for a swirl exercise is really excessive; 45 minutes may be more realistic. But the main problem is underestimation! For the final assignment in particular, I spent more than 20h, and part of that time went to convincing myself that a negative standard deviation was acceptable for the assignment's goals. The provided estimate is instead 2h (<< 20h!!).
By Pamela M
I would have given just one star except the swirl() assignments are actually very good. The videos are just a (poorly) narrated glossary. Topics I learned in another course were presented here in such a way I actually got confused. Can you imagine? My knowledge was actually worsened, not improved, by this course. (!!) // If the swirl() functions were made the centerpiece of the course, and the videos were described as just a narrated glossary, at least our expectations would be in line with reality. // Even so, I come to Coursera because I WANT to be taught by an instructor. If I'd wanted a curated list of tutorials so I could teach myself, I would have done that already. Anyone who pays for this should get their money back. NOT recommended for beginners. // I'm going to complete it because I'm stubborn that way, but it is an unpleasant experience for me and everyone within earshot as I have to vent my frustration often just to make it through. // After week 2 I resorted to just reading the pdf of the slides and stopped watching the videos. The videos added NOTHING to my understanding. More often than not they put me to sleep. And what's worse, the narrator mispronounces "attribute". There IS a difference. I atTRIbute certain ATtributes to native speakers who mispronounce important vocabulary.
By Liam C
Week 1 and 2 are completely worthless. They're cursory 5-10m introductions to topics that show you HOW to start to do something, but don't explain any commands or what is going on, it's just instructions to follow. This leaves you completely unprepared to do any actual work. Then you get the assignments and you basically have to go learn everything independently. The course info is useless. I skipped these. When I want to do the type of work they cover, I'll watch some tutorials and read documentation to actually learn it. They need to focus in on one or two topics (e.g. APIs, MySQL) and actually teach you the basics of them. The lecture videos even use weird syntax without explanation (e.g. using = instead of <-. Using par(), etc.).
Like the other courses in this specialization, you'll spend almost all of your time learning independently, and not using any of the materials provided. The discussion board is sometimes useful, but you can see how little work is done to improve the course there, as people point out errors and issues which are still outstanding months/years later.
By Md. Z M
Pros: After putting in many hours of effort in understanding the problem statement and then actually solving it, the sense of achievement is fulfilling. I learnt a lot of skills in this course. Those skills are very important for understanding the data before starting the analyses, but are usually ignored when data science is taught to a beginner.
Cons: The course project is extraordinarily difficult and you won't get any help from the discussion forums as there are no TAs live. However, there are some threads that can help you understand the problem statement. So, sift through the thread dump to find the topics relevant to you.
The quality of the video lectures is very bad; many of the packages referenced in the lectures are outdated and require you to search for their alternatives on your own, which is helpful in the long run, but demands many hours of googling and reading through the documentation.
Overall, I would recommend this course for understanding the skills required in data cleaning.
By Neil J
R is really just the worst, and the instructors do not make it better. The code in this class is unreadable:
- too many one liners, because "it's faster to write", though harder for other people to read
- variables are named cryptic things like spIns or x, rather than names with meaning (eg, sprays.by.insect), again "because it's faster to type"
- way too many cases of "there is more than one way to do it", which just makes things confusing because the other ways tend not to be equivalent
What I'm most concerned about is that I've seen lots of poorly written code in many different languages: Java, C++, C, Python, Perl, and now R. But I've also seen really well-written code in all the languages *but* R; I have yet to see any code in R that is flexible, maintainable, and clear. Which leads me to think that no such code exists, or it's so rare that it doesn't matter. It is clear to me that if I am to do data analysis, then I will need a different set of tools; but because this specialization is taught entirely around R (the lectures are about R, not about higher-level concepts), this specialization is not useful to me.
By jake s
There is a lot of fluff in this course and at the same time it assumes that you have knowledge and skills that are not covered in this course or in the previous two (e.g. github). I'm really disappointed in the quality of this course--specifically at how vague many of the instructions were in the quiz questions and the final project-- and that most of the time when explanations were asked for on the message board the professors just did some hand waving and said that figuring it out was part of the assignment. That isn't teaching (online or otherwise). And if your instructions aren't clear, you aren't doing the job of an instructor when you pass the buck and try to sell it as "part of the learning experience." I hope this fall off in quality isn't reflective of the rest of the courses in the data spec.
By Lindsay E M
The first two courses in this specialization were good, but the third course, Getting and Cleaning Data, was honestly very disappointing. The lectures are extremely out of date (made in 2013, and it's already June 2020...), and a lot of the code in the lectures and examples no longer works correctly because of this. Beyond that, the "updates" posted by the mentors in the discussion forums are also out of date (2016) and have limited usefulness. This is a course that is meant to teach you how to acquire and clean data in the R program, and methods and technology from 7 years ago are not the standard that I expected - technology constantly changes and updates, and this course should reflect that (but clearly doesn't).
By Christian B
No idea what they want for the project, and the discussion forum is clogged with people asking for peer reviews. The previous courses at least provided you with an understanding of what the final product should be; in this case it's "make tidy data", but with no idea of how that data should look.
By Ramalakshmanan S P
Thanks for this wonderful session on Getting and Cleaning Data. I would like to convey my sincere thanks to Professors Roger D. Peng, Brian D. Caffo and Jeff Leek and my fellow learners for their excellent help in completing the project to generate a tidy dataset. I would like to name Mr. Luis Sandino for his help and effort in putting together a help guide for this assignment. I followed it and got the assignment completed. The step-by-step procedure helped me and other fellow learners to complete the assignment on time.
Though this course is over, we still have a doubt about the dimensions of the tidy dataset: whether it is 180 by 68 or 180 by 88, since the total number of "mean" variables considered varies. We request mentors or TAs to help us arrive at the correct dimensions and understand the reason behind them.
This course has shown the need for support from TAs and mentors. Their help and support was very valuable in understanding the subject.
Thanks to Coursera, my Professors, mentors and TAs of this course for their insight, guidance, support and effort.
Wishing Coursera and the Professors all the best and success.
The SWIRL component for learning the subject is the best, and I wish there were SWIRL support for all the heavy courses. Special thanks to those who made the SWIRL course material possible for The Data Scientist's Toolbox.
With Best Wishes,
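The 180 by 68 versus 180 by 88 question raised in the review above plausibly comes down to how the "mean" variables are matched by name. Here is a minimal sketch, written in Python for illustration even though the course itself uses R. The feature names below are a hypothetical sample in the style of the project's variable names, not the real feature file, and the two patterns are assumptions about how learners might read the instructions:

```python
import re

# Hypothetical feature names in the style of the course project's
# feature list. This is an illustrative sample, not the real file.
features = [
    "tBodyAcc-mean()-X",
    "tBodyAcc-std()-X",
    "tBodyAcc-mad()-X",
    "fBodyAcc-meanFreq()-X",
    "angle(tBodyAccMean,gravity)",
    "fBodyGyro-std()-Z",
]

# Strict reading: keep only names containing the literal "mean()" or "std()".
strict = [f for f in features if re.search(r"(mean|std)\(\)", f)]

# Loose reading: keep any name containing "mean" or "std", ignoring case.
loose = [f for f in features if re.search(r"mean|std", f, re.IGNORECASE)]

print(len(strict), "columns kept under the strict pattern")
print(len(loose), "columns kept under the loose pattern")
```

In this sample the loose pattern also keeps the `meanFreq()` and `angle(...Mean...)` style names, which the strict pattern drops. A difference of that kind would widen the final table without changing its row count; which reading is intended depends on how the assignment text is interpreted.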
By Carlos C
Excellent course to build upon the knowledge from the "R Programming" course. Learning to use functions from the Tidyverse packages is essential if you want to learn Data Science in R. In my opinion, most of the time these are stronger and easier to use compared to Pandas, Numpy, etc., from Python. Despite the bad reviews at the top with lots of upvotes, I do think this was a great course overall. People tend to complain and don't assume responsibility to work and find solutions if they don't understand something. My humble advice is that, if you wish to immerse yourself in the Data Science field, you should accustom yourself to researching a lot, going to other forums like StackOverflow if an error appears, etc. Thanks to Jeff Leek, Roger Peng and the others from Johns Hopkins University!
By Pouria T
Thank you for giving me the opportunity to learn. This material (or this class) would have been super difficult if it had been taught through the same traditional channels, based on my academic experiences. Yet the materials were presented in such an amazing way that I wasn't overwhelmed by the difficulty of the subjects; rather, I became more focused on learning more and being challenged. Thank you for letting me get 3 free online certificates. It means a lot to me and it has given me hope through this difficult time. I feel accomplished. It's a great feeling, and it is the best and the only gift that I have received and would probably receive this holiday.
By Alfonso R R
I learned so much about R with this course. Thanks Johns Hopkins. Thanks Coursera.
The course final project was so challenging that it made me research R tools I didn't know existed, such as generating MD files from RMD markdown notebooks so I could mix live code with text. That's how I produced my CodeBook.md. Then I learned that there are a bunch of libraries for pretty-printing tables. I discovered even more about dplyr. And I also learned how to return multiple objects from a function.
You can really write papers with all these tools in R, given some expertise in knitr and pandoc.
Thank you Jeff and team for putting together such a quality course.
By David B
Before taking "Getting and Cleaning Data", I had no prior R programming experience aside from completing the R programming course in the data science specialization on Coursera. I found this course to be challenging and that it covered quite a bit of ground in terms of the "getting data" more so than the cleaning data. After completing this course, I feel like I learned quite a bit more R programming and the basic knowledge for obtaining data from a variety of sources/formats and cleaning it up to make it look nice and tidy. Overall, I rate this course very positively!
By Óscar A V R
The course is great and useful. In my personal experience, this course was as important as the R Programming course, since in this course one gets the essence of R and the hardest part of the process when dealing with real cases. I could see that the videos have been improved in speed and audibility; when I took it, they were difficult to hear and very fast.
I want to contribute as a beta tester, and will try to follow the whole course at my own pace, giving feedback in gratitude for the opportunity you gave me to learn for free.
By Joe D
Forums! Use the forums. Read them before you start the week's lectures because they often include pinned topics that correct minor errors like broken links and outdated commands, as well as interesting and thoughtful supplementary material. Overall this was a very enjoyable course; Dr. Leek's lectures are straightforward and full of useful examples. I learned about just some of the power of "The Tidyverse" through this course and I'm very grateful for that.
By Chetan T
The journey through the entire course was quite exceptional for me. It was great to hone the skills of programming and especially in this digital world where data is key for every analysis, inference, prediction and what not! When everyone looks at neat and tidy data that one can rely on, it is extremely important to understand and know the finer nuances of what it takes to get a nice and efficient dataset and that is the essence of this course.
By Whitchurch S R M
This was an awesome course.
I really liked the final project.
Especially creating a Codebook as well as tidying up the data.
I feel I went too much in-depth into creating the codebook as well as the readme file. But in hindsight it was totally worth it.
My advice to future learners: push yourselves to the limit when doing the final project. You will definitely learn much, much more by putting 110% into these hard projects.
By Alexis C
Did not like this class when I was taking it, but now (just completed course 7) I realize how very important this class is. "Messy data" used to sound like a buzz phrase to me that people used when they could not generate valuable insights from the data made available. Now I realize that the base R functions and packages highlighted in this class are extremely useful when you need to clean up data in a reproducible way.
By Jose A R N
My name is Jose Antonio from Brazil. I am looking for a new Data Scientist career.
Please, take a look at my LinkedIn profile: https://www.linkedin.com/in/joseantonio11
I took this course to gain new knowledge about Data Science and to better understand the technology and its practical applications.
The course was excellent and the classes were well taught by the teachers.
Congratulations to the Coursera team and the instructors.
By Yusuf E
The level of difficulty of this course is on par with R Programming. For the first time in the specialization you will find yourself scouring the forum for tips and suggestions on how to proceed when you get stuck in the quizzes. Fortunately, the mentors are really helpful when it comes to answering questions or clearing obscurities. I really liked this course, in fact much more than R Programming.
By Antonios D
This course is a great job! There is so much information in here and a great amount of knowledge. From my point of view, the current lessons should be updated with more varied example data sets, giving students the opportunity to learn different ways to manipulate data. There are some standard ways, so it would be great if you expanded on this.
By Chris B
It is sometimes daunting and difficult, but now I do understand so much more about downloading files from remote sites and getting them ready for analysis. What I should have done is look at the final project first, so as to get a better understanding of what it entailed. I also should have done more work replicating the code used in the lessons, so as to appreciate how it worked.