Key Skills for a Data Scientist
posted by Ashutosh Nandeshwar on April 06, 2015 in Converge Blog
In the analytics world, you are asked this question quite often: “Which skills must a data scientist have?” Elsewhere, I have emphasized the mindset and softer characteristics more than the technical skills, and after reading that post, you may ask, “Aren’t technical skills more important than the softer skills?” Yes, they are important: a data scientist would be unable to do her job without the technical skills, but she would be unable to succeed without the softer skills. Plus, technical skills can almost always be taught, whereas it is very hard to cultivate the softer skills from the outside. When you are looking for data scientists, before you are able to assess the mindset of a candidate, you must look for proof of the technical knowledge needed to do data science. A data scientist must show skills in the following areas:
Whether you call it data mining, machine learning or applied statistics, these skills lay the foundation of good analysis. Data mining is a general name for the process of finding patterns in data. Machine learning is a field of computer science that focuses on using various pattern detection algorithms. Common machine learning algorithms include association rules, nearest neighbors, decision trees, random forests, Bayesian methods and neural networks. Some methods from the applied statistics field have also made their way into machine learning. Multiple linear regression, logistic regression, and Bayesian methods are the most widely used techniques from the applied statistics field.
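To make one of these algorithms concrete, here is a minimal sketch of a nearest-neighbors classifier in plain Python. The function name `knn_predict`, the toy data, and the choice of Euclidean distance with `k=3` are all assumptions made for illustration; in practice you would most likely reach for an established library rather than write this yourself.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Sort the training rows by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature vector, label) pairs forming two clusters.
train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
    ((4.8, 5.1), "B"),
]

print(knn_predict(train, (1.1, 1.0)))  # falls in the first cluster
print(knn_predict(train, (5.1, 5.0)))  # falls in the second cluster
```

The idea is simply that a new point is labeled like the known points it most resembles, which is why the method depends so heavily on how you measure distance and how you scale your features.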
In this infographics-crazy world, it is easy to dismiss graphics. I know I do. Bad data visualizations take up the whole space to describe very few data points (think people, flags, buildings, exploding pie charts), whereas good data visualizations get out of your way and actually show the underlying data (think tables, simple charts, patterns). If carefully crafted, data visualizations can tell powerful stories. The key is to avoid the trap of making them overly beautiful but hardly actionable. I believe it was Noah Iliinsky, a data visualization expert, who said that “data visualizations are advertisements, and not art.” Your main objectives are: make the visualizations tell your story, let the data and patterns stand out, and do not distract the reader. If you follow the principles of effective data visualization, you are more than likely to make your visualizations actionable, yet good looking.
Of all the processes in an analytics project, data gathering and manipulation take the most time. If you are unable to get the required data in a structure suitable for analysis, you spend even more time manipulating the data. SQL is handy in such cases. Most likely your data is stored in some database management system, such as Oracle or Microsoft SQL Server. There are three things you must know to efficiently get the data out of such systems:
If you or your team members have sound knowledge of the above areas, you are in a good position to generate quality analysis. If you want to learn more about these areas or want a complete data science training guide, get this free report.
Bonus items: a sample job description and how to build a case for an analytics team.