
A Study Pathway for Data Science in 2020 (7 Steps)

21.5K views
Mar 23, 2020
16:28

Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1

In this video I lay out a seven-step pathway to becoming a successful and effective data scientist in 2020. As I have laid out in another video, data science can be described as an intersection of statistics, programming, communication, and domain knowledge. But that's just the high-level overview, and it leaves open the question: what are the MOST important skills to know, and in what order of priority? I do believe there are some universal items that EVERY data scientist must know. However, once the core fundamentals are in place, there is some flexibility, given the breadth of the field and the flavor of data scientist you want to become. The first three steps are universal; the next four are in a rough order of priority but can be rearranged.

1. Statistics
Statistics will truly give you the "what" of data science, while domain knowledge provides the "why" and programming languages provide the "how". It is necessary because without it, you won't have the tools you need to handle complex problems creatively or draw proper conclusions. In short, you should know fundamentals like probability, distributions, Bayes' rule, confidence intervals, and hypothesis testing. You need to be able to reason your way through problems, and that comes from knowing concepts like confounding variables, Simpson's paradox, the assumptions of tests, bias, and variance. You also need to know statistical tests, models, and survival analysis.

Coursera courses:
Duke: https://www.coursera.org/specializations/statistics
Johns Hopkins: https://www.coursera.org/specializations/jhu-data-science
University of Amsterdam: https://www.coursera.org/specializations/social-science

2. SQL
SQL should take priority over R or Python because, first of all, it's easier.
Also, it prepares you for the real world, where data is truly messy and lives in a variety of environments. Additionally, the work you do in R or Python tends to live downstream: you can only start it after you've used SQL to create a clean working dataset. You don't need to be a SQL guru, but you need to know how to query your data, write joins, and use CASE WHEN and EXISTS statements, window functions, nested queries, etc.

3. One of R or Python
Pick one of these two and master it. If you are coming from a statistics background, that will probably be R; if you are coming from a computer science background, that will probably be Python. Which one you pick matters less than mastering one of them rather than being mediocre at both. You want to know one of these from beginning to end: the fundamentals of the language, how to tidy and manipulate your dataset, how to create visualizations, reports, models, etc. There are also key data science packages to familiarize yourself with. If you're learning R, a good starting point is the tidyverse. If you're learning Python, you want to know NumPy, pandas, Matplotlib, seaborn, scikit-learn, and statsmodels. At this point, if you know all three of the above, you will be very employable. But there is still much more to learn. The following order is my recommendation, but you can rearrange it.

Good book for R: https://amzn.to/3je8kK6
Python data science book: https://amzn.to/3cDXKcE

4. The other of R or Python
If you know BOTH R and Python, it will be irrelevant that some companies stick to one infrastructure. This will make you massively employable.

5. Linear algebra
This helps you create your own innovative solutions. It also has massive crossover benefit for understanding statistics and machine learning. I recommend this book: https://amzn.to/2HEj4U4

6. UX/design principles
This will improve your ability to communicate with your client and create solutions (reports, visualizations, apps, etc.)
that are useful for your actual user. I highly recommend the books "The Visual Display of Quantitative Information" by Edward Tufte and "Show Me the Numbers" by Stephen Few to better understand graphical principles.

Tufte book: https://amzn.to/3kVrR2o
Few book: https://amzn.to/3n2qTTU

7. Machine learning
I am saving this for last because it requires knowledge of other topics on this list (statistics, R/Python, and linear algebra) and its importance is often overstated. But the big push many firms are making into this space is undeniable. I highly recommend Andrew Ng's Coursera course: https://www.coursera.org/learn/machine-learning

This is not an exhaustive list of everything that is important for data science. But if you know the first three, or certainly all seven, then I can virtually guarantee you will be an extremely marketable and successful data scientist.

PayPal: [email protected]
Patreon: https://www.patreon.com/richardondata
BTC: 3LM5d1vibhp1F7pcxAFX8Ys1DM6XLUoNVL
ETH: 0x3CfC599C4c1040963B644780a0E62d45999bE9D8
LTC: MH8yPjvSmKvpmRRmufofjRB9hnRAFHfx32

