Quality over quantity: building the perfect data science project

  1. Build your own original project.
  2. Your project should demonstrate as many skills as possible. Do not rely on just sklearn and pandas, and do not showcase projects built on the Titanic or MNIST datasets. Everybody uses them, so they do your resume more harm than good. Use web scraping to collect your own data.

  3. Do good exploratory data analysis. An ideal project is one that demonstrates not only that you’re able to answer important data science questions, but that you’re capable of asking them, too. Very often, you’ll have a bunch of (dirty) data, and you’ll need to figure out what to do with it to generate value for your company.
  4. Make plots that are easy to show off to interviewers and recruiters before and during your interviews.
  5. Create a web app (using Flask, or some other Python-based web dev framework). Ideally, you should be able to approach someone at a Meetup or during an interview, and have them try out a few input parameters, or play with a few knobs, and have some (ideally visually appealing) result returned to them.
  6. To make your project pitch compelling, make sure you have a story to tell about what you’ve built. Ideally, that story should include one or two unexpected insights you gained during your data exploration or model evaluation phase (e.g. “it turns out this class is really hard to tell apart from this other class because [reasons]”). This helps you out because it:
    • Weaves a narrative around your project that’s easier (and more interesting) for interviewers to remember; and
    • Makes it clear that you’re someone who makes a point of getting to the bottom of your data science problems.
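
The kind of insight mentioned in point 6 ("this class is really hard to tell apart from this other class") can often be found by counting which label pairs a model mixes up most. The following is a minimal sketch using only the Python standard library. The function name `most_confused_pair` and the example labels are hypothetical and made up for illustration; in a real project you would feed in your own true and predicted labels.

```python
from collections import Counter


def most_confused_pair(y_true, y_pred):
    """Return ((true_label, predicted_label), count) for the most frequent error."""
    # Count only the misclassified examples, keyed by (true, predicted).
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    if not errors:
        return None, 0
    pair, count = errors.most_common(1)[0]
    return pair, count


# Hypothetical labels, invented for this example.
y_true = ["cat", "cat", "dog", "dog", "fox", "fox", "fox", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "dog", "dog", "fox", "dog"]

pair, count = most_confused_pair(y_true, y_pred)
print(f"Most confused: true={pair[0]!r} predicted as {pair[1]!r} ({count} times)")
```

Once you know which pair is confused most often, you can go back to the data and look for the reason, which is the part of the story interviewers tend to remember.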

Interview with a Data Scientist at Kaggle: Dr. Rachael Tatman

  1. Work on your own projects, and not just Kaggle competitions. This will help you teach yourself data cleaning, annotation, designing a performance metric, and working out the business value of your model.
  2. Know the why behind things. Don’t just say “X doesn’t work, so I tried Y”; know why X doesn’t work and why Y works.
  3. Practise technical speaking more often at local meetups and conferences.
  4. You don’t need a PhD to become a data scientist.
  5. Celebrate failure! It means you are growing and getting closer to what will work.
  6. Nobody knows everything, and no one is an expert in all of ML. Don’t be afraid to admit that you don’t know something, and then learn it. But remember, you already have so much knowledge, and you bring all your life experience to learning.
  7. Read textbooks and do courses
  8. Follow Twitter to stay updated
  9. ML is just a tool, and an overhyped one. Don’t over-rely on it, and don’t assume that it works in all situations or is always correct.

Kaggle Grandmaster SRK’s Journey and Advice for Data Science Competitions

  1. Understanding the problem – It is really important to have a thorough understanding of the problem that we are trying to solve. Only after we’ve understood the problem clearly can we derive suitable insights from data to tackle the problem and obtain good results. This applies to real life as well.

  2. Structured Thinking – It’s a distinctive way of thinking through problems. A data scientist needs to be structured in their thinking in order to obtain good results. Otherwise, we might end up shooting in the dark, since the number of options is far too large in most cases.

  3. Effective communication of results – Effective communication of derived results is as important as performing the data analysis. At times, it becomes difficult to communicate the nuances of final analysis in simple language to business people. As a Data Scientist, one must learn the art of effective communication.

Advice to Aspiring Data Scientists

  1. Get Practical Experience:
    • Knowing the theory and intuition behind algorithms is good, but getting hands-on practical experience is where the goldmine lies. Try to get your hands on a real-world dataset. See what you can wring out of it. This will be invaluable when you are sitting in an interview setting.
  2. Participate in Hackathons:
    • Participating in competitions helps you understand where you stand among the community.
  3. Try to Frame a Problem Statement on your own:
    • I really liked this advice. We don’t get industry experience in a hackathon setting. Instead, you can try to come up with a problem that you feel might help someone. Then build on that by collecting data around it. Solve the problem and showcase it via blogs or GitHub. It’s hard work, but it translates into results sooner than you might expect.
  4. Pick a Domain:
    • This is critical. So many aspiring data scientists aimlessly wander about applying to jobs in domains where they don’t hold any experience (or even interest). Pick a domain that is of interest to you and try to find datasets to work on.

13 Common Mistakes Amateur Data Scientists Make and How to Avoid Them?

  1. Learning Theoretical Concepts without Applying Them
  2. Heading Straight for Machine Learning Techniques without Learning the Prerequisites
  3. Relying Solely on Certifications and Degrees
  4. Assuming that what you see in ML Competitions is what Real-Life Jobs are Like
  5. Focusing on Model Accuracy over Applicability and Interpretability in the Domain
  6. Using too many Data Science Terms in your Resume
  7. Giving Tools and Libraries Precedence over the Business Problem (understand the basic challenges in the industry you are applying data science to)
  8. Not Spending Enough Time on Exploring and Visualizing the Data (Curiosity)
  9. Not Having a Structured Approach to Problem Solving
  10. Trying to Learn Multiple Tools at Once
  11. Not Studying in a Consistent Manner
  12. Shying Away from Discussions and Competitions
  13. Not Working on Communication Skills

Reproducible research best practices

How to reproduce a machine learning project

Cookie Cutter: Data Science Project Structure

Everything You REALLY Need to Know to Become a Data Scientist

5 fundamentals that you really must know:

  1. Programming language: R or Python. You don’t have to be a master of these, just familiar enough that you can research your own questions, tackle coding issues, and debug your own code without much help.
  2. Core machine learning algorithms: regression, Naive Bayes, SVM, and random forests. Focus on core skills like evaluating machine learning classifiers and understanding the types of classification errors that are most important to the client. Understand how to compare various machine learning algorithms and how to choose the right parameters for your models.
  3. Develop the skill to ask the right questions: build data intuition so that you can ask the right set of questions to get the most out of your data.
  4. Problem-solving and critical thinking skills: translating your client’s needs into a concrete problem and breaking it down into a series of steps that lead to a solution. Ask ‘why’ at each stage of your data science process. One way to approach a problem is:
    • Understand the problem, its significance, and what effect/change it will inspire.
    • Figure out where to find the data. If it doesn’t exist in a usable form, figure out how to collect it.
    • Look for trends and identify variables or features that best explain the outcome.
    • Research different methods to fit a model to the data to successfully predict or explain the outcome.
    • Verify that the model fits the data well and predicts the outcome based on the business case.
    • Communicate your findings to the stakeholder such that they are able to understand the big picture impact of your solution.
  5. Communication skills: tell a compelling story to engage the client and eventually lead them to implement your recommendations. Most clients do not speak data; they speak revenue, marketing, sales, or product. It’s your job as a data scientist to translate technical scientific matters into business context.
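
To make fundamental 2 more concrete, here is a minimal sketch of how the two types of classification error (false positives and false negatives) feed into precision and recall. It uses only the Python standard library. The helper names `confusion_counts` and `precision_recall` and the example labels are hypothetical and chosen for illustration; which error type matters more depends on the client's problem.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # correctly flagged
        elif p == positive:
            fp += 1  # false alarm
        elif t == positive:
            fn += 1  # missed case
        else:
            tn += 1  # correctly ignored
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive=1):
    """Precision: share of flagged cases that are real. Recall: share of real cases found."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels, invented for this example (1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example the accuracy looks reasonable while half of the positive cases are missed. That is exactly the kind of gap you need to be able to explain to a client who cares more about one error type than the other.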

The scientific method of thinking

  1. Problem/observation
  2. Ask the question: why?
  3. Testable explanation (hypothesis)
  4. Experiment (control group and variations)
  5. Test Hypothesis
  6. Make conclusions
  7. Refine/iterate and repeat
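
Steps 4 and 5 above (run an experiment with a control group and a variation, then test the hypothesis) can be sketched with a simple permutation test. This is only one of many possible ways to test a hypothesis, shown here as an illustration using the Python standard library. The function name `permutation_test`, its parameters, and the measurements are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(control, variant, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Null hypothesis: group labels do not matter, so shuffling them should
    produce differences at least as large as the observed one fairly often.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(variant) - mean(control))
    pooled = list(control) + list(variant)
    n_control = len(control)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements, invented for this example.
control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
variant = [13.0, 13.4, 12.9, 13.1, 13.6, 12.8, 13.2, 13.3]

p_value = permutation_test(control, variant)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the difference between the groups is unlikely to be due to chance alone. Drawing the conclusion and deciding what to refine (steps 6 and 7) still requires judgment about the experiment's design.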