Job Roles in the Data Science Field

  1. Data Analyst/Business Analyst
    • Pull data out of SQL databases, master Excel or Tableau, and produce basic data visualizations and reporting dashboards. Occasionally analyze the results of an A/B test or take the lead on your company’s Google Analytics account. Data warehousing can also be part of the role.
    • Understand the business need and present the data science solution to clients
  2. Data Engineer/Data Management Professional/Hadoop
    • Hired by companies that start seeing a lot of traffic [and increasingly large amounts of data] and need someone to set up much of their data infrastructure. Strong software engineering skills matter more than heavy statistics and machine learning expertise.
  3. Machine Learning Engineer
    • Ideal for someone who has a formal mathematics, statistics, or physics background and is hoping to continue down a more academic path. Companies in this group could be consumer-facing companies with massive amounts of data or companies that offer a data-based service.
    • Algorithms, predictive analytics and deep learning
  4. Data Scientist
    • Perform analysis, touch production code, visualize data, etc.
    • Familiarity with tools designed for ‘big data’ and experience with messy real-life datasets
    • Do anything and everything related to data


12 Steps of Predictive Modeling

  1. Understanding the problem and business objective
  2. Establishing why machine learning is needed to solve the problem and which methods exist in the literature
  3. Selecting a single-number evaluation metric
  4. Doing exploratory data analysis
  5. Pre-processing the data
    • Data cleansing
    • Outlier removal
    • Normalization / Standardization
    • Dummy variable creation
  6. Feature engineering
    • Feature selection
    • Feature transformation
    • Variable interaction
    • Feature creation
  7. Selecting the modeling algorithm (from simple to complex)
  8. Parameter tuning through cross-validation
  9. Building the model
  10. Ensembling of models
  11. Checking the results on unseen data and iterating
  12. Deploying the model
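
Several of the steps above (3, 5, 7, 8, 9 and 11) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the data is synthetic, the metric is accuracy, the model is a hand-written k-nearest-neighbours classifier, and every function name (make_data, fit_scaler, cross_validate, and so on) is hypothetical rather than taken from any library. In practice a machine learning library would normally handle these steps.

```python
import random
import statistics

def make_data(n, seed=0):
    """Synthetic two-class data (assumption: made up purely for illustration)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 5)]   # features on different scales
        y = 1 if x[0] + x[1] / 5 > 0 else 0
        rows.append((x, y))
    return rows

def fit_scaler(rows):
    """Step 5: learn per-feature mean and std from the training rows only."""
    cols = list(zip(*(x for x, _ in rows)))
    return [(statistics.mean(c), statistics.pstdev(c) or 1.0) for c in cols]

def scale(rows, scaler):
    """Step 5: standardize each feature with the learned mean and std."""
    return [([(v - m) / s for v, (m, s) in zip(x, scaler)], y) for x, y in rows]

def knn_predict(train, x, k):
    """Step 7: a simple model, k-nearest neighbours with a majority vote."""
    nearest = sorted(
        train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x))
    )[:k]
    votes = sum(y for _, y in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, test, k):
    """Step 3: the single-number metric, the fraction of correct predictions."""
    scaler = fit_scaler(train)
    train_s, test_s = scale(train, scaler), scale(test, scaler)
    hits = sum(knn_predict(train_s, x, k) == y for x, y in test_s)
    return hits / len(test_s)

def cross_validate(rows, k, folds=5):
    """Step 8: mean accuracy over `folds` train/validation splits."""
    scores = []
    for i in range(folds):
        val = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        scores.append(accuracy(train, val, k))
    return statistics.mean(scores)

data = make_data(300)
train_rows, test_rows = data[:240], data[240:]   # hold out unseen data (step 11)

candidates = [1, 3, 5, 7, 9]                     # odd values avoid tied votes
cv_scores = {k: cross_validate(train_rows, k) for k in candidates}
best_k = max(cv_scores, key=cv_scores.get)       # step 8: pick the best setting

# Steps 9 and 11: build on all training rows, then check on the unseen rows.
test_accuracy = accuracy(train_rows, test_rows, best_k)
print(f"best k = {best_k}, held-out accuracy = {test_accuracy:.3f}")
```

Note that the scaler is fitted inside each cross-validation fold and never sees the validation or test rows. Keeping preprocessing inside the split is what makes the result on unseen data an honest estimate.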

Flowchart to Become a Data Scientist

[Image: flowchart of the path to becoming a data scientist]