Category: Cloud Fundamentals

  • About Data Jobs in the Cloud

    What is the difference between a data scientist and a data analyst?

    A data analyst works with data to solve business problems using tools like SQL, R, or other programming languages; data visualization software; and statistical analysis.

    Data analytics generally focuses on examining existing historical data to solve defined business problems and answer specific questions. Analysts' work involves setting up dashboards, creating automated reports, and building data pipelines that inform current decision-making.

    A data scientist uses more advanced data techniques to make predictions about the future. They may come up with the questions their team should be asking and determine how to use data to answer them.

    Data science goes beyond traditional analytics by using advanced techniques like machine learning and statistical modeling to predict future outcomes. Key differences include:

    • Technical depth: Data scientists require strong programming and math skills.
    • Time orientation: Analytics focuses on describing and diagnosing past events, while data science predicts future outcomes and prescribes actions.
    • Tools: Data scientists often use complex machine learning and AI frameworks.
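    To make the time-orientation difference concrete, here is a minimal Python sketch using made-up monthly sales figures (hypothetical data, not from any real system). The analyst-style function describes what already happened; the data-scientist-style function fits a simple least-squares trend line and extrapolates one month ahead. Real data science work would use richer models and validation; this only contrasts a descriptive question with a predictive one.

```python
from statistics import mean

# Hypothetical monthly sales for months 1..4 (illustrative numbers only).
months = [1, 2, 3, 4]
sales = [100.0, 110.0, 120.0, 130.0]


def describe(values: list[float]) -> dict[str, float]:
    """Analyst-style question: what happened in the past?"""
    return {"total": sum(values), "average": mean(values), "max": max(values)}


def fit_trend(xs: list[int], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def forecast(xs: list[int], ys: list[float], next_x: int) -> float:
    """Data-scientist-style question: what is likely to happen next?"""
    slope, intercept = fit_trend(xs, ys)
    return slope * next_x + intercept


print(describe(sales))             # {'total': 460.0, 'average': 115.0, 'max': 130.0}
print(forecast(months, sales, 5))  # 140.0
```

    The same history answers both questions; what changes is whether the output summarizes the past or estimates the future.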

    Both data analysts and data scientists play valuable, complementary roles in organizations, with analysts laying the groundwork for data scientists to build upon.

    In Cloud Technology: Data Analysts vs. Data Scientists

    Data analysts typically use cloud services like:

    • Cloud data warehouses: Amazon Redshift, Google BigQuery, Azure Synapse Analytics
    • Business intelligence platforms: Tableau Online, Power BI Service, Looker, Looker Studio (formerly Google Data Studio)
    • ETL/ELT services: AWS Glue, Azure Data Factory, Google Cloud Data Fusion
    • SQL query engines: Amazon Athena, Google BigQuery, Azure Synapse serverless SQL pool
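    The following sketch shows the kind of extract-transform-load step and KPI query an analyst might run. Python's built-in sqlite3 module stands in for a cloud warehouse such as BigQuery or Redshift (an assumption for local illustration only; each service has its own client library and SQL dialect), and the raw records and the `orders` table are hypothetical.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
raw_orders = [
    {"region": " north ", "amount": "120.50"},
    {"region": "South", "amount": "80.00"},
    {"region": "north", "amount": "79.50"},
    {"region": "south", "amount": "n/a"},  # malformed row, dropped in transform
]


def transform(rows):
    """Normalize region names and parse amounts, skipping bad records."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        clean.append((row["region"].strip().lower(), amount))
    return clean


def load_and_report(rows):
    """Load rows into an in-memory table and compute revenue per region."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    report = conn.execute(
        "SELECT region, SUM(amount) AS revenue, COUNT(*) AS order_count "
        "FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return report


print(load_and_report(transform(raw_orders)))
# [('north', 200.0, 2), ('south', 80.0, 1)]
```

    In a managed service the load step would typically be handled by the ETL/ELT tool and the query would run in the warehouse, but the shape of the work is the same.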


    Data scientists leverage advanced cloud offerings. They build, train, and deploy predictive models using scalable cloud infrastructure.

    • Machine learning platforms: Amazon SageMaker, Azure Machine Learning, Google Vertex AI
    • Container orchestration: Kubernetes (Amazon EKS, Azure AKS, Google Kubernetes Engine)
    • Specialized computing resources: GPU/TPU instances (Amazon EC2 P-series, Azure NCv3-series, Google Cloud GPUs and TPUs)
    • MLOps tools: Amazon SageMaker Model Monitor, Azure Machine Learning model monitoring, Google Vertex AI Model Monitoring
    • Big data processing: Amazon EMR, Azure HDInsight, Google Cloud Dataproc, Databricks
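    Model-monitoring tools watch for drift between the data a model was trained on and the data it sees in production. As a hedged illustration of the idea (not how any particular service computes it), the sketch below uses the Population Stability Index (PSI), one common drift metric, over pre-binned feature proportions. The bin values and the 0.2 alert threshold are illustrative assumptions.

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions given as proportions per bin.

    expected: bin proportions from training data (the baseline)
    actual:   bin proportions from recent production data
    eps:      floor applied to proportions to avoid log(0)
    """
    if len(expected) != len(actual):
        raise ValueError("distributions must have the same number of bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


# Hypothetical feature distributions over four bins.
baseline = [0.25, 0.25, 0.25, 0.25]
stable = [0.24, 0.26, 0.25, 0.25]
shifted = [0.10, 0.20, 0.30, 0.40]

DRIFT_THRESHOLD = 0.2  # rule-of-thumb cutoff; tune per use case

for name, dist in [("stable", stable), ("shifted", shifted)]:
    score = population_stability_index(baseline, dist)
    status = "drift" if score > DRIFT_THRESHOLD else "ok"
    print(f"{name}: PSI={score:.3f} -> {status}")
```

    A monitoring job would compute a score like this on a schedule and raise an alert (or trigger retraining) when it crosses the threshold.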

    | Category | Data Analytics in the Cloud | Data Science in the Cloud |
    |---|---|---|
    | Cloud data warehouses | Amazon Redshift, Google BigQuery, Azure Synapse Analytics | Used occasionally for feature engineering and data exploration |
    | BI and visualization tools | Tableau Online, Power BI Service, Looker, Looker Studio (formerly Google Data Studio) | Used occasionally for model performance dashboards or exploratory analysis |
    | ETL/ELT services | AWS Glue, Azure Data Factory, Google Cloud Data Fusion | Used to prepare and transform datasets for model training |
    | SQL query engines | Amazon Athena, Google BigQuery, Azure Synapse serverless SQL pool | Used to query datasets for model training |
    | Machine learning platforms | Not for data analysts | Amazon SageMaker, Azure Machine Learning, Google Vertex AI |
    | Container orchestration | Not for data analysts | Kubernetes (Amazon EKS, Azure AKS, GKE) |
    | Compute resources | Standard VMs for dashboards and reporting | GPU/TPU instances: Amazon EC2 (P-series), Azure NCv3-series, Google Cloud GPUs and TPUs |
    | MLOps tools | Not for data analysts | Amazon SageMaker Model Monitor, Azure Machine Learning model monitoring, Google Vertex AI Model Monitoring |
    | Big data processing | Databricks (ETL and analytics), Amazon EMR (Hive/Spark), Azure HDInsight | Databricks (model training), Amazon EMR, Azure HDInsight, Google Cloud Dataproc (Apache Spark/MLlib workflows) |
    | Primary use cases | Dashboards, reporting, KPI tracking, pipeline building | Predictive modeling, training ML models, deploying AI pipelines, model monitoring |

  • Cloud Fundamentals

    Over the history of the term “cloud”, two sets of models became central to cloud computing: the service models, Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS); and the deployment models, public, private, community, and hybrid clouds.

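    Data makes cloud adoption path dependent: once data accumulates in a provider's services it is costly to move, so early choices constrain later ones. This is why the cloud is described as “NOT a utility” but as “a retail model”: unlike a commodity such as electricity, it is complex and highly configurable, and customers select and assemble the services they need.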

    Cloud computing’s layered structure allows different experts to focus on specific areas. Users can choose how much control they want, making the cloud more flexible and useful.

    Adopting an authentic cloud, one that actually meets the defining cloud characteristics, is essential to delivering the cloud's benefits: cost savings, energy savings, rapid deployment, customer empowerment, and more.

    Computer scientist Peter Mell of the National Institute of Standards and Technology (NIST) stated the purpose of the NIST definition of cloud computing (NIST SP 800-145): when agencies or companies use the definition, they have a tool to determine the extent to which the information technology implementations they are considering meet the cloud characteristics and models.
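    Mell's point is that the definition works as a checklist. A minimal Python sketch of that idea follows. The five essential characteristics, three service models, and four deployment models come from NIST SP 800-145; the `Offering` class, the `assess` function, and the example offering are hypothetical. A real assessment involves judgment about how fully each characteristic is met, not just a yes/no flag.

```python
from dataclasses import dataclass, field

# The five essential characteristics from the NIST definition (SP 800-145).
NIST_CHARACTERISTICS = (
    "on-demand self-service",
    "broad network access",
    "resource pooling",
    "rapid elasticity",
    "measured service",
)
SERVICE_MODELS = {"SaaS", "PaaS", "IaaS"}
DEPLOYMENT_MODELS = {"public", "private", "community", "hybrid"}


@dataclass
class Offering:
    """A hypothetical IT offering being evaluated against the definition."""
    name: str
    characteristics: set[str] = field(default_factory=set)
    service_model: str = ""
    deployment_model: str = ""


def assess(offering: Offering) -> dict:
    """Report which essential characteristics are met and which are missing."""
    met = [c for c in NIST_CHARACTERISTICS if c in offering.characteristics]
    missing = [c for c in NIST_CHARACTERISTICS if c not in offering.characteristics]
    return {
        "score": f"{len(met)}/{len(NIST_CHARACTERISTICS)}",
        "missing": missing,
        "valid_service_model": offering.service_model in SERVICE_MODELS,
        "valid_deployment_model": offering.deployment_model in DEPLOYMENT_MODELS,
    }


# A hypothetical hosted-VM offering: self-service and metered, but no pooling or elasticity.
legacy = Offering(
    name="hosted VMs",
    characteristics={"on-demand self-service", "broad network access", "measured service"},
    service_model="IaaS",
    deployment_model="private",
)
print(assess(legacy))
```

    An offering marketed as "cloud" that scores 3/5 here would, by this definition, fall short of an authentic cloud.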

    AWS meets these criteria:

    Service Models:

    SaaS [Amazon WorkSpaces], PaaS [AWS Elastic Beanstalk], IaaS [EC2, S3]

    Deployment Models: 

    Public Cloud: AWS Global Infrastructure
    Hybrid Cloud: AWS Outposts, AWS Direct Connect

    Elasticity & Scalability:

    Auto Scaling Groups, Elastic Load Balancing (ELB), AWS Lambda
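    Elasticity means capacity follows demand. The sketch below shows a proportional target-tracking rule: resize the fleet so that average utilization moves back toward a target. The formula is the one documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × currentMetric / targetMetric)); AWS target tracking policies pursue the same goal, but their internal algorithm is not assumed here. The min/max bounds play the role of an Auto Scaling group's minimum and maximum size, and all numbers are illustrative.

```python
import math


def desired_capacity(current: int, current_util: float, target_util: float,
                     min_size: int, max_size: int) -> int:
    """Proportional scaling rule, clamped to the group's size limits."""
    if current <= 0 or target_util <= 0:
        raise ValueError("current capacity and target must be positive")
    desired = math.ceil(current * current_util / target_util)
    return max(min_size, min(max_size, desired))


# 4 instances at 90% CPU with a 60% target: scale out.
print(desired_capacity(4, 90.0, 60.0, min_size=2, max_size=10))
# 4 instances at 15% CPU with a 60% target: scale in, but not below min_size.
print(desired_capacity(4, 15.0, 60.0, min_size=2, max_size=10))
```

    Clamping to a maximum size caps cost during spikes, while the minimum size keeps a baseline of capacity available.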

    Measured Service:

    AWS Cost Explorer, Billing Dashboard, AWS Cost and Usage Reports (successor to Detailed Billing Reports)
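    Measured service means the provider meters usage and bills for what was consumed. The sketch below aggregates hypothetical usage records into a per-service bill, roughly the kind of grouping a cost dashboard presents. The record format and unit prices are invented for illustration and are not real AWS prices; `Decimal` is used so currency arithmetic stays exact.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical unit prices (NOT real AWS pricing).
UNIT_PRICES = {
    ("EC2", "instance-hours"): Decimal("0.10"),
    ("S3", "GB-months"): Decimal("0.02"),
    ("Lambda", "million-requests"): Decimal("0.20"),
}

# Hypothetical metered usage records: (service, usage type, quantity).
usage = [
    ("EC2", "instance-hours", Decimal("720")),
    ("EC2", "instance-hours", Decimal("48")),
    ("S3", "GB-months", Decimal("500")),
    ("Lambda", "million-requests", Decimal("3")),
]


def bill_by_service(records):
    """Multiply each record by its unit price and total the cost per service."""
    totals = defaultdict(Decimal)
    for service, usage_type, quantity in records:
        totals[service] += quantity * UNIT_PRICES[(service, usage_type)]
    return dict(totals)


bill = bill_by_service(usage)
for service, cost in sorted(bill.items()):
    print(f"{service:<8} ${cost:.2f}")
print(f"{'Total':<8} ${sum(bill.values()):.2f}")
```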

    There are three ways to access AWS core services: with the AWS Management Console, the AWS Command Line Interface, and Software Development Kits.

    • Advantages of cloud computing over on-premises computing:
      • Avoid large capital purchases
      • Use on-demand capacity
      • Go global in minutes
      • Increase speed and agility
      • NOT a benefit: paying for racking, stacking, and powering servers (the cloud removes this work rather than adding it)
    • Cloud computing models:
      • Platform as a service
      • Infrastructure as a service
      • Software as a service
      • NOT a model: System administration as a service
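    The first two advantages, avoiding large capital purchases and using on-demand capacity, come down to simple arithmetic. The sketch below compares buying servers sized for peak load with paying on demand for the capacity actually used each month. Every figure is a hypothetical assumption, and a real comparison would also include staffing, power, networking, and pricing discounts.

```python
# All figures are hypothetical, chosen only to illustrate the comparison.
SERVER_PRICE = 10_000         # up-front cost per server (capital expense)
SERVER_LIFETIME_MONTHS = 36   # period over which the purchase is amortized
ON_DEMAND_RATE = 0.50         # cloud cost per server-hour
HOURS_PER_MONTH = 730

# Servers needed in each month of one year; on-premises must be sized for the peak.
monthly_demand = [4, 4, 5, 6, 10, 4, 4, 4, 5, 6, 12, 4]


def upfront_purchase(demand):
    """Capital needed to buy enough servers for the peak month."""
    return max(demand) * SERVER_PRICE


def on_premises_cost(demand):
    """Amortized hardware cost over len(demand) months, sized for peak demand."""
    return max(demand) * SERVER_PRICE * len(demand) / SERVER_LIFETIME_MONTHS


def on_demand_cost(demand):
    """Pay only for the servers actually running in each month."""
    return sum(n * HOURS_PER_MONTH * ON_DEMAND_RATE for n in demand)


print(f"Up-front purchase: ${upfront_purchase(monthly_demand):,}")
print(f"On-premises, 12 months amortized: ${on_premises_cost(monthly_demand):,.0f}")
print(f"On-demand, 12 months: ${on_demand_cost(monthly_demand):,.0f}")
```

    Under these assumptions on-demand comes out cheaper mainly because the peak (12 servers) is far above typical demand. With steady, high utilization the comparison can favor owned hardware, which is why the arithmetic is worth doing per workload.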