World of Data

World of Data, updated 8/26/25, 1:46 PM

Lecture notes for a number of intro to data classes

About Jack Zheng

Faculty of IT at Kennesaw.edu


The World of Data
A general introduction to data
processing and technology areas
Jack G. Zheng
Fall 2025
IT 6443 Data Processing Technology
Overview
• Fundamental data related concepts and terms
– What is data
– Data, information, knowledge
– Data type, format, structure, model, etc.
– Data processing

• The data and data technology in today’s world
– Data sources
– Value and uses
– Data characteristics
– Data challenges
– Data technology and capabilities
• Data processing and management knowledge areas
– DAMA data knowledge areas
• Data processing job roles
– Practical data job roles and career paths
This lecture focuses on fundamental, high-level concepts and a philosophical view
of data (instead of technical details), and on a general view of the industry and
technology.
Basic Data Terms and Concepts
• Data refers to raw facts, figures, numbers, or
observations that can be collected, processed, and
interpreted to gain meaning or make decisions
– As a general term, data refers to fundamental facts and measures that
describe the features of things, activities, events, or abstract models.
• Basic Terms and Concepts
– DIKW
– Data type
– Data format
– Data model
– Data structure
– Data processing
DIKW
• The DIKW hierarchy depicts relationships between data, information,
knowledge (and wisdom).
– Data: raw value elements or facts
– Information: the result of collecting and organizing data in a way that provides context and meaning
– Knowledge: an understanding of information that provides insight, making it useful and actionable
– Wisdom: the understanding of interactions and an integrated view, and the understanding
of implications and indirect results beyond a target domain.
Extended readings on DIKW
– https://www.i-scoop.eu/big-data-action-value-context/dikw-model/
– https://www.youtube.com/watch?v=K4i2FK52698
– https://en.wikipedia.org/wiki/DIKW_pyramid
Image from https://www.ontotext.com/knowledgehub/fundamentals/dikw-pyramid/
Core readings on DIKW
– https://www.ontotext.com/knowledgehub/fundamentals/dikw-pyramid/
– DIKW Pyramid https://www.youtube.com/watch?v=u9DoQ9gY4z4
– DIKW Pyramid with Sample Data https://www.youtube.com/watch?v=MFUyQsJyKgg
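The first three DIKW levels described above can be illustrated with a small sketch. The temperature readings, the room name, and the 5-degree threshold are all hypothetical values chosen for illustration; they are not part of the DIKW model itself.

```python
from statistics import mean

# Data: raw values with no context (hypothetical temperature readings).
readings = [21.5, 22.0, 27.5, 28.0]

# Information: the same values organized with context (what, where, when).
information = [
    {"room": "server room", "hour": hour, "celsius": value}
    for hour, value in zip([9, 10, 11, 12], readings)
]

# Knowledge: an insight drawn from the information that supports action.
morning = mean(r["celsius"] for r in information if r["hour"] < 11)
midday = mean(r["celsius"] for r in information if r["hour"] >= 11)
rising = midday - morning

# The 5-degree threshold is an assumption made for this example.
needs_attention = rising > 5.0
print(f"Temperature rose by {rising} degrees; needs attention: {needs_attention}")
```

Wisdom is left out on purpose: judging the wider implications (for example, whether to redesign the cooling system) goes beyond what a few lines of code can show.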
Data Related Terms
• Raw data
– Directly generated and logged from an event, preserved in its
original format. Example: system logs, POS transactions, etc.
• Processed data
– Transformed or cleaned using any techniques for the purpose of
storage, analysis, or presentation
• Measures/metrics
– A piece of data that is used for measuring an activity or a
phenomenon
– Metric data are aggregated or calculated based on raw data (or
other metrics); a kind of processed data
• Facts
– Data that represents reality or things that actually happened
• Simulated/forecast/estimated data
– Data generated based on mathematical or statistical models
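The terms above can be sketched on a tiny point-of-sale log. The log lines, field order, and the averaging "forecast" are assumptions made for illustration only; a real forecast would use a proper statistical model.

```python
from statistics import mean

# Raw data: lines logged as-is by hypothetical POS terminals.
raw_log = [
    "2025-08-01,POS-1,latte,4.50",
    "2025-08-01,POS-2,Mocha ,5.00",
    "2025-08-02,POS-1,LATTE,4.50",
]


def process(line):
    """Turn one raw log line into a cleaned, typed record (processed data)."""
    date, terminal, product, price = line.split(",")
    return {
        "date": date,
        "terminal": terminal,
        "product": product.strip().lower(),
        "price": float(price),
    }


processed = [process(line) for line in raw_log]

# Metric: revenue per product, aggregated from the processed records.
revenue_by_product = {}
for row in processed:
    product = row["product"]
    revenue_by_product[product] = revenue_by_product.get(product, 0.0) + row["price"]

# Estimated data: a deliberately naive forecast (mean of daily revenue).
revenue_by_day = {}
for row in processed:
    revenue_by_day[row["date"]] = revenue_by_day.get(row["date"], 0.0) + row["price"]
naive_forecast = mean(revenue_by_day.values())
```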

Data Type
• Data type is an attribute of data that tells a computer
system how to interpret and process its value, and how a
human can use its value.
• Data types may include
– Simple (primitive) numeric types: int, decimal, etc.
– Qualitative (textual) and composite types: string, text, etc.
– Extended (app specific) data types: date/time, Boolean,
money, geo, etc.
– Abstract data types: object, array, etc.
– Data types can even include digital multimedia forms such as sound,
image, and video.
• The exact definition and application of data types depend
on the programming language, computing platform and
system.
Extended reading: https://en.wikipedia.org/wiki/Data_type
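As a minimal sketch of how a data type changes the way a value is interpreted and processed, the example below treats the same characters as text, as an integer, and as part of a date, and compares binary floating point with a decimal type. It uses Python's standard library; other languages and platforms define their types differently.

```python
from datetime import date
from decimal import Decimal

raw = "2025"

# As text, "+" means concatenation.
as_text = raw + "1"            # "20251"

# As an integer, "+" means arithmetic.
as_int = int(raw) + 1          # 2026

# As a date, the value gains calendar behavior.
as_date = date(int(raw), 1, 1)
weekday_name = as_date.strftime("%A")

# Numeric types differ too: binary floats approximate, decimals are exact.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(as_text, as_int, weekday_name, float_sum, decimal_sum)
```

The float and decimal results differ, which is one reason money is often treated as its own data type.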


Data Format (or File Format)
• A data format defines how data and information are structured and
recorded in a computer file, particularly a flat text file.
• Flat files are machine readable, meaning data in flat files is formatted in
a way that it can be automatically read and processed by a computer
program. Machine-readable data must have some structure, even if
it is implicit and may be defined outside the file.
– http://opendatahandbook.org/glossary/en/terms/machine-readable/
• Three major types of data formats are used in flat files in today’s data
processing world
– Comma Separated Values (CSV)
– JavaScript Object Notation (JSON)
– eXtensible Markup Language (XML)
• They are often used in data download/export, transfer, and storage.
– Example: https://schoolgrades.georgia.gov/dataset
Data formats will be covered in more detail in later modules or classes.
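To make the three formats concrete, here is a minimal sketch that writes the same two records as CSV, JSON, and XML using only Python's standard library, then reads each one back. The school names, field names, and the `schools`/`school` element names are invented for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical records; all values are kept as text for easy comparison.
records = [
    {"name": "North High", "grade": "A", "score": "91.2"},
    {"name": "South High", "grade": "B", "score": "84.7"},
]

# CSV: one header line, then one line per record.
csv_buffer = io.StringIO()
writer = csv.DictWriter(
    csv_buffer, fieldnames=["name", "grade", "score"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(records)
csv_text = csv_buffer.getvalue()

# JSON: a list of objects with named fields.
json_text = json.dumps(records, indent=2)

# XML: nested elements, one <school> per record.
root = ET.Element("schools")
for record in records:
    school = ET.SubElement(root, "school")
    for field, value in record.items():
        ET.SubElement(school, field).text = value
xml_text = ET.tostring(root, encoding="unicode")

# Each format is machine readable: a program can parse it back.
from_csv = list(csv.DictReader(io.StringIO(csv_text)))
from_json = json.loads(json_text)
from_xml = [
    {child.tag: child.text for child in school}
    for school in ET.fromstring(xml_text)
]
```

Note that CSV keeps the structure (column names) in a header line, while JSON and XML repeat the field names with every record.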


Data Model
• A data model is about the representation of data, with a set of concepts
and rules.
– A data model conceptualizes data elements and standardizes
how the data elements relate to one another
– Answers questions like: What is this data about? How is data
grouped and associated? How are they related?
• Extended reading: https://en.wikipedia.org/wiki/Data_model
• Data models depict and enable an organization to
understand its data assets through core building blocks
such as entities, relationships, and attributes.
• Typical examples of data models
– flat model, relational model, network model, dimensional
model
Data models will be covered in more detail in later modules and classes.
Example: relational model
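As a rough sketch of the relational model named above, the example below uses SQLite (bundled with Python) to express two entities, their attributes, and one relationship. The `customer` and `purchase` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Entities become tables, attributes become columns, and the relationship
# (a purchase belongs to a customer) becomes a foreign key.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO purchase VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# The model answers "how are they related?" through a join on the key.
rows = conn.execute("""
    SELECT c.name, SUM(p.amount)
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)
```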


Data Structure

• In computer science, a data structure is a data organization,
management, and storage format that enables efficient access
and modification. More precisely, a data structure is a collection
of data values, the relationships among them, and the functions
or operations that can be applied to the data.
– https://en.wikipedia.org/wiki/Data_structure
• A data structure is a more technical and lower-level term than a data
model. It emphasizes a structure that targets the computer system
(vs. the human) for optimal processing.
• For example, an array is a data structure, and so is a dictionary.
The classes that form your data model are data structures too;
any representation of a specific data object has to be in the form
of a data structure.
– https://stackoverflow.com/questions/24228038/difference-between-datastructure-and-datamodel-with-example
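The points above can be sketched in a few lines: a class that represents one data object, and two data structures (a list and a dictionary) that hold the same objects but support different access patterns. The `Product` class and its values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """A class from a (hypothetical) data model; it is itself a data structure."""
    sku: str
    name: str
    price: float


# List (array-like): ordered, found by scanning element by element.
products_list = [
    Product("A1", "Keyboard", 49.0),
    Product("B2", "Mouse", 19.0),
    Product("C3", "Monitor", 199.0),
]


def find_in_list(sku):
    for product in products_list:
        if product.sku == sku:
            return product
    return None


# Dictionary: keyed access, typically much faster for lookups by key.
products_by_sku = {product.sku: product for product in products_list}


def find_in_dict(sku):
    return products_by_sku.get(sku)
```

Both lookups return the same object; the choice of structure affects how efficiently the computer gets there, not what the data means.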

Structured, Semi-structured, and Unstructured
• Structured data
– Structured data is data whose elements are defined for effective analysis. It
has been organized into a formatted repository, typically a database
with all kinds of integrity rules. It can easily be mapped into pre-designed
fields and models.
• Semi-structured data
– Semi-structured data is information that does not reside in a relational
database but has some organizational properties that make it easier
to analyze. With some processing, it can be stored in a relational database
(though this can be very hard for some kinds of semi-structured data), but the
semi-structured form exists to ease storage. Example: XML/JSON data.
• Unstructured data
– Unstructured data is data that is not organized in a predefined manner
or does not have a predefined data model, so it is not a good fit for a
mainstream relational database. There are alternative platforms for storing
and managing unstructured data. It is increasingly prevalent in
IT systems and is used by organizations in a variety of business intelligence
and analytics applications. Example: Word, PDF, text, media logs.
Key readings
– https://www.snowflake.com/en/fundamentals/understanding-structured-semi-structured-and-unstructured-data/
– https://www.g2.com/articles/structured-vs-unstructured-data
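A small sketch can contrast the three categories described above. All values are invented; the `flatten` helper is a hypothetical illustration of the "some processing" needed to fit semi-structured data into fixed fields.

```python
import json
import re

# Structured: a fixed set of fields in a fixed order, like a table row.
structured_row = ("C001", "Ada", 42.50)

# Semi-structured: self-describing, but fields may be nested or optional.
semi_structured = json.loads(
    '{"id": "C002", "name": "Grace", "tags": ["vip"],'
    ' "address": {"city": "Atlanta"}}'
)


def flatten(doc):
    """Map a JSON document onto fixed fields; missing parts become None."""
    return (doc["id"], doc["name"], doc.get("address", {}).get("city"))


# Unstructured: free text with no predefined model; extracting anything
# requires pattern matching or more advanced techniques.
unstructured = "Grace called on 2025-08-26 about a late delivery to Atlanta."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", unstructured)

print(flatten(semi_structured), dates)
```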


Data Processing
• Data processing is, generally, "the collection
and manipulation of items of data to produce
meaningful information" (particularly using
computing technologies and systems).
– https://en.wikipedia.org/wiki/Data_processing
• Data processing can be considered as two
broad categories
– Transactional processing
– Analytical processing

Types of Data/Information Processing
Transactional Processing
• Focus on data item processing (storage, insertion, modification, deletion),
transmission, and even some non-analytical queries
• Examples
– Change product price.
– Increase customer credit limit.
– Import data from another source.
Analytical Processing
• Focus on queries, calculation, reporting, analysis, and decision support
• Examples
– What are the top 10 most profitable products?
– Is there a significant increase in operational cost?
Notice the difference between these terms as general concepts vs. as
particular technologies/systems.
For a more detailed comparison of OLTP and OLAP:
https://techdifferences.com/difference-between-oltp-and-olap.html
https://www.ibm.com/cloud/blog/olap-vs-oltp
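The two kinds of processing can be sketched against one small SQLite database: a price change as the transactional operation and a "most profitable products" query as the analytical one. The tables, products, and numbers are hypothetical, and a single in-memory database is used only to keep the example short; real systems often separate the two workloads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY, name TEXT, price REAL, unit_cost REAL
);
CREATE TABLE sale (
    sale_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER
);
INSERT INTO product VALUES
    (1, 'Pen', 2.0, 0.5), (2, 'Notebook', 5.0, 2.0), (3, 'Bag', 30.0, 18.0);
INSERT INTO sale VALUES (1, 1, 100), (2, 2, 40), (3, 3, 5), (4, 1, 50);
""")

# Transactional processing: change one product's price (touches one row).
with conn:
    conn.execute(
        "UPDATE product SET price = ? WHERE product_id = ?", (2.5, 1)
    )

# Analytical processing: which products are most profitable?
# (Reads and aggregates many rows; top 2 here instead of top 10.)
top_products = conn.execute("""
    SELECT p.name, SUM(s.quantity * (p.price - p.unit_cost)) AS profit
    FROM sale AS s
    JOIN product AS p ON p.product_id = s.product_id
    GROUP BY p.name
    ORDER BY profit DESC
    LIMIT 2
""").fetchall()
print(top_products)
```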


DIKW and Data Processing
• The DIKW model can be loosely related to
the levels of transactional processing and
analytical processing
[Figure: DIKW levels related to Transactional Processing and Analytical Processing]
For more extensive reading: http://en.wikipedia.org/wiki/DIKW_Pyramid
Different opinion: https://hbr.org/2010/02/data-is-to-info-as-info-is-not


Data in Today’s World
• Data is the new oil (of the digital economy)
– https://towardsdatascience.com/is-data-really-the-new-oil-in-the-21st-century-17d014811b88
– Considered one of the important resources and assets, just like
financial and human resources.
– Also considered as a product or service
• Big data
– Four V’s: Volume (Scale), Variety, Velocity (Speed), Veracity
(Uncertainty)
• Data technologies and capabilities advancement
– Computing power and storage capacity increases
– New processing techniques such as parallel and distributed
processing, AI, etc.
• Evolving user needs
– Analytical needs
– Communication needs

Where is data from?
• Depending on the business and industry, data may come from different places, and
data collection methods, technologies, and tools may be unique to an industry.
• From direct human inputs
– Survey, registration
• From business operations and human activities
– Business operations of companies, governments, schools, hospitals, etc.
– Generated by human activities like buying, viewing, playing, socializing, etc.
• From machine/system/app operations: monitored and logged by devices and
sensors
– Power management, using power meters
– Traffic management, using Continuous Count Stations (CCS). https://sensorline.de
– Parking lots, using cameras or sensors
– Scanning, using bar codes and RFID. https://www.datagear.com
• Natural phenomena measured and recorded by devices
– Weather, earth, universe, etc.

Big Data
• Big Data is high volume, high velocity,
and/or high variety information assets that
require new forms of processing to enable
enhanced decision making, insight
discovery and process optimization.
• Basic 4Vs (Gartner's three Vs plus Veracity)
– Volume (Scale): Data volume is increasing exponentially, not linearly. Even large
amounts of small data can result in Big Data.
– Variety (Complexity): Various formats, types, and structures. Big data covers
unstructured data and various data formats including text, blob, multimedia, etc.
Information and knowledge management is the management of both structured data
(15% of information) and unstructured data (85% of information), according to the
Butler Group. 80 percent of business is conducted on unstructured information
(Gartner Group).
– Velocity (Speed): Data is being generated fast and needs to be processed fast.
– Veracity (Uncertainty): Uncertainty due to inconsistency, incompleteness, latency,
ambiguities, or approximations.
Opinion: “Big Data is not a system; it is simply a way to say that you have
a lot of data.”
https://www.linkedin.com/pulse/big-data-silver-bullet-tomas-kratky

Figures from https://www.ksi.mff.cuni.cz/~svoboda/courses/192-MDK/lectures/MDK-Lecture-01-Introduction.pdf


Common Data Use Challenges

• Information overload
– Too much data and information, with varied formats and structures
– Difficulty organizing data for effective access and retrieval
– Difficulty finding useful information (knowledge) in it
– Multiple copies of data exist, sometimes with conflicts
• Data everywhere
– Data in separate systems and different sources; internal and external
– Problem of spreadmart http://en.wikipedia.org/wiki/Spreadmart
– Over 43 percent of organizations have more than six content stores.
(Forrester Research).
• Difficulty of access
– We may have the data but cannot access it (or can access it only with difficulty)
because of technical or administrative issues.
• Don’t have that data
– The data is simply not available.
– Collecting the data may require additional processes and can be costly.

Evolving User Needs
• Data democratization
– Consumption of data extends to the general public
– https://www.techtarget.com/whatis/definition/data-democratization
• Evolving analytical needs
– Need real-time and most recent data
– Business user driven, agile, instant
– Exploratory and interactive
• Prevalence and wide expectation of data
visualizations and interactions
• The growing expectation of sharing and
publication of findings.
Fundamental Technologies in Data Processing
• Data storage
– Databases: Systems designed for organized data storage, retrieval, and management, essential for
efficient data processing. Examples include relational databases (like MySQL, PostgreSQL, Microsoft
SQL Server), and NoSQL databases (like MongoDB, Cassandra) for structured and unstructured data
respectively.
– Data Warehouses: Large repositories accumulating data from various sources, optimized for querying
and analyzing vast datasets to support business intelligence and decision-making.
• Data processing and manipulation
– ETL (Extract, Transform, Load) Tools: Essential for extracting data from various sources, transforming
it (cleaning, enriching, reformatting), and loading it into a data warehouse or data lake. Popular ETL tools
include Talend and Azure Data Factory.
– SQL (Structured Query Language): The universal language for managing and querying structured data
in relational databases.
• Data analysis and interpretation
– Business Intelligence (BI) Tools: Software that enables organizations to analyze data, create reports,
dashboards, and visualizations to gain insights and support decision-making. Tableau and Power BI are
prominent examples.
– Machine Learning (ML) & AI Algorithms: These algorithms analyze data to uncover patterns, make
predictions, and automate decision-making, driving automation and deeper insights. Popular ML
languages include Python, R, and SAS.
– Data Visualization Tools: Software like Tableau and Looker used to create visual representations of
data (graphs, charts, dashboards) for easier understanding and communication of insights.
• Cloud Computing: Offers scalable, flexible, and cost-effective solutions for data storage and
analysis, with platforms like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud
Platform.
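The ETL and SQL items above can be sketched end to end with Python's standard library: extract rows from a CSV source, transform them (clean and drop incomplete rows), load them into a database, and query the result with SQL. The CSV content, the `orders` table, and the cleaning rules are assumptions for illustration; dedicated ETL tools handle far more (scheduling, monitoring, many source types).

```python
import csv
import io
import sqlite3

# Hypothetical source data with typical quality problems.
source_csv = """order_id,customer,amount
1, ada ,19.99
2,GRACE,
3,Ada,5.01
"""


def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, normalize names and types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(source_csv)), conn)

# SQL then serves the analysis side of the pipeline.
totals = conn.execute(
    "SELECT customer, ROUND(SUM(amount), 2) FROM orders GROUP BY customer"
).fetchall()
print(totals)
```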

Developments in Data Technology Capabilities

• Increasing computing power and capacity (hardware)
– Computing power for data processing has increased: https://towardsdatascience.com/the-future-of-computation-for-machine-learning-and-data-science-fad7062bc27d
– Data storage capacity has increased: https://www.frontierinternet.com/gateway/data-storage-timeline/
– Data generation methods have proliferated, including automated data collection devices
and sensors such as IoT devices.
• Self-service tools expand the user base and free power users from relying on dedicated IT
personnel.
• Cloud-based systems lower the cost of ownership and reach a broader user base.
• Advances in algorithms and techniques, such as parallel processing, machine learning,
and AI, enable new applications and new insights.
https://cacm.acm.org/magazines/2011/8/114953-an-overview-of-business-intelligence-technology/fulltext

Knowledge Areas
• What are the specific domains and areas in data processing?
– DAMA publishes Data Management Body of Knowledge
• Data management and IT
– “At my organization, the business leads the ‘why’ and
‘what’ of Data Management and Governance; IT leads
the ‘how’.”
– Essentially technology areas are consistent with data
management areas, in the sense that these areas are
supported by these technologies.

• The International Association of Business Analytics
Certification (IABAC) also publishes a Data Science
Body of Knowledge
– https://iabac.org/company/guides-standards

DAMA Wheel
• The Data Management Association (DAMA) is a non-profit and vendor-independent
association of business and technical professionals that is dedicated to the
advancement of data resource management (DRM) and information resource
management (IRM). https://www.dama.org
• DAMA publishes a guidebook called "The DAMA Guide to the Data Management Body of
Knowledge" (DAMA-DMBOK). It defines 11 data management knowledge areas.
– Overview of DMBOK: https://www.dama-dk.org/onewebmedia/DAMA%20DMBOK2_PDF.pdf
DAMA wheel: https://www.dama.org/cpages/dmbok2r-wheels


11 Knowledge Areas by DAMA
– Data Governance: planning, oversight, and control over management of data and the use
of data and data-related resources. While we understand that governance covers
‘processes’, not ‘things’, the common term for Data Management Governance is Data
Governance, and so we will use this term.
– Data Architecture: the overall structure of data and data-related resources as an
integral part of the enterprise architecture
– Data Modeling & Design: analysis, design, building, testing, and maintenance (was Data
Development in the DAMA-DMBOK 1st edition)
– Data Storage & Operations: structured physical data assets storage deployment and
management (was Data Operations in the DAMA-DMBOK 1st edition)
– Data Security: ensuring privacy, confidentiality and appropriate access
– Data Integration & Interoperability: acquisition, extraction, transformation, movement,
delivery, replication, federation, virtualization and operational support (a Knowledge
Area new in DMBOK2)
– Documents & Content: storing, protecting, indexing, and enabling access to data found
in unstructured sources (electronic files and physical records), and making this data
available for integration and interoperability with structured (database) data.
– Reference & Master Data: managing shared data to reduce redundancy and ensure better
data quality through standardized definition and use of data values.
– Data Warehousing & Business Intelligence: managing analytical data processing and
enabling access to decision support data for reporting and analysis
– Metadata: collecting, categorizing, maintaining, integrating, controlling, managing,
and delivering metadata
– Data Quality: defining, monitoring, maintaining data integrity, and improving data
quality
Extended reading: for more details, visit
https://www.dama.org/cpages/dmbok-2-image-download
Four Categories of Job Roles
• Management role
– Focus on the management of data and information assets, setting policies and processes;
usually less technical
• Administration role
– Technical administration of various kinds of data systems, including data storage
(database, data warehouse, etc.), BI systems, reporting system, and other data processing
and analytics applications
– Maintain and monitor the security, integrity, reliability, and performance of systems
– Usually, system specific
• Development role
– Data engineering
• Design data models, systems, architectures
• Build data pipelines and move data
– Application development
• Build reports, dashboards, and other applications that help analysts, managers, and customers.
• Consume data APIs and services
• Analysis role
– Focus on analysis and reporting; build analytical models and present results

– Involve various degrees of math, statistics, and computing algorithms
– Have strong domain knowledge

More Specific Jobs/Titles
[Figure: data job roles and titles. The data developer role also includes BI analyst or BI developer.]
https://searchdatamanagement.techtarget.com/feature/Data-management-roles-Data-architect-vs-data-engineer-others


Data Engineer
• The role of the data engineer is mostly to ensure the quality and
availability of the data. This includes the following most important
tasks:
– Build and maintain data pipeline systems
– Clean and wrangle data into a usable state
– Design/build data storage systems, architectures, and
infrastructures
• Data engineer skills
– Programming
– Knowledge of tools and systems
– Data model, structure, format, architecture
– Relational and non-relational database design
– Data storage system design
– Data/information flow
– SQL, query execution and optimization
Reference reading: https://www.oreilly.com/content/data-engineering-a-quick-and-simple-definition/
Extended reading:
https://www.altexsoft.com/blog/datascience/what-is-data-engineering-explaining-data-pipeline-data-warehouse-and-data-engineer-role/
Data App Development
• Major types of data driven apps
– Interactive applications that support intensive data query,
browsing, and searching
– Data presentation applications that support reports and
dashboards
– Data analysis tools with statistical and mathematical algorithms
• Requirements
– Familiarity with data concepts as well as skills in
application development, user interface, and user experience.
• Skills
– Programming/coding
– UI design/development
– Query, SQL, non-SQL
– Data model
– Data visualization
Data Science
• Data Science is multidisciplinary
– Computer Scientists
– Information Technologists
– Statisticians/Mathematicians
– Domain Experts
• Science in Data Science
– Implying scientific methods and more advanced
algorithms
– More exploratory

Data Education at KSU

• BSIT concentration on “data analytics and technology”
– https://www.edocr.com/v/0jmn189y/jgzheng/ksu-bsit-data-concentration-overview
• MSIT - Graduate Certificate in Data Analytics and
Intelligent Technology
– https://www.edocr.com/v/z1dwxbpy/jgzheng/ksu-msit-data-certificate
– https://www.kennesaw.edu/ccse/degrees-programs/graduate/ms-information-technology/petition-graduate-cert.php
• For more information
– http://zheng.kennesaw.edu/advising
– Lecture notes on Data, BI, and Data Visualization
https://www.edocr.com/user/jgzheng
• Other departments
– School of Data Science and Analytics
https://datascience.kennesaw.edu
– IS 8935 Business Intelligence
– Certificate in High Performance Cluster Computing
http://ccse.kennesaw.edu/cs/programs/cert-hpcc.php
– CPE https://advance.cpe.kennesaw.edu/business-analytics-course/

Industry Certifications
• Google
– https://grow.google/certificates/data-analytics/
– https://cloud.google.com/learn/certification/data-engineer
– https://cloud.google.com/learn/certification/cloud-database-engineer
– https://www.cloudskillsboost.google/paths/16
– https://www.cloudskillsboost.google/paths/18
– https://www.cloudskillsboost.google/paths/22
• Microsoft
– https://learn.microsoft.com/en-us/credentials/browse/?roles=data-analyst%2Cdata-engineer%2Cdata-scientist
• Tableau
– https://www.tableau.com/learn/certification

Core Readings
• DIKW pyramid: https://www.ontotext.com/knowledgehub/fundamentals/dikw-pyramid/
• Short video lectures on DIKW
– DIKW Pyramid https://www.youtube.com/watch?v=u9DoQ9gY4z4
– DIKW Pyramid with Sample Data https://www.youtube.com/watch?v=MFUyQsJyKgg
• Data literacy https://online.hbs.edu/blog/post/data-literacy
• Data structure:
https://www.snowflake.com/en/fundamentals/understanding-structured-semi-structured-and-unstructured-data/
• Overview of DMBOK V2 – 11 knowledge areas:
https://www.dama-dk.org/onewebmedia/DAMA%20DMBOK2_PDF.pdf
• Data engineering: https://www.oreilly.com/content/data-engineering-a-quick-and-simple-definition/
• Data Analyst vs Data Engineer vs Data Scientist - from some job and career perspectives
– Data Analyst vs Data Engineer https://www.youtube.com/watch?v=_DGn-7134i0
– Data Analyst vs Data Scientist https://www.youtube.com/watch?v=bxIF9X9k2IE
– Data Analyst vs Data Engineer vs Data Scientist: Skills, Responsibilities, Salary
https://www.edureka.co/blog/data-analyst-vs-data-engineer-vs-data-scientist/
(with a nice video: https://www.youtube.com/watch?v=ioZNNfxXXqo)

Additional Good Resources
• More about structured data
– https://www.g2.com/articles/structured-vs-unstructured-data
– https://www.alation.com/blog/structured-unstructured-semi-structured-data/
– https://www.ibm.com/think/topics/structured-vs-unstructured-data
• Opinions
– Is Data Really the New Oil in the 21st Century? | Towards Data Science
https://towardsdatascience.com/is-data-really-the-new-oil-in-the-21st-century-17d014811b88/
– https://techcrunch.com/2021/05/02/data-was-the-new-oil-until-the-oil-caught-fire/
• Data governance
– https://www.youtube.com/watch?v=sHPY8zIhy60&list=RDCMUCrR22MmDd5-cKP2jTVKpBcQ&index=13
• Data science
– Why Data Science Matters and How It Powers Business Value
https://www.simplilearn.com/why-and-how-data-science-matters-to-business-article
– Programming Languages for Data Scientists
https://towardsdatascience.com/programming-languages-for-data-scientists-afde2eaf5cc5
– Data Science Tutorial for Beginners https://www.guru99.com/data-science-tutorial.html
– https://www.dataversity.net/ten-myths-about-data-science/
– https://www.mastersindatascience.org/careers/data-scientist/
– https://www.simplilearn.com/tutorials/data-science-tutorial/what-is-data-science
– https://www.simplilearn.com/data-analyst-vs-data-scientist-article
– https://ischoolonline.berkeley.edu/data-science/what-is-data-science/
• More about jobs and careers
– https://www.discoverdatascience.org/career-information/
– Data engineer:
https://www.altexsoft.com/blog/datascience/what-is-data-engineering-explaining-data-pipeline-data-warehouse-and-data-engineer-role/
– https://searchdatamanagement.techtarget.com/feature/Data-management-roles-Data-architect-vs-data-engineer-others
– https://dzone.com/articles/five-data-tasks-that-keep-data-engineers-awake-at
– Data analyst: https://www.investopedia.com/articles/professionals/121515/data-analyst-career-path-qualifications.asp
– https://blog.udacity.com/2014/12/data-analyst-vs-data-scientist-vs-data-engineer.html