What is Data?
Data (images, text, files, etc.) is stored in computers as bits and bytes. In its raw form, data is essentially just a collection of bits and bytes.
Let’s just say you have a set of numbers:
21.5 112 61
21.4 135 66
21.6 213 73
Presented to us as is, these numbers have no significance. However, if you are told that the columns represent BMI, height, and weight respectively, the data suddenly gets processed in our minds and takes on a certain meaning. This processed form of raw data (which, by the way, could be anything from text to images in its raw form) is what we call Information.
Another example:
Let’s say you run a business like Amazon.com with a rating and review system. The data you receive from users in this case is the ratings and reviews, and the processed data, i.e. Information, is what Amazon.com derives on its backend (average review sentiment, customers' ages, etc.). This Information can then be used to make decisions, for example about its recommendation system, all by converting raw data into Information.
Difference between processing and not processing data:
Let’s say persons A and B each run a restaurant. Person A does not collect any feedback data, while person B collects data about how frequently each dish is sold. B then processes this data into Information: dishes 1, 2 and 3 don’t sell well. Using that Information, B decides not to order as many raw materials for those dishes and ends up saving costs, but A does not.
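Person B's step of turning raw sales records into Information can be sketched in a few lines. This is a minimal illustration, not B's actual process: the sales log, dish names, and the "fewer than 2 sales" threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical raw data: one entry per dish sold, as person B might log it.
raw_sales = ["dish4", "dish1", "dish4", "dish5", "dish4",
             "dish5", "dish2", "dish5", "dish3", "dish4"]
menu = ["dish1", "dish2", "dish3", "dish4", "dish5"]

def slow_sellers(sales, menu, threshold):
    """Turn raw sales records into Information: dishes sold fewer than `threshold` times."""
    counts = Counter(sales)  # dish -> number of times sold
    # A dish that never sold is missing from counts; Counter returns 0 for it.
    return sorted(d for d in menu if counts[d] < threshold)

print(slow_sellers(raw_sales, menu, threshold=2))  # ['dish1', 'dish2', 'dish3']
```

The raw list alone says little; the derived list of slow sellers is what B can act on when ordering raw materials.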
What is a Database?
A database is essentially a system where data is stored and from which it can be retrieved, updated, and managed.
To actually work with this data, we use a DBMS (Database Management System).
What is a DBMS?
At a high level, a DBMS has 2 parts:
The DB (Database), where all the required data is stored.
The MS (Management System), a set of programs to perform CRUD operations (Create, Read, Update, Delete) on the DB.
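To make the four CRUD operations concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is only an assumed stand-in for "a DBMS" (the notes don't tie CRUD to a specific product), and the `students` table is hypothetical.

```python
import sqlite3

# In-memory SQLite database: SQLite plays the role of the DBMS here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)")

# Create
conn.execute("INSERT INTO students (id, name, age) VALUES (?, ?, ?)", (1, "Asha", 20))
conn.execute("INSERT INTO students (id, name, age) VALUES (?, ?, ?)", (2, "Ravi", 22))

# Read
rows = conn.execute("SELECT id, name, age FROM students ORDER BY id").fetchall()
print(rows)  # [(1, 'Asha', 20), (2, 'Ravi', 22)]

# Update
conn.execute("UPDATE students SET age = ? WHERE id = ?", (21, 1))

# Delete
conn.execute("DELETE FROM students WHERE id = ?", (2,))

print(conn.execute("SELECT id, name, age FROM students").fetchall())  # [(1, 'Asha', 21)]
conn.commit()  # make the changes permanent
```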
In earlier systems, before MySQL and PostgreSQL, the DBMS was essentially the middleware/interface between the users of an app and the database itself: the app made API calls to the DBMS, which had access to the database. Today, the DB and DBMS are part of the same entity, as seen in systems like MySQL.
Now, let’s build some context on why a DBMS is used today.
Before DBMSs existed, data was managed manually using File Systems. What is the problem with File Systems?
The biggest problem with File Systems is that much of the work needed to handle data is manual and time-consuming. Let’s consider a few examples:
Data Redundancy: Let’s say a bank has one file system to handle users opening a Savings account, and later creates another file system to handle users opening a Current account. A user who opens both a savings AND a current account will have their name, address and other details stored in both file systems, causing Data Redundancy. On top of that, making every update propagate to the same fields across all file systems is manual work and becomes very hard to implement as the file systems grow.
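For contrast, here is a minimal sketch of how a database avoids that redundancy, assuming SQLite via Python's `sqlite3` and hypothetical `customers`/`accounts` tables: customer details are stored once, each account only references the customer, and one update is seen by both accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Customer details are stored exactly once...
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
# ...and each account only points at its customer by id.
conn.execute("""CREATE TABLE accounts (
    acc_no INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    kind TEXT CHECK (kind IN ('savings', 'current')))""")

conn.execute("INSERT INTO customers VALUES (1, 'Meera', 'Old Street 5')")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
                 [(101, 1, 'savings'), (102, 1, 'current')])

# A single UPDATE changes the address seen through both accounts.
conn.execute("UPDATE customers SET address = 'New Road 9' WHERE id = 1")

rows = conn.execute("""SELECT a.kind, c.address FROM accounts a
                       JOIN customers c ON c.id = a.customer_id
                       ORDER BY a.acc_no""").fetchall()
print(rows)  # [('savings', 'New Road 9'), ('current', 'New Road 9')]
```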
Data Accessibility: Once a file system is created, especially a large one, the functionality for accessing each field (sorting, searching, etc.) has to be implemented manually; it can’t simply be expressed as queries the way it is in databases today. This makes data access time-consuming.
Also, if different files use different formats (e.g. one is .dat, another is .txt), the data becomes even harder to access, since there is no uniformity across the system.
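In a database, the searching and sorting that would be hand-written file-scanning code becomes a single declarative query. A minimal sketch, assuming SQLite and a hypothetical `products` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("pen", 1.5), ("notebook", 3.0), ("bag", 25.0), ("pencil", 0.5)])

# Search (WHERE) and sort (ORDER BY) in one query; the DBMS decides how to execute it.
cheap = conn.execute(
    "SELECT name FROM products WHERE price < ? ORDER BY price", (5,)
).fetchall()
print(cheap)  # [('pencil',), ('pen',), ('notebook',)]
```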
Atomicity: Some transactions/operations must happen as a single, indivisible unit. For example, if money is debited from one account, it must be credited to another: either both steps happen or neither does. This is very hard to guarantee in an architecture made of multiple files.
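A DBMS provides this all-or-nothing behaviour through transactions. Below is a minimal sketch with SQLite through Python's `sqlite3`, where `with conn:` commits on success and rolls back if an exception escapes. The `accounts` table, the `transfer` helper, and the "no negative balance" CHECK are hypothetical choices for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one atomic unit: both happen or neither does."""
    try:
        with conn:  # commit on success, rollback if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK rejected the debit, and the credit above was rolled back with it.
        return False

ok_small = transfer(conn, "A", "B", 30)   # True
ok_large = transfer(conn, "A", "B", 500)  # False: A would go negative
balances = conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall()
print(ok_small, ok_large, balances)  # True False [('A', 70), ('B', 80)]
```

Note that B was credited first in the failed transfer; the rollback is what keeps B at 80 rather than 580.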
Concurrent Access Anomalies: For example, if you’re debiting money from an account on your phone while a family member debits money from the same account in person, these operations are concurrent. Coordinating them correctly, e.g. so that one transaction completes before the other begins, is hard to do in a file system.
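The anomaly is two withdrawals both reading the old balance before either writes. The sketch below is only an in-memory analogy (plain Python threads and a `threading.Lock`, not how any particular DBMS implements concurrency control) showing the kind of serialization a DBMS does for you. The balance and amounts are hypothetical.

```python
import threading

balance = 100
lock = threading.Lock()

def withdraw(amount):
    """Check-then-debit under a lock, so two withdrawals can't both see the old balance."""
    global balance
    with lock:  # only one thread can run this block at a time
        if balance >= amount:
            balance -= amount
            return True
        return False

# Two "users" (phone and in person) each try to withdraw 80 from the same 100.
results = []
threads = [threading.Thread(target=lambda: results.append(withdraw(80))) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results), balance)  # [False, True] 20
```

Without the lock, both threads could pass the `balance >= amount` check and the account would be overdrawn.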
Security Problems: Enforcing user authorization, so that only certain users can access certain data, is extremely hard on a file system.
All of the above issues are resolved by using a Database Management System.
Abstraction:
Abstraction, in this context, is the concept of hiding intricate details that are irrelevant to a user. Let’s first take a general example of abstraction:
When you drive a car and want to turn left/right, you just turn the steering wheel left/right; you don’t need to know the internal mechanics of how the car turns.
Similarly, at amazon.com, when a member of a certain department accesses the DB, all they want to see is the view of the data that is relevant to them. For instance, in all probability none of the departments need to see whether the data in a certain field is encrypted, or what its datatype is. So what abstraction allows us to do is:
Abstract out small irrelevant details, like the datatypes of fields, from the database when it is viewed by its users, in this case the employees.
Abstract out irrelevant fields for certain teams, changing the view each team sees. For example, the logistics team at amazon.com probably does not care about the products a user bought, so the products field is hidden from the DB view shown to the logistics team, but it would not be hidden from, for example, the Customer Service team.
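In SQL this per-team abstraction is usually expressed with views. A minimal sketch in SQLite, with hypothetical table and view names: the logistics view leaves out the products column, while the customer service view keeps it. SQLite itself has no per-user permissions; in a server DBMS you would additionally grant each team access to its view only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer TEXT, address TEXT, products TEXT)""")
conn.execute("INSERT INTO orders VALUES (1, 'Kiran', '12 Lake View', 'headphones, charger')")

# Logistics team: where to deliver, not what was bought.
conn.execute("CREATE VIEW logistics_view AS SELECT order_id, customer, address FROM orders")
# Customer service team: keeps the products column.
conn.execute("CREATE VIEW support_view AS SELECT order_id, customer, products FROM orders")

cur = conn.execute("SELECT * FROM logistics_view")
cols = [c[0] for c in cur.description]  # column names visible through this view
logistics_rows = cur.fetchall()
print(cols, logistics_rows)  # ['order_id', 'customer', 'address'] [(1, 'Kiran', '12 Lake View')]
```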
The Three Schema Architecture:
The main objective of a DBMS is to provide users an abstract view of data, i.e. the system hides details about how data is stored and manipulated.
To simplify user interaction, this abstraction is provided across 3 levels, which form the Three Schema Architecture discussed below.
The whole point of the architecture is to provide a personalized view of the database, even though the data is stored only once.
1) Physical Level:
This is the lowest level of abstraction.
The internal schema represents the physical implementation of the database on the storage media. It describes how the data is physically stored, indexed, and accessed. It includes details such as file organization, storage structures, indexing methods, and access paths.
Its purpose is to define the algorithms and methods that allow data to be retrieved efficiently.
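Indexes are a typical physical-level decision: the logical query stays the same, only the access path changes. A minimal sketch with SQLite's `EXPLAIN QUERY PLAN`; the table and index names are hypothetical, and the exact plan wording differs between SQLite versions, so the comments show examples only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO students (name, city) VALUES (?, ?)",
                 [(f"student{i}", "Pune" if i % 2 else "Delhi") for i in range(1000)])

query = "SELECT * FROM students WHERE name = ?"

def plan(sql):
    # Each EXPLAIN QUERY PLAN row holds the human-readable step in its last column.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("student7",))]

before = plan(query)  # a full scan, e.g. ['SCAN students']
conn.execute("CREATE INDEX idx_students_name ON students(name)")
after = plan(query)   # an index lookup, e.g. ['SEARCH students USING INDEX idx_students_name (name=?)']
print(before, after)
```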
2) Conceptual/Logical Level:
The data at the physical level is mapped to the conceptual level, where it starts to look like tables, giving context about what our data is about, which fields are present, etc.
This process of converting data from the physical to logical level is known as Conceptual/Internal mapping.
Essentially, the conceptual schema provides context into the design of a DB at a conceptual level, describing what data is stored in the DB, and what relationships exist among several tables in the DB.
The logical Level does not care about the way data is stored at the Physical Level.
Goal: to make the DB easy to use.
3) View/External Level:
This is the highest level of abstraction; it aims to simplify user interaction with the system by providing different views to different end users.
Each view schema describes the part of the database that a particular user group is interested in, and the rest is left hidden.
At the External Level, the DB may have several schemas called sub-schemas; each sub-schema describes one of the different views of the database.
At the view level, there is also a mechanism for authorization, so that not all users can access all parts of the DB.
Instance of a Database:
All the rows and fields stored in a DB at a specific moment in time are known as the Instance of that database at that point in time.
By the way, a Schema refers to the design of a database. Unlike an instance, which changes with every insert, update or delete, the schema changes rarely.
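The schema/instance difference is easy to observe: inserting a row changes the instance but leaves the schema untouched. A minimal sketch in SQLite, reading table definitions from SQLite's `sqlite_master` catalog; the `students` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")

def schema():
    # The design: table definitions stored in SQLite's catalog table.
    return conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()

def instance():
    # The contents at this moment.
    return conn.execute("SELECT * FROM students ORDER BY id").fetchall()

schema_before, instance_before = schema(), instance()
conn.execute("INSERT INTO students VALUES (1, 'Asha')")
schema_after, instance_after = schema(), instance()

print(schema_before == schema_after)     # True: the schema did not change
print(instance_before, instance_after)   # [] [(1, 'Asha')]
```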
There are 3 types of Schemas:
Physical, Logical, and View level Schemas, the ones we’ve studied above.
When people refer to the schema of a database without qualification, they usually mean the Logical Level Schema.
This DB/Logical level schema consists of:
Attributes of a table.
Consistency Constraints (NOT NULL, PRIMARY KEY etc.)
Relationships it has with other tables.
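All three parts of a logical schema show up directly in DDL. A minimal sketch in SQLite with hypothetical `users`/`orders` tables: attributes, consistency constraints (`PRIMARY KEY`, `NOT NULL`), and a relationship via `REFERENCES`. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Attributes + consistency constraints + a relationship (orders -> users).
conn.execute("""CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL)""")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    total    REAL NOT NULL)""")

conn.execute("INSERT INTO users VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (10, 1, 499.0)")  # OK: user 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (11, 99, 50.0)")  # there is no user 99
    fk_rejected = False
except sqlite3.IntegrityError as e:
    fk_rejected = True
    print("rejected:", e)
```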
The Logical Schema is the most important level, since it is what applications are built on.
To form relationships between multiple tables in a database, for example in the database of a shopping application, which would have a Users table, a Products table, a Carts table, an Orders table, etc., we use Data Models.
A Data Model describes the structure of the database, including relationships, constraints across tables, etc.
Examples: ER model, Relational Model, OO Model (Object Oriented), Object Relational Data Model etc.
What are Database Languages?
DB languages exist to interact with the DB in different ways.
There are 2 types of Database Languages:
DDL (Data Definition Language) - DDLs are used to specify the DB schemas.
DML (Data Manipulation Language) - DMLs are used to manipulate data in the DB.
Typically, database systems today, like MySQL and PostgreSQL, support both DDL and DML through a single language, SQL.
DDL helps us define the schema, including the constraints and conditions that must hold whenever the DB is updated (e.g. CREATE TABLE).
DML helps us manipulate the data, i.e. perform CRUD (Create, Read, Update, Delete) operations in the DB (e.g. SELECT * FROM students).
Query Language: A part of DML helping us write commands to retrieve/manipulate data in the DB.
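Here is the DDL/DML split side by side, as a minimal sketch in SQLite's dialect of SQL (the table and columns are hypothetical). The DDL statements change the schema; the DML statements work on the data inside it, with `SELECT` being the query-language part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines and changes the schema.
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("ALTER TABLE students ADD COLUMN grade TEXT")

# DML: works on the data inside that schema.
conn.execute("INSERT INTO students (id, name, grade) VALUES (1, 'Ravi', 'A')")
conn.execute("UPDATE students SET grade = 'B' WHERE id = 1")
result = conn.execute("SELECT * FROM students").fetchall()  # query-language part of DML
print(result)  # [(1, 'Ravi', 'B')]
```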
To interact with a DB from a host language (JS, Python, Java, C, etc.), languages usually have some sort of library or driver that talks to the DB (e.g. Java: JDBC, C/C++: ODBC).
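In Python, the equivalent is the DB-API interface (PEP 249) that database drivers implement; the built-in `sqlite3` module is one such driver. A minimal sketch of the usual pattern (connect, cursor, parameterized query, fetch), with a hypothetical table and input value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the driver opens a connection from the host program
cur = conn.cursor()
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])

user_input = "2"  # e.g. a value coming from a web form
# '?' placeholders let the driver send the value separately from the SQL text,
# which also protects against SQL injection.
cur.execute("SELECT name FROM students WHERE id = ?", (int(user_input),))
row = cur.fetchone()
print(row)  # ('Ravi',)
conn.close()
```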
DBA (Database Administrator)
The person or entity that has complete control of the DB itself and the programs that govern the retrieval and manipulation of data.
Functions of a DBA:
The DBA works on the Conceptual/Logical Level and a little bit on the Internal Level of schemas, and the end user works on the External Level.
Defines the Schema for each DB and the tables.
Storage Structure and Access Methods (defining which algorithms would be used to retrieve/insert data, etc.)
Authorization Control
Routine Maintenance of the Database:
Periodic Backups
Security Patches
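As one example of routine maintenance, a backup can be taken while the database is online. A minimal sketch using `sqlite3.Connection.backup` (available since Python 3.7); the `logs` table is hypothetical, and scheduling it periodically (e.g. with cron) is left out.

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")
src.execute("INSERT INTO logs (msg) VALUES ('server started')")
src.commit()

def backup(source, target_path):
    """Copy the whole source database into target_path (a file path in real use)."""
    dest = sqlite3.connect(target_path)
    source.backup(dest)  # SQLite's online backup API, exposed by Python's sqlite3
    return dest

copy = backup(src, ":memory:")  # ':memory:' only to keep the demo self-contained
print(copy.execute("SELECT msg FROM logs").fetchall())  # [('server started',)]
```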
DBMS Application Architecture:
Client Machine:
The machine the end user works from. For example, when we open Instagram, we get to see all the posts of the people we follow, fetched from the database.
Server Machine:
The machine where the DB actually runs.
T1 Architecture:
It refers to an architecture where the entire database system is installed and operates on a single machine or device. In this architecture, both the client application and the database management system (DBMS) are hosted on the same physical or virtual machine.
Often used in simple apps like text-editor, calculators, note-taking apps etc, which require simple personal databases. In the tier one archi
T2 Architecture:
Client Tier: The client tier includes the user interface (client application) responsible for interacting with the database.
Server Tier: The server tier consists of the database server that manages data storage, query processing, and other database operations. The server responds to requests from the client tier and handles database-related tasks.
T3 Architecture:
(Used most widely on a large scale)
Presentation Tier: The presentation tier represents the user interface or client-side components responsible for presenting data to users and capturing their input.
Application Tier: The application tier, also known as the business logic tier, handles application-specific logic, processing user requests, and managing data flow between the presentation tier and the data tier.
Data Tier: The data tier includes the database server or data storage layer, responsible for storing and managing the actual database. It handles data retrieval, storage, and manipulation operations.
T2 and T3 architectures are the ones commonly used in recent web applications, even ones that run locally.
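The separation of the three tiers can be sketched in one file, with each tier as its own layer. This is only a structural illustration: in a real T3 deployment each tier would run as a separate process or machine, and the class names, `posts` table, and follow graph here are hypothetical.

```python
import sqlite3

# --- Data tier: owns storage and queries. ---
class DataTier:
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")

    def insert_post(self, author, body):
        self.conn.execute("INSERT INTO posts (author, body) VALUES (?, ?)", (author, body))

    def posts_by(self, authors):
        if not authors:
            return []
        marks = ",".join("?" for _ in authors)  # one placeholder per author
        sql = f"SELECT author, body FROM posts WHERE author IN ({marks}) ORDER BY id"
        return self.conn.execute(sql, authors).fetchall()

# --- Application tier: business logic; never writes SQL itself. ---
class AppTier:
    def __init__(self, data):
        self.data = data
        self.following = {"me": ["ana", "ben"]}  # hypothetical follow graph

    def feed(self, user):
        return self.data.posts_by(self.following.get(user, []))

# --- Presentation tier: formats results for the user. ---
def render_feed(app, user):
    return [f"{author}: {body}" for author, body in app.feed(user)]

data = DataTier()
data.insert_post("ana", "hello")
data.insert_post("zoe", "not followed")
data.insert_post("ben", "hi all")
print(render_feed(AppTier(data), "me"))  # ['ana: hello', 'ben: hi all']
```

Because only the data tier touches SQL, the database could be swapped out without changing the presentation code.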
ENTITY RELATIONSHIP MODEL:
The ER model is a data model that works at the Logical/Conceptual Level of schemas; it is used to visualize the entities in a DB and the relationships between them.
Entity: An entity is an object which is distinguishable from all other objects.
Example: A student in a school, a customer in a shop, etc.
2 types of Entities:
Strong Entity: It always has a primary key of its own and depends on no other entity. It is independent in nature.
Weak Entity: It cannot be uniquely identified by its own attributes alone, so it depends on a strong (owner) entity, which it usually references through a foreign key.
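A common way to store a weak entity is to make its primary key include the owner's key. A minimal sketch in SQLite with hypothetical `employees` (strong) and `dependents` (weak) tables; `ON DELETE CASCADE` is one design choice that expresses "a dependent can't exist without its employee".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES / CASCADE

# Strong entity: identified by its own primary key.
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Weak entity: a dependent is identified only together with its owner employee.
conn.execute("""CREATE TABLE dependents (
    emp_id   INTEGER NOT NULL REFERENCES employees(emp_id) ON DELETE CASCADE,
    dep_name TEXT NOT NULL,
    relation TEXT,
    PRIMARY KEY (emp_id, dep_name))""")

conn.execute("INSERT INTO employees VALUES (1, 'Nina')")
conn.execute("INSERT INTO dependents VALUES (1, 'Leo', 'son')")

conn.execute("DELETE FROM employees WHERE emp_id = 1")  # remove the owner...
remaining = conn.execute("SELECT COUNT(*) FROM dependents").fetchone()
print(remaining)  # (0,) ...and its dependents go with it
```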
Entities are often defined by a set of attributes:
Eg: Student - ID, Name, Address, Age, etc.
The attribute (or set of attributes) that uniquely identifies each row of a table is known as its Primary Key.
The ER diagram acts as the blueprint of the DB.
Each attribute of an entity can have one/more constraints:
You could constrain an attribute to be NOT NULL, numeric only, etc.
You could assign an attribute as the PRIMARY KEY.
You could restrict an attribute's values to a particular domain (a set of allowed values).
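Each of these constraints can be declared in the table definition, after which the DBMS rejects violating rows. A minimal sketch in SQLite with a hypothetical `students` table. The "numeric only" rule is written as a `CHECK` on `typeof(age)` because SQLite's column types alone don't reject text, and the grade domain is an assumed set of letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE students (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    age   INTEGER CHECK (typeof(age) = 'integer'),
    grade TEXT CHECK (grade IN ('A', 'B', 'C', 'D', 'F')))""")

def try_insert(row):
    try:
        conn.execute("INSERT INTO students VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as e:
        return f"rejected ({e})"

outcomes = [
    try_insert((1, "Asha", 20, "A")),        # ok
    try_insert((1, "Ravi", 21, "B")),        # rejected: duplicate PRIMARY KEY
    try_insert((2, None, 21, "B")),          # rejected: NOT NULL
    try_insert((3, "Ravi", "twenty", "B")),  # rejected: age is not numeric
    try_insert((4, "Ravi", 21, "Z")),        # rejected: grade outside its domain
]
for o in outcomes:
    print(o)
```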
Types of Attributes:
Simple Type: The attributes that cannot be divided further. Example: Customer’s account Number.
Composite Type: The attributes that can be divided into further attributes. Example: Customer’s name into first name, middle name, and last name or Address into city, state, etc.
Single-valued Attribute: Attributes that hold only one value for a given entity. Example: first name, student ID, etc.
Multi-valued Attribute: Attributes that can hold more than one value for a given entity. Example: phone numbers (a person may have several), email addresses, etc. Limits on the number of values can be applied.
Derived Attribute: The attributes which are formed from other attributes. For example: if you have an attribute DOB, you can derive the age of the user.
NULL Value: An attribute that does not have a value. Sometimes NULL values represent DB inconsistencies, but sometimes they are intentional, for example a user who has no middle name. If no first name is given, however, the NULL means the value is unknown, which represents a DB inconsistency. A NULL can also mean Not-Known, for example when the salary of an employee is not known.
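Several of these attribute types map onto a table as follows. This is a minimal sketch with a hypothetical `customers` table in SQLite: the composite name is split into simple columns, the multi-valued phone attribute gets its own table, age is derived from DOB rather than stored, and NULL needs `IS NULL` because `= NULL` never matches.

```python
import sqlite3
from datetime import date

def age_on(dob, today):
    """Derived attribute: age is computed from the stored DOB, not stored itself."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

conn = sqlite3.connect(":memory:")
# Composite name split into simple attributes; middle_name may be NULL on purpose.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL, "
             "middle_name TEXT, last_name TEXT, dob TEXT, salary INTEGER)")
# Multi-valued attribute: phone numbers get their own table, one row per value.
conn.execute("CREATE TABLE phones (customer_id INTEGER REFERENCES customers(id), phone TEXT)")

conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Asha", None, "Rao", "2000-08-15", 50000),  # no middle name (intentional NULL)
    (2, "Ravi", "K", "Iyer", "1995-01-02", None),   # salary not known (NULL)
])
conn.executemany("INSERT INTO phones VALUES (?, ?)", [(1, "555-0101"), (1, "555-0102")])

dob = date.fromisoformat(conn.execute("SELECT dob FROM customers WHERE id = 1").fetchone()[0])
age = age_on(dob, date(2024, 8, 14))
print(age)  # 23 (birthday not reached yet that year)

# NULL is not equal to anything, not even NULL: '= NULL' matches nothing, 'IS NULL' works.
eq_null = conn.execute("SELECT first_name FROM customers WHERE salary = NULL").fetchall()
is_null = conn.execute("SELECT first_name FROM customers WHERE salary IS NULL").fetchall()
print(eq_null, is_null)  # [] [('Ravi',)]
# Aggregates skip NULLs, so AVG is taken over the known salaries only.
avg_salary = conn.execute("SELECT AVG(salary) FROM customers").fetchone()
print(avg_salary)  # (50000.0,)
```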
Degree of Relations:
Unary Relation: Involves a single entity related to itself. For example, in an employees' DB, an employee can also be the manager of other employees.
Binary Relation: Involves 2 entities in a Relationship.
Ternary Relation: Involves 3 entities in a Relationship. For example, an Order placed in a shopping store is tied to the User who placed it (Users table) and the Product that was ordered (Products table).
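Here is one common way to store the unary and ternary examples in tables, sketched in SQLite with hypothetical names: a self-referencing `manager_id` column for the unary case, and an `orders` table whose rows tie a user and a product together for the ternary case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unary: employees.manager_id points back into the same table.
conn.execute("""CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY, name TEXT,
    manager_id INTEGER REFERENCES employees(emp_id))""")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "Priya", None), (2, "Arjun", 1), (3, "Sara", 1)])
reports = conn.execute("""SELECT e.name FROM employees e
                          JOIN employees m ON e.manager_id = m.emp_id
                          WHERE m.name = 'Priya' ORDER BY e.emp_id""").fetchall()
print(reports)  # [('Arjun',), ('Sara',)]

# Ternary: each order row ties together an order, a user, and a product.
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("""CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    user_id    INTEGER REFERENCES users(user_id),
    product_id INTEGER REFERENCES products(product_id))""")
conn.execute("INSERT INTO users VALUES (1, 'Asha')")
conn.execute("INSERT INTO products VALUES (7, 'Keyboard')")
conn.execute("INSERT INTO orders VALUES (100, 1, 7)")

order_rows = conn.execute("""SELECT o.order_id, u.name, p.title FROM orders o
                             JOIN users u USING (user_id)
                             JOIN products p USING (product_id)""").fetchall()
print(order_rows)  # [(100, 'Asha', 'Keyboard')]
```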