Database Basic Notes

Basic Concepts

Feature

Massive
Persistent
Safe
Multi-user
Convenient
Efficient
Reliable

Common Words

create/drop(from)/insert into/delete from/update
restricts
sub-queries
views(shorthand for queries)
left/right join on ...
primary/foreign key
references

(id, name, birth, major, grade) is not normalized, because grade is not relevant to student id (id, name, birth) + (id, major, grade) is normalized (name, os, lang) is not normalized, because os isn't relevant to lang (name, os) + (name, lang) is normalized Data-intensive applications may not use DBMS/Query Language at all e.g Hadoop, all operations on data stores in files

Data Format

XML

single root element
matched tags with proper nesting
unique attributes within elements

DTD

Document Type Definition:

similar grammar to regular expression(*?)
ID/IDRef should be unique
CDATA: character data
(#PCDATA): parsed character data (plain text between tags)

<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE Bookstore [
    <!ELEMENT Bookstore (Book)*>
    <!ELEMENT Book (Title, Authors, Remark?)>
    <!ATTLIST Book  ISBN ID #REQUIRED
                    Price CDATA #REQUIRED
                    Authors IDREFS #REQUIRED>
    <!ELEMENT Title (#PCDATA)>
    <!ELEMENT Remark (#PCDATA | BookRef)*>
    <!ELEMENT BookRef EMPTY>
    <!ATTLIST BookRef book IDREF #REQUIRED>
    <!ELEMENT Authors (Author)*>
    <!ELEMENT Author (First_Name, Last_Name)>
    <!ATTLIST Author Ident ID #REQUIRED>
    <!ELEMENT First_Name (#PCDATA)>
    <!ELEMENT Last_Name (#PCDATA)>
]>

<Bookstore>
    <Book ISBN="ISBN-0-23-23333-233" Price="233" Authors="Sabertazimi">
        <Title>Kind of sword</Title>
        <Authors>
            <Author Ident="Sabertazimi">
                <First_Name>Yilong</First_Name>
                <Last_Name>Liu</Last_Name>
            </Author>
        </Authors>
    </Book>
</Bookstore>

Tools: XML Copy Editor, XML Linter:

xmllint --valid --noout Bookstore.xml

XSD

XML Schema Definition:

xmllint -schema Bookstore.xsd -noout Bookstore.xml

JSON

JavaScript Object Notation:

serializing data objects in files
human-readable data
semi-structured data
number/boolean/string/array/object(empty or key-value pair) recursive constructs

Relational Algebra

Operators

select operator σ(sigma): σ(sID < 100 ^ sAge > 20)Table_Name set constraints
project operator π(pi) : π(sID, GPA)Table_Name select certain columns
cross-product operator x: Table1 x Table2, m tuples(rows) x n tuples(rows) => m*n tuples(rows)
natural join operator ∞: σ(E1.A1 = E2.A1 ^ E1.A2 = E2.A2 ...) (E1 x E2)
theta join operator ∞(condition): σ(condition) (E1 x E2), call condition as ϴ
difference operator -: matching schemas => change rows/tuples
union/intersection operator ∪ / ∩: matching schemas => change rows/tuples
rename operator ρ: change schemas(attributes name), different schemas <=> same schemas (union/intersection/self-join)
assign statement :=
tree notation

π(sID, GPA) (σ(sID < 100 ^ GPA > 3.7) Student)

Higher-Level Database Design Models

Higher-Level Database Design Models -Translator-> Relational implemented by RDBMS

UML

Unified Modeling Language: PlantUML

Classes

for data modeling:

add PK(primary key)
drop methods

-----------
| student |
|---------|
|sID   PK |
|sName    |
|GPA      |
|---------|
|<methods>|
-----------

Associations

relationships between objects of 2 classes):

one to one: 1..1 --- 1..1.
many to one: * --- 1..1
one to many: 1..1 --- *.
many to many: * --- *.

-----------                   ---------
| student |                   |college|
|---------|                   |       |
|sID   PK |x..y   Apply   m..n|       |
|sName    |-------------------|       |
|GPA      |                   |       |
|---------|                   |       |
|<methods>|                   |       |
-----------                   ---------

Associations Classes

classes store information of relationship edge between 2 data classes
unnecessary if 0..1 or 1..1

c1 * --- 1..1 c2
information of relationship edge can stored in c1
owing to every object of c1 only associated with 1 object of c2

Subclasses

children classes

Composition and Aggregation

Entity Relationship Model

SQL

select ... from ... where
insert into ... ...
delete from ... where ...
update ... set ... = ... where ...

DROP VIEW IF EXISTS Standings;
DROP VIEW IF EXISTS Count;
DROP VIEW IF EXISTS Wins;
DROP TABLE IF EXISTS Matches;
DROP TABLE IF EXISTS Players;

-- Players Table
CREATE TABLE Players (
  id SERIAL primary key,
  name varchar(255)
);

-- Matches Table
CREATE TABLE Matches (
  id SERIAL primary key,
  player int references Players(id),
  opponent int references Players(id),
  result int
);

-- Wins View shows number of wins for each Player
CREATE VIEW Wins AS
  SELECT Players.id, COUNT(Matches.opponent) AS n
  FROM Players
  LEFT JOIN (SELECT * FROM Matches WHERE result>0) as Matches
  ON Players.id = Matches.player
  GROUP BY Players.id;

-- Count View shows number of matches for each Player
CREATE VIEW Count AS
  SELECT Players.id, Count(Matches.opponent) AS n
  FROM Players
  LEFT JOIN Matches
  ON Players.id = Matches.player
  GROUP BY Players.id;

-- Standings View shows number of wins and matches for each Player
CREATE VIEW Standings AS
  SELECT Players.id,Players.name,Wins.n as wins,Count.n as matches
  FROM Players,Count,Wins
  WHERE Players.id = Wins.id and Wins.id = Count.id;

Relational Design

Decomposition

start with mega-relations: including all attributes
decompose into smaller relations(BCNF/4NF)

functional dependencies

A -> B => 1-1/n-1 mapping
key sets: closure of sets contains all attributes

assuming relation R(A, B, C, D, ..., G) and closure of A, B {A, B} + A->C->D, B->E->F, F->G => {A, B}+ = {A, B, C, ..., G} then, {A, B} is a key if there no exists such closure, then treat all-attributes as a key

BCNF

boyce-codd normal form:

for each A -> B having A is super key && B isn't key
not exists A -> B -> C
here's the algorithm:

/*
 * @brief fixed point algorithm just like most algorithms from compiler
 *
 * by decomposing to transform non-key dependent attributes to key dependent attributes
 */

compute FDs for R
compute key for R using its FDs

while (there is relation R' aren't in BCNF) {
    pick any R' with A -> B that violates BCNF (A is not its key)
    decompose R' into R1(A, B) and R2(A, rest)
    compute FDs for R1 and R2
    compute keys for R1 and R2 using their FDs
}

Multi Valued Dependencies

A -> B && rest attributes => A ->> B.
A ->> B (1-n mapping), A ->> C (1-n mapping), no B -> C/C ->> B, B * C redundant tuples/rows.
A ->>B && A ->>C => A ->> B∩C.
A ->>B && B ->>C => A ->> C-B.

4NF

Forth normal form:

if A ->> B then A is key && B isn't key
here's the algorithm:

/*
 * @brief fixed point algorithm just like most algorithms from compiler
 *
 * by decomposing to transform non-key dependent attributes to key dependent attributes
 */

compute FDs and MVDs for R
compute key for R using its FDs

while (there is relation R' aren't in 4NF) {
    pick any R' with A ->> B that violates 4NF(A is not its key)
    decompose R' into R1(A, B) and R2(A, rest)
    compute FDs and MVDs for R1 and R2
    compute keys for R1 and R2 using their FDs
}

Normalized Design

every row has the same number of columns
every row has a unique key(PRIMARY KEY)
everything in a row is all relevant to unique key
everything in a row is all relevant to each other

Indexes

primary mechanism to improve performance of database
persist data structures stored in database (hash tables/B trees/B+ trees)
trade off: scale of database and workload(query/update rate) as input of physical design advisors

CREATE INDEX IndexName on T(A)
CREATE INDEX IndexName on T(A1, A2, ..., An)
CREATE UNIQUE INDEX IndexName on T(A)
DROP INDEX IndexName

Transactions

a sequence of one/more SQL operations treated as a unit
target: concurrency and failures recovery

Transaction Standard

all or nothing(atomicity)
transaction begins automatically on first SQL statement
on "commit": old transaction ends, new one begins
on session termination: current transaction ends
"AutoCommit" turns each statement into transaction

ACID Properties

Atomicity(Logging)
Consistency
Isolation: guarantee serializability(Locking)
Durability(Logging)

Isolation Level

weaker isolation level: read uncommitted < read committed < repeatable read < serializable

increased concurrency + decreased overhead = increased performance
weaker consistency guarantees
some system default: repeatable read

SET TRANSACTION READ ONLY;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

Integrity Constraints

CREATE TABLE TableName (
    ... PRIMARY KEY,
    ... UNIQUE,
    ... CHECK (Condition),
    ... references TableName(ForeignKey),
    ... references TableName(ForeignKey) ON DELETE/UPDATE RESTRICT/SET NULL/CASCADE,
    ... ,
    PRIMARY KEY (Attr1, Attr2, ...),
    UNIQUE (Attr1, Attr2, ...),
    CHECK (Condition),
    FOREIGN KEY (Attr1, Attr2, ...) references
      TableName(Bttr1, Bttr2, ...) [ ON ... (default RESTRICT) ]
);

CREATE ASSERTION AssertionName
CHECK (Condition);

Triggers (DBMS Level Constraints)

CREATE TRIGGER TriggerName
BEFORE|AFTER|INSTEAD OF Events(INSERT/UPDATE OF/DELETE ON TableName)
[ referencing-variables ]
[ FOR EACH ROW ]
WHEN ( Condition )
[ BEGIN ]
Action
[ END ];

CREATE TRIGGER Cascade
After DELETE ON S
REFERENCING OLD ROW AS O
FOR EACH ROW
DELETE FROM R WHERE A = O.B (R.A = S.B)

CREATE TRIGGER Cascade
After DELETE ON S
REFERENCING OLD TABLE AS OT
DELETE FROM R WHERE A IN (SELECT B FROM BT)

Views

logical layer: hiding data from users
modularity and reuse of query

VIEW ViewName = VIEWQUERY (R1, R2, ..., Rn)

CREATE VIEW ViewName (T1, T2, ..., Tn) AS
< Query >

Modifications on Views

owing to views are logical layer, it's senseless to modify data on views

Implements Modification with Triggers

CREATE TRIGGER TriggerName
INSTEAD OF DELETE/UPDATE OF/INSERT ON ViewName
[ referencing-variables ]
[ FOR EACH ROW ]
WHEN ( Condition )
[ BEGIN ]
Action
[ END ];

SQL Standard - Updatable Views

SELECT (no DISTINCT) on single table T
no GROUP BY/HAVING or Aggregation
attributes can't be NULL/default values
sub-queries cant' refer to table T

CREATE VIEW
...
WITH CHECK OPTION;

MongoDB Basic Notes

MongoDB Set Up

MongoDB Installation

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse"
\ | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
sudo apt-get update
sudo apt-get install -y mongodb-org mongodb-org-server
\ mongodb-org-shell mongodb-org-mongos mongodb-org-tools

MongoDB Not Upgrade

echo "mongodb-org hold" | sudo dpkg --set-selections
echo "mongodb-org-server hold" | sudo dpkg --set-selections
echo "mongodb-org-shell hold" | sudo dpkg --set-selections
echo "mongodb-org-mongos hold" | sudo dpkg --set-selections
echo "mongodb-org-tools hold" | sudo dpkg --set-selections

MongoDB Control

sudo service mongod start
sudo service mongod stop
sudo service mongod restart

MongoDB Uninstall

sudo service mongod stop
sudo apt-get purge mongodb-org*
sudo rm -r /var/log/mongodb
sudo rm -r /var/lib/mongodb

Shell Instruction

create and drop

create

use test
show dbs

drop

use dbToDrop
db.dropDatabase()

query

db.collection.find().pretty()

insert

db.collection.insert();

information

database

db.getName()
db.stats()
db.version()
db.getMongo()

collection

db.getCollectionNames()
db.printCollectionStats()

Database Basic Notes

Basic Concepts​

Feature​

Common Words​

Data Format​

XML​

DTD​

XSD​

JSON​

Relational Algebra​

Operators​

Higher-Level Database Design Models​

UML​

Classes​

Associations​

Associations Classes​

Subclasses​

Composition and Aggregation​

Entity Relationship Model​

SQL​

Relational Design​

Decomposition​

functional dependencies​

BCNF​

Multi Valued Dependencies​

4NF​

Normalized Design​

Indexes​

Transactions​

Transaction Standard​

ACID Properties​

Isolation Level​

Integrity Constraints​

Triggers (DBMS Level Constraints)​

Views​

Modifications on Views​

Implements Modification with Triggers​

SQL Standard - Updatable Views​

MongoDB Basic Notes​

MongoDB Set Up​

MongoDB Installation​

MongoDB Not Upgrade​

MongoDB Control​

MongoDB Uninstall​

Shell Instruction​

create and drop​

create​

drop​

query​

insert​

information​

database​

collection​