Database Basic Notes
Basic Concepts
Feature
- Massive
- Persistent
- Safe
- Multi-user
- Convenient
- Efficient
- Reliable
Common Words
- create/drop(from)/insert into/delete from/update
- restricts
- sub-queries
- views(shorthand for queries)
- left/right join on ...
- primary/foreign key
- references
(id, name, birth, major, grade) is not normalized, because grade is not relevant to student id (id, name, birth) + (id, major, grade) is normalized (name, os, lang) is not normalized, because os isn't relevant to lang (name, os) + (name, lang) is normalized Data-intensive applications may not use DBMS/Query Language at all e.g Hadoop, all operations on data stores in files
Data Format
XML
- single root element
- matched tags with proper nesting
- unique attributes within elements
DTD
Document Type Definition:
- similar grammar to regular expression(
*?
) - ID/IDRef should be unique
- CDATA: character data
- (#PCDATA): parsed character data (plain text between tags)
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Bookstore [
<!ELEMENT Bookstore (Book)*>
<!ELEMENT Book (Title, Authors, Remark?)>
<!ATTLIST Book ISBN ID #REQUIRED
Price CDATA #REQUIRED
Authors IDREFS #REQUIRED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Remark (#PCDATA | BookRef)*>
<!ELEMENT BookRef EMPTY>
<!ATTLIST BookRef book IDREF #REQUIRED>
<!ELEMENT Authors (Author)*>
<!ELEMENT Author (First_Name, Last_Name)>
<!ATTLIST Author Ident ID #REQUIRED>
<!ELEMENT First_Name (#PCDATA)>
<!ELEMENT Last_Name (#PCDATA)>
]>
<Bookstore>
<Book ISBN="ISBN-0-23-23333-233" Price="233" Authors="Sabertazimi">
<Title>Kind of sword</Title>
<Authors>
<Author Ident="Sabertazimi">
<First_Name>Yilong</First_Name>
<Last_Name>Liu</Last_Name>
</Author>
</Authors>
</Book>
</Bookstore>
Tools: XML Copy Editor, XML Linter:
xmllint --valid --noout Bookstore.xml
XSD
XML Schema Definition:
xmllint -schema Bookstore.xsd -noout Bookstore.xml
JSON
JavaScript Object Notation:
- serializing data objects in files
- human-readable data
- semi-structured data
- number/boolean/string/array/object(empty or key-value pair) recursive constructs
Relational Algebra
Operators
- select operator σ(sigma):
σ(sID < 100 ^ sAge > 20)Table_Name
set constraints - project operator π(pi) :
π(sID, GPA)Table_Name
select certain columns - cross-product operator x: Table1 x Table2,
m tuples(rows) x n tuples(rows) =>
m*n
tuples(rows) - natural join operator ∞: σ(E1.A1 = E2.A1 ^ E1.A2 = E2.A2 ...) (E1 x E2)
- theta join operator ∞(condition): σ(condition) (E1 x E2), call condition as ϴ
- difference operator -: matching schemas => change rows/tuples
- union/intersection operator ∪ / ∩: matching schemas => change rows/tuples
- rename operator ρ: change schemas(attributes name), different schemas <=> same schemas (union/intersection/self-join)
- assign statement :=
- tree notation
π(sID, GPA) (σ(sID < 100 ^ GPA > 3.7) Student)
Higher-Level Database Design Models
Higher-Level Database Design Models -Translator->
Relational implemented by RDBMS
UML
Unified Modeling Language: PlantUML
Classes
for data modeling:
- add PK(primary key)
- drop methods
-----------
| student |
|---------|
|sID PK |
|sName |
|GPA |
|---------|
|<methods>|
-----------
Associations
relationships between objects of 2 classes):
- one to one:
1..1 --- 1..1
. - many to one:
* --- 1..1
- one to many:
1..1 --- *
. - many to many:
* --- *
.
----------- ---------
| student | |college|
|---------| | |
|sID PK |x..y Apply m..n| |
|sName |-------------------| |
|GPA | | |
|---------| | |
|<methods>| | |
----------- ---------
Associations Classes
- classes store information of relationship edge between 2 data classes
- unnecessary if 0..1 or 1..1
c1 * --- 1..1 c2
information of relationship edge can stored in c1
owing to every object of c1 only associated with 1 object of c2
Subclasses
children classes
Composition and Aggregation
Entity Relationship Model
SQL
- select ... from ... where
- insert into ... ...
- delete from ... where ...
- update ... set ... = ... where ...
DROP VIEW IF EXISTS Standings;
DROP VIEW IF EXISTS Count;
DROP VIEW IF EXISTS Wins;
DROP TABLE IF EXISTS Matches;
DROP TABLE IF EXISTS Players;
-- Players Table
CREATE TABLE Players (
id SERIAL primary key,
name varchar(255)
);
-- Matches Table
CREATE TABLE Matches (
id SERIAL primary key,
player int references Players(id),
opponent int references Players(id),
result int
);
-- Wins View shows number of wins for each Player
CREATE VIEW Wins AS
SELECT Players.id, COUNT(Matches.opponent) AS n
FROM Players
LEFT JOIN (SELECT * FROM Matches WHERE result>0) as Matches
ON Players.id = Matches.player
GROUP BY Players.id;
-- Count View shows number of matches for each Player
CREATE VIEW Count AS
SELECT Players.id, Count(Matches.opponent) AS n
FROM Players
LEFT JOIN Matches
ON Players.id = Matches.player
GROUP BY Players.id;
-- Standings View shows number of wins and matches for each Player
CREATE VIEW Standings AS
SELECT Players.id,Players.name,Wins.n as wins,Count.n as matches
FROM Players,Count,Wins
WHERE Players.id = Wins.id and Wins.id = Count.id;
Relational Design
Decomposition
- start with mega-relations: including all attributes
- decompose into smaller relations(BCNF/4NF)
functional dependencies
- A -> B => 1-1/n-1 mapping
- key sets: closure of sets contains all attributes
assuming relation R(A, B, C, D, ..., G) and closure of A, B {A, B} +
A->C->D, B->E->F, F->G
=> {A, B}+ = {A, B, C, ..., G} then, {A, B} is a key if there no exists such closure, then treat all-attributes as a key
BCNF
boyce-codd normal form:
- for each A -> B having A is super key && B isn't key
- not exists A -> B -> C
- here's the algorithm:
/*
* @brief fixed point algorithm just like most algorithms from compiler
*
* by decomposing to transform non-key dependent attributes to key dependent attributes
*/
compute FDs for R
compute key for R using its FDs
while (there is relation R' aren't in BCNF) {
pick any R' with A -> B that violates BCNF (A is not its key)
decompose R' into R1(A, B) and R2(A, rest)
compute FDs for R1 and R2
compute keys for R1 and R2 using their FDs
}
Multi Valued Dependencies
A -> B && rest attributes
=>A ->> B
.A ->> B
(1-n mapping),A ->> C
(1-n mapping), noB -> C
/C ->> B
,B * C
redundant tuples/rows.A ->>B && A ->>C
=>A ->> B∩C
.A ->>B && B ->>C
=>A ->> C-B
.
4NF
Forth normal form:
- if A ->> B then A is key && B isn't key
- here's the algorithm:
/*
* @brief fixed point algorithm just like most algorithms from compiler
*
* by decomposing to transform non-key dependent attributes to key dependent attributes
*/
compute FDs and MVDs for R
compute key for R using its FDs
while (there is relation R' aren't in 4NF) {
pick any R' with A ->> B that violates 4NF(A is not its key)
decompose R' into R1(A, B) and R2(A, rest)
compute FDs and MVDs for R1 and R2
compute keys for R1 and R2 using their FDs
}
Normalized Design
- every row has the same number of columns
- every row has a unique key(PRIMARY KEY)
- everything in a row is all relevant to unique key
- everything in a row is all relevant to each other
Indexes
- primary mechanism to improve performance of database
- persist data structures stored in database (hash tables/B trees/B+ trees)
- trade off:
scale of database
andworkload(query/update rate)
as input of physical design advisors
CREATE INDEX IndexName on T(A)
CREATE INDEX IndexName on T(A1, A2, ..., An)
CREATE UNIQUE INDEX IndexName on T(A)
DROP INDEX IndexName
Transactions
- a sequence of one/more SQL operations treated as a unit
- target: concurrency and failures recovery
Transaction Standard
- all or nothing(atomicity)
- transaction begins automatically on first SQL statement
- on "commit": old transaction ends, new one begins
- on session termination: current transaction ends
- "AutoCommit" turns each statement into transaction
ACID Properties
- Atomicity(Logging)
- Consistency
- Isolation: guarantee serializability(Locking)
- Durability(Logging)
Isolation Level
weaker isolation level: read uncommitted < read committed < repeatable read < serializable
- increased concurrency + decreased overhead = increased performance
- weaker consistency guarantees
- some system default: repeatable read
SET TRANSACTION READ ONLY;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Integrity Constraints
CREATE TABLE TableName (
... PRIMARY KEY,
... UNIQUE,
... CHECK (Condition),
... references TableName(ForeignKey),
... references TableName(ForeignKey) ON DELETE/UPDATE RESTRICT/SET NULL/CASCADE,
... ,
PRIMARY KEY (Attr1, Attr2, ...),
UNIQUE (Attr1, Attr2, ...),
CHECK (Condition),
FOREIGN KEY (Attr1, Attr2, ...) references
TableName(Bttr1, Bttr2, ...) [ ON ... (default RESTRICT) ]
);
CREATE ASSERTION AssertionName
CHECK (Condition);
Triggers (DBMS Level Constraints)
CREATE TRIGGER TriggerName
BEFORE|AFTER|INSTEAD OF Events(INSERT/UPDATE OF/DELETE ON TableName)
[ referencing-variables ]
[ FOR EACH ROW ]
WHEN ( Condition )
[ BEGIN ]
Action
[ END ];
CREATE TRIGGER Cascade
After DELETE ON S
REFERENCING OLD ROW AS O
FOR EACH ROW
DELETE FROM R WHERE A = O.B (R.A = S.B)
CREATE TRIGGER Cascade
After DELETE ON S
REFERENCING OLD TABLE AS OT
DELETE FROM R WHERE A IN (SELECT B FROM BT)
Views
- logical layer: hiding data from users
- modularity and reuse of query
VIEW ViewName = VIEWQUERY (R1, R2, ..., Rn)
CREATE VIEW ViewName (T1, T2, ..., Tn) AS
< Query >
Modifications on Views
owing to views are logical layer, it's senseless to modify data on views
Implements Modification with Triggers
CREATE TRIGGER TriggerName
INSTEAD OF DELETE/UPDATE OF/INSERT ON ViewName
[ referencing-variables ]
[ FOR EACH ROW ]
WHEN ( Condition )
[ BEGIN ]
Action
[ END ];
SQL Standard - Updatable Views
- SELECT (no DISTINCT) on single table T
- no GROUP BY/HAVING or Aggregation
- attributes can't be NULL/default values
- sub-queries cant' refer to table T
CREATE VIEW
...
WITH CHECK OPTION;
MongoDB Basic Notes
MongoDB Set Up
MongoDB Installation
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse"
\ | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
sudo apt-get update
sudo apt-get install -y mongodb-org mongodb-org-server
\ mongodb-org-shell mongodb-org-mongos mongodb-org-tools
MongoDB Not Upgrade
echo "mongodb-org hold" | sudo dpkg --set-selections
echo "mongodb-org-server hold" | sudo dpkg --set-selections
echo "mongodb-org-shell hold" | sudo dpkg --set-selections
echo "mongodb-org-mongos hold" | sudo dpkg --set-selections
echo "mongodb-org-tools hold" | sudo dpkg --set-selections
MongoDB Control
sudo service mongod start
sudo service mongod stop
sudo service mongod restart
MongoDB Uninstall
sudo service mongod stop
sudo apt-get purge mongodb-org*
sudo rm -r /var/log/mongodb
sudo rm -r /var/lib/mongodb
Shell Instruction
create and drop
create
use test
show dbs
drop
use dbToDrop
db.dropDatabase()
query
db.collection.find().pretty()
insert
db.collection.insert();
information
database
db.getName()
db.stats()
db.version()
db.getMongo()
collection
db.getCollectionNames()
db.printCollectionStats()