CSE 232 - Databases Systems

Lecture 1

Lecture 2 - Database Storage and Hardware Design

Peculiarities of storage and algorithms

Ex: 2-phase merge sort

With ram buffer of size \(M\) can sort up to \(M^2 / B\) records (at least one block from each ram buffer)

Horizontal Placement of SQL data in blocks

Lecture 3 - Indexing

Given condition on attribute, find the qualified records

Topics

Terms and Distinctions

Can have multi-level indexes

Pointers - a block pointer, value of record attribute, then a value representing the block and/or record location.

What if a key isn’t unique?

Insertion in sparse index: if no new blocked is created for the key, do nothing

Duplicate values and secondary indexes

Why bucket+record pointers is useful

B+Trees

Lecture 4 - Indexing Pt. 2

Hashing Schemes

Extensible Hashing

Linear Hashing

\(h(k)[i] \leq m \text{ bucket } h(k)[i]\)

Lecture 5 - Indexing Pt. 3 - Hashing

Bitmap Indices

name dep year
aaron suits 4
helen pens 3
jack PCs 4
yannis pens 1

then our bitmaps indices would be

dept bitmap
PCs 0010
pens 0101
suits 1000
year bitmap
1 0001
2 0000
3 0100
4 1010

Bitmap example 2:

Pens bitmap: 01000001 Sequence: [1, 5] Encoding: 01110101

Proving 2*nlog(m)

Lecture 6 - Query Processing

Example

SELECT B, D
FROM R,S
WHERE R.A = "c" and S.E = 2 AND R.C = S.C

How to execute this query?

Use relational algebra to describe the plan: e.g.

Above is very inefficient…

query optimization steps

Algebraic Operators

Predicates (\(\sigma\))

Projections (SET vs BAG): \(\Pi\)

Cartesian Product: \(\times\)

Algebraic Operators: A Bag Version

The Extended Projection

Example

SELECT 2*A, as D, B, C as CPRIME
FROM T

would be converted into

\[\pi_{2A\rightarrow D,B,C\rightarrow CPRIME} T\]

Cartesian Products -> Joins

Example of a theta join

SELECT T.A
FROM T,S
WHERE T.A=S.B

equates to

\[\pi_{T.A} (T \theta_{T.A=S.B} S)\]

Grouping and Aggregation: \(\gamma\)

Example 2:

SELECT Dept, AVG(Salary) AS AvgSal
FROM Employee
GROUP BY Dept
HAVING SUM(Salary) > 100
\[\pi_{\text{Dept,AvgSal}} \sigma_{\text{SumSal > 100}}\gamma_{\text{Dept,AVG(SALARY)->AvgSal,SUM(Salary)->SumSal}}(\text{Employee})\]

Sorting and Lists

Lecture 6 - Relational Algebra Optimization

Commutativity and Associativity

Rewriting of algebraic expressions for logical connectives

instructor not clear on this section. refer to slides

Pushing Selection Through Binary Operators: Union and Difference

Pushing Selection Through Cartesian Product and Join

Pushing Projections Through Binary Operators

Pushing Projections Through Join and Cartesian Product

Deriving some rules

What are always “good” transformations?

Examples

The Bottom Line

Algorithms for Relation Operators

Pipelining can greatly reduce cost.

Iteration Join

for r in R1:
  for s in R2:
    if r.C = s.C, output (r, s)

Lecture 7 - Joins with Indexes

Disk Oriented Computation Model

Lecture 8 - Estimating the Cost of a Query Plan

Requires

Estimating result size: need the following statistics

Ex:

Given \(W = R1\times R2\)

Size estimate for $$W = \sigma_{Z=val}(R)

Calculating Results with Inequalities

What about a query \(W = \sigma_{z \geq 15}R\)?

Size estimate of a join…

Computing T(W) when A of R1 is a foreign key of R2

\[T(W) = \frac{T(R2)}{V(R2, A)}T(R1)\]

Case: A 3-way join for 3 tables which share the same attribute?

Example:

R1, R2, R3

Join: \(R1\bowtie R2\bowtie R3 \rightarrow (R1\bowtie R2) \bowtie R3\)

Lecture 9 - Plan Enumeration

Practice problems

Given the relations R(A, B, C) and S(C, D, E) give 6 valid logical query plans

SELECT B, C, D
FROM R, A
where R.C=S.C AND R.A=5;

Given the relations R, S, and T, Give an algebraic expression that only contains the join operator

SELECT *
FROM R, S, T
where R.A=S.A AND S.B=T.B;

Given the plans from the above problem, give plans for executing the query.

Arranging the Join Order: Wong-Yussefi Algorithm

Lecture 10 - Failure Recovery

Failure Model

Storage Hierarchy

Lecture 11 - Transactions Cont’d

Intro to Concurrency Control

Lecture 12 - Transaction Concurrency Control Cont’d

Exercise: What is P(S) for $$S=w_3(A)w_2(C)r_1(A)w_1(B)r_1(C)w_2(A)r_4(A)w_4(D)

Lecture 13 - Concurrency Control Cont’d

  Shared Exclusive Increment
Shared T F F
Exclusive F F F
Increment F F T
Parent Lock (table) Child Locked In (tuple)
IS IS, S
IX IS, S, IX, X, SIX
S [S, IS] not necessary
SIX X, IX, [SIX]
X None

Rules:

Handling Inserts and Deletes

Lecture 14 - More Concurrency Control and Transactions

Concurrency Control and Recovery

Lecture 15 - Virtual and Materialized Views

Virtual view

CREATE VIEW V as
SELECT G, SUM(A) as S
FROM R
GROUP BY G

Materialized View

CREATE MATERIALIZED VIEW V as
SELECT G, SUM(A) as S
FROM R
GROUP BY G
  Virtual View Materialized View
When updating R Database does nothing Ideally, DB refreshes V to reflect changes of R
When querying V optimize and run query that view represents simply use the view as the base table

Lecture 15 - Query Processing …Again

Example

Given a query

SELECT *
FROM R, S, T, U
WHERE R.A = S.A AND R.B = S.B and T.C=U.C

Algorithm would

Local suboptimality of the basic approach, and the Selinger improvement

CSE 232 - Databases Systems - zac blanco