TTMsql

Tiktok Interview Prepare

OLTP 和 OLAP

OLTP

Online Transaction Processing

Data processing system for managing daily business operations

Support a large number of concurrency of transactional applications (in the case of simultaneous multi-user operation)

OLAP

Online Analytical Processing

数据仓库中支持复杂的分析操作,使得用户可以通过各种视图(如汇总、分类、聚合)快速、有效地访问数据进行分析和决策。

Complex analytical operations

Various views (such as summary, classification, aggregation)

to access data and make decisions.

Eg:

Suppose we have a large online retail store that needs to process thousands of customer orders every day. This system needs to not only be able to quickly process these orders, but also manage inventory, process payments and update customer account information.

RDBMS

Relational Database Management System

Table

Primary Key

Foreign Key

Relations

SQL

Select

update

insert

delete

create table

Alter table

transaction management

ACID

atomicity

consistency

isolation

durability

Concurrency control

When multiple users access the database at the same time, the relational database manages concurrent operations through locking mechanisms and transaction isolation levels to prevent data conflicts and inconsistencies.

optimzer

The goal of the optimizer is to minimize query response time and resource usage.

Data integrity

entity integrity: primary key

reference integrity: foreign key

user defination integrity: check constrains

Middleware

the software layer beteween clients and servers

focus on simplify and manage data transmission, business logic, message services, etc.

Types Middleware

Communication middleware

消息队列(如RabbitMQ、Apache Kafka)

provide Asynchronous communication mechanism

Database middleware

Provision of database connectivity services

Connection Pool

Optimize database access and manage multi-user database access

Transaction middleware

Manage transactions across multiple databases, messaging services and other resources.

Make sure the ACID attribution of transaction

Object Request Broker (ORB)

in OOD, ORB support remote interaction among Object.

it allows the Object request in different internet

eg:

在分布式计算环境中,当一个客户端应用程序想要调用远程服务器上的对象方法时,ORB充当中介。客户端不需要知道服务具体在哪台服务器上,也不需要知道如何在网络上找到它和如何与之通信。客户端只需要知道它需要调用的方法和必要的参数。

Application Server

Provide an execution environment for enterprise-level applications

Web Middleware

handle http request, manage web content display and interaction

eg: web service, CMS system

Sharding

What is database sharding?

the data are divided into mutiple parts or pieces.

There are two kind of sharding:

horizontal sharding

Divide the rows in the table into multiple fragments,

and each fragment stores different rows of the table

but maintains the same table structure.

Eg:

we have a table including all the user in the world

We can do horizontal sharding based on user’s location

the user in China will be stored in the China database.

the user in USA will be stored in USA database/

vertical sharding

split the table by columns

with different segements storing different columns of the table.

Eg:

we have a user information table

we store the basic information of user and description information into two parts.

bastic information: name, address

description informationL performance, historical records

How does database shrding works?

  1. Chose sharding key
  2. Data allocation
  3. Query Routing
  4. Independent operation

Eg:

we are working in a large retail company.

the sales table record all the sales info of stores.

in order to optimization the speed and report generate speed, we want to sharde our database.

  1. Range partitioning

    Based on sale date to partition

    The sales record for 2020 is stored in the sales_2020 partition

    Sales records for 2021 are stored in the sales_2021 section

  2. List partitioning

    Based on sale region to partition

    The sales records of the eastern region are stored in the sales_east section.

    The sales records of the western region are stored in the sales_west section.

  3. Hash partitioning

    using hash function to evenly distribute the data to mutiple partition.

    Hash the sales table according to the hash value of customer_id

    When inserting a record, the system applies a hash function to customer_id.

    According to the hash results, the record is assigned to a specific partition.

How to make sure the data is evenly distributed in database?

we can use hash cycle to solve it.

Consistent hash sharding method —> hash cycle

Eg:

In an e-commercial platform

we want to do sharding by order_id

The order number is increasing sequentially.

Steps:

  1. Create a hash cycle

assume the range is beteween 0 and 2^32 - 1.

  1. Node mapping

map each sharding node to the hash ring.

the sharding node is calculated its hash value by the hash function.

  1. Data allocation

for each user_id, if we use hash function to calcuate a hash value.

The data for user_id will be stored on the first node found clockwise on the hash ring.

It means that each node is responsible for all hash values between its position on the ring

and the postion of the next node.

example

we have thre sharding node: node1, node2, node3.

node1 in 150

node2 in 500

Node3 in 1000

when a new user coming,

we assume the user_id is 12345

we get the hash value is 650

it will store in the clockwise closest node on the hash ring —> node3

Sharding

Sharding is the process of dividing data horizontally into subsets,

each stored on a different database server or slice.

Replication

copying data from a database server to one or more servers

Cluster

A collection of database servers

work together and appear as a single database system externally.

Working Experience in Database Sharding

Situation:

previous work

participant in a database optimization project of large e-commerical platform.

Task:

the data searching speed becomes slow when facing dramatic increase.

Especially,

During peak sales, the database performance will become the bottleneck.

We want to increase our database handling ability and query response time.

My role:

Design and implement the database shiding strategy

ensure the smooth execution of the whole process

Sharding strategy: horizontial sharding based on users’ geographical location.

we use mysql database, it has its own sharding function.

we also need to do data migration.

  1. set up mutiple copies for each slice

  2. use master-slave replication to ensure data consistency and avaliability.

  3. implement a monitoring system to monitor the performance and health status of each slice.

1
2
3
4
5
6
7
8
Situation
我在一家e-commerce platforms工作,
the growing of he user transaction volume
database performance becoming the bottlenecks

特别是在During peak sales,
Query latency growth 和Insufficient processing capacity
成为了主要挑战。
1
2
3
4
5
6
7
8
Task
1. database sharding
设计和实施一个有效的database sharding strategy,
Processing capacity和Query response time

2. data migration
ensure the saft migration
and set up fault recovery.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
Action
1. sharding strategy desgin:
do horizontal segmentation based on location

Based on User's Geographic Location to do Horizontal Segmentation.
Accroding to users' sign in location ---> distribute the data into diff database center.

for the detail part:
use Mysql partition function to do sharding
> we have a table, user_id, name, country_code and other fields.
> based on country_code, we use sql to allocate the data into different areas.

2. data migration and redundancy
shard the data into diff pieces
configure copys of each pieces in different locations

and implement master-slave replication

3. deploy monitor system and automatic failover mechanism.
supervise the health of each node
Timely recovery in case of node problems
1
2
3
4
5
6
7
8
9
10
11
12
Result
response time decrease 50% than before
especiallu in peek hour, the response time reduce from 200 miliseonds to less than 100 milliseconds.

The feedback from our cutomers
customer satisfaction improve from 7 to 8 out of 10
based on the most recent customer satisfaction survey.

The whole view structure of Distributed System
How to target problem
do relevant research
Choose the right solution based on current resources and customer requirements

Transaction mechanism

MVCC

Query optimization

Master-slave replication

Backup and recovery

Storage engine

HA(High Availability)

Raft Project

隶属于 MIT 6824 Distributed System

what is Distribution System? —> How to distribute data into diff meachines/services.

当数据量 和 计算量 都非常大的时候, 我们需要将计算机 Scale.

vertical and horizontial

vertical: improve the performance of single PC

CPU cache

Instruction prediction

the speed of chip execution instructions

improve RAM capacity

improve hard disk read and write speed

what is RAM capacity?

Random Access Memory

store data for CPU in operation system.

DRAM: maintained by periodically refreshing the charge(电荷).

SRAM: quicker than DRAM, not need refreshing the charge.

How to improve the hard disk read and write speed?

  1. upgrade to SSD(固态硬盘) solid state disk (compare to HDD)

  2. using RAID configuration

    RAID0 和 RAID1

    RAID0, 图书馆存储一本书,书的每一页都放在不同的书架上。当需要借阅的时候,图书管理员会同时从多个书架上获取所有page。 可以加速查找和组装书。因为可以有多个图书管理员。

    RAID1, 图书馆的每本书都放在 不同的位置。 但是都有一个copy放在不同位置。当书本丢失可以复制。

  3. Optimize disk defragmentation

horizontal: Spreading the computation over many computers ----> Dstribution System

tips

vertical optimization

The CMU textbook CSAPP provides a number of examples.

horizontial optimization

use NGINX to do loadbalance

based on Map Reduce to do distribute calculation

Redis Cluster

ZooKeeper

Etcd: KV store

what is etcd?

Key-value distributed store.

used in Raft, for the log replication.

the information table in a large shopping mail.

Whenever a new store opens or the old store closes, the map and list of the information desk will be updated.

Pre-Knowledge

what is distribute System?

what is redis cluster?

using sharding and replication to distribute data into diff nodes. —> improve performance and data redundancy.

what is zookeeper?

waht is etcd

Real Question

Raft and Pixo protocols —> Consensus Agreement

They are all algorithms used to solve problems in distributed systems.

确保 多个分布式节点 能够 就 某个值 一致

Consensus Agreement 共识协议

Raft ----> Russia 集权

Mast and slave structure

Leader selection

Log copying: once the leader has been selected, all the modifcation will be send to master firstly. and leader will add these changes into their logs. and then send repelica to others.

Consistency confirmation: when most of slaves finish logs copy from the master. the leader notifies all servers to submit the entities.

Paxos —> America 分权

perfect in theory but complicated to implement.

there gonna be four basic components:

propose

Each node can propose an porposal

promise

After receiving proposal, the node will check the serial number of proposal

if the serial number is the largest number they have even seen

they will send promise to the proposed node

accept

After receving the promise of most mode, the proposed node would send accept to other nodes.

Learn

Once the propose is accept by most modes, others node will update its own state

and when node updating its own state, we call it learn.

Using Two Example to Exlpain Raft Protocol and Paxos Protocol

Eg:

Raft Protocol

Online Voting System

we have an online voting system

users can vote on different issues

this system consists mutiple servers to ensure reliability and availability.

Leader Selection:

when the system starts, on of the five servers trigger the leader election through a timeout(超时器).

the server would become a candiate.

sending requests votes to other servers.

Handling Voting:

Every vote submmited by users would be send to leader firstly.

the leader will record the vote as a log and send the log to other servers.

Log Replication and Submition:

when most servers have confired it has record the logs, the leader will record this log submmited status and apply it in System.

the leader will notifies other servers this commit and

Other servers will update their status.

there are also other detial conception in Raft.

Heartbeat Mechanism:

int Raft, the leader will send heartbeat signals to all followers.

the heartbeat usually be empty additional log entries.

Election Trigger:

if the follower does not recevie the leader’s heartbeat for a certain time(election time out)

they would turn their status into a candidate and start a new roud of election.

Term Number:

the candidate will increase their term number,

which is used to prevent the old leader from regaining power.

request Voting:

the vote that candidate send to other services will include current term number and candidate’s information(latest date of updating logs and term number)

Voting Strategy:

each server can only vote once in a term.

they will only vote for candidate whose log is at least as new as themself.

New:

the update and integrity of the log.

Election Result:

once canciadte recevie most vote from other service(including itself)

it will become the new leader

New leader will immedicately send heartbeat mechanism to others and prevent others from initiating elections.

Paxos Protocol

Eg:

Distributed Bookkeeping System 分布式记账系统

we have five node: A, B, C, D, E.

the new record: “Transfer 100 from account1 to account2”

Prepare Phase:

Assume it is node A to start the new proposal

it generate a proposal number n

n is greater than the previous proposal number

Promise Phase:

after other nodes recevied n, they will compare it with the largest known proposal number.

if n is the largest number, node send commit message to proposal node A.

Accept Phase:

if proposal node A receive most promise from the nodes group, it will make the final decision based on the content of proposal.

and A send acceptance request to other nodes and finalized the content.

Learn Phase:

accept the request and update in the account books.

The Different beteween Raft and Paxos

Raft

simplified leader election

one leader and many slaves

heatbeat

failure recovery: reelection

Paxos

Muti-stage consensus process

decentralization

example to undersand the different beteween Raft and Paxos

1
2
3
4
5
6
7
8
9
10
11
12
13
14
Raft 协议的类比:选举班长方式
假设一班学生需要组织一次班级野餐。在Raft的模式下,班级首先需要选举出一个班长:

领导者选举:如果当前班长不可用(比如生病了),班级会快速举行一次新的班长选举。每个想成为班长的学生可以自荐。
统一指挥:一旦选出新班长,这位班长将负责所有野餐的安排,包括日期、地点、食物等。
执行决策:班长定期“报告”(心跳)给同学们关于准备进展的最新情况,确保大家知道正在发生什么,并保持班级的秩序。
简单直接:整个过程中,班长的角色非常清晰,所有决策和沟通都通过班长来协调和更新。
Paxos 协议的类比:全体学生协商方式
在Paxos的模式下,组织野餐的方式更加民主和分散:

多个提议:任何学生都可以提出野餐的提议,比如野餐的地点或日期。
承诺阶段:当一个学生提出提议时,他们需要从大多数同学那里获取支持(承诺)。同学们承诺支持的提议通常是基于提议的详细程度和合理性。
选择最终提议:一旦一个提议获得了足够的支持,该提议就会被接受。如果出现了更好的提议,同学们可能会改变主意,支持新的提议。
逐步达成共识:这个过程可能需要多轮讨论和投票,直到所有人都同意一个最终的计划。
1
2
3
Raft 类似于有一个明确的领导者(班长),他或她负责做出决策并保持班级的信息同步。这使得决策过程更快,但依赖于领导者的能力和可用性。

Paxos 更像是一个无领导的协商过程,每个人都可以提出建议,并通过一系列的讨论和修改逐步达成共识。这可能更复杂,但在没有明显领导者的情况下提供了灵活性和鲁棒性。

What is the difference beteween key and index?

key is only one unique identifier

primary key

foreign key

index is used to find target quickly

example to understand the key and index?

we have a student table, including student id, student name, class and birthdate.

the key can be student id, make sure each student is identiftied unique.

the index can be based on class or birthdate.

if we often use class field, we can chose class to build index.

when we need to search all the students belongs to 10 class, it gonna be quicker.

How does index to achieve quick search and query?

using B tree and Hash Index

B tree is a self balance tree

we normal use B+ tree as the index in database.

举个例子解释一下

Books tables

Book_id

Title

Autohr

Genre(type类型)

Public_date(出版日期)

without index, if we want to search a book name:

we need a full table scan

when using index

1
2
3
4
5
                [60]
/ \
[50] [70, 80]
/ \ / \
[30, 40, 45] [55, 58] [65, 68] [75, 90]

we have a student table, the field including id, name, grade and score.

Internal node: the root node of a tree is 60.

it split the tree into two parts.

the left subtree is small than 60.

the right subree is equal or greater than or equal 60.

when searching a student who’s score is 68.

from root start, we compare 68 and 60 —> 68 is greater than 60 —> go right

in the next layer, we compare 68 with 70 and 80 —> 68 is greater than 70 and 68 is smaller than 80 —> choose the middle one

in the leaft node find 68.

we have an online library system

we want to use B+ tree to implement the index of our database.

We have a reservations table

revervation_id

customer_name

restaurant_name

reservation_date

Num_of_Guests

using index

for example, we want to search a restaurant which name is Olive Garden

without index in the restaurant_name

We need to iterate through the whole table

setting restaurant_name as the index

B+tree structure

we do not need to iterate through the who table, we can jump into the record including ‘Olive Garden’ through index

search all the record, whose date is 2023-07-15

start from root node of B+Tree

chosing suitable sub branch

untill find the 2023-07-15 leaf node

in the leaf node, it including all the record information related with the 2023-07-15

the leaf node are composed by two parts:

Key: the reservation date

Value: the pointer, which pointed to a complete row of data

summary

within th helper of B plus tree, the Mysql Database can quickly find the actually row data by using the pointer in the leaf node tree.

why we need primary key in mysql DataBase?

  1. unique identifier

    it is the unique idetifier in talbe

    we can quickly find the record in our table.

  2. data integrity

    make sure there is not repeatable row in table

  3. build the connection among relational database

    in RD, primary key is used to establish a relationship with foreign key in other tables.

    These relationships enable data to be effectively associated with different tables.

  4. Query performance optimization

    Most database management System would create index for primary key.

DataStructure of DataBase

LSM tree

Log-Structured Merge-tree

commonly used in Apache Cassandra, RocksDB, LevelDB.

focus on handling large amount of writing

Used in database and index system

Core ideas of LSMT

  1. optimization of write

Put all the insertion, update and delete operation records into a structure in memory(memTable)

  1. compaction

when memTable achieve to a certain size, it will be converted into a unchangeable disk.

Call SSTable(Sorted Strings Table)

SSTable will be merged and compressed. it called compaction.

memTable data structure: jump table or red-black tree

What is jump table

also called branch table

by using if else condition to hdanling the data

create a arry, which strore mutiple function

when the new request/transaction coming in

according to the state in the array to call corresponding function to handle data.

eg:

it looks like a hashtable

key is the status

value is the function’s address

Real example:

1
2
3
4
5
6
order_status_to_function = {
'placed': handle_placed,
'shipped': handle_shipped,
'delivered': handle_delivered,
'canceled': handle_canceled
}
1
2
3
4
5
6
7
def update_order_status(order_id, status):
# 调用对应状态的处理函数
if status in order_status_to_function:
handler = order_status_to_function[status]
handler(order_id)
else:
print("Unknown status")

What is black-red tree

High speed response —> 可以确保 insert data 到 MemTable中,每次时间复杂度都为 log(n)

Black Red Tree

self balance tree

  1. each node either red or black
  2. root node is black
  3. all leaf node is red
  4. the children of red must be black
  5. the total num of black node beteween a random node and each leaf node is equal.
Black Red Tree

节点颜色:每个节点要么是红色,要么是黑色。

根节点是黑色:红黑树的根节点总是黑色的。

所有叶子节点都是黑色:这里的叶子节点是指树尾端的空(NIL)节点,它们都是黑色的。

红色节点的子节点必须是黑色:如果一个节点是红色的,那么它的两个子节点都必须是黑色的。这条规则防止了路径上出现连续的红色节点。

从任一节点到其每个叶子的所有路径都包含相同数目的黑色节点:这个属性保证了没有任何一条路径会比其他路径长出两倍以上,因此树大致是平衡的。

Eg:

1
2
3
4
5
      40(B)
/ \
20(B) 60(B)
/ \ / \
10(R) 30(R) 50(R) 70(R)
  • The path from the root node to the leaf node:
    • 40 -> 20 -> 10 (black -> black -> red)

    • 40 -> 20 -> 30 (black -> black -> red)

    • 40 -> 60 -> 50 (black -> black -> red)

    • 40 -> 60 -> 70 (black -> black -> red)

  • The path from the node 20 to the leaf node:
  • 20 -> 10: on this path, only node 20 is black.
  • 20 -> 30: again, only node 20 is black.

B+ tree

Self-balance tree

focus on frequently used database.

Logs stroage

Time serious database

NoSQL database

Cassandra and RocksDB

Summary

frequently writen —> use LSM Tree(Logs structure Merge Tree)

frequently read —> use B+ Tree

LLM combined with DataBase

Mainly used for enhancing user interface data analysis, and system maintance

User Interface

users can use natural language to do complex searching without using SQL

Eg:

Which region grew the fastest in sales in the last quarter?

The model is responsible for analyzing this problem and converting it into an appropriate database query.

Data Consistency Monitor and Control

eg:

for a Ditributed Database System

we have 3 different Node:

Node1, Node2, Node3

we want these tree node remian synchronized.

we can design a LLM. it receive snapshots from three nodes regually.

Analyse the snapshots and compare data diffreences among them.

Step 1: Data collection

Each node regularly generates data snapshots containing a summary of transaction records and sends these data snapshots to the central monitoring system.

Step 2: Data analysis

In the central monitoring system, a well-trained large language model has been deployed. This model is specially trained to understand and analyze database status reports and log files.

  • Model task: Compare data snapshots generated by different nodes and identify the problem of inconsistent data.
  • Analysis method: The model will check the key fields, such as customer ID, transaction amount and timestamp, and compare the record differences between different nodes.

Step 3: Report Generation

Once the model detects data inconsistency, it will automatically generate a detailed report.

  • Report content

  • There are data inconsistencies between which nodes.

  • Analysis of specific examples and possible causes of inconsistent data.

  • Provide repair suggestions based on past experience of successfully solving similar problems.

Step 4: Response and Optimization

  • Response measures: The database administrator takes measures according to the report of the model, such as manually triggering data synchronization or adjusting the configuration of automatic synchronization.

  • Continuous optimization: The model learns and adjusts according to the results of each event to improve the accuracy and usefulness of future reports.

Sample report

Assume that the model finds inconsistencies between North American and Asian nodes in processing the transaction records of a high-frequency customer. The report may include:

  • Problem description: The North American node shows that the customer’s transaction is $1,000, while the Asian node shows $1,200.

  • Analysis: Preliminary analysis shows that time difference and concurrent transactions may lead to update delays.

  • Recommendation: Immediately check the last synchronization time of the two nodes, confirm the network connection status, and manually trigger a forced synchronization.

In this way, the large language model helps to automate and simplify the process of monitoring and maintaining the consistency of distributed databases, reduces the need for manual intervention, and accelerates the diagnosis and response of problems. This not only improves the overall reliability of the database system, but also optimizes the work efficiency of the operation and maintenance team.

C++技术栈