CHEATSHEET January 20, 2022

doris


Architecture

FE (Frontend): 1-5 nodes (split into Follower and Observer roles). Stores the metadata, i.e. the edit logs and image files, which usually total from a few hundred MB to a few GB.

BE (Backend): 10-100 nodes. Stores user data in 3 replicas.

Broker is a process used to access external data sources (such as HDFS). Usually one broker instance per machine is enough.

Multiple BE instances can be deployed on one machine, but only one FE.
The clocks of all servers hosting FEs must stay in sync (a skew of at most 5 seconds is tolerated).
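Once the first FE is up, the other members are registered through it with SQL; a minimal sketch, assuming the default FE edit-log port 9010 and broker port 8000 (hostnames are placeholders):

ALTER SYSTEM ADD FOLLOWER "fe_host2:9010";                -- add an FE follower
ALTER SYSTEM ADD OBSERVER "fe_host3:9010";                -- add an FE observer
ALTER SYSTEM ADD BROKER hdfs_broker "broker_host:8000";   -- register a broker process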

Setup

# the prebuilt BE binary requires AVX2 support
cat /proc/cpuinfo | grep avx2

# keep FE metadata on its own path, then symlink it into place
mkdir -p <doris_meta_created>
ln -s <doris_meta_created> <doris_meta_original>/fe/doris-meta
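The install docs also ask for a few OS-level settings before starting the BE; a minimal sketch (run as root; the values are the documented recommendations):

# raise the mmap count the BE needs
sysctl -w vm.max_map_count=2000000
# raise the open-file limit
ulimit -n 65536
# disable swap so memory limits behave predictably
swapoff -a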

Common configuration

# FE web UI (http_port): http://:8010

meta_dir = ${DORIS_HOME}/doris-meta           # fe.conf: FE metadata dir
storage_root_path = /data/soft/doris/storage  # be.conf: BE data dir
sys_log_dir = ${DORIS_HOME}/log               # fe.conf / be.conf
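On multi-homed hosts, both fe.conf and be.conf usually also need priority_networks so each node registers with the intended address; a sketch, the CIDR being an assumption:

priority_networks = 10.10.10.0/24   # pick the subnet of the intended NIC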

A BE node must first be added through the FE before it can join the cluster (the FE MySQL port defaults to 9030), as sketched below. Doris ships with built-in root and admin users.
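A minimal sketch of registering a BE from the MySQL client, assuming the default heartbeat port 9050:

mysql -h <fe_host> -P 9030 -uroot -p
ALTER SYSTEM ADD BACKEND "be_host:9050";   -- heartbeat_service_port, default 9050
SHOW BACKENDS;                             -- the new node should report Alive: true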

Common commands

Connect to the FE:
mysql -h -P 8035 -uroot -p

use example_db;

CREATE TABLE IF NOT EXISTS example_db.example_tbl
(
    user_id LARGEINT NOT NULL COMMENT "user id",
    last_visit_date DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time",
    cost BIGINT SUM DEFAULT "0" COMMENT "total amount spent by the user"
)
AGGREGATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1;

insert into example_db.example_tbl values (1, '1970-01-01 00:00:00', 10);
insert into example_db.example_tbl values (1, '1971-01-01 00:00:00', 10);
insert into example_db.example_tbl values (1, '1969-01-01 00:00:00', 5);

select * from example_db.example_tbl;
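On the aggregate model those three inserts collapse into a single row: cost carries SUM, and REPLACE keeps the value from the most recent load (not the largest), so since the 1969 insert ran last the expected result is (a sketch, not captured output):

-- user_id | last_visit_date     | cost
-- 1       | 1969-01-01 00:00:00 | 25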

spark-shell --master local[2] --jars /data/tmp/doris-spark-connector-3.0.3-2.12-1.0.0-SNAPSHOT.jar
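Inside that spark-shell the connector can be driven from Spark SQL (wrapped in spark.sql("""...""")); a sketch following the connector docs, where the view name, FE host, and FE http port 8030 are assumptions:

CREATE TEMPORARY VIEW spark_doris
USING doris
OPTIONS(
  "table.identifier" = "example_db.example_tbl",
  "fenodes" = "fe_host:8030",
  "user" = "root",
  "password" = ""
);

SELECT * FROM spark_doris;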

Common operations

broker load

LOAD LABEL test_up984948464479_bmp$ymd
(
DATA INFILE("hdfs://cluster2/tmp/*")
INTO TABLE tbl
FORMAT AS "parquet"
(seqid, userid, serviceid, startdate, enddate, updatetime)
SET(
seqid=seqid,
userid=userid,
serviceid=serviceid,
startdate=startdate,
enddate=enddate,
updatetime=updatetime)
)
WITH BROKER 'hdfs_broker'(
"username"="hadoop",
"password"="")
PROPERTIES("max_filter_ratio" = "1");

export

-- set enable_parallel_outfile = true;  -- the data came back wrong when exported in parallel!! (as of 2022-02)
select seqid from fclean where enddate < '202203' into outfile "hdfs://tmp/"
FORMAT AS CSV
properties ("hdfs.fs.defaultFS" = "hdfs://host:port", "hdfs.hdfs_user" = "hadoop");

-- better: export through the broker
select seqid from fclean limit 3 into outfile "hdfs://cluster2/tmp/123"
FORMAT AS CSV   -- or: FORMAT AS PARQUET
properties ("broker.name" = "hdfs_broker");

batch delete

LOAD LABEL bmp.delete_16491519487 (
DELETE DATA INFILE("hdfs://cluster2/tmp/*")
INTO TABLE fclean
COLUMNS TERMINATED BY "\t"
(seqid)
SET
(
seqid=seqid
)
) WITH BROKER 'hdfs_broker'(
"username"="hadoop",
"password"="")
PROPERTIES (
"timeout" = "3600"
);

check load state

https://doris.apache.org/zh-CN/administrator-guide/http-actions/fe/get-load-state.html


curl -u username:passwd http://ip:8010/api/bmp/get_load_state?label=<label_name>

Special notes

  • Deleting data by condition
    • a DELETE condition may only reference key columns
    • workaround (see the sketch after this list)
      • allow the non-key columns to be NULL
      • ALTER TABLE fclean2 ENABLE FEATURE "BATCH_DELETE"
      • export the key columns of the rows to delete, selected by the real condition
      • feed them back through a batch-delete load (as in the LOAD above)
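Under the hood the batch-delete feature flags rows through a hidden column; a sketch for inspecting what got flagged (the session variable and column name come with the feature):

SET show_hidden_columns = true;
SELECT seqid, __DORIS_DELETE_SIGN__ FROM fclean WHERE __DORIS_DELETE_SIGN__ = 1;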

Data checks

SHOW FRONTENDS;
SHOW BACKENDS;
SHOW BROKER;
SHOW PROC '/statistic';
SHOW PROC '/statistic/124226';

ALTER TABLE tbl MODIFY PARTITION p1 SET("replication_num" = "2");

ADMIN SHOW REPLICA DISTRIBUTION FROM t_bms_service_sub;
ADMIN SHOW REPLICA STATUS FROM bmp.t_bms_service_sub WHERE STATUS != "OK";
-- output columns:
-- | TabletId | ReplicaId | BackendId | Version | LastFailedVersion | LastSuccessVersion | CommittedVersion | SchemaHash | VersionNum | IsBad | State | Status |
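When bad replicas turn up, a repair can be triggered by hand (a sketch; the partition name is an assumption):

ADMIN REPAIR TABLE bmp.t_bms_service_sub;
ADMIN REPAIR TABLE bmp.t_bms_service_sub PARTITION (p1);   -- or just one partition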

SHOW TABLE STATUS from bmp;
SHOW TABLET FROM tbl1;
SHOW TABLET 29502553;

A tablet's directory on disk is typically data/shard_id/tablet_id/schema_hash.
The BE embeds RocksDB to store each tablet's meta information.

Inspect tablet meta

http://{host}:{port}/api/meta/header/{tablet_id}/{schema_hash}
./lib/meta_tool --root_path=/data/soft/doris/storage --operation=get_meta --tablet_id=228348 --schema_hash=2083337138

palo studio

Palo is the name Doris carried inside Baidu before it was open-sourced.

palo-studio can be thought of as Doris plus Metabase.
http://palo.baidu.com/docs/%E7%94%9F%E6%80%81%E5%B7%A5%E5%85%B7/Palo-Studio/%E5%AE%89%E8%A3%85%E9%83%A8%E7%BD%B2

Issues

When the host comes under heavy load, the Doris process gets killed by the kernel's OOM killer => the BE holds a large resident memory footprint.

[Fri May 13 09:19:06 2022] Out of memory: Kill process 19651 (palo_be) score 183 or sacrifice child
[Fri May 13 09:19:06 2022] Killed process 19651 (palo_be), UID 1538, total-vm:73392408kB, anon-rss:48302876kB, file-rss:0kB, shmem-rss:0kB
[Fri May 13 09:19:06 2022] [19659] 1538 19651 18348102 12076299 34582 0 0 palo_be
[Fri May 13 09:19:06 2022] Out of memory: Kill process 19659 (palo_be) score 183 or sacrifice child
[Fri May 13 09:19:06 2022] Killed process 19659 (palo_be), UID 1538, total-vm:73392408kB, anon-rss:48305196kB, file-rss:0kB, shmem-rss:0kB
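One mitigation is to cap the BE's resident memory up front in be.conf; a sketch, the 60% being an assumption to tune per host:

# be.conf: upper bound on BE process memory
mem_limit = 60%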