Contents
  1. Architecture
  2. Common configuration
  3. Common commands
  4. Common operations
    1. 4.1. broker load
    2. 4.2. export
    3. 4.3. batch delete
    4. 4.4. check load state
  5. Special notes
  6. Data checks
    1. 6.1. View Tablet Meta
    2. 6.2. palo studio
  7. Issues
    1. 7.1. doris process killed under high host load => resident memory usage

Architecture

FE (Frontend): 1-5 nodes (split into Followers and Observers). Stores metadata, including edit logs and images, typically a few hundred MB to a few GB in size.

BE (Backend): 10-100 nodes. Stores user data, with 3 replicas.

Broker is a process for accessing external data sources such as HDFS. One broker instance per machine is usually sufficient.

A machine can host multiple BE instances, but only one FE.
The clocks of the servers hosting multiple FEs must stay in sync (at most 5 seconds of clock skew is allowed).

Common configuration

FE web UI: http://:8010

meta_dir = ${DORIS_HOME}/doris-meta
storage_root_path = /data/soft/doris/storage
sys_log_dir = ${DORIS_HOME}/log

A BE node must first be registered through the FE before it can join the cluster (the FE MySQL-protocol port defaults to 9030). Doris ships with built-in root and admin users.
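As a sketch of the registration step (host names are placeholders; 9010 and 9050 are the default edit_log_port and heartbeat_service_port, which may differ in this deployment), nodes are added from a MySQL client connected to the FE:

```sql
-- Register an additional FE; the port is the FE's edit_log_port
ALTER SYSTEM ADD FOLLOWER "fe_host:9010";
ALTER SYSTEM ADD OBSERVER "fe_host2:9010";

-- Register a BE; the port is the BE's heartbeat_service_port
ALTER SYSTEM ADD BACKEND "be_host:9050";

-- Verify that the nodes joined
SHOW FRONTENDS;
SHOW BACKENDS;
```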

Common commands

Connect to the FE:
mysql -h -P 8035 -uroot -p

use example_db;

CREATE TABLE IF NOT EXISTS example_db.expamle_tbl
(
user_id LARGEINT NOT NULL COMMENT "user id",
last_visit_date DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time",
cost BIGINT SUM DEFAULT "0" COMMENT "user's total spend"
)
AGGREGATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 10;  -- bucket count added here; older Doris versions require an explicit BUCKETS clause

insert into example_db.expamle_tbl values (1, '1970-01-01 00:00:00', 10);
insert into example_db.expamle_tbl values (1, '1971-01-01 00:00:00', 10);
insert into example_db.expamle_tbl values (1, '1969-01-01 00:00:00', 5);

select * from example_db.expamle_tbl;
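With the AGGREGATE KEY model above, the three inserted rows collapse into one row per user_id: SUM accumulates, and REPLACE keeps the value from the most recently loaded row (not the largest timestamp). A sketch of what the query should return, assuming the inserts ran in the order shown:

```sql
-- Expected single surviving row for user_id = 1:
--   last_visit_date = '1969-01-01 00:00:00'  (REPLACE: value from the last load)
--   cost            = 25                     (SUM: 10 + 10 + 5)
SELECT user_id, last_visit_date, cost FROM example_db.expamle_tbl WHERE user_id = 1;
```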

spark-shell --master local[2] --jars /data/tmp/doris-spark-connector-3.0.3-2.12-1.0.0-SNAPSHOT.jar

Common operations

broker load

LOAD LABEL test_up984948464479_bmp$ymd
(
DATA INFILE("hdfs://cluster2/tmp/*")
INTO TABLE tbl
FORMAT AS "parquet"
(seqid, userid, serviceid, startdate, enddate, updatetime)
SET(
seqid=seqid,
userid=userid,
serviceid=serviceid,
startdate=startdate,
enddate=enddate,
updatetime=updatetime)
)
WITH BROKER 'hdfs_broker'(
"username"="hadoop",
"password"="")
PROPERTIES("max_filter_ratio" = "1");
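A broker load runs asynchronously; as a sketch (the label matches the load above, with $ymd expanded), its progress can be checked and the job cancelled from the FE:

```sql
-- State progresses PENDING -> ETL -> LOADING -> FINISHED (or CANCELLED)
SHOW LOAD WHERE LABEL = "test_up984948464479_bmp$ymd" ORDER BY CreateTime DESC LIMIT 1;

-- Abort a running job
CANCEL LOAD WHERE LABEL = "test_up984948464479_bmp$ymd";
```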

export

-- set enable_parallel_outfile = true; -- parallel outfile returned incorrect data!! (observed 2022-02)
select seqid from fclean where enddate <'202203' into outfile "hdfs://tmp/"
FORMAT AS CSV
properties ("hdfs.fs.defaultFS" = "hdfs://host:port", "hdfs.hdfs_user" = "hadoop")

-- better
select seqid from fclean limit 3 into outfile "hdfs://cluster2/tmp/123"
FORMAT AS CSV  -- or: FORMAT AS PARQUET
properties ("broker.name" = "hdfs_broker")

batch delete

LOAD LABEL bmp.delete_16491519487 (
DELETE DATA INFILE("hdfs://cluster2/tmp/*")
INTO TABLE fclean
COLUMNS TERMINATED BY "\t"
(seqid)
SET
(
seqid=seqid
)
) WITH BROKER 'hdfs_broker'(
"username"="hadoop",
"password"="")
PROPERTIES (
"timeout" = "3600"
);

check load state

https://doris.apache.org/zh-CN/administrator-guide/http-actions/fe/get-load-state.html


curl -u username:passwd "http://ip:8010/api/bmp/get_load_state?label=<label_name>"

Special notes

  • Deleting data by condition
    • DELETE conditions may only reference key columns
    • Workaround
      • allow non-key columns to be NULL
      • ALTER TABLE fclean2 ENABLE FEATURE "BATCH_DELETE"
      • export the key column of the rows to delete, filtered by the condition
      • delete them via a load job
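The workaround above can be sketched end to end (table and column names follow the examples in this note; the HDFS path is a placeholder):

```sql
-- 1. Enable batch delete on the table (non-key columns must allow NULL)
ALTER TABLE fclean2 ENABLE FEATURE "BATCH_DELETE";

-- 2. Export the key column of the rows matching the delete condition
SELECT seqid FROM fclean2 WHERE enddate < '202203'
INTO OUTFILE "hdfs://cluster2/tmp/to_delete_"
FORMAT AS CSV
PROPERTIES ("broker.name" = "hdfs_broker");

-- 3. Delete by loading the exported keys back with DELETE DATA INFILE
--    (see the batch delete example in this note)
```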

Data checks

SHOW FRONTENDS;
SHOW BACKENDS;
SHOW BROKER;
SHOW PROC '/statistic';
SHOW PROC '/statistic/124226';

ALTER TABLE tbl MODIFY PARTITION p1 SET("replication_num" = "2");

ADMIN SHOW REPLICA DISTRIBUTION FROM t_bms_service_sub;
ADMIN SHOW REPLICA STATUS FROM bmp.t_bms_service_sub WHERE STATUS != "OK";
| TabletId | ReplicaId | BackendId | Version | LastFailedVersion | LastSuccessVersion | CommittedVersion | SchemaHash | VersionNum | IsBad | State | Status |

SHOW TABLE STATUS from bmp;
SHOW TABLET FROM tbl1;
SHOW TABLET 29502553;

A tablet's data directory is typically data/shard_id/tablet_id/schema_hash.
The BE uses an embedded RocksDB to store tablet metadata.

View Tablet Meta

http://{host}:{port}/api/meta/header/{tablet_id}/{schema_hash}
./lib/meta_tool --root_path=/data/soft/doris/storage --operation=get_meta --tablet_id=228348 --schema_hash=2083337138

palo studio

Palo is the former internal name of Doris at Baidu.

palo-studio can be thought of as Doris + Metabase.
http://palo.baidu.com/docs/%E7%94%9F%E6%80%81%E5%B7%A5%E5%85%B7/Palo-Studio/%E5%AE%89%E8%A3%85%E9%83%A8%E7%BD%B2

Issues

doris process killed under high host load => resident memory usage

[Fri May 13 09:19:06 2022] Out of memory: Kill process 19651 (palo_be) score 183 or sacrifice child
[Fri May 13 09:19:06 2022] Killed process 19651 (palo_be), UID 1538, total-vm:73392408kB, anon-rss:48302876kB, file-rss:0kB, shmem-rss:0kB
[Fri May 13 09:19:06 2022] [19659] 1538 19651 18348102 12076299 34582 0 0 palo_be
[Fri May 13 09:19:06 2022] Out of memory: Kill process 19659 (palo_be) score 183 or sacrifice child
[Fri May 13 09:19:06 2022] Killed process 19659 (palo_be), UID 1538, total-vm:73392408kB, anon-rss:48305196kB, file-rss:0kB, shmem-rss:0kB