CHEATSHEET January 20, 2022

doris


Architecture

FE (Frontend): 1-5 nodes (split into Follower and Observer roles). Stores the metadata, i.e. the edit logs and image files, which usually total from a few hundred MB to a few GB.

BE (Backend): 10-100 nodes. Stores user data in 3 replicas.

Broker is a process used to access external data sources (such as HDFS). Usually one broker instance per machine is enough.

Multiple BE instances can be deployed on one machine, but only one FE.
The clocks of all servers hosting FEs must stay in sync (a skew of at most 5 seconds is tolerated).
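Once the first FE is up, the other members are registered through it with SQL; a minimal sketch, assuming the default FE edit-log port 9010 and broker port 8000 (hostnames are placeholders):

ALTER SYSTEM ADD FOLLOWER "fe_host2:9010";                -- add an FE follower
ALTER SYSTEM ADD OBSERVER "fe_host3:9010";                -- add an FE observer
ALTER SYSTEM ADD BROKER hdfs_broker "broker_host:8000";   -- register a broker process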

Setup

# the prebuilt BE binary requires AVX2 support
cat /proc/cpuinfo | grep avx2

# keep FE metadata on its own path, then symlink it into place
mkdir -p <doris_meta_created>
ln -s <doris_meta_created> <doris_meta_original>/fe/doris-meta
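The install docs also ask for a few OS-level settings before starting the BE; a minimal sketch (run as root; the values are the documented recommendations):

# raise the mmap count the BE needs
sysctl -w vm.max_map_count=2000000
# raise the open-file limit
ulimit -n 65536
# disable swap so memory limits behave predictably
swapoff -a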

Common configuration

# FE web UI (http_port): http://:8010

meta_dir = ${DORIS_HOME}/doris-meta           # fe.conf: FE metadata dir
storage_root_path = /data/soft/doris/storage  # be.conf: BE data dir
sys_log_dir = ${DORIS_HOME}/log               # fe.conf / be.conf
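On multi-homed hosts, both fe.conf and be.conf usually also need priority_networks so each node registers with the intended address; a sketch, the CIDR being an assumption:

priority_networks = 10.10.10.0/24   # pick the subnet of the intended NIC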

A BE node must first be added through the FE before it can join the cluster (the FE MySQL port defaults to 9030), as sketched below. Doris ships with built-in root and admin users.
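A minimal sketch of registering a BE from the MySQL client, assuming the default heartbeat port 9050:

mysql -h <fe_host> -P 9030 -uroot -p
ALTER SYSTEM ADD BACKEND "be_host:9050";   -- heartbeat_service_port, default 9050
SHOW BACKENDS;                             -- the new node should report Alive: true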

Common commands

Connect to the FE:
mysql -h -P 8035 -uroot -p

use example_db;

CREATE TABLE IF NOT EXISTS example_db.example_tbl
(
    user_id LARGEINT NOT NULL COMMENT "user id",
    last_visit_date DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time",
    cost BIGINT SUM DEFAULT "0" COMMENT "total amount spent by the user"
)
AGGREGATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1;

insert into example_db.example_tbl values (1, '1970-01-01 00:00:00', 10);
insert into example_db.example_tbl values (1, '1971-01-01 00:00:00', 10);
insert into example_db.example_tbl values (1, '1969-01-01 00:00:00', 5);

select * from example_db.example_tbl;
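On the aggregate model those three inserts collapse into a single row: cost carries SUM, and REPLACE keeps the value from the most recent load (not the largest), so since the 1969 insert ran last the expected result is (a sketch, not captured output):

-- user_id | last_visit_date     | cost
-- 1       | 1969-01-01 00:00:00 | 25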

spark-shell --master local[2] --jars /data/tmp/doris-spark-connector-3.0.3-2.12-1.0.0-SNAPSHOT.jar
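Inside that spark-shell the connector can be driven from Spark SQL (wrapped in spark.sql("""...""")); a sketch following the connector docs, where the view name, FE host, and FE http port 8030 are assumptions:

CREATE TEMPORARY VIEW spark_doris
USING doris
OPTIONS(
  "table.identifier" = "example_db.example_tbl",
  "fenodes" = "fe_host:8030",
  "user" = "root",
  "password" = ""
);

SELECT * FROM spark_doris;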

Common operations

broker load

LOAD LABEL test_up984948464479_bmp$ymd
(
DATA INFILE("hdfs://cluster2/tmp/*")
INTO TABLE tbl
FORMAT AS "parquet"
(seqid, userid, serviceid, startdate, enddate, updatetime)
SET(
seqid=seqid,
userid=userid,
serviceid=serviceid,
startdate=startdate,
enddate=enddate,
updatetime=updatetime)
)
WITH BROKER 'hdfs_broker'(
"username"="hadoop",
"password"="")
PROPERTIES("max_filter_ratio" = "1");

export

-- set enable_parallel_outfile = true;  -- the data came back wrong when exported in parallel!! (as of 2022-02)
select seqid from fclean where enddate < '202203' into outfile "hdfs://tmp/"
FORMAT AS CSV
properties ("hdfs.fs.defaultFS" = "hdfs://host:port", "hdfs.hdfs_user" = "hadoop");

-- better: export through the broker
select seqid from fclean limit 3 into outfile "hdfs://cluster2/tmp/123"
FORMAT AS CSV   -- or: FORMAT AS PARQUET
properties ("broker.name" = "hdfs_broker");

batch delete

LOAD LABEL bmp.delete_16491519487 (
DELETE DATA INFILE("hdfs://cluster2/tmp/*")
INTO TABLE fclean
COLUMNS TERMINATED BY "\t"
(seqid)
SET
(
seqid=seqid
)
) WITH BROKER 'hdfs_broker'(
"username"="hadoop",
"password"="")
PROPERTIES (
"timeout" = "3600"
);

check load state

https://doris.apache.org/zh-CN/administrator-guide/http-actions/fe/get-load-state.html


curl -u username:passwd http://ip:8010/api/bmp/get_load_state?label=<label_name>

Special notes

  • Deleting data by condition
    • a DELETE condition may only reference key columns
    • workaround (see the sketch after this list)
      • allow the non-key columns to be NULL
      • ALTER TABLE fclean2 ENABLE FEATURE "BATCH_DELETE"
      • export the key columns of the rows to delete, selected by the real condition
      • feed them back through a batch-delete load (as in the LOAD above)
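Under the hood the batch-delete feature flags rows through a hidden column; a sketch for inspecting what got flagged (the session variable and column name come with the feature):

SET show_hidden_columns = true;
SELECT seqid, __DORIS_DELETE_SIGN__ FROM fclean WHERE __DORIS_DELETE_SIGN__ = 1;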

Data checks

SHOW FRONTENDS;
SHOW BACKENDS;
SHOW BROKER;
SHOW PROC '/statistic';
SHOW PROC '/statistic/124226';

ALTER TABLE tbl MODIFY PARTITION p1 SET("replication_num" = "2");

ADMIN SHOW REPLICA DISTRIBUTION FROM t_bms_service_sub;
ADMIN SHOW REPLICA STATUS FROM bmp.t_bms_service_sub WHERE STATUS != "OK";
-- output columns:
-- | TabletId | ReplicaId | BackendId | Version | LastFailedVersion | LastSuccessVersion | CommittedVersion | SchemaHash | VersionNum | IsBad | State | Status |
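When bad replicas turn up, a repair can be triggered by hand (a sketch; the partition name is an assumption):

ADMIN REPAIR TABLE bmp.t_bms_service_sub;
ADMIN REPAIR TABLE bmp.t_bms_service_sub PARTITION (p1);   -- or just one partition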

SHOW TABLE STATUS from bmp;
SHOW TABLET FROM tbl1;
SHOW TABLET 29502553;

A tablet's directory on disk is typically data/shard_id/tablet_id/schema_hash.
The BE embeds RocksDB to store each tablet's meta information.

Inspect tablet meta

http://{host}:{port}/api/meta/header/{tablet_id}/{schema_hash}
./lib/meta_tool --root_path=/data/soft/doris/storage --operation=get_meta --tablet_id=228348 --schema_hash=2083337138

palo studio

Palo is the name Doris carried inside Baidu before it was open-sourced.

palo-studio can be thought of as Doris plus Metabase.
http://palo.baidu.com/docs/%E7%94%9F%E6%80%81%E5%B7%A5%E5%85%B7/Palo-Studio/%E5%AE%89%E8%A3%85%E9%83%A8%E7%BD%B2

Issues

When the host comes under heavy load, the Doris process gets killed by the kernel's OOM killer => the BE holds a large resident memory footprint.

[Fri May 13 09:19:06 2022] Out of memory: Kill process 19651 (palo_be) score 183 or sacrifice child
[Fri May 13 09:19:06 2022] Killed process 19651 (palo_be), UID 1538, total-vm:73392408kB, anon-rss:48302876kB, file-rss:0kB, shmem-rss:0kB
[Fri May 13 09:19:06 2022] [19659] 1538 19651 18348102 12076299 34582 0 0 palo_be
[Fri May 13 09:19:06 2022] Out of memory: Kill process 19659 (palo_be) score 183 or sacrifice child
[Fri May 13 09:19:06 2022] Killed process 19659 (palo_be), UID 1538, total-vm:73392408kB, anon-rss:48305196kB, file-rss:0kB, shmem-rss:0kB
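One mitigation is to cap the BE's resident memory up front in be.conf; a sketch, the 60% being an assumption to tune per host:

# be.conf: upper bound on BE process memory
mem_limit = 60%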