Contents
  1. 1. Hadoop pseudo-distributed configuration
    1. 1.1. start
  2. 2. spark
  3. 3. carbon

All interaction with the VM goes through a terminal tool such as Xshell or MobaXterm.

Environment variables can be written to either /etc/profile or ~/.profile.
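For example, assuming the install locations used later in these notes (/opt/soft/jdk, /opt/soft/hadoop), the entries might look like:

```shell
# Append to /etc/profile or ~/.profile; paths assume the layout used in these notes.
export JAVA_HOME=/opt/soft/jdk
export HADOOP_HOME=/opt/soft/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Run `source ~/.profile` (or re-login) for the variables to take effect in the current shell.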

Hadoop pseudo-distributed configuration

https://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html

etc/hadoop/core-site.xml:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

hadoop-env.sh

export JAVA_HOME=/opt/soft/jdk

start

/opt/soft/hadoop/sbin/start-dfs.sh
/opt/soft/hadoop/sbin/start-yarn.sh
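Per the linked SingleCluster guide, HDFS must be formatted once before the first start (path assumes the /opt/soft/hadoop install used above):

```shell
# One-time only: initialize NameNode storage before the first start-dfs.sh.
# Re-running this wipes HDFS metadata, so never repeat it on a live cluster.
/opt/soft/hadoop/bin/hdfs namenode -format
```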

Note the web UI port change: in Hadoop 3.x the NameNode UI moved from 50070 to 9870.
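A quick sanity check after start-up (assuming `jps` from the JDK is on the PATH):

```shell
# A healthy pseudo-distributed node shows NameNode, DataNode,
# SecondaryNameNode, ResourceManager, and NodeManager.
jps
# The NameNode web UI should answer on the new port.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870/
```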

spark

With the embedded Derby metastore no configuration is needed; just always launch Spark from the same fixed working directory.
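The fixed path matters because Derby keeps its files in metastore_db/ under the current working directory; launching from a different directory silently creates a fresh, empty metastore. A sketch, assuming Spark is installed at /opt/soft/spark:

```shell
# Always launch from the same directory so the embedded Derby
# metastore (metastore_db/) and spark-warehouse/ are reused.
cd /opt/soft/spark
./bin/spark-sql
```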

carbon

spark-shell --jars apache-carbondata-2.3.0-bin-spark3.1.1-hadoop2.7.2.jar
spark-sql --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars apache-carbondata-2.3.0-bin-spark3.1.1-hadoop2.7.2.jar

spark-submit \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  --num-executors 3 \
  --driver-memory 20G \
  --executor-memory 250G \
  --executor-cores 32 \
  apache-carbondata-2.3.0-bin-spark3.1.1-hadoop2.7.2.jar

./bin/beeline -u jdbc:hive2://:port

Test table

cd carbondata
cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF

CREATE TABLE IF NOT EXISTS test_table (
  id STRING,
  name STRING,
  city STRING,
  age INT)
STORED AS carbondata;


LOAD DATA INPATH '/local-path/sample.csv' INTO TABLE test_table;

select * from test_table;

Data is stored under: /opt/soft/spark-warehouse