Article January 23, 2024

hadoop3

Words count 7.2k Reading time 7 mins.

#JAVA
export JAVA_HOME=/data/soft/jdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
#hadoop
export HADOOP_HOME=/data/soft/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_C... Read article

Article January 23, 2024

yarn

Words count 90 Reading time 1 mins.

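Scheduling mode: long-running (resident) or per-task. Queues. Resource quotas. Task management. Hacking into yarn: learning to request resources from yarn by hand and schedule a script to run. Read article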

Article January 23, 2024

JIT

Words count 633 Reading time 1 mins.

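The motivation for JIT rests on the 80/20 rule: 20% of the code, the hot code, accounts for 80% of a program's execution time. Even with JIT enabled, code compilation and bytecode interpretation still take place. JIT deals with hot code (hotspot code). Hot code is a frequently executed block of code, such as the body of a loop. The JIT has its own logic for deciding whether code is hot. Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode): "mixed mode" means the code is interpreted first and hot code is gradually replaced with machine code.... Read article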
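As a rough illustration of "hot code", the sketch below calls a small loop method many times so the JVM has a chance to treat it as hot. The class and method names (`HotLoopDemo`, `sumTo`, `runMany`) are hypothetical and not taken from the article, and whether or when the method is actually compiled depends on the JVM and its settings.

```java
// Hypothetical demo class; the names are illustrative, not from the article.
final class HotLoopDemo {
    // A small loop body: the kind of code that becomes "hot" when run often.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Calls sumTo repeatedly so the JVM may decide the method is hot.
    static long runMany(int rounds, int n) {
        long checksum = 0;
        for (int r = 0; r < rounds; r++) {
            checksum += sumTo(n);
        }
        return checksum;
    }
}
```

On a HotSpot JVM, running a program that calls `runMany` with `-XX:+PrintCompilation` should list methods as they are compiled, while `-Xint` forces interpretation only, which makes the two modes easy to compare.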

Article January 23, 2024

hdfs

Words count 45 Reading time 1 mins.

Disk write policy (node, disk). Preferred locations. I/O isolation. Read article

Article January 23, 2024

copy-on-write

Words count 628 Reading time 1 mins.

Copy-on-write (COW) is a performance optimization strategy. If you modify the second variable, Swift takes a full copy at that point so that only the second variable is modified; the copy operation is delayed until it is actually needed. If multiple callers request the same resource at the same time (such as data stored in memory or on disk),... Read article
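The idea of delaying the copy until the first write can be sketched in a few lines. This is a minimal, single-threaded illustration under assumed names (`CowIntArray`, `copy`, `sharesStorageWith` are hypothetical), not how Swift or any particular library implements it.

```java
// Hypothetical copy-on-write wrapper around an int array (not thread-safe).
final class CowIntArray {
    private int[] data;
    // True while another handle may still point at the same backing array.
    private boolean shared;

    CowIntArray(int[] initial) {
        this.data = initial.clone();
        this.shared = false;
    }

    private CowIntArray(int[] data, boolean shared) {
        this.data = data;
        this.shared = shared;
    }

    // Cheap "copy": both handles share the backing array until one writes.
    CowIntArray copy() {
        this.shared = true;
        return new CowIntArray(this.data, true);
    }

    int get(int index) {
        return data[index];
    }

    // The real copy happens here, only when a shared array is modified.
    // The flag is conservative: the other handle may copy once more later.
    void set(int index, int value) {
        if (shared) {
            data = data.clone();
            shared = false;
        }
        data[index] = value;
    }

    // Exposed for illustration only: reports whether storage is still shared.
    boolean sharesStorageWith(CowIntArray other) {
        return this.data == other.data;
    }
}
```

After `b = a.copy()`, reads go through the same array; the first `b.set(...)` clones it, so `a` keeps its original values.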

Article January 23, 2024

spark

Words count 120 Reading time 1 mins.

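Performance problem with many counters: whole stage codegen, where the length of the generated code exceeds what the JIT will handle. Windowed counters: wrap an extra layer around the spark context. Task scheduling locality. Number of tasks. Read article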

Article January 23, 2024

protobuf-storetypes

Words count 16 Reading time 1 mins.

Data structures for passing messages. Read article

Article January 23, 2024

api

Words count 1.3k Reading time 1 mins.

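api.java: this package is used when programming in java. JavaDoubleRDD converts scala Double to java Double. Note one line of code: import java.lang.{Double => JDouble}, which is scala syntax for giving a class an alias. Java itself has no import alias; the similar form import com.example.Calendar as MyCalendar is Kotlin syntax. broadcast: keep a read-only variable cached on each machine rather ... Read article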

Article January 23, 2024

other_tools

Words count 1.7k Reading time 2 mins.

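Comparison for small data volumes. alluxio: suited to multiple data centers in different locations with enough network bandwidth, and to workloads such as machine learning where data is reused repeatedly over a short period. ignite: small data volumes, local computation. IQ. presto. carbondata. FromcarbonToSpark: stay at the level of principles as far as possible and skip the details. Leave carbon unmentioned but cover the related content? Plus scala syntax. scala: CarbonSession.scala, @deprecate, @transient, @, default parameters, => functions, lazy methods, val, var, Option[], asInstanc... Read article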

Article January 23, 2024

Features

Words count 1.7k Reading time 2 mins.

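bloom filter: a kind of datamap that trades space for time. Introduce minmax first: when the data is widely scattered, minmax has little effect. When the data is fairly concentrated; when a column is not a sort column. Parameter configuration is a fairly hard problem. Comparison. Strings longer than 32k: a carbon design issue. The string length was originally stored in a short, purely for compatibility; in spark the representation is always string. use an integer instead of short to store the length of bytes content. A similar problem: during snappy compression, by... Read article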
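The 32k limit follows from the range of a signed 16-bit short, whose maximum is 32767. The sketch below shows a length being corrupted when narrowed to a short, and a length prefix stored as a 4-byte int instead. The class and method names (`LengthPrefixDemo` and its helpers) are hypothetical and do not reflect carbon's actual storage layout.

```java
import java.nio.ByteBuffer;

// Hypothetical demo of a length prefix; not carbon's real on-disk format.
final class LengthPrefixDemo {
    // Narrowing to a signed 16-bit short wraps around above Short.MAX_VALUE.
    static short narrowToShort(int length) {
        return (short) length;
    }

    // Writes the content length as a 4-byte int, followed by the content.
    static byte[] encodeWithIntLength(byte[] content) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + content.length);
        buffer.putInt(content.length);
        buffer.put(content);
        return buffer.array();
    }

    // Reads the 4-byte length prefix back from the encoded bytes.
    static int decodeLength(byte[] encoded) {
        return ByteBuffer.wrap(encoded).getInt();
    }
}
```

With a 40000-byte value, `narrowToShort(40000)` yields -25536, while the int prefix round-trips the length unchanged.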