Database synchronization

JDBC

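By default a JDBC read runs on a single thread, so it is slow.
It can be parallelized, but too much parallelism puts heavy load on the source database.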
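
One way to keep the parallelism in check is to split the key range into chunks and read them through a fixed-size thread pool. The following is a minimal sketch under that assumption; `BoundedParallelRead`, `splitRange`, `readAll` and the `readChunk` callback are hypothetical names, and the callback stands in for a real JDBC query such as `SELECT ... WHERE id BETWEEN ? AND ?`.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object BoundedParallelRead {
  /** Split the inclusive range [lower, upper] into at most `parts` contiguous inclusive chunks. */
  def splitRange(lower: Long, upper: Long, parts: Int): Seq[(Long, Long)] = {
    require(parts > 0 && lower <= upper)
    val total = upper - lower + 1
    val n = math.min(parts.toLong, total).toInt
    val base = total / n
    val extra = total % n // the first `extra` chunks hold one more key
    (0 until n).map { i =>
      val start = lower + i * base + math.min(i.toLong, extra)
      val size = base + (if (i < extra) 1 else 0)
      (start, start + size - 1)
    }
  }

  /** Run `readChunk` for every chunk, with at most `maxThreads` chunks in flight at once. */
  def readAll[A](chunks: Seq[(Long, Long)], maxThreads: Int)(readChunk: (Long, Long) => A): Seq[A] = {
    val pool = Executors.newFixedThreadPool(maxThreads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = chunks.map { case (lo, hi) => Future(readChunk(lo, hi)) }
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally pool.shutdown()
  }
}
```

Here `maxThreads` is the knob that caps the number of concurrent connections to the source database, independent of how many chunks the range is split into.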

DataX

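For Oracle, DataX runs a sample query on splitPk, takes the number of split tasks computed from the network limits, and derives a lower and upper bound for each task. Each pair of bounds becomes a WHERE condition, so the tasks read in parallel.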
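
The idea of turning sampled key values into per-task WHERE conditions can be sketched as follows. This is an illustration only, not DataX's source code: `SampledSplit` and `whereClauses` are hypothetical names, the sampled values are assumed to be numeric, and the extra `IS NULL` condition is a choice made in this sketch so that rows with a null key are not lost.

```scala
object SampledSplit {
  /** Build WHERE conditions for roughly `tasks` range tasks from sampled splitPk values. */
  def whereClauses(splitPk: String, sampled: Seq[Long], tasks: Int): Seq[String] = {
    require(tasks > 0)
    val sorted = sampled.distinct.sorted
    // Pick up to tasks - 1 cut points, evenly spaced through the sorted sample.
    val cuts: Seq[Long] =
      if (sorted.isEmpty || tasks == 1) Seq.empty[Long]
      else (1 until tasks).map(i => sorted(i * sorted.size / tasks)).distinct
    val nullClause = s"$splitPk IS NULL"
    if (cuts.isEmpty) Seq(s"$splitPk IS NOT NULL", nullClause)
    else {
      val first = s"$splitPk < ${cuts.head}"
      val middle = cuts.zip(cuts.tail).map { case (lo, hi) => s"$splitPk >= $lo AND $splitPk < $hi" }
      val last = s"$splitPk >= ${cuts.last}"
      ((first +: middle) :+ last) :+ nullClause
    }
  }
}
```

Because the cut points come from a sample rather than from min/max, the ranges follow the actual key distribution, which helps when the keys are skewed or sparse.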

SeaTunnel

Judging from the current source code, SeaTunnel does not support parallel JDBC extraction.

Spark

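  def jdbc(
      url: String,
      table: String,
      columnName: String,    // column to partition on; must be an integer type, e.g. id
      lowerBound: Long,      // lower bound of the partition range
      upperBound: Long,      // upper bound of the partition range
      numPartitions: Int,    // number of partitions
      connectionProperties: Properties): DataFrame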
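
To see what these four parameters do, it helps to write out the WHERE clause each partition ends up with. The sketch below is modeled on the stride arithmetic Spark has used for this overload, simplified and not copied from any particular version; `SparkStyleStride` and `whereClauses` are hypothetical names. Note that the first and last partitions are open-ended, so lowerBound and upperBound only shape the stride and do not filter rows out.

```scala
object SparkStyleStride {
  /** WHERE clause per partition for a numeric column, one clause per partition. */
  def whereClauses(column: String, lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[String] = {
    require(numPartitions > 1 && lowerBound < upperBound)
    val stride = upperBound / numPartitions - lowerBound / numPartitions
    (0 until numPartitions).map { i =>
      val lo = lowerBound + i * stride
      val hi = lo + stride
      if (i == 0) s"$column < $hi OR $column IS NULL"        // first partition also takes nulls
      else if (i == numPartitions - 1) s"$column >= $lo"     // last partition is open-ended
      else s"$column >= $lo AND $column < $hi"
    }
  }
}
```

For example, `SparkStyleStride.whereClauses("id", 0L, 100L, 4)` yields `id < 25 OR id IS NULL`, `id >= 25 AND id < 50`, `id >= 50 AND id < 75` and `id >= 75`. If most ids fall outside 0 to 100, the first or last partition carries most of the data, so the bounds should match the real range of the column.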


val predicates =
  Array(
    "2019-08-02" -> "2019-09-01",
    "2019-09-02" -> "2019-10-01",
    "2019-10-02" -> "2019-11-01"
  ).map {
    case (start, end) =>
      s"cast(txntime as date) >= '$start' AND cast(txntime as date) <= '$end'"
  }

  def jdbc(
      url: String,
      table: String,
      predicates: Array[String],
      connectionProperties: Properties): DataFrame
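
With this overload each predicate becomes one partition, so the predicates should not overlap and should together cover every row that is needed. Rather than typing the date ranges by hand, they can be generated. The helper below is a sketch with hypothetical names (`DatePredicates`, `monthly`); it uses only `java.time` and produces inclusive month-long ranges in the same form as the hand-written predicates.

```scala
import java.time.LocalDate

object DatePredicates {
  /** One predicate per month-long window, covering [from, to] inclusive with no overlap. */
  def monthly(column: String, from: LocalDate, to: LocalDate): Seq[String] =
    Iterator.from(0)
      .map(i => (from.plusMonths(i.toLong), from.plusMonths(i.toLong + 1).minusDays(1)))
      .takeWhile { case (start, _) => !start.isAfter(to) }
      .map { case (start, end) =>
        val last = if (end.isAfter(to)) to else end // clip the final window at `to`
        s"cast($column as date) >= '$start' AND cast($column as date) <= '$last'"
      }
      .toList
}
```

Calling `DatePredicates.monthly("txntime", LocalDate.parse("2019-08-02"), LocalDate.parse("2019-11-01"))` gives three predicates, which can be passed as `predicates.toArray` to the `jdbc` overload above.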

Flink

Use NumericBetweenParametersProvider to set the step size and the lower and upper bounds of the range to read.
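
The arithmetic behind a step size plus lower and upper bounds can be sketched as below. This is a simplified illustration and not Flink's implementation, which may distribute the remainder differently; `BetweenBatches` and `batches` are hypothetical names. Each resulting pair is meant to fill the two placeholders of a query such as `SELECT ... WHERE id BETWEEN ? AND ?`.

```scala
object BetweenBatches {
  /** Cut the inclusive range [minVal, maxVal] into inclusive batches of at most `batchSize` keys. */
  def batches(minVal: Long, maxVal: Long, batchSize: Long): Seq[(Long, Long)] = {
    require(batchSize > 0 && minVal <= maxVal)
    (minVal to maxVal by batchSize).map { start =>
      (start, math.min(start + batchSize - 1, maxVal)) // the last batch may be shorter
    }
  }
}
```

For instance, `BetweenBatches.batches(1L, 10L, 4L)` produces `(1, 4)`, `(5, 8)` and `(9, 10)`. Because BETWEEN is inclusive on both sides, each batch ends one key before the next one starts.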

Manhua