Flink的Checkpoint

API

使用StreamExecutionEnvironment.enableCheckpointing方法来设置开启checkpoint；具体可以使用enableCheckpointing(long interval)，或者enableCheckpointing(long interval, CheckpointingMode mode)；interval用于指定checkpoint的触发间隔(单位milliseconds)，而CheckpointingMode默认是CheckpointingMode.EXACTLY_ONCE，也可以指定为CheckpointingMode.AT_LEAST_ONCE
也可以通过StreamExecutionEnvironment.getCheckpointConfig().setCheckpointingMode来设置CheckpointingMode，一般对于超低延迟的应用(大概几毫秒)可以使用CheckpointingMode.AT_LEAST_ONCE，其他大部分应用使用CheckpointingMode.EXACTLY_ONCE就可以
checkpointTimeout用于指定checkpoint执行的超时时间(单位milliseconds)，超时没完成就会被abort掉
minPauseBetweenCheckpoints用于指定checkpoint coordinator上一个checkpoint完成之后最小等多久可以出发另一个checkpoint，当指定这个参数时，maxConcurrentCheckpoints的值为1
maxConcurrentCheckpoints用于指定运行中的checkpoint最多可以有多少个，用于包装topology不会花太多的时间在checkpoints上面；如果有设置了minPauseBetweenCheckpoints，则maxConcurrentCheckpoints这个参数就不起作用了(大于1的值不起作用)
enableExternalizedCheckpoints用于开启checkpoints的外部持久化，但是在job失败的时候不会自动清理，需要自己手工清理state；ExternalizedCheckpointCleanup用于指定当job canceled的时候externalized checkpoint该如何清理，DELETE_ON_CANCELLATION的话，在job canceled的时候会自动删除externalized state，但是如果是FAILED的状态则会保留；RETAIN_ON_CANCELLATION则在job canceled的时候会保留externalized checkpoint state
failOnCheckpointingErrors用于指定在checkpoint发生异常的时候，是否应该fail该task，默认为true，如果设置为false，则task会拒绝checkpoint然后继续运行

实例

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// start a checkpoint every 1000 ms
env.enableCheckpointing(1000);

// advanced options:

// set mode to exactly-once (this is the default)
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

// checkpoints have to complete within one minute, or are discarded
env.getCheckpointConfig().setCheckpointTimeout(60000);

// make sure 500 ms of progress happen between checkpoints
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);

// allow only one checkpoint to be in progress at the same time
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

// enable externalized checkpoints which are retained after job cancellation
env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

// This determines if a task will be failed if an error occurs in the execution of the task’s checkpoint procedure.
env.getCheckpointConfig().setFailOnCheckpointingErrors(true);

FlinkKafkaConsumer011

API

setStartFromGroupOffsets()【默认消费策略】
默认读取上次保存的offset信息如果是应用第一次启动，读取不到上次的offset信息，则会根据这个参数auto.offset.reset的值来进行消费数据
setStartFromEarliest() 从最早的数据开始进行消费，忽略存储的offset信息
setStartFromLatest() 从最新的数据进行消费，忽略存储的offset信息
setStartFromSpecificOffsets(Map<KafkaTopicPartition, Long>)

当checkpoint机制开启的时候，KafkaConsumer会定期把kafka的offset信息还有其他operator的状态信息一块保存起来。当job失败重启的时候，Flink会从最近一次的checkpoint中进行恢复数据，重新消费kafka中的数据。
为了能够使用支持容错的kafka Consumer，需要开启checkpoint env.enableCheckpointing(5000); // 每5s checkpoint一次
Kafka Consumers Offset 自动提交有以下两种方法来设置，可以根据job是否开启checkpoint来区分:
(1) Checkpoint关闭时：可以通过下面两个参数配置
enable.auto.commit
auto.commit.interval.ms
(2) Checkpoint开启时：当执行checkpoint的时候才会保存offset，这样保证了kafka的offset和checkpoint的状态偏移量保持一致。可以通过这个参数设置
setCommitOffsetsOnCheckpoints(boolean)
这个参数默认就是true。表示在checkpoint的时候提交offset, 此时，kafka中的自动提交机制就会被忽略

实例

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
    <version>1.7.1</version>
</dependency>

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
    <version>1.7.1</version>
</dependency>

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.11.0.1</version>
</dependency>

 public class StreamingKafkaSource {

    public static void main(String[] args) throws Exception {
        //获取Flink的运行环境
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        //checkpoint配置
        env.enableCheckpointing(5000);
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
        env.getCheckpointConfig().setCheckpointTimeout(60000);
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
        env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        //设置statebackend
        //env.setStateBackend(new RocksDBStateBackend("hdfs://hadoop100:9000/flink/checkpoints",true));


        String topic = "kafkaConsumer";
        Properties prop = new Properties();
        prop.setProperty("bootstrap.servers","SparkMaster:9092");
        prop.setProperty("group.id","kafkaConsumerGroup");

        FlinkKafkaConsumer011<String> myConsumer = new FlinkKafkaConsumer011<>(topic, new SimpleStringSchema(), prop);

        myConsumer.setStartFromGroupOffsets();//默认消费策略

        DataStreamSource<String> text = env.addSource(myConsumer);

        text.print().setParallelism(1);

        env.execute("StreamingFromCollection");
    }
}

FlinkKafkaProducer011

API

Kafka Producer的容错-Kafka 0.9 and 0.10
如果Flink开启了checkpoint，针对FlinkKafkaProducer09和FlinkKafkaProducer010 可以提供 at-least-once的语义，还需要配置下面两个参数:
setLogFailuresOnly(false)
setFlushOnCheckpoint(true)
注意：建议修改kafka 生产者的重试次数retries【这个参数的值默认是0】
Kafka Producer的容错-Kafka 0.11，如果Flink开启了checkpoint，针对FlinkKafkaProducer011 就可以提供 exactly-once的语义,但是需要选择具体的语义
Semantic.NONE
Semantic.AT_LEAST_ONCE【默认】
Semantic.EXACTLY_ONCE

实例

public class StreamingKafkaSink {
    public static void main(String[] args) throws Exception {
    //获取Flink的运行环境
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();


    //checkpoint配置
    env.enableCheckpointing(5000);
    env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
    env.getCheckpointConfig().setCheckpointTimeout(60000);
    env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
    env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

    //设置statebackend

    //env.setStateBackend(new RocksDBStateBackend("hdfs://SparkMaster:9000/flink/checkpoints",true));


    DataStreamSource<String> text = env.socketTextStream("SparkMaster", 9001, "\n");

    String brokerList = "SparkMaster:9092";
    String topic = "kafkaProducer";

    Properties prop = new Properties();
    prop.setProperty("bootstrap.servers",brokerList);

    //第一种解决方案，设置FlinkKafkaProducer011里面的事务超时时间
    //设置事务超时时间
    //prop.setProperty("transaction.timeout.ms",60000*15+"");

    //第二种解决方案，设置kafka的最大事务超时时间,主要是kafka的配置文件设置。

    //FlinkKafkaProducer011<String> myProducer = new FlinkKafkaProducer011<>(brokerList, topic, new SimpleStringSchema());

    //使用EXACTLY_ONCE语义的kafkaProducer
    FlinkKafkaProducer011<String> myProducer = new FlinkKafkaProducer011<>(topic, new KeyedSerializationSchemaWrapper<String>(new SimpleStringSchema()), prop, FlinkKafkaProducer011.Semantic.EXACTLY_ONCE);
    text.addSink(myProducer);

    env.execute("StreamingFromCollection");

  }
}

Flink1.7.1与Kafka0.11.0.1

Flink1.7.1与Kafka0.11.0.1

Flink的Checkpoint

API

实例

FlinkKafkaConsumer011

API

实例

FlinkKafkaProducer011

API

实例