Question:

Replacement for the groupBy operator in Reactor

阙阳夏

2023-03-14

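This is a follow-up to this question. The solution proposed in the answers there is to use the groupBy operator. That generally works well, but as its documentation notes, it is not recommended with a large number of distinct keys, say tens of thousands.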

    data
      .groupBy(Data::getPointID)
      .flatMap(sameIdFlux -> sameIdFlux
        .concatMap(processor::process)
      )
      .subscribe();

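Each group is essentially unbounded, and its elements may arrive at any time. In addition, I need to limit the number of groups that are processed concurrently. As far as I understand, with the code above I will either hit the implicit limit on open groups, so that new groups are not opened (processed), or eventually run out of memory, because groups that have been inactive for a long time (think of deleted entities) are never closed and keep consuming memory for nothing.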

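Is there an operator or pattern that achieves the same behavior without these problems? I initially tried closing each group after some reasonable duration, but that leaves me open to a race condition: when a group closes and an item with the same ID arrives, the two can end up being processed in parallel, which is undesirable.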

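Edit: I have been doing more research and trying more approaches, and my biggest problem at the moment seems to be how to manage back-pressure properly, that is, how to cap the maximum concurrency without limiting the number of groups itself. Data production is usually linear, but it sometimes produces large spikes that I need to throttle accordingly.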

朱欣荣

2023-03-14

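I am new to Spring WebFlux and Project Reactor, so I don't know of any ready-made pattern that solves your problem. However, you can build your own pattern to limit the number of groups created by the groupBy operator.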

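In the example below I use the pattern int partition = i % numberOfPartitions, inspired by this blog post from Apache Flink; numberOfPartitions determines how many partitions the stream is split into.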

    public Flux<GroupedFlux<Integer, Data>> createFluxUsingGroupBy(List<String> dataList, int numberOfPartitions, int maxCount) {
        return Flux
                .fromStream(IntStream.range(0, maxCount)
                        .mapToObj(i -> {
                            int randomPosition = ThreadLocalRandom.current().nextInt(0, dataList.size());
                            int partition = i % numberOfPartitions;
                            return new Data(i, dataList.get(randomPosition), partition);
                        })
                )
                .delayElements(Duration.ofMillis(10))
                .log()
                .groupBy(Data::getPartition);
    }
........
    @lombok.Data
    @AllArgsConstructor
    @NoArgsConstructor
    public class Data {
        private Integer key;
        private String value;
        private Integer partition;
    }

When I run it with numberOfPartitions = 3, I get partitions 0 to 2 (3 partitions), no matter which keys I use.

    @Test
    void testFluxUsingGroupBy() {
        int numberOfPartitions = 3;
        int maxCount = 100;
        Flux<GroupedFlux<Integer, Data>> dataGroupedFlux = fluxAndMonoTransformations.createFluxUsingGroupBy(expect, numberOfPartitions, maxCount);
        StepVerifier.create(dataGroupedFlux)
                .expectNextCount(numberOfPartitions)
                .verifyComplete();
    }

Here is the log:

10:43:02.168 [Test worker] INFO reactor.Flux.ConcatMap.1 - onSubscribe(FluxConcatMap.ConcatMapImmediate)
10:43:02.179 [Test worker] INFO reactor.Flux.ConcatMap.1 - request(256)
10:43:02.291 [parallel-1] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=0, value=Spring, partition=0))
10:43:02.362 [parallel-1] INFO reactor.Flux.ConcatMap.1 - request(1)
10:43:02.375 [parallel-2] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=1, value=Scala, partition=1))
10:43:02.377 [parallel-2] INFO reactor.Flux.ConcatMap.1 - request(1)
10:43:02.388 [parallel-3] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=2, value=reactive programming, partition=2))
10:43:02.389 [parallel-3] INFO reactor.Flux.ConcatMap.1 - request(1)
10:43:02.400 [parallel-4] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=3, value=java with lambda, partition=0))
10:43:02.411 [parallel-1] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=4, value=Spring, partition=1))
10:43:02.422 [parallel-2] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=5, value=java 8, partition=2))
10:43:02.433 [parallel-3] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=6, value=java with lambda, partition=0))
10:43:02.444 [parallel-4] INFO reactor.Flux.ConcatMap.1 - onNext(Data(key=7, value=java with lambda, partition=1))
...
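The bounded-partition idea can be checked in isolation, without Reactor. The following is a minimal sketch, not part of the original answer: the class name KeyPartitioner and the methods partitionOf and distinctPartitions are hypothetical, and the key is assumed to have a stable hashCode. It shows that tens of thousands of distinct keys still collapse into a fixed set of partitions, and that equal keys always map to the same partition, which is what keeps per-key processing sequential.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not from the original answer.
final class KeyPartitioner {

    // Maps any key to a bucket in [0, numberOfPartitions).
    // Equal keys always land in the same bucket.
    static int partitionOf(Object key, int numberOfPartitions) {
        return Math.floorMod(key.hashCode(), numberOfPartitions);
    }

    // Collects the distinct partitions produced by keyCount different integer keys.
    static Set<Integer> distinctPartitions(int keyCount, int numberOfPartitions) {
        Set<Integer> seen = new TreeSet<>();
        for (int pointId = 0; pointId < keyCount; pointId++) {
            seen.add(partitionOf(pointId, numberOfPartitions));
        }
        return seen;
    }

    public static void main(String[] args) {
        // 50,000 distinct keys, but only three partitions are ever created.
        System.out.println(distinctPartitions(50_000, 3)); // prints [0, 1, 2]
    }
}
```

In the pipeline from the question, a function like partitionOf could presumably serve as the groupBy key selector in place of Data::getPointID. The number of open groups would then be fixed at numberOfPartitions, and the flatMap overload that takes a concurrency argument could be set to the same value so that every group has a subscriber.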

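To extend the solution to the case where the private Integer key field is not available on the Data object, I can derive the partition from a hash instead. I use one more parameter, parallelism. It is mainly meant for restore operations: if you save values to storage with parallelism X and later read the same values with a different parallelism, the values can still be kept in the same group. So the partition is computed as (getDifferentHashCode(value) * parallelism) modulo numberOfPartitions, which is also inspired by the blog post I mentioned. I prefer this approach.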

    public Flux<GroupedFlux<Integer, Data>> createFluxUsingHashGroupBy(List<String> dataList, int numberOfPartitions, int parallelism, int maxCount) {
        return Flux
                .fromStream(IntStream.range(0, maxCount)
                        .mapToObj(i -> {
                            int randomPosition = ThreadLocalRandom.current().nextInt(0, dataList.size());
                            String value = dataList.get(randomPosition);
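                            // floorMod rather than %: the hash can wrap to a negative int, and % would then give a negative partition
                            int partition = Math.floorMod(getDifferentHashCode(value) * parallelism, numberOfPartitions);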
                            return new Data(i, value, partition);
                        })
                )
                .delayElements(Duration.ofMillis(10))
                .log()
                .groupBy(Data::getPartition);
    }

    public int getDifferentHashCode(String value) {
        int hash = 7;
        for (int i = 0; i < value.length(); i++) {
            hash = hash * 31 + value.charAt(i);
        }
        return hash;
    }

Unit test:

    @Test
    void testFluxUsingHashGroupBy() {
        int numberOfPartitions = 3;
        int parallelism = 2;
        int maxCount = 100;
        Flux<GroupedFlux<Integer, Data>> dataGroupedFlux = fluxAndMonoTransformations.createFluxUsingHashGroupBy(expect, numberOfPartitions, parallelism, maxCount);
        StepVerifier.create(dataGroupedFlux)
                .expectNextCount(numberOfPartitions)
                .verifyComplete();
    }
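One detail worth checking in hash-based partitioning of this kind: the multiply-by-31 hash exceeds the int range for strings of about six characters or more, so it can wrap around to a negative value, and Java's % operator keeps the sign of the dividend. The sketch below is an illustration rather than part of the original answer. The class HashPartitionCheck and the methods remainderPartition and floorModPartition are hypothetical names; getDifferentHashCode is copied from the answer, and the sample strings are the values that appear in the log.

```java
// Hypothetical check class for illustration; only getDifferentHashCode comes from the answer.
final class HashPartitionCheck {

    // Same hash as in the answer: starts at 7, multiplies by 31 per character.
    static int getDifferentHashCode(String value) {
        int hash = 7;
        for (int i = 0; i < value.length(); i++) {
            hash = hash * 31 + value.charAt(i);
        }
        return hash;
    }

    // Plain remainder: the result lies in (-n, n), so it may be negative.
    static int remainderPartition(String value, int parallelism, int numberOfPartitions) {
        return (getDifferentHashCode(value) * parallelism) % numberOfPartitions;
    }

    // Floor modulus: the result always lies in [0, n) for a positive n.
    static int floorModPartition(String value, int parallelism, int numberOfPartitions) {
        return Math.floorMod(getDifferentHashCode(value) * parallelism, numberOfPartitions);
    }

    public static void main(String[] args) {
        // For reference: -7 % 3 is -1, while Math.floorMod(-7, 3) is 2.
        String[] values = {"Spring", "Scala", "reactive programming", "java with lambda", "java 8"};
        for (String v : values) {
            System.out.printf("%-22s %% -> %2d   floorMod -> %d%n",
                    v, remainderPartition(v, 2, 3), floorModPartition(v, 2, 3));
        }
    }
}
```

If any value yields a negative remainder, groupBy would see more than numberOfPartitions distinct keys, and a test that expects exactly numberOfPartitions groups could fail. Math.floorMod avoids that by keeping every partition inside the expected range.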

As for the back-pressure problem, I think it would be better asked as a separate SO question.
