Question:

Borrowed value does not live long enough (BufReader lines() to an iterator of String)

阴福

2023-03-14

Given the following example code:

use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

type BoxIter<T> = Box<dyn Iterator<Item = T>>;

fn tokens_from_str<'a>(text: &'a str)
-> Box<dyn Iterator<Item = String> + 'a> {
    Box::new(text.lines().flat_map(|s|
        s.split_whitespace().map(|s| s.to_string())
    ))
}

// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn tokens_from_path<P>(path_arg: P)
-> BoxIter<BoxIter<String>>
where P: AsRef<Path> {
    let reader = reader_from_path(path_arg);
    let iter = reader.lines()
        .filter_map(|result| result.ok())
        // Does not compile: the boxed iterator borrows `s`,
        // but `s` is dropped when this closure returns.
        .map(|s| tokens_from_str(&s));
    Box::new(iter)
}

fn reader_from_path<P>(path_arg: P) -> BufReader<File>
where P: AsRef<Path> {
    let path = path_arg.as_ref();
    let file = File::open(path).unwrap();
    BufReader::new(file)
}

I get the following compiler error:

rustc 1.18.0 (03fc9d622 2017-06-06)
error: `s` does not live long enough
  --> <anon>:23:35
   |
23 |         .map(|s| tokens_from_str(&s));
   |                                   ^- borrowed value only lives until here
   |                                   |
   |                                   does not live long enough
   |
   = note: borrowed value must be valid for the static lifetime...

My questions are:

How can I fix this (if possible, without changing the function signatures)?

Any suggestions for better function parameters and return values?

3 Answers

张姚石

2023-03-14

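The problem here is that although you do use to_string() to turn each item into an owned value, this happens lazily. Because it is lazy, the value that to_string() is called on (a &str borrowed from the line's String) must still be valid whenever the returned iterator is consumed, but that String is dropped at the end of the closure in .map(|s| tokens_from_str(&s)).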
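A small sketch of that laziness, using a hypothetical lazy_demo helper written only for illustration: building the map adapter runs nothing, and to_string only executes, reading from the borrowed text, once the iterator is consumed. That is why the String a lazy token iterator borrows from has to outlive the iterator itself.

```rust
use std::cell::Cell;

/// Hypothetical helper: returns (conversions before consuming,
/// conversions after consuming, the collected tokens).
fn lazy_demo(line: &str) -> (usize, usize, Vec<String>) {
    let calls = Cell::new(0);
    // Building the adapter chain runs no closure yet: `map` is lazy.
    let tokens = line.split_whitespace().map(|w| {
        calls.set(calls.get() + 1); // record each conversion
        w.to_string()
    });
    let before = calls.get();
    // `to_string` runs only now, and each call reads from `line`,
    // so `line` must still be alive at this point.
    let owned: Vec<String> = tokens.collect();
    (before, calls.get(), owned)
}
```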

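The simplest fix is to remove the laziness from this part of the iterator and allocate all the tokens as soon as each line has been read. It won't be as fast and involves an extra allocation, but it is the smallest change from your current function and keeps the same signature: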

// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn tokens_from_path<P>(path_arg: P) -> BoxIter<BoxIter<String>>
where
    P: AsRef<Path>
{
    let reader = reader_from_path(path_arg);
    let iter = reader.lines()
        .filter_map(|result| result.ok())
        .map(|s| {
            let collected = tokens_from_str(&s).collect::<Vec<_>>();

            Box::new(collected.into_iter()) as Box<dyn Iterator<Item = String>>
        });

    Box::new(iter)
}

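This works for any modest workload, and at any one time it only holds roughly twice the memory of the current line. There is a performance cost, but unless you have 10 MB lines it probably won't matter.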
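The same eager strategy can be exercised without a file. The sketch below assumes a hypothetical tokens_from_reader that accepts any BufRead + 'static source (so an in-memory std::io::Cursor works) and inlines the tokenizing instead of calling tokens_from_str; it is a test harness, not part of the answer's API.

```rust
use std::io::BufRead;

type BoxIter<T> = Box<dyn Iterator<Item = T>>;

/// Hypothetical reader-generic version of the eager answer.
fn tokens_from_reader<R: BufRead + 'static>(reader: R) -> BoxIter<BoxIter<String>> {
    let iter = reader
        .lines()
        .filter_map(|result| result.ok())
        .map(|line| {
            // Collect eagerly: the Vec owns its Strings, so nothing borrows `line`
            // once this closure returns.
            let collected: Vec<String> =
                line.split_whitespace().map(|w| w.to_string()).collect();
            Box::new(collected.into_iter()) as BoxIter<String>
        });
    Box::new(iter)
}
```

Each inner iterator owns its Vec<String>, which is what lets the outer iterator be boxed with an implicit 'static bound.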

If you do go with this solution, I would recommend changing the signature of tokens_from_path to return a flat BoxIter<String> directly:

pub fn tokens_from_path<P>(path_arg: P) -> BoxIter<String>
where
    P: AsRef<Path>
{
    let reader = reader_from_path(path_arg);
    let iter = reader.lines()
        .filter_map(|result| result.ok())
        .flat_map(|s| {
            let collected = tokens_from_str(&s).collect::<Vec<_>>();

            Box::new(collected.into_iter()) as Box<dyn Iterator<Item = String>>
        });

    Box::new(iter)
}
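Since flat_map accepts anything that implements IntoIterator, the per-line Vec can also be handed back directly instead of boxing its iterator. A sketch of that variant, using a hypothetical reader-generic flat_tokens so it can run on an in-memory std::io::Cursor:

```rust
use std::io::BufRead;

/// Hypothetical flat variant: a single iterator over every token of every line.
fn flat_tokens<R: BufRead + 'static>(reader: R) -> Box<dyn Iterator<Item = String>> {
    Box::new(
        reader
            .lines()
            .filter_map(|result| result.ok())
            // `flat_map` only needs `IntoIterator`, so the owned Vec is returned as-is.
            .flat_map(|line| line.split_whitespace().map(str::to_owned).collect::<Vec<_>>()),
    )
}
```

Empty lines simply contribute an empty Vec, so they vanish from the flattened stream.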

The original code doesn't work because you are trying to return borrows into a String that you don't return.

We can fix this by returning the String, just hidden behind an opaque API. This is very similar to breeden's solution, but slightly different in execution.

use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

type BoxIter<T> = Box<dyn Iterator<Item = T>>;

/// Structure representing in our code a line, but with an opaque API surface.
pub struct TokenIntermediate(String);

impl<'a> IntoIterator for &'a TokenIntermediate {
    type Item = String;
    type IntoIter = Box<dyn Iterator<Item = String> + 'a>;

    fn into_iter(self) -> Self::IntoIter {
        // delegate to tokens_from_str
        tokens_from_str(&self.0)
    }
}

fn tokens_from_str<'a>(text: &'a str) -> Box<dyn Iterator<Item = String> + 'a> {
    Box::new(text.lines().flat_map(|s|
        s.split_whitespace().map(|s| s.to_string())
    ))
}

// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn token_parts_from_path<P>(path_arg: P) -> BoxIter<TokenIntermediate>
where
    P: AsRef<Path>
{
    let reader = reader_from_path(path_arg);
    let iter = reader.lines()
        .filter_map(|result| result.ok())
        .map(|s| TokenIntermediate(s));

    Box::new(iter)
}

fn reader_from_path<P>(path_arg: P) -> BufReader<File>
where P: AsRef<Path> {
    let path = path_arg.as_ref();
    let file = File::open(path).unwrap();
    BufReader::new(file)
}

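As you can see, tokens_from_str is unchanged, and the path function (here token_parts_from_path) now just returns this opaque TokenIntermediate struct. This is just as usable as the original solution; all it does is push ownership of the intermediate String values to the caller, so the caller can iterate over the tokens inside them.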
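To show what pushing ownership to the caller looks like in practice, here is a sketch of a caller. It assumes a hypothetical reader-generic token_parts_from_reader (so the example runs on a std::io::Cursor) and a hypothetical count_tokens_per_line helper; the IntoIterator impl mirrors the answer's, with tokens_from_str inlined.

```rust
use std::io::BufRead;

/// Owns one line of text; borrowing it yields that line's tokens.
pub struct TokenIntermediate(String);

impl<'a> IntoIterator for &'a TokenIntermediate {
    type Item = String;
    type IntoIter = Box<dyn Iterator<Item = String> + 'a>;

    fn into_iter(self) -> Self::IntoIter {
        // The tokens borrow from self.0, which the caller keeps alive.
        Box::new(self.0.split_whitespace().map(str::to_owned))
    }
}

/// Hypothetical reader-generic counterpart of token_parts_from_path.
pub fn token_parts_from_reader<R: BufRead + 'static>(
    reader: R,
) -> Box<dyn Iterator<Item = TokenIntermediate>> {
    Box::new(reader.lines().filter_map(|result| result.ok()).map(TokenIntermediate))
}

/// Hypothetical caller: counts the tokens on each line.
pub fn count_tokens_per_line<R: BufRead + 'static>(reader: R) -> Vec<usize> {
    let mut counts = Vec::new();
    for line in token_parts_from_reader(reader) {
        // `line` is owned here, so iterating `&line` is fine for the whole inner loop.
        let mut n = 0;
        for _token in &line {
            n += 1;
        }
        counts.push(n);
    }
    counts
}
```

The caller holds each TokenIntermediate for the duration of the inner loop, so the borrowed tokens never outlive their line.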

储臻

2023-03-14

Disclaimer: frame challenge

When dealing with large files, the simplest solution is to use memory-mapped files.

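That is, you tell the operating system that you want the whole file to be accessible in memory, and it takes care of paging parts of the file in and out of memory as needed.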

Once that is set up, your entire file can be accessed directly in memory.

It may not always be the fastest solution, but it is certainly the simplest.
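Memory mapping itself needs an external crate (memmap2 is a common choice), so the sketch below only illustrates the payoff under that assumption: once the whole file is reachable as a single borrowed &str (for example after std::str::from_utf8 on the mapped bytes, or std::fs::read_to_string for files that fit comfortably in memory), the nested iterators can borrow straight from it and no per-token String is allocated. The line_tokens name is hypothetical.

```rust
use std::str::SplitWhitespace;

/// Outer iterator over lines, inner iterator over the words of each line.
/// Everything borrows from `text`, so nothing is copied.
fn line_tokens(text: &str) -> impl Iterator<Item = SplitWhitespace<'_>> + '_ {
    text.lines().map(str::split_whitespace)
}
```

This is the same shape as the question's BoxIter<BoxIter<String>>, except that the lifetime of every token is tied to the single buffer holding the file.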

弓胜泫

2023-03-14

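One problem is that .split_whitespace() takes a reference and does not own its contents. So when you try to build a SplitWhitespace from an owned String (which is what happens when you call .map(|s| tokens_from_str(&s))), the String s is dropped at the end of the closure while the SplitWhitespace still refers to it. One workaround is a struct that takes ownership of the String and creates the SplitWhitespace on demand: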

use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;
use std::str::SplitWhitespace;

// Owns the line; borrowing it hands out a SplitWhitespace over that line.
pub struct SplitWhitespaceOwned(String);

impl<'a> IntoIterator for &'a SplitWhitespaceOwned {
    type Item = &'a str;
    type IntoIter = SplitWhitespace<'a>;
    fn into_iter(self) -> Self::IntoIter {
        self.0.split_whitespace()
    }
}

// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn tokens_from_path<P>(path_arg: P) -> Box<dyn Iterator<Item = SplitWhitespaceOwned>>
    where P: AsRef<Path>
{
    let reader = reader_from_path(path_arg);
    let iter = reader
        .lines()
        .filter_map(|result| result.ok())
        .map(|s| SplitWhitespaceOwned(s));
    Box::new(iter)
}

fn reader_from_path<P>(path_arg: P) -> BufReader<File>
    where P: AsRef<Path>
{
    let path = path_arg.as_ref();
    let file = File::open(path).unwrap();
    BufReader::new(file)
}

fn main() {
    let t = tokens_from_path("test.txt");

    for line in t {
        for word in &line {
            println!("{}", word);
        }
    }
}
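Because the tokens here are &str slices borrowed from each line, a caller that wants to keep a word past the current line has to copy it. The sketch below assumes a hypothetical reader-generic owned_lines_from_reader (so it can run on a std::io::Cursor) and a hypothetical longest_word helper that does exactly that copy.

```rust
use std::io::BufRead;
use std::str::SplitWhitespace;

pub struct SplitWhitespaceOwned(String);

impl<'a> IntoIterator for &'a SplitWhitespaceOwned {
    type Item = &'a str;
    type IntoIter = SplitWhitespace<'a>;
    fn into_iter(self) -> Self::IntoIter {
        self.0.split_whitespace()
    }
}

/// Hypothetical reader-generic version of tokens_from_path.
pub fn owned_lines_from_reader<R: BufRead + 'static>(
    reader: R,
) -> Box<dyn Iterator<Item = SplitWhitespaceOwned>> {
    Box::new(reader.lines().filter_map(|result| result.ok()).map(SplitWhitespaceOwned))
}

/// Hypothetical caller: the `&str` words borrow from `line`, which is dropped
/// at the end of each outer iteration, so the current winner is copied out.
pub fn longest_word<R: BufRead + 'static>(reader: R) -> Option<String> {
    let mut best: Option<String> = None;
    for line in owned_lines_from_reader(reader) {
        for word in &line {
            if best.as_ref().map_or(true, |b| word.len() > b.len()) {
                best = Some(word.to_owned());
            }
        }
    }
    best
}
```

Only the words that are actually kept get allocated; everything else is read as a borrowed slice of the line.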
