Question:

Process Jekyll content to replace the first occurrence of any post title with a hyperlink to the post with that title

西门飞翮

2023-03-14

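I am building a Jekyll Ruby plugin that replaces the first occurrence of any word in a post's body text with a hyperlink to the URL of the post with the same title.

I have it working, but I can't figure out two problems in the process_words method:

How can I search for post titles only in the post's main body text, and not in the meta tags before the post or in the table of contents (which is also generated before the main body text)? I couldn't get it to work with Nokogiri, even though that seems to be the tool of choice here.
If a post's URL is not in post.data['url'], where is it?
Also, is there a more efficient, cleaner way to do this?

The current code works, but it replaces the first match even if it is the value of an HTML attribute, for example in an anchor or meta tag.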

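For example, we have a blog with 3 posts:

Hobbies
Food
Bicycles

In the body of the "Hobbies" post we have a sentence in which every word appears for the first time in the post, like this: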

I love mountain biking and bicycles in general.

The plugin would process that sentence and output it as:

I love mountain biking and <a href="https://example.com/link/to/bicycles/">bicycles</a> in general.

# _plugins/hyperlink_first_word_occurance.rb
require "jekyll"
require 'uri'


module Jekyll

    # Replace the first occurrence of each post title in the content with the post's title hyperlink
    module HyperlinkFirstWordOccurance
        POST_CONTENT_CLASS = "page__content"
        BODY_START_TAG = "<body"
        ASIDE_START_TAG = "<aside"
        OPENING_BODY_TAG_REGEX = %r!<body(.*)>\s*!
        CLOSING_ASIDE_TAG_REGEX = %r!</aside(.*)>\s*!

        class << self
            # Public: Processes the content and turns the
            # first occurrence of each word that also has a post
            # of the same title into a hyperlink.
            #
            # content - the document or page to be processed.
            def process(content)
                @title = content.data['title']
                @posts = content.site.posts

                content.output = if content.output.include? BODY_START_TAG
                                    process_html(content)
                                else
                                    process_words(content.output)
                                end
            end


            # Public: Determines if the content should be processed.
            #
            # doc - the document being processed.
            def processable?(doc)
                (doc.is_a?(Jekyll::Page) || doc.write?) &&
                    doc.output_ext == ".html" || (doc.permalink&.end_with?("/"))
            end


            private

            # Private: Processes html content which has a body opening tag.
            #
            # content - html to be processed.
            def process_html(content)
                # Split after the sidebar's closing </aside> tag when there is one,
                # otherwise at the page__content class name.
                separator = if content.output.include?(ASIDE_START_TAG)
                                CLOSING_ASIDE_TAG_REGEX
                            else
                                POST_CONTENT_CLASS
                            end
                head, opener, tail = content.output.partition(separator)
                body_content, *rest = tail.partition("</body>")

                processed_markup = process_words(body_content)

                content.output = String.new(head) << opener << processed_markup << rest.join
            end

            # Private: Processes each word of the content and turns
            # the first occurrence of each word that also has a post
            # of the same title into a hyperlink.
            #
            # html - the html which includes all the content.
            def process_words(html)
                page_content = html
                @posts.docs.each do |post|
                    post_title = post.data['title'] || post.name
                    post_title_lowercase = post_title.downcase
                    if post_title != @title
                        if page_content.include?(" " + post_title_lowercase + " ") ||
                            page_content.include?(post_title_lowercase + " ") ||
                            page_content.include?(post_title_lowercase + ",") ||
                            page_content.include?(post_title_lowercase + ".")
                            page_content = page_content.sub(post_title_lowercase, "<a href=\"#{ post.url }\">#{ post_title.downcase }</a>")
                        elsif page_content.include?(" " + post_title + " ") ||
                            page_content.include?(post_title + " ") ||
                            page_content.include?(post_title + ",") ||
                            page_content.include?(post_title + ".")
                            page_content = page_content.sub(post_title, "<a href=\"#{ post.data['url'] }\">#{ post_title }</a>")
                        end
                    end
                end
                page_content
            end
        end
    end
end


Jekyll::Hooks.register %i[posts pages], :post_render do |doc|
  # code to call after Jekyll renders a post
  Jekyll::HyperlinkFirstWordOccurance.process(doc) if Jekyll::HyperlinkFirstWordOccurance.processable?(doc)
end

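I updated my code with @Keith Mifsud's suggestions. It now uses either the sidebar's aside element or the page__content class to select the body content to process.

I also improved the checks so that the correct term is found and replaced.

PS: The code base I started my plugin from is @Keith Mifsud's jekyll-target-blank plugin.

1 answer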

云开诚

2023-03-14

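This code looks familiar :) I suggest you look at the RSpec test files to help test your problem: https://github.com/keithmifsud/jekyll-target-blank

I'll try to answer your questions. Sorry, I can't test these myself at the time of writing.

How can I search for post titles only in the post's main body text, and not in the meta tags before the post or in the table of contents (which is also generated before the main body text)? I couldn't get it to work with Nokogiri, even though that seems to be the tool of choice here.

Your requirements are:

1) Ignore content outside the body.

This already seems to be implemented in the process_html() method. That method passes only body_content on for processing, so it should work as is. Do you have tests? How are you debugging it? The same string splitting works in my plugin, i.e. only the content inside the body is processed.

2) Ignore content inside the table of contents (TOC). I suggest you extend the process_html() method by splitting the body_content variable further. Search for the content between the TOC's opening and closing tags (by id, CSS class, etc.), exclude it, and then add it back at its position in the string before or after calling process_words.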
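A minimal sketch of point 2, not tested inside a Jekyll build. It swaps in a placeholder for the TOC rather than splitting the string in two, so that "first occurrence" is still counted once across the whole body. The helper name process_outside_toc, the TOC_REGEX constant, and the assumption that the TOC is a single non-nested nav element with a "toc" class are all hypothetical; adjust the pattern to whatever markup your theme generates.

```ruby
# Hypothetical pattern: assumes the TOC is one non-nested <nav class="... toc ...">.
TOC_REGEX = %r{<nav[^>]*class="[^"]*\btoc\b[^"]*"[^>]*>.*?</nav>}m

# Hypothetical helper: yields the html with the TOC replaced by a placeholder,
# then puts the untouched TOC back into the processed result.
def process_outside_toc(html)
  toc = html[TOC_REGEX]
  return yield(html) if toc.nil? # no TOC found: process everything

  placeholder = "\u0000" # a NUL character cannot collide with a post title
  processed = yield(html.sub(TOC_REGEX) { placeholder })
  processed.sub(placeholder) { toc }
end
```

Inside process_html this could be called as process_outside_toc(body_content) { |html| process_words(html) }.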

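3) Should you use Nokogiri? That library is great for parsing HTML. I think you are parsing strings and then creating HTML, so vanilla Ruby and the URI library should be enough. You can still use it if you want, but it won't be faster than splitting strings in Ruby.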
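To illustrate the vanilla Ruby route, here is a rough sketch that only replaces inside text segments, so attribute values in anchors or meta tags are left alone. The helper name link_first_in_text is hypothetical, the url is assumed to be safe to place in an href as is, and the tag-splitting regex is a simplification that will not handle every HTML edge case (comments or script blocks containing ">", for example).

```ruby
# Hypothetical helper, not part of the original plugin.
# Links the first occurrence of `term` that appears in a text segment,
# skipping tags (and so attribute values) and text already inside an <a>.
def link_first_in_text(html, term, url)
  pattern = /\b#{Regexp.escape(term)}\b/i
  inside_anchor = false
  linked = false

  # Splitting on a capture group keeps the tags in the resulting array.
  segments = html.split(/(<[^>]+>)/).map do |segment|
    if segment.match?(/\A<[^>]+>\z/)
      # Track whether the following text sits inside an existing link.
      inside_anchor = true if segment.match?(/\A<a[\s>]/i)
      inside_anchor = false if segment.match?(/\A<\/a>/i)
      segment
    elsif linked || inside_anchor
      segment
    else
      # Keep the matched text's own casing in the link label.
      replaced = segment.sub(pattern) { |match| %(<a href="#{url}">#{match}</a>) }
      linked = true if replaced != segment
      replaced
    end
  end

  segments.join
end
```

Because the match is case-insensitive and the matched text is reused as the label, this would also remove the need for the separate lowercase and original-case branches in process_words.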

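If a post's URL is not in post.data['url'], where is it?

I think you should have a method that gets all the post titles and then matches the "words" against that array. You can get the whole posts collection from the document itself via doc.site.posts, and each post returns its title. The process_words() method can then check each word to see whether it matches an item in the array. But what if a title consists of more than one word?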
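As a sketch of that lookup: Jekyll documents expose their URL through the url method (the question's code already calls post.url in one branch), while data only holds front matter. The helper name title_index and the FakePost struct below are hypothetical; FakePost only stands in for Jekyll::Document so the snippet runs outside a build. Sorting the longest titles first is one possible way to handle multi-word titles, so that a title like "Mountain Biking" is tried before a shorter one it contains.

```ruby
# Stand-in for Jekyll::Document, only so the sketch runs outside a Jekyll build.
FakePost = Struct.new(:data, :url)

# Hypothetical helper: maps each post title to its URL, longest titles first.
# Posts without a title and the post currently being rendered are skipped.
def title_index(posts, current_title)
  posts
    .reject { |post| post.data["title"].nil? || post.data["title"] == current_title }
    .sort_by { |post| -post.data["title"].length }
    .to_h { |post| [post.data["title"], post.url] }
end

posts = [
  FakePost.new({ "title" => "Food" }, "/food/"),
  FakePost.new({ "title" => "Mountain Biking" }, "/mountain-biking/"),
  FakePost.new({ "title" => "Hobbies" }, "/hobbies/")
]
index = title_index(posts, "Hobbies")
```

In the plugin the same call would presumably look like title_index(doc.site.posts.docs, doc.data["title"]), and process_words could then iterate over the resulting hash instead of checking each title with several include? calls.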

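Also, is there a more efficient, cleaner way to do this?

So far, so good. I would start by fixing the problems first and only then refactor for speed and coding standards.

Again, I suggest you use tests to help you solve this.

Let me know if I can help any further :)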
