sitemap-generator-cli

Creates an XML-Sitemap by crawling a given site.
License: MIT License
Language: JavaScript
Category: Utilities / Terminal & Remote Login
Type: Open source software
Operating system: Cross-platform

Software overview

Sitemap Generator CLI

Create xml sitemaps from the command line.

Generates a sitemap by crawling your site. Uses streams to efficiently write the sitemap to your drive. It is capable of creating multiple sitemaps if the threshold is reached, and it respects robots.txt and robots meta tags.

Table of contents

  • Install
  • Usage
  • Options
  • License

Install

This module is available on npm.

npm install -g sitemap-generator-cli
# or execute it directly with npx (since npm v5.2)
npx sitemap-generator-cli https://example.com

Usage

The crawler fetches all folder URL pages and the file types parsed by Google. If a robots.txt is present, it is taken into account and its rules are applied to each URL to decide whether it should be added to the sitemap. The crawler will also not fetch URLs from a page if a robots meta tag with the value nofollow is present, and it ignores a page completely if a noindex rule is present. The crawler is able to apply the base value to found links.

sitemap-generator [options] <url>

When the crawler has finished, the XML sitemap is built and saved to the specified filepath. If the number of fetched pages is greater than 50,000, the output is split into several sitemap files and a sitemapindex file is created, because Google does not allow more than 50,000 items in one sitemap.

Example:

sitemap-generator http://example.com
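
If the 50,000-entry threshold is exceeded, the generated sitemapindex references the individual sitemap files following the standard sitemaps.org protocol. A minimal sketch of such an index (the filenames are only illustrative; the actual names chosen by the tool may differ):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://example.com/sitemap_part1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://example.com/sitemap_part2.xml</loc>
  </sitemap>
</sitemapindex>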

Options

sitemap-generator --help

  Usage: cli [options] <url>

  Options:

    -V, --version                           output the version number
    -f, --filepath <filepath>               path to file including filename (default: sitemap.xml)
    -m, --max-entries <maxEntries>          limits the maximum number of URLs per sitemap file (default: 50000)
    -d, --max-depth <maxDepth>              limits the maximum distance from the original request (default: 0)
    -q, --query                             consider query string
    -u, --user-agent <agent>                set custom User Agent
    -v, --verbose                           print details when crawling
    -c, --max-concurrency <maxConcurrency>  maximum number of requests the crawler will run simultaneously (default: 5)
    -r, --no-respect-robots-txt             controls whether the crawler should respect rules in robots.txt
    -l, --last-mod                          add Last-Modified header to xml
    -g, --change-freq <changeFreq>          adds a <changefreq> line to each URL in the sitemap.
    -p, --priority-map <priorityMap>        priority for each depth url, values between 1.0 and 0.0, example: "1.0,0.8 0.6,0.4"
    -x, --proxy <url>                       Use the passed proxy URL
    -h, --help                              output usage information
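
Options can be combined in a single invocation. For example (the URL and values are placeholders), to limit each sitemap file to 10,000 entries, raise the concurrency, and print progress while crawling:

sitemap-generator -m 10000 -c 10 -v https://example.com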

filepath

Path of the file to write, including the filename itself. The path can be absolute or relative. Default is sitemap.xml.

Examples:

  • sitemap.xml
  • mymap.xml
  • /var/www/sitemap.xml
  • ./sitemap.myext
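
For example, to write the sitemap to an absolute path (path and URL are placeholders):

sitemap-generator -f /var/www/sitemap.xml https://example.com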

maxConcurrency

Sets the maximum number of requests the crawler will run simultaneously (default: 5).

maxEntries

Defines a limit of URLs per sitemap file, useful for sites with lots of URLs. Defaults to 50000.

maxDepth

Sets a maximum distance from the original request up to which URLs are crawled, useful for generating smaller sitemap.xml files. Defaults to 0, which means all levels are crawled.
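
For example, to crawl only pages that are at most two links away from the start URL:

sitemap-generator -d 2 https://example.com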

noRespectRobotsTxt

Controls whether the crawler should respect rules in robots.txt.
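
For example, to crawl without applying the rules found in robots.txt (use with care):

sitemap-generator --no-respect-robots-txt https://example.com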

query

Consider URLs with query strings like http://www.example.com/?foo=bar as individual sites and add them to the sitemap.
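
For example, to treat URLs that differ only in their query string as separate sitemap entries:

sitemap-generator -q https://example.com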

user-agent

Set a custom User Agent used for crawling. Default is Node/SitemapGenerator.
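
For example, with a made-up agent string:

sitemap-generator -u "MyCrawler/1.0" https://example.com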

verbose

Print debug messages during the crawling process. Also prints a summary when finished.

last-mod

Adds the Last-Modified header value to the XML.

change-freq

Adds a <changefreq> line to each URL in the sitemap.
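
For example, to add both last modification dates and a change frequency (daily is one of the standard sitemaps.org <changefreq> values):

sitemap-generator -l -g daily https://example.com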

priority-map

Adds a priority for each URL depth, with values between 1.0 and 0.0. Example: "1.0,0.8 0.6,0.4"
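
For example, using the priority map from above (the values are only illustrative):

sitemap-generator -p "1.0,0.8 0.6,0.4" https://example.com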

License

MIT © Lars Graubner
