Create XML sitemaps from the command line.
Generates a sitemap by crawling your site. Uses streams to write the sitemap to disk efficiently, is capable of creating multiple sitemaps if a threshold is reached, and respects robots.txt and robots meta tags.
This module is available on npm.
npm install -g sitemap-generator-cli
# or execute it directly with npx (since npm v5.2)
npx sitemap-generator-cli https://example.com
The crawler will fetch all folder URLs and all file types parsed by Google. If present, the robots.txt
will be taken into account, and its rules are applied to each URL to decide whether it should be added to the sitemap. The crawler will also not fetch URLs from a page if the robots meta tag with the value nofollow
is present, and will ignore a page completely if the noindex
rule is present. The crawler is able to apply the base
value to found links.
sitemap-generator [options] <url>
When the crawler has finished, the XML sitemap will be built and saved to your specified filepath. If the count of fetched pages is greater than 50000, the output will be split into several sitemap files and a sitemapindex file will be created, since Google does not allow more than 50000 items in one sitemap.
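For example, a run that writes to a custom path and keeps the default per-file limit might look like this (example.com stands in for your own site):

```shell
# crawl the site and write the sitemap to ./sitemap.xml;
# if more than 50000 URLs are found, the output is split
# into several files and a sitemapindex file is created
sitemap-generator -f ./sitemap.xml -m 50000 https://example.com
```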
Example:
sitemap-generator http://example.com
sitemap-generator --help
Usage: cli [options] <url>
Options:
-V, --version output the version number
-f, --filepath <filepath> path to file including filename (default: sitemap.xml)
-m, --max-entries <maxEntries> limits the maximum number of URLs per sitemap file (default: 50000)
-d, --max-depth <maxDepth> limits the maximum distance from the original request (default: 0)
-q, --query consider query string
-u, --user-agent <agent> set custom User Agent
-v, --verbose print details when crawling
-c, --max-concurrency <maxConcurrency> maximum number of requests the crawler will run simultaneously (default: 5)
-r, --no-respect-robots-txt controls whether the crawler should respect rules in robots.txt
-l, --last-mod add Last-Modified header to xml
-g, --change-freq <changeFreq> adds a <changefreq> line to each URL in the sitemap.
-p, --priority-map <priorityMap> priority for each depth url, values between 1.0 and 0.0, example: "1.0,0.8 0.6,0.4"
-x, --proxy <url> Use the passed proxy URL
-h, --help output usage information
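The options can be combined freely. A hypothetical invocation using several of them at once might look like this (the URL and filepath are placeholders):

```shell
# crawl up to 2 levels deep with 10 parallel requests,
# keep query strings, and print progress while crawling
sitemap-generator -d 2 -c 10 -q -v -f ./sitemap.xml https://example.com
```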
Path to the file to write, including the filename itself. The path can be absolute or relative. Default is sitemap.xml.
Examples:
sitemap.xml
mymap.xml
/var/www/sitemap.xml
./sitemap.myext
Sets the maximum number of requests the crawler will run simultaneously (default: 5).
Defines a limit of URLs per sitemap file, useful for sites with lots of URLs. Defaults to 50000.
Sets a maximum distance from the original request to crawl URLs, useful for generating smaller sitemap.xml
files. Defaults to 0, which means all levels are crawled.
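For instance, to restrict the crawl to pages at most two links away from the start URL (example.com is a placeholder):

```shell
# only crawl the start page and pages up to 2 clicks away
sitemap-generator -d 2 https://example.com
```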
Controls whether the crawler should respect rules in robots.txt.
Considers URLs with query strings like http://www.example.com/?foo=bar
as individual sites and adds them to the sitemap.
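An example run with query strings enabled might look like this:

```shell
# treat /?foo=bar and /?foo=baz as separate sitemap entries
sitemap-generator -q https://example.com
```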
Sets a custom User Agent used for crawling. Default is Node/SitemapGenerator.
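For example, with a hypothetical agent string of your own choosing:

```shell
# identify the crawler as "MyCrawler/1.0" instead of the default
sitemap-generator -u "MyCrawler/1.0" https://example.com
```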
Prints debug messages during the crawling process. Also prints a summary when finished.
Adds a Last-Modified header to the XML output.
Adds a <changefreq> line to each URL in the sitemap.
Adds a priority to each URL based on its depth; values between 1.0 and 0.0, example: "1.0,0.8 0.6,0.4".
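A run using the priority map from the example above might look like this (how the value string maps onto crawl depths is an assumption based on the option description):

```shell
# assign descending priorities to URLs by crawl depth
sitemap-generator -p "1.0,0.8 0.6,0.4" https://example.com
```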