AngularJS SEO for static webpages (S3 CDN)

蒋畅

2023-03-14

Question:

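I have been looking for ways to improve the SEO of AngularJS apps hosted on a CDN such as Amazon S3 (i.e. simple storage with no backend). Most of the solutions out there (PhantomJS, prerender.io, seo.js, etc.) rely on a backend to recognise the ?_escaped_fragment_ URLs generated by crawlers, and then fetch the relevant page from somewhere else. Even grunt-html-snapshot ultimately needs you to do this, even if you generate the snapshot pages ahead of time.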

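One existing solution basically relies on using Cloudflare as a reverse proxy. That seems a bit wasteful, since most of the security features their service provides are completely redundant for a static site. Setting up a reverse proxy myself, as suggested here, also seems problematic: it would require either i) routing every AngularJS app I need static HTML for through one proxy server, which could hurt performance, or ii) setting up a separate proxy server for each app, at which point I might as well build a backend, which I cannot afford at the scale I am working on.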

Is there any way to do this, or is it basically impossible to host an AngularJS app statically with great SEO until Google updates its crawlers?

Reposted on Webmasters following John Conde's comments.

Answer:

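Here is a complete overview of how to make your app SEO-friendly on a storage service such as S3, with nice URLs (no #), and with everything done by grunt through a single command run after the build: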

grunt seo

It is still a patchwork of workarounds, but it works and it is the best you can do. Thanks to @ericluwj and his blog post, which inspired me.

Overview

Goal and URL structure

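The goal is to create one HTML file per state in your Angular app. The only major assumptions are that you remove the "#" from your URLs by using html5 history (which you should do!) and that all your paths are either absolute or use Angular states. There are plenty of posts explaining how to do this.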

URLs end with a trailing slash, like this: http://yourdomain.com/page1/

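Personally, I make sure that http://yourdomain.com/page1 (without the trailing slash) also reaches its destination, but that is out of scope here. I also make sure that every language has its own state and its own URL.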
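To make the "one HTML file per state" layout concrete, here is a small sketch of how an app URL could map to the object key its snapshot is stored under. It assumes each snapshot is saved as "<path>/index.html" and that the bucket is served through S3 static website hosting with "index.html" as the index document; `snapshotKey` is a hypothetical helper, not part of any plugin.

```javascript
// Hypothetical helper: the object key under which the snapshot for a given
// app URL is stored, assuming one "<path>/index.html" file per state and an
// S3 static website whose index document is "index.html".
function snapshotKey(urlPath) {
  // Drop leading slashes: S3 keys are relative to the bucket root.
  let path = urlPath.replace(/^\/+/, "");
  // Enforce the trailing-slash convention ("/page1" and "/page1/" share a file).
  if (path !== "" && !path.endsWith("/")) path += "/";
  return path + "index.html";
}

console.log(snapshotKey("/"));             // index.html
console.log(snapshotKey("/page1/"));       // page1/index.html
console.log(snapshotKey("/projects/en/")); // projects/en/index.html
```

With this layout, a request for http://yourdomain.com/page1/ is answered with page1/index.html, which is why the trailing-slash convention matters.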

SEO logic

Our goal is that, when someone reaches your website through an HTTP request:

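If it is a search engine crawler: serve it a page that contains the required HTML. That page also contains the Angular logic (e.g. to start your app), but the crawler cannot read it, so the crawler deliberately stays on the HTML you gave it and indexes that.
For normal humans and smart machines: make sure Angular is activated, wipe the generated HTML and start your app normally.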
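Both branches hinge on one user-agent test, the regular expression that the answer later embeds in its inline scripts. Pulled out into a function (the name `isSearchBot` is illustrative), it can be checked outside a browser:

```javascript
// The user-agent test embedded in the answer's inline scripts, pulled out
// into a function so it can be run outside a browser.
const BOT_PATTERN = /bot|googlebot|crawler|spider|robot|crawling/i;

function isSearchBot(userAgent) {
  // Treat a missing user agent like an empty string, i.e. a normal visitor.
  return BOT_PATTERN.test(userAgent || "");
}

const googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const chrome = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";
console.log(isSearchBot(googlebot)); // true  -> keep the static snapshot
console.log(isSearchBot(chrome));    // false -> clear it and boot Angular
```

Because `bot` already matches Googlebot, Bingbot and similar names, the other alternatives mostly document intent. Any crawler whose user agent contains none of these words is treated as a human.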

The grunt tasks

Here are the grunt tasks:

  //grunt plugins you will need:
  grunt.loadNpmTasks('grunt-prerender');
  grunt.loadNpmTasks('grunt-replace');
  grunt.loadNpmTasks('grunt-wait');
  grunt.loadNpmTasks('grunt-aws-s3');
  //also needed for Step 1 (they provide the 'concurrent' and 'connect' tasks):
  grunt.loadNpmTasks('grunt-concurrent');
  grunt.loadNpmTasks('grunt-contrib-connect');
  //'http' and 'sitemap' below come from further plugins that are not covered here

  //The grunt tasks in the right order
  grunt.registerTask('seo', 'First launch server, then prerender and replace', [
    'concurrent:seo' //Step 1: in parallel, launch the server and run the so-called seotasks
  ]);

  grunt.registerTask('seotasks', [
    'http', //This is an API call to get all pages on my website. Skipping this step in this tutorial.
    'wait', //wait 1.5 sec to make sure that the server is launched
    'prerender', //Step 2: create a snapshot of your website
    'replace', //Step 3: clean up the mess
    'sitemap', //Create a sitemap of your production environment
    'aws_s3:dev' //Step 4: upload
  ]);
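The answer does not show the `sitemap` step. As a hypothetical sketch of what that step has to produce, here is a builder that turns the list of prerendered paths into a sitemaps.org `<urlset>` for the production host. `buildSitemap` and `escapeXml` are illustrative names, not a plugin's API.

```javascript
// Hypothetical sketch of the file the 'sitemap' task is expected to emit:
// a sitemaps.org <urlset> with one <loc> per prerendered path.
function escapeXml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&apos;");
}

function buildSitemap(baseUrl, paths) {
  const origin = baseUrl.replace(/\/+$/, ""); // avoid "//" when joining
  const entries = paths
    .map((p) => `  <url><loc>${escapeXml(origin + p)}</loc></url>`)
    .join("\n");
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries + "\n</urlset>\n";
}

console.log(buildSitemap("http://yourdomain.com/", ["/", "/projects/", "/fr/"]));
```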

Step 1: Launch a local server with concurrent:seo

First, we need to launch a local server (e.g. grunt serve) so that we can take snapshots of the website.

//grunt config
concurrent: {
  seo: [
    'connect:dist:keepalive', //Launching a server and keeping it alive
    'seotasks' //now that we have a running server we can launch the SEO tasks
  ]
}
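Because the server and `seotasks` start in parallel, the answer relies on a fixed 1.5 s `wait` before prerendering. A hypothetical alternative, not part of the answer, is to poll the local server until it responds. A minimal sketch using Node's built-in `http` module (`waitForServer` is an illustrative name):

```javascript
const http = require("http");

// Hypothetical alternative to the fixed 1.5 s 'wait' step: poll the local
// server until it answers (any status code) or the attempts run out.
function waitForServer(url, attempts = 20, delayMs = 250) {
  return new Promise((resolve) => {
    const tryOnce = (left) => {
      const req = http.get(url, (res) => {
        res.resume(); // discard the body; we only need to know the server is up
        resolve(true);
      });
      req.on("error", () => {
        if (left <= 1) {
          resolve(false); // give up after the last attempt
        } else {
          setTimeout(() => tryOnce(left - 1), delayMs);
        }
      });
    };
    tryOnce(attempts);
  });
}
```

For example, `waitForServer("http://localhost:9001/")` resolves to `true` as soon as the connect server answers, and to `false` after about five seconds of failed attempts. Wrapping it in a custom grunt task is left out here.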

Step 2: Create a snapshot of your website with grunt prerender

The grunt-prerender plugin lets you take a snapshot of any website using PhantomJS. In our case, we want to snapshot every page of the localhost website we just launched.

//grunt config
prerender: {
  options: {
    sitePath: 'http://localhost:9001', //points to the url of the server you just launched. You can also make it point to your production website.
    //As you can see the source urls allow for multiple languages provided you have different states for different languages (see the note on URL structure above)
    urls: ['/', '/projects/', '/portal/','/en/', '/projects/en/', '/portal/en/','/fr/', '/projects/fr/', '/portal/fr/'],//this var can be dynamically updated, which is done in my case in the callback of the http task
    hashed: true,
    dest: 'dist/SEO/',//where your static html files will be stored
    timeout:5000,
    interval:5000, //take a snapshot of how the page looks after 5 seconds
    phantomScript:'basic',
    limit:7 //# pages processed simultaneously 
  }
}
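The `urls` array above follows a regular pattern: each base page, repeated for every language, with the language code appended as the last path segment. A hypothetical helper that generates it (the name `localisedUrls` is illustrative) reproduces the hard-coded list exactly:

```javascript
// Hypothetical helper that generates the prerender `urls` list from base
// pages and language codes ("" meaning the default language), appending the
// language as the last path segment as in the answer's example.
function localisedUrls(pages, languages) {
  const urls = [];
  for (const lang of languages) {
    for (const page of pages) {
      urls.push(lang ? `${page}${lang}/` : page);
    }
  }
  return urls;
}

console.log(localisedUrls(["/", "/projects/", "/portal/"], ["", "en", "fr"]));
```

If the page list comes from an earlier task, as with the answer's `http` step, the result can be stored with `grunt.config.set('prerender.options.urls', urls)` before `prerender` runs.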

Step 3: Clean up the mess with grunt replace

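If you open the prerendered files, you will see that they work for crawlers but not for humans: for users on Chrome, your directives will load twice. So you need to redirect smart browsers to your home page before Angular is activated (i.e. right after the opening head tag).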

//Add the script tag to redirect if we're not a search bot
replace: {
  dist: {
    options: {
      patterns: [
        {
          match: '<head>',
          //redirect to a clean page if not a bot (to your index.html at the root basically).
          replacement: '<head><script>if(!/bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent)) { document.location = "/#" + window.location.pathname; }</script>'
          //note: your hashbang (#) will still work.
        }
      ],
      usePrefix: false
    },
    files: [
      {expand: true, flatten: false, src: ['dist/SEO/*/**/*.html'], dest: ''} 
    ]
  }
}

Also make sure to include this code in your index.html, right after the ui-view element. It clears all the generated HTML before Angular starts.

<div ui-view autoscroll="true" id="ui-view"></div>

<!-- this script is needed to clear ui-view BEFORE angular starts, to remove the static html that has been generated for search engines that cannot read angular -->
<script> 
  if(!/bot|googlebot|crawler|spider|robot|crawling/i.test( navigator.userAgent)) { document.getElementById('ui-view').innerHTML = ""; }
</script>
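The grunt-replace pattern in Step 3 boils down to a single string transformation on each snapshot: insert the redirect script right after `<head>`. Here it is as a plain function (`injectRedirect` is an illustrative name) so the effect can be checked on sample HTML. The guard against double injection is an addition that the answer does not include.

```javascript
// Sketch of the transformation Step 3 applies to each snapshot: insert the
// redirect script immediately after the opening <head> tag.
const REDIRECT_SCRIPT =
  '<script>if(!/bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent))' +
  ' { document.location = "/#" + window.location.pathname; }</script>';

function injectRedirect(html) {
  // Added guard (not in the answer): skip files that already have the script,
  // so running the build twice does not inject it twice.
  if (html.includes(REDIRECT_SCRIPT)) return html;
  // Only a literal "<head>" matches, as with the answer's pattern; a tag with
  // attributes such as <head lang="en"> is left untouched.
  return html.replace("<head>", "<head>" + REDIRECT_SCRIPT);
}

console.log(injectRedirect("<html><head><title>Page 1</title></head></html>"));
```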

Step 4: Upload to AWS

First you upload the dist folder containing your build. Then you overwrite it with your prerendered and updated files.

aws_s3: {
  options: {
    accessKeyId: "<%= aws.accessKeyId %>", // Use the variables
    secretAccessKey: "<%= aws.secret %>", // You can also use env variables
    region: 'eu-west-1',
    uploadConcurrency: 5, // 5 simultaneous uploads
  },
  dev: {
    options: {
      bucket: 'xxxxxxxx'
    },
    files: [
      {expand: true, cwd: 'dist/', src: ['**'], exclude: 'SEO/**', dest: '', differential: true},
      {expand: true, cwd: 'dist/SEO/', src: ['**'], dest: '', differential: true},
    ]
  }
}
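The two file groups in `aws_s3:dev` implement that overwrite. The hypothetical model below shows which local file ends up under each S3 key. It assumes the second group is applied after the first, which is what the answer intends; the answer does not say whether grunt-aws-s3 guarantees that ordering with `uploadConcurrency: 5`. `planUploads` is an illustrative name.

```javascript
// Hypothetical model of Step 4's two aws_s3 file groups, assuming the second
// group is applied after the first: the build from dist/ (minus SEO/), then
// dist/SEO/ uploaded to the bucket root so snapshots replace same-named keys.
function planUploads(distFiles) {
  const plan = new Map(); // S3 key -> local file that ends up under that key
  for (const f of distFiles) {
    if (!f.startsWith("SEO/")) plan.set(f, "dist/" + f); // group 1
  }
  for (const f of distFiles) {
    if (f.startsWith("SEO/")) plan.set(f.slice("SEO/".length), "dist/" + f); // group 2
  }
  return plan;
}

const plan = planUploads(["index.html", "scripts/app.js", "SEO/index.html", "SEO/projects/index.html"]);
console.log(plan.get("index.html"));          // dist/SEO/index.html
console.log(plan.get("scripts/app.js"));      // dist/scripts/app.js
console.log(plan.get("projects/index.html")); // dist/SEO/projects/index.html
```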

That's it, you have your solution! Both humans and bots will be able to read your web app.
