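SeaweedFS (https://github.com/chrislusf/seaweedfs)
SeaweedFS is designed after Facebook's Haystack paper on photo storage. Mao Jian is also writing bfs based on the same paper; it is still under development, so you can follow it as it grows step by step. File systems such as FastDFS and SeaweedFS are built specifically for storing massive numbers of small files, which makes them a good fit as the backend file store for an app. Which one to use is a personal choice; I lean toward SeaweedFS.
The qetag file hash algorithm is public and worth a look: https://github.com/qiniu/qetag. Weigh it for yourself when choosing a solution.

At present, the two most popular open-source solutions at PB scale are Swift and Ceph.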




./weed -h # to check available options

The commands are:

benchmark   benchmark on writing millions of files and read out
backup      incrementally backup a volume to local folder
compact     run weed tool compact on volume file
filer.copy  copy one or a list of files to a filer folder
fix         run weed tool fix on index file if corrupted
server      start a server, including volume server, and automatically elect a master server
master      start a master server
filer       start a file server that points to a master server
upload      upload one or a list of files
download    download files by file id
shell       run interactive commands, now just echo
version     print SeaweedFS version
volume      start a volume server
export      list or export files from one volume data file
mount       mount weed filer to a directory as file system in userspace(FUSE)

Use “weed help [command]” for more information about a command.

For Logging, use “weed [logging_options] [command]”. The logging options are:
log to standard error as well as files (default true)
-log_backtrace_at value
when logging hits line file:N, emit a stack trace
-log_dir string
If non-empty, write log files in this directory
log to standard error instead of files
-stderrthreshold value
logs at or above this threshold go to stderr
-v value
log level for V logs
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging

weed master -h   (provides the volume=>location mapping service and the sequence number of file ids)
Example: weed master -port=9333
Default Usage:
-cpuprofile string
cpu profile output file
-defaultReplication string
Default replication type if not specified. (default “000”)
-garbageThreshold string
threshold to vacuum and reclaim spaces (default “0.3”)
-ip string
master <ip>|<server> address (default “localhost”)
-ip.bind string
ip address to bind to (default “”)
-maxCpu int
maximum number of CPUs. 0 means all available CPUs
-mdir string   (sets the directory where the generated file id sequence is saved)
data directory to store meta data (default “C:\Users\wangz\AppData\Local\Temp”)
-peers string
other master nodes in comma separated ip:port list, example:,
-port int
http listen port (default 9333)
-pulseSeconds int
number of seconds between heartbeats (default 5)
-secure.secret string
secret to encrypt Json Web Token(JWT)
Preallocate disk space for volumes.
-volumeSizeLimitMB uint
Master stops directing writes to oversized volumes. (default 30000)
-whiteList string
comma separated Ip addresses having write permission. No limit if empty.
start a master server to provide volume=>location mapping service
and sequence number of file ids
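
The replication default “000” is a three-digit code. The help text does not spell out the digits; as I understand the SeaweedFS documentation, they count extra copies in other data centers, in other racks of the same data center, and on other servers in the same rack. A minimal sketch under that assumption (`decode_replication` and `ReplicaPlacement` are hypothetical names, not part of weed):

```python
from typing import NamedTuple


class ReplicaPlacement(NamedTuple):
    """Extra copies per placement level (assumed meaning of the three digits)."""
    other_data_centers: int
    other_racks: int
    same_rack: int

    @property
    def total_copies(self) -> int:
        # The original copy plus every extra copy.
        return 1 + self.other_data_centers + self.other_racks + self.same_rack


def decode_replication(code: str) -> ReplicaPlacement:
    """Split a code such as "000" or "110" into its three digits."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"replication code must be three digits: {code!r}")
    return ReplicaPlacement(*(int(ch) for ch in code))
```

With the default “000” this gives a single copy and no replicas.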


weed volume -h   (provides storage space)

Example: weed volume -port=8080 -dir=/tmp -max=5 -ip=server_name -mserver=localhost:9333
Default Usage:
direct cache instead of OS cache, cost more memory.
-dataCenter string
current volume server’s data center name
-dir string
directories to store data files. dir[,dir]… (default “C:\Users\wangz\AppData\Local\Temp”)
-idleTimeout int
connection idle seconds (default 30)
Adjust jpg orientation when uploading. (default true)
-index string
Choose [memory|leveldb|boltdb|btree] mode for memory~performance balance. (default “memory”)
-ip string
ip or server name
-ip.bind string
ip address to bind to (default “”)
-max string
maximum numbers of volumes, count[,count]… (default “7”)
-maxCpu int
maximum number of CPUs. 0 means all available CPUs
-mserver string
master server location (default “localhost:9333”)
-port int
http listen port (default 8080)
-port.public int
port opened to public
-publicUrl string
Publicly accessible address
-pulseSeconds int
number of seconds between heartbeats, must be smaller than or equal to the master’s setting (default 5)
-rack string
current volume server’s rack name
Redirect moved or non-local volumes. (default true)
-whiteList string
comma separated Ip addresses having write permission. No limit if empty.
start a volume server to provide storage spaces
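
Putting the two defaults together gives a rough ceiling for one volume server: at most 7 volumes, each of which stops receiving writes around 30000 MB. The sketch below only does that multiplication; `max_writable_mb` is a hypothetical helper, and the real usable space also depends on disk size and replication.

```python
def max_writable_mb(max_volumes: str = "7", volume_size_limit_mb: int = 30000) -> int:
    """Rough upper bound on writable data for one volume server.

    max_volumes uses the same count[,count] form as the -max option,
    one count per data directory.
    """
    counts = [int(part) for part in max_volumes.split(",")]
    return sum(counts) * volume_size_limit_mb
```

For the defaults this comes to 210000 MB, roughly 210 GB.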


weed server -h   (a convenient way to start a volume server and a master server together; the servers are exactly the same as when started separately)
Example: weed server -port=8080 -dir=/tmp -volume.max=5 -ip=server_name
Default Usage:
-cpuprofile string
cpu profile output file
-dataCenter string
current volume server’s data center name
-dir string
directories to store data files. dir[,dir]… (default “C:\Users\wangz\AppData\Local\Temp”)
whether to start filer
-filer.cassandra.keyspace string
keyspace of the cassandra server (default “seaweed”)
-filer.cassandra.server string
host[:port] of the cassandra server
-filer.collection string
all data will be stored in this collection
-filer.confFile string
json encoded filer conf file
-filer.defaultReplicaPlacement string
Default replication type if not specified during runtime.
-filer.dir string
directory to store meta data, default to a ‘filer’ sub directory of what -dir is specified
turn off directory listing
-filer.master string
default to current master server
-filer.maxMB int
split files larger than the limit
-filer.port int
filer server http listen port (default 8888)
-filer.port.public int
filer server public http listen port
whether proxy or redirect to volume server during file GET request
-filer.redis.database int
the database on the redis server
-filer.redis.password string
redis password in clear text
-filer.redis.server string
host:port of the redis server, e.g.,
-garbageThreshold string
threshold to vacuum and reclaim spaces (default “0.3”)
-idleTimeout int
connection idle seconds (default 30)
-ip string
ip or server name (default “localhost”)
-ip.bind string
ip address to bind to (default “”)
-master.defaultReplicaPlacement string
Default replication type if not specified. (default “000”)
-master.dir string
data directory to store meta data, default to same as -dir specified
-master.peers string
other master nodes in comma separated ip:masterPort list
-master.port int
master server http listen port (default 9333)
Preallocate disk space for volumes.
-master.volumeSizeLimitMB uint
Master stops directing writes to oversized volumes. (default 30000)
-maxCpu int
maximum number of CPUs. 0 means all available CPUs
-pulseSeconds int
number of seconds between heartbeats (default 5)
-rack string
current volume server’s rack name
-secure.secret string
secret to encrypt Json Web Token(JWT)
direct cache instead of OS cache, cost more memory.
Adjust jpg orientation when uploading. (default true)
-volume.index string
Choose [memory|leveldb|boltdb|btree] mode for memory~performance balance. (default “memory”)
-volume.max string
maximum numbers of volumes, count[,count]… (default “7”)
-volume.port int
volume server http listen port (default 8080)
-volume.port.public int
volume server public port
-volume.publicUrl string
publicly accessible address
Redirect moved or non-local volumes. (default true)
-whiteList string
comma separated Ip addresses having write permission. No limit if empty.
start both a volume server to provide storage spaces
and a master server to provide volume=>location mapping service and sequence number of file ids

This is provided as a convenient way to start both volume server and master server.
The servers are exactly the same as starting them separately.

So other volume servers can use this embedded master server also.

Optionally, one filer server can be started. Logically, filer servers should not be in a cluster.
They run with meta data on disk, not shared. So each filer server is different.

Master Server API

You can append &pretty=y to any HTTP API to see formatted JSON output.
1.Assign a file key
curl http://localhost:9333/dir/assign
To assign in a specific data center, append ?dataCenter=dc1
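
A small sketch of the assign step from a client's point of view. It only builds the URL and reads the reply; the sample reply shape (`fid`, `url`, and an `error` field on failure) is what I have seen from SeaweedFS and should be treated as an assumption, and both function names are made up for this example.

```python
import json
from urllib.parse import urlencode


def build_assign_url(master: str = "localhost:9333", **params) -> str:
    """Build the /dir/assign URL; params such as dataCenter="dc1" are optional."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"http://{master}/dir/assign"
    return f"{url}?{query}" if query else url


def parse_assign_response(body: str) -> tuple:
    """Return (fid, volume_server_url) from the JSON reply (assumed field names)."""
    data = json.loads(body)
    if "error" in data:
        raise RuntimeError(data["error"])
    return data["fid"], data["url"]
```

The returned fid and url are what the upload to the volume server needs next.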

2.Lookup volume
curl "http://localhost:9333/dir/lookup?volumeId=3&pretty=y"
Specifying the collection makes the lookup faster:
curl "http://localhost:9333/dir/lookup?volumeId=3&collection=turbo"
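
To read a file, a client looks up the volume and then asks one of the returned volume servers for the file id. The sketch below assumes the lookup reply carries a `locations` list whose entries have a `url` field; that shape and the helper names are assumptions for illustration.

```python
import json
from urllib.parse import urlencode


def build_lookup_url(volume_id, master: str = "localhost:9333", collection=None) -> str:
    """Build the /dir/lookup URL, optionally narrowed to one collection."""
    params = {"volumeId": volume_id}
    if collection:
        params["collection"] = collection
    return f"http://{master}/dir/lookup?{urlencode(params)}"


def file_urls(lookup_body: str, fid: str) -> list:
    """Turn a lookup reply into candidate read URLs for one file id."""
    data = json.loads(lookup_body)
    locations = data.get("locations") or []
    return [f"http://{loc['url']}/{fid}" for loc in locations]
```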

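3.Force garbage collection
curl "http://localhost:9333/vol/vacuum"
curl "http://localhost:9333/vol/vacuum?garbageThreshold=0.4"
garbageThreshold=0.4 is optional and does not change the default threshold. You can start the master with a different default garbageThreshold.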
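
The threshold is a ratio: a volume becomes a vacuum candidate once the share of reclaimable space passes it. How weed measures that share internally is not shown here, so the sketch simply assumes deleted bytes over total bytes; `needs_vacuum` is a hypothetical name.

```python
def needs_vacuum(deleted_bytes: int, total_bytes: int, garbage_threshold: float = 0.3) -> bool:
    """True when the garbage ratio of a volume exceeds the threshold."""
    if total_bytes <= 0:
        return False  # an empty volume has nothing to reclaim
    return deleted_bytes / total_bytes > garbage_threshold
```

Passing garbageThreshold=0.4 on a single call corresponds to calling this with `garbage_threshold=0.4` once, without touching the default.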


4.Pre-allocate volumes

specify a specific replication

curl "http://localhost:9333/vol/grow?replication=000&count=4"

specify a collection

curl "http://localhost:9333/vol/grow?collection=turbo&count=4"

specify data center

curl "http://localhost:9333/vol/grow?dataCenter=dc1&count=4"

specify ttl

curl "http://localhost:9333/vol/grow?ttl=5d&count=4"
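
The four grow examples differ only in one query parameter, so a script can build them from a single helper. The examples show each option on its own; sending several in one request is an assumption here, and `build_grow_url` is a hypothetical name.

```python
from urllib.parse import urlencode


def build_grow_url(count: int, master: str = "localhost:9333", replication=None,
                   collection=None, data_center=None, ttl=None) -> str:
    """Build a /vol/grow URL, leaving out any option that is not set."""
    params = {
        "replication": replication,
        "collection": collection,
        "dataCenter": data_center,
        "ttl": ttl,
        "count": count,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"http://{master}/vol/grow?{query}"
```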

5.Delete Collection
curl "http://localhost:9333/col/delete?collection=benchmark&pretty=y"

6.Check System Status
curl “
curl "http://localhost:9333/dir/status?pretty=y"


Volume Server API

curl -F file=@/home/chris/myphoto.jpg,01637037d6
{"size": 43234}

curl -F file=@/home/chris/myphoto.jpg http://localhost:9333/submit

curl -X DELETE,01637037d6
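
A file id such as 3,01637037d6 packs three values. As I understand the SeaweedFS README, the number before the comma is the volume id, the last eight hex digits are a cookie, and the hex digits in between are the file key. A minimal parser under that assumption (`parse_fid` and `FileId` are hypothetical names):

```python
from typing import NamedTuple


class FileId(NamedTuple):
    volume_id: int
    key: int
    cookie: int


def parse_fid(fid: str) -> FileId:
    """Split "volume,keycookie" into its parts; the cookie is the last 8 hex digits."""
    volume_part, _, rest = fid.partition(",")
    if not volume_part or len(rest) <= 8:
        raise ValueError(f"not a valid file id: {fid!r}")
    return FileId(int(volume_part), int(rest[:-8], 16), int(rest[-8:], 16))
```

The volume id is the value a client passes to /dir/lookup to find the server that holds the file.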


5.Check Volume Server Status
curl "http://localhost:8080/status?pretty=y"


Filer Server API

Basic Usage:

curl -F file=@report.js "http://localhost:8888/javascript/"
curl "http://localhost:8888/javascript/report.js" # get the file content

curl -F file=@report.js "http://localhost:8888/javascript/new_name.js" # upload the file with a different name
curl -H "Accept: application/json" "http://localhost:8888/javascript/?pretty=y" # list all files under /javascript/
{
  "Directory": "/javascript/",
  "Files": [
    {
      "name": "new_name.js",
      "fid": "3,034389657e"
    },
    {
      "name": "report.js",
      "fid": "7,0254f1f3fd"
    }
  ],
  "Subdirectories": null
}

curl "http://localhost:8888/javascript/?pretty=y&lastFileName=new_name.js&limit=2"
{
  "Directory": "/javascript/",
  "Files": [
    {
      "name": "report.js",
      "fid": "7,0254f1f3fd"
    }
  ]
}

curl -X DELETE "http://localhost:8888/assets/report.js"


Client library: https://github.com/linxGnu/goseaweedfs

weed filer -h
Example: weed filer -port=8888 -dir=/tmp -master=ip:port
Default Usage:
-cassandra.keyspace string
keyspace of the cassandra server (default “seaweed”)
-cassandra.server string
host[:port] of the cassandra server
-collection string
all data will be stored in this collection
-confFile string
json encoded filer conf file
-defaultReplicaPlacement string
default replication type if not specified (default “000”)
-dir string
directory to store meta data (default “C:\Users\wangz\AppData\Local\Temp”)
turn off directory listing
-ip string
filer server http listen ip address
-master string
master server location (default “localhost:9333”)
-maxMB int
split files larger than the limit
-port int
filer server http listen port (default 8888)
-port.public int
port opened to public
whether proxy or redirect to volume server during file GET request
-redis.database int
the database on the redis server
-redis.password string
password in clear text
-redis.server string
host:port of the redis server, e.g.,
-secure.secret string
secret to encrypt Json Web Token(JWT)
start a file server which accepts REST operation for any files.

    //create or overwrite the file, the directories /path/to will be automatically created
    POST /path/to/file
    //get the file content
    GET /path/to/file
    //create or overwrite the file, the filename in the multipart request will be used
    POST /path/to/
    //return a json format subdirectory and files listing
    GET /path/to/
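
The two POST forms differ only in where the file name comes from: a path ending in / takes the name from the multipart request, any other path is used as given. A tiny sketch of that rule (`stored_path` is a hypothetical helper, not filer code):

```python
import posixpath


def stored_path(post_path: str, multipart_filename: str) -> str:
    """Return the filer path a POST ends up writing to."""
    if post_path.endswith("/"):
        # Directory form: keep only the base name of the uploaded file.
        return post_path + posixpath.basename(multipart_filename)
    # File form: the request path wins, the multipart name is ignored.
    return post_path
```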

Current <fullpath~fileid> mapping metadata store is local embedded leveldb.
It should be highly scalable to hundreds of millions of files on a modest machine.

Future we will ensure it can avoid of being SPOF.

Filer Setup
weed scaffold -config filer -output="."
weed scaffold filer
enabled = true
dir = "." # directory to store level db files

weed filer

POST a file and read it back

curl -F "filename=@README.md" "http://localhost:8888/path/to/sources/"
curl "http://localhost:8888/path/to/sources/README.md"

POST a file with a new name and read it back

curl -F "filename=@Makefile" "http://localhost:8888/path/to/sources/new_name"
curl "http://localhost:8888/path/to/sources/new_name"

list sub folders and files

visit "http://localhost:8888/path/to/sources/"

if lots of files under this folder, here is a way to efficiently paginate through all of them

visit "http://localhost:8888/path/to/sources/?lastFileName=abc.txt&limit=50"
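
The pagination can be driven from a loop: request a page, remember the last name, and pass it as lastFileName on the next request. The sketch below assumes the JSON listing has a Files array with a name field, as in the filer listing output earlier in these notes. `fetch_page` is a hypothetical callable you supply (for example a wrapper around an HTTP GET with an Accept: application/json header), which keeps the loop independent of any HTTP library.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode


def list_all(fetch_page: Callable[[str], dict], directory_url: str,
             limit: int = 50) -> Iterator[str]:
    """Yield every file name under directory_url, one page at a time."""
    last = ""
    while True:
        params = {}
        if last:
            params["lastFileName"] = last
        params["limit"] = limit
        page = fetch_page(f"{directory_url}?{urlencode(params)}")
        files = page.get("Files") or []
        if not files:
            return
        for entry in files:
            yield entry["name"]
        last = files[-1]["name"]
        if len(files) < limit:
            return  # a short page means there is nothing left
```

Stopping on a short page saves one request; if the filer ever returns fewer entries than the limit mid-listing, drop that check and rely on the empty page instead.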



