In computing, a system call is the programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on. This may include hardware-related services (for example, accessing a hard disk drive), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.
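In short, a system call is how a program asks the operating system kernel for a service, typically hardware-related services (for example, accessing a hard disk) or creating new processes. System calls form the interface between a process and the operating system.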
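Package syscall contains an interface to the low-level operating system primitives.
NOTE: This package is locked down. Code outside the standard Go repository should be migrated to use the corresponding package in the golang.org/x/sys repository. That is also where updates required by new systems or versions should be applied. Signal, Errno and SysProcAttr are not yet available in golang.org/x/sys and must still be referenced from the syscall package. See https://golang.org/s/go1.4-syscall for more information.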
https://pkg.go.dev/golang.org/x/sys
This repository holds supplemental Go packages for low-level interactions with the operating system.
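Many functions in the C standard library are wrappers around system calls provided by the operating system. The familiar printf, gets and fopen, for example, are built on the write, read and open system calls respectively. Tracing the calls with ltrace makes this easy to see, for example: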
#include <stdio.h>
/* The well-known "Hello World" */
int main(void) {
    printf("Hello World!\n");
}
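Compiling this code and running it under ltrace with -S (which reports system calls as well as library calls) gives the following output: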
name1e5s@asgard:~$ gcc test.c
name1e5s@asgard:~$ ltrace -S ./a.out
SYS_brk(0) = 0x55eb2abba000
SYS_access("/etc/ld.so.nohwcap", 00) = -2
SYS_access("/etc/ld.so.preload", 04) = -2
SYS_openat(0xffffff9c, 0x7f2290c00428, 0x80000, 0) = 3
SYS_fstat(3, 0x7ffd2e03aa20) = 0
SYS_mmap(0, 0x21b06, 1, 2) = 0x7f2290de4000
SYS_close(3) = 0
SYS_access("/etc/ld.so.nohwcap", 00) = -2
SYS_openat(0xffffff9c, 0x7f2290e08dd0, 0x80000, 0) = 3
SYS_read(3, "\177ELF\002\001\001\003", 832) = 832
SYS_fstat(3, 0x7ffd2e03aa80) = 0
SYS_mmap(0, 8192, 3, 34) = 0x7f2290de2000
SYS_mmap(0, 0x3f0ae0, 5, 2050) = 0x7f22907ee000
SYS_mprotect(0x7f22909d5000, 2097152, 0) = 0
SYS_mmap(0x7f2290bd5000, 0x6000, 3, 2066) = 0x7f2290bd5000
SYS_mmap(0x7f2290bdb000, 0x3ae0, 3, 50) = 0x7f2290bdb000
SYS_close(3) = 0
SYS_arch_prctl(4098, 0x7f2290de34c0, 0x7f2290de3e00, 0x7f2290de2988) = 0
SYS_mprotect(0x7f2290bd5000, 16384, 1) = 0
SYS_mprotect(0x55eb28ecf000, 4096, 1) = 0
SYS_mprotect(0x7f2290e06000, 4096, 1) = 0
SYS_munmap(0x7f2290de4000, 137990) = 0
puts("Hello World!" <unfinished ...>
SYS_fstat(1, 0x7ffd2e03b280) = 0
SYS_brk(0) = 0x55eb2abba000
SYS_brk(0x55eb2abdb000) = 0x55eb2abdb000
SYS_write(1, "Hello World!\n", 13Hello World!
) = 13
<... puts resumed> ) = 13
SYS_exit_group(0 <no return ...>
+++ exited (status 0) +++
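Every entry prefixed with SYS_ is a system call; the remaining entry, puts, is a library call (GCC turned the printf of a constant string ending in "\n" into puts). Even this Hello World makes more than two dozen system calls, and its output ultimately comes from a single write: system calls really are everywhere.
Go is no different. Take the most common example, fmt.Println("hello world"), which ends up in the write system call. Following the source, the call chain is Println → Fprintln(os.Stdout, ...) → (*os.File).Write → (*os.File).write → syscall.Write: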
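// fmt/print.go
func Println(a ...interface{}) (n int, err error) {
	return Fprintln(os.Stdout, a...)
}

// os/file.go: os.Stdout wraps file descriptor 1 (syscall.Stdout)
Stdout = NewFile(uintptr(syscall.Stdout), "/dev/stdout")

// os: (*File).write, simplified from an older Go release
// (newer releases go through internal/poll.FD.Write, which still ends in syscall.Write)
func (f *File) write(b []byte) (n int, err error) {
	if len(b) == 0 {
		return 0, nil
	}
	// The actual write simply calls syscall.Write()
	return fixCount(syscall.Write(f.fd, b))
}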
strace is a tool for viewing the system calls a process makes. Typical usage:
strace -c counts the calls to each system call and prints a summary table:
[root@localhost ~]# strace -c echo hello
hello
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         3           open
  0.00    0.000000           0         5           close
  0.00    0.000000           0         4           fstat
  0.00    0.000000           0         9           mmap
  0.00    0.000000           0         4           mprotect
  0.00    0.000000           0         2           munmap
  0.00    0.000000           0         4           brk
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    36         1 total
[root@localhost ~]#
strace is implemented on top of the ptrace system call, so let's look at what ptrace is.
The man page describes it as follows:
The ptrace() system call provides a means by which one process (the “tracer”) may observe and control the execution of another process (the “tracee”), and examine and change the tracee’s memory and registers. It is primarily used to implement breakpoint debugging and system call tracing.
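In short, it provides three capabilities:
tracing system calls
reading and writing the tracee's memory and registers
delivering signals to the tracee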
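The ptrace interface (glibc on Linux):
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
(BSD systems declare it as int ptrace(int request, pid_t pid, caddr_t addr, int data).)
Common values of request include:
PTRACE_ATTACH: attach to a running process, making it a tracee
PTRACE_SYSCALL: resume a stopped tracee until its next system call entry or exit
PTRACE_PEEKTEXT, PTRACE_PEEKDATA: read a word from the tracee's memory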
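The tracer issues PTRACE_ATTACH with the PID of the process to trace, then calls PTRACE_SYSCALL. (Alternatively, a child can request tracing itself with PTRACE_TRACEME, which is what the Go implementation below does through SysProcAttr.Ptrace.)
The tracee keeps running until it reaches a system call, at which point the kernel stops it. The tracer, waiting with waitpid/wait4, sees that the tracee has stopped with SIGTRAP and can now print information from the tracee's memory and registers.
Next, the tracer calls PTRACE_SYSCALL again; the tracee resumes and stops once more when it exits the current system call.
Note that the tracer therefore sees every system call twice: once on entry and once on exit.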
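With that background, the presenter live-coded a Go version of strace. It must be built on linux/amd64, and it uses github.com/seccomp/libseccomp-golang (cgo bindings to libseccomp) to map syscall numbers to names.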
https://github.com/silentred/gosys
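// strace.go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
)

func main() {
	var err error
	var regs syscall.PtraceRegs
	var ss syscallCounter
	ss = ss.init()

	// ptrace requests must come from the OS thread that started the tracee,
	// so pin this goroutine to its thread before starting the child.
	runtime.LockOSThread()

	fmt.Println("Run: ", os.Args[1:])
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	cmd.Stdin = os.Stdin
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Ptrace: true, // the child calls ptrace(PTRACE_TRACEME) before exec
	}
	if err = cmd.Start(); err != nil {
		panic(err)
	}
	// The child stops with SIGTRAP right after execve, so Wait returns
	// "stop signal: trace/breakpoint trap" instead of an exit status.
	err = cmd.Wait()
	if err != nil {
		fmt.Printf("Wait err %v \n", err)
	}
	pid := cmd.Process.Pid
	exit := true
	for {
		// PTRACE_SYSCALL stops the tracee on both entry to and exit from each
		// syscall, so this flag makes sure each syscall (RAX) is printed only once.
		if exit {
			err = syscall.PtraceGetRegs(pid, &regs)
			if err != nil {
				break // the tracee has exited
			}
			//fmt.Printf("%#v \n",regs)
			name := ss.getName(regs.Orig_rax)
			fmt.Printf("name: %s, id: %d \n", name, regs.Orig_rax)
			ss.inc(regs.Orig_rax)
		}
		// PTRACE_SYSCALL, one of the ptrace requests listed above: resume the
		// tracee until its next syscall entry or exit.
		err = syscall.PtraceSyscall(pid, 0)
		if err != nil {
			panic(err)
		}
		// Wait for the tracee to reach its next stop; ptrace requests are only
		// valid on a stopped tracee. Without this wait, many duplicate
		// syscall names get printed.
		_, err = syscall.Wait4(pid, nil, 0, nil)
		if err != nil {
			panic(err)
		}
		exit = !exit
	}
	ss.print()
}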
// syscallcounter.go: the counter used to collect the statistics
package main

import (
	"fmt"
	"os"
	"text/tabwriter"

	seccomp "github.com/seccomp/libseccomp-golang"
)

type syscallCounter []int

// Table size, indexed by syscall number. It must be larger than any x86-64
// syscall number the tracee can issue (newer kernels go well past 400,
// e.g. clone3 is 435).
const maxSyscalls = 512

func (s syscallCounter) init() syscallCounter {
	s = make(syscallCounter, maxSyscalls)
	return s
}

func (s syscallCounter) inc(syscallID uint64) error {
	if syscallID >= maxSyscalls { // valid indexes are 0..maxSyscalls-1
		return fmt.Errorf("invalid syscall ID (%x)", syscallID)
	}
	s[syscallID]++
	return nil
}

func (s syscallCounter) print() {
	w := tabwriter.NewWriter(os.Stdout, 0, 0, 8, ' ', tabwriter.AlignRight|tabwriter.Debug)
	for k, v := range s {
		if v > 0 {
			name, _ := seccomp.ScmpSyscall(k).GetName()
			fmt.Fprintf(w, "%d\t%s\n", v, name)
		}
	}
	w.Flush()
}

func (s syscallCounter) getName(syscallID uint64) string {
	name, _ := seccomp.ScmpSyscall(syscallID).GetName()
	return name
}
The result of tracing echo hello:
Run: [echo hello]
Wait err stop signal: trace/breakpoint trap
name: execve, id: 59
name: brk, id: 12
name: access, id: 21
name: mmap, id: 9
name: access, id: 21
name: open, id: 2
name: fstat, id: 5
name: mmap, id: 9
name: close, id: 3
name: access, id: 21
name: open, id: 2
name: read, id: 0
name: fstat, id: 5
name: mmap, id: 9
name: mprotect, id: 10
name: mmap, id: 9
name: mmap, id: 9
name: close, id: 3
name: mmap, id: 9
name: arch_prctl, id: 158
name: mprotect, id: 10
name: mprotect, id: 10
name: mprotect, id: 10
name: munmap, id: 11
name: brk, id: 12
name: brk, id: 12
name: open, id: 2
name: fstat, id: 5
name: mmap, id: 9
name: close, id: 3
name: fstat, id: 5
hello
name: write, id: 1
name: close, id: 3
name: close, id: 3
1|read
1|write
3|open
5|close
4|fstat
7|mmap
4|mprotect
1|munmap
3|brk
3|access
1|execve
1|arch_prctl
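Although Go has cgo, which makes calling C functions quick and easy, on Linux Go wraps system calls itself rather than going through libc. Taking amd64 as an example, system calls in Go are made through the following functions: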
// Declared in syscall_unix.go in older Go releases (newer releases define them in syscall_linux.go)
func Syscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)
func Syscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err Errno)
func RawSyscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)
func RawSyscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err Errno)
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// getpid is syscall 39 on linux/amd64 (syscall.SYS_GETPID); unused arguments are passed as 0
	pid, _, _ := syscall.Syscall(syscall.SYS_GETPID, 0, 0, 0)
	fmt.Println("Process id: ", pid)
}
Output:
$ go run test.go
Process id: 19184
Go standard library: syscall
Reference URL: https://www.jianshu.com/p/44109d5e045b
Golang and system calls
Reference URL: https://blog.csdn.net/weixin_33744141/article/details/89033990
[Recommended reading] System calls in Go
Reference URL: https://zhuanlan.zhihu.com/p/58285124