diff options
author | Dante Catalfamo | 2023-05-22 12:05:56 -0400 |
---|---|---|
committer | Dante Catalfamo | 2023-05-22 12:05:56 -0400 |
commit | 21b9869128b159a2347aaced80922dbebc3d6efb (patch) | |
tree | a260770b79ac5e04e8e0a9d03ac2bfd56f441340 /content | |
parent | 3d4f9b926db8e669066c6cfaf4bd3ab3e766ce6b (diff) | |
download | blog-21b9869128b159a2347aaced80922dbebc3d6efb.tar.gz blog-21b9869128b159a2347aaced80922dbebc3d6efb.tar.bz2 blog-21b9869128b159a2347aaced80922dbebc3d6efb.zip |
openbsd-btrace: remove headings
Diffstat (limited to 'content')
-rw-r--r-- | content/posts/openbsd-looking-at-btrace/index.org | 576 |
1 files changed, 285 insertions, 291 deletions
diff --git a/content/posts/openbsd-looking-at-btrace/index.org b/content/posts/openbsd-looking-at-btrace/index.org index 8ef450b..24b6171 100644 --- a/content/posts/openbsd-looking-at-btrace/index.org +++ b/content/posts/openbsd-looking-at-btrace/index.org @@ -1,4 +1,4 @@ -#+TITLE: Looking at btrace on OpenBSD 7.3 +#+TITLE: Dynamic Tracing on OpenBSD 7.3 #+DATE: 2023-05-21T23:43:05-04:00 #+DRAFT: true #+DESCRIPTION: Taking a look at OpenBSD's dynamic tracing framework @@ -11,306 +11,300 @@ #+ATTR_HTML: :alt Puffy with a stopwatch [[file:cover.png]] -* What it is - - A couple years ago a new device silently entered the OpenBSD source - tree. - - The =dt(4)= device first appearing in OpenBSD 6.7 with [[https://man.openbsd.org/OpenBSD-6.7/dt][very sparse - information]]. In the releases since, it's gained quite a bit of - functionality and now has tools to manipulate it properly. After - playing around with on my test machine running 7.3-current, it's - obvious that a lot of work has gone into it. - - The documentation has gotten better over the last couple years, but - there isn't much info about how to leverage it to do anything useful. - - The [[https://man.openbsd.org/dt][=dt=]] pseudo device driver is connected to =/dev/dt=, which is - used to interact with the device through [[https://man.openbsd.org/ioctl.2][=ioctl=]] calls. =dt= stand - for "dynamic tracer", similar to [[https://en.wikipedia.org/wiki/DTrace][=dtrace=]] and [[https://github.com/iovisor/bpftrace][=bpftrace=]]. It's a - device that lets you interact with probes on various parts of the - system. - -* How to enable it - - By default you can't communicate with the device since it can't be - opened unless the =kern.allowdt= [[https://man.openbsd.org/sysctl.conf][=sysctl=]] flag is set to =1=. - However you can't set this specific =sysctl= flag while the system's - =kern.securelevel= is above =0=, which it is under normal circumstances. - - The [[https://man.openbsd.org/securelevel][=securelevel=]] of the machine is set during the boot process and - can't be lowered once the machine is running. That means you need to - set the flag before the machine's =securelevel= is raised. You can - do this by adding the following line to [[https://man.openbsd.org/sysctl.conf][=/etc/sysctl.conf=]]. +A couple years ago a new device silently entered the OpenBSD source +tree. + +The =dt(4)= device first appearing in OpenBSD 6.7 with [[https://man.openbsd.org/OpenBSD-6.7/dt][very sparse +information]]. In the releases since, it's gained quite a bit of +functionality and now has tools to manipulate it properly. After +playing around with on my test machine running 7.3-current, it's +obvious that a lot of work has gone into it. + +The documentation has gotten better over the last couple years, but +there isn't much info about how to leverage it to do anything useful. + +The [[https://man.openbsd.org/dt][=dt=]] pseudo device driver is connected to =/dev/dt=, which is +used to interact with the device through [[https://man.openbsd.org/ioctl.2][=ioctl=]] calls. =dt= stand +for "dynamic tracer", similar to [[https://en.wikipedia.org/wiki/DTrace][=dtrace=]] and [[https://github.com/iovisor/bpftrace][=bpftrace=]]. It's a +device that lets you interact with probes on various parts of the +system. + +By default you can't communicate with the device since it can't be +opened unless the =kern.allowdt= [[https://man.openbsd.org/sysctl.conf][=sysctl=]] flag is set to =1=. +However you can't set this specific =sysctl= flag while the system's +=kern.securelevel= is above =0=, which it is under normal circumstances. + +The [[https://man.openbsd.org/securelevel][=securelevel=]] of the machine is set during the boot process and +can't be lowered once the machine is running. That means you need to +set the flag before the machine's =securelevel= is raised. You can +do this by adding the following line to [[https://man.openbsd.org/sysctl.conf][=/etc/sysctl.conf=]]. + +#+begin_src conf +kern.allowdt=1 +#+end_src + + +You can see in =/etc/rc= that =/etc/sysctl.conf= is loaded very early +in the boot process, and that the =securelevel= is raised near the end +of the boot. + +#+CAPTION: Taken from =/etc/rc= +#+begin_src shell +50 # Apply sysctl.conf(5) settings. +51 sysctl_conf() { +52 # do not use a pipe as limits would only be applied to the subshell +53 set -- $(stripcom /etc/sysctl.conf) +54 while [[ $# > 0 ]] ; do +55 sysctl "$1" +56 +57 case "$1" in +58 kern.maxproc=*) +59 update_limit -p maxproc +60 ;; +61 kern.maxfiles=*) +62 update_limit -n openfiles +63 ;; +64 esac +65 shift +66 done +67 } + +[...] + +589 [[ -f /etc/rc.securelevel ]] && sh /etc/rc.securelevel +590 +591 # rc.securelevel did not specifically set -1 or 2, so select the default: 1. +592 (($(sysctl -n kern.securelevel) == 0)) && sysctl kern.securelevel=1 + +#+end_src + + +You could also run the machine in =Permanently insecure mode= by +adding ~kern.securelevel=-1~ in =/etc/sysctl.conf=. This would let +you change the =kern.allowdt= flag at runtime, but it's not +recommended because it greatly reduces the security of your system. + +Once you've added the line to your =sysctl.conf= and rebooted you +should be able to open the device as root. + +The main way to interact with the device is through the [[https://man.openbsd.org/btrace][=btrace=]] +command. =btrace= is the "bug tracer" tool, used to run scripts +written in [[https://man.openbsd.org/bt][=bt=]], the bug tracing language. =bt= uses the same syntax +as Linux's =bpftrace= tool. + +Using the =bt= language and =btrace= tool, you can profile system +internals and probe the inner workings of many programs. You can see +which processes are forking, opening files, reading and writing to +file descriptors, using pipes, using chown, pledging, listening on a +socket, etc. You can see the full kernel and userland stack traces, +process ids, function arguments, command names, thread id, cpu id, +and return values. + +You can see the full list of supported probes with =btrace -l=. + +#+CAPTION: A sample of the available probes +#+begin_src +# doas btrace -l +profile:hz:97 +interval:hz:1 +syscall:exit:entry +syscall:exit:return +syscall:fork:entry +syscall:fork:return +syscall:read:entry +syscall:read:return +syscall:write:entry +syscall:write:return +syscall:open:entry +syscall:open:return +syscall:close:entry +syscall:close:return +syscall:getentropy:entry +syscall:getentropy:return +syscall:__tfork:entry +syscall:__tfork:return +syscall:unlink:entry +syscall:unlink:return +syscall:wait4:entry +syscall:wait4:return +syscall:chdir:entry +syscall:chdir:return +syscall:fchdir:entry +syscall:fchdir:return +syscall:mknod:entry +syscall:mknod:return +syscall:chmod:entry +syscall:chmod:return +syscall:chown:entry +syscall:chown:return +[...] +tracepoint:raw_syscalls:sys_enter(register_t) +tracepoint:raw_syscalls:sys_exit(register_t) +tracepoint:uvm:fault(vaddr_t, vm_fault_t, vm_prot_t) +tracepoint:uvm:map_insert(vaddr_t, vaddr_t, vm_prot_t) +tracepoint:uvm:map_remove(vaddr_t, vaddr_t, vm_prot_t) +tracepoint:uvm:malloc(int, void *, size_t, int) +tracepoint:uvm:free(int, void *, size_t) +tracepoint:uvm:pool_get(void *, void *, int) +tracepoint:uvm:pool_put(void *, void *) +tracepoint:vfs:bufcache_rel(long, int, int64_t) +tracepoint:vfs:bufcache_take(long, int, int64_t) +tracepoint:vfs:cleaner(long, int, long, long) +tracepoint:vmm:guest_enter(void *, void *) +tracepoint:vmm:guest_exit(void *, void *, uint64_t) +tracepoint:vmm:inout(void *, uint16_t, uint8_t, uint32_t) +[...] +#+end_src + + +Currently it seems most probes are for system calls =syscall:*= but +there are also a couple trace points for other function calls, like +the scheduler =tracepoint:sched:enqueue=, virtual memory system +=tracepoint:uvm:malloc=, virtual machine manager (vmm) +=tracepoint:vmm:guest_enter=, etc. + +Trace points are often functions that have arguments that you can +inspect as part of your probe. + +You can print values every time a probe is called using =printf=. +You can also collect aggregates values, build simple histograms, +calculate sums, or get min/max values. Check [[https://man.openbsd.org/bt.5][=bt=]] man page for full +syntax. Since it's based on the =bpftrace= syntax you should also be +able to look at those resources for ideas of how to use it. + +The =bt= language is somewhat similar to awk. + +=bt= programs have probes, filters, and actions. + +They're put together like this. + +#+begin_src +PROBE /FILTER/ { ACTION(S) } +#+end_src + +The action is called every time the probe gets activated, and you +can optionally filter the probe to only activate on certain +conditions like when it's a certain PID or thread ID using the +filter format ~/pid == 1234/~ + +There are special =BEGIN= and =END= probes that are called once at +the beginning and end of program execution, which can be used to set +and clear variables. + +Variables are global and take the form of =@var= for a scalars or =@var[key]= for +a maps. There are functions like =clear(@map)= and =delete(@map[key])= that +operate on map values. + +You can also do basic math operations on values. + +Here are a couple examples of how it can be used: + +- Print the process name and paid every time =fork= is called + #+begin_src + doas btrace -e 'syscall:fork:entry { printf("%s[%d]\n", comm, pid) }' + ksh[68490] + ksh[68490] + ksh[68490] + ksh[68490] + sh[83762] + ksh[68490] + ksh[87548] + ksh[87548] + ksh[87548] + sh[33064] + smtpd[32246] + smtpd[33076] + smtpd[33076] + ksh[88916] + smtpd[33076] + smtpd[33076] + smtpd[33076] + ksh[52579] + smtpd[33076] + smtpd[33076] - #+begin_src conf - kern.allowdt=1 #+end_src - - You can see in =/etc/rc= that =/etc/sysctl.conf= is loaded very early - in the boot process, and that the =securelevel= is raised near the end - of the boot. - - #+CAPTION: Taken from =/etc/rc= - #+begin_src shell - 50 # Apply sysctl.conf(5) settings. - 51 sysctl_conf() { - 52 # do not use a pipe as limits would only be applied to the subshell - 53 set -- $(stripcom /etc/sysctl.conf) - 54 while [[ $# > 0 ]] ; do - 55 sysctl "$1" - 56 - 57 case "$1" in - 58 kern.maxproc=*) - 59 update_limit -p maxproc - 60 ;; - 61 kern.maxfiles=*) - 62 update_limit -n openfiles - 63 ;; - 64 esac - 65 shift - 66 done - 67 } - - [...] - - 589 [[ -f /etc/rc.securelevel ]] && sh /etc/rc.securelevel - 590 - 591 # rc.securelevel did not specifically set -1 or 2, so select the default: 1. - 592 (($(sysctl -n kern.securelevel) == 0)) && sysctl kern.securelevel=1 - +- Count how many time processes call =read= (values printed after Ctrl-C) + #+begin_src + doas btrace -e 'syscall:read:entry { @[comm] = count() }' + ^C + @[sshd]: 243 + @[ksh]: 48 + @[less]: 31 + @[cat]: 2 #+end_src - - You could also run the machine in =Permanently insecure mode= by - adding ~kern.securelevel=-1~ in =/etc/sysctl.conf=. This would let - you change the =kern.allowdt= flag at runtime, but it's not - recommended because it greatly reduces the security of your system. - - Once you've added the line to your =sysctl.conf= and rebooted you - should be able to open the device as root. - - The main way to interact with the device is through the [[https://man.openbsd.org/btrace][=btrace=]] - command. =btrace= is the "bug tracer" tool, used to run scripts - written in [[https://man.openbsd.org/bt][=bt=]], the bug tracing language. =bt= uses the same syntax - as Linux's =bpftrace= tool. - -* What it can do - - Using the =bt= language and =btrace= tool, you can profile system - internals and probe the inner workings of many programs. You can see - which processes are forking, opening files, reading and writing to - file descriptors, using pipes, using chown, pledging, listening on a - socket, etc. You can see the full kernel and userland stack traces, - process ids, function arguments, command names, thread id, cpu id, - and return values. - - You can see the full list of supported probes with =btrace -l=. - - #+CAPTION: A sample of the available probes +- See which processes are calling which syscalls, and how many times #+begin_src - # doas btrace -l - profile:hz:97 - interval:hz:1 - syscall:exit:entry - syscall:exit:return - syscall:fork:entry - syscall:fork:return - syscall:read:entry - syscall:read:return - syscall:write:entry - syscall:write:return - syscall:open:entry - syscall:open:return - syscall:close:entry - syscall:close:return - syscall:getentropy:entry - syscall:getentropy:return - syscall:__tfork:entry - syscall:__tfork:return - syscall:unlink:entry - syscall:unlink:return - syscall:wait4:entry - syscall:wait4:return - syscall:chdir:entry - syscall:chdir:return - syscall:fchdir:entry - syscall:fchdir:return - syscall:mknod:entry - syscall:mknod:return - syscall:chmod:entry - syscall:chmod:return - syscall:chown:entry - syscall:chown:return - [...] - tracepoint:raw_syscalls:sys_enter(register_t) - tracepoint:raw_syscalls:sys_exit(register_t) - tracepoint:uvm:fault(vaddr_t, vm_fault_t, vm_prot_t) - tracepoint:uvm:map_insert(vaddr_t, vaddr_t, vm_prot_t) - tracepoint:uvm:map_remove(vaddr_t, vaddr_t, vm_prot_t) - tracepoint:uvm:malloc(int, void *, size_t, int) - tracepoint:uvm:free(int, void *, size_t) - tracepoint:uvm:pool_get(void *, void *, int) - tracepoint:uvm:pool_put(void *, void *) - tracepoint:vfs:bufcache_rel(long, int, int64_t) - tracepoint:vfs:bufcache_take(long, int, int64_t) - tracepoint:vfs:cleaner(long, int, long, long) - tracepoint:vmm:guest_enter(void *, void *) - tracepoint:vmm:guest_exit(void *, void *, uint64_t) - tracepoint:vmm:inout(void *, uint16_t, uint8_t, uint32_t) - [...] + doas btrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm, arg0] = count() }' + ^C + @[sshd, 48]: 354 + @[top, 49]: 292 + @[sshd, 109]: 176 + @[top, 86]: 155 + @[top, 202]: 124 + @[sshd, 3]: 92 + @[sshd, 4]: 92 + @[less, 86]: 82 + @[ksh, 4]: 55 + @[less, 49]: 42 + @[top, 46]: 36 + @[sshd, 54]: 34 + @[top, 169]: 31 + @[ksh, 3]: 28 + @[top, 3]: 27 + @[ksh, 202]: 26 + @[top, 252]: 22 + @[ksh, 46]: 20 + @[top, 253]: 19 + @[ksh, 40]: 18 + @[top, 5]: 18 + @[top, 6]: 18 + @[top, 74]: 18 + @[top, 53]: 16 + @[top, 54]: 15 + @[ksh, 54]: 14 + @[ksh, 99]: 14 + @[less, 159]: 14 + @[less, 74]: 14 + @[top, 159]: 14 + @[top, 4]: 11 + @[ksh, 48]: 10 + @[ksh, 6]: 9 + @[sshd, 73]: 9 + @[ksh, 38]: 8 #+end_src + You can check the syscall numbers here: + https://github.com/openbsd/src/blob/master/sys/kern/syscalls.master - Currently it seems most probes are for system calls =syscall:*= but - there are also a couple trace points for other function calls, like - the scheduler =tracepoint:sched:enqueue=, virtual memory system - =tracepoint:uvm:malloc=, virtual machine manager (vmm) - =tracepoint:vmm:guest_enter=, etc. - - Trace points are often functions that have arguments that you can - inspect as part of your probe. - - You can print values every time a probe is called using =printf=. - You can also collect aggregates values, build simple histograms, - calculate sums, or get min/max values. Check [[https://man.openbsd.org/bt.5][=bt=]] man page for full - syntax. Since it's based on the =bpftrace= syntax you should also be - able to look at those resources for ideas of how to use it. - - The =bt= language is somewhat similar to awk. - - =bt= programs have probes, filters, and actions. - - They're put together like this. +- See a histogram of the size of =read()= calls for a certain process + #+begin_src + doas btrace -e 'syscall:read:return /pid == 5353/ { @readsize = hist(retval) }' + ^C + @readsize: + [0, 1) 3 |@@@@@@@@@@@@@@@@@@@ | + [2, 4) 3 |@@@@@@@@@@@@@@@@@@@ | + [32, 64) 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| + #+end_src +- See a histogram of how long it takes for a =read()= to return #+begin_src - PROBE /FILTER/ { ACTION(S) } + doas btrace -e 'syscall:read:entry { @start[tid] = nsecs } syscall:read:return /@start[tid]/ { @times = hist(nsecs - @start[tid]); delete(@start[tid]) } END { clear(@start) }' + ^C@times: + [0] 3 |@@@ | + [1K, 2K) 3 |@@@ | + [2K, 4K) 50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| + [4K, 8K) 24 |@@@@@@@@@@@@@@@@@@@@@@@@ | + [8K, 16K) 8 |@@@@@@@@ | + [16M, 32M) 2 |@@ | + [32M, 64M) 1 |@ | + [64M, 128M) 5 |@@@@@ | + [128M, 256M) 4 |@@@@ | #+end_src - The action is called every time the probe gets activated, and you - can optionally filter the probe to only activate on certain - conditions like when it's a certain PID or thread ID using the - filter format ~/pid == 1234/~ - - There are special =BEGIN= and =END= probes that are called once at - the beginning and end of program execution, which can be used to set - and clear variables. - - Variables are global and take the form of =@var= for a scalars or =@var[key]= for - a maps. There are functions like =clear(@map)= and =delete(@map[key])= that - operate on map values. - - You can also do basic math operations on values. - -* Examples - - - Print the process name and paid every time =fork= is called - #+begin_src - doas btrace -e 'syscall:fork:entry { printf("%s[%d]\n", comm, pid) }' - ksh[68490] - ksh[68490] - ksh[68490] - ksh[68490] - sh[83762] - ksh[68490] - ksh[87548] - ksh[87548] - ksh[87548] - sh[33064] - smtpd[32246] - smtpd[33076] - smtpd[33076] - ksh[88916] - smtpd[33076] - smtpd[33076] - smtpd[33076] - ksh[52579] - smtpd[33076] - smtpd[33076] - - #+end_src - - - Count how many time processes call =read= (values printed after Ctrl-C) - #+begin_src - doas btrace -e 'syscall:read:entry { @[comm] = count() }' - ^C - @[sshd]: 243 - @[ksh]: 48 - @[less]: 31 - @[cat]: 2 - #+end_src - - - See which processes are calling which syscalls, and how many times - #+begin_src - doas btrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm, arg0] = count() }' - ^C - @[sshd, 48]: 354 - @[top, 49]: 292 - @[sshd, 109]: 176 - @[top, 86]: 155 - @[top, 202]: 124 - @[sshd, 3]: 92 - @[sshd, 4]: 92 - @[less, 86]: 82 - @[ksh, 4]: 55 - @[less, 49]: 42 - @[top, 46]: 36 - @[sshd, 54]: 34 - @[top, 169]: 31 - @[ksh, 3]: 28 - @[top, 3]: 27 - @[ksh, 202]: 26 - @[top, 252]: 22 - @[ksh, 46]: 20 - @[top, 253]: 19 - @[ksh, 40]: 18 - @[top, 5]: 18 - @[top, 6]: 18 - @[top, 74]: 18 - @[top, 53]: 16 - @[top, 54]: 15 - @[ksh, 54]: 14 - @[ksh, 99]: 14 - @[less, 159]: 14 - @[less, 74]: 14 - @[top, 159]: 14 - @[top, 4]: 11 - @[ksh, 48]: 10 - @[ksh, 6]: 9 - @[sshd, 73]: 9 - @[ksh, 38]: 8 - #+end_src - - You can check the syscall numbers here: - https://github.com/openbsd/src/blob/master/sys/kern/syscalls.master - - - See a histogram of the size of =read()= calls for a certain process - #+begin_src - doas btrace -e 'syscall:read:return /pid == 5353/ { @readsize = hist(retval) }' - ^C - @readsize: - [0, 1) 3 |@@@@@@@@@@@@@@@@@@@ | - [2, 4) 3 |@@@@@@@@@@@@@@@@@@@ | - [32, 64) 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| - #+end_src - - - See a histogram of how long it takes for a =read()= to return - #+begin_src - doas btrace -e 'syscall:read:entry { @start[tid] = nsecs } syscall:read:return /@start[tid]/ { @times = hist(nsecs - @start[tid]); delete(@start[tid]) } END { clear(@start) }' - ^C@times: - [0] 3 |@@@ | - [1K, 2K) 3 |@@@ | - [2K, 4K) 50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| - [4K, 8K) 24 |@@@@@@@@@@@@@@@@@@@@@@@@ | - [8K, 16K) 8 |@@@@@@@@ | - [16M, 32M) 2 |@@ | - [32M, 64M) 1 |@ | - [64M, 128M) 5 |@@@@@ | - [128M, 256M) 4 |@@@@ | - #+end_src - - There are also a couple of examples in =/usr/share/btrace= that use - sampling and command line arguments. - - You can also check out the Linux [[https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md][=bpftrace= guide]] for ideas. +There are also a couple of examples in =/usr/share/btrace= that use +sampling and command line arguments. + +You can also check out the Linux [[https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md][=bpftrace= guide]] for ideas. |