summaryrefslogtreecommitdiffstats
path: root/content/posts/openbsd-looking-at-btrace/index.org
diff options
context:
space:
mode:
Diffstat (limited to 'content/posts/openbsd-looking-at-btrace/index.org')
-rw-r--r--content/posts/openbsd-looking-at-btrace/index.org310
1 files changed, 0 insertions, 310 deletions
diff --git a/content/posts/openbsd-looking-at-btrace/index.org b/content/posts/openbsd-looking-at-btrace/index.org
deleted file mode 100644
index 24b6171..0000000
--- a/content/posts/openbsd-looking-at-btrace/index.org
+++ /dev/null
@@ -1,310 +0,0 @@
-#+TITLE: Dynamic Tracing on OpenBSD 7.3
-#+DATE: 2023-05-21T23:43:05-04:00
-#+DRAFT: true
-#+DESCRIPTION: Taking a look at OpenBSD's dynamic tracing framework
-#+TAGS[]: openbsd btrace
-#+KEYWORDS[]: openbsd btrace
-#+SLUG:
-#+SUMMARY:
-
-#+ATTR_HTML: :title Puffy with a stopwatch
-#+ATTR_HTML: :alt Puffy with a stopwatch
-[[file:cover.png]]
-
-A couple years ago a new device silently entered the OpenBSD source
-tree.
-
-The =dt(4)= device first appearing in OpenBSD 6.7 with [[https://man.openbsd.org/OpenBSD-6.7/dt][very sparse
-information]]. In the releases since, it's gained quite a bit of
-functionality and now has tools to manipulate it properly. After
-playing around with on my test machine running 7.3-current, it's
-obvious that a lot of work has gone into it.
-
-The documentation has gotten better over the last couple years, but
-there isn't much info about how to leverage it to do anything useful.
-
-The [[https://man.openbsd.org/dt][=dt=]] pseudo device driver is connected to =/dev/dt=, which is
-used to interact with the device through [[https://man.openbsd.org/ioctl.2][=ioctl=]] calls. =dt= stand
-for "dynamic tracer", similar to [[https://en.wikipedia.org/wiki/DTrace][=dtrace=]] and [[https://github.com/iovisor/bpftrace][=bpftrace=]]. It's a
-device that lets you interact with probes on various parts of the
-system.
-
-By default you can't communicate with the device since it can't be
-opened unless the =kern.allowdt= [[https://man.openbsd.org/sysctl.conf][=sysctl=]] flag is set to =1=.
-However you can't set this specific =sysctl= flag while the system's
-=kern.securelevel= is above =0=, which it is under normal circumstances.
-
-The [[https://man.openbsd.org/securelevel][=securelevel=]] of the machine is set during the boot process and
-can't be lowered once the machine is running. That means you need to
-set the flag before the machine's =securelevel= is raised. You can
-do this by adding the following line to [[https://man.openbsd.org/sysctl.conf][=/etc/sysctl.conf=]].
-
-#+begin_src conf
-kern.allowdt=1
-#+end_src
-
-
-You can see in =/etc/rc= that =/etc/sysctl.conf= is loaded very early
-in the boot process, and that the =securelevel= is raised near the end
-of the boot.
-
-#+CAPTION: Taken from =/etc/rc=
-#+begin_src shell
-50 # Apply sysctl.conf(5) settings.
-51 sysctl_conf() {
-52 # do not use a pipe as limits would only be applied to the subshell
-53 set -- $(stripcom /etc/sysctl.conf)
-54 while [[ $# > 0 ]] ; do
-55 sysctl "$1"
-56
-57 case "$1" in
-58 kern.maxproc=*)
-59 update_limit -p maxproc
-60 ;;
-61 kern.maxfiles=*)
-62 update_limit -n openfiles
-63 ;;
-64 esac
-65 shift
-66 done
-67 }
-
-[...]
-
-589 [[ -f /etc/rc.securelevel ]] && sh /etc/rc.securelevel
-590
-591 # rc.securelevel did not specifically set -1 or 2, so select the default: 1.
-592 (($(sysctl -n kern.securelevel) == 0)) && sysctl kern.securelevel=1
-
-#+end_src
-
-
-You could also run the machine in =Permanently insecure mode= by
-adding ~kern.securelevel=-1~ in =/etc/sysctl.conf=. This would let
-you change the =kern.allowdt= flag at runtime, but it's not
-recommended because it greatly reduces the security of your system.
-
-Once you've added the line to your =sysctl.conf= and rebooted you
-should be able to open the device as root.
-
-The main way to interact with the device is through the [[https://man.openbsd.org/btrace][=btrace=]]
-command. =btrace= is the "bug tracer" tool, used to run scripts
-written in [[https://man.openbsd.org/bt][=bt=]], the bug tracing language. =bt= uses the same syntax
-as Linux's =bpftrace= tool.
-
-Using the =bt= language and =btrace= tool, you can profile system
-internals and probe the inner workings of many programs. You can see
-which processes are forking, opening files, reading and writing to
-file descriptors, using pipes, using chown, pledging, listening on a
-socket, etc. You can see the full kernel and userland stack traces,
-process ids, function arguments, command names, thread id, cpu id,
-and return values.
-
-You can see the full list of supported probes with =btrace -l=.
-
-#+CAPTION: A sample of the available probes
-#+begin_src
-# doas btrace -l
-profile:hz:97
-interval:hz:1
-syscall:exit:entry
-syscall:exit:return
-syscall:fork:entry
-syscall:fork:return
-syscall:read:entry
-syscall:read:return
-syscall:write:entry
-syscall:write:return
-syscall:open:entry
-syscall:open:return
-syscall:close:entry
-syscall:close:return
-syscall:getentropy:entry
-syscall:getentropy:return
-syscall:__tfork:entry
-syscall:__tfork:return
-syscall:unlink:entry
-syscall:unlink:return
-syscall:wait4:entry
-syscall:wait4:return
-syscall:chdir:entry
-syscall:chdir:return
-syscall:fchdir:entry
-syscall:fchdir:return
-syscall:mknod:entry
-syscall:mknod:return
-syscall:chmod:entry
-syscall:chmod:return
-syscall:chown:entry
-syscall:chown:return
-[...]
-tracepoint:raw_syscalls:sys_enter(register_t)
-tracepoint:raw_syscalls:sys_exit(register_t)
-tracepoint:uvm:fault(vaddr_t, vm_fault_t, vm_prot_t)
-tracepoint:uvm:map_insert(vaddr_t, vaddr_t, vm_prot_t)
-tracepoint:uvm:map_remove(vaddr_t, vaddr_t, vm_prot_t)
-tracepoint:uvm:malloc(int, void *, size_t, int)
-tracepoint:uvm:free(int, void *, size_t)
-tracepoint:uvm:pool_get(void *, void *, int)
-tracepoint:uvm:pool_put(void *, void *)
-tracepoint:vfs:bufcache_rel(long, int, int64_t)
-tracepoint:vfs:bufcache_take(long, int, int64_t)
-tracepoint:vfs:cleaner(long, int, long, long)
-tracepoint:vmm:guest_enter(void *, void *)
-tracepoint:vmm:guest_exit(void *, void *, uint64_t)
-tracepoint:vmm:inout(void *, uint16_t, uint8_t, uint32_t)
-[...]
-#+end_src
-
-
-Currently it seems most probes are for system calls =syscall:*= but
-there are also a couple trace points for other function calls, like
-the scheduler =tracepoint:sched:enqueue=, virtual memory system
-=tracepoint:uvm:malloc=, virtual machine manager (vmm)
-=tracepoint:vmm:guest_enter=, etc.
-
-Trace points are often functions that have arguments that you can
-inspect as part of your probe.
-
-You can print values every time a probe is called using =printf=.
-You can also collect aggregates values, build simple histograms,
-calculate sums, or get min/max values. Check [[https://man.openbsd.org/bt.5][=bt=]] man page for full
-syntax. Since it's based on the =bpftrace= syntax you should also be
-able to look at those resources for ideas of how to use it.
-
-The =bt= language is somewhat similar to awk.
-
-=bt= programs have probes, filters, and actions.
-
-They're put together like this.
-
-#+begin_src
-PROBE /FILTER/ { ACTION(S) }
-#+end_src
-
-The action is called every time the probe gets activated, and you
-can optionally filter the probe to only activate on certain
-conditions like when it's a certain PID or thread ID using the
-filter format ~/pid == 1234/~
-
-There are special =BEGIN= and =END= probes that are called once at
-the beginning and end of program execution, which can be used to set
-and clear variables.
-
-Variables are global and take the form of =@var= for a scalars or =@var[key]= for
-a maps. There are functions like =clear(@map)= and =delete(@map[key])= that
-operate on map values.
-
-You can also do basic math operations on values.
-
-Here are a couple examples of how it can be used:
-
-- Print the process name and paid every time =fork= is called
- #+begin_src
- doas btrace -e 'syscall:fork:entry { printf("%s[%d]\n", comm, pid) }'
- ksh[68490]
- ksh[68490]
- ksh[68490]
- ksh[68490]
- sh[83762]
- ksh[68490]
- ksh[87548]
- ksh[87548]
- ksh[87548]
- sh[33064]
- smtpd[32246]
- smtpd[33076]
- smtpd[33076]
- ksh[88916]
- smtpd[33076]
- smtpd[33076]
- smtpd[33076]
- ksh[52579]
- smtpd[33076]
- smtpd[33076]
-
- #+end_src
-
-- Count how many time processes call =read= (values printed after Ctrl-C)
- #+begin_src
- doas btrace -e 'syscall:read:entry { @[comm] = count() }'
- ^C
- @[sshd]: 243
- @[ksh]: 48
- @[less]: 31
- @[cat]: 2
- #+end_src
-
-- See which processes are calling which syscalls, and how many times
- #+begin_src
- doas btrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm, arg0] = count() }'
- ^C
- @[sshd, 48]: 354
- @[top, 49]: 292
- @[sshd, 109]: 176
- @[top, 86]: 155
- @[top, 202]: 124
- @[sshd, 3]: 92
- @[sshd, 4]: 92
- @[less, 86]: 82
- @[ksh, 4]: 55
- @[less, 49]: 42
- @[top, 46]: 36
- @[sshd, 54]: 34
- @[top, 169]: 31
- @[ksh, 3]: 28
- @[top, 3]: 27
- @[ksh, 202]: 26
- @[top, 252]: 22
- @[ksh, 46]: 20
- @[top, 253]: 19
- @[ksh, 40]: 18
- @[top, 5]: 18
- @[top, 6]: 18
- @[top, 74]: 18
- @[top, 53]: 16
- @[top, 54]: 15
- @[ksh, 54]: 14
- @[ksh, 99]: 14
- @[less, 159]: 14
- @[less, 74]: 14
- @[top, 159]: 14
- @[top, 4]: 11
- @[ksh, 48]: 10
- @[ksh, 6]: 9
- @[sshd, 73]: 9
- @[ksh, 38]: 8
- #+end_src
-
- You can check the syscall numbers here:
- https://github.com/openbsd/src/blob/master/sys/kern/syscalls.master
-
-- See a histogram of the size of =read()= calls for a certain process
- #+begin_src
- doas btrace -e 'syscall:read:return /pid == 5353/ { @readsize = hist(retval) }'
- ^C
- @readsize:
- [0, 1) 3 |@@@@@@@@@@@@@@@@@@@ |
- [2, 4) 3 |@@@@@@@@@@@@@@@@@@@ |
- [32, 64) 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
- #+end_src
-
-- See a histogram of how long it takes for a =read()= to return
- #+begin_src
- doas btrace -e 'syscall:read:entry { @start[tid] = nsecs } syscall:read:return /@start[tid]/ { @times = hist(nsecs - @start[tid]); delete(@start[tid]) } END { clear(@start) }'
- ^C@times:
- [0] 3 |@@@ |
- [1K, 2K) 3 |@@@ |
- [2K, 4K) 50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
- [4K, 8K) 24 |@@@@@@@@@@@@@@@@@@@@@@@@ |
- [8K, 16K) 8 |@@@@@@@@ |
- [16M, 32M) 2 |@@ |
- [32M, 64M) 1 |@ |
- [64M, 128M) 5 |@@@@@ |
- [128M, 256M) 4 |@@@@ |
- #+end_src
-
-There are also a couple of examples in =/usr/share/btrace= that use
-sampling and command line arguments.
-
-You can also check out the Linux [[https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md][=bpftrace= guide]] for ideas.