1 files changed, 285 insertions, 291 deletions
diff --git a/content/posts/openbsd-looking-at-btrace/index.org b/content/posts/openbsd-looking-at-btrace/index.org
index 8ef450b..24b6171 100644
--- a/content/posts/openbsd-looking-at-btrace/index.org
+++ b/content/posts/openbsd-looking-at-btrace/index.org
@@ -1,4 +1,4 @@
-#+TITLE: Looking at btrace on OpenBSD 7.3
+#+TITLE: Dynamic Tracing on OpenBSD 7.3
 #+DATE: 2023-05-21T23:43:05-04:00
 #+DRAFT: true
 #+DESCRIPTION: Taking a look at OpenBSD's dynamic tracing framework
@@ -11,306 +11,300 @@
 #+ATTR_HTML: :alt Puffy with a stopwatch
 [[file:cover.png]]
 
-* What it is
-
-  A couple years ago a new device silently entered the OpenBSD source
-  tree.
-
-  The =dt(4)= device first appearing in OpenBSD 6.7 with [[https://man.openbsd.org/OpenBSD-6.7/dt][very sparse
-  information]]. In the releases since, it's gained quite a bit of
-  functionality and now has tools to manipulate it properly. After
-  playing around with on my test machine running 7.3-current, it's
-  obvious that a lot of work has gone into it.
-
-  The documentation has gotten better over the last couple years, but
-  there isn't much info about how to leverage it to do anything useful.
-
-  The [[https://man.openbsd.org/dt][=dt=]] pseudo device driver is connected to =/dev/dt=, which is
-  used to interact with the device through [[https://man.openbsd.org/ioctl.2][=ioctl=]] calls. =dt= stand
-  for "dynamic tracer", similar to [[https://en.wikipedia.org/wiki/DTrace][=dtrace=]] and [[https://github.com/iovisor/bpftrace][=bpftrace=]]. It's a
-  device that lets you interact with probes on various parts of the
-  system.
-
-* How to enable it
-
-  By default you can't communicate with the device since it can't be
-  opened unless the =kern.allowdt= [[https://man.openbsd.org/sysctl.conf][=sysctl=]] flag is set to =1=.
-  However you can't set this specific =sysctl= flag while the system's
-  =kern.securelevel= is above =0=, which it is under normal circumstances.
-
-  The [[https://man.openbsd.org/securelevel][=securelevel=]] of the machine is set during the boot process and
-  can't be lowered once the machine is running. That means you need to
-  set the flag before the machine's =securelevel= is raised. You can
-  do this by adding the following line to [[https://man.openbsd.org/sysctl.conf][=/etc/sysctl.conf=]].
+A couple years ago a new device silently entered the OpenBSD source
+tree.
+
+The =dt(4)= device first appearing in OpenBSD 6.7 with [[https://man.openbsd.org/OpenBSD-6.7/dt][very sparse
+information]]. In the releases since, it's gained quite a bit of
+functionality and now has tools to manipulate it properly. After
+playing around with on my test machine running 7.3-current, it's
+obvious that a lot of work has gone into it.
+
+The documentation has gotten better over the last couple years, but
+there isn't much info about how to leverage it to do anything useful.
+
+The [[https://man.openbsd.org/dt][=dt=]] pseudo device driver is connected to =/dev/dt=, which is
+used to interact with the device through [[https://man.openbsd.org/ioctl.2][=ioctl=]] calls. =dt= stand
+for "dynamic tracer", similar to [[https://en.wikipedia.org/wiki/DTrace][=dtrace=]] and [[https://github.com/iovisor/bpftrace][=bpftrace=]]. It's a
+device that lets you interact with probes on various parts of the
+system.
+
+By default you can't communicate with the device since it can't be
+opened unless the =kern.allowdt= [[https://man.openbsd.org/sysctl.conf][=sysctl=]] flag is set to =1=.
+However you can't set this specific =sysctl= flag while the system's
+=kern.securelevel= is above =0=, which it is under normal circumstances.
+
+The [[https://man.openbsd.org/securelevel][=securelevel=]] of the machine is set during the boot process and
+can't be lowered once the machine is running. That means you need to
+set the flag before the machine's =securelevel= is raised. You can
+do this by adding the following line to [[https://man.openbsd.org/sysctl.conf][=/etc/sysctl.conf=]].
+
+#+begin_src conf
+kern.allowdt=1
+#+end_src
+
+
+You can see in =/etc/rc= that =/etc/sysctl.conf= is loaded very early
+in the boot process, and that the =securelevel= is raised near the end
+of the boot.
+
+#+CAPTION: Taken from =/etc/rc=
+#+begin_src shell
+50 # Apply sysctl.conf(5) settings.
+51 sysctl_conf() {
+52         # do not use a pipe as limits would only be applied to the subshell
+53         set -- $(stripcom /etc/sysctl.conf)
+54         while [[ $# > 0 ]] ; do
+55                 sysctl "$1"
+56
+57                 case "$1" in
+58                 kern.maxproc=*)
+59                         update_limit -p maxproc
+60                         ;;
+61                 kern.maxfiles=*)
+62                         update_limit -n openfiles
+63                         ;;
+64                 esac
+65                 shift
+66         done
+67 }
+
+[...]
+
+589 [[ -f /etc/rc.securelevel ]] && sh /etc/rc.securelevel
+590
+591 # rc.securelevel did not specifically set -1 or 2, so select the default: 1.
+592 (($(sysctl -n kern.securelevel) == 0)) && sysctl kern.securelevel=1
+
+#+end_src
+
+
+You could also run the machine in =Permanently insecure mode= by
+adding ~kern.securelevel=-1~ in =/etc/sysctl.conf=. This would let
+you change the =kern.allowdt= flag at runtime, but it's not
+recommended because it greatly reduces the security of your system.
+
+Once you've added the line to your =sysctl.conf= and rebooted you
+should be able to open the device as root.
+
+The main way to interact with the device is through the [[https://man.openbsd.org/btrace][=btrace=]]
+command. =btrace= is the "bug tracer" tool, used to run scripts
+written in [[https://man.openbsd.org/bt][=bt=]], the bug tracing language. =bt= uses the same syntax
+as Linux's =bpftrace= tool.
+
+Using the =bt= language and =btrace= tool, you can profile system
+internals and probe the inner workings of many programs. You can see
+which processes are forking, opening files, reading and writing to
+file descriptors, using pipes, using chown, pledging, listening on a
+socket, etc. You can see the full kernel and userland stack traces,
+process ids, function arguments, command names, thread id, cpu id,
+and return values.
+
+You can see the full list of supported probes with =btrace -l=.
+
+#+CAPTION: A sample of the available probes
+#+begin_src
+# doas btrace -l
+profile:hz:97
+interval:hz:1
+syscall:exit:entry
+syscall:exit:return
+syscall:fork:entry
+syscall:fork:return
+syscall:read:entry
+syscall:read:return
+syscall:write:entry
+syscall:write:return
+syscall:open:entry
+syscall:open:return
+syscall:close:entry
+syscall:close:return
+syscall:getentropy:entry
+syscall:getentropy:return
+syscall:__tfork:entry
+syscall:__tfork:return
+syscall:unlink:entry
+syscall:unlink:return
+syscall:wait4:entry
+syscall:wait4:return
+syscall:chdir:entry
+syscall:chdir:return
+syscall:fchdir:entry
+syscall:fchdir:return
+syscall:mknod:entry
+syscall:mknod:return
+syscall:chmod:entry
+syscall:chmod:return
+syscall:chown:entry
+syscall:chown:return
+[...]
+tracepoint:raw_syscalls:sys_enter(register_t)
+tracepoint:raw_syscalls:sys_exit(register_t)
+tracepoint:uvm:fault(vaddr_t, vm_fault_t, vm_prot_t)
+tracepoint:uvm:map_insert(vaddr_t, vaddr_t, vm_prot_t)
+tracepoint:uvm:map_remove(vaddr_t, vaddr_t, vm_prot_t)
+tracepoint:uvm:malloc(int, void *, size_t, int)
+tracepoint:uvm:free(int, void *, size_t)
+tracepoint:uvm:pool_get(void *, void *, int)
+tracepoint:uvm:pool_put(void *, void *)
+tracepoint:vfs:bufcache_rel(long, int, int64_t)
+tracepoint:vfs:bufcache_take(long, int, int64_t)
+tracepoint:vfs:cleaner(long, int, long, long)
+tracepoint:vmm:guest_enter(void *, void *)
+tracepoint:vmm:guest_exit(void *, void *, uint64_t)
+tracepoint:vmm:inout(void *, uint16_t, uint8_t, uint32_t)
+[...]
+#+end_src
+
+
+Currently it seems most probes are for system calls =syscall:*= but
+there are also a couple trace points for other function calls, like
+the scheduler =tracepoint:sched:enqueue=, virtual memory system
+=tracepoint:uvm:malloc=, virtual machine manager (vmm)
+=tracepoint:vmm:guest_enter=, etc.
+
+Trace points are often functions that have arguments that you can
+inspect as part of your probe.
+
+You can print values every time a probe is called using =printf=.
+You can also collect aggregates values, build simple histograms,
+calculate sums, or get min/max values. Check [[https://man.openbsd.org/bt.5][=bt=]] man page for full
+syntax. Since it's based on the =bpftrace= syntax you should also be
+able to look at those resources for ideas of how to use it.
+
+The =bt= language is somewhat similar to awk.
+
+=bt= programs have probes, filters, and actions.
+
+They're put together like this.
+
+#+begin_src
+PROBE /FILTER/ { ACTION(S) }
+#+end_src
+
+The action is called every time the probe gets activated, and you
+can optionally filter the probe to only activate on certain
+conditions like when it's a certain PID or thread ID using the
+filter format ~/pid == 1234/~
+
+There are special =BEGIN= and =END= probes that are called once at
+the beginning and end of program execution, which can be used to set
+and clear variables.
+
+Variables are global and take the form of =@var= for a scalars or =@var[key]= for
+a maps. There are functions like =clear(@map)= and =delete(@map[key])= that
+operate on map values.
+
+You can also do basic math operations on values.
+
+Here are a couple examples of how it can be used:
+
+- Print the process name and paid every time =fork= is called
+  #+begin_src
+  doas btrace -e 'syscall:fork:entry { printf("%s[%d]\n", comm, pid) }'
+  ksh[68490]
+  ksh[68490]
+  ksh[68490]
+  ksh[68490]
+  sh[83762]
+  ksh[68490]
+  ksh[87548]
+  ksh[87548]
+  ksh[87548]
+  sh[33064]
+  smtpd[32246]
+  smtpd[33076]
+  smtpd[33076]
+  ksh[88916]
+  smtpd[33076]
+  smtpd[33076]
+  smtpd[33076]
+  ksh[52579]
+  smtpd[33076]
+  smtpd[33076]
 
-  #+begin_src conf
-  kern.allowdt=1
   #+end_src
 
-
-  You can see in =/etc/rc= that =/etc/sysctl.conf= is loaded very early
-  in the boot process, and that the =securelevel= is raised near the end
-  of the boot.
-
-  #+CAPTION: Taken from =/etc/rc=
-  #+begin_src shell
-  50 # Apply sysctl.conf(5) settings.
-  51 sysctl_conf() {
-  52         # do not use a pipe as limits would only be applied to the subshell
-  53         set -- $(stripcom /etc/sysctl.conf)
-  54         while [[ $# > 0 ]] ; do
-  55                 sysctl "$1"
-  56
-  57                 case "$1" in
-  58                 kern.maxproc=*)
-  59                         update_limit -p maxproc
-  60                         ;;
-  61                 kern.maxfiles=*)
-  62                         update_limit -n openfiles
-  63                         ;;
-  64                 esac
-  65                 shift
-  66         done
-  67 }
-
-  [...]
-
-  589 [[ -f /etc/rc.securelevel ]] && sh /etc/rc.securelevel
-  590
-  591 # rc.securelevel did not specifically set -1 or 2, so select the default: 1.
-  592 (($(sysctl -n kern.securelevel) == 0)) && sysctl kern.securelevel=1
-
+- Count how many time processes call =read= (values printed after Ctrl-C)
+  #+begin_src
+  doas btrace -e 'syscall:read:entry { @[comm] = count() }'
+  ^C
+  @[sshd]: 243
+  @[ksh]: 48
+  @[less]: 31
+  @[cat]: 2
   #+end_src
 
-
-  You could also run the machine in =Permanently insecure mode= by
-  adding ~kern.securelevel=-1~ in =/etc/sysctl.conf=. This would let
-  you change the =kern.allowdt= flag at runtime, but it's not
-  recommended because it greatly reduces the security of your system.
-
-  Once you've added the line to your =sysctl.conf= and rebooted you
-  should be able to open the device as root.
-
-  The main way to interact with the device is through the [[https://man.openbsd.org/btrace][=btrace=]]
-  command. =btrace= is the "bug tracer" tool, used to run scripts
-  written in [[https://man.openbsd.org/bt][=bt=]], the bug tracing language. =bt= uses the same syntax
-  as Linux's =bpftrace= tool.
-
-* What it can do
-
-  Using the =bt= language and =btrace= tool, you can profile system
-  internals and probe the inner workings of many programs. You can see
-  which processes are forking, opening files, reading and writing to
-  file descriptors, using pipes, using chown, pledging, listening on a
-  socket, etc. You can see the full kernel and userland stack traces,
-  process ids, function arguments, command names, thread id, cpu id,
-  and return values.
-
-  You can see the full list of supported probes with =btrace -l=.
-
-  #+CAPTION: A sample of the available probes
+- See which processes are calling which syscalls, and how many times
   #+begin_src
-  # doas btrace -l
-  profile:hz:97
-  interval:hz:1
-  syscall:exit:entry
-  syscall:exit:return
-  syscall:fork:entry
-  syscall:fork:return
-  syscall:read:entry
-  syscall:read:return
-  syscall:write:entry
-  syscall:write:return
-  syscall:open:entry
-  syscall:open:return
-  syscall:close:entry
-  syscall:close:return
-  syscall:getentropy:entry
-  syscall:getentropy:return
-  syscall:__tfork:entry
-  syscall:__tfork:return
-  syscall:unlink:entry
-  syscall:unlink:return
-  syscall:wait4:entry
-  syscall:wait4:return
-  syscall:chdir:entry
-  syscall:chdir:return
-  syscall:fchdir:entry
-  syscall:fchdir:return
-  syscall:mknod:entry
-  syscall:mknod:return
-  syscall:chmod:entry
-  syscall:chmod:return
-  syscall:chown:entry
-  syscall:chown:return
-  [...]
-  tracepoint:raw_syscalls:sys_enter(register_t)
-  tracepoint:raw_syscalls:sys_exit(register_t)
-  tracepoint:uvm:fault(vaddr_t, vm_fault_t, vm_prot_t)
-  tracepoint:uvm:map_insert(vaddr_t, vaddr_t, vm_prot_t)
-  tracepoint:uvm:map_remove(vaddr_t, vaddr_t, vm_prot_t)
-  tracepoint:uvm:malloc(int, void *, size_t, int)
-  tracepoint:uvm:free(int, void *, size_t)
-  tracepoint:uvm:pool_get(void *, void *, int)
-  tracepoint:uvm:pool_put(void *, void *)
-  tracepoint:vfs:bufcache_rel(long, int, int64_t)
-  tracepoint:vfs:bufcache_take(long, int, int64_t)
-  tracepoint:vfs:cleaner(long, int, long, long)
-  tracepoint:vmm:guest_enter(void *, void *)
-  tracepoint:vmm:guest_exit(void *, void *, uint64_t)
-  tracepoint:vmm:inout(void *, uint16_t, uint8_t, uint32_t)
-  [...]
+  doas btrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm, arg0] = count() }'
+  ^C
+  @[sshd, 48]: 354
+  @[top, 49]: 292
+  @[sshd, 109]: 176
+  @[top, 86]: 155
+  @[top, 202]: 124
+  @[sshd, 3]: 92
+  @[sshd, 4]: 92
+  @[less, 86]: 82
+  @[ksh, 4]: 55
+  @[less, 49]: 42
+  @[top, 46]: 36
+  @[sshd, 54]: 34
+  @[top, 169]: 31
+  @[ksh, 3]: 28
+  @[top, 3]: 27
+  @[ksh, 202]: 26
+  @[top, 252]: 22
+  @[ksh, 46]: 20
+  @[top, 253]: 19
+  @[ksh, 40]: 18
+  @[top, 5]: 18
+  @[top, 6]: 18
+  @[top, 74]: 18
+  @[top, 53]: 16
+  @[top, 54]: 15
+  @[ksh, 54]: 14
+  @[ksh, 99]: 14
+  @[less, 159]: 14
+  @[less, 74]: 14
+  @[top, 159]: 14
+  @[top, 4]: 11
+  @[ksh, 48]: 10
+  @[ksh, 6]: 9
+  @[sshd, 73]: 9
+  @[ksh, 38]: 8
   #+end_src
 
+  You can check the syscall numbers here:
+  https://github.com/openbsd/src/blob/master/sys/kern/syscalls.master
 
-  Currently it seems most probes are for system calls =syscall:*= but
-  there are also a couple trace points for other function calls, like
-  the scheduler =tracepoint:sched:enqueue=, virtual memory system
-  =tracepoint:uvm:malloc=, virtual machine manager (vmm)
-  =tracepoint:vmm:guest_enter=, etc.
-
-  Trace points are often functions that have arguments that you can
-  inspect as part of your probe.
-
-  You can print values every time a probe is called using =printf=.
-  You can also collect aggregates values, build simple histograms,
-  calculate sums, or get min/max values. Check [[https://man.openbsd.org/bt.5][=bt=]] man page for full
-  syntax. Since it's based on the =bpftrace= syntax you should also be
-  able to look at those resources for ideas of how to use it.
-
-  The =bt= language is somewhat similar to awk.
-
-  =bt= programs have probes, filters, and actions.
-
-  They're put together like this.
+- See a histogram of the size of =read()= calls for a certain process
+  #+begin_src
+  doas btrace -e 'syscall:read:return /pid == 5353/ { @readsize = hist(retval) }'
+  ^C
+  @readsize:
+  [0, 1)             3 |@@@@@@@@@@@@@@@@@@@                                 |
+  [2, 4)             3 |@@@@@@@@@@@@@@@@@@@                                 |
+  [32, 64)           8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+  #+end_src
 
+- See a histogram of how long it takes for a =read()= to return
   #+begin_src
-  PROBE /FILTER/ { ACTION(S) }
+  doas btrace -e 'syscall:read:entry { @start[tid] = nsecs } syscall:read:return /@start[tid]/ { @times = hist(nsecs - @start[tid]); delete(@start[tid]) } END { clear(@start) }'
+  ^C@times:
+  [0]                3 |@@@                                                 |
+  [1K, 2K)           3 |@@@                                                 |
+  [2K, 4K)          50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+  [4K, 8K)          24 |@@@@@@@@@@@@@@@@@@@@@@@@                            |
+  [8K, 16K)          8 |@@@@@@@@                                            |
+  [16M, 32M)         2 |@@                                                  |
+  [32M, 64M)         1 |@                                                   |
+  [64M, 128M)        5 |@@@@@                                               |
+  [128M, 256M)       4 |@@@@                                                |
   #+end_src
 
-  The action is called every time the probe gets activated, and you
-  can optionally filter the probe to only activate on certain
-  conditions like when it's a certain PID or thread ID using the
-  filter format ~/pid == 1234/~
-
-  There are special =BEGIN= and =END= probes that are called once at
-  the beginning and end of program execution, which can be used to set
-  and clear variables.
-
-  Variables are global and take the form of =@var= for a scalars or =@var[key]= for
-  a maps. There are functions like =clear(@map)= and =delete(@map[key])= that
-  operate on map values.
-
-  You can also do basic math operations on values.
-
-* Examples
-
-  - Print the process name and paid every time =fork= is called
-    #+begin_src
-    doas btrace -e 'syscall:fork:entry { printf("%s[%d]\n", comm, pid) }'
-    ksh[68490]
-    ksh[68490]
-    ksh[68490]
-    ksh[68490]
-    sh[83762]
-    ksh[68490]
-    ksh[87548]
-    ksh[87548]
-    ksh[87548]
-    sh[33064]
-    smtpd[32246]
-    smtpd[33076]
-    smtpd[33076]
-    ksh[88916]
-    smtpd[33076]
-    smtpd[33076]
-    smtpd[33076]
-    ksh[52579]
-    smtpd[33076]
-    smtpd[33076]
-
-    #+end_src
-
-  - Count how many time processes call =read= (values printed after Ctrl-C)
-    #+begin_src
-    doas btrace -e 'syscall:read:entry { @[comm] = count() }'
-    ^C
-    @[sshd]: 243
-    @[ksh]: 48
-    @[less]: 31
-    @[cat]: 2
-    #+end_src
-
-  - See which processes are calling which syscalls, and how many times
-    #+begin_src
-    doas btrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm, arg0] = count() }'
-    ^C
-    @[sshd, 48]: 354
-    @[top, 49]: 292
-    @[sshd, 109]: 176
-    @[top, 86]: 155
-    @[top, 202]: 124
-    @[sshd, 3]: 92
-    @[sshd, 4]: 92
-    @[less, 86]: 82
-    @[ksh, 4]: 55
-    @[less, 49]: 42
-    @[top, 46]: 36
-    @[sshd, 54]: 34
-    @[top, 169]: 31
-    @[ksh, 3]: 28
-    @[top, 3]: 27
-    @[ksh, 202]: 26
-    @[top, 252]: 22
-    @[ksh, 46]: 20
-    @[top, 253]: 19
-    @[ksh, 40]: 18
-    @[top, 5]: 18
-    @[top, 6]: 18
-    @[top, 74]: 18
-    @[top, 53]: 16
-    @[top, 54]: 15
-    @[ksh, 54]: 14
-    @[ksh, 99]: 14
-    @[less, 159]: 14
-    @[less, 74]: 14
-    @[top, 159]: 14
-    @[top, 4]: 11
-    @[ksh, 48]: 10
-    @[ksh, 6]: 9
-    @[sshd, 73]: 9
-    @[ksh, 38]: 8
-    #+end_src
-
-    You can check the syscall numbers here:
-    https://github.com/openbsd/src/blob/master/sys/kern/syscalls.master
-
-  - See a histogram of the size of =read()= calls for a certain process
-    #+begin_src
-    doas btrace -e 'syscall:read:return /pid == 5353/ { @readsize = hist(retval) }'
-    ^C
-    @readsize:
-    [0, 1)             3 |@@@@@@@@@@@@@@@@@@@                                 |
-    [2, 4)             3 |@@@@@@@@@@@@@@@@@@@                                 |
-    [32, 64)           8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
-    #+end_src
-
-  - See a histogram of how long it takes for a =read()= to return
-    #+begin_src
-    doas btrace -e 'syscall:read:entry { @start[tid] = nsecs } syscall:read:return /@start[tid]/ { @times = hist(nsecs - @start[tid]); delete(@start[tid]) } END { clear(@start) }'
-    ^C@times:
-    [0]                3 |@@@                                                 |
-    [1K, 2K)           3 |@@@                                                 |
-    [2K, 4K)          50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
-    [4K, 8K)          24 |@@@@@@@@@@@@@@@@@@@@@@@@                            |
-    [8K, 16K)          8 |@@@@@@@@                                            |
-    [16M, 32M)         2 |@@                                                  |
-    [32M, 64M)         1 |@                                                   |
-    [64M, 128M)        5 |@@@@@                                               |
-    [128M, 256M)       4 |@@@@                                                |
-    #+end_src
-
-  There are also a couple of examples in =/usr/share/btrace= that use
-  sampling and command line arguments.
-
-  You can also check out the Linux [[https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md][=bpftrace= guide]] for ideas.
+There are also a couple of examples in =/usr/share/btrace= that use
+sampling and command line arguments.
+
+You can also check out the Linux [[https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md][=bpftrace= guide]] for ideas.