Logging the Creation of Shell Processes

Since attacks often involve trying to run a shell on a remote host, usually by exploiting a vulnerability in a network service, why don’t we get the shell to log some pertinent information when it starts up? That information can both alert us to the attack and identify which potentially compromised process started the shell.

Introduction

Since attacks often involve running /bin/sh (on UNIX) or cmd.exe (on Windows), why don’t we add some code to the shell to get it to log not only the fact that it has started, but also its location, process id, command line arguments, and the process id and process path of the process that started it? This information not only alerts us to the fact that an attack may have been successful in compromising a process, but also tells us which process and, due to syslog timestamps, when.

Modifying cmd.exe will be tricky, since we don’t have the source code. Having said that, I vaguely remember someone creating a modified version of cmd.exe based on the cmd.exe command from ReactOS, whose source code is available.

Modifying the ReactOS cmd.exe would mean installing the ReactOS build environment, and in any case Windows already has built-in auditing (Control Panel -> Administrative Tools -> Local Security Policy -> Security Settings -> Local Policies -> Audit Policy). With process tracking auditing enabled, it creates Event Log messages covering process creation and termination that contain the information we’re after. EvtSys can be used to forward Event Log messages to a syslog server:

A new process has been created: New Process ID: 1272
  Image File Name: C:\WINDOWS\system32\cmd.exe Creator Process ID: 1500
  User Name: [username] Domain: [domainname] Logon ID: (0x0,0x10459)

As such, I’m going to ignore cmd.exe and concentrate on bash, which /bin/sh points to (usually via a symbolic link) on many Linux distributions. Debian and Ubuntu are notable exceptions: there, /bin/sh is a symbolic link to dash. If your /bin/sh is dash or a genuine Bourne shell rather than bash (the Bourne Again SHell, a successor of the Bourne shell), the same principle applies, except that you will need the source code for that shell instead of for bash.

You can tell whether /bin/sh is bash by checking whether it is a real file or a symlink (and, if it is a symlink, what it points to), and by running /bin/sh and checking whether echo $BASH_VERSION prints a version string (you have bash) or nothing (you have a non-bash /bin/sh, such as dash or a Bourne shell).

Obtaining the source code

Firstly, we’ll need to get the source code for bash. If you are using Debian or Ubuntu, and have deb-src lines configured in apt’s sources.list file, this is as easy as:

apt-get source bash

# change into the top level source directory
# the actual directory name will depend on the version
# of the bash package.
cd bash-4.2+dfsg

# and then you'll need the following command (run as root) to make sure
# that you have everything necessary to build the new version
apt-get build-dep bash

If not, you can download the source from the bash web site (http://www.gnu.org/software/bash/) and you’ll need to take care of the dependencies yourself (as a general rule, read the README and INSTALL files that ship with the source code).

Modifying the source code

Now that we have the source code, we will add a snippet of code at the start of main(), the program’s entry point, to log various bits of information to syslog. Before we do that, though, we need to include the syslog.h header in shell.c. I added this line immediately after the #include <pwd.h> line:

#include <syslog.h>

main() is defined in shell.c. Scroll down to its definition and insert the following code just before the USE_VAR() calls:

/* reconstruct the command line from the argv[] array, */
/* separating the arguments with spaces (i is the loop */
/* counter already declared at the top of main()). */
/* snprintf() never writes past the end of the buffer and */
/* always terminates the string, so an overly long command */
/* line is truncated rather than overflowing cmdline */
char cmdline[256];
size_t cmdlen = 0;
cmdline[0] = '\0';
for (i = 0;i < argc && cmdlen < sizeof(cmdline) - 1;i++) {
  int written = snprintf(cmdline + cmdlen,sizeof(cmdline) - cmdlen,
    "%s%s",i > 0 ? " " : "",argv[i]);
  if (written < 0)
    break;
  cmdlen += written;
}

/* get our process id and that of our parent process */
pid_t mypid = getpid();
pid_t ppid = getppid();

/* used to obtain process names from pids */
char procpath[128];
char myexe[128];
char pexe[128];
ssize_t linklen;

/* we need to use the /proc/ filesystem to determine */
/* the process names. The snprintf() call creates the */
/* path to the /proc/<pid>/exe symlink which we will use */
snprintf(procpath,sizeof(procpath),"/proc/%d/exe",(int) mypid);

/* the readlink() call finds the destination of the symlink */
/* which in this case is the full path to the process' image file */
/* note that readlink() doesn't append a terminating null byte */
/* to the string that it creates, so we terminate it ourselves */
/* using the returned length (an empty string if readlink() fails) */
linklen = readlink(procpath,myexe,sizeof(myexe) - 1);
myexe[linklen > 0 ? linklen : 0] = '\0';

/* rinse and repeat for our parent process */
snprintf(procpath,sizeof(procpath),"/proc/%d/exe",(int) ppid);
linklen = readlink(procpath,pexe,sizeof(pexe) - 1);
pexe[linklen > 0 ? linklen : 0] = '\0';

/* log the information via syslog */
openlog("bash",LOG_PID,LOG_USER);
syslog(LOG_USER | LOG_NOTICE,"%s (%d) invoked by %s (%d), with "
  "uid/euid/gid/egid %u/%u/%u/%u: %s",myexe,(int) mypid,pexe,(int) ppid,
  (unsigned) getuid(),(unsigned) geteuid(),(unsigned) getgid(),
  (unsigned) getegid(),cmdline);

The first block of code recreates the command line from the command line arguments in the argv[] array by concatenating them, separated by spaces, into one string, cmdline.
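The argument-joining step can be pulled out into a standalone function that is easy to test on its own. This sketch uses snprintf() with a running offset, so the result is always terminated and never overruns the buffer; join_args() is a hypothetical name, not something that exists in the bash source.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: join argv[0..argc-1] into `buf`, separated by
 * single spaces. The result is always NUL-terminated; arguments that
 * do not fit are truncated. Returns the length of the string in buf. */
size_t join_args(int argc, char **argv, char *buf, size_t size)
{
    size_t used = 0;
    int i;

    if (size == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < argc && used < size - 1; i++) {
        /* snprintf() writes at most size - used bytes, including the
         * terminator, and returns the length it wanted to write */
        int n = snprintf(buf + used, size - used, "%s%s",
                         i > 0 ? " " : "", argv[i]);
        if (n < 0)
            break;              /* encoding error: keep what we have */
        used += (size_t) n;
    }

    /* if the last write was truncated, the buffer is full */
    return used < size ? used : size - 1;
}
```

With argv set to "bash", "-c", "ls" and a 64-byte buffer, join_args() produces "bash -c ls"; with a 6-byte buffer it produces the truncated string "bash ".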

The code then uses getpid() and getppid() to get its own process id and its parent’s process id, respectively, before calling snprintf() which uses the process ids to build the path to the two processes’ exe symbolic links in the proc filesystem.

These symbolic links are found in each process’ /proc/<pid>/ directory so, for instance, a process with a pid of 1235 would have a /proc/1235/ directory, within which would be a symlink called exe. The exe link’s destination is the full path to the process’ executable file.

The readlink() calls find the symbolic links’ destinations, and these are then logged to identify the process and its parent process by path.
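The /proc lookup can be wrapped in a small Linux-only helper that checks readlink()’s return value rather than relying on a pre-zeroed buffer. exe_path_of() is a hypothetical name. Two caveats the sketch does not handle: reading another user’s /proc/<pid>/exe link normally fails with a permission error unless you are root, and if an executable has been deleted since it was started, the kernel appends " (deleted)" to the link target, which is itself worth alerting on.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical Linux-only helper: copy the target of the
 * /proc/<pid>/exe symlink into `buf`. Returns 0 on success, or -1
 * (leaving an empty string in `buf`) if the link cannot be read.
 * A path that fills the whole buffer may have been truncated. */
int exe_path_of(pid_t pid, char *buf, size_t size)
{
    char linkpath[64];
    ssize_t len;

    if (size == 0)
        return -1;
    buf[0] = '\0';

    snprintf(linkpath, sizeof(linkpath), "/proc/%ld/exe", (long) pid);

    /* readlink() does not NUL-terminate, so reserve one byte for it */
    len = readlink(linkpath, buf, size - 1);
    if (len < 0) {
        buf[0] = '\0';
        return -1;      /* no such process, or permission denied */
    }
    buf[len] = '\0';
    return 0;
}
```

Calling exe_path_of(getpid(), ...) and exe_path_of(getppid(), ...) gives the same two paths that the snippet in main() logs.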

The openlog() call sets the ident string (the string immediately after the hostname in the log messages). If we don’t call it, syslog() uses an ident derived from the name the program was invoked as, which varies depending on how bash was invoked. For instance, when su starts a login shell, the ident becomes -su:

Feb  5 10:14:56 [hostname] -su: /bin/bash (3738) invoked by /bin/su (3640), with
  uid/euid/gid/egid 0/0/0/0: -su

Since an attacker may copy /bin/sh to another location (often making it setuid root), we can’t guarantee that these log messages will contain the string /bin/bash, and without openlog() the ident would simply be whatever name the copy was given.

The openlog() call causes syslog to always log these messages with the string bash immediately after the hostname in the log messages, and gives us a constant string that we can search for to find these messages. This will be necessary if using automated log file analysis, with something like logsurfer, to alert when these messages are logged.
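If you want to reuse the message elsewhere, or check its layout without digging through syslog, the formatting and the logging can be separated. format_shell_start() and log_shell_start() are hypothetical helpers; the layout mirrors the sample messages shown in this post, and the fixed "bash" ident is passed to openlog() for the reason just described.

```c
#include <stdio.h>
#include <sys/types.h>
#include <syslog.h>

/* Hypothetical helper: build the startup message in the same layout
 * as the syslog() call in main(). Returns snprintf()'s result. */
int format_shell_start(char *buf, size_t size,
                       const char *exe, pid_t pid,
                       const char *pexe, pid_t ppid,
                       uid_t uid, uid_t euid, gid_t gid, gid_t egid,
                       const char *cmdline)
{
    return snprintf(buf, size,
                    "%s (%ld) invoked by %s (%ld), "
                    "with uid/euid/gid/egid %lu/%lu/%lu/%lu: %s",
                    exe, (long) pid, pexe, (long) ppid,
                    (unsigned long) uid, (unsigned long) euid,
                    (unsigned long) gid, (unsigned long) egid,
                    cmdline);
}

/* Hypothetical helper: log a prebuilt message with a fixed "bash"
 * ident, so the tag does not depend on argv[0] (e.g. "-su" or the
 * name of a renamed copy of the shell). */
void log_shell_start(const char *msg)
{
    openlog("bash", LOG_PID, LOG_USER);
    syslog(LOG_NOTICE, "%s", msg);  /* never pass msg as the format */
    closelog();
}
```

log_shell_start() passes the prebuilt message through a "%s" format so that any % characters in an attacker-controlled command line are not interpreted as conversion specifications by syslog().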

Building the new version

Debian package

If you are using a Debian (Ubuntu may also work) system, then you can use Debian’s package tools to build a new bash package after adding our extra bit of source code. This has the advantage that you can then install the changes as a package, rather than copying files around and worrying about files being inconsistent with the original package.

The Debian package builder, dpkg-buildpackage, will extract a clean copy of the source code before building the package. As such, any changes we make to shell.c will be discarded when we build the package.

In order to get around this, we need to create a patch which the Debian package builder will apply after extracting the source code and before building the package. I have created a patch, logstartup.diff, which applies the changes described above to shell.c.

Change into the top level directory, bash-<version>, save the patch in the debian/patches/ subdirectory, and then add the patch’s file name to the debian/patches/series.in file. I just added the name of my patch file, logstartup.diff, to the bottom of series.in.

You can then build the new bash package by running the following command in the bash-<version> directory:

dpkg-buildpackage -b

That should eventually produce a few .deb files in the parent directory. You can install the new bash package (as root) with:

cd ..
dpkg -i bash_4.2+dfsg-0.1_*.deb

substituting the version-specific file name generated on your system.

From GNU’s source archive

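Follow the instructions in the README and INSTALL files, but it basically comes down to the following. Note that, by default, make install puts the new bash in /usr/local/bin rather than replacing /bin/bash: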

./configure
make
make install

Running the new bash

Running the new bash is exactly the same as running the old bash, except you should see something like the following in your syslog log files:

Feb  5 10:17:51 [hostname] bash[12518]: /bin/bash (12518) invoked by
  /usr/bin/xterm (12516), with uid/euid/gid/egid 1000/1000/1000/1000: -bash
Feb  5 10:21:35 [hostname] bash[22739]: /bin/bash (22739) invoked by
  /bin/su (22641), with uid/euid/gid/egid 0/0/0/0: -su

The first log entry is from me starting an xterm, and the xterm running a shell. The second log entry is from me running su.

What to look for

Of course, logging that a shell has started isn’t much use if nothing, and no one, is monitoring the log files for the shell startup messages. What you are looking for are shells started by processes that normally shouldn’t start one. Examples include a web server such as Apache (httpd, apache2), a DNS server such as BIND (named), database servers such as PostgreSQL (postgres) and MySQL (mysqld), and in general any process that accepts network connections.

A web server may legitimately start a shell, for example to run CGI scripts. Such cases should be identifiable from the command line shown in the log message, and from whether any of your web pages reference CGI scripts that are run with /bin/sh.
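The core of that check, finding the parent’s executable in a startup message and comparing it with a list of daemons, can be sketched in C. shell_parent_is_suspicious() is a hypothetical helper; the daemon names come from the paragraph above, and the sketch assumes each message arrives as a single line in the format logged by the modified bash.

```c
#include <stddef.h>
#include <string.h>

/* Parents that should not normally start a shell, taken from the
 * examples above; extend the list to suit your own hosts. */
static const char *const suspicious_parents[] = {
    "httpd", "apache2", "named", "postgres", "mysqld", NULL
};

/* Hypothetical helper: return 1 if a single-line shell startup
 * message names a suspicious parent, 0 otherwise. */
int shell_parent_is_suspicious(const char *line)
{
    static const char marker[] = " invoked by ";
    const char *start, *end, *base, *p;
    size_t len;
    int i;

    start = strstr(line, marker);
    if (start == NULL)
        return 0;                       /* not a shell startup message */
    start += sizeof(marker) - 1;

    /* the parent's path is followed by " (<ppid>)" */
    end = strstr(start, " (");
    if (end == NULL)
        return 0;

    /* compare only the file name after the last '/' */
    base = start;
    for (p = start; p < end; p++)
        if (*p == '/')
            base = p + 1;
    len = (size_t) (end - base);

    for (i = 0; suspicious_parents[i] != NULL; i++)
        if (strlen(suspicious_parents[i]) == len &&
            strncmp(base, suspicious_parents[i], len) == 0)
            return 1;
    return 0;
}
```

Matching on the parent’s file name alone is deliberately simple: a daemon installed under an unexpected name would be missed, and legitimate CGI use would be flagged, so a match is something to investigate rather than proof of compromise.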

logsurfer can be used to monitor log files for patterns and perform actions such as sending an email or running a command. If you use EvtSys to get copies of Windows Event Log messages sent to a syslog server, then you can also use logsurfer to monitor for cmd.exe processes starting on Windows hosts.
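On the Windows side, the equivalent check is whether a forwarded process-creation message names cmd.exe as the new image. The sketch below assumes each message reaches the syslog server on a single line containing the "Image File Name:" and "Creator Process ID:" fields in the order shown in the Event Log example earlier; the exact layout EvtSys produces is an assumption, so adjust the field names to match what your server actually receives. cmd_exe_creator_pid() is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper. Assumes one forwarded Event Log message per
 * line, containing "Image File Name: <path>" followed later by
 * "Creator Process ID: <pid>", as in the example earlier.
 * Returns the creator (parent) pid if the new image is cmd.exe,
 * or -1 if the line does not record a cmd.exe process starting. */
long cmd_exe_creator_pid(const char *line)
{
    static const char image_field[] = "Image File Name: ";
    static const char creator_field[] = "Creator Process ID: ";
    static const char target[] = "cmd.exe";
    const char *image = strstr(line, image_field);
    const char *creator = strstr(line, creator_field);
    const char *end, *name, *p;

    if (image == NULL || creator == NULL || creator < image)
        return -1;
    image += sizeof(image_field) - 1;

    /* the image path runs up to the next field; drop trailing blanks */
    end = creator;
    while (end > image && (end[-1] == ' ' || end[-1] == '\t'))
        end--;

    /* Windows paths use backslashes; keep the part after the last one */
    name = image;
    for (p = image; p < end; p++)
        if (*p == '\\')
            name = p + 1;

    /* Windows file names are case-insensitive, so compare that way */
    if ((size_t) (end - name) != sizeof(target) - 1 ||
        strncasecmp(name, target, sizeof(target) - 1) != 0)
        return -1;

    return strtol(creator + sizeof(creator_field) - 1, NULL, 10);
}
```

The returned creator process id can then be matched against earlier process-creation messages from the same host to find out which program started the shell.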

Conclusion

So there you have a reasonably simple way of getting log messages about bash processes being started. The same principle should apply to other shells such as sh, csh, tcsh and ksh. I chose bash because the attacks I’ve seen have only attempted to run /bin/sh on UNIX hosts (practically every UNIX-like system provides a POSIX-compatible shell at /bin/sh, although POSIX itself does not mandate that path), and bash is what /bin/sh points to on many Linux distributions.

To catch attacks that cause processes to run /bin/sh, you will need to add the logging code to whichever shell /bin/sh runs, or replace /bin/sh with your modified shell. If you choose the latter approach, the replacement shell must be capable of running /bin/sh scripts (which bash is); otherwise you will break scripts that use /bin/sh (and, I suspect, POSIX compliance).

One comment on “Logging the Creation of Shell Processes”

Karl on 2016-02-14 at 00:05 said:

A quick postscript:

The proper way to do it would be to use the UNIX auditing software — auditd — as that will also catch the creation of rogue shells that are uploaded by an attacker (started from weird locations, like /dev/ for instance).

Modifying the source code though, as shown here, will have the advantage of sending syslog messages which can then be routed to a remote system (which is a tad harder with auditd log data, or it was the last time that I tried).


Malware Musings

Thoughts on malware and malware analysis