With malware beginning to search for documents, images, and other file types, often to encrypt them or to delete them, I began to wonder if there could be a simple way to protect your files. What if we made different file types look like types of farm-yard animals, without making them unusable?
Ok, so making them look like farm-yard animals may be a tad daft (I was grasping for some humour and that happened to fit with the title), however, the principle remains — if malware is searching for all .doc files, for instance, can we change files’ extensions without losing the association with the application that opens them?
This obviously isn’t the best strategy because it would be easy for malware to work around it once it was aware of it, but it may help to protect your documents in the short-term, until you can put proper protection in place.
There is a registry key containing keys for a number of file name extensions and information linking the extensions to applications, namely HKEY_LOCAL_MACHINE\SOFTWARE\Classes.
We can rename all .doc files, say, to .sheep, and then create a registry key, HKLM\SOFTWARE\Classes\.sheep with the same contents as HKLM\SOFTWARE\Classes\.doc. Note that HKLM is a common abbreviation of HKEY_LOCAL_MACHINE.
Now not being one for performing the same manual task over and over, I was determined to script this. I was also determined to script this using software that is available on a standard Windows build. This meant that I was reluctant to use PowerShell.
After spending a bit of time looking at the MS-DOS for command and regedit, and realising that it is actually quite difficult to do the equivalent of
reg export HKLM\SOFTWARE\Classes\.doc | sed 's/\.doc/\.sheep/g' > doc2sheep.reg && reg import doc2sheep.reg ; del doc2sheep.reg
we’re going to have to settle for a compromise.
I stumbled across two MS-DOS commands: assoc and ftype. assoc associates file extensions with file types, and ftype associates file types with an application. This means that we can use the assoc command to associate the .sheep extension with the same file type that .doc is associated with, namely WordPad.Document.1 (default on Windows XP without an office suite installed).
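To see the two-step lookup that these commands expose, you can query each one in turn at a command prompt. This is only a sketch: the file type name is the Windows XP default quoted above, and what ftype prints will depend on what is installed on your machine.

```batch
rem Step 1: which file type is the .doc extension associated with?
assoc .doc

rem Step 2: which command line opens that file type?
rem WordPad.Document.1 is the file type named in the text; substitute whatever step 1 printed.
ftype WordPad.Document.1
```

Neither command changes anything when it is run without a value after the name, so both are safe to experiment with.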
So, if we test our theory with the following commands:
assoc .sheep=WordPad.Document.1
start testdoc.sheep
we can see that WordPad starts up to open the .sheep file — smashing. The next question is, can we do this in a single command such that we can then script the creation of a new extension that is equivalent to an existing extension?
for /f "delims== tokens=2 usebackq" %a in (`assoc .doc`) do assoc .sheep=%a
That command will associate the .sheep extension with whichever file type the .doc extension is associated with. The command works by running the assoc .doc command in between the back-quote characters, and processing its output. Note that the usebackq option is required to be able to enclose the command in back-quotes — I use that as it is the same method used to reference command output in UNIX.
The assoc .doc command will produce output similar to the following (default on Windows XP without an office suite installed):
.doc=WordPad.Document.1
The for command then processes that output line by line (although the assoc command has only output one line). The delims== option sets the delimiter to the ‘=’ character. The tokens=2 option then states that we would like the %a variable set to the second field. Since fields are separated by the delimiter character (‘=’), that will set the %a variable to the WordPad.Document.1 string. This is good (which often isn’t the case when doing things in MS-DOS), as it will enable us to pass that as a parameter to the assoc command after the do keyword.
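If you want to see how for /f splits a line before committing to anything, a harmless variation is to request both tokens and simply echo them. This is a small sketch using the same options as above, but with tokens=1,2 so that the second field lands in the next variable, %b:

```batch
rem Split the output of assoc .doc at the = character and print both fields.
rem Nothing is changed; the loop body only echoes.
for /f "delims== tokens=1,2 usebackq" %a in (`assoc .doc`) do @echo extension is %a and file type is %b
```

On the Windows XP setup described above, this should report .doc as the extension and WordPad.Document.1 as the file type.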
Basically, that for command above is the MS-DOS equivalent of the UNIX command line:
assoc .sheep=`assoc .doc | cut -d= -f2`
Now to rename the files:
for /f "delims= usebackq" %f in (`dir c:\*.doc /s/b`) do ren "%f" "%~nf.sheep"
That took some messing around and experimentation to work out the particular peculiarities of the MS-DOS ren command. Namely, that you can’t pass a complete path as the second parameter, presumably to stop you from specifying a different path and hence making it the same as the move command. If it stops you from specifying a path in the second parameter, then it doesn’t have to check that it is the same as the path of the source file.
The previous for command runs the dir command in the back-quotes and processes its output line by line. The /s tells dir to search subdirectories, and the /b tells it to only output the full path of the matching files (without all of the headers and other fluff).
In this case, the delims= option, which doesn’t actually specify a delimiter, is to stop the for command from using the space and tab characters as delimiters (which is the default behaviour). I was surprised that this worked, actually. We need to do this because file names and paths can contain a space character. Without this the for command will run the ren command with C:\Documents as the source file, rather than C:\Documents and Settings\…\example.doc (for instance).
The other option would be to specify tokens=* to request that all of the tokens be passed to the %f variable. I figured that it makes more sense to tell for not to break the line up into fields (delims=), rather than to break it up only to request all of the fields (tokens=*).
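The difference between the options is easy to check with a line that contains spaces. The sketch below feeds the same example string through for /f three ways; the path is made up for illustration and no such file needs to exist.

```batch
rem Default delimiters: the line is split at the first space.
for /f "usebackq" %f in (`echo C:\Documents and Settings\example.doc`) do @echo [%f]

rem Empty delimiter list: the line is kept whole.
for /f "delims= usebackq" %f in (`echo C:\Documents and Settings\example.doc`) do @echo [%f]

rem All tokens: the remainder of the line is assigned to the variable.
for /f "tokens=* usebackq" %f in (`echo C:\Documents and Settings\example.doc`) do @echo [%f]
```

The first command should print only the part of the path before the first space, while the other two should print the complete path.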
Right. So the for command then runs the ren command (after the do keyword). The %f variable expands to a line of output from the dir command which, thanks to the /b option to dir, is the complete file spec (path and file name) of a matching file name. The %~nf variable reference expands to the file name component (without the file extension) of the file spec in the %f variable (obviously!).
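Before letting ren loose on a whole drive, it may be worth previewing what it would do. The sketch below first shows how the variable modifiers pick a file spec apart (the path is an example string), and then repeats the rename loop with echo in front of ren so that nothing is actually renamed. The extension test is my own addition: on systems with short (8.3) file names enabled, a wildcard such as *.doc can also match longer extensions such as .docx, and the test skips those.

```batch
rem Show the pieces of a file spec; the path is an example and need not exist.
for %f in ("C:\Documents and Settings\example.doc") do @echo path=%~dpf name=%~nf ext=%~xf

rem Dry run: print the ren commands instead of running them.
rem The extension test skips names that dir matched but that do not end in exactly .doc.
for /f "delims= usebackq" %f in (`dir c:\*.doc /s/b`) do @if /i "%~xf"==".doc" echo ren "%f" "%~nf.sheep"
```

Once the printed commands look right, removing the echo turns the dry run into the real thing.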
The previous for command is the equivalent of the following UNIX command line, which in this case is actually more complicated than the MS-DOS command line:
find c:\\ -name \*.doc -print | while read filename; do
  fname="`echo $filename | sed 's/\.doc$/\.sheep/'`"
  mv "$filename" "$fname"
done
The main reason for the complexity in the UNIX version is that we need to remove the .doc extension from the file name and substitute .sheep for it (the sed command). We need to use a while loop because we can’t pipe from within the find command’s -exec predicate.
What we need now are MS-DOS command lines, tied up with strings, also known as an MS-DOS batch file:
for /f "delims== tokens=2 usebackq" %%a in (`assoc %1`) do assoc %2=%%a
for /f "delims= usebackq" %%f in (`dir c:\*%1 /s/b`) do ren "%%f" "%%~nf%2"
Those two commands can be saved into a .bat file and then run with two extensions as parameters. The batch file will create an association for the second extension and then rename all files with the first extension, found underneath c:\, to files with the second extension. For example:
dupext.bat .doc .sheep
will rename all .doc files underneath c:\ to .sheep files after associating .sheep with whichever application .doc is associated with. You could add an extra parameter to specify the directory to search (to replace the hard coded c:\ in the above commands), or remove the c:\ from the dir command in the second line to only search beneath the current directory.
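As a rough illustration of the extra-parameter idea, here is a sketch of the batch file with the search directory as an optional third parameter. The third parameter and the variable name root are my assumptions and are not part of the original script; if the parameter is omitted, the sketch falls back to c:\ as before.

```batch
@echo off
setlocal
rem Sketch of dupext.bat with the search directory as an optional third parameter.
rem Example: dupext.bat .doc .sheep "C:\Documents and Settings"
set "root=%~3"
if "%root%"=="" set "root=c:\"
rem Drop a trailing backslash so that exactly one can be added back below.
if "%root:~-1%"=="\" set "root=%root:~0,-1%"
rem Copy the association from the first extension to the second.
for /f "delims== tokens=2 usebackq" %%a in (`assoc %1`) do assoc %2=%%a
rem Rename every matching file found underneath the chosen directory.
for /f "delims= usebackq" %%f in (`dir "%root%\*%1" /s/b`) do ren "%%f" "%%~nf%2"
endlocal
```

The quotes around the dir argument are there so that a directory containing spaces survives the trip through the back-quoted command.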
Now, if we tweak it a bit, we can use the same .bat file to undo these changes:
if "%1" == "-r" goto undoassoc
for /f "delims== tokens=2 usebackq" %%a in (`assoc %1`) do assoc %2=%%a
goto rename
:undoassoc
shift
assoc %1=
:rename
for /f "delims= usebackq" %%f in (`dir c:\*%1 /s/b`) do ren "%%f" "%%~nf%2"
We can get even fancier. If we create a text file with the current extension and the target extension together on a line, say like:
.doc .sheep
.xls .tablecloth
.jpg .outoffocuspic
Then we can use a for loop to run the batch file to change each of the extensions listed in the file:
for /f "tokens=1,2" %a in (extensions.txt) do dupext %a %b
Where extensions.txt is the name of the text file that you created. Plus, now that we’ve added the -r option to reverse things, we can do:
for /f "tokens=1,2" %a in (extensions.txt) do dupext -r %b %a
to put things back again.
Those last two for commands process the lines in the extensions.txt text file, break them into fields delimited by the space (or tab) character, and assign the first field (token) to the %a variable and the second to the next variable, %b (run for /? for documentation on the for command). Each then runs the batch file, passing it the two extensions from the text file.
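To check that the text file is being parsed the way you expect before any renaming happens, the same loop can be run with echo as its body. A minimal sketch, assuming the extensions.txt file described above is in the current directory:

```batch
rem Print each pair of fields from extensions.txt without running the batch file.
for /f "tokens=1,2" %a in (extensions.txt) do @echo from %a to %b
```

Each line of output should show one original extension and the extension that it would be swapped for.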
We need to reverse the two parameters in the second command to save us from having to use two separate ren commands in the batch file — one to rename the files one way and the other to rename them back again. That is, if we reverse the parameters when running the batch file, then we can use the same ren command in the batch file regardless of whether it is run with the -r option or not.
My complete batch file (after I added some checks for number of parameters) can be downloaded from http://malwaremusings.com/scripts/dupext-bat. It can be improved a bit — I started with MS-DOS v2.11 back in the eighties, and it shows. I noticed just after I wrote dupext.bat that the if command now seems to support blocks of commands and an else clause.
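For what it’s worth, here is a sketch of how the -r handling might look using an if block with an else clause instead of goto. This is my own guess at a rewrite, not the contents of the downloadable file. One wrinkle is that cmd.exe expands parameters such as %1 when it reads the whole block, so a shift inside the block would not take effect until the block had finished; the sketch therefore refers to the second parameter inside the block and shifts afterwards.

```batch
@echo off
if "%1"=="-r" (
    rem Undo: the first parameter is still -r here, so the extension to forget is the second one.
    assoc %2=
) else (
    rem Copy the association from the first extension to the second.
    for /f "delims== tokens=2 usebackq" %%a in (`assoc %1`) do assoc %2=%%a
)
rem Drop the -r so that the rename below sees the two extensions as its first two parameters.
if "%1"=="-r" shift
for /f "delims= usebackq" %%f in (`dir c:\*%1 /s/b`) do ren "%%f" "%%~nf%2"
```

It would be run in the same way as the goto version, with or without -r in front of the two extensions.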
Also, the usage text that it outputs has to have quotes around it. If it doesn’t, then cmd.exe gives you an error about ambiguous redirect, and rightly so. So you use quotes to tell the shell that the ‘<’ and ‘>’ characters are part of a string to be passed to the echo command and not input and output redirections, but the shell doesn’t remove the quotes and actually passes them to the echo command… which outputs them. Brilliant.
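One possible way around the stray quotes is to escape the angle brackets with a caret, which is the cmd.exe escape character, rather than quoting the whole string. A minimal sketch follows; the usage text here is made up for illustration and is not the text from dupext.bat.

```batch
@echo off
rem A caret before each angle bracket stops cmd.exe from treating it as a redirection.
echo Usage: dupext.bat [-r] ^<old-extension^> ^<new-extension^>
```

The carets are consumed by the shell, so the echoed line should contain the angle brackets but neither carets nor quotes.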
So there we have it, a quick and dirty way to obtain some short term protection from malicious software that targets files with specific extensions. Obviously this is only security-by-obscurity and easy to work around if you know that it is in place. Note that it leaves the old association in place, so if you receive files with the original extension (.doc in the above example), then they will still open as normal.