A Tutorial in x86 Assembly Language:
An Examination of the EICAR Standard AV Test Program
Including a Step-by-Step Analysis of its Operation using Microsoft's DEBUG Program
The EICAR (European Institute for Computer Anti-Virus Research) Standard AV Test Program has two slightly different forms, designated eicar68.com and eicar70.com; the digits signify how many bytes are in each file. The larger file is often created when a text editor places the cursor on a new line before saving (MS-DOS's EDIT.COM in text mode does this), which adds the hexadecimal bytes 0D and 0A (a carriage return and linefeed) to the end of the file. This does not affect the program's operation in any way; it functions the same no matter how many extra bytes are placed at the end of its file (unless your OS refuses to execute a .COM file that exceeds the old 64 KiB memory limit). From this point on we'll refer to any of its formats (68, 70 or whatever length) as simply EICAR.COM.
There appear to have been three main requirements for this program:
The reason for this last requirement was to make it possible for the program to be created using only a text editor; the exclusion of spaces leaves no doubt about the total number of bytes. Another benefit of limiting the program to ASCII characters only is the ease of transmission by any email client/server.
Requirements 3 and 4 meant that the programmer could only use the hex bytes 21 - 60 and 7B - 7E. But the only way a DOS program can send characters to the display screen is through an "interrupt function," and the x86 INT instruction that calls one begins with the hex byte CD (decimal 205). This obviously exceeds the range of the standard ASCII characters (0 to 127). So how did they do it? That's one of the things you'll learn by going through this tutorial.
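The byte-range restriction can be checked mechanically. The following is a minimal Python sketch (not part of the original program): the instruction bytes are copied from the disassembly listing at the end of this page, and the names ALLOWED, CODE and is_allowed are our own, chosen only for illustration.

```python
# Allowed byte ranges as stated in the text: 21h-60h and 7Bh-7Eh.
ALLOWED = set(range(0x21, 0x60 + 1)) | set(range(0x7B, 0x7E + 1))

def is_allowed(byte: int) -> bool:
    """Return True if the byte falls inside the permitted ranges."""
    return byte in ALLOWED

# Instruction bytes at offsets 0100h-011Bh, taken from the disassembly listing.
CODE = bytes.fromhex(
    "58 354F21 50 254041 50 5B 345C 50 5A 58 353428 50 5E 2937 43 43 2937 7D24"
)

# Every instruction byte stored in the file is a permitted character...
all_code_ok = all(is_allowed(b) for b in CODE)

# ...but the INT opcode (CDh) is not, and neither is a short JMP (EBh).
int_ok = is_allowed(0xCD)
jmp_ok = is_allowed(0xEB)

print(all_code_ok, int_ok, jmp_ok)  # True False False
```

So the file as stored can contain neither INT nor a short JMP; the rest of the tutorial shows how the program works around both.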
Most programmers today rarely, if ever, deal with the kind of details presented in this tutorial. We wrote this page so students and even the average PC user could appreciate both the complexity involved in running a very simple program and the work of early programmers. Programmers today normally use high-level macro instructions and libraries of pre-assembled code. A single statement in a high-level language often produces the equivalent of dozens, or even pages, of assembly instructions, compared to the few we'll be examining here.
If you're using an Anti-Virus program set to run in the background, Eicar.com should trigger an "alarm" when you either access or simply try to save the file! That's what it's supposed to do; your AV program should also tell you that it's the EICAR test program. (Note: Some AV programs can be set to exclude Eicar.com so you can actually run the program normally.) If your AV program doesn't alert you immediately (when you either create or extract Eicar.com from our .ZIP archive), it's most likely set to do manual scanning only; scan the file (as you would any other program you download) to see if your AV program alerts you!
If
your AV program is disabled or only does manual scans, then executing EICAR.COM
should print out the following when run in a DOS-box:
EICAR-STANDARD-ANTIVIRUS-TEST-FILE!
Even if you can't run the program (say you're using a workstation in
a Library for example), you could still learn something by viewing the
illustrations below and reading the information presented here.
Download the file (with brief text description) from The Starman's Realm: EICAR.ZIP.
Or, create it
yourself by pasting the following characters into Notepad and saving the file
with a .com extension. (Note: Some text editors will always
change a file's extension to .txt; if that's the case, you'll get a filename
like 'eicar.com.txt' and need to change it.)
The eicar.com program:
Warning: Using the W (write) command in DEBUG can result in loss of data on your hard drive! Do NOT experiment with this command. For more information, see Guide to DEBUG.
Although we could place Eicar.com in some folder, note its location and load the program into DEBUG.COM, most of us will never be able to create the file due to some AV program not allowing that! So, we'll simply paste the code into DEBUG manually.
If you've never used a DOS-window before, you should read How to Use a DOS Window; others may wish to skim its contents for new information. A list of DOS 7 Internal Commands is here, but the only two you might need are the cd (Change Directory) command (to get to a folder you store a file in) and the exit command (to end your DOS session); DOS-windows are usually set to open at the C:\WINDOWS folder.
Open a DOS-window (sometimes called a "Command Prompt") at this time.
Once you've got the DOS prompt at the folder where
you saved eicar.com, load the program into DEBUG by entering:
debug eicar.com
(See green text in the pic below.) The only thing you'll see is a little
dash [-] on the next line. This is the "command prompt" for
DEBUG. Enter the letter d or D (case doesn't matter in DEBUG)
and you should see a display similar to the one below. For each d command
you enter, only 128 bytes are displayed at a time. But our program is just 68
bytes long, so you can ignore all the rest (under the yellow lines). Each line
contains 16 bytes of code or data. [ All numbers displayed in DEBUG are hexadecimal.
( For a detailed study, see: What
Is 'Hexadecimal'?) ]
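The layout of a dump (16 bytes per line, 128 bytes per d command, so 8 lines) can be imitated with a few lines of Python. This is only a rough sketch of the format, not DEBUG's actual code: the function name dump and the sample data are our own, and the segment part of each address is left out.

```python
def dump(data: bytes, start: int = 0x0100, width: int = 16):
    """Format bytes as offset, hex values and printable ASCII, 16 per line."""
    lines = []
    pad = width * 3 - 1                     # room for 'XX ' per byte
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk).ljust(pad)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{start + i:04X}  {hex_part}  {text}")
    return lines

# Arbitrary sample data: 128 bytes, as many as one 'd' command shows.
lines = dump(bytes(range(128)))
for line in lines:
    print(line)
```

With 128 bytes of input the sketch prints eight lines, starting at offsets 0100, 0110, and so on up to 0170.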
The two numbers separated by a colon (:) at the
beginning of each line tell you where the first byte of that line is located
in your computer's memory. The first number is called the Segment and
the second is the Offset. [ For a detailed study see: Removing
the Mystery from the SEGMENT:OFFSET Addressing scheme.]
In the pic above, we see
that EICAR.COM was placed into memory beginning at a segment:offset
location of 1795:0100. It's highly unlikely that
your computer will load the program into the same segment of memory (1795),
but the offset (0100) will be the same. DEBUG always
loads .COM files so the first byte of the program has an offset of 0100 hex.
[ The first 256 bytes (00h to FFh) of
the segment contain information that DOS uses to run the
program (that section is called the PSP - Program Segment Prefix.) ]
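For readers who want to see how a segment:offset pair maps onto a single memory address in real mode, here is a short Python sketch. The function name linear_address is our own; the values are the ones from the example above (your segment will differ).

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: the segment value times 16, plus the offset."""
    return (segment << 4) + offset

psp_start = linear_address(0x1795, 0x0000)   # start of the PSP
code_start = linear_address(0x1795, 0x0100)  # first byte of the program

print(f"{psp_start:05X} {code_start:05X}")   # 17950 17A50
print(code_start - psp_start)                # 256 bytes of PSP come first
```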
(Program Instruction Step 1):
Now enter the letter r (or R) and
you should see a display of your CPU's 16-bit registers similar to the
pic below. The numbers at the beginning of the last line are always the
segment:offset location of the x86 machine instruction which is ready
for execution. As we step through this program, the offset will always equal
the value in the IP register (Instruction Pointer). For all true .COM files
(size must be less than 64 KiB), the CX register will contain the length of
the program after DEBUG loads it, unless the code changes it.
Here CX = 44 hex = 68 decimal bytes.
Following the segment:offset (1795:0100) pair in
the last line is the hex number 58; this is the first byte of the program's
code. It is then decoded as the assembly language instruction
"POP AX" which means to take the two bytes at the top of the
Stack (stored at offsets FFFE and FFFF) and move them into the AX
register. [ My experience has taught me that DOS always 'zeroes-out'
the last two bytes of a segment used by .COM programs (the SP always
being set to FFFE), so executing POP AX should still leave us with zero
in the AX register. The Stack Pointer will, of course, be changed
to 0000 in the process. Under normal circumstances, however, I would
never consider this to be an example of good programming practice
and would recommend using an XOR AX,AX instruction to zero-out
the AX register first. ]
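The effect of this first POP can be modelled in a few lines of Python. This is a sketch under the assumption stated above (the word at FFFE-FFFF is zero and SP = FFFE when the program starts); the names memory and pop are our own.

```python
memory = bytearray(0x10000)   # one 64 KiB segment, zero-filled
sp = 0xFFFE                   # assumed initial Stack Pointer for a .COM program

def pop(mem: bytearray, sp: int):
    """Read a little-endian word at SS:SP, then add 2 to SP (16-bit wrap)."""
    value = mem[sp] | (mem[(sp + 1) & 0xFFFF] << 8)
    return value, (sp + 2) & 0xFFFF

ax, sp = pop(memory, sp)      # 0100: POP AX
print(f"AX={ax:04X} SP={sp:04X}")  # AX=0000 SP=0000
```

Note how SP wraps around from FFFE to 0000, which is what the register display shows after the first trace step.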
(Instruction Step 2):
Enter a 't'
(or T; for Trace) at the DEBUG prompt to carry out the POP instruction. This
will also display any register changes and decode the next instruction
(at IP = 101).
AX=011C BX=0140 CX=0044 DX=0000 SP=FFFC
DS=1795 ES=1795 SS=1795 CS=1795 IP=010D
1795:010D 5A            POP     DX

AX=011C BX=0140 CX=0044 DX=011C SP=FFFE
DS=1795 ES=1795 SS=1795 CS=1795 IP=010E
1795:010E 58            POP     AX

AX=214F BX=0140 CX=0044 DX=011C SP=0000
DS=1795 ES=1795 SS=1795 CS=1795 IP=010F
1795:010F 353428        XOR     AX,2834
The following Table gives the equivalents of 214F and 2834 in Binary and provides a bit-level graphic display of how the XOR function operates:
214Fh -> 0010 0001 0100 1111
2834h -> 0010 1000 0011 0100
( XOR )  -------------------
         0000 1001 0111 1011
            0    9    7    B
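The same XOR can be reproduced in Python if you'd like to check the table yourself. The helper name nibbles is our own; it simply groups the 16 bits into fours.

```python
a, b = 0x214F, 0x2834
result = a ^ b                      # bitwise exclusive OR

def nibbles(value: int) -> str:
    """Show a 16-bit value as four groups of four binary digits."""
    bits = f"{value:016b}"
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))

print(nibbles(a))                           # 0010 0001 0100 1111
print(nibbles(b))                           # 0010 1000 0011 0100
print(nibbles(result), f"= {result:04X}h")  # 0000 1001 0111 1011 = 097Bh
```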
(Instruction Steps 12 through 14):
Have you guessed what the number in the BX
register (0140) refers to yet?
Hint: It's a location in memory near the end of our program; and we're running
out of code! As we continue stepping through the code (see below), keep in mind
how numerical WORD-sized values are stored in memory (Low-byte, High-byte).
Just before and after you execute the instruction
at IP = 114, enter the Dump command 'd 140 143' to see how this Subtract instruction
changes the bytes in memory near the end of our program:
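Here is a Python sketch of what the two Subtract instructions do to those four bytes. One assumption to flag: the starting bytes 48 2B 48 2A are not shown in the text above; we derived them by adding SI (097B) back to the final values CD 21 and CD 20 that appear in the disassembly listing. The names tail and sub_word are our own.

```python
import struct

SI = 0x097B
# Bytes at offsets 0140h-0143h before the SUB instructions run
# (a reconstruction: final bytes CD 21 / CD 20 plus SI, see the note above).
tail = bytearray.fromhex("48 2B 48 2A")

def sub_word(mem: bytearray, offset: int, value: int) -> None:
    """Emulate SUB [offset],value on a little-endian 16-bit word."""
    (word,) = struct.unpack_from("<H", mem, offset)
    struct.pack_into("<H", mem, offset, (word - value) & 0xFFFF)

sub_word(tail, 0, SI)   # instruction at 0114 (BX = 0140): 2B48h - 097Bh = 21CDh
sub_word(tail, 2, SI)   # instruction at 0118 (BX = 0142): 2A48h - 097Bh = 20CDh

print(tail.hex(" ").upper())  # CD 21 CD 20
```

Because words are stored Low-byte first, the result 21CD lands in memory as CD 21, which is exactly the encoding of INT 21.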
-r   <-- To see the registers again...
AX=097B BX=0140 CX=0044 DX=011C SP=0000 BP=0000 SI=097B DI=0000
DS=1795 ES=1795 SS=1795 CS=1795 IP=0116   NV UP EI PL NZ AC PO NC
1795:0116 43            INC     BX        [Step 15]
-p

AX=097B BX=0141 CX=0044 DX=011C SP=0000 BP=0000 SI=097B DI=0000
DS=1795 ES=1795 SS=1795 CS=1795 IP=0117   NV UP EI PL NZ NA PE NC
1795:0117 43            INC     BX        [Step 16]
-p

(Instruction Steps 17 and 18):
-d 11c 13f
:0110                                      45 49 43 41               EICA
:0120  52 2D 53 54 41 4E 44 41-52 44 2D 41 4E 54 49 56   R-STANDARD-ANTIV
:0130  49 52 55 53 2D 54 45 53-54 2D 46 49 4C 45 21 24   IRUS-TEST-FILE!$

So why did the programmer use JGE 0140 (7D 24) instead of an un-conditional jump instruction? Simply because a short JMP instruction here would begin with the byte EB (decimal=235) which is again beyond the upper limit set for the program's characters. Since there are no conditions which keep execution from making this jump, JGE is an acceptable substitution here.
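To see why the jump is always taken, recall that JGE jumps when the Sign flag equals the Overflow flag, and that those flags are left over from the second SUB. The Python sketch below works this out. It assumes the word at 0142 held 2A48 before the subtraction (derived from the final CD 20 bytes plus SI = 097B, not shown in the text); the function names are our own.

```python
def short_jump_target(instr_offset: int, displacement: int) -> int:
    """Target of a 2-byte short jump: next instruction's offset + signed disp8."""
    if displacement >= 0x80:          # the displacement is a signed byte
        displacement -= 0x100
    return (instr_offset + 2 + displacement) & 0xFFFF

def sub16_flags(a: int, b: int):
    """Return (result, SF, OF) for a 16-bit SUB a,b."""
    result = (a - b) & 0xFFFF
    sf = result >> 15
    # Signed overflow: operands differ in sign and the result's sign differs from a's.
    of = (((a ^ b) & (a ^ result)) >> 15) & 1
    return result, sf, of

target = short_jump_target(0x011A, 0x24)        # bytes 7D 24 at offset 011A
result, sf, of = sub16_flags(0x2A48, 0x097B)    # SUB [BX],SI at offset 0118
jge_taken = (sf == of)                          # JGE condition: SF = OF

print(f"{target:04X} {result:04X} {jge_taken}")  # 0140 20CD True
```

The result is positive and there is no signed overflow, so both flags are 0 and the condition holds every time the program runs.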
xxxx:0100 58            POP     AX
xxxx:0101 354F21        XOR     AX,214F
xxxx:0104 50            PUSH    AX
xxxx:0105 254041        AND     AX,4140
xxxx:0108 50            PUSH    AX
xxxx:0109 5B            POP     BX         ;--> Places 0140 in BX
xxxx:010A 345C          XOR     AL,5C
xxxx:010C 50            PUSH    AX
xxxx:010D 5A            POP     DX         ;--> Places 011C in DX
xxxx:010E 58            POP     AX
xxxx:010F 353428        XOR     AX,2834
xxxx:0112 50            PUSH    AX
xxxx:0113 5E            POP     SI
xxxx:0114 2937          SUB     [BX],SI    ;--> changes bytes at 140-141
xxxx:0116 43            INC     BX
xxxx:0117 43            INC     BX
xxxx:0118 2937          SUB     [BX],SI    ;--> changes bytes at 142-143
xxxx:011A 7D24          JGE     0140       ;--> Jumps over data string to
                                           ;    the last two instructions

xxxx:011C 45 49 43 41 52 2D 53 54 41   EICAR-STA
xxxx:0125 4E 44 41 52 44 2D 41 4E 54   NDARD-ANT    DATA STRING
xxxx:012E 49 56 49 52 55 53 2D 54 45   IVIRUS-TE    which is displayed
xxxx:0137 53 54 2D 46 49 4C 45 21 24   ST-FILE!$    by the program.

xxxx:0140 CD21          INT     21         ;--> DOS Function 09h:
                                           ;    Displays the text.
xxxx:0142 CD20          INT     20         ;--> Program Termination funct.
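As a cross-check, the register arithmetic in the first thirteen instructions can be followed in Python, one statement per instruction. This is only a sketch: it models the stack as a Python list and assumes, as discussed earlier, that the word on top of the stack is zero when the program starts.

```python
stack = [0x0000]                  # assumed zero word on the stack at entry

ax = stack.pop()                  # 0100 POP  AX
ax ^= 0x214F                      # 0101 XOR  AX,214F
stack.append(ax)                  # 0104 PUSH AX
ax &= 0x4140                      # 0105 AND  AX,4140
stack.append(ax)                  # 0108 PUSH AX
bx = stack.pop()                  # 0109 POP  BX
ax = (ax & 0xFF00) | ((ax & 0xFF) ^ 0x5C)   # 010A XOR AL,5C (low byte only)
stack.append(ax)                  # 010C PUSH AX
dx = stack.pop()                  # 010D POP  DX
ax = stack.pop()                  # 010E POP  AX
ax ^= 0x2834                      # 010F XOR  AX,2834
stack.append(ax)                  # 0112 PUSH AX
si = stack.pop()                  # 0113 POP  SI

ah = ax >> 8                      # high byte of AX
print(f"AX={ax:04X} BX={bx:04X} DX={dx:04X} SI={si:04X} AH={ah:02X}")
# AX=097B BX=0140 DX=011C SI=097B AH=09
```

The end state matches the register displays shown earlier. It also leaves AH = 09 and DX = 011C, which is what DOS Function 09h expects: the function number in AH and the offset of a '$'-terminated string in DX.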