Some days ago, Hugh managed to recover a bunch of files from a ZIP disk. Among the files there was this one special archive file:
VAX.ARC. So close, yet so far.
Despite all the effort Hugh put into tracking down, compiling and running the ARC 5.21p tool, the file stubbornly withstood all extraction attempts. After a closer look at the ARC sources, there was only one reasonable conclusion: The file is encrypted.
Considering the colourful past of the ARC compression tools, there was another minimal chance: A compatibility problem between ARC 5.21p and the version that was used to create the archive. Diving deep into my old archive CDs brought up a handful of tools with support for the ARC format: arca, arce, pkarc, arc - DOS files from the late 80's or early 90's. Firing up DOSBOX and ... you guessed it. All show the same CRC errors as ARC 5.21. A little bit disappointing, but certainly not unexpected.
Ok, the password then. Taking a closer look at the ARC sources reveals that we are facing a repeating XOR encryption. Actually nothing fancy, but the encryption is applied to the packed data. This makes everything a lot harder, because it prevents frequency analysis to a high degree.
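To make the scheme concrete, here is a minimal sketch of a repeating XOR in Python. It is not taken from the ARC sources; the function name xor_crypt is made up for illustration, and it assumes the password is simply cycled byte by byte over the data.

```python
from itertools import cycle


def xor_crypt(data: bytes, password: bytes) -> bytes:
    """XOR every data byte with the password, repeating the password as needed.

    Hypothetical helper for illustration. An empty password yields empty output,
    because zip() stops as soon as the (empty) cycle is exhausted.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(password)))


# XOR is its own inverse: applying the same password twice restores the input.
secret = xor_crypt(b"hello world", b"KEY")
assert xor_crypt(secret, b"KEY") == b"hello world"
```

Because encryption and decryption are the same operation, anything that reveals the key stream for one stretch of data reveals the password itself.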
The only obvious way open to us seems to be brute-forcing the password. But even considering the simplicity of the algorithm, we are probably talking about a substantial time frame. Assuming a password length of at least 7 or 8 chars and the fact that we must verify each candidate through the CRC calculation, we are talking about days if not weeks in the worst case. Plus, Hugh detected some serious inconsistencies, aka bugs, in ARC 5.21 when he added a brute-force option. We can probably reduce the key space a little bit: early ARC documentation suggests that all passwords are converted to uppercase. Since the user had to enter the password via the shell, and since we are talking about the late 80's, we can assume a password consisting of alphanumeric chars. However, not exactly a giant leap forward.
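To get a feeling for the numbers, here is a small sketch that counts the candidates under the assumptions above (uppercase letters plus digits, so 36 symbols). The helper name keyspace is made up for illustration, and the sketch deliberately does not guess how many candidates per second a CRC check would allow.

```python
import string

# Assumed alphabet: A-Z and 0-9, as argued above (36 symbols).
ALPHABET = string.ascii_uppercase + string.digits


def keyspace(length: int, alphabet_size: int = len(ALPHABET)) -> int:
    """Number of distinct passwords of exactly `length` chars."""
    return alphabet_size ** length


for length in (7, 8):
    print(f"{length} chars: {keyspace(length):,} candidates")
# 7 chars: 78,364,164,096 candidates
# 8 chars: 2,821,109,907,456 candidates
```

For comparison, keyspace(8, 62) gives the count for mixed-case alphanumerics, which shows how little the uppercase rule really buys us.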
Sigh, and it looked so promising:
alice.tar     9 (Squashed)    474673   1421312  1991-06-23 16:01  4AD7
animator.tar  9 (Squashed)   1013114   2396160  1991-06-23 21:33  69CE
as68.tar      9 (Squashed)    226903    675840  1991-06-23 21:46  C5D8
asasin.tar    9 (Squashed)    506893   1157120  1991-06-23 18:38  F320
dvi2ps.gfx    9 (Squashed)    136691    266240  1991-06-23 21:51  60D5
emu.tar       9 (Squashed)    924620   2211840  1991-06-23 22:33  3CDA
fish.tar      9 (Squashed)    530010   1198080  1991-06-23 18:03  1198
fred23jr.tar  9 (Squashed)     42197    112640  1991-06-23 13:47  A1F6
gfi.tar       9 (Squashed)    149121    296960  1991-06-23 22:38  F339
kermit.tar    4 (Squeezed)    324504    358400  1991-06-23 22:45  45A7
lnk.tar       9 (Squashed)    262342    921600  1991-06-23 22:59  8B24
miscc.tar     9 (Squashed)     31894     81920  1991-06-23 23:01  2633
patch.tar     9 (Squashed)     75587    163840  1991-06-23 23:04  9AB0
rcs.tar       9 (Squashed)    216397    532480  1991-06-23 23:13  7FD8
sps2.tar      9 (Squashed)     76224    188416  1991-06-23 23:16  1585
undump.tar    9 (Squashed)     23259     61440  1991-06-23 23:17  B60C
windows.rcs   2 (Stored)     2593536   2593536  1991-05-13 20:07  DA28
But wait, there is one file in the archive that is particularly special:
windows.rcs 2 (Stored) 2593536 2593536 1991-05-13 20:07 DA28
It is stored, not packed. This means the original data is XOR'ed directly with the password, with no compression layer in between. We might gain a foothold! To begin with, we extract the encrypted data from the archive. Now we have a WINDOWS.RCS, which is still encrypted, though.
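Pulling the raw bytes out does not need the ARC tool at all. The following is only a sketch: it assumes the commonly described ARC entry layout (marker byte 0x1A, a method byte, a 13-byte NUL-padded name, then little-endian packed size, date, time, CRC and original size, 29 bytes in total). That layout is an assumption here, not something verified against this particular archive, and the function name iter_arc_entries is made up.

```python
import struct


def iter_arc_entries(archive: bytes):
    """Yield (name, method, crc, raw_data) for each entry in an ARC file.

    Assumes the 29-byte header layout described in the lead-in; very old
    entries (method 1) use a shorter header and are not handled here.
    """
    pos = 0
    while pos + 29 <= len(archive):
        marker, method = archive[pos], archive[pos + 1]
        if marker != 0x1A or method == 0:  # method 0 marks the end of the archive
            break
        name = archive[pos + 2 : pos + 15].split(b"\0", 1)[0].decode("ascii", "replace")
        packed_size, _date, _time, crc, _orig_size = struct.unpack_from(
            "<IHHHI", archive, pos + 15
        )
        data_start = pos + 29
        yield name, method, crc, archive[data_start : data_start + packed_size]
        pos = data_start + packed_size  # next header follows the data directly
```

With something like this, the still-encrypted bytes of the stored entry can be written to a file of their own and probed at leisure.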
Next we need to go for the password. That is, we need the password length and the actual password chars. Time for a little statistical probing. For each possible password length we shift our raw data by that length and XOR it with the original raw data.
49 D4 D4 6A A5 81 DE BD D4 84 DF 74 49 D4 D4 6A A5 81 DE BD D4 84 DF 74 49 D4 D4 6A A5 81 DE BD D4 84 DF 74
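Why would we do that? XOR'ing a value with itself results in zero: a xor a = 0. If the shift matches the password length, both bytes were encrypted with the same password char, so that char cancels out and we get zero wherever the two original bytes were equal. For each shift we count the number of resulting zeroes. If we have an unusually high number of zeroes, we have hit a potential password length - or a multiple of it. Counting for passwords up to 16 chars, we get the following numbers: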
 1: 17972
 2: 17971
 3: 17981
 4: 18416
 5: 17968
 6: 18136
 7: 17966
 8: 24309
 9: 18019
10: 18039
11: 17994
12: 18655
13: 18023
14: 18148
15: 18023
16: 24343
See what we got: the numbers for 8 chars and 16 chars are significantly higher than the other ones. Chances are high that our password has a length of 8 or 16 chars, and since 16 is just a multiple of 8, we assume an 8 char password.
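The shift-and-count probe can be sketched in a few lines of Python. This is not the tool that produced the numbers above, just an illustration; the name zero_counts is made up. Counting positions where a byte equals the byte `shift` places later is the same as counting zeroes after the XOR.

```python
def zero_counts(data: bytes, max_len: int = 16) -> dict[int, int]:
    """For each shift 1..max_len, count positions where data[i] ^ data[i + shift] == 0."""
    counts = {}
    for shift in range(1, max_len + 1):
        # a ^ b == 0 exactly when a == b, so comparing is enough.
        counts[shift] = sum(1 for a, b in zip(data, data[shift:]) if a == b)
    return counts


def likely_key_length(data: bytes, max_len: int = 16) -> int:
    """Return the shift with the most zeroes (the smallest one wins on a tie)."""
    counts = zero_counts(data, max_len)
    return max(counts, key=lambda shift: (counts[shift], -shift))
```

On real data the multiples of the true length score high as well, so the winner should be read as "the length or a multiple of it", exactly as in the table above.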
Next we need to go for the actual password. If we assume an 8 char password, every 8th byte in the original data is encrypted with the same char. The first char of the password encrypts the bytes 0,8,16,...; the second char encrypts 1,9,17,...; and so on. Time to run a frequency analysis on each of the 8 groups.
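Splitting the data into these groups is a one-liner with Python's slice stride. A small sketch, with the made-up name split_groups:

```python
def split_groups(data: bytes, key_len: int) -> list[bytes]:
    """Group i holds the bytes at positions i, i + key_len, i + 2 * key_len, ..."""
    return [data[offset::key_len] for offset in range(key_len)]


# With 20 bytes numbered 0..19 and an 8 char password:
groups = split_groups(bytes(range(20)), 8)
print(list(groups[0]))  # [0, 8, 16]
print(list(groups[1]))  # [1, 9, 17]
```

Every byte within one group was XOR'ed with the same password char, which is what makes a per-group frequency count meaningful.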
But what are we looking for? ARC decided to store the file in the archive rather than pack it. This suggests that WINDOWS.RCS almost certainly contains some kind of binary data, and the most frequent byte in such a file is likely to be 0x00. Let's count and see what we get:
In the first group we have a maximum of 4265 hits for char 86. Now here comes the beautiful part: Remember a xor a = 0? Assuming that the most frequent byte is 0x00, the value 86 would immediately represent the first char of the password: 'V'.
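The same idea, written down as a sketch: take the most frequent byte in each group and treat it as the password char, because 0x00 xor k = k. The name recover_key is made up, and the whole thing stands or falls with the assumption that 0x00 really is the most common plaintext byte in every group.

```python
from collections import Counter


def recover_key(data: bytes, key_len: int) -> bytes:
    """Guess a repeating-XOR key, assuming the most common plaintext byte is 0x00.

    Expects at least key_len bytes of data, so that no group is empty.
    """
    key = bytearray()
    for offset in range(key_len):
        group = data[offset::key_len]
        most_common_byte, _hits = Counter(group).most_common(1)[0]
        key.append(most_common_byte)  # 0x00 ^ k == k
    return bytes(key)
```

If the guess is wrong for a group, the corresponding char comes out wrong as well, so a look at the runner-up counts per group is a cheap sanity check.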
What do we get from the other 7 groups?
73 with 1786 hits
68 with 4123 hits
69 with 1901 hits
79 with 4292 hits
77 with 1810 hits
79 with 4266 hits
78 with 1874 hits
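Turning the eight byte values into characters is plain ASCII decoding. A tiny sketch, assuming the 0x00 guess holds for every group:

```python
# Most frequent byte per group, in password order (the first one is the 86 from above).
codes = [86, 73, 68, 69, 79, 77, 79, 78]

password = bytes(codes).decode("ascii")
print(password)  # VIDEOMON
```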
There it is: the moment of truth... Firing up DOSBOX again. We got it!