Compression and Encryption

Yesterday I watched the Handmade Hero videos on compression and I thought about what is different between compression and encryption.And why most compression software provide password features .Is it easy to encrypt when compressed or vice versa.
And how do these software store password data in the file .Do they use an algorithm to encrypt password and store it in the file header(maybe XOR).

And will Casey do a video on encryption for maybe i don't know save-file protection.


Edited by Shazan Shums on Reason: Typo
msmshazan
Yesterday I watched the Handmade Hero videos on compression and I thought about what is different between compression and encryption.And why most compression software provide password features .Is it easy to encrypt when compressed or vice versa.

It's easy to encrypt after compression, but not easy to compress after encryption. If you do encryption well, the resulting data is almost indistinguishable from random data. That means there's little for a compression algorithm to work with in the way of redundancy it can remove.

msmshazan
And how do these software store password data in the file .Do they use an algorithm to encrypt password and store it in the file header(maybe XOR).

No, that would be quite unsafe and make it easy to recover the password. What they do is try and decrypt with the password you give it (after perhaps using a key stretching algorithm on it), and then they see if the result makes sense as a compressed stream. If not, it's the wrong password.

msmshazan
And will Casey do a video on encryption for maybe i don't know save-file protection.

I think he mentioned he wouldn't provide any game protection features like that, what with the source being available and all :)

Edited by Jeroen van Rijn on
So if i am writing a basic RLE compressor .What is the encryption to be used? Just for practicing.

Should i just XOR all the File Contents after compressing. Any guidance would be helpful.

Edited by Shazan Shums on
msmshazan
So if i am writing a basic RLE compressor .What is the encryption to be used? Just for practicing.

Should i just XOR all the File Contents after compressing. Any guidance would be helpful.

Just for practice you could indeed first compress the file contents and then XOR with the password. Rather than doing that in two stages, maybe just run the output of the RLE stage directly through the encryption instead of first compressing all of it, then encrypting.

But, don't be under the impression that is this in any way secure :) It'll do for practice, however. Then later you can read up on how better encryption works. RSA is easy to implement and Feistel networks are easy to understand.

Some things to look into, maybe: Feistel, S-Box, RC4, RSA, Blowfish, Twofish, key stretching/derivation, PBKDF2.
Kelimion
msmshazan
Yesterday I watched the Handmade Hero videos on compression and I thought about what is different between compression and encryption.And why most compression software provide password features .Is it easy to encrypt when compressed or vice versa.

It's easy to encrypt after compression, but not easy to compress after encryption. If you do encryption well, the resulting data is almost indistinguishable from random data. That means there's little for a compression algorithm to work with in the way of redundancy it can remove.


But encrypting after compression leaks information about the length of the compressed data (down to the block granularity). Which in turn leaks information about the general structure of the file (with an LZ compression that means number of repeated strings). If the attacker is in control of even a small part of the message then he can find out more about the rest of the message.

It's things like this that make security and encryption a hard problem.
This is the XOR function but its so slow,

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
static void
Encrypt(size_t MaxOutSize,u8 *Out,char * keys)
{
    u8 * key = (u8 *)keys;
    
    for(unsigned int i=0; i<MaxOutSize;)
    {
        for(int j = 0; i<strlen(keys);j++)
        {
        Out[i]=Out[i]^key[i];
        i++;
        }
    }
}



And here's main


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
int main(int ArgCount ,char** Args)
{
    if(ArgCount >= 4)
    {
        size_t FinalOutputSize =0;
        u8 *FinalOutputBuffer = 0;
        char * Command = Args[1];
        char * InFileName = Args[2];
        char * OutFileName = Args[3];
        char * Password = Args[4];
        
        file_contents InFile = ReadEntireFileIntoMemory(InFileName);

        if(ArgCount == 5)
        {
           Password = Args[4];
        }

        if(strcmp(Command ,"compress") == 0)
        {
            size_t OutBufferSize = GetMaximumCompressedOutputSize(InFile.FileSize);
            u8 *OutBuffer = (u8 *)malloc(OutBufferSize);
            size_t CompressedSize =
                Compress(InFile.FileSize,InFile.Contents,OutBufferSize,OutBuffer + 4);
            if(ArgCount == 5) Encrypt(OutBufferSize,OutBuffer,Password);
            *(int unsigned *)OutBuffer = (int unsigned)InFile.FileSize;

            FinalOutputSize = CompressedSize + 4;
            FinalOutputBuffer = OutBuffer;
            
        }
        else if(strcmp(Command ,"decompress") == 0)
        {
            if(InFile.FileSize >=4)
            {
                size_t OutBufferSize = *(int unsigned *)InFile.Contents;
                u8 *OutBuffer = (u8 *)malloc(OutBufferSize);
                if(ArgCount == 5) Decrypt(OutBufferSize,OutBuffer,Password);
                Decompress(InFile.FileSize - 4,InFile.Contents + 4,
                           OutBufferSize,OutBuffer);
                FinalOutputSize = OutBufferSize;
                FinalOutputBuffer = OutBuffer;
            }
            else
            {
                fprintf(stderr,"Invalid input\n");
            }
        }
        else
        {
            fprintf(stderr,"Unrecognized command %s\n",Command);
        }
        
        if(FinalOutputBuffer)
        {
            FILE *OutFile  = fopen(OutFileName,"wb");

            if(OutFile)
            {
                fwrite(FinalOutputBuffer,1,FinalOutputSize,OutFile);
            }
            else
            {
                fprintf(stderr,"Unable to open output file %s\n",OutFileName);
            }
        }
    }
    else
    {
        fprintf(stderr,"Usage: %s compress [raw filename] [compressed filename] [optional:password]\n",
                Args[0]);
        fprintf(stderr,"       %s decompress [compressed filename] [raw filename] [optional:password]\n",
                Args[0]);
        
    }
    return 0;
}


ratchetfreak
But encrypting after compression leaks information about the length of the compressed data (down to the block granularity). Which in turn leaks information about the general structure of the file (with an LZ compression that means number of repeated strings). If the attacker is in control of even a small part of the message then he can find out more about the rest of the message.

It's things like this that make security and encryption a hard problem.

Yup, no disagreement from me there. None of my above advice was intended as a way to implement good encryption, just a few points to get someone started and read up more from there.

Implementing encryption that can withstand attack isn't all about the maths either. You have to implement it in such a way that the math isn't subtly altered. That's just so that data at rest is safe. On top of that you'll want to write the code so that it can withstand timing and other side channel attacks. If that isn't done, then even the data at rest isn't safe potentially if a side channel attack could've retrieved all or part of the key material or internal state during encryption.

Take away lesson: If you don't know what you're doing, don't implement your own encryption when you rely on it for something important. Learn about it by all means. If all you're after is obfuscation for unimportant data, sure, have a crack at it.

Encryption and security are hard. Half the battle is knowing what is and is not potentially insecure. Even if you use pre-made encryption, make sure you know what you're doing. Using the wrong mode or the wrong number of rounds, or any number of things, can make an ostensibly secure solution insecure.

Good times, huh? :)
But don't let that discourage you. It's a fascinating field of study.
msmshazan
This is the XOR function but its so slow,

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
static void
Encrypt(size_t MaxOutSize,u8 *Out,char * keys)
{
    u8 * key = (u8 *)keys;
    
    for(unsigned int i=0; i<MaxOutSize;)
    {
        for(int j = 0; i<strlen(keys);j++)
        {
        Out[i]=Out[i]^key[i];
        i++;
        }
    }
}


So, one reason this is likely slow is you're calling strlen inside the loop there. Believe it or not, compilers will often fail to optimize that out, and so an otherwise O(n) loop become O(n^2) and we're unhappy and slow.

Secondly, that function probably doesn't do what you wanted it to: assuming the key is shorter than the data you're encrypting, it only encrypts the first N bytes, where N is the length of your string. (There may be transcription errors in there, for example j is never used.) Try something more like this:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
static void
Encrypt(size_t MaxOutSize, u8 * Out, char * keys)
{
    u8 * key = (u8 *)keys;
    size_t key_len = strlen(key);

    for (size_t i = 0; i < MaxOutSize; ++i)
    {
        size_t j = i % key_len;
        Out[i] = Out[i] ^ key[j];
    }
}

What you're doing, for XOR, is basically duplicating the password over and over and XORing it with the corresponding byte of output. Each consecutive byte gets XORed with a different part of the password, which is where the security of this comes from. (If your password is shorter than the data, that security is minimal, however if your password is longer than the data it's actually fairly secure -- cracking an XOR scheme depends on that repetition.)
msmshazan
This is the XOR function but its so slow,





there's a few errors in there:

i<strlen(keys) is wrong in 3 ways, it should be j, strlen should be cached outside the loops and i should also be checked against MaxOutSize.

Out[i]=Out[i]^key[i]; should be Out[i]=Out[i]^key[j];

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
static void
Encrypt(size_t MaxOutSize,u8 *Out,char * keys)
{
    u8 * key = (u8 *)keys;
    int keylen = strlen(keys);

    for(unsigned int i=0; i<MaxOutSize;)
    {
        for(int j = 0; j<keylen && i<MaxOutSize;j++)
        {
        Out[i]=Out[i]^key[j];
        i++;
        }
    }
}

Edited by ratchetfreak on