HashTables and Some Unexpected Code Performance

mojobojo

#2463

February 6, 2015

So I am playing around with hash tables and looking at how it effects performance. I wrote three different ways. My test consisted of 1000 integers and 1000 strings and I use the integers, hash them, then use them for looking up the strings. I used a performance counter to measure the time in seconds it takes for it to add 1000 to the table then again how long it takes to look them all up.

First one was the "naive approach", the idea was that I would write it as if I needed some way to look up strings given an integer value and did not know the concept of hash tables. Not surprisingly this one had the most loop iterations done. Given the simplicity of adding to the table I expected adding to be the fastest and the look up to be the slowest out of all the functions.

D:\Development\HashTablesTest>build\HashTablesNaive.exe
TimeBegin 0.000005 TimeEnd 0.000043
Time taken to add 1000 values to table 0.000038 Iterations 1000

TimeBegin 0.000350 TimeEnd 0.000377
Time taken to look up 1000 values 0.000027 Iterations 1000


D:\Development\HashTablesTest>build\HashTablesNaive.exe
TimeBegin 0.000009 TimeEnd 0.000084
Time taken to add 1000 values to table 0.000074 Iterations 1000

TimeBegin 0.000785 TimeEnd 0.000836
Time taken to look up 1000 values 0.000051 Iterations 1000


D:\Development\HashTablesTest>build\HashTablesNaive.exe
TimeBegin 0.000009 TimeEnd 0.000055
Time taken to add 1000 values to table 0.000046 Iterations 1000

TimeBegin 0.000547 TimeEnd 0.000586
Time taken to look up 1000 values 0.000039 Iterations 1000


D:\Development\HashTablesTest>build\HashTablesNaive.exe
TimeBegin 0.000009 TimeEnd 0.000055
Time taken to add 1000 values to table 0.000045 Iterations 1000

TimeBegin 0.000558 TimeEnd 0.000597
Time taken to look up 1000 values 0.000038 Iterations 1000


D:\Development\HashTablesTest>build\HashTablesNaive.exe
TimeBegin 0.000006 TimeEnd 0.000036
Time taken to add 1000 values to table 0.000030 Iterations 1000

TimeBegin 0.000324 TimeEnd 0.000352
Time taken to look up 1000 values 0.000028 Iterations 1000

Second one was internal chaining. This one had less loop iterations but the time it took to look up was slower.

D:\Development\HashTablesTest>build\HashTablesInternalChaining.exe
TimeBegin 0.000009 TimeEnd 0.000134
Time taken to add 1000 values to table 0.000125 Iterations 703

TimeBegin 0.000598 TimeEnd 0.000711
Time taken to look up 1000 values 0.000113 Iterations 702


D:\Development\HashTablesTest>build\HashTablesInternalChaining.exe
TimeBegin 0.000009 TimeEnd 0.000109
Time taken to add 1000 values to table 0.000101 Iterations 703

TimeBegin 0.000817 TimeEnd 0.000902
Time taken to look up 1000 values 0.000085 Iterations 702


D:\Development\HashTablesTest>build\HashTablesInternalChaining.exe
TimeBegin 0.000008 TimeEnd 0.000098
Time taken to add 1000 values to table 0.000090 Iterations 703

TimeBegin 0.000570 TimeEnd 0.000667
Time taken to look up 1000 values 0.000097 Iterations 702


D:\Development\HashTablesTest>build\HashTablesInternalChaining.exe
TimeBegin 0.000008 TimeEnd 0.000099
Time taken to add 1000 values to table 0.000091 Iterations 703

TimeBegin 0.000499 TimeEnd 0.000594
Time taken to look up 1000 values 0.000094 Iterations 702


D:\Development\HashTablesTest>build\HashTablesInternalChaining.exe
TimeBegin 0.000004 TimeEnd 0.000038
Time taken to add 1000 values to table 0.000035 Iterations 703

TimeBegin 0.000215 TimeEnd 0.000247
Time taken to look up 1000 values 0.000032 Iterations 702

Third one was external chaining. This one had a surprisingly low amount of loop iterations. I chose to allocate memory for each new node so I expected adding to be the slowest of the three. Look up was still slower than the first despite even less loop iterations than the previous.

D:\Development\HashTablesTest>build\HashTablesExternalChaining.exe
TimeBegin 0.000007 TimeEnd 0.000135
Time taken to add 1000 values to table 0.000128 Iterations 232

TimeBegin 0.001469 TimeEnd 0.001539
Time taken to look up 1000 values 0.000070 Iterations 431


D:\Development\HashTablesTest>build\HashTablesExternalChaining.exe
TimeBegin 0.000007 TimeEnd 0.000438
Time taken to add 1000 values to table 0.000432 Iterations 232

TimeBegin 0.000776 TimeEnd 0.000824
Time taken to look up 1000 values 0.000048 Iterations 431


D:\Development\HashTablesTest>build\HashTablesExternalChaining.exe
TimeBegin 0.000004 TimeEnd 0.000049
Time taken to add 1000 values to table 0.000045 Iterations 232

TimeBegin 0.000266 TimeEnd 0.000310
Time taken to look up 1000 values 0.000044 Iterations 431


D:\Development\HashTablesTest>build\HashTablesExternalChaining.exe
TimeBegin 0.000008 TimeEnd 0.000128
Time taken to add 1000 values to table 0.000120 Iterations 232


TimeBegin 0.001622 TimeEnd 0.001693
Time taken to look up 1000 values 0.000070 Iterations 431


D:\Development\HashTablesTest>build\HashTablesExternalChaining.exe
TimeBegin 0.000006 TimeEnd 0.000096
Time taken to add 1000 values to table 0.000090 Iterations 232

TimeBegin 0.001861 TimeEnd 0.001929
Time taken to look up 1000 values 0.000068 Iterations 431

I guess more importantly is my question:
Why is the naive approach working faster than the others?

I have attached my source code, it is for windows.
(Also gotta thank Casey for showing how easy cl is to use, breaking my dependency of Visual Studio and allowing me to use Sublime in a much more efficient way)

mojobojo

#2464

February 6, 2015

Oh man, should have waited 10 more minutes and I would have had my answer. After careful evaluation of HashTablesNaive.cc I realized I had one of the worst typos that you never get notified about in c/c++.

I noticed the iteration count for looking up was not right so I checked the loop and.....

1	Line 82: if (Table[i].Lookup = Lookup)

Sigh......

Well this is a bit more expected.

D:\Development\HashTablesTest>build\HashTablesNaive.exe
TimeBegin 0.000007 TimeEnd 0.000036

Time taken to add 1000 values to table 0.000030 Iterations 1000
TimeBegin 0.000321 TimeEnd 0.003097
Time taken to look up 1000 values 0.002776 Iterations 500433


D:\Development\HashTablesTest>build\HashTablesNaive.exe
TimeBegin 0.000004 TimeEnd 0.000019

Time taken to add 1000 values to table 0.000016 Iterations 1000
TimeBegin 0.000209 TimeEnd 0.001614
Time taken to look up 1000 values 0.001405 Iterations 500433


D:\Development\HashTablesTest>build\HashTablesNaive.exe
TimeBegin 0.000004 TimeEnd 0.000023
Time taken to add 1000 values to table 0.000019 Iterations 1000
TimeBegin 0.000243 TimeEnd 0.002273
Time taken to look up 1000 values 0.002030 Iterations 500433

D:\Development\HashTablesTest>build\HashTablesNaive.exe
TimeBegin 0.000009 TimeEnd 0.000047

Time taken to add 1000 values to table 0.000038 Iterations 1000
TimeBegin 0.000468 TimeEnd 0.004544
Time taken to look up 1000 values 0.004076 Iterations 500433


D:\Development\HashTablesTest>build\HashTablesNaive.exe
TimeBegin 0.000015 TimeEnd 0.000063
Time taken to add 1000 values to table 0.000047 Iterations 1000

TimeBegin 0.000610 TimeEnd 0.004775
Time taken to look up 1000 values 0.004166 Iterations 500433

Well, now that that's resolved, enjoy the code maybe it will help someone. Or maybe someone can take a look at it and see if there are things I can do better.

David Owens II

#2472

February 6, 2015

If you enable warnings the compiler will tell you. It's warning C6282 for MSVC.

https://msdn.microsoft.com/en-us/vstudio/esa6csd7(v=vs.111).aspx

mojobojo

#2474

February 6, 2015

Thank you for that. I forgot about checking warning levels, that is something I need to do more often.

Dale Kim

#2483

February 6, 2015

I would highly recommend using more items than 1000 if you want to check performance. 1000 really is a trivial number to the computer for all but the most horrible algorithms (or NP-hard problems).

mojobojo

#2512

February 7, 2015

DaleKim
I would highly recommend using more items than 1000 if you want to check performance. 1000 really is a trivial number to the computer for all but the most horrible algorithms (or NP-hard problems).

So I modified my code a bit to do what you suggested and boy did I see the differences. The naive approach actually took too long and after ten minutes I hit ctrl+c so time taken for the naive approach is > 10 minutes.

But wow, increasing the size of the hash table, barely had to wait for the other two to finish. There is also a cheap string generation I threw in there.

D:\Development\HashTablesTest>build\HashTablesNaive.exe
Time taken to add 1000000 values to table 0.180649 Iterations 1000000
^C
D:\Development\HashTablesTest>build\HashTablesInternalChaining.exe
Time taken to add 1000000 values to table 0.300012 Iterations 10216664
Time taken to look up 1000000 values 0.113545 Iterations 10216664

D:\Development\HashTablesTest>build\HashTablesExternalChaining.exe
Time taken to add 1000000 values to table 0.296565 Iterations 482547
Time taken to look up 1000000 values 0.066534 Iterations 842183

EDIT: Went ahead and let the first one run and did something while it ran.

1
2
3

D:\Development\HashTablesTest>build\HashTablesNaive.exe
Time taken to add 1000000 values to table 0.171783 Iterations 1000000
Time taken to look up 1000000 values 1619.253833 Iterations 1784293664

I am measuring in seconds so that is a full 26.987 minutes!

Edited by mojobojo on February 7, 2015, 7:30pm