K&R Chap 6.4 - C keywords counter

Hi, I'm back on C... and in need of help with this code, I can't understand what's the purpose of using a pointer outside the actual data, (maybe it's just for teaching purpose, since it's legal to use the one past the actual array) so I would like to know if the modifications I did to the binary seach function are equivalent to the original one; I wrote those as comments. I did some debugging and I think it's ok, but since I'm a noob I'm looking for more experienced people to help :)
Here the code:
Thanks in advance!

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
struct key
{
    char *word;
    int count;
};

struct key keytab[]=
{
    { "auto", 0 },
    { "break", 0 },
    { "case", 0 },
    { "char", 0 },
    { "const", 0 },
    { "continue", 0 },
    { "default", 0 },
      /* ... */
    { "unsigned", 0 },
    { "void", 0 },
    { "volatile", 0 },
    { "while", 0 }
};

#define NKEYS (sizeof keytab / sizeof keytab[0])

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include "mylib.c"

#define MAXWORD 100

int getword(char *, int);
struct key *binsearch(char *, struct key *, int);

/* count C keywords */

main()
{
    char word[MAXWORD];
    struct key *p;

    while(getword(word, MAXWORD) != EOF)
        if(isalpha(word[0]))
            if((p = binsearch(word, keytab, NKEYS)) != NULL)
                p->count++;
    for(p = keytab;
        p < keytab + NKEYS;
        p++)
        if(p->count > 0)
            printf("%4d %s\n", p->count, p->word);
    return 0;
}

struct key *binsearch(char *word, struct key *tab, int n)
{
    int cond;
    struct key *low = &tab[0];
    struct key *high = &tab[n];  // struct key *high = &tab[n-1];
    struct key *mid;

    while(low < high)  // while(low <= high)
    {
        mid = low + (high - low) / 2;
        if((cond = strcmp(word, mid->word)) < 0 )
            high = mid; // high = mid -1;
        else if (cond  > 0)
            low = mid + 1;
        else
            return mid;
    }
    return NULL;
}


int getword(char *word, int lim)
{
    int c, getch(void);
    void ungetch(int);
    char *w = word;

    while(isspace(c = getch()))
        ;
    if(c != EOF)
        *w++ = c;
    if(!isalpha(c))
    {
        *w++ = '\0';
        return c;
    }
    for(;
        --lim > 0;
        w++)
        if(!isalnum(*w = getch()))
        {
            ungetch(*w);
            break;
        }
    *w = '\0';
    return word[0];
}

Edited by rizoma on
If n is 1, then low = high = mid = &tab[0]. If the first condition is met, then you'll be setting high = mid-1, which is &tab[-1]. This points outside of the array, and is undefined behavior I think.

Edit: If n is 0, then high = &tab[-1], which hits that case more directly.

Edited by Mattie on
I wasn't thinking abount the binsearch function by itself out of the context. Now it's clear why it was written using a pointer one past the array.
Thank you so much, sometimes I need someone to give an advice, and sadly It's difficult for me to find anyone interested in low level C outside this forum!
Another thing: In the first iteration (high - low) == n, so for even n in your changes the first assignment to `mid` will be off-center. That changes the worst case behaviour a little

There's a common C++ idiom that uses pointers past the end of arrays, too:
1
2
3
for (T it = begin(foo); it != end(foo); ++it) {
    // ...
}


This just works in C with arrays and pointers, but in C++ you can overload ++ and != for any type. This became formalised in C++11 with the range based for loop, which just looks for those functions (or methods) and dereferences the iterator for you.

Soapbox: it's one of those things that's really stupid about C++; they have an iterator 'type', the definition of which is implicitly defined by this syntactic expansion. The language update to fix this has been just around the corner for a decade as far as I know. This example happens to be small and baked into the language, but metaprogramming exclusively via duck typing shivers my timbers

Edited by graeme on