K&R Chap 6.4 - C keywords counter

rizoma

#10537

January 25, 2017

Hi, I'm back on C... and in need of help with this code, I can't understand what's the purpose of using a pointer outside the actual data, (maybe it's just for teaching purpose, since it's legal to use the one past the actual array) so I would like to know if the modifications I did to the binary seach function are equivalent to the original one; I wrote those as comments. I did some debugging and I think it's ok, but since I'm a noob I'm looking for more experienced people to help :)
Here the code:
Thanks in advance!

struct key
{
    char *word;
    int count;
};

struct key keytab[]=
{
    { "auto", 0 },
    { "break", 0 },
    { "case", 0 },
    { "char", 0 },
    { "const", 0 },
    { "continue", 0 },
    { "default", 0 },
      /* ... */
    { "unsigned", 0 },
    { "void", 0 },
    { "volatile", 0 },
    { "while", 0 }
};

#define NKEYS (sizeof keytab / sizeof keytab[0])

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include "mylib.c"

#define MAXWORD 100

int getword(char *, int);
struct key *binsearch(char *, struct key *, int);

/* count C keywords */

main()
{
    char word[MAXWORD];
    struct key *p;

    while(getword(word, MAXWORD) != EOF)
        if(isalpha(word[0]))
            if((p = binsearch(word, keytab, NKEYS)) != NULL)
                p->count++;
    for(p = keytab;
        p < keytab + NKEYS;
        p++)
        if(p->count > 0)
            printf("%4d %s\n", p->count, p->word);
    return 0;
}

struct key *binsearch(char *word, struct key *tab, int n)
{
    int cond;
    struct key *low = &tab[0];
    struct key *high = &tab[n];  // struct key *high = &tab[n-1];
    struct key *mid;

    while(low < high)  // while(low <= high)
    {
        mid = low + (high - low) / 2;
        if((cond = strcmp(word, mid->word)) < 0 )
            high = mid; // high = mid -1;
        else if (cond  > 0)
            low = mid + 1;
        else
            return mid;
    }
    return NULL;
}


int getword(char *word, int lim)
{
    int c, getch(void);
    void ungetch(int);
    char *w = word;

    while(isspace(c = getch()))
        ;
    if(c != EOF)
        *w++ = c;
    if(!isalpha(c))
    {
        *w++ = '\0';
        return c;
    }
    for(;
        --lim > 0;
        w++)
        if(!isalnum(*w = getch()))
        {
            ungetch(*w);
            break;
        }
    *w = '\0';
    return word[0];
}

Edited by rizoma on January 25, 2017, 10:33pm

Mattie

#10547

January 26, 2017

If n is 1, then low = high = mid = &tab[0]. If the first condition is met, then you'll be setting high = mid-1, which is &tab[-1]. This points outside of the array, and is undefined behavior I think.

Edit: If n is 0, then high = &tab[-1], which hits that case more directly.

Edited by Mattie on January 26, 2017, 4:42pm

rizoma

#10574

January 28, 2017

I wasn't thinking abount the binsearch function by itself out of the context. Now it's clear why it was written using a pointer one past the array.
Thank you so much, sometimes I need someone to give an advice, and sadly It's difficult for me to find anyone interested in low level C outside this forum!

graeme

#10583

January 28, 2017

Another thing: In the first iteration (high - low) == n, so for even n in your changes the first assignment to `mid` will be off-center. That changes the worst case behaviour a little

There's a common C++ idiom that uses pointers past the end of arrays, too:

1
2
3

for (T it = begin(foo); it != end(foo); ++it) {
    // ...
}

This just works in C with arrays and pointers, but in C++ you can overload ++ and != for any type. This became formalised in C++11 with the range based for loop, which just looks for those functions (or methods) and dereferences the iterator for you.

Soapbox: it's one of those things that's really stupid about C++; they have an iterator 'type', the definition of which is implicitly defined by this syntactic expansion. The language update to fix this has been just around the corner for a decade as far as I know. This example happens to be small and baked into the language, but metaprogramming exclusively via duck typing shivers my timbers

Edited by graeme on January 28, 2017, 11:27pm