Safe replacement for gets

August 13th, 2009 Leave a comment Go to comments

Every C programmer knows that gets() should never be used in a C application. For a small project at university I needed to read in one line from standard input using pure C, so I wrote a function which reads in one line from standard input and allocates only the amount of needed memory dynamically. Because of the rather big buffer (100 characters) it should be faster than a read-char-loop as well. I’m putting it online here because it might prove useful to someone.

I am aware that most of you have your own way of dealing with this, or that you use external libraries which handle input and output, but this example is targeted at new C programmers.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#define BUFFERLEN 100

/** \brief Read in one line (can be any size) from the standard input
  * \return Dynamically allocated string on success or NULL if the memory could not be allocated
  * \warning The returned string has to be deallocated with free() after you're done with it.
  */

char *getLine()
{
    int length, count = 0;
    char buffer[BUFFERLEN] = {0};
    char *out = NULL, *temp;
    int finished = 0, error = 0;
   
    do {
        if (!fgets(buffer, BUFFERLEN, stdin)) {
            if (feof(stdin))
                finished = 1;
            else
                error = 1;
        } else {
            length = strlen(buffer);
            if (length > 0 && buffer[length - 1] == '\n') {
                buffer[--length] = '\0';
                finished = 1;
                temp = (char *)realloc(out, count * (BUFFERLEN - 1) + length + 1);
                if (temp) {
                    out = temp;
                    strcpy(&out[count * (BUFFERLEN - 1)], buffer);
                } else
                    error = 1;
                ++count;
            }
            if (feof(stdin))
                finished = 1;
        }
    } while (!finished && !error);

    if (error) {
        free(out);
        out = NULL;
    }

    return out;
}

int main(void) {
    char *line = getLine();
    if (line) {
        printf("You entered: %s\n", line);
        free(line);
    } else
        puts("Could not get line.");
    return 0;
}
VN:F [1.6.3_896]
Rating: 0 (from 0 votes)
Share and Enjoy:
  • Print this article!
  • Digg
  • del.icio.us
  • Facebook
  • Google Bookmarks
  • LinkedIn
  • StumbleUpon
  • Twitter
  1. August 8th, 2009 at 21:20 | #1

    I wonder how you define “memory efficient” if you’re using realloc() and allocations which are not page-aligned?

  2. Wesley
    August 8th, 2009 at 22:06 | #2

    @Philip Paeps
    I’m sorry. I am far from an expert when it comes to memory management. The only reason why I call this memory efficient is because it will only allocate the necessary amount of bytes for the string to fit. What would be a better way than using realloc()? And you are right about the page alignment of course. Thanks for your comment.

  3. Bob van der Vleuten
    August 9th, 2009 at 13:24 | #3

    I did something quite like this during the same project. It would’ve been annoying if I had not done that. Thanks for posting the code, yours looks a lot cleaner :-)

  4. August 9th, 2009 at 13:59 | #4

    Most modern malloc() implementations don’t allocate bytes but pages or other “convenient” chunks of memory. So your malloc() won’t allocate the number of bytes you need but potentially many more than that. Knowing that, you can avoid calling realloc() — which, again depending on the libc — can be quite expensive, by allocating more natural chunks yourself.

    Instead of using realloc() you can make your string buffer a linked list with a pointer to the next chunk of your string. That way you don’t pay the realloc() penalty, but if your string takes more than one buffer (which would be a very large string…) the drawback is that you won’t be able to use the standard libc functions to manipulate it.

    In my experience, if something needs to be memory efficient, it shouldn’t be using strings in the first place, but my experience is mostly in areas of development where strings don’t figure much. As usual, your mileage and requirements may vary.

  5. vart
    August 9th, 2009 at 18:25 | #5

    You overwrite out pointer with return value of realloc.
    If realloc fails – out pointer is lost. And you are using null pointer – causing crash.

    1. check return value of realloc
    2. use temp pointer to store the return value of it before overwriting value of out

  6. Wesley
    August 9th, 2009 at 19:28 | #6

    @vart You are absolutely right of course. I have modified the example code slightly so that it shouldn’t crash when realloc fails. Thanks!

  7. Jon
    August 9th, 2009 at 20:03 | #7

    Check the return value of fgets! It will return NULL on an EOF read (which happens all the time… user types and then hits ctrl-D, or input is piped to you). In your example, you’ll continue the loop, using the last contents of buffer. Over and over and over, with no end.

    Check the return value of getLine! Your docstring indicates it can return NULL, but then you go ahead and try to use it without checking for that.

    Beginning C coders do this sort of thing all the time. It is a very fast, very powerful language, and consequently it will allow you to load the gun and blow your foot right off.

    Side note: you can gain all sorts of little speed-ups by thinking like a computer:
    buffer[length - 1] = ”;
    –length;
    becomes:
    buffer[--length] = ”;

  8. Justin
    August 9th, 2009 at 20:32 | #8

    Trace through the execution of that function in the case where the user enters a string < 100 chars. You call realloc even when you don't need to.

    Then you do

    temp = (char *)realloc(out, count * 99 + length + 1);

    where count is 0 so that becomes

    temp = (char *)realloc(out, 0 * 99 + length + 1);

    so you are calling

    temp = (char *)realloc(out, length + 1);

  9. Wesley
    August 9th, 2009 at 21:43 | #9

    @Jon Okay. I modified the example code again. I hope that it’s less error-prone now. Thanks for your tips.

  10. Wesley
    August 9th, 2009 at 21:48 | #10

    @Justin why is that bad? realloc behaves just like malloc when the pointer to the memory block is NULL according to this link

  11. Jon
    August 10th, 2009 at 07:09 | #11

    @Wesley

    Your fix isn’t quite right:
    if (fgets(buffer, 100, stdin)) {

    fgets returns buffer on success, so your logic is backwards. Also, the EOF case isn’t an error, but normal operation. Something like:

    if (!fgets(buf, 100, stdin))
    {
    if (feof(stdin))
    length = finished = 0;
    else
    error = … blah;
    }
    else
    length = strlen(buf);

  12. Jon
    August 10th, 2009 at 07:10 | #12

    @Jon
    err… the above should read
    {
    length = 1;
    finished = 0;
    }

    Brain failure while typing.

  13. Jon
    August 10th, 2009 at 07:10 | #13

    @Jon
    *smacks head against wall*
    length = 0;
    finished = 1;

    I’m going to sleep now!

  14. August 10th, 2009 at 10:16 | #14

    Uhm… fgetln(3) has been part of the BSDs for ages (there are GNU/Linux implementations too, e.g. in libbsd in Debian), and getline(3) from glibc has made it into SUSv3 (POSIX) 2008. So what’s the point?

  15. James
    August 10th, 2009 at 23:12 | #15

    I strongly recommend you #define LINEBUFFERLEN 100
    and then replace every instance of 100 with LINEBUFFERLEN so that you (or another programmer) don’t change your buffer length, forget to change it in every single location and end up with another stack overflow just like the one you were trying to fix in the first place.

  1. No trackbacks yet.