Safe replacement for gets
Every C programmer knows that gets() should never be used in a C application. For a small project at university I needed to read in one line from standard input using pure C, so I wrote a function which reads in one line from standard input and allocates only the amount of needed memory dynamically. Because of the rather big buffer (100 characters) it should be faster than a read-char-loop as well. I’m putting it online here because it might prove useful to someone.
I am aware that most of you have your own way of dealing with this, or that you use external libraries which handle input and output, but this example is targeted at new C programmers.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | #include <string.h> #include <stdio.h> #include <stdlib.h> #define BUFFERLEN 100 /** \brief Read in one line (can be any size) from the standard input * \return Dynamically allocated string on success or NULL if the memory could not be allocated * \warning The returned string has to be deallocated with free() after you're done with it. */ char *getLine() { int length, count = 0; char buffer[BUFFERLEN] = {0}; char *out = NULL, *temp; int finished = 0, error = 0; do { if (!fgets(buffer, BUFFERLEN, stdin)) { if (feof(stdin)) finished = 1; else error = 1; } else { length = strlen(buffer); if (length > 0 && buffer[length - 1] == '\n') { buffer[--length] = '\0'; finished = 1; temp = (char *)realloc(out, count * (BUFFERLEN - 1) + length + 1); if (temp) { out = temp; strcpy(&out[count * (BUFFERLEN - 1)], buffer); } else error = 1; ++count; } if (feof(stdin)) finished = 1; } } while (!finished && !error); if (error) { free(out); out = NULL; } return out; } int main(void) { char *line = getLine(); if (line) { printf("You entered: %s\n", line); free(line); } else puts("Could not get line."); return 0; } |








I wonder how you define “memory efficient” if you’re using realloc() and allocations which are not page-aligned?
@Philip Paeps
I’m sorry. I am far from an expert when it comes to memory management. The only reason why I call this memory efficient is because it will only allocate the necessary amount of bytes for the string to fit. What would be a better way than using realloc()? And you are right about the page alignment of course. Thanks for your comment.
I did something quite like this during the same project. It would’ve been annoying if I had not done that. Thanks for posting the code, yours looks a lot cleaner
Most modern malloc() implementations don’t allocate bytes but pages or other “convenient” chunks of memory. So your malloc() won’t allocate the number of bytes you need but potentially many more than that. Knowing that, you can avoid calling realloc() — which, again depending on the libc — can be quite expensive, by allocating more natural chunks yourself.
Instead of using realloc() you can make your string buffer a linked list with a pointer to the next chunk of your string. That way you don’t pay the realloc() penalty, but if your string takes more than one buffer (which would be a very large string…) the drawback is that you won’t be able to use the standard libc functions to manipulate it.
In my experience, if something needs to be memory efficient, it shouldn’t be using strings in the first place, but my experience is mostly in areas of development where strings don’t figure much. As usual, your mileage and requirements may vary.
You overwrite out pointer with return value of realloc.
If realloc fails – out pointer is lost. And you are using null pointer – causing crash.
1. check return value of realloc
2. use temp pointer to store the return value of it before overwriting value of out
@vart You are absolutely right of course. I have modified the example code slightly so that it shouldn’t crash when realloc fails. Thanks!
Check the return value of fgets! It will return NULL on an EOF read (which happens all the time… user types and then hits ctrl-D, or input is piped to you). In your example, you’ll continue the loop, using the last contents of buffer. Over and over and over, with no end.
Check the return value of getLine! Your docstring indicates it can return NULL, but then you go ahead and try to use it without checking for that.
Beginning C coders do this sort of thing all the time. It is a very fast, very powerful language, and consequently it will allow you to load the gun and blow your foot right off.
Side note: you can gain all sorts of little speed-ups by thinking like a computer:
buffer[length - 1] = ”;
–length;
becomes:
buffer[--length] = ”;
Trace through the execution of that function in the case where the user enters a string < 100 chars. You call realloc even when you don't need to.
Then you do
temp = (char *)realloc(out, count * 99 + length + 1);
where count is 0 so that becomes
temp = (char *)realloc(out, 0 * 99 + length + 1);
so you are calling
temp = (char *)realloc(out, length + 1);
@Jon Okay. I modified the example code again. I hope that it’s less error-prone now. Thanks for your tips.
@Justin why is that bad? realloc behaves just like malloc when the pointer to the memory block is NULL according to this link
@Wesley
Your fix isn’t quite right:
if (fgets(buffer, 100, stdin)) {
fgets returns buffer on success, so your logic is backwards. Also, the EOF case isn’t an error, but normal operation. Something like:
if (!fgets(buf, 100, stdin))
{
if (feof(stdin))
length = finished = 0;
else
error = … blah;
}
else
length = strlen(buf);
@Jon
err… the above should read
{
length = 1;
finished = 0;
}
Brain failure while typing.
@Jon
*smacks head against wall*
length = 0;
finished = 1;
I’m going to sleep now!
Uhm… fgetln(3) has been part of the BSDs for ages (there are GNU/Linux implementations too, e.g. in libbsd in Debian), and getline(3) from glibc has made it into SUSv3 (POSIX) 2008. So what’s the point?
I strongly recommend you #define LINEBUFFERLEN 100
and then replace every instance of 100 with LINEBUFFERLEN so that you (or another programmer) don’t change your buffer length, forget to change it in every single location and end up with another stack overflow just like the one you were trying to fix in the first place.