If you'll allow me a bit of self-indulgence for a moment...
Sometimes I despair about the current state of CS education. But in those moments I remember that I started off with 8-bit interpreted BASIC, and I think I probably turned out okay. The simple fact is this: Writing good software is
hard, and no matter what language or system you start with, there is a crapload to unlearn when you get to your second and third.
Nonetheless, teachers shouldn't outright lie to students. Pointers are neither good nor bad; they are a fact of life.
Java doesn't give you pointers; it gives you managed references. There are two main differences between a pointer and a managed reference:
- You can do arithmetic on pointers (including loading any value at all into a pointer). You can't "invent" references in this way; they must be given to you by the runtime system (in Java, that means "new"). There's a short sketch of this just after the list.
- Managed references "own" the thing they point to (an "object" in memory-management speak, though that term is a little confusing because "object" means something more specific in Java). This means that when all references to the thing are destroyed, the thing itself is destroyed.
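To make the first difference concrete, here is a minimal C++ sketch (C++ rather than Java, since Java has no way to express the pointer side of the comparison at all; the names and values are mine, purely for illustration):
| #include <cstdio>
|
| int main() {
|     int values[4] = {10, 20, 30, 40};
|
|     // Pointer arithmetic: step through memory by hand.
|     int *p = values;
|     p = p + 2;                   // now points at values[2]
|     std::printf("%d\n", *p);     // prints 30
|
|     // "Inventing" a pointer from an arbitrary integer. The compiler
|     // lets you; actually dereferencing it is undefined behaviour.
|     int *q = reinterpret_cast<int *>(0xDEADBEEF);
|     (void)q;
|
|     // A Java reference can do neither of these things; the only way
|     // to get one is to be handed it by the runtime, via "new".
|     return 0;
| }
|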
References, and managed references in particular, are there for safety and convenience. Like all such devices, they are there to make your life a bit easier in the common case, and to help you not hurt yourself. Whether or not you think that pointers are
bad, they are undeniably powerful, and that power is exactly what lets them cause a lot of problems if you don't know what you are doing.
Java references, in particular, are
extremely safe, to the point that there are mathematically provable senses in which Java code cannot possibly "go wrong" at run time. They may not be the senses that you care about, of course, but they are there nonetheless. I think it's significant that in 20+ years of Java, there has not (to my knowledge) ever been a case where a security hole was attributable to a bug or design flaw in the Java virtual machine. It's a shame we can't say the same of the Java standard library and its broken SecurityManager model, but you can't have everything.
So why pointers? The short answer is that pointers are what the machine is actually doing. Pointers are not like false gods, of which you must know nothing lest you be tempted away from the truth. If you don't understand pointers, you will never understand what the machine is actually doing, which means you will never be able to write code at that level yourself.
Managed references are implemented in terms of pointers. Indeed, in C++, managed references are a library; see std::shared_ptr for details. Similarly, the people who wrote the Java runtime, the JIT compiler, and so on, understand pointers extremely well.
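As a rough sketch of what a managed reference looks like when you build it out of pointers, here is std::shared_ptr in action (the Thing type and its values are made up for illustration):
| #include <cstdio>
| #include <memory>
|
| struct Thing {
|     int value;
|     explicit Thing(int v) : value(v) {}
|     ~Thing() { std::printf("Thing destroyed\n"); }
| };
|
| int main() {
|     // A shared_ptr is a managed reference: it owns the Thing it points to.
|     std::shared_ptr<Thing> a = std::make_shared<Thing>(42);
|     std::shared_ptr<Thing> b = a;    // two references, one Thing
|
|     a.reset();                       // one reference left; the Thing survives
|     std::printf("%d\n", b->value);   // prints 42
|     b.reset();                       // last reference gone; the Thing is destroyed
|
|     // Underneath, shared_ptr is just an ordinary pointer plus a reference
|     // count: pointers all the way down.
|     return 0;
| }
|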
Larry Wall, the creator of Perl (which is written in C), was once famously asked about how to do something like an array of pointers into a struct in Perl. His answer: "If you want to program in C, program in C. It's a nice language. I use it occasionally..."
Now, on the topic of the specific concerns...
- Make code difficult to read.
Every programming language has something in it which other people find difficult to read. Many people find Haskell's type system syntax impenetrable. That's because it's both extremely concise and extremely powerful. Something has to give.
One of the common complaints about readability in Java is that it is too verbose. For example, in this common construction:
| Something something = new Something();
|
How many times do you really need to tell the compiler that it's a "Something"?
Or what about the endless qualification keywords which sometimes mean you're half-way across the screen before you get to the name of a member function?
| public static synchronized void foo() { }
|
Yes, pointer syntax could be done differently. Many, if not most, C and C++ programmers think the distinction between the "." operator and the "->" operator is pointless and stupid, for example. I personally like the pointer syntax in Bliss/11 better. But really, this is a bit of a silly complaint. Just because the syntax is suboptimal doesn't mean the underlying model is broken.
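For anyone who hasn't met the two operators, this is the distinction being complained about, in a few lines of C++:
| struct Point { int x; };
|
| int main() {
|     Point pt{1};
|     Point *pp = &pt;
|
|     pt.x  = 2;     // "." when you have the object itself
|     pp->x = 3;     // "->" when you have a pointer to it
|     (*pp).x = 4;   // ...which is just shorthand for this
|     return 0;
| }
|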
- Easy for pointers to point to nothing.
Well, it's easy for Java references to point to nothing, too.
Tony Hoare publicly apologised for inventing the null reference, but it's done now and we just have to make the best of it.
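In C++ terms, the two failure modes look almost identical (a tiny sketch of my own; the Java analogue of dereferencing either one is a NullPointerException rather than undefined behaviour):
| #include <memory>
|
| int main() {
|     int *p = nullptr;          // a raw pointer to nothing
|     std::shared_ptr<int> r;    // a managed reference to nothing
|
|     // Dereferencing either one is an error. For the raw pointer it is
|     // undefined behaviour; for a Java reference the runtime catches it
|     // and throws NullPointerException: safer, but still a bug.
|     if (p != nullptr) { /* ... */ }   // the check you have to write either way
|     if (r != nullptr) { /* ... */ }
|     return 0;
| }
|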
- Easy for the data referenced by a pointer to be lost forever (if the pointer gets changed and the data is not referenced anywhere else).
Right, and this is the "managed" part. On the other hand, the lack of management also makes it easy to concoct pointers which don't "own" the data that they point to (e.g. they might point into the middle of an allocation unit), and in low-level programming that's often precisely what you want.
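Here is a short sketch of that kind of non-owning pointer, pointing into the middle of a single allocation (the buffer contents are invented for illustration):
| #include <cstdio>
| #include <vector>
|
| int main() {
|     // One allocation unit: a byte buffer owned by the vector.
|     std::vector<unsigned char> packet = {0x01, 0x02, 0xAB, 0xCD, 0xEF};
|
|     // A raw pointer into the *middle* of that allocation. It owns nothing;
|     // it is just a view of the last three bytes (say, the packet payload).
|     unsigned char *payload = packet.data() + 2;
|     std::size_t payload_len = packet.size() - 2;
|
|     for (std::size_t i = 0; i < payload_len; ++i)
|         std::printf("%02X ", static_cast<unsigned>(payload[i]));
|     std::printf("\n");   // prints: AB CD EF
|
|     // A managed reference has no way to say "a view into the middle of
|     // that object, owned by someone else", which is the whole point here.
|     return 0;
| }
|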
Having said that, from the perspective of a modern declarative programming language, this complaint could equally be made about Java-style references. It would be even easier never to lose track of references to data if you couldn't modify variables
at all, and you get improved thread safety into the bargain. There are useful programming languages which implement this quite successfully.
Imagine a 2D plot with "safety" on one axis and "power" on the other. Assembly language pointers have extreme power and no safety. C pointers have pretty high power and relatively low safety (they are type checked, so it's not "no safety"). Pascal references are lower power and slightly higher safety. Java references are even lower power but quite high safety. Haskell "references" (to the extent that they're a thing at all) are very low power and very high safety. Every programming language which has a concept like this can be placed on this diagram.
The interesting part of the diagram is the convex hull. I think it would be fair to say that both C pointers and Java references lie on that convex hull, in the sense that there is no programming language which beats them on both axes. Any pointer/reference model which lies on that convex hull represents a "sweet spot". That part of the diagram is where the interesting tradeoffs lie.