If you'll allow me a bit of self-indulgence for a moment...
Sometimes I despair about the current state of CS education. But in those moments I remember that I started off with 8-bit interpreted BASIC, and I think I probably turned out okay. The simple fact is this: Writing good software is
hard, and no matter what language or system you start with, there is a crapload to unlearn when you get to your second and third.
Nonetheless, teachers shouldn't outright lie to students. Pointers are neither good nor bad; they are a fact of life.
Java doesn't give you pointers; it gives you managed references. There are two main differences between a pointer and a managed reference:
- You can do arithmetic on pointers (including loading any value at all into a pointer). You can't "invent" references in this way; they must be given to you by the runtime system (in Java, that means "new"). There's a short sketch of this just after the list.
- Managed references "own" the thing they point to (an "object" in memory-management speak, though that term is a little confusing because "object" means something more specific in Java). This means that when all references to the thing are destroyed, the thing itself is destroyed.
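To make the first difference concrete, here is a minimal C++ sketch (C++ rather than Java, since Java has no way to express the pointer side of the comparison at all; the names and values are mine, purely for illustration):
| #include <cstdio>
|
| int main() {
|     int values[4] = {10, 20, 30, 40};
|
|     // Pointer arithmetic: step through memory by hand.
|     int *p = values;
|     p = p + 2;                   // now points at values[2]
|     std::printf("%d\n", *p);     // prints 30
|
|     // "Inventing" a pointer from an arbitrary integer. The compiler
|     // lets you; actually dereferencing it is undefined behaviour.
|     int *q = reinterpret_cast<int *>(0xDEADBEEF);
|     (void)q;
|
|     // A Java reference can do neither of these things; the only way
|     // to get one is to be handed it by the runtime, via "new".
|     return 0;
| }
|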
References, and managed references in particular, are there for safety and convenience. Like all such devices, they are there to make your life a bit easier in the common case, and to help you not hurt yourself. Whether or not you think that pointers are
bad, they are undeniably powerful, and that power is exactly what lets them cause a lot of problems if you don't know what you are doing.
Java references, in particular, are
extremely safe, to the point that there are mathematically provable senses in which Java code cannot possibly "go wrong" at run time. They may not be the senses that you care about, of course, but they are there nonetheless. I think it's significant that in 20+ years of Java, there has not (to my knowledge) ever been a case where a security hole was attributable to a bug or design flaw in the Java virtual machine. It's a shame we can't say the same of the Java standard library and its broken SecurityManager model, but you can't have everything.
So why pointers? The short answer is that pointers are what the machine is actually doing. Pointers are not like false gods, of which you must know nothing lest you be tempted away from the truth. If you don't understand pointers, you will never understand what the machine is actually doing, which means you will never be able to write code at that level yourself.
Managed references are implemented in terms of pointers. Indeed, in C++, managed references are a library; see std::shared_ptr for details. Similarly, the people who wrote the Java runtime, the JIT compiler, and so on, understand pointers extremely well.
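As a rough sketch of what a managed reference looks like when you build it out of pointers, here is std::shared_ptr in action (the Thing type and its values are made up for illustration):
| #include <cstdio>
| #include <memory>
|
| struct Thing {
|     int value;
|     explicit Thing(int v) : value(v) {}
|     ~Thing() { std::printf("Thing destroyed\n"); }
| };
|
| int main() {
|     // A shared_ptr is a managed reference: it owns the Thing it points to.
|     std::shared_ptr<Thing> a = std::make_shared<Thing>(42);
|     std::shared_ptr<Thing> b = a;    // two references, one Thing
|
|     a.reset();                       // one reference left; the Thing survives
|     std::printf("%d\n", b->value);   // prints 42
|     b.reset();                       // last reference gone; the Thing is destroyed
|
|     // Underneath, shared_ptr is just an ordinary pointer plus a reference
|     // count: pointers all the way down.
|     return 0;
| }
|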
Larry Wall, the creator of Perl (which is written in C), was once famously asked about how to do something like an array of pointers into a struct in Perl. His answer: "If you want to program in C, program in C. It's a nice language. I use it occasionally..."
Now, on the topic of the specific concerns...
- Make code difficult to read.
Every programming language has something in it which other people find difficult to read. Many people find Haskell's type system syntax impenetrable. That's because it's both extremely concise and extremely powerful. Something has to give.
One of the common complaints about readability in Java is that it is too verbose. For example, in this common construction:
| Something something = new Something();
|
How many times do you really need to tell the compiler that it's a "Something"?
Or what about the endless qualification keywords which sometimes mean you're half-way across the screen before you get to the name of a member function?
| public static synchronized void foo() { }
|
Yes, pointer syntax could be done differently. Many, if not most, C and C++ programmers think the distinction between the "." operator and the "->" operator is pointless and stupid, for example. I personally like the pointer syntax in Bliss/11 better. But really, this is a bit of a silly complaint. Just because the syntax is suboptimal doesn't mean the underlying model is broken.
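For anyone who hasn't met the two operators, this is the distinction being complained about, in a few lines of C++:
| struct Point { int x; };
|
| int main() {
|     Point pt{1};
|     Point *pp = &pt;
|
|     pt.x  = 2;     // "." when you have the object itself
|     pp->x = 3;     // "->" when you have a pointer to it
|     (*pp).x = 4;   // ...which is just shorthand for this
|     return 0;
| }
|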
- Easy for pointers to point to nothing.
Well, it's easy for Java references to point to nothing, too.
Tony Hoare publicly apologised for inventing the null reference, but it's done now and we just have to make the best of it.
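In C++ terms, the two failure modes look almost identical (a tiny sketch of my own; the Java analogue of dereferencing either one is a NullPointerException rather than undefined behaviour):
| #include <memory>
|
| int main() {
|     int *p = nullptr;          // a raw pointer to nothing
|     std::shared_ptr<int> r;    // a managed reference to nothing
|
|     // Dereferencing either one is an error. For the raw pointer it is
|     // undefined behaviour; for a Java reference the runtime catches it
|     // and throws NullPointerException: safer, but still a bug.
|     if (p != nullptr) { /* ... */ }   // the check you have to write either way
|     if (r != nullptr) { /* ... */ }
|     return 0;
| }
|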
- Easy for the data referenced by a pointer to be lost forever (if the pointer gets changed and the data is not referenced anywhere else).
Right, and this is the "managed" part. On the other hand, the lack of management also makes it easy to concoct pointers which don't "own" the data that they point to (e.g. they might point into the middle of an allocation unit), and in low-level programming that's often precisely what you want.
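Here is a short sketch of that kind of non-owning pointer, pointing into the middle of a single allocation (the buffer contents are invented for illustration):
| #include <cstdio>
| #include <vector>
|
| int main() {
|     // One allocation unit: a byte buffer owned by the vector.
|     std::vector<unsigned char> packet = {0x01, 0x02, 0xAB, 0xCD, 0xEF};
|
|     // A raw pointer into the *middle* of that allocation. It owns nothing;
|     // it is just a view of the last three bytes (say, the packet payload).
|     unsigned char *payload = packet.data() + 2;
|     std::size_t payload_len = packet.size() - 2;
|
|     for (std::size_t i = 0; i < payload_len; ++i)
|         std::printf("%02X ", static_cast<unsigned>(payload[i]));
|     std::printf("\n");   // prints: AB CD EF
|
|     // A managed reference has no way to say "a view into the middle of
|     // that object, owned by someone else", which is the whole point here.
|     return 0;
| }
|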
Having said that, from the perspective of a modern declarative programming language, this complaint could equally be made about Java-style references. It would be even easier never to lose track of references to data if you couldn't modify variables
at all, and you get improved thread safety into the bargain. There are useful programming languages which implement this quite successfully.
Imagine a 2D plot with "safety" on one axis and "power" on the other. Assembly language pointers have extreme power and no safety. C pointers have pretty high power and relatively low safety (they are type checked, so it's not "no safety"). Pascal references are lower power and slightly higher safety. Java references are even lower power but quite high safety. Haskell "references" (to the extent that they're a thing at all) are very low power and very high safety. Every programming language which has a concept like this can be placed on this diagram.
The interesting part of the diagram is the convex hull. I think it would be fair to say that both C pointers and Java references lie on that convex hull, in the sense that there is no programming language which beats them on both axes. Any pointer/reference model which lies on that convex hull represents a "sweet spot". That part of the diagram is where the interesting tradeoffs lie.