Casey's programming methods and encapsulation

In video 42 Casey was asked about private variables. He responded that he never uses them, that they exist only to prevent bugs (bugs he doesn't really have issues with), and that consequently all his variables are just public. While I agree that walling off variables can make classes harder to work with and shouldn't be done willy-nilly, isn't it still desirable to have a well-defined place where certain variables can be manipulated, thus reducing their scope and increasing encapsulation? Increased encapsulation is desirable because you can modify the underlying data of a class without having to change the client code that uses that class's interface, reducing the amount of work that needs to be done.

Edited by Jason on
In the end, I think this pretty much comes down to personal preference and the situation you're in. Casey can be quite opinionated about certain things, and it's always interesting to hear what he thinks about different ways of doing things, but ultimately it's nothing more than his opinion, and it's okay to disagree with him since he can only speak from his own experience.

As with everything in engineering, it's a trade-off.

In most of my projects I tend to go a similar route to Casey and just leave everything as being public. My reasoning for this is that it's easier to just use a member on a struct than it is to use getters/setters all the time. Also, in a compiled language, I can just change a member to a getter/setter when I like and the compiler will tell me which lines in my code I need to change.
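
As a minimal sketch of that workflow (hypothetical names, not code from the video): a plain public member can later be swapped for accessors, and the compiler flags every call site still using the old member.

// Hypothetical example: the struct starts out with a plain public member.
struct Player {
    float health;   // used directly everywhere: player.health -= damage;
};

// Later, if the access pattern needs to change, the member is hidden and
// accessors are added. Every old line like "player.health -= damage;" now
// fails to compile, so the compiler produces the list of places to update.
class PlayerV2 {
  public:
    float GetHealth() const { return _health; }
    void  SetHealth(float health) { _health = health; }

  private:
    float _health;
};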

Perhaps private by default is more useful in non-compiled languages since it becomes more difficult to reliably find and replace all usages of a struct member in a larger project.

Personally, when I write code in my projects I absolutely hate dealing with getters/setters unless it's actually necessary. But again this could just be a bias of mine that comes from the fact I work with a compiler that tells me where I need to update some code that uses a struct member.

If you're writing a project that many people rely on, and you don't want to needlessly break compatibility with people between versions then it could be useful to just start out with getters/setters to avoid breaking other people's code at a later time just because you changed your mind about how a member should be accessed.

In some cases though, even if you do use getters/setters, a change in the processing of the data in either of those functions could break code anyway.
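
For instance (a hypothetical sketch, not from anyone's codebase): the accessor's signature can stay identical while its behavior changes, which still breaks callers that depended on the old behavior.

class Temperature {
  public:
    // Version 1 returned the value in Celsius.
    // Version 2 keeps the same signature but now returns Fahrenheit;
    // any caller that assumed Celsius is silently wrong after the change.
    float GetValue() const { return _celsius * 9.0f / 5.0f + 32.0f; }

  private:
    float _celsius;   // stored internally in Celsius
};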

There are too many examples to try and list, but again, it depends a lot on the situation and the trade-offs you're willing to make; there is no silver bullet.
In the end, private variables are about removing temptation and dealing with other programmers using the same codebase.

Having a variable behind a fence means you cannot simply poke at it and possibly invalidate some invariant you need to maintain. Forcing all updates to a variable to go through a specific bit of code also means you can put a breakpoint there (helpful when your debugger doesn't have data breakpoints or you want to monitor many objects).
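
A minimal sketch of that idea (hypothetical names): with every write funneled through one setter, a single breakpoint catches all updates, across every instance.

class Health {
  public:
    void SetValue(u32 value) {
        _value = value;   // breakpoint here catches *every* write, for all
                          // objects, even without hardware data breakpoints
    }
    u32 GetValue() const { return _value; }

  private:
    u32 _value;
};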

If you and your team have the self-discipline and communication to stick to the plan regarding the variables of an object, then you don't need private.

But as soon as there is that one guy on the team who will just poke at random variables when he doesn't need to, or you yourself get tempted too easily, you need to add some kind of barrier between the programmer and the variable.
Well, the main reason I was struggling a bit with its use was that I was always taught to reduce the scope of variables/data when possible, for the same reasons you don't want global variables. Having many possible places where data can be accessed can make it harder to reason about the state of your program at any given point, which becomes especially troublesome when trying to debug a particular issue. Though I don't think public variables are quite as bad as global variables, given that most of the objects you create will be scoped in some way. I guess what I'm trying to wrap my head around is this: while I know even global variables have their uses, the general consensus is that you very much want to limit them, so why wouldn't the same logic apply to having most of your class variables public by default?

Edited by Jason on
I think the need for public/private class data members fades as you transition from a class-based (OOP) style to a function/procedure-based one (imperative, functional).

When you're thinking about classes first, you tend to group together data that looks like it should belong to a class, even though the actual access patterns for that data can be wildly different. On the other hand, when functions are at the center of the thought process, you tend to group data that is generally accessed together, which makes it harder to break the invariants.

For example:
class CartesianDistance {
  public:
    void SetX(u32 x) { _x = x; ComputeDistance(); }
    void SetY(u32 y) { _y = y; ComputeDistance(); }

    u32 GetDistance() { return _dist; }

  private:
    void ComputeDistance() { _dist = _x + _y; }

    u32 _x;
    u32 _y;
    u32 _dist;
};

// usage code
// read some data from a file
CartesianDistance* cd = manager->CreateCartesianDistance();
cd->SetX(readX);
cd->SetY(readY);

// later on
void MapGenerator::Generate() {
  foreach(...) {
    u32 dist = manager->GetCartesianDistance(...)->GetDistance();
    // do something with dist
  }
}


vs.

u32 CartesianDistance(u32 x, u32 y) {
  u32 result = x + y;
  return result;
}

// usage code
// read some data from a file

void GenerateMap(void) {
  foreach(...) {
    u32 dist = CartesianDistance(distances[idx].x, distances[idx].y);
    // do something with dist
  }
}


In this very simplified example, the class-based approach cannot give public access to any of its member variables, because writing to any of them would break the invariant (i.e. _dist needs to hold the correct Cartesian distance). However, in the function-based implementation, we don't care where x & y come from, since we compute the distance when we need it.
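
To make that concrete, here is a sketch (continuing the hypothetical usage code above) of what a public _x would allow:

// If _x were public, nothing stops a direct write that skips ComputeDistance():
cd->_x = 10;                      // _dist is now stale
u32 dist = cd->GetDistance();     // still reflects the old _x value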

I'll leave the pros & cons of each solution out of this discussion. Hopefully it's clear enough that this is an extremely simple example for illustration purposes only.
I didn't know how to put my thoughts on this topic into words--basically what I'd say is that it irks me that this discussion is entirely code-centric. It's like builders talking about which hammer they use, and what sort of grip the hammer should have (s/builders/programmers, s/hammer/code, s/grip/language features). Code is a tool--that's it. Programmers might supply some code for themselves or others, and if that code isn't a useful tool, then it's not well-written. Being a useful tool means approaching the easiest possible thing to use while also approaching the best possible solution to the problem.

My point is, I guess, just worry about the problem at hand. Development shouldn't be about worrying about what golden rule you're breaking or whether your code is 'clean' enough. It should be about solving problems.

My development process is to write the simplest possible thing that solves the problem, pull out the packages of data that need to be replicated during the process, and pull out operations that are done in multiple spots into functions. At the end, the things that are actually done multiple times are very easy to modify, and so is the data the system works with.

In my experience, the above process never really results in accessors/mutators, data-hiding, or member functions--it also produces code that is really easy to reason about and use.
Delix
In my experience, the above process never really results in accessors/mutators, data-hiding, or member functions--it also produces code that is really easy to reason about and use.


Easy for who? ...For you obviously. Not everyone is the same and something that's intuitive to reason about for you may not be for someone other than you. There is no single approach or set of approaches when it comes to this stuff, people's brains often work radically differently from one another.

Brute-forcing simple solutions and only worrying about the problem at hand is fine if you're dealing with a small program. But it isn't until you approach the hundreds of thousands of lines of code that architectural decisions, like encapsulation schemes, begin to pay dividends. For larger projects, minimizing complexity is crucial, and breaking your program into discrete black boxes (ideally) that communicate (through well-defined interfaces) with as few of each other as necessary should almost always be a priority.
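
As a rough sketch of that black-box idea in C-style C++ (hypothetical module and names): the rest of the program sees only an opaque handle and a few functions, so the internals can change without rippling outward.

#include <cstdlib>   // for std::calloc / std::free in this sketch

// What clients would see (normally in audio.h): an opaque type plus a narrow
// interface. Client code cannot depend on the internal layout.
struct AudioSystem;
AudioSystem *AudioInit(int sample_rate);
void         AudioPlaySound(AudioSystem *audio, int sound_id);
void         AudioShutdown(AudioSystem *audio);

// Implementation details (normally in audio.cpp), free to change without
// touching client code.
struct AudioSystem {
    int sample_rate;
    int active_voices;
    // ... mixer state, loaded sounds, etc.
};

AudioSystem *AudioInit(int sample_rate) {
    AudioSystem *audio = (AudioSystem *)std::calloc(1, sizeof(AudioSystem));
    audio->sample_rate = sample_rate;
    return audio;
}

void AudioPlaySound(AudioSystem *audio, int sound_id) {
    (void)sound_id;                // placeholder: a real mixer would start a voice
    audio->active_voices += 1;
}

void AudioShutdown(AudioSystem *audio) {
    std::free(audio);
}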

NelsonMandella
Not everyone is the same and something that's intuitive to reason about for you may not be for someone other than you. There is no single approach or set of approaches when it comes to this stuff, people's brains often work radically differently from one another.


I completely agree with this, which is why I think introducing a greater number of abstractions from the hardware is a problem for understandable code. The common language for all programmers, irrespective of how one might mentally model a particular problem, is hardware. Failing to understand the hardware, then, is detrimental to the ability of multiple programmers to communicate--it is the common ground for all mental models.

When code is framed around objects/class hierarchies/"real world representations", that common language ceases to exist--all subjective interpretations are equally valid. This is why I claim that writing something that is as simple as possible and that is purely dictated by the computational realities of the problem is best for understandable code.

To respond more directly to your response, NelsonMandella, I simply don't see the justification behind the argument that a software development problem should ever deviate from the problem at hand. An API inside of a codebase should provide a simple way to do its part in the solution of a larger problem. Complexity will increase as projects get larger, of course, but introducing new things to the problem isn't going to reduce that complexity.

Edited by Ryan Fleury on
Delix
I completely agree with this, which is why I think introducing a greater number of abstractions from the hardware is a problem for understandable code. The common language for all programmers, irrespective of how one might mentally model a particular problem, is hardware. Failing to understand the hardware, then, is detrimental to the ability of multiple programmers to communicate--it is the common ground for all mental models.

When code is framed around objects/class hierarchies/"real world representations", that common language ceases to exist--all subjective interpretations are equally valid. This is why I claim that writing something that is as simple as possible and that is purely dictated by the computational realities of the problem is best for understandable code.


For each additional abstraction there's definitely a cost/benefit consideration. Sometimes tons of abstractions make sense in terms of reducing complexity for particular people in particular situations, and sometimes they don't. But yeah, certainly unnecessary abstractions, like what we're used to seeing in typical C++ OOP-centric code, are something to be avoided. And yes, OOP-centric paradigms do abstract people away from the reality of what the program is actually doing.


Delix

To respond more directly to your response, NelsonMandella, I simply don't see the justification behind the argument that a software development problem should ever deviate from the problem at hand. An API inside of a codebase should provide a simple way to do its part in the solution of a larger problem. Complexity will increase as projects get larger, of course, but introducing new things to the problem isn't going to reduce that complexity.


Within the context of a large project, beelining for the simplest, most elegant solution to a given problem will not necessarily have favorable implications for your overall project's architecture. Packaging the module, defining the interface, and encapsulating its data are largely architectural decisions that may add complexity to the module itself while at the same time reducing the complexity of the overall program.
NelsonMandella
Within the context of a large project, beelining for the simplest, most elegant solution to a given problem will not necessarily have favorable implications for your overall project's architecture. Packaging the module, defining the interface, and encapsulating its data are largely architectural decisions that may add complexity to the module itself while at the same time reducing the complexity of the overall program.


I don't disagree, and in the cases where simplicity in the API is inherently at odds with simplicity in the implementation of the API, it's most certainly a decision that involves various tradeoffs. All I can really say about that is that which direction one chooses to favor is completely dependent on their project/team/<insert other relevant circumstances here>.

I was just specifying my ideal methodology and describing the characteristics of an "ideal" solution (if there were ever one to exist): That which approaches computational efficiency and API simplicity. As you described, these might not always work hand-in-hand, which as I mentioned previously, is an architectural decision with an answer that depends on the circumstances.
Within the context of a large project, beelining for the simplest, most elegant solution to a given problem will not necessarily have favorable implications for your overall project's architecture. Packaging the module, defining the interface, and encapsulating its data are largely architectural decisions that may add complexity to the module itself while at the same time reducing the complexity of the overall program.

I think this is what I was really trying to get at. I guess, since I've never been a part of a large software project, I'm just trying to figure out how I should frame my mental model of my coding process. It seems, from what I can gather (and please correct me if I'm wrong), that the larger and more complex your project gets, the more important encapsulation and well-defined interfaces become. Is this because at a certain point it becomes less costly to have slightly more complicated, harder-to-work-with class hierarchies with higher encapsulation than to have less complicated, more flexible classes that are less encapsulated? If so, is this because more publicly accessible, flexible code can make it harder to reason about the state of a program at any given point, and thus generally makes it harder to debug?

Edited by Jason on
boagz57
It seems, from what I can gather (and please correct me if I'm wrong), that the larger and more complex your project gets, the more important encapsulation and well-defined interfaces become. Is this because at a certain point it becomes less costly to have slightly more complicated, harder-to-work-with class hierarchies with higher encapsulation than to have less complicated, more flexible classes that are less encapsulated?


Generally speaking, the costs of properly black-boxing your code (to the extent possible) are offset by the overall reduction in program complexity. Simpler programs are easier to work with. Whether it's trying to figure out what something does, integrating a new module, or just general refactoring, all of these should take less time if the underlying program is simple rather than complex. The same general principles that apply to engineering a Boeing 787 also apply to software architecture.

boagz57
If so, is this because more publicly accessible, flexible code can make it harder to reason about the state of a program at any given point, and thus generally makes it harder to debug?


You don't simply black-box something to make it easier to reason about (that is more of an ancillary benefit); you do it because it makes the entire system more robust by cutting down on interdependencies (probably the most important driver of complexity for large software systems), which in turn makes everything you do with it easier in the long run.


When it comes to large, complex programs, the best programmers figure out how to minimize cognitive load (for lack of a better term). Newbies strain to fit all sorts of unnecessary complexities into their heads, memorizing things and writing convoluted algorithms, while more advanced programmers write systems in such a way that they don't need to memorize anything, because each module is so simple that the overhead of understanding it is low. Great programmers tend to spend more time than novices on things like naming variables and designing interfaces, but the payoffs are greatly asymmetrical in the long run because, generally speaking, you'll be more productive writing code in a simple system than in a complex one.
boagz57
Well, the main reason I was struggling a bit with its use was that I was always taught to reduce the scope of variables/data when possible, for the same reasons you don't want global variables. Having many possible places where data can be accessed can make it harder to reason about the state of your program at any given point, which becomes especially troublesome when trying to debug a particular issue. Though I don't think public variables are quite as bad as global variables, given that most of the objects you create will be scoped in some way. I guess what I'm trying to wrap my head around is this: while I know even global variables have their uses, the general consensus is that you very much want to limit them, so why wouldn't the same logic apply to having most of your class variables public by default?


That goes back to the discipline argument: you can have globals all over the place and still have a decent, functional, and maintainable program. Access modifiers and such are a way to have the compiler enforce a certain access pattern.

In fact, a lot of old console titles did exactly that, because they didn't have time to spend passing the pointer to a singleton god object around: there's only one copy of the state you need to keep, so why even bother pretending that parameter could point anywhere other than that one statically allocated block of memory?
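
A rough sketch of that style (hypothetical names, u32 assumed to be a typedef as in the earlier examples): a single statically allocated state block that everything accesses directly.

struct GameState {
    u32   level;
    float player_x;
    float player_y;
};

static GameState g_state;        // the one and only copy, statically allocated

void MovePlayer(float dx, float dy) {
    g_state.player_x += dx;      // no GameState* threaded through every call
    g_state.player_y += dy;
}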

Also, in a lot of embedded development, globals are the best way to keep your state around.
For a typical game engine, the level of parameter passing required to avoid the use of globals for a single game state makes it impractical. The benefit is nearly non-existent for me personally while the hit to aesthetic cleanliness and readability is significant. I would guess that Casey feels differently but then again his entire naming scheme is unconventional in its verbosity.
I think that this kind of discussion (using globals, private vs. public, etc.) tends to generate a lot of "general aesthetic" opinions, because the languages at hand (here C/C++) are not really good at providing what we want in terms of invariants.

All the arguments ultimately revolve around how we enforce invariants or "state consistency" in a program, so that we can reason about it. E.g. "you should use accessors and not touch global shared state to ensure your data is always in a consistent state", vs. "you don't want the language to get in your way, and you can be disciplined enough not to touch what you're not supposed to touch".

C-style code doesn't help you enforce much beyond minimal type correctness, so it's easy to have wrong assumptions about what the state of the program can be when a particular piece of code is executed, which is the cause of most subtle bugs.

On the other hand, C++ pretends to offer that, but in fact it decides on your behalf what kind of assumptions you want to make, and furthermore it couples the "invariants" problem with encapsulation, and encapsulation with objects. In my opinion that's unfortunate, because these three things have little in common.

More often, I would like features that enforce invariants at the function/block level, i.e. declaring which state can be touched in a particular block of code (e.g. captures/lambdas/something yet to be invented), rather than declaring which state can be touched inside a particular block of data (private/public).
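
For what already exists today in C++, an explicit lambda capture list is a loose approximation of this (a small illustrative sketch, not a full solution): the capture list declares exactly which outside state the block may touch, and the compiler enforces it.

#include <cstdio>

int main() {
    int score = 0;
    int lives = 3;

    // Only 'score' is captured (by reference); the body cannot touch 'lives'.
    auto addPoints = [&score](int points) {
        score += points;
        // lives -= 1;           // would not compile: 'lives' is not captured
    };

    addPoints(10);
    std::printf("score = %d, lives = %d\n", score, lives);
    return 0;
}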

If we had semantics that allowed us to choose which invariants we want to be true inside a particular piece of code, there would be no point in all these "best practices" opinions, because you would declare the invariants you need to reason about your code, and the discussion could not be detached from that particular piece of code.

Edited by Martin Fouilleul on