Understanding the C linker: Internal/External linkage

Several episodes have mentioned how we can hint the linker to work the way we need with regard to accessibility between the different compile units.
Specifically, how internal and external linkage work for each compile unit, and how they can affect link times or produce linker errors if applied incorrectly.
After experimenting with a minimal code example, I want to share what I put together and the results.

I will appreciate any corrections, and will try to keep the discussion within the C realm, though I have found that C++ classes might also be a topic of interest when trying to further understand how the linker works.

Specs: Visual Studio Community 2019 (MSVC), Win10 Pro




Multiple compile units

Command line
cl /c -Z7 ModuleA.cpp
cl /c -Z7 ModuleB.cpp
dumpbin /SYMBOLS ModuleA.obj > ModuleA.dmp
dumpbin /SYMBOLS ModuleB.obj > ModuleB.dmp
cl -Z7 linker.cpp ModuleA.obj ModuleB.obj /link
dumpbin /SYMBOLS linker.obj > linker.dmp


linker.cpp
#include "ModuleA.h"
#include "ModuleB.h" 

int main()
{
    externalG();
    internalA = 9; // internal linkage, behaves like a local variable
    // internalF(); // error! function not accessible, internal linkage
    externalF();
    externalA = 19; // external linkage
    return (0);
}



ModuleA.h/cpp: according to the documentation, internalA should not be visible to any compile unit other than ModuleA
#if !defined(MODULEA_H)
#define MODULEA_H

static int internalA; // internal linkage

extern int externalA; // external linkage

inline void internalF(); // internal linkage
extern void externalF(); // external linkage

#endif

#include "ModuleA.h"

void internalF()
{
    internalA = 2;
    externalA = 23;
}

void externalF()
{
    internalA = 22;
}



ModuleB.h/cpp

#if !defined(MODULEB_H)
#define MODULEB_H

extern void externalG(); // external linkage

#endif


#include "ModuleB.h"
#include "ModuleA.h"

int externalA;

void externalG()
{
    externalA = 200;
    internalA = 400; // internal linkage, behaves like a local variable
    // internalF(); // error, UNDEF symbol internalF() because of internal linkage
    //internalA = 10;
}




At this point, the interesting finding concerns internalA, declared in ModuleA.h: even though it is declared static, it can be accessed from any other module.
More importantly, it behaves like a local variable: its value is different in every scope where it is used.

Does it mean that internalA should only be declared as static in the cpp file to really behave as internal?
For those who have experience working with other compilers, should I expect the same behavior from g++ too?

The functions work as expected. For instance, ModuleB.cpp includes ModuleA.h but cannot access the internalF function, and the linker reports that as an "unresolved external symbol" error.
The externalA variable, which is defined in ModuleB, is still accessible and has global scope. All good.



Single compile unit (Unity build)

Command line
cl -Z7 linker.cpp /link
dumpbin /SYMBOLS linker.obj > linker.dmp


linker.cpp
#include "ModuleA.cpp"
#include "ModuleB.cpp" 

int main()
{
    externalG();
    internalA = 9; // internal linkage, behaves like a local variable
    internalF(); // unity build now can access the internal linkage function
    externalF();
    externalA = 19; // external linkage
    return (0);
}


internalA now really behaves as a global variable, even though it is declared static.

Again, should the internalA variable be declared in the cpp file to define the internal linkage?

Furthermore, internalF is not accessible regardless of the internal linkage. Is this result correct or a misconception on my side?

It seems that the concept of internal/external linkage is less meaningful to the linker and more informative to the coder when using a unity build?



Appreciate your feedback.



Edited by itzjac on
> More importantly, it behaves like a local variable: its value is different in every scope where it is used.
> Does it mean that internalA should only be declared as static in the cpp file to really behave as internal?


Yes, usually you put static symbols only in .cpp files. They do not make much sense in .h files, because each translation unit that includes the header gets its own independent copy of the symbol. This is normal C++ behavior, nothing MSVC-specific.
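
A minimal sketch of that layout, collapsed into one listing so it compiles on its own; the comments mark which part would live in ModuleA.h and which in ModuleA.cpp. The names setInternalA, getInternalA and checkInternalA are hypothetical and not part of the original sample.

```cpp
#include <cassert>

// ---- would live in ModuleA.h: only what other translation units may call ----
void setInternalA(int value); // hypothetical accessor, external linkage
int getInternalA();           // hypothetical accessor, external linkage

// ---- would live in ModuleA.cpp: the file-private state and the definitions ----
static int internalA = 0;     // internal linkage, a single copy private to this file

void setInternalA(int value)
{
    internalA = value;
}

int getInternalA()
{
    return internalA;
}

// Hypothetical self-check: callers go through the accessors and all of them
// observe the same single copy of internalA.
void checkInternalA()
{
    setInternalA(22);
    assert(getInternalA() == 22);
}
```

With this arrangement other translation units never see the name internalA at all, so there is no way to end up with one independent copy per includer.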

There is one more way to define functions: with "static inline" before them. It behaves similarly to "static", except that compilers won't generate a warning if the function is not used in the translation unit.
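
For illustration, a header-style helper written that way might look like the sketch below. The names clampToByte and brighten are hypothetical. Note that in C++ "inline" on its own does not give a function internal linkage; it is the "static" that does, and the body has to be visible in every translation unit that calls the function.

```cpp
#include <cassert>

// Would live in a header. 'static' gives the function internal linkage, so
// each translation unit that includes the header gets its own private copy.
static inline int clampToByte(int value)
{
    if (value < 0)   return 0;
    if (value > 255) return 255;
    return value;
}

// Hypothetical caller in some .cpp file that included the header.
int brighten(int pixel, int amount)
{
    int result = clampToByte(pixel + amount);
    assert(result >= 0 && result <= 255); // always inside the byte range
    return result;
}
```

Because the full definition travels with the header, no cross-module symbol has to be resolved for clampToByte at link time.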


> Again, should the internalA variable be declared in the cpp file to define the internal linkage?

If you do a unity build, the concept of "files" does not exist: there is only one translation unit. Everybody can access whatever symbol they want, as everything is visible in the same translation unit.
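
A rough sketch of what the compiler effectively sees after the #include lines are expanded, trimmed down to the internalA accesses from the sample. The helper observeInternalA is hypothetical and only stands in for main.

```cpp
#include <cassert>

// ---- text that came from ModuleA.h ----
static int internalA; // internal linkage, but only one translation unit exists

// ---- text that came from ModuleA.cpp ----
void externalF()
{
    internalA = 22;
}

// ---- text that came from ModuleB.cpp ----
void externalG()
{
    internalA = 400;
}

// Hypothetical stand-in for main: every part of the unit shares one internalA.
int observeInternalA()
{
    externalF();
    assert(internalA == 22); // the write from the ModuleA part is visible here
    externalG();
    return internalA;        // the write from the ModuleB part is visible too
}
```

There is a single copy of internalA because there is a single translation unit, which is why it appears to behave like a global in the unity build.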

> Furthermore, internalF is not accessible regardless of the internal linkage. Is this result correct or a misconception on my side?

Not sure what you mean by "not accessible" here. It is accessible in the unity build, as your comment says.


In unity builds you really should put the "static" prefix before all functions ("static inline" is also OK), as that will greatly improve compilation and linker speed: the compiler won't need to put the symbol names in the exported symbol table, and the linker will not need to deal with them.
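
As a small sketch of that advice, with hypothetical function names: the helpers are static and only the one function the rest of the program needs keeps external linkage.

```cpp
#include <cassert>

// Helpers marked static: internal linkage, nothing for the linker to resolve
// across object files.
static int addOne(int x) { return x + 1; }
static int twice(int x)  { return 2 * x; }

// The single entry point that keeps external linkage.
int computeAnswer(int seed)
{
    int result = twice(addOne(seed));
    assert(result == 2 * (seed + 1));
    return result;
}
```

Running dumpbin /SYMBOLS on the resulting .obj should list addOne and twice as Static and computeAnswer as External, which is one way to check the effect on your own build.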
> If you do a unity build, the concept of "files" does not exist: there is only one translation unit. Everybody can access whatever symbol they want, as everything is visible in the same translation unit.


That was my point: regardless of linkage, it seems that in a unity build the linker hints are only informative for the coder. It seems a bit excessive to keep specifying anything other than the "static inline" you mentioned.

> In unity builds you really should put the "static" prefix before all functions ("static inline" is also OK), as that will greatly improve compilation and linker speed: the compiler won't need to put the symbol names in the exported symbol table, and the linker will not need to deal with them.


Yes, that's true. Even with the smallest sample code it is already obvious that the unity build is much faster, for the reasons you mention.

Any recommendations (as a general guide) to reduce linker times when using multiple compile units?

> Not sure what you mean by "not accessible" here.

Sorry, a typo, you are correct. It is accessible in unity build.


Thanks