Specifically, I am trying to understand how internal and external linkage work for each compile unit, how they can affect link times, and how they can produce linker errors when applied incorrectly.
After experimenting with a minimal code example, I want to share what I have put together and the results.
I would appreciate any corrections. I will try to keep the discussion within the C realm, but I have found that C++ classes might also be a topic of interest when trying to further understand how the linker works.
Specs: Visual Studio Community 2019 (MSVC), Windows 10 Pro
Multiple compile units
Command line
    cl /c -Z7 ModuleA.cpp
    cl /c -Z7 ModuleB.cpp
    dumpbin /SYMBOLS ModuleA.obj > ModuleA.dmp
    dumpbin /SYMBOLS ModuleB.obj > ModuleB.dmp
    cl -Z7 linker.cpp ModuleA.obj ModuleB.obj /link
    dumpbin /SYMBOLS linker.obj > linker.dmp
linker.cpp
    #include "ModuleA.h"
    #include "ModuleB.h"

    int main()
    {
        externalG();
        internalA = 9; // internal linkage, behaves like a local variable
        // internalF(); // error! function not accessible, internal linkage
        externalF();
        externalA = 19; // external linkage
        return (0);
    }
ModuleA.h/cpp. According to the documentation, internalA should not be visible to any compile unit other than ModuleA.
    // ModuleA.h
    #if !defined(MODULEA_H)
    #define MODULEA_H

    static int internalA; // internal linkage
    extern int externalA; // external linkage

    inline void internalF(); // internal linkage
    extern void externalF(); // external linkage

    #endif

    // ModuleA.cpp
    #include "ModuleA.h"

    void internalF()
    {
        internalA = 2;
        externalA = 23;
    }

    void externalF()
    {
        internalA = 22;
    }
ModuleB.h/cpp
    // ModuleB.h
    #if !defined(MODULEB_H)
    #define MODULEB_H

    extern void externalG(); // external linkage

    #endif

    // ModuleB.cpp
    #include "ModuleB.h"
    #include "ModuleA.h"

    int externalA;

    void externalG()
    {
        externalA = 200;
        internalA = 400; // internal linkage, behaves like a local variable
        // internalF(); // error, UNDEF symbol f() because of internal linkage
        //internalA = 10;
    }
At this point, the interesting finding concerns internalA, declared in ModuleA.h: even though it is declared static, it can be accessed from any other module that includes the header.
More importantly, it behaves like a local variable: its value is different in every module that uses it.
Does this mean that internalA should only be declared static in the cpp file to really behave as internal?
For those who have experience working with other compilers, should I expect the same behavior from g++ too?
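One way to picture what seems to be happening with internalA: #include pastes the header text into every compile unit, so each unit ends up with its own static int internalA. The single-file sketch below imitates that with two namespaces standing in for ModuleA.cpp and ModuleB.cpp. The macro MODULEA_H_TEXT, the namespaces and the readInternalA helpers are made up for the illustration and are not part of my original code. Since this follows from the language rules rather than from MSVC, I would expect g++ to behave the same way, although I have not verified it.

```cpp
// Stand-in for the text of ModuleA.h (hypothetical macro, illustration only).
#define MODULEA_H_TEXT static int internalA = 0;

// Each namespace plays the role of one compile unit that includes the header.
namespace unitA {
    MODULEA_H_TEXT                      // this unit's private copy
    void externalF() { internalA = 22; }
    int readInternalA() { return internalA; }
}

namespace unitB {
    MODULEA_H_TEXT                      // a second, unrelated copy
    void externalG() { internalA = 400; }
    int readInternalA() { return internalA; }
}
```

Calling unitA::externalF() and then unitB::externalG() leaves two different values behind, one per copy, which matches what I observed across the real modules.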
The rest of the functions work as expected. For instance, ModuleB.cpp includes ModuleA.h but cannot access the internalF function, and the linker reports that error as an "unresolved external symbol".
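As far as I can tell, the unresolved symbol comes from internalF being declared inline in the header while its body lives only in ModuleA.cpp: in C++ a function declared inline needs its definition in every compile unit that calls it, and inline on its own does not give the function internal linkage. A minimal sketch of a header-style alternative follows, assuming the goal is a function that is private to each including unit. The static inline spelling and the callFromIncludingUnit helper are my assumptions, not what the original header does.

```cpp
// What the header could contain instead (sketch, not the original ModuleA.h).
static int internalA = 0;        // one private copy per including compile unit

// 'static' gives internal linkage; the body is in the header so that every
// including compile unit has a definition to call.
static inline void internalF()
{
    internalA = 2;
}

// Hypothetical caller standing in for code in ModuleB.cpp or linker.cpp.
int callFromIncludingUnit()
{
    internalF();                 // resolves within this compile unit
    return internalA;
}
```

With this layout each unit gets its own internalF working on its own internalA, so nothing is left for the linker to resolve.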
The externalA variable, which is defined in ModuleB, is still accessible and has global scope. All good.
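The externalA case follows the usual declaration/definition split: the header only declares the variable with extern, and exactly one compile unit defines it. The sketch below squeezes both parts into one file so that it compiles on its own. The comments mark where each part lives in my example, while setFromModuleA and readExternalA are hypothetical helpers added for the illustration.

```cpp
// --- ModuleA.h: declaration only, no storage is allocated here ---
extern int externalA;

// --- ModuleB.cpp: the single definition that the linker resolves to ---
int externalA = 0;

void externalG()
{
    externalA = 200;
}

// --- ModuleA.cpp: sees only the declaration, writes the same object ---
void setFromModuleA()            // hypothetical helper
{
    externalA = 23;
}

// --- linker.cpp: reads the same object again ---
int readExternalA()              // hypothetical helper
{
    return externalA;
}
```

Every write lands in the same object, which is the opposite of what happens with the static variable in the header.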
Single compile unit (Unity build)
Command line
    cl -Z7 linker.cpp ModuleA.obj ModuleB.obj /link
    dumpbin /SYMBOLS linker.obj > linker.dmp
linker.cpp
    #include "ModuleA.cpp"
    #include "ModuleB.cpp"

    int main()
    {
        externalG();
        internalA = 9; // internal linkage, behaves like a local variable
        internalF(); // unity build now can access the internal linkage function
        externalF();
        externalA = 19; // external linkage
        return (0);
    }
internalA now really behaves as a global variable, even though it is declared static.
Again, should the internalA variable be declared in the cpp file to get internal linkage?
Furthermore, internalF is now accessible despite its internal linkage. Is this result correct or a misconception on my side?
It seems that, in a unity build, the concept of internal/external linkage is less meaningful to the linker and more informative to the coder?
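My current reading is that internal linkage is still enforced in a unity build, but only one compile unit is left, so every static name is internal to that same unit. A minimal sketch follows, with the two .cpp files pasted into one file the way the #include lines would do it. Here internalF is spelled static rather than inline to make the point explicit, and valueAfterExternalG and valueAfterInternalF are hypothetical stand-ins for the calls in main.

```cpp
// --- text of ModuleA.cpp after preprocessing (sketch) ---
static int internalA = 0;                 // internal linkage, now the only copy
static void internalF() { internalA = 2; }
void externalF() { internalA = 22; }

// --- text of ModuleB.cpp after preprocessing (sketch) ---
void externalG() { internalA = 400; }     // writes the same internalA

// --- linker.cpp (hypothetical helpers instead of main) ---
int valueAfterExternalG()
{
    externalG();
    return internalA;                     // the write from "ModuleB" is visible
}

int valueAfterInternalF()
{
    internalF();                          // allowed: same compile unit
    return internalA;
}
```

Both helpers observe the writes made by the other "modules", which is consistent with internalA looking like a global in the unity build.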
I appreciate your feedback.