Integrating C and C++
Last updated Jan 1, 2003.
Using C code from a C++ program is pretty straightforward. Is the opposite possible at all? To the surprise (and delight) of many programmers, using C++ code in a C program is possible, but it requires some precautions and elbow grease. In this article I will show how to overcome some of the difficulties that might arise when you need to integrate C++ code into a C program, both at the source file level and at the binary file level. I will also explain how C and C++ differ in their handling of source-file entities and linkage, and how all that relates to integrating C++ code in a C program.
C Code Reuse
Using C code in a C++ program is rarely a problem. In fact, you do this more often than you think: on many implementations new is implemented as a call to the standard C function malloc(). Likewise, high-level fstream operations are delegated to low-level C I/O routines, and there are many other examples of reusing C code in C++. And yet, accessing C++ code from a C program is a different story.
ABI
Every C++ implementation defines its own Application Binary Interface (ABI). An ABI specifies the binary representation of a programming language's entities. This includes the underlying representation of objects in memory, how function calls are performed, the name decoration scheme of identifiers (which I will explain shortly), memory alignment requirements and so on.
When you write code in a high-level language such as C++, the compiler translates your source files into binary .obj files. In these binary files, C++ concepts such as objects, namespaces, overloaded operators, exceptions, templates and other goodies turn into entirely different beasts. For example, overloaded operators, member functions and function templates become ordinary freestanding functions with very unusual names called decorated names (also known as mangled names). Decorated names are compiler-generated strings that encode all that the compiler and the linker need to know about the original entity (function, variable, object etc.). The decorated name ensures, for example, that each overloaded version of the same function has a unique name, and that member functions with identical names and parameter lists declared in different classes have distinct names. Take for example the following class and its member functions:
class Task
{
public:
  Task();
  int suspend();
  int resume();
  int run();
  int run(int priority, int cpu_timelimit);
  int get_id() const;
  virtual ~Task();
};
The compiler generates a distinct decorated name for every member function of Task. Depending on the ABI, each decorated name typically encodes:
- The function's user-given name
- The return type
- The parameter list (the parameter types, their order and their const/volatile qualification)
- The parameter passing mechanism (a function that receives its arguments in registers will usually have a different decorated name from one that receives them on the stack)
- The enclosing class name
- Whether the function is static or virtual
- The namespace in which Task is declared.
Encoding the namespace in a member function's decorated name ensures that if two Task classes are defined in different namespaces, their member functions will still have distinct decorated names.
All this information is packed in a compact set of alphanumeric symbols concatenated into a long, cryptic and unique string -- the decorated name. Name decoration is what enables an extern function called get_id(), a member function called Task::get_id(), another member function called Person::get_id() and a function template called get_id<>() to live peacefully in the same program without causing name clashes. Consequently, when you call:
Person p; int n=p.get_id(0); //error, no arguments expected
The compiler can tell, from the very information that it encodes in the decorated name of the function being called, that:
- Person::get_id() is being called
- The argument 0 is an error; the function takes no arguments.
- Person::get_id() is a valid initializer for n because it returns an int.
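The peaceful coexistence described above can be tried out with a minimal sketch. Only the names get_id, Task and Person come from the text; the namespaces sched and batch, the function bodies and the return values are hypothetical placeholders chosen so that each entity can be told apart.

```cpp
// An ordinary global function.
int get_id() { return 1; }

// A function template with the same name; it is selected only when
// a template argument is given explicitly, e.g. get_id<int>().
template <typename T>
int get_id() { return 2; }

// Two unrelated Task classes in different (hypothetical) namespaces.
namespace sched {
class Task {
public:
    int get_id() const { return 3; }
};
}

namespace batch {
class Task {
public:
    int get_id() const { return 4; }
};
}

// A third class with an identically named member function.
class Person {
public:
    int get_id() const { return 5; }
};
```

All five functions compile and link into one program because each one ends up with its own decorated name. The actual decorated names depend on your compiler, so the sketch doesn't attempt to show them.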
As said earlier, each compiler defines its own ABI. Therefore, you should assume that different compilers will generate different decorated names for the same identifier. Under some conditions (e.g., when using two different versions of the same compiler), even the same compiler might produce different decorated names for the same identifier.
Notice that name decoration isn't unique to C++. Every high-level programming language uses some form of assigning symbols (decorated names) to source-file entities. Notice also that decorated names aren't used for functions alone. Class objects, variables, user-defined types, temporaries and in some cases even constants are assigned decorated names.
Integration and Compiler-Specific ABIs
The ABI affects the process of integrating C++ code in a C program. For example, it explains why linking object files that were produced by different compilers is likely to fail. As each compiler uses a different ABI, the linker will find different decorated names, not knowing that they actually refer to the same entity.
Because C compilers know little about C++ name decoration and because C and C++ employ different ABIs, C programs can't access a function compiled by a C++ compiler unless you explicitly disable C++ name decoration. This brings us to another related concept known as linkage. When an identifier is said to have C++ linkage, it means that its decorated name is generated according to the C++ name decoration rules. (Linkage in this context doesn't refer to the process of linking .obj files to produce an executable; it simply means an ABI.) This is the default linkage of identifiers in C++, but you may override it. For example, to access a global function from a C source file, you must declare that function as extern "C" in the C++ source file. This declaration instructs the C++ compiler to apply the C name decoration scheme to it. Consequently, a C compiler (and often compilers of Pascal, Fortran, Ada, COBOL etc.) can call this function even if it's compiled by a C++ compiler. Remember: a C++ compiler knows the C ABI but not vice versa.
Let's look at a concrete example. Suppose you define a global function called hashval() in a C++ file:
int hashval(int val);
When a C++ compiler compiles it, its decorated name may look like this:
__i__hshvl_4i //a simplified hypothetical decorated name
The __ that opens the decorated name indicates that this function is global. The first i that comes next encodes the return type which is int. The following __ separates the return type from the function's name (which often loses its vowels in the name decoration process). Next, a single underscore indicates the beginning of the parameter list which starts with 4 (the size of the sole argument) and i -- the argument's type. Your compiler may use slightly different conventions but in essence, this is what name decoration is all about.
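To make these simplified conventions concrete, here is a small sketch that assembles a decorated name according to the hypothetical scheme just described. The function decorate_global() and its parameters are invented for illustration; no real compiler exposes such a function, and real decoration schemes are considerably more elaborate.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper that mimics the simplified scheme from the text:
//   "__" + return type code + "__" + name without vowels
//        + "_" + parameter size + parameter type code
std::string decorate_global(char return_code, const std::string& name,
                            std::size_t param_size, char param_code)
{
    const std::string vowels = "aeiouAEIOU";
    std::string stripped;
    for (char c : name) {
        // The name loses its vowels, as in "hashval" -> "hshvl".
        if (vowels.find(c) == std::string::npos)
            stripped += c;
    }
    return std::string("__") + return_code + "__" + stripped + "_" +
           std::to_string(param_size) + param_code;
}
```

Calling decorate_global('i', "hashval", 4, 'i') yields the string __i__hshvl_4i, which matches the hypothetical decorated name shown above.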
A C compiler will assign a completely different decorated name to the same function. Therefore, when a C compiler translates the following call to hashval:
n=hashval(x); //in a .c file
it will look for an entirely different decorated name, something like
__hashval //equivalent decorated name in C
Let's clarify this point: the C and C++ compilers are referring to the same function; they simply assign a different decorated name to it when they process the function's declaration. Here's the snag: at link time, the linker will complain about an unresolved symbol '__hashval' because the C++ compiler never generated this decorated name (which a C compiler expects). To solve this problem you need to declare the original function like this:
extern "C" int hashval(int val); //force C name decoration
When a C++ compiler sees an extern "C" declaration, it uses a C name decoration for the identifier instead of the C++ decorated name. Consequently, the C++ compiler will now generate this decorated name instead:
__hashval
Now, the linker will resolve any reference to this function correctly because the C and C++ compilers and the linker all agree on the same decorated name for the same function.
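In practice, the declaration usually lives in a header that both the C and the C++ source files include. A C compiler doesn't understand extern "C", so a widely used idiom hides it behind the __cplusplus macro, which only C++ compilers define. The following is a minimal sketch; the file names hashval.h and hashval.cpp are assumptions, and the function body is a placeholder rather than a real hashing algorithm.

```cpp
/* hashval.h: hypothetical header included by both .c and .cpp files */
#ifdef __cplusplus
extern "C" {            /* seen only by a C++ compiler */
#endif

int hashval(int val);   /* gets C name decoration in both languages */

#ifdef __cplusplus
}
#endif

/* hashval.cpp: compiled by the C++ compiler. The definition keeps the
   C linkage of the preceding declaration. */
int hashval(int val)
{
    return val % 31;    /* placeholder, not a real hashing algorithm */
}
```

With this arrangement, a .c file simply includes the header and calls hashval(x) as it would call any other C function.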
extern "C" limitations
extern "C" isn't a panacea. You can't apply it to member functions or templates, and at most one function in a set of global overloaded functions can be declared extern "C". Does this mean you have to give up all the joys of generic and object-oriented programming? Not at all. Remember that a C compiler doesn't care about the implementation of hashval(). Inside this function you can instantiate objects, call member functions, instantiate templates etc., because the function is still compiled under a C++ compiler:
#include <sstream> //for std::stringstream
#include <string>
extern "C" int hashval(int val)
{
  std::string result;
  int num=0;
  std::stringstream str;
  str<<val; //insert val to str
  str>>result; //convert val to a string
  //..play with result using hashing algorithms
  return num;
}
The C compiler will only see the function's declaration and the calls to that function, so it couldn't care less about the body of hashval().
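The same approach can expose a whole class to C code: wrap each member function in a global extern "C" function that takes an opaque pointer to the object. The sketch below is one possible arrangement, not a definitive design. The Task class here is a cut-down stand-in with placeholder bodies, and the names task_create, task_run, task_get_id and task_destroy are hypothetical.

```cpp
#include <new>

// Cut-down stand-in for Task; the member bodies are placeholders.
class Task {
public:
    Task() : id_(42) {}
    int run(int priority, int cpu_timelimit) { return priority + cpu_timelimit; }
    int get_id() const { return id_; }
private:
    int id_;
};

// C-callable wrappers. C code treats the object as an opaque void*.
extern "C" {

void* task_create()
{
    // nothrow new returns a null pointer instead of throwing
    // if the allocation fails.
    return new (std::nothrow) Task;
}

int task_run(void* handle, int priority, int cpu_timelimit)
{
    return static_cast<Task*>(handle)->run(priority, cpu_timelimit);
}

int task_get_id(const void* handle)
{
    return static_cast<const Task*>(handle)->get_id();
}

void task_destroy(void* handle)
{
    delete static_cast<Task*>(handle);
}

} // extern "C"
```

A C program would call task_create() to obtain a handle, pass that handle to the other functions, and finally release it with task_destroy(). Note that the overloaded run() needs a single, uniquely named wrapper per overload, because extern "C" functions can't be overloaded.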
Summary
Most users and implementers alike agree that forming a universal ABI is impossible because compilers may reorder the arguments of a function call and use different sizes and representations for fundamental types (32-bit pointers versus 64-bit pointers, signed versus unsigned char etc.). However, there are some interesting C++0x proposals that aim to unify certain aspects of the ABI and specify which core language changes will affect it.
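You can inspect two of the implementation-dependent properties mentioned above on your own platform with a short sketch. The function names pointer_bits and plain_char_is_signed are hypothetical; the results vary from one compiler and target to another, which is precisely the point.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Width of a data pointer in bits on this implementation.
std::size_t pointer_bits()
{
    return sizeof(void*) * CHAR_BIT;
}

// Whether plain char behaves as a signed type on this implementation.
bool plain_char_is_signed()
{
    return std::numeric_limits<char>::is_signed;
}
```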
