This time let's take a look at what the PLT and GOT have to do with tcmalloc at all. Should we worry when going to production with our new tcmalloc allocator?
In short: YES, we should. Why? Well, let's find out.
To begin with, we need some basic knowledge about dynamic linking on a Linux system with glibc, so let's start with that. To keep it simple I will only consider DT_RUNPATH, not the old and now deprecated DT_RPATH (but please keep in mind that in production environments it's quite common to still find old deprecated settings). When ld.so tries to load a dynamic shared library, it considers several paths in the following order, and it is very important to understand this order when you deal with shared libraries:
- first, paths from the environment variable $LD_LIBRARY_PATH
- then paths from the caller's DT_RUNPATH (the caller is the ELF object that requires the library, either through an ELF dependency or through dlopen)
  - DT_RUNPATH is a set of paths, hardcoded in the binary at link time, that help resolve the path of a dynamic library at runtime. At link time, it's controlled by the -rpath ld flag or the LD_RUN_PATH environment variable
- then the ld.so cache. Libraries present in the cache are found from the paths given in /etc/ld.so.conf and in the files from /etc/ld.so.conf.d/
- and finally /lib and /usr/lib as a last resort
OK, we have some basic knowledge, so let's briefly explain what the PLT is, as we will need it later. PLT stands for procedure linkage table. This mechanism is used to speed up process startup. It allows a position independent code (PIC) object (a dynamically linked shared library) to lazily rely on foreign symbols defined by another ELF object. LAZY is the key word: a PLT symbol is only resolved the very first time it is used, at the cost of one more indirection before accessing the GOT (global offset table). If a symbol relies only on the GOT, it's faster at runtime but slower during startup, because everything has to be resolved up front even though we usually use only a small subset of the symbols, not all of them.
A bit more detailed info here:
http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
Some symbols in an ELF object can be declared as weak. It means that it's possible to redefine that symbol in another ELF object. This is exactly what static linking relies on: it keeps only the strong symbol after linking, so there is no conflict at link time. For dynamic libraries everything looks different, mainly because the dynamic linker knows nothing about weak and strong symbols. When resolving a symbol dynamically, the first definition encountered wins; weak or strong does not matter, the FIRST one always wins. So order matters when you redefine symbols! It's clear now that if you want to link tcmalloc dynamically you have to link it before glibc, otherwise you are once again doomed :-)
Now let's dive deeper..
Did you ever wonder why you are able to redefine malloc/free/realloc/calloc? If not, maybe reread the text above. If yes, you should already know the answer. All allocation symbols are marked as weak in glibc, and this is why we are able to use tcmalloc instead of the standard allocator! But this is not the end! Remember that ld.so is still here and it has to call malloc/free too! Which version will it call when resolving dynamic linkage? The dynamic linker needs malloc, but in order to call malloc there must be one already loaded. It sounds weird, but to link something dynamically you need at least one malloc call before you can redefine it. To fix this issue, glibc always calls the PLT version of malloc/calloc/realloc/free, so that the address of the actual implementation can easily be rewritten at runtime with a single write. At first, malloc@plt points to the glibc malloc implementation. Later on, when the new malloc implementation is loaded and initialized (http://www.delorie.com/gnu/docs/glibc/libc_34.html), malloc@plt points to the new implementation. Same for free@plt.
Now it's probably a bit clearer compared to what we knew after the first post about tcmalloc.
Can we go even deeper? We can, so let's try to find out why you should be very careful when changing the default malloc.
Very early during process startup, ld.so calls _dl_init_paths, which initializes the search paths for the executable being loaded. Through malloc@plt (pointing to glibc malloc at that moment) it allocates structures to store the data from $LD_LIBRARY_PATH and DT_RUNPATH. When looking for a dynamic object, ld.so sequentially calls open_path with the paths from $LD_LIBRARY_PATH, then DT_RUNPATH. If for some reason the library could not be loaded using these paths, the structure is freed using free@plt and set to NULL (why is it freed?). What is important here? dlopen! Because of dlopen, open_path can be called at any time. So if it is called after tcmalloc has been initialized, free@plt points to tc_free, but the data was allocated with glibc's malloc. BAM! core.
What I described is fixed in glibc 2.18: https://www.sourceware.org/ml/libc-alpha/2013-04/msg00308.html. But did you check your glibc version? :-) If it's 2.18 or newer, forget about it and go to prod without any worries (really? ;-) )
The fix prevents free@plt from being called for DT_RUNPATH (because by then free@plt could be tc_free).
To sum up:
- Check your glibc version
- Check your LD_LIBRARY_PATH and all other paths
- Take your time to understand how tcmalloc works and why it works at all. Otherwise don't use it, because you won't be able to say why it fails if it starts to.
Credits go to the Amadeus MDW team, who did a good job investigating some of the tricky parts presented here.
Rgds
$(TA)$