Chapter objective: Describe the benefits and costs of garbage collection. Show how to improve program performance by reducing the amount of garbage it generates.
In simplest terms, garbage is any storage that your program once used, but uses no longer. Here's a simple example:
(let ((x (list 1 2 3 4 5))) (print x))
When you evaluate this form, the list
'(1 2 3 4 5 6) is
first bound to
X and then printed. Once control leaves the
LET form, the list bound to
X is no longer
accessible; its storage can be reclaimed by the garbage collector.
Actually, there's a minor complication that you should know about. When you evaluate a form in the Lisp listener, the form itself is assigned to the symbol
+, and the value is assigned to the symbol
*. The previous form and value are assigned to
**, respectively, and the form and value before that are assigned to
***. Because these three pairs of variables give you a way to access the forms and results, a form and its result can't really become garbage until you've evaluated additional forms to flush these six variables.
You won't normally have to worry about this unless you've done something in the listener to exhaust all available memory in Lisp; if you can evaluate a simple expression (like
T) three times, you'll release any storage held by
*, and friends.
Lisp allocates storage as needed for your program's data. You don't have direct control over how or when storage is allocated; the compiler is free to do the best job it can to satisfy the meaning of your program.
Lisp does not provide a way for your program to explicitly deallocate storage. This is an important feature, because you can never write a program to mistakenly deallocate storage that is still needed elsewhere in the program. This eliminates an entire class of errors, sometimes referred to as "dead pointer bugs" in languages that support explicit storage allocation and deallocation.
On the other hand, your program may eventually run out of memory if your program never deallocates storage. So a language (like Lisp) that doesn't support explicit deallocation must still provide a mechanism to automatically deallocate storage when the storage is no longer needed. The garbage collector's job is to figure out which storage can no longer be accessed by your program, and then recycle those inaccessible storage blocks for later use.
Lisp compiles your program in such a way that all of its allocated storage can be found by following pointers from a small number of known root pointers. The compiler and runtime system arrange for your program to retain type information at runtime; this is combined with compile-time knowledge of storage layouts to encode knowledge of the locations of pointers within data structures.
The garbage collector follows every pointer in every reachable data structure, starting with the root set. As it does so, it marks the reachable data structures. Once every pointer has been followed, and its referenced data structure marked, any block of memory that is unmarked is unreachable by your program. The garbage collector then reclaims these unmarked blocks for future use by the storage allocator.
The actual marking algorithm used by the garbage collector must account for cycles in the reachable data structures, and must perform in limited space and time; these details complicate the implementation of a garbage collector. Also, most collectors will relocate the marked data (and adjust references accordingly). [Jones96] provides an excellent survey and analysis of various garbage collection techniques.
Garbage causes your program to run slower. The more garbage your program creates, the more time the garbage collector will need to spend recycling the garbage. Modern garbage collectors are very efficient; it's unlikely that you'll see a noticeable pause in your program's execution as the garbage collector runs. However, the cumulative effect of many small pauses will cause a detectable degradation in overall performance.
The good news is that garbage collection ensures that your program will never suffer from memory leaks or dead pointers.
Also, because many garbage collector implementations rearrange storage as your program runs, heap fragmentation is minimized; thus, a large Lisp program's performance will not degrade over time like a C or C++ program that performs comparable storage allocation (typically 25 to 50 percent degradation for a C or C++ program, depending upon heap size, malloc/free implementation, and allocation/deallocation patterns).
You should note that explicit storage allocation and deallocation has overheads which are not strictly predictable. In typical malloc and free implementations, block allocation involves a search and deallocation involves extra work to coalesce free blocks; both of these activities are of effectively indeterminate duration, affected by the size and fragmentation of the heap.
You can reduce garbage collection overhead by reducing garbage generation. Use appropriate data structures; list manipulation is the most common cause of garbage creation in poorly-written Lisp programs. Pay attention to whether an operation returns a fresh copy or a (possibly modified) existing copy of data.
If you have a profiler available in your Lisp system, use it to find your program's hot spots for storage allocation.
Use destructive operations carefully; they can reduce garbage generation, but will cause subtle bugs if the destructively-modified data is shared by another part of your program.