Introduction

Debugging and performance tuning are major parts of programming. Knowing the available tools and techniques can make this part of programming easier. This book covers debugging techniques and tools that can be used to solve both kernel and application problems on the Linux operating system. The book includes many sample programs that demonstrate how to use the best profiling and debugging tools available for Linux. All the tools are open-source and continue to be enhanced by the open-source community.

The goal of the book is to provide you with the knowledge and skills you need to understand and solve software problems on Linux servers. It discusses techniques and tools used to capture the correct data using first failure data capture approaches.

This Book's Audience

As the Linux operating system moves further into the enterprise, being able to fix problems in a timely manner becomes very important. This book helps software developers, system administrators, and service personnel find and fix a problem, or capture the correct data so that the problem can be fixed. This book is intended for anyone who is developing or supporting Linux applications or even the kernel.

Chapter Descriptions

This book is organized into 14 chapters, each focusing on a specific tool or set of tools. The chapters also describe the steps to build and install the tools in case your Linux distribution does not ship with a tool or a later release of the tool is available. Most of these tools are easy to build or add to the kernel.

Chapter 1, "Profiling," discusses methods to measure execution time and real-time performance. Application performance tuning is a complex process that requires correlating pieces of data with source code to locate and analyze performance problems. This chapter shows a sample program that is tuned using a profiler called gprof and a code coverage tool called gcov.
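
To make the idea of measuring execution time concrete, here is a minimal sketch of the two numbers a profiler typically reports for a routine: how often it was called and how much time was spent in it. It is written in Python purely for illustration and is not the gprof workflow that Chapter 1 applies to its sample program. The names timed, call_counts, elapsed_seconds, sum_of_squares, and report are hypothetical and do not come from the book.

```python
import time
from collections import defaultdict

# Hypothetical bookkeeping tables (not from the book): per-function
# call counts and accumulated wall-clock seconds.
call_counts = defaultdict(int)
elapsed_seconds = defaultdict(float)


def timed(func):
    """Wrap func so that every call is counted and its duration recorded."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the call even if func raised an exception.
            call_counts[func.__name__] += 1
            elapsed_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@timed
def sum_of_squares(n):
    """Illustrative workload: sum of i*i for i in 0..n-1."""
    return sum(i * i for i in range(n))


def report():
    """Print one line per function, slowest first."""
    names = sorted(elapsed_seconds, key=elapsed_seconds.get, reverse=True)
    for name in names:
        print(f"{name}: {call_counts[name]} calls, "
              f"{elapsed_seconds[name]:.6f} s total")


if __name__ == "__main__":
    for _ in range(3):
        sum_of_squares(100_000)
    report()
```

Hand-written timing like this only covers the functions you remember to wrap. A profiler gathers the same kind of data for the whole program, which is why the chapter relies on a tool rather than manual instrumentation.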

Chapter 2, "Code Coverage," discusses code coverage, which can be used to determine how well your test suites work. Code coverage during testing is one important measurement of software quality. Like an X-ray machine, gcov peers into your code and reports on its inner workings. One indirect benefit of gcov is that its output can be used to identify which test case provides coverage for which source file.

What would debugging be without a debugger? Chapter 3, "GNU Debugger (gdb)," looks at the GNU debugger. You can debug by adding printf statements to a program, but this is clumsy and very time consuming. A debugger like gdb is a much more efficient debugging tool.

Chapter 4, "Memory Management Debugging," looks at the APIs for memory management, which, although small, can give rise to a large number of disparate problems. These include reading and using uninitialized memory, reading or writing memory past or in front of (underrun) the allocated size, reading or writing inappropriate areas on the stack, and memory leaks. This chapter covers four memory management checkers: MEMWATCH, YAMD, Electric Fence, and Valgrind. We'll review the basics, write some "buggy" code, and then use each of these tools to find the mistakes.

The /proc file system is a special window into the running Linux kernel and is covered in Chapter 5, "System Information (/proc)." The /proc file system provides a wealth of information about the Linux kernel. It offers everything from information about each process to system-wide information about CPU, memory, file systems, interrupts, and partitions. Some of the utilities that use /proc entries to get data from the system include iostat, sar, lsdev, lsusb, lspci, vmstat, and mpstat. Each of these utilities is covered in the chapter.

Chapter 6, "System Tools," looks at various tools that can be used to pinpoint what is happening to the system and to find which component of the system is having a problem.
The ps command is a valuable tool that can be used to report the status of each of the system processes. Three other process tools are covered: pgrep, pstree, and top. The strace command lets you trace system calls. The magic key sequence (Magic SysRq) can provide a back trace for all the processes on the system. The lsof tool can be used to list the open files on the system. Finally, the network debugging tools ifconfig, arp, ethereal, netstat, and tcpdump are covered. They can help solve network-type problems.

Many kernel bugs show themselves as NULL pointer dereferences or other incorrect pointer values. The common result of such a bug is the Oops message. Chapter 7, "System Error Messages," covers where an Oops message is stored, how to analyze the Oops, and how to find the failing line of code.

An important goal of a Linux systems administrator is to ensure that his or her systems are functioning and performing 100% of the time. Applications producing error messages, file systems running out of free space, network adapter failures, hard drives producing errors, and the kernel producing errors are just a few types of problems that could stop a system and jeopardize that goal. Chapter 8, "Event Logging," helps administrators grapple with these issues by describing Syslog and event logging.

Chapter 9, "Linux Trace Toolkit," explains how an execution trace shows exactly what scheduling decisions are made and how various management tasks are done. It captures how they are handled, how long they take, and to which process the processor has been allocated. The trace facility provides a dynamic way to gather system data. Application I/O latencies can also be identified, as well as the time when a specific application is actually reading from a disk. Certain types of locking issues also can be seen by tracing. In short, tracing can be used to:

Isolate and understand system problems.
Observe system and application execution for measuring system performance.
Permit bottleneck analysis when many processes are interacting and communicating.

The Linux Trace Toolkit (LTT) differs from strace or gprof in that LTT provides a global view of the system, including a view into the kernel.

Chapter 10, "oprofile: a Profiler Supported by the Kernel," covers the kernel profiler called oprofile. Profilers are software development tools designed to help analyze the performance of applications and the kernel. They can be used to identify sections of code that aren't performing as expected. They provide measurements of how long a routine takes to execute, how often it is called, and where it is called from. Profiling is also covered in Chapter 1; one profiler in that chapter is called gprof. Another topic covered in Chapter 10 is ways to minimize cache misses. Cache misses can be a cause of applications not performing as expected.

User-Mode Linux (UML) is covered in Chapter 11, "User-Mode Linux." UML is a fully functional Linux kernel. It runs its own scheduler and virtual memory (VM) system, relying on the host kernel for hardware support. The benefit of UML from a debugging point of view is that it lets you do kernel development and debugging at the source code level using gdb. The UML technology can be a powerful tool to reduce the time needed to debug a kernel problem and develop kernel-level features.

Chapter 12, "Dynamic Probes," explains dynamic probes (Dprobes), a technique for acquiring diagnostic information without custom-building the component. Dynamic probes can also be used as a tracing mechanism for both user and kernel space. They can be used to debug software problems that are encountered in a production environment and that can't be re-created in a test lab environment. Dprobes are particularly useful in production environments where the use of an interactive debugger is either undesirable or unavailable.
Dprobes also can be used during the code development phase to inject faults or errors into code paths that are being tested.

Chapter 13, "Kernel-Level Debuggers (kgdb and kdb)," covers two kernel-level debuggers: kgdb and kdb. kgdb is an extension to gdb that allows the gdb debugger to debug kernel-level code. One key feature of kgdb is that it allows source code-level debugging of kernel-level code. The kdb debugger allows kernel-level debugging but does not provide source-level debugging.

There are multiple ways for Linux to support a crash dump. Chapter 14, "Crash Dump," covers the different types of crash dumps. It discusses Linux Kernel Crash Dump (LKCD), Netdump, Diskdump, and mcore. Crash dump is designed to meet the needs of end users, support personnel, and systems administrators who need a reliable method of detecting, saving, and examining system problems. Having both a bug report and a dump of the problem has many benefits, since the dump provides a significant amount of information about the system's state at the time of the problem.

Copyright Pearson Education. All rights reserved.