How to Debug Linux Errors Like a Pro

When it comes to desktop OS market share, Linux may not be the number one choice, but it still has millions of users across the world. Like any other operating system, it too is prone to errors. If you are new to Linux and struggle to debug and resolve errors in the Linux environment, here's a guide to help you tackle the most common errors its users may encounter. Whether these errors are related to system crashes or an incorrectly configured application, this tutorial walks you through a step-by-step method to handle them. You can apply these methods to all popular distros with little or no change.

htop screen in Linux command line environment
📷 Credit: Image by Rajeev Edmonds/FreshTechTips

Working knowledge of the command line environment is required to apply these methods. In most cases, you may require root privileges to complete the execution of the commands.

If you want to use a Linux system like a pro, learn the debugging methods given below. They'll give you an edge over other users who struggle to resolve Linux-related problems. Let's get started!

Step 1: Understand the Problem

Before you attempt to resolve a Linux error, it's imperative to first understand it correctly. Unless you know the answers to the following questions, you are unlikely to resolve an error in the Linux environment. Here are some of the important questions to ask yourself before you move ahead with finding a solution.

  • What task were you executing or running when the error occurred?
  • Can you reproduce this error?
  • Have you recently made substantial changes to your Linux system?

Whenever an error occurs, it's always a good practice to take note of the following:

  • Write down the exact text of the error message. It may include error codes that may be required during the debugging process.
  • You must also take note of any unusual behaviour of the application or the system in general that differs from how it behaved before the error.

Jot down all this information in a text editor of your choice and keep it ready for future reference.
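
Part of this note-taking can be automated. The following is a minimal sketch, not a standard tool: the function name capture_error and the notes file name error_notes.txt are made up for illustration. It runs a command and appends the time, the command line, the exit code, and the error output to the notes file.

```shell
# capture_error: run a command and append what happened to a notes file.
# NOTES_FILE and the function name are illustrative assumptions.
NOTES_FILE="${NOTES_FILE:-error_notes.txt}"

capture_error() {
    local err_file status
    err_file="$(mktemp)"
    # Run the command; its stderr goes into a temporary file.
    "$@" 2> "$err_file"
    status=$?
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        echo "command:   $*"
        echo "exit code: $status"
        echo "stderr:"
        cat "$err_file"
        echo
    } >> "$NOTES_FILE"
    rm -f "$err_file"
    return "$status"
}

# Example: record the failure of listing a directory that does not exist.
# capture_error ls /nonexistent/directory
```

Because the function returns the original exit code, it can wrap a failing command without hiding the failure from the shell.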

Step 2: Check System Logs

Fortunately, Linux has rich logging support. All you need is to know where these logs live and how to read them. They contain a wealth of information about Linux events and errors. Let's learn how to check these logs to find the root cause of an error.

1. View the System Log with journalctl

Most modern Linux distributions use the systemd software suite to manage the system and its services. It extensively logs information related to events and services. To access its logs, you can use the journalctl command.

sudo journalctl

# Filters you can apply through this command

# Filter by time range:
sudo journalctl --since "2024-11-01" --until "2024-11-07"

# Filter for a specific service:
sudo journalctl -u nginx.service

# Check logs in real-time:
sudo journalctl -f

This is just the tip of the iceberg. Check the command's man page to see all the available options for log filtering. This command can give you insights about Linux events in chronological order.
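
On a busy machine, filtering by severity and by boot narrows the journal quickly. Here is a small sketch, assuming a systemd-based system. The function name journal_errors is hypothetical, while -p, -b, -n, and --no-pager are regular journalctl options.

```shell
# journal_errors: print the most recent error-level (and more severe)
# journal entries from the current boot. The function name is illustrative.
journal_errors() {
    local count="${1:-50}"
    if ! command -v journalctl >/dev/null 2>&1; then
        echo "journalctl not found: this system may not use systemd" >&2
        return 127
    fi
    # -p err : priority "err" and more severe (crit, alert, emerg)
    # -b     : current boot only
    # -n     : limit to the last N entries
    journalctl -p err -b -n "$count" --no-pager
}

# Example usage (root privileges may be needed to see system-wide entries):
# journal_errors 20
# sudo journalctl -p warning -u nginx.service --since "1 hour ago"
```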

2. Check the Kernel Log with dmesg

If you have to deal with device driver issues, hardware issues, or low-level system messages, the dmesg command is your best bet. It displays the Linux kernel message buffer. Here's how to use it.

dmesg | less

# To filter results for specific keywords or phrases:
dmesg | grep -i "X-Server"

You can also redirect the output to a text file to analyze the log later.
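
As a sketch of saving the kernel log for later, the function below keeps only warnings and more severe messages. The name save_kernel_errors and the dated file name are assumptions made for this example. The -T and --level options belong to the util-linux version of dmesg.

```shell
# save_kernel_errors: write warning-and-worse kernel messages, with
# human-readable timestamps, to a dated file for later analysis.
# The function name and file naming scheme are illustrative.
save_kernel_errors() {
    local target="${1:-dmesg_$(date +%F).log}"
    # -T          : human-readable timestamps
    # --level=... : keep only the listed severity levels
    dmesg -T --level=emerg,alert,crit,err,warn > "$target" || return 1
    echo "saved kernel messages to $target"
}

# Many distros restrict dmesg to root, so the interactive equivalent is:
# sudo dmesg -T --level=err,warn | less
```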

3. Explore Application-Specific Logs

Apart from system logs, you also have the option to analyze logs created by applications. These application-specific logs help you pinpoint the exact reason for the error. Here are some common log file locations you can use to identify an error.

  • /var/log/syslog: Contains general system logs on Debian-based Linux distros.
  • /var/log/messages: Contains general system logs on RHEL/CentOS-based Linux distros.
  • /var/log/auth.log: Here, you'll get authentication-related logs on Debian-based distributions.
  • /var/log/boot.log: And, this is the one containing boot process logs.

Here are some examples of how you can examine these logs:

sudo tail -f /var/log/syslog

# Use `grep` utility to search for specific keywords in the log files:
sudo grep "ERROR" /var/log/syslog

Applications like Apache or Nginx maintain separate logs, which you can examine to find the exact cause of an error.
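
When a log holds thousands of lines, it helps to see which error messages repeat most often. The sketch below is one way to do that. The function name count_log_errors is hypothetical, and the sed expression assumes the traditional syslog timestamp (for example "Nov  5 10:12:33"), so lines with other timestamp formats are counted without being grouped.

```shell
# count_log_errors: show how often each distinct error line occurs in a log,
# most frequent first. The function name is illustrative.
count_log_errors() {
    local log_file="$1" limit="${2:-10}"
    # -i matches "error", "Error", and "ERROR" alike.
    # The sed step strips a leading "Mon DD HH:MM:SS " timestamp so that
    # identical messages logged at different times are grouped together.
    grep -i "error" "$log_file" \
        | sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} //' \
        | sort | uniq -c | sort -rn | head -n "$limit"
}

# Example usage:
# sudo sh -c '. ./log_helpers.sh; count_log_errors /var/log/syslog 5'
# (log_helpers.sh is a hypothetical file holding the function above.)
```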

Step 3: Use Debugging Tools

Linux provides several tools and applications to analyze, monitor, and debug errors. Let's take a look at some of the most common and popular tools you can use to debug the errors on your Linux system.

1. Monitor System Performance with htop

You can use the htop command to get insights about the system resources (CPU, memory, and swap) and processes on a Linux system. It gives you an interactive interface where you get the data in real time.

htop interface in the Linux command-line environment
📷 `htop` gives you insights about the system resources on a Linux system

Through this tool, you can monitor and analyze the following:

  • Identify the processes that are consuming excessive system resources.
  • Identify zombie processes, which are marked with the Z symbol in the state (S) column.
  • Find high wait times associated with active I/O processes.
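
The same checks can be scripted with ps when an interactive screen is not practical, for example over a slow SSH link. This is a rough sketch. The helper name list_zombies is made up, and the --sort option in the comment assumes the procps version of ps found on most distros.

```shell
# list_zombies: read `ps -eo pid,ppid,stat,comm` output on stdin and print
# the PID, parent PID, and name of every zombie (state starts with "Z").
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Example usage on a live system:
# ps -eo pid,ppid,stat,comm | list_zombies
# Top five CPU consumers (header line plus five rows):
# ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 6
```

The parent PID is printed because a zombie only disappears once its parent collects it, so the parent is usually the process to investigate.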

2. Trace System Calls with strace

New developers often overlook the strace command, which is one of the best tools for finding out why a program fails. It records the system calls a program makes and the signals it receives during its life cycle.

strace command output
📷 Track all the system calls of a process execution life cycle

Here are some examples showcasing how you can use this command for debugging errors:

strace -o analyze.txt <command>

# Tracing errors when listing a non-existent directory
strace -o analyze.txt ls -la /nonexistent/directory

The -o option writes the strace output to the 'analyze.txt' text file. Storing the output in a file enables you to analyze it at a later time. Feel free to change the name of this file as per your preference.
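
A saved trace can run to hundreds of lines, so a quick first pass is to pull out only the calls that failed. In strace output, a failed call ends with "= -1" followed by an error name such as ENOENT. The helper name failed_syscalls below is an assumption for this sketch.

```shell
# failed_syscalls: list the system calls in an strace output file that
# returned -1, together with the error name (ENOENT, EACCES, ...).
failed_syscalls() {
    grep -E ' = -1 E[A-Z0-9]+' "$1"
}

# Example usage:
# strace -o analyze.txt ls -la /nonexistent/directory
# failed_syscalls analyze.txt
# To trace child processes as well, add -f:
# strace -f -o analyze.txt <command>
```

Not every failed call is a problem. Programs routinely probe for optional files, so look for the failure that involves the path or resource named in the error message.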

3. Debug Programs with gdb

If you want to debug compiled programs on Linux, GNU Debugger (gdb) is your best option. It's a powerful tool to analyze and debug the program during its execution.

# Execute the program through `gdb` for tracing and debugging:
gdb <program_name>

Once within the gdb environment, you can use the following commands:

  • run: Fire this command to start program execution.
  • backtrace: Use it to display the call stack, i.e. the chain of function calls that led to the current point. Despite the similar name, it is unrelated to strace, which traces system calls.
  • break: Set breakpoints in your program to halt execution at given points.
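
For a program that crashes, the run and backtrace commands can also be issued without an interactive session. The sketch below wraps that in a function. The name gdb_backtrace and the program name in the usage comment are placeholders, while -batch, -ex, and --args are regular gdb options.

```shell
# gdb_backtrace: run a program under gdb non-interactively and print a
# backtrace if it stops (for example on a segmentation fault).
# The function name is illustrative.
gdb_backtrace() {
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb is not installed" >&2
        return 127
    fi
    # -batch : exit after running the -ex commands
    # --args : treat everything after the program name as its arguments
    gdb -batch -ex "run" -ex "backtrace" --args "$@"
}

# Example usage (compile with -g so the backtrace shows function names
# and line numbers; myprog is a placeholder):
# gcc -g -o myprog myprog.c
# gdb_backtrace ./myprog input.txt
```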

4. Analyze Disk Usage

Sometimes, the errors are related to disk space or disk-related issues. Correctly monitoring and analyzing disks on your Linux machine is an important skill to debug errors related to these storage devices.

Here's a complete guide to disk management in Linux. It's a comprehensive tutorial that equips you with all the knowledge required to monitor and manage disks on a Linux system.
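
As a quick first check before any deeper disk analysis, the sketch below flags file systems that are nearly full. It parses the portable `df -P` format, where the fifth column is the usage percentage and the sixth is the mount point. The function name full_filesystems and the default threshold of 90 are assumptions.

```shell
# full_filesystems: read `df -P` output on stdin and print every mount point
# whose usage is at or above the given percentage (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "93%" becomes "93"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example usage:
# df -P | full_filesystems 80
# Inodes can run out before space does, so check them too:
# df -i
```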

Step 4: Research and Use Online Resources

No one is perfect or has all the answers. If you get stuck and struggle to find a solution for an error, do not hesitate to go to the online Linux community. Here are some handpicked online resources you can use to seek help when debugging Linux errors.

Search Engines

Pasting the exact error message into a search engine often leads to Stack Overflow, where one of the largest developer communities resides. You can either search existing solutions or ask a new question related to your error. It's always good practice to search first, as there is a high chance that someone has already asked about the same problem.

Forums and Wikis

There are several good forums and Wikis where you can find answers to your queries. For example:

  • LinuxQuestions.org: A large forum dedicated to questions related to Linux. Register and benefit from its large user base.
  • Ubuntu Community Hub: An active community of Ubuntu users. The forum is run and managed by Ubuntu and is a great place to seek help with Ubuntu issues.
  • Red Hat Community: If you are running Red Hat Enterprise Linux on your computer, this is the community for your Red Hat-related queries.

Man Pages and Documentation

Last but not least is the native documentation of each application. In the Linux world, this documentation is called man pages. Simply use the following commands to open the built-in docs of an application.

man <command>
info <command>

These man pages are like a user manual for an application or a command. Through them, you can resolve errors caused by using the wrong switches or parameters when running a command.

Step 5: Revert Changes or Use Backups

One of the useful strategies to tackle errors is to roll back the changes made earlier. It can be done either through commands or by using backups to restore the original state. Let's see how to do it.

Uninstall Problematic Packages

If you've identified a problematic package, you must remove it to get rid of errors.

sudo apt remove <package>   # On a Debian/Ubuntu system
sudo yum remove <package>   # On a RHEL/CentOS system
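
If you are unsure which package caused the trouble, a list of recent installs and upgrades is a good starting point. On Debian/Ubuntu systems, dpkg records them in /var/log/dpkg.log. The function name recent_package_changes below is made up for this sketch.

```shell
# recent_package_changes: list the latest installs and upgrades recorded
# in a dpkg log (Debian/Ubuntu). The function name is illustrative.
recent_package_changes() {
    local log_file="${1:-/var/log/dpkg.log}" limit="${2:-20}"
    # Each matching line looks like:
    #   <date> <time> install|upgrade <package> <old-version> <new-version>
    awk '$3 == "install" || $3 == "upgrade" { print $1, $2, $3, $4 }' "$log_file" \
        | tail -n "$limit"
}

# Example usage:
# recent_package_changes
# On RHEL-family systems, a comparable view is:
# sudo dnf history
```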

Restore Configuration Files

Some applications and package managers keep a backup copy of a configuration file, and it's good practice to make one yourself before editing. You just need to know how to use this backup to overwrite the changed configuration file. Here's an example:

sudo cp /etc/<config_file_name>.bak /etc/<config_file_name>
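
The backup and restore steps can be wrapped in two small helpers, sketched below. The function names are hypothetical. The restore helper prints a diff first so you can see exactly what is being undone.

```shell
# backup_config: copy a file to <file>.bak, preserving mode and timestamps.
backup_config() {
    cp -p "$1" "$1.bak"
}

# restore_config: show what will change, then copy the backup over the file.
restore_config() {
    if [ ! -f "$1.bak" ]; then
        echo "no backup found for $1" >&2
        return 1
    fi
    # diff exits with status 1 when the files differ; that is expected here.
    diff -u "$1" "$1.bak" || true
    cp -p "$1.bak" "$1"
}

# Example usage (files under /etc need a root shell):
# backup_config /etc/<config_file_name>
# restore_config /etc/<config_file_name>
```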

Use System Snapshots

On an Ubuntu computer, you can use the Timeshift tool to revert the system to a previous state.

sudo timeshift --restore

If you are not using this tool on your Ubuntu system, install it without giving it a second thought.

Common Linux Errors and Solutions

Let's look at some of the most common Linux errors you may encounter. Most of them are easy to tackle once you know how to deal with them.

1. Permission Denied

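Sometimes you may get the 'Permission Denied' message. If it appears when you try to run a script or program, the file is usually missing the execute permission, which you can add as follows. For other files, check the file's owner and permissions with ls -l first.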

chmod +x <file_name>
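
Before changing permissions, it helps to see who owns the file and what the current user is actually allowed to do with it. The helper below is an illustrative sketch, and its name explain_access is an assumption.

```shell
# explain_access: print the mode, owner, and group of a file next to the
# identity of the current user, so a mismatch is easy to spot.
explain_access() {
    local target="$1"
    ls -ld "$target" || return 1     # mode, owner, group
    echo "you are: $(id -un) (groups: $(id -Gn))"
    [ -r "$target" ] && echo "read: yes"    || echo "read: no"
    [ -w "$target" ] && echo "write: yes"   || echo "write: no"
    [ -x "$target" ] && echo "execute: yes" || echo "execute: no"
}

# Example usage:
# explain_access ./<file_name>
```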

2. Command Not Found

This error occurs when you try to use a command whose package is missing on your system, or when the directory containing the executable is not listed in the $PATH environment variable. To fix it, install the missing package, then check that the executable's directory appears in $PATH:

sudo apt install <package>
echo $PATH
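
To tell the two causes apart, check whether the shell can locate the command at all. A minimal sketch follows. The function name find_command and the /opt/myapp/bin directory are placeholders.

```shell
# find_command: report where a command lives, or say that it is not on PATH.
find_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 is not on PATH" >&2
        return 1
    fi
}

# Example usage:
# find_command nginx
# Print PATH one directory per line for easier reading:
# echo "$PATH" | tr ':' '\n'
# If the program is installed in a directory that PATH lacks, add it for
# the current session (the directory here is only an example):
# export PATH="$PATH:/opt/myapp/bin"
```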

3. Disk Full

If you get errors or warnings related to limited or no space left on your disk drive, check which file system is full and then delete files you no longer need.

df -h
sudo rm /path/to/large/file_name
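
Finding what to delete is usually the harder part. The sketch below lists the biggest files under a directory. The function name largest_files is hypothetical, and the sizes it prints are in kilobytes as reported by du.

```shell
# largest_files: list the N biggest regular files under a directory,
# sizes in kilobytes, largest first, without crossing file systems.
largest_files() {
    local dir="${1:-.}" limit="${2:-10}"
    # -xdev keeps find on one file system; du -k reports size in KB.
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null \
        | sort -rn | head -n "$limit"
}

# Example usage (reading all of /var needs a root shell):
# largest_files /var 15
```

Review the list before deleting anything, since large files under /var are often databases or logs that a running service still needs.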

4. Dependency Issues

Sometimes, the installation of the primary package becomes a nightmare due to dependency conflicts. To resolve such issues, use the following command.

sudo apt --fix-broken install

5. Network Connectivity Issues

If you are experiencing network issues on your Linux system, try the following commands to diagnose and fix them.

ping google.com
sudo systemctl restart NetworkManager
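
When a single ping fails, testing one layer at a time shows where the connection breaks. The sketch below assumes the iproute2 ip command is available. The helper name default_gateway is made up, and 1.1.1.1 is used only as an example of a public address that normally answers pings.

```shell
# default_gateway: read `ip route` output on stdin and print the address of
# the default gateway, if there is one.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Work outward one layer at a time (run these by hand):
# ip -brief addr                               # 1. does the interface have an address?
# ping -c 3 "$(ip route | default_gateway)"    # 2. is the local gateway reachable?
# ping -c 3 1.1.1.1                            # 3. is the internet reachable by IP?
# ping -c 3 google.com                         # 4. does name resolution work?
```

If step 3 succeeds but step 4 fails, the problem is DNS rather than connectivity, and restarting NetworkManager may not be enough on its own.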

Conclusion

Debugging Linux errors doesn’t have to be daunting. By systematically analyzing the issue, utilizing logs and debugging tools, and leveraging community resources, you’ll quickly become proficient at troubleshooting.

Remember, every error you resolve enhances your Linux skills and confidence!