Unveiling the Power of 6 Linux Troubleshooting Commands

Unveiling the Power of 6 Linux Troubleshooting Commands
Unveiling the Power of 6 Linux Troubleshooting Commands

Unveiling the Power of 6 Linux Troubleshooting Commands - This Tutorial introduces some tools that go beyond df -h and du -sh. Tools that will enable you to troubleshoot a variety of issues with file systems and storage.

The tutorial includes examples demonstrating the practical application of each command, such as listing open files, finding processes using specific file systems, terminating processes with fuser, using chroot for troubleshooting, and monitoring I/O activities with iostat, iotop, and vmstat.

6 Linux Troubleshooting Commands

Explore six crucial Linux troubleshooting commands beyond df -h and du -sh. From lsof to vmstat, discover tools for effective system diagnostics and problem resolution. 

Each command is explained with practical examples, empowering you to enhance your Linux troubleshooting skills and ensure seamless system maintenance. 

Dive into this comprehensive guide to elevate your understanding of critical commands and optimize the performance of your Linux environment.

1. lsof

List open files with lsof.

When invoked without options, lsof will list all open files. You can see the command (init in this case), its PID (1) and the user (root) has openend the root directory and /sbin/init. The FD (file descriptor) columns shows that / is both the root directory (rtd) and current working directory (cwd) for the /sbin/init command. The FD column displays rtd for root directory, cwd for current directory and txt for text (both including data and code).

root@debian7:~# lsof | head -4
COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 254,0 4096 2 /
init 1 root rtd DIR 254,0 4096 2 /
init 1 root txt REG 254,0 36992 130856 /sbin/

Other options in the FD column besides w for writing, are r for reading and u for both reading and writing. You can look at open files for a process id by typing lsof -p PID. For init this would look like this:

lsof -p 1

The screenshot below shows basic use of lsof to prove that vi keeps a .swp file open (even when stopped in background) on our freshly mounted file system.

[root@RHEL65 ~]# df -h | grep sdb
/dev/sdb1 541M 17M 497M 4% /srv/project33
[root@RHEL65 ~]# vi /srv/project33/busyfile.txt
[1]+ Stopped vi /srv/project33/busyfile.txt
[root@RHEL65 ~]# lsof /srv/*
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
vi 3243 root 3u REG 8,17 4096 12 /srv/project33/.busyfile.txt.swp

Here we see that rsyslog has a couple of log files open for writing (the FD column).

root@debian7:~# lsof /var/log/*
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rsyslogd 2013 root 1w REG 254,0 454297 1308187 /var/log/syslog
rsyslogd 2013 root 2w REG 254,0 419328 1308189 /var/log/kern.log
rsyslogd 2013 root 5w REG 254,0 116725 1308200 /var/log/debug
rsyslogd 2013 root 6w REG 254,0 309847 1308201 /var/log/messages
rsyslogd 2013 root 7w REG 254,0 17591 1308188 /var/log/daemon.log
rsyslogd 2013 root 8w REG 254,0 101768 1308186 /var/log/auth.log

You can specify a specific user with lsof -u. This example shows the current working directory for a couple of command line programs.

[paul@RHEL65 ~]$ lsof -u paul | grep home
bash 3302 paul cwd DIR 253,0 4096 788024 /home/paul
lsof 3329 paul cwd DIR 253,0 4096 788024 /home/paul
grep 3330 paul cwd DIR 253,0 4096 788024 /home/paul
lsof 3331 paul cwd DIR 253,0 4096 788024 /home/paul

The -u switch of lsof also supports the ^ character meaning 'not'. To see all open files, but not those open by root:

lsof -u^root

2. fuser

The fuser command will display the 'user' of a file system.

In this example we still have a vi process in background and we use fuser to find the process id of the process using this file system.

[root@RHEL65 ~]# jobs
[1]+ Stopped vi /srv/project33/busyfile.txt
[root@RHEL65 ~]# fuser -m /srv/project33/
/srv/project33/: 3243

Adding the -u switch will also display the user name.

[root@RHEL65 ~]# fuser -m -u /srv/project33/
/srv/project33/: 3243(root)

You can quickly kill all processes that are using a specific file (or directory) with the -k switch.

[root@RHEL65 ~]# fuser -m -k -u /srv/project33/
/srv/project33/: 3243(root)
[1]+ Killed vi /srv/project33/busyfile.txt
[root@RHEL65 ~]# fuser -m -u /srv/project33/
[root@RHEL65 ~]#

This example shows all processes that are using the current directory (bash and vi in this case).

root@debian7:~/test42# vi file42
[1]+ Stopped vi file42
root@debian7:~/test42# fuser -v .
 USER PID ACCESS COMMAND
/root/test42: root 2909 ..c.. bash
 root 3113 ..c.. vi

This example shows that the vi command actually accesses /usr/bin/vim.basic as an executable file.

root@debian7:~/test42# fuser -v $(which vi)
 USER PID ACCESS COMMAND
/usr/bin/vim.basic: root 3113 ...e. vi

The last example shows how to find the process that is accessing a specific file.

[root@RHEL65 ~]# vi /srv/project33/busyfile.txt
[1]+ Stopped vi /srv/project33/busyfile.txt
[root@RHEL65 ~]# fuser -v -m /srv/project33/busyfile.txt
 USER PID ACCESS COMMAND
/srv/project33/busyfile.txt:
 root 13938 F.... vi
[root@RHEL65 ~]# ps -fp 13938
UID PID PPID C STIME TTY TIME CMD
root 13938 3110 0 15:47 pts/0 00:00:00 vi /srv/project33/busyfile.txt

3. chroot

The chroot command creates a shell with an alternate root directory. It effectively hides anything outside of this directory.

In the example below we assume that our system refuses to start (maybe because there is a problem with /etc/fstab or the mounting of the root file system).

We start a live system (booted from cd/dvd/usb) to troubleshoot our server. The live system will not use our main hard disk as root device

root@livecd:~# df -h | grep root
rootfs 186M 11M 175M 6% /
/dev/loop0 807M 807M 0 100% /lib/live/mount/rootfs/filesystem.squashfs
root@livecd:~# mount | grep root
/dev/loop0 on /lib/live/mount/rootfs/filesystem.squashfs type squashfs (ro)

We create some test file on the current rootfs.

root@livecd:~# touch /file42
root@livecd:~# mkdir /dir42
root@livecd:~# ls /
bin dir42 home lib64 opt run srv usr
boot etc initrd.img media proc sbin sys var
dev file42 lib mnt root selinux tmp vmlinuz

First we mount the root file system from the disk (which is on lvm so we use /dev/mapper instead of /dev/sda5).

root@livecd:~# mount /dev/mapper/packer--debian--7-root /mnt

We are now ready to chroot into the rootfs on disk.

root@livecd:~# cd /mnt
root@livecd:/mnt# chroot /mnt
root@livecd:/# ls /
bin dev initrd.img lost+found opt run srv usr vmlinuz
boot etc lib media proc sbin sys vagrant
data home lib64 mnt root selinux tmp var

Our test files (file42 and dir42) are not visible because they are out of the chrooted environment.

Note that the hostname of the chrooted environment is identical to the existing hostname.

To exit the chrooted file system:

root@livecd:/# exit
exit
root@livecd:~# ls /
bin dir42 home lib64 opt run srv usr
boot etc initrd.img media proc sbin sys var
dev file42 lib mnt root selinux tmp vmlinuz

4. iostat

iostat reports IO statitics every given period of time. It also includes a small cpu usage summary. This example shows iostat running every ten seconds with /dev/sdc and /dev/sde showing a lot of write activity.

[root@RHEL65 ~]# iostat 10 3
Linux 2.6.32-431.el6.x86_64 (RHEL65) 06/16/2014 _x86_64_ (1 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
 5.81 0.00 3.15 0.18 0.00 90.85
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 42.08 1204.10 1634.88 1743708 2367530
sdb 1.20 7.69 45.78 11134 66292
sdc 0.92 5.30 45.82 7672 66348
sdd 0.91 5.29 45.78 7656 66292
sde 1.04 6.28 91.49 9100 132496
sdf 0.70 3.40 91.46 4918 132440
sdg 0.69 3.40 91.46 4918 132440
dm-0 191.68 1045.78 1362.30 1514434 1972808
dm-1 49.26 150.54 243.55 218000 352696
avg-cpu: %user %nice %system %iowait %steal %idle
 56.11 0.00 16.83 0.10 0.00 26.95
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 257.01 10185.97 76.95 101656 768
sdb 0.00 0.00 0.00 0 0
sdc 3.81 1.60 2953.11 16 29472
sdd 0.00 0.00 0.00 0 0
sde 4.91 1.60 4813.63 16 48040
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
dm-0 283.77 10185.97 76.95 101656 768
dm-1 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
 67.65 0.00 31.11 0.11 0.00 1.13
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 466.86 26961.09 178.28 238336 1576
sdb 0.00 0.00 0.00 0 0
sdc 31.45 0.90 24997.29 8 220976
sdd 0.00 0.00 0.00 0 0
sde 0.34 0.00 5.43 0 48
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
dm-0 503.62 26938.46 178.28 238136 1576
dm-1 2.83 22.62 0.00 200 0
[root@RHEL65 ~]#

Other options are to specify the disks you want to monitor (every 5 seconds here):

iostat sdd sde sdf 5

Or to show statistics per partition:

iostat -p sde -p sdf 5

5. iotop

iotop works like the top command but orders processes by input/output instead of by CPU.

By default iotop will show all processes. This example uses iotop -o to only display processes with actual I/O.

[root@RHEL65 ~]# iotop -o
Total DISK READ: 8.63 M/s | Total DISK WRITE: 0.00 B/s
 TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
15000 be/4 root 2.43 M/s 0.00 B/s 0.00 % 14.60 % tar cjf /srv/di...
25000 be/4 root 6.20 M/s 0.00 B/s 0.00 % 6.15 % tar czf /srv/di...
24988 be/4 root 0.00 B/s 7.21 M/s 0.00 % 0.00 % gzip
25003 be/4 root 0.00 B/s 1591.19 K/s 0.00 % 0.00 % gzip
25004 be/4 root 0.00 B/s 193.51 K/s 0.00 % 0.00 % bzip2

Use the -b switch to create a log of iotop output (instead of the default interactive view).

[root@RHEL65 ~]# iotop -bod 10
Total DISK READ: 12.82 M/s | Total DISK WRITE: 5.69 M/s
 TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
25153 be/4 root 2.05 M/s 0.00 B/s 0.00 % 7.81 % tar cjf /srv/di...
25152 be/4 root 10.77 M/s 0.00 B/s 0.00 % 2.94 % tar czf /srv/di...
25144 be/4 root 408.54 B/s 0.00 B/s 0.00 % 0.05 % python /usr/sbi...
12516 be/3 root 0.00 B/s 1491.33 K/s 0.00 % 0.04 % [jbd2/sdc1-8]
12522 be/3 root 0.00 B/s 45.48 K/s 0.00 % 0.01 % [jbd2/sde1-8]
25158 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [flush-8:64]
25155 be/4 root 0.00 B/s 493.12 K/s 0.00 % 0.00 % bzip2
25156 be/4 root 0.00 B/s 2.81 M/s 0.00 % 0.00 % gzip
25159 be/4 root 0.00 B/s 528.63 K/s 0.00 % 0.00 % [flush-8:32]

This is an example of iotop to track disk I/O every ten seconds for one user named vagrant (and only one process of this user, but this can be omitted). The -a switch accumulates I/O over time.

[root@RHEL65 ~]# iotop -q -a -u vagrant -b -p 5216 -d 10 -n 10
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
 TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
 5216 be/4 vagrant 0.00 B 0.00 B 0.00 % 0.00 % gzip
Total DISK READ: 818.22 B/s | Total DISK WRITE: 20.78 M/s
 5216 be/4 vagrant 0.00 B 213.89 M 0.00 % 0.00 % gzip
Total DISK READ: 2045.95 B/s | Total DISK WRITE: 23.16 M/s
 5216 be/4 vagrant 0.00 B 430.70 M 0.00 % 0.00 % gzip
Total DISK READ: 1227.50 B/s | Total DISK WRITE: 22.37 M/s
 5216 be/4 vagrant 0.00 B 642.02 M 0.00 % 0.00 % gzip
Total DISK READ: 818.35 B/s | Total DISK WRITE: 16.44 M/s
 5216 be/4 vagrant 0.00 B 834.09 M 0.00 % 0.00 % gzip
Total DISK READ: 6.95 M/s | Total DISK WRITE: 8.74 M/s
 5216 be/4 vagrant 0.00 B 920.69 M 0.00 % 0.00 % gzip
Total DISK READ: 21.71 M/s | Total DISK WRITE: 11.99 M/s

6. vmstat

While vmstat is mainly a memory monitoring tool, it is worth mentioning here for its reporting on summary I/O data for block devices and swap space.

This example shows some disk activity (underneath the -----io---- column), without swapping.

[root@RHEL65 ~]# vmstat 5 10
procs ----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 0 5420 9092 14020 340876 7 12 235 252 77 100 2 1 98 0 0
 2 0 5420 6104 13840 338176 0 0 7401 7812 747 1887 38 12 50 0 0
 2 0 5420 10136 13696 336012 0 0 11334 14 1725 4036 76 24 0 0 0
 0 0 5420 14160 13404 341552 0 0 10161 9914 1174 1924 67 15 18 0 0
 0 0 5420 14300 13420 341564 0 0 0 16 28 18 0 0 100 0 0
 0 0 5420 14300 13420 341564 0 0 0 0 22 16 0 0 100 0 0
...
[root@RHEL65 ~]#

You can benefit from vmstat's ability to display memory in kilobytes, megabytes or even kibibytes and mebibytes using -S (followed by k K m or M).

[root@RHEL65 ~]# vmstat -SM 5 10
procs ----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 0 5 14 11 334 0 0 259 255 79 107 2 1 97 0 0
 0 0 5 14 11 334 0 0 0 2 21 18 0 0 100 0 0
 0 0 5 15 11 334 0 0 6 0 35 31 0 0 100 0 0
 2 0 5 6 11 336 0 0 17100 7814 1378 2945 48 21 31 0 0
 2 0 5 6 11 336 0 0 13193 14 1662 3343 78 22 0 0 0
 2 0 5 13 11 330 0 0 11656 9781 1419 2642 82 18 0 0 0
 2 0 5 9 11 334 0 0 10705 2716 1504 2657 81 19 0 0 0
 1 0 5 14 11 336 0 0 6467 3788 765 1384 43 9 48 0 0
 0 0 5 14 11 336 0 0 0 13 28 24 0 0 100 0 0
 0 0 5 14 11 336 0 0 0 0 20 15 0 0 100 0 0
[root@RHEL65 ~]#

Closing statement

In conclusion, mastering these Linux troubleshooting commands empowers users to diagnose and resolve a variety of system issues efficiently, contributing to the overall stability and performance of a Linux environment.

Bangkit Ade Saputra
Bangkit Ade Saputra At the end of the day, my job involves people. we're complicated, we're always changing, we have millions of things going on in our lives, and changing jobs is always a big decision.

Post a Comment for "Unveiling the Power of 6 Linux Troubleshooting Commands"