
Linux Performance Analysis and Tools
Brendan Gregg, Lead Performance Engineer
brendangregg
SCaLE11x, February 2013
Sunday, February 24, 13

[Diagram: the Linux kernel software stack: Applications; DBs, all server types, ...; System Libraries; System Call Interface; VFS, Sockets, Scheduler; ext3/..., ZFS, TCP/UDP, LVM, IP, Virtual Memory; Block Device Interface, Ethernet; Device Drivers]

Find the Bottleneck

[Diagram: operating system and hardware: CPUs and CPU interconnect, Memory Bus, DRAM, I/O Bus, I/O Bridge, I/O Controller, Network Controller, Interface connects, Disks, Ports, Transports, Expander Interconnect]

whoami

- Lead Performance Engineer
- Work/Research: tools, visualizations, methodologies
- Was Brendan@Sun Microsystems, Oracle, now Joyent

Joyent

- High-Performance Cloud Infrastructure
- Compete on cloud instance/OS performance
- Public/private cloud provider
- OS-virtualization for bare-metal performance (Zones)
- Core developers of SmartOS and node.js
- KVM for Linux guests

SCaLE10x: Cloud Performance Analysis

- Example perf issues, including new tools and visualizations:
  http://dtrace.org/blogs/brendan/2012/01/30/performance-analysis-talk-at-scale10x/

SCaLE11x: Linux Performance Analysis

- Linux is the primary operating system for my next book, Systems Performance: Enterprise and the Cloud (Brendan Gregg, Prentice Hall, 2013)
- (secondary is the OpenSolaris-illumos-based SmartOS)

Agenda

- Background
- Linux Analysis and Tools: Basic, Intermediate, Advanced
- Methodologies
- Challenges

Performance

Why do performance analysis?

- Reduce IT spend: find and eliminate waste, find areas to tune, and do more with less
- Build scalable architectures: understand system limits and develop around them
- Solve issues: locate bottlenecks and latency outliers

Systems Performance

Why study the operating system?

- Find and fix kernel-based perf issues
  - 2-20% wins: I/O or buffer size tuning, NUMA config, etc.
  - 2-200x wins: bugs, disabled features, perturbations causing latency outliers
- Kernels change, new devices are added, workloads scale, and new perf issues are encountered
- Analyze application perf from kernel/system context
  - 2-2000x wins: identifying and eliminating unnecessary work

Perspectives

System analysis can be top-down, or bottom-up:

[Diagram: the stack: Workload; Application; System Libraries; System Calls; Kernel; Devices. Workload Analysis works top-down (developers); Resource Analysis works bottom-up (sysadmins)]

Kernel Internals

Eventually you'll need to know some kernel internals.

[Diagram: the operating system stack split into user-level and kernel-level]

Common System Metrics

    $ iostat
    Linux 3.2.6-3.fc16.x86_64 (node104)   02/20/2013   _x86_64_   (1 CPU)

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.02    0.00    0.10    0.04    0.00   99.84

    Device:   tps   kB_read/s   kB_wrtn/s    kB_read    kB_wrtn
    vda      0.24        7.37        2.15   80735422   23571828
    vdb      0.06        5.51        7.79   60333940   85320072

It's also worth studying common system metrics (iostat, ...), even if you intend to use a monitoring product. Monitoring products often use the same metrics, read from /proc.

Analysis and Tools

- A quick tour of tools, to show what can be done
- Then, some methodologies for applying them

[Diagram: Analysis and Tools: the operating system stack (as before) above the hardware: CPUs, DRAM, I/O Bridge, I/O Controller, Network Controller, Disks, Ports]

[Diagram: the same stack annotated with tools: top, mpstat, pidstat, perf, dtrace, stap at the CPU/scheduler level; strace at the system call interface; netstat, tcpdump, ip, nicstat, ping for networking; iostat, iotop, blktrace for block devices; vmstat, slabtop, free, dstat for memory; sar and /proc covering many areas]

Tools: Basic

- uptime
- top or htop
- mpstat
- iostat
- vmstat
- free
- ping
- nicstat
- dstat

uptime

Shows load averages, which are also shown by other tools:

    $ uptime
     16:23:34 up 126 days,  1:03,  1 user,  load average: 5.09, 2.12, 1.82

- This counts runnable threads (tasks): on-CPU, or runnable and waiting. Linux includes tasks blocked on disk I/O.
- These are exponentially-damped moving averages, with time constants of 1, 5 and 15 minutes. With three values you can see if load is increasing, steady, or decreasing.
- If the load is greater than the CPU count, it might mean the CPUs are saturated (100% utilized), and threads are suffering scheduler latency. Might. There's that disk I/O factor too.
- This is only useful as a clue. Use other tools to investigate!
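The exponentially-damped averaging can be sketched as follows: a minimal model of the calculation, assuming 5-second sampling of the run-queue length (the function and variable names are illustrative, not the kernel's):

```python
import math

def load_average(samples, minutes, interval=5.0):
    """Exponentially-damped moving average of the run-queue length,
    sampled every `interval` seconds, with a time constant of
    `minutes` minutes (1, 5 or 15 for the three load averages)."""
    decay = math.exp(-interval / (minutes * 60.0))
    load = 0.0
    for n in samples:  # n: runnable (and, on Linux, disk-blocked) tasks
        load = load * decay + n * (1.0 - decay)
    return load

# After one minute of a constant 2 runnable tasks, the 1-minute
# average has only climbed to ~63% (1 - 1/e) of the true load:
print(round(load_average([2.0] * 12, minutes=1), 2))  # 1.26
```

This is why a load average lags a sudden change in demand: the 1-minute value needs roughly a minute to reach two-thirds of a new steady load.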

top

System-wide and per-process summaries:

    $ top
    top - 01:38:11 up 63 days,  1:17,  2 users,  load average: 1.57, 1.81, 1.77
    Tasks: 256 total,   2 running, 254 sleeping,   0 stopped,   0 zombie
    Cpu(s):  2.0%us,  3.6%sy,  0.0%ni, 94.2%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
    Mem:  49548744k total, 16746572k used, 32802172k free,   182900k buffers
    Swap: 100663292k total,        0k used, 100663292k free, 14925240k cached

      PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
    11721 web    20   0  623m  50m 4984 R   93  0.1   0:59.50  node
    11715 web    20   0  619m  20m 4916 S   25  0.0   0:07.52  node
       10 root   20   0     0    0    0 S    1  0.0 248:52.56  ksoftirqd/2
       51 root   20   0     0    0    0 S    0  0.0   0:35.66  events/0
    11724 admin  20   0 19412 1444  960 R    0  0.0   0:00.07  top
        1 root   20   0 23772 1948 1296 S    0  0.0   0:04.35  init
    ...

- %CPU = interval sum for all CPUs (varies on other OSes)
- top can itself consume CPU (syscalls to read /proc)
- Straightforward. Or is it?
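On Linux, top derives %CPU from the utime/stime tick deltas in /proc/[pid]/stat over the refresh interval; a synthetic sketch of that arithmetic (the tick counts are made up, and USER_HZ is assumed to be 100):

```python
USER_HZ = 100  # clock ticks per second (typical; see sysconf(_SC_CLK_TCK))

def cpu_pct(utime_delta, stime_delta, interval_s):
    """%CPU as top shows it on Linux: CPU seconds consumed across all
    CPUs during the interval, as a percentage of the interval. A fully
    busy 2-thread process can therefore show 200%."""
    cpu_seconds = (utime_delta + stime_delta) / USER_HZ
    return 100.0 * cpu_seconds / interval_s

# Hypothetical: a process consumed 270 user + 9 system ticks in 3 s:
print(cpu_pct(270, 9, 3.0))  # 93.0
```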

top, cont.

Interview questions:

1. Does it show all CPU consumers?
2. A process has high %CPU: next steps for analysis?

top, cont.

1. top can miss:
   - short-lived processes
   - kernel threads (tasks), unless included (see top options)
2. analyzing high-CPU processes:
   - identify why: profile the code path
   - identify what: execution or stall cycles
   - High %CPU time may be stall cycles on memory I/O, in which case upgrading to faster CPUs doesn't help!

htop

Super top. Super configurable. Eg, basic CPU visualization: [screenshot]

mpstat

Check for hot threads, unbalanced workloads:

    $ mpstat -P ALL 1
    02:47:49  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
    02:47:50  all  54.37   0.00  33.12    0.00   0.00   0.00   0.00   0.00  12.50
    02:47:50    0  22.00   0.00  57.00    0.00   0.00   0.00   0.00   0.00  21.00
    02:47:50    1  19.00   0.00  65.00    0.00   0.00   0.00   0.00   0.00  16.00
    02:47:50    2  24.00   0.00  52.00    0.00   0.00   0.00   0.00   0.00  24.00
    02:47:50    3 100.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00
    02:47:50    4 100.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00
    02:47:50    5 100.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00
    02:47:50    6 100.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00
    02:47:50    7  16.00   0.00  63.00    0.00   0.00   0.00   0.00   0.00  21.00
    02:47:50    8 100.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00
    ...

Columns are summarized system-wide in top(1)'s header.

iostat

Disk I/O statistics. 1st output is summary since boot.

    $ iostat -xkdz 1
    Linux 2.6.35-32-server (prod21)  02/20/13  _x86_64_  (16 CPU)

    Device:  rrqm/s  wrqm/s     r/s    w/s    rkB/s   wkB/s \
    sda        0.00    0.00    0.00   0.00     0.00    0.00 /
    sdb        0.00    0.35    0.00   0.05     0.10    1.58 \
    ...
    Device:  rrqm/s  wrqm/s     r/s    w/s    rkB/s   wkB/s /
    sdb        0.00    0.00  591.00   0.00  2364.00    0.00 \

    \ avgqu-sz   await r_await w_await  svctm  %util
    /     0.00    0.84    0.84    0.00   0.84   0.00
    \     0.00    3.82    3.47    3.86   0.30   0.00
    /     0.00    2.31    2.31    0.00   2.31   0.00
    ...
    \ avgqu-sz   await r_await w_await  svctm  %util
    /     0.95    1.61    1.61    0.00   1.61  95.00

(long lines wrapped: the left columns show the workload input; the right columns, await and %util, show the resulting performance)

iostat, cont.

- %util: usefulness depends on the target; virtual devices backed by multiple disks may accept more work at 100% utilization
- Also calculate I/O controller stats by summing their devices
- One nit: would like to see disk errors too. Add a "-e"?
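The controller-level aggregation suggested above can be sketched as follows; the device figures are taken from the iostat output, but the device-to-controller mapping (and the controller name "c0") is a hypothetical assumption:

```python
# Sum per-device iostat throughput (rkB/s, wkB/s) into per-controller
# totals; the device-to-controller mapping here is assumed.
device_stats = {
    "sda": {"rkB/s": 0.00, "wkB/s": 0.00},
    "sdb": {"rkB/s": 2364.00, "wkB/s": 0.00},
}
controllers = {"c0": ["sda", "sdb"]}  # hypothetical mapping

controller_stats = {
    ctl: {
        "rkB/s": sum(device_stats[d]["rkB/s"] for d in devs),
        "wkB/s": sum(device_stats[d]["wkB/s"] for d in devs),
    }
    for ctl, devs in controllers.items()
}
print(controller_stats["c0"])  # {'rkB/s': 2364.0, 'wkB/s': 0.0}
```

Comparing the summed throughput against the controller's known limits shows whether the controller, rather than any single disk, is the bottleneck.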

vmstat

Virtual-memory statistics, and other high-level summaries:

    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd     free   buff   cache   si   so   bi   bo   in    cs us sy  id wa
    15  0   2852 46686812 279456 1401196    0    0    0    0    0     0  0  0 100  0
    16  0   2852 46685192 279456 1401196    0    0    0    0 2136 36607 56 33  11  0
    15  0   2852 46685952 279456 1401196    0    0    0   56 2150 36905 54 35  11  0
    15  0   2852 46685960 279456 1401196    0    0    0    0 2173 36645 54 33  13  0
    ...

- First line of output includes some summary-since-boot values
- "r" = total number of runnable threads, including those running
- Swapping (aka paging) allows over-subscription of main memory by swapping pages to disk, but costs performance

free

Memory usage summary (Kbytes default):

    $ free
                 total       used       free     shared    buffers     cached
    Mem:      49548744   32787912   16760832          0      61588     342696
    -/+ buffers/cache:   32383628   17165116
    Swap:    100663292          0  100663292

- buffers: block device I/O cache
- cached: virtual page cache
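The "-/+ buffers/cache" line can be derived from the Mem: row, since buffer and page cache memory is reclaimable; a quick sketch using the figures from the free output above:

```python
# Reconstruct free(1)'s "-/+ buffers/cache" line: buffers and cached
# memory can be reclaimed, so subtract them from "used" and add them
# to "free" (all values in kilobytes, from the output above).
total, used, free_kb = 49548744, 32787912, 16760832
buffers, cached = 61588, 342696

used_minus = used - buffers - cached   # memory truly in use
free_plus = free_kb + buffers + cached # memory available to applications

print(used_minus, free_plus)  # 32383628 17165116
```

This is why a "nearly full" Mem: row is often fine: most of the "used" memory may be cache that the kernel will give back under pressure.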

ping

Simple network test (ICMP), used to measure network latency:

    $ ping ...
    PING ... (63.234.226.9): 56 data bytes
    64 bytes from 63.234.226.9: icmp_seq=0 ttl=56 time=737.737 ms
    Request timeout for icmp_seq 1
    64 bytes from 63.234.226.9: icmp_seq=2 ttl=56 time=819.457 ms
    64 bytes from 63.234.226.9: icmp_seq=3 ttl=56 time=897.835 ms
    64 bytes from 63.234.226.9: icmp_seq=4 ttl=56 time=669.052 ms
    64 bytes from 63.234.226.9: icmp_seq=5 ttl=56 time=799.932 ms
    ^C
    --- ping statistics ---
    6 packets transmitted, 5 packets received, 16.7% packet loss
    round-trip min/avg/max/stddev = 669.052/784.803/897.835/77.226 ms

- Actually kernel-to-kernel IP stack latency, including how the network handles ICMP
- Tells us some, but not a lot (the above is an exception)
- Lots of other/better tools for this (eg, hping). Try using TCP.
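The summary line's statistics follow directly from the per-packet round-trip times; a sketch reproducing them from the five replies above (ping reports the population standard deviation):

```python
import math

# Round-trip times (ms) from the five replies in the output above
rtts = [737.737, 819.457, 897.835, 669.052, 799.932]

avg = sum(rtts) / len(rtts)
stddev = math.sqrt(sum((t - avg) ** 2 for t in rtts) / len(rtts))

print(f"round-trip min/avg/max/stddev = "
      f"{min(rtts):.3f}/{avg:.3f}/{max(rtts):.3f}/{stddev:.3f} ms")
# round-trip min/avg/max/stddev = 669.052/784.803/897.835/77.226 ms
```

The stddev (77 ms on a 785 ms average) is the quickest hint that this path's latency is not just high but also unstable.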

nicstat

Network statistics tool, ver 1.92 on Linux:

    # nicstat -z 1
        Time      Int      rKB/s    wKB/s    rPk/s    wPk/s    rAvs    wAvs  %Util    Sat
    01:20:58     eth0       0.07     0.00     0.95     0.02   79.43   64.81   0.00   0.00
    01:20:58     eth4       0.28     0.01     0.20     0.10  1451.3   80.11   0.00   0.00
    01:20:58  vlan123       0.00     0.00     0.00     0.02   42.00   64.81   0.00   0.00
    01:20:58      br0       0.00     0.00     0.00     0.00   42.00   42.07   0.00   0.00
        Time      Int      rKB/s    wKB/s    rPk/s    wPk/s    rAvs    wAvs  %Util    Sat
    01:20:59     eth4    42376.0    974.5  28589.4  14002.1  1517.8   71.27   35.5   0.00
        Time      Int      rKB/s    wKB/s    rPk/s    wPk/s    rAvs    wAvs  %Util    Sat
    01:21:00     eth0       0.05     0.00     1.00     0.00   56.00    0.00   0.00   0.00
    01:21:00     eth4    41834.7    977.9  28221.5  14058.3  1517.9   71.23   35.1   0.00
        Time      Int      rKB/s    wKB/s    rPk/s    wPk/s    rAvs    wAvs  %Util    Sat
    01:21:01     eth4    42017.9    979.0  28345.0  14073.0  1517.9   71.24   35.2   0.00
    ...

- This was the tool I wanted, and finally wrote it out of frustration (Tim Cook ported and enhanced it on Linux)
- Calculate network controller stats by summing interfaces

dstat

A better vmstat-like tool. Does coloring (FWIW).
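nicstat's %Util relates throughput to the interface's line rate; a rough illustrative sketch, assuming utilization is computed from combined read+write throughput against a 1 Gbit/s line rate (nicstat's exact formula depends on the interface's reported speed and duplex):

```python
# Estimate interface utilization from nicstat-style throughput
# figures: combined read+write KB/s against an assumed 1 Gbit/s
# line rate. (nicstat's exact formula varies with speed and duplex;
# this is an illustrative sketch.)
LINE_RATE_BPS = 1_000_000_000  # assumed 1 Gbit/s interface

def util_pct(rkb_s, wkb_s, line_rate_bps=LINE_RATE_BPS):
    bits_per_sec = (rkb_s + wkb_s) * 1024 * 8
    return 100.0 * bits_per_sec / line_rate_bps

# eth4 during the streaming workload above:
print(round(util_pct(42376.0, 974.5), 1))  # ~35.5
```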

Tools: Basic, recap

- uptime
- top or htop
- mpstat
- iostat
- vmstat
- free
- ping
- nicstat
- dstat

[Diagram: the operating system and hardware annotated with the basic tools: top, mpstat, dstat against the CPUs; vmstat, free, top, dstat against DRAM; iostat against the disks; nicstat and ping against the network controller; I/O bridge and I/O controller stats must be inferred]

Tools: Intermediate

- sar
- netstat
- pidstat
- strace
- tcpdump
- blktrace
- iotop
- slabtop
- sysctl
- /proc

sar

System Activity Reporter. Eg, paging statistics with -B:

    $ sar -B 1
    Linux 3.2.6-3.fc16.x86_64 (node104)   02/20/2013   _x86_64_   (1 CPU)

    05:24:34 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
    05:24:35 PM      0.00      0.00    267.68      0.00     29.29      0.00      0.00      0.00      0.00
    05:24:36 PM     19.80      0.00    265.35      0.99     28.71      0.00      0.00      0.00      0.00
    05:24:37 PM     12.12      0.00   1339.39      1.01   2763.64      0.00   1035.35   1035.35    100.00
    05:24:38 PM      0.00      0.00    534.00      0.00     28.00      0.00      0.00      0.00      0.00
    05:24:39 PM    220.00      0.00    644.00      3.00     74.00      0.00      0.00      0.00      0.00
    05:24:40 PM   2206.06      0.00   6188.89     17.17   5222.22   2919.19      0.00   2919.19    100.00
    ...

- Configure to archive statistics from cron
- Many, many statistics available: -d: block device statistics, -q: run queue statistics, ...
- Same statistics as shown by other tools (vmstat, iostat, ...)
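sar's %vmeff column is the ratio of pages stolen to pages scanned, ie, how efficiently page reclaim is finding reclaimable pages; a quick sketch using the 05:24:40 sample above:

```python
# %vmeff = pgsteal/s as a percentage of pages scanned per second
# (pgscank/s + pgscand/s); figures from the 05:24:40 sample above.
pgscank, pgscand, pgsteal = 2919.19, 0.00, 2919.19

scanned = pgscank + pgscand
vmeff = 100.0 * pgsteal / scanned if scanned else 0.0
print(vmeff)  # 100.0
```

100% means every scanned page was reclaimed; a low %vmeff under sustained scanning is a sign the system is struggling to find memory.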

netstat

Various network protocol statistics using -s:

    $ netstat -s
    [...]
    Tcp:
        127116 active connections openings
        165223 passive connection openings
        12904 failed connection attempts
        19873 connection resets received
        20 connections established
        662889209 segments received
        354923419 segments send out
        405146 segments retransmited
        6 bad segments received.
        26379 resets sent
    [...]
    TcpExt:
        2142 invalid SYN cookies received
        3350 resets received for embryonic SYN_RECV sockets
        7460 packets pruned from receive queue because of socket buffer overrun
        2932 ICMP packets dropped because they were out-of-window
        96670 TCP sockets finished time wait in fast timer
        86 time wait sockets recycled by time stamp
        1007 packets rejects in established connections because of [...]
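One useful metric derived from the Tcp: section is the retransmit rate, retransmitted segments as a share of segments sent; a sketch using the counters above:

```python
# TCP retransmit rate from netstat -s counters: retransmitted
# segments as a percentage of segments sent (figures from above).
segments_out = 354923419
retransmits = 405146

retransmit_pct = 100.0 * retransmits / segments_out
print(round(retransmit_pct, 3))  # ~0.114
```

Since these are counters since boot, compare two readings over an interval to get the current rate rather than the lifetime average.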
