Showing posts with label monitoring. Show all posts

Tuesday, February 17, 2009

vmstat - Virtual Memory Statistics

If someone asks me to check how a Linux/UNIX system is performing right now, the first thing I run is vmstat. A lot of people only look at the CPU and memory utilization figures in vmstat, but it reports a good deal more than that. In this post, I am going to explain vmstat in detail.
vmstat stands for virtual memory statistics; it collects and displays summary information about memory, processes, interrupts, paging, and block I/O. Given an interval, it can be used to observe system activity interactively.
Most commonly people pass 2 numeric arguments to vmstat: the first is the delay (sleep) between updates and the second is how many updates you want to see before vmstat quits. Please note this is not the full syntax of vmstat, and it can vary between OSs. Please refer to your OS man page for more information.
To run vmstat with 7 updates, 10 seconds apart, type
#vmstat 10 7
Please note that on some systems the reported metrics might be slightly different. The headings I describe here are the ones reported on Oracle Linux (Unbreakable Oracle Linux).
Process Block: Provides details of the processes that are running or waiting for something (it can be CPU, I/O etc.; potentially any resource)
r  -->  Processes that are runnable, that is, running or waiting for CPU. The higher the count, the more processes are waiting to run. A brief spike in the count should not be treated as a bottleneck. If the value is constantly high (most people treat 2 * CPU count as high), it hints that CPU is the bottleneck.
b  -->  Processes in uninterruptible sleep, also known as “blocked” processes. These processes are most likely waiting for I/O but could be waiting for something else too
w  -->  Number of processes that can be run but have been swapped out to the swap area. This gives a hint about a memory bottleneck. Please remember, only some systems report this count
Memory Block: Provides detailed memory statistics
swpd  -->  Amount of virtual memory (swap space) used
free  -->  Amount of free physical memory (RAM)
buff  -->  Amount of memory used as buffers. This memory is used to store file metadata such as i-nodes and data from raw block devices
cache  -->  Amount of physical memory used as cache (mostly cached file contents)
Note: Most systems report the memory block values in KB. Remember I said most; not all. So check the man page.
 
Swap Block: Provides memory swap information
si  -->  Rate at which memory is swapped in from disk back to physical RAM
so  -->  Rate at which memory is swapped out from physical RAM to disk
Note: Most systems report the swap block values in KB/s. Check the man page
I/O Block: I/O related information
bi  -->  Rate at which the system reads the data from block devices (in blocks/sec)
bo  -->  Rate at which the system sends data to the block devices (in blocks/sec)
System Block: System information
in  -->  Number of interrupts received by the system per second
cs  -->  Rate of context switching in the process space (in number/sec)
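CPU Block: The most used CPU-related information
us  -->  Percentage of CPU time spent in user processes. Most user/application/database processes come under this category
sy  -->  Percentage of CPU time spent running kernel (system) code, such as system calls made on behalf of processes
id  -->  Percentage of CPU time spent idle (free CPU)
wa  -->  Percentage of time spent “waiting for I/O”
A lot of people get confused here. Some say us + sy + id + wa = 100 and others say us + sy + id = 100. On Linux kernels from 2.5.41 onwards, vmstat reports wa separately from id, so the four columns (plus st, where it is shown) add up to 100; on older kernels wa was included in id, so the second formula applied. Either way, I/O wait does not consume CPU: wa is idle time during which the system had I/O requests outstanding.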
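One way to settle the question on a particular system is simply to add the CPU columns up for each sample. The following is a minimal sketch, assuming the Linux-style layout where the column-name line starts with "r" and the CPU columns are named us, sy, id, wa and (optionally) st; the function name cpu_sum is my own and not part of vmstat.

```shell
#!/bin/sh
# cpu_sum (hypothetical helper): read vmstat output on stdin and print, for
# every sample line, the sum of whichever CPU columns are present.
cpu_sum() {
  awk '
    # The column-name header is the line whose first field is "r".
    # Remember the positions of the CPU columns found in it.
    $1 == "r" {
      n = 0
      for (i = 1; i <= NF; i++)
        if ($i == "us" || $i == "sy" || $i == "id" || $i == "wa" || $i == "st")
          col[++n] = i
      next
    }
    # Sample lines start with a number; the "procs ---memory---" banner does not.
    n > 0 && $1 ~ /^[0-9]+$/ {
      total = 0
      for (j = 1; j <= n; j++) total += $(col[j])
      print total
    }
  '
}

# Usage (not run here):
#   vmstat 10 7 | cpu_sum
```

If the printed sums stay at or very near 100 (rounding can cost a point), the columns present on that system together account for all CPU time.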
Interpretation with respect to performance:
The first line of the output is an average of all the metrics since the system was last restarted. So ignore that line, since it does not show the current status. The other lines show the metrics for each interval as it happens.
Ideally, the r/b/w values under the procs block should be 0 or close to 0. If one or more of these counters constantly report high values, it means the system may not have sufficient CPU, memory or I/O bandwidth.
If the value of swpd (under memory) is high and keeps changing, with si/so under swap staying non-zero, it means the system is running short of memory.
The data under “io” indicates the flow of data to and from the disks. This shows how much disk activity is going on, which does not necessarily indicate a problem (obviously data has to go to disk in order to be persistent). If we see a large number in the “b” column under “procs” (processes being blocked) together with high I/O, the issue could be I/O contention.
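The r-column rule of thumb can be checked mechanically. Below is a rough sketch, assuming r is the first column of each sample line (as on Linux) and treating 2 * CPU count as "high", as described earlier; busy_samples is a made-up name, and the CPU count is passed in by the caller.

```shell
#!/bin/sh
# busy_samples NCPU (hypothetical helper): read vmstat output on stdin,
# skip the first sample (the since-boot average) and count how many of the
# remaining samples have r greater than 2 * NCPU.
busy_samples() {
  awk -v ncpu="$1" '
    # Only sample lines start with a number; header lines are ignored.
    $1 ~ /^[0-9]+$/ {
      seen++
      if (seen == 1) next            # first sample = average since boot
      total++
      if ($1 > 2 * ncpu) high++
    }
    END { printf "%d of %d samples had r > %d\n", high, total, 2 * ncpu }
  '
}

# Usage (not run here; nproc is the GNU coreutils command, so this form
# is an assumption about a Linux host):
#   vmstat 10 7 | busy_samples "$(nproc)"
```

A single high sample is just a spike; it is only when most of the samples are flagged that CPU starts to look like the bottleneck.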
Rule of thumb in Performance
Adding more CPU, memory, or I/O bandwidth to the system is not always the solution to the problem; it often just postpones the problem, which can blow up at any time. The real solution is to tune the application (every component in the architecture) as far as possible. Adding hardware capacity or buying more powerful hardware should be the last option.
As usual, comments are always welcome.
-Thiru

Thanks to Anonymous for pointing out the issue in bi/bo.

Wednesday, February 4, 2009

Statspack

I have worked with databases including IBM's DB2 (in fact I am an IBM certified DB2 DBA for version 8.1), open source MySQL, and Oracle (from version 8). As a performance tester, I used to wonder how Oracle alone can provide such a wide range of monitoring facilities that no other vendor is able to match. From system-wide monitoring using Statspack/AWR to session-level trace, Oracle's monitoring capability is amazing. In this article I am going to cover a couple of basic things about Oracle’s very own monitoring utility, Statspack. Statspack is a built-in tool; its installation scripts come along with the database itself (no need to pay even an extra penny from your pocket). All you need to do is install it and start using it.

Quote from Oracle Database Documentation about statspack:
"The Statspack package is a set of SQL, PL/SQL, and SQL*Plus scripts that allow the collection, automation, storage, and viewing of performance data. Statspack stores the performance statistics permanently in Oracle tables, which can later be used for reporting and analysis. The data collected can be analyzed using Statspack reports, which includes an instance health and load summary page, high resource SQL statements, and the traditional wait events and initialization parameters."

Installation:
To invoke the Statspack setup, all you need to do is call the spcreate.sql script, which is available in ORACLE_HOME/rdbms/admin. The PERFSTAT user owns all the PL/SQL code and database objects, including tables, sequences, constraints etc. On Windows, log in through SQL*Plus as an Oracle user which has enough privilege to install, and run the following
@%ORACLE_HOME%\rdbms\admin\spcreate
On Linux/Unix, run the following from SQL*Plus
@$ORACLE_HOME/rdbms/admin/spcreate

During installation it will ask for the PERFSTAT schema’s password (people usually use perfstat as the password), the default (permanent) tablespace and the temporary tablespace. The SPCREATE.SQL install script in turn automatically calls the following scripts.
SPCUSR.SQL: Creates the PERFSTAT user and grants the privileges required to collect performance data from the V$ views.
SPCTAB.SQL: Creates the tables which are going to store the performance data.
SPCPKG.SQL: Creates the package required for monitoring, data purging and reporting

The installation scripts write errors (if any) into the SPCUSR.LIS, SPCTAB.LIS, and SPCPKG.LIS output files.

How it works:
A snapshot of the database’s performance is taken, stored in the PERFSTAT tables and assigned a SNAP_ID that is unique for the instance. Typically we take snapshots at a pre-defined interval and generate a performance report between any two snapshots. If the instance is restarted between the snapshots, the report will be meaningless. Snapshots can be taken at various levels depending upon the level of monitoring data required. Snapshot levels range from 0 to 10 (the documented levels are 0, 5, 6, 7 and 10), and the default level is 5. The higher the level, the more performance data is collected (and the more resources the snapshot consumes).

Taking snapshot
Log in as the PERFSTAT user, or as a user which has execute privilege on the statspack package, and call statspack.snap. A few examples:

exec statspack.snap
exec statspack.snap(I_SNAP_LEVEL=>7)

Statspack report
A Statspack report gives instance-wide statistics between two snapshots (but if the instance was restarted between the snapshots, the report will not be meaningful). Just call SPREPORT.SQL and provide the begin SNAP_ID, the end SNAP_ID and the report file name.
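SPREPORT.SQL normally prompts for those three values. As a hedged sketch: the Statspack documentation describes pre-defining the SQL*Plus variables begin_snap, end_snap and report_name to skip the prompts, and the small shell helper below only prints such a script. The helper name, the snapshot IDs and the file names in the usage lines are made up for illustration; check the variable names against the spdoc.txt shipped with your release.

```shell
#!/bin/sh
# statspack_report_sql BEGIN_SNAP END_SNAP REPORT_NAME (hypothetical helper)
# Print SQL*Plus commands that pre-define the values spreport.sql asks for
# and then run the report script ("?" is SQL*Plus shorthand for ORACLE_HOME).
statspack_report_sql() {
  cat <<EOF
define begin_snap=$1
define end_snap=$2
define report_name=$3
@?/rdbms/admin/spreport
exit
EOF
}

# Usage (not run here; snapshot IDs and file names are placeholders):
#   statspack_report_sql 10 11 sp_10_11 > run_report.sql
#   sqlplus perfstat @run_report.sql
```

Writing the commands to a file first keeps the password prompt interactive, and the same file can be reused for later snapshot pairs.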

Oracle 10g and above come with a newer feature called Automatic Workload Repository (AWR), which pairs Statspack-style snapshots with ADDM analysis. But Statspack is still supported.

Friday, January 23, 2009

Sun Management Center - Sun's Ways of monitoring

"Sun Management Center" (Sun MC) is a product from Sun Microsystems for monitoring SPARC and x86 hardware running Solaris and Linux. It provides in-depth monitoring and diagnostics of servers and their services. Sun MC is based on a server-agent model.

Architecture
Sun Management Center software includes three component layers: console, server, and agent. The product is based on the manager and agent architecture:

Console layer: The console layer is the interface to end users. It exposes web, JWS and console interfaces. Multiple users can access the same Sun MC at the same time.

Server layer: The server is the core, which talks to the console layer and the agent layer. It acts as the central repository and stores data (both historical and current). It includes components such as the configuration manager, event manager, topology manager etc. Sun Management Center 4.0 uses a PostgreSQL (open source) database to store data, whereas the previous version 3.6 uses Oracle.

Agent layer: The agent layer monitors and gathers information about the server/system on which it is deployed, and it communicates with the server using SNMP (modules are used for gathering monitoring data; different modules are used for different purposes). Apart from monitoring, the agent also has the capability to manage the nodes. The agent uses rules (which it gets from the server layer) to raise an alarm when a rule is not met.


Modules: Modules are the components in the agent layer responsible for monitoring. They can be dynamically loaded, invoked, started, stopped and uninstalled in Sun MC. Kernel reader, file scanning, directory scanning, config reader, fault manager, print spooler and process monitoring are some of the modules.

Like nmon, Sun MC is free to download and use (you can pay and get support). Just as Glance is used for monitoring HP machines, Sun MC can be used to monitor Sun-based systems. Next time you are planning performance testing or tuning on SPARC or x86 hardware running Solaris, try Sun MC.

Monday, December 22, 2008

Glance - HP Way of optimizing the system's performance

I have found a webcast which explains how to use HP glance for monitoring and tuning HP Unix systems. Click here for more information. 

A few things to note
1. We need to buy a license from HP (or you can use the 60-day trial version)
2. It also supports monitoring AIX, Linux and Solaris, apart from HP Unix.
3. It supports drill-down to find out what is going on in the system

To me, Glance looks like a more powerful nmon. For those of you who are sick of monitoring systems using vmstat, sar, iostat etc., Glance (and nmon) are good tools.

Cheers,
Thiru

Thursday, December 11, 2008

Remote monitoring Linux/unix servers from linux/unix server

Steps for Executing the command on remote host

  1. Create/update the .netrc file in the user's home directory and add the credentials of all the machines that need to be accessed remotely, one entry per machine. The format is as follows (remotehost, username and secret stand for the real values)

machine remotehost login username password secret

  2. Execute “chmod 600 .netrc” so that only the owner has read/write access.
  3. To execute a command on the remote host, use the following command
rexec remotehost command

To remote monitor
rexec remotehost vmstat 1 10

Wednesday, August 13, 2008

OS Monitoring

Monitoring a server isn't something you should do chaotically. You need to have a clear plan: a set of goals that you hope to achieve. Troubleshooting server performance problems is a key reason for monitoring; it is not just about plotting good-looking graphs to show to superiors. Without monitoring, tuning is an almost impossible activity.

Basically monitoring is done for 2 purposes
  1. Benchmark
  2. Tuning

While monitoring an OS, the following are the basic things that need to be considered, regardless of the OS platform.

  1. CPU Utilization
  2. Memory, paging
  3. Throughput & retransmission statistics
  4. TCP statistics
  5. Disk statistics

Not everything needs to be presented in the report. Presenting a 10-20 page report containing only relevant information is always better than presenting a 100-page report (an album). If you want to show your hard work, create an annexure section and attach it there (I would not call that an album. After all, we are showing our hard work in the relevant place. He he).

Linux Monitoring

Monitoring Linux-based systems is not as complex as many people think. All we need to know is simple, basic shell scripting and a few basic commands.

To know the version of Linux kernel
uname -a

vmstat command
vmstat reports information about processes, memory, paging, block IO, traps, and CPU activity.

Basic syntax
vmstat [delay [count]]

Example
vmstat 10 7

The first set of statistics printed is averaged over the system uptime. Don’t rely on it unless it really makes sense for what you are measuring.

See man page for more information

iostat command
iostat displays kernel I/O statistics on terminal, device and cpu operations.

Basic syntax
iostat [delay [count]]

Example
iostat 10 7

The first set of statistics printed is averaged over the system uptime. Don’t rely on it unless it really makes sense for what you are measuring.
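Neither vmstat nor iostat, run as above, puts a time on its samples, which makes it hard to line the numbers up with a test run afterwards. A minimal sketch of one way around that, assuming a POSIX shell and the standard date command: a small filter (the name stamp is my own) that prefixes each line with the current time.

```shell
#!/bin/sh
# stamp (hypothetical helper): copy stdin to stdout, prefixing every line
# with the time at which it was read.
stamp() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
  done
}

# Usage (not run here; the log file names are placeholders):
#   vmstat 10 7 | stamp >> vmstat.log
#   iostat 10 7 | stamp >> iostat.log
```

The header lines get a timestamp too; they are easy to filter out later, and the raw log can go into the annexure of the report.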

netstat command

netstat prints network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. There are a number of output formats, depending on the options for the information presented.

See man page for more information

Nmon utility

Quote from IBM
This free tool gives you a huge amount of information all on one screen. Even though IBM doesn't officially support the tool and you must use it at your own risk, you can get a wealth of performance statistics. Why use five or six tools when one free tool can give you everything you need?

The nmon tool runs on:

  1. AIX® 4.1.5, 4.2.0 , 4.3.2, and 4.3.3 (nmon Version 9a: This version is functionally established and will not be developed further.)
  2. AIX 5.1, 5.2, and 5.3 (nmon Version 10: This version now supports AIX 5.3 and POWER5™ processor-based machines, with SMT and shared CPU micro-partitions.)
  3. Linux® SUSE SLES 9, Red Hat EL 3 and 4, Debian on pSeries® p5, and OpenPower™
    Linux SUSE, Red Hat, and many recent distributions on x86 (Intel and AMD in 32-bit mode)
  4. Linux SUSE and Red Hat on zSeries® or mainframe

Click here for more information.

Windows Monitoring
Windows provides a huge set of monitoring counters to end users. Not all of them need to be monitored all the time, but every counter might be useful at one point or another. All Windows OS-related counters can be monitored using perfmon (just go to Run and type perfmon).


I am planning to write step-by-step instructions on how to monitor Windows using perfmon later, as a separate blog post.

Tuesday, July 15, 2008

Understanding Architecture

The most important phase of performance tuning is understanding the architecture of the system under test. Without understanding the end-to-end architecture, performance tuning is like “searching for a needle in a haystack”.
A few basic components and their definitions
  1. Load balancer -> Technique to spread work between two or more computers, network links, CPUs, hard drives, or other resources, in order to get optimal resource utilization, throughput, or response time.
  2. Web server -> A computer program that is responsible for accepting HTTP requests from web clients, which are known as web browsers, and serving them HTTP responses along with optional data contents, which usually are web pages such as HTML documents and linked objects such as images, CSS etc. (usually static content)
  3. LDAP server -> The Lightweight Directory Access Protocol, or LDAP, is an application protocol for querying and modifying directory services running over TCP/IP. The most common example is the telephone directory, which consists of a series of names (either of persons or organizations) organized alphabetically, with each name having an address and phone number attached. Due to this basic design (among other factors) LDAP is often used by other services for authentication(SSA).
  4. Application Server -> An application server is a software engine that delivers applications to client computers or devices, typically through the Internet and using the HyperText Transfer Protocol. Application servers are distinguished from web servers by the extensive use of server-side dynamic content and frequent integration with database engines. An application server handles most, if not all, of the business logic and data access of the application.
  5. Database server -> A complex set of software programs that controls the organization, storage, management, and retrieval of data in a database.

During testing, monitoring and logging all the components involved, plus the OS, is mandatory. Without monitoring all components, including the network, any improvement/tuning achieved is pure luck.

Next blog plan: monitoring

-Thiru