HP-UX

Regarding bottlenecks

 
주낙권
Advisor

Regarding bottlenecks

I have a couple of questions about bottlenecks.



1. When people say there is a disk bottleneck, are there threshold values used to make that call?

For example, something along the lines of "when such-and-such a field in sar stays above a certain level, that's a disk bottleneck" ^^



2. We have mirroring configured, and I've noticed a process called vxfsd using the CPU fairly often. My guess is that it runs because the mirror is configured and it has to sync.

When a mirror is configured, how often does that sync happen? Every few seconds? Every few minutes?



Thanks~
2 Replies
김병수
Undergraduate

Regarding bottlenecks



Disk Bottlenecks

Disk Bottleneck Recipe Ingredients:

- Consistent high utilization on at least one disk device (GBL_DISK_UTIL_PEAK or highest BYDSK_UTIL > 50%).
- Significant queuing lengths (GBL_DISK_SUBSYSTEM_QUEUE > 3 or any BYDSK_REQUEST_QUEUE > 1).
- Processes or threads blocked on I/O wait reasons (PROC_STOP_REASON = CACHE, DISK, IO).
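
For the original question about which sar field to watch: on HP-UX, "sar -d" gives per-device numbers that are rough command-line analogues of the metrics above (%busy vs. BYDSK_UTIL, avque vs. BYDSK_REQUEST_QUEUE). A minimal sketch, assuming the usual sar -d column layout on your release (device, %busy, avque, r+w/s, blks/s, avwait, avserv):

    # sample every 5 seconds, 12 times; watch %busy and avque per device
    sar -d 5 12

    # throughput per device from iostat (no utilization column)
    iostat 5 12

A device that stays above roughly 50% busy while avque also stays above 1 is showing the "high utilization plus queuing" combination from the recipe.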

Disk bottlenecks are easy to solve: Just recode all your programs to keep all their data locked in memory all the time! Hey, memory is cheap! Sadly, this isn't always (say ever) possible, so the next bestest alternative is to focus your disk tuning efforts on the I/O hotspots. The perfect scenario for disk I/O is to spread the applications' I/O activity out over as many different I/O cards, LUNs, and physical spindles as possible to maximize overall throughput and avoid bottlenecks on any particular I/O path. Sadly, this isn't always possible either because of the constraints of the application, downtime for reconfigurations, etc.

To find the hotspots, use a performance tool that shows utilization on the different disk devices. Both sar and iostat have by-disk information, as of course do Glance and MeasureWare. We usually start by looking at historical data and focus on the disks that are most heavily utilized at the specific times when there is a perceived problem with performance. Using PerfView, you can draw a Class Compare graph of all disks using the BYDSK_UTIL metric to see utilization trends, and use the BYDSK_REQUEST_QUEUE to look for queuing. If you're not looking at the data from times when a problem occurs, you may be tuning the wrong things! If a disk is busy over 50% of the time, and there's a queue on the disk, then there's an opportunity to tune.

Note that MeasureWare's metric GBL_DISK_UTIL_PEAK is not an average, nor does it track just one disk over time. This metric is showing you the utilization of the busiest disk of all the disks for a given interval, and of course a different disk could be the busiest disk every interval. The other useful global metric for disk bottlenecks is the GBL_DISK_SUBSYSTEM_QUEUE, which shows you the average number of processes blocked on wait reasons related to Disk I/O, similar to how GBL_PRI_QUEUE works for CPU.
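
If the standard sadc collection jobs are enabled on the system (sa1/sa2 from cron writing /var/adm/sa/saDD, which is an assumption since they are not on by default), the by-disk history for the window when users complained can be pulled straight from sar, for example:

    # per-disk activity from the file for the 15th, limited to 09:00-11:00
    sar -d -f /var/adm/sa/sa15 -s 09:00 -e 11:00

Otherwise the MeasureWare logs viewed through PerfView give the same picture via the BYDSK_* metrics.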

If your busiest disk is a swap device, then you have a memory bottleneck masquerading as a disk bottleneck and you should address the memory issues first if possible. Also, see the discussion above under System (Disk) Setup for optimizing swap device configurations for performance.
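
To check whether the busiest device is in fact a swap area, compare the swap configuration against the busy devices (a sketch; the volume group and lvol names below are placeholders):

    # list device swap areas, sizes and usage
    swapinfo -tam

    # map a swap logical volume to the physical disks it sits on
    lvdisplay -v /dev/vg00/lvol2

If the physical volumes listed there are the same devices that sar -d shows as busiest, treat it as a memory problem first.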

Glance can be particularly useful if you can run it while a disk bottleneck is in progress, because there are separate reports from the perspective of By-Filesystem, By-Logical Volume, and By-Disk. You can also see the logical (read/write syscall) I/O versus physical I/O breakdown, as well as physical I/O split by type (Filesystem, Raw, Virtual Memory (paging), and System (inode activity)). In Glance, you can sort the process list on PROC_DISK_PHYS_IO_RATE, then select the processes doing most of the I/O and bring up their list of open file descriptors that may help pinpoint the specific files that are involved.

The problem with all the system perftools is that the internals of the disk hardware are opaque to them. You can have disk arrays that show up as a single "disk" in the perftool, and specialized tools may be needed to analyze the internals of the array. The disk array vendor is where you'd go for these tools.
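
If you want a similar drill-down without sitting in the interactive screens, Glance's adviser mode can print the heavy hitters each interval. This is only a rough sketch: the loop/print grammar, the threshold of 10 I/Os per second, and the -adviser_only/-syntax/-j options should all be checked against the adviser syntax documentation for your Glance version.

    # hypothetical adviser file /tmp/diskio.adv
    process loop {
       if PROC_DISK_PHYS_IO_RATE > 10 then
          print PROC_PROC_NAME, " pid=", PROC_PROC_ID, " phys io/s=", PROC_DISK_PHYS_IO_RATE
    }

    # evaluate it every 30 seconds
    glance -adviser_only -syntax /tmp/diskio.adv -j 30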

Some general tips for improving disk I/O throughput include:

- Spread your disk I/O out as much as possible. It is better to keep 10 disks 10% busy than one disk 100% busy. Try to spread busy filesystems (and/or logical volumes) out across multiple physical disks.

- Avoid excessive logging. Different applications may have configuration controls that you can manipulate. For VxFS, managing the intent log is important. For suggested VxFS mount options, see the System Setup section above (a hypothetical example follows after this list).
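
As a purely illustrative example of the mount-option point in the last bullet (the right options depend on the application and are covered in the System Setup section, so treat this /etc/fstab line and its device/mount names as hypothetical):

    # delaylog relaxes intent-log flushing; verify against mount_vxfs(1M)
    /dev/vg01/lvol3 /data vxfs delaylog,nodatainlog 0 2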

In most cases, a very few processes will be responsible for most of the I/O overhead on a system. Watch for I/O “abuse”: applications that create huge numbers of files or ones that do large numbers of opens/closes of scratch files. You can tell if this is a problem if you see a lot of “System”-type I/O on a busy disk (BYDSK_SYSTEM_IO_RATE), or you see a high volume and low hit rate on the Dynamic Name Lookup Cache (GBL_MEM_DNLC_HIT, at the end of Glance’s Disk Report). To track things down, you can look for processes doing lots of I/O and spending significant amounts of time in System CPU. If you catch them live, drill down into Glance’s Process System Calls report to see what calls they’re making. Unfortunately, unless you own the source code to the application (or the person who does owes you a big favor), there is little you can do to correct inefficient I/O programming.





Well then~~~
monoworld
Regular Advisor

Regarding bottlenecks

Hmm.. thanks for the material.

Quite a wall of text to scroll through, though.

Roughly speaking, if the symptoms below show up, you have a bottleneck....



* High disk utilization
* Large disk queue length
* Large percentage of time waiting for disk I/O
* Large physical I/O rates
* Low buffer cache hit ratio
* Large run queue with idle CPU



In particular, if the queue keeps building up, that is a clear bottleneck...

High utilization alone, where requests still drain quickly, is hard to call a bottleneck, but high utilization combined with a growing queue means certain performance degradation. Most of the checks in the list above map onto the sar options sketched below.
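
For reference, a quick sketch of those checks with standard sar options (5-second samples, 12 iterations):

    sar -d 5 12    # %busy, avque, blks/s  -> utilization, queue length, physical I/O rate
    sar -u 5 12    # %wio                  -> time waiting for disk I/O
    sar -b 5 12    # %rcache / %wcache     -> buffer cache hit ratio
    sar -q 5 12    # runq-sz               -> run queue building while CPU sits idle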