Timeout Configuration for RHCS and GFS in Multipath Environments

Date: 2016-10-10  Source: Internet


Applies to: Cluster or GFS on RHEL4 and later

Article URL: http://dyxdggzs.com/article/201610/306016.htm

Symptom: the log reports the following error

openais[3345]: [CMAN ] lost contact with quorum device

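Today, whenever a customer has shared storage, we recommend configuring a quorum disk when deploying Cluster and GFS, so most readers will have seen the error above. It usually means the qdisk process went too long without communicating with cman/ais and exceeded the qdisk poll time, so the node was disconnected from the quorum device. This happens especially often during path-failover tests in environments running multipath software such as multipath or RDAC: failover can take a long time, qdisk is lost before the path switch completes, and the node is rebooted outright. That is certainly not the result we want. So how do we solve this?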

First, a few basic concepts:

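① For the cluster to consider a node healthy, all three of the following must hold:

· CMAN considers the node online

· The node can keep reading and writing the quorum disk without too many consecutive failures

· The node's heuristics yield a sufficient score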
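The three conditions combine with a logical AND: losing any one of them is enough for the node to be treated as unhealthy. The sketch below is a simplified model for illustration only, not qdiskd's actual code. `NodeState`, `node_is_healthy` and `missed_qdisk_cycles` are hypothetical names; `tko` and `min_score` correspond to the `<quorumd>` attributes discussed in ③.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of what the cluster knows about one node."""
    cman_online: bool          # CMAN considers the node a member
    missed_qdisk_cycles: int   # consecutive failed quorum-disk read/write cycles
    heuristic_score: int       # summed score of the heuristics that currently pass


def node_is_healthy(state: NodeState, tko: int, min_score: int) -> bool:
    """Simplified model: the node is healthy only if all three conditions hold."""
    return (
        state.cman_online
        and state.missed_qdisk_cycles < tko
        and state.heuristic_score >= min_score
    )


# Healthy: online, only 2 missed cycles out of tko=13, score meets min_score.
print(node_is_healthy(NodeState(True, 2, 2), tko=13, min_score=2))
# Unhealthy: 13 consecutive missed cycles reaches tko.
print(node_is_healthy(NodeState(True, 13, 2), tko=13, min_score=2))
```

In this model a long multipath failover shows up as `missed_qdisk_cycles` climbing while the other two conditions still look fine, which matches the failure mode described above.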

② qdisk has two main threads: the main thread runs the main loop and performs the I/O; the second thread handles the heuristics.

The main thread also periodically tells cman/ais that it is still alive. If qdisk goes longer than quorum_dev_poll without communicating with cman/ais, cman declares that the node has lost contact with the quorum disk, and the log shows the error above. The default in cman.h is:

#define DEFAULT_QUORUMDEV_POLL 10000

The unit is milliseconds, so the default is 10 seconds. To change quorum_dev_poll, edit the cman tag in cluster.conf, for example to 50 seconds:

<cman quorum_dev_poll="50000"></cman>
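The check in ② can be modelled as a gap test: cman expects qdisk to report in at least once every quorum_dev_poll milliseconds. Below is a minimal Python sketch, assuming a cluster.conf in which `<cman>` is a direct child of the root `<cluster>` element. The helper names and the heartbeat timeline (a roughly 30 s stall while multipath fails over) are hypothetical; the sketch does not read real qdiskd state and only illustrates why a long failover triggers "lost contact with quorum device" under the 10 s default but not under 50 s.

```python
import xml.etree.ElementTree as ET
from typing import Sequence

DEFAULT_QUORUMDEV_POLL_MS = 10000  # DEFAULT_QUORUMDEV_POLL from cman.h


def quorum_dev_poll_ms(cluster_conf: str) -> int:
    """Return quorum_dev_poll (ms) from a cluster.conf string, or the cman.h default."""
    root = ET.fromstring(cluster_conf)
    cman = root.find("cman")
    if cman is None or "quorum_dev_poll" not in cman.attrib:
        return DEFAULT_QUORUMDEV_POLL_MS
    return int(cman.attrib["quorum_dev_poll"])


def lost_contact(heartbeat_times_ms: Sequence[int], poll_ms: int) -> bool:
    """True if any gap between consecutive qdisk check-ins exceeds poll_ms."""
    gaps = (b - a for a, b in zip(heartbeat_times_ms, heartbeat_times_ms[1:]))
    return any(gap > poll_ms for gap in gaps)


conf = '<cluster name="demo"><cman quorum_dev_poll="50000"/></cluster>'
poll = quorum_dev_poll_ms(conf)
# Hypothetical timeline: qdisk stalls for 30 s during a path failover.
beats = [0, 3000, 6000, 36000, 39000]
print(poll, lost_contact(beats, poll), lost_contact(beats, DEFAULT_QUORUMDEV_POLL_MS))
```

Here the 30 s stall exceeds the 10000 ms default but stays within the configured 50000 ms.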

③ What we usually call the qdisk timeout is a continuous period during which every read and write to the quorum disk fails. Suppose cluster.conf contains:

<quorumd device="/dev/sdb1" interval="3" min_score="2" tko="13" votes="2">

where:

interval=3

This is the frequency of read/write cycles, in seconds (how often the node reads and writes the quorum disk).

tko=13

This is the number of cycles a node must miss in order to be declared dead (that is, consecutive failed cycles).

qdisk_timeout = interval × tko, which here is 3 × 13 = 39 seconds.
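The formula can be checked directly against the `<quorumd>` element above. A minimal Python sketch; `qdisk_timeout_s` is a hypothetical helper, and it assumes `interval` and `tko` are both set explicitly. If either attribute is missing it raises `TypeError` rather than guessing qdiskd's built-in defaults.

```python
import xml.etree.ElementTree as ET


def qdisk_timeout_s(quorumd_xml: str) -> int:
    """qdisk_timeout = interval x tko, read from a <quorumd> element (seconds)."""
    q = ET.fromstring(quorumd_xml)
    interval = int(q.get("interval"))  # seconds between read/write cycles
    tko = int(q.get("tko"))            # missed cycles before the node is declared dead
    return interval * tko


example = '<quorumd device="/dev/sdb1" interval="3" min_score="2" tko="13" votes="2"/>'
print(qdisk_timeout_s(example))  # 39
```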

④ Next, let's look at how the cman timeout is configured on RHEL5:

token

This timeout specifies in milliseconds until a token loss is declared after not receiving a token. This is the time spent detecting a failure of a processor in the current configuration. Reforming a new configuration takes about 50 milliseconds in addition to this timeout. The default is 1000 milliseconds. In short: if no token arrives within this many milliseconds, the token is declared lost. The default is 1 second, and forming the new configuration adds roughly another 50 ms on top of that.

retransmits_before_loss

This value identifies how many token retransmits should be attempted before forming a new configuration. If this value is set, retransmit and hold will be automatically calculated from retransmits_before_loss and token. The default is 4 retransmissions. That is, how many times the token may be lost in a row before a new cluster configuration is formed, evicting the node that lost the token. The default is 4.

token_retransmit

This timeout specifies in milliseconds after how long before receiving a token the token is retransmitted. This will be automatically calculated if token is modified. It is not recommended to alter this value without guidance from the openais community. The default is 238 milliseconds. In other words, this is the interval after which the token is retransmitted. It is calculated automatically from token and retransmits_before_loss above; with the defaults, 1000 / (4 + 0.2) ≈ 238 ms.
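Because token_retransmit is derived from token, raising token to ride out a slow failover scales the retransmit interval with it. A minimal Python sketch, assuming the totem formula token / (retransmits_before_loss + 0.2) truncated to whole milliseconds (this reproduces the documented 238 ms default, but treat the formula as an assumption, not a quote of the openais API); `token_retransmit_ms` is a hypothetical helper.

```python
def token_retransmit_ms(token_ms: int, retransmits_before_loss: int = 4) -> int:
    """Hypothetical helper: derive token_retransmit (ms) from token (ms).

    Assumes token_retransmit = token / (retransmits_before_loss + 0.2),
    truncated to an integer.
    """
    if token_ms <= 0 or retransmits_before_loss < 1:
        raise ValueError("token and retransmits_before_loss must be positive")
    return int(token_ms / (retransmits_before_loss + 0.2))


for token in (1000, 10000, 54000):
    print(token, token_retransmit_ms(token))  # 1000 -> 238, the documented default
```

This is also why the man page advises changing token rather than token_retransmit directly.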

When the heartbeat token is lost as described above, the log shows the following error:

openais[3345]: [TOTEM] The token was lost in the OPERATIONAL state.

Note that the unit is milliseconds. Alternatively, you can modify the cman tag:

Note: RHEL4 does not use the openais architecture, so there the timeout can only be changed through deadnode_timeout.

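With this groundwork, it is easy to see that the timeout values, written T(*), should satisfy:

T(MPIO) < T(qdisk) < T(cman)

That is, a multipath failover must complete before qdisk gives up on the quorum disk, and qdisk must reach its verdict before cman gives up on the node.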

Red Hat officially recommends:

T(qdisk) = T(MPIO) × 1.3

T(cman) = T(MPIO) × 2.7
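The recommendation can be turned into concrete settings once T(MPIO), the worst-case path-failover time measured in your own failover tests, is known. A minimal Python sketch under stated assumptions: `recommended_timeouts` and `ceil_div` are hypothetical helpers, the quorumd interval is fixed by the caller, and `t_cman_ms` is assumed to be the value you would give the RHEL5 token timeout from ④. Integer arithmetic avoids floating-point rounding, and tko is rounded up so interval × tko never falls below 1.3 × T(MPIO).

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative b > 0."""
    return -(-a // b)


def recommended_timeouts(t_mpio_ms: int, qdisk_interval_s: int = 3) -> dict:
    """Apply T(qdisk) = 1.3 x T(MPIO) and T(cman) = 2.7 x T(MPIO)."""
    t_qdisk_ms = ceil_div(t_mpio_ms * 13, 10)  # 1.3 x T(MPIO), rounded up
    t_cman_ms = ceil_div(t_mpio_ms * 27, 10)   # 2.7 x T(MPIO), rounded up
    tko = ceil_div(t_qdisk_ms, qdisk_interval_s * 1000)
    qdisk_timeout_ms = qdisk_interval_s * 1000 * tko
    # Rounding tko up can push the qdisk timeout past the cman timeout when the
    # interval is large relative to T(MPIO); reject settings that break the ordering.
    if not t_mpio_ms < qdisk_timeout_ms < t_cman_ms:
        raise ValueError("T(MPIO) < T(qdisk) < T(cman) does not hold; shorten the interval")
    return {"tko": tko, "qdisk_timeout_ms": qdisk_timeout_ms, "t_cman_ms": t_cman_ms}


print(recommended_timeouts(30000))
```

For example, a measured 30 s failover with a 3 s interval gives tko=13 (a 39 s qdisk timeout) and a cman timeout of 81000 ms.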

References:

Red Hat Knowledgebase

man page of