Building Basic System Monitoring with Prometheus

Getting started

Environment:

System	Software	Version	Role
CentOS 7.0	grafana	4.1.1	Visualization
CentOS 7.0	node_exporter	latest	Client
CentOS 7.0	prometheus	1.4.1	Collects node_exporter data (core program)
CentOS 7.0	alertmanager	0.5.1	Alerting

Basic system configuration

Configure the time and system time zone

yum install ntpdate -y
rm -rf /etc/localtime
cp -a /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
ntpdate 0.cn.pool.ntp.org
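
A quick way to confirm the change took effect is to compare the UTC offset now in use with the one expected for the zone. This is only a sketch: `zone_offset` is a hypothetical helper name, and it relies on the zoneinfo files being installed.

```shell
# Print the UTC offset that a given zone name resolves to.
# Asia/Shanghai should report +0800 when the zoneinfo files are present.
zone_offset() {
    TZ="$1" date +%z
}

zone_offset Asia/Shanghai   # offset of the zone copied to /etc/localtime
date +%z                    # offset currently in effect on this host
```

If the two lines differ, /etc/localtime was probably not replaced as intended.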

Prometheus configuration

global:
  scrape_interval:     15s  # global default scrape interval; Prometheus uses 1 minute if this is unset
  evaluation_interval: 15s  # how often rules are evaluated
  external_labels:
    monitor: 'codelab-monitor'
rule_files:
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s  # overrides the global scrape interval above
    static_configs:
      - targets: ['localhost:9090']

  - job_name: '系统监控'  # "system monitoring"
    scrape_interval: 5s  # overrides the global scrape interval above
    static_configs:
      - targets: ['192.168.159.141:8080','192.168.159.144:8080']  # IP and node_exporter port of each host to scrape
        labels:
          group: "test_minior"  # group label
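
The targets above use port 8080, while node_exporter listens on 9100 by default, so this configuration presumably assumes node_exporter was started with a different listen address. Below is a minimal sketch of a reachability check to run before starting Prometheus. `metrics_url` and `check_target` are hypothetical helper names, and the flag shown in the comment is an assumption to verify against your node_exporter version.

```shell
# Assumption: node_exporter was started on each host with something like
#   ./node_exporter --web.listen-address=":8080"

# Build the URL Prometheus scrapes for a host:port target
# (scheme http and path /metrics are the Prometheus defaults).
metrics_url() {
    printf 'http://%s/metrics\n' "$1"
}

# Report whether a target answers on its metrics endpoint.
check_target() {
    if curl -sf --max-time 2 -o /dev/null "$(metrics_url "$1")"; then
        echo "ok: $1"
    else
        echo "unreachable: $1"
    fi
}

for target in 192.168.159.141:8080 192.168.159.144:8080; do
    check_target "$target"
done
```

A target reported as unreachable here will also show up as down in Prometheus, so it is worth fixing before looking at dashboards.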

Grafana configuration

Configure the data source

[Figure: Grafana configuration]

Configure the dashboard

Import dashboard 1860; the final result is shown in the figure below.

[Figure: Grafana configuration 2]

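Configuring monitoring alerts with Alertmanager

The Alertmanager configuration has to be paired with matching Prometheus configuration for alerting to work.

Alertmanager configuration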

global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: "test@163.com"
  smtp_auth_username: "test@163.com"
  smtp_auth_password: '123456'
route:
  group_by: [alertname]
  receiver: email-me
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 1m
receivers:
- name: email-me
  email_configs:
  - to: "aaaaaaa@qq.com"

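Prometheus configuration

Add the following to the Prometheus yml configuration file (the rule_files entry replaces the empty one shown earlier):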

rule_files:
  - rules/alert.rules
alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - "IP:9093"

In the directory from which Prometheus is started, create a rules directory and add alert.rules to it.
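
As a sketch of that step, assuming the current directory is the one Prometheus is started from:

```shell
# Create the rules directory and an empty alert.rules file,
# matching the rules/alert.rules path used in rule_files.
mkdir -p rules
touch rules/alert.rules
ls -l rules/
```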

The contents of alert.rules are as follows:

ALERT cpu_threshold_exceeded
  IF (100 * (1 - avg by(instance)(irate(node_cpu{mode='idle'}[5m])))) > 50
  ANNOTATIONS {
    summary = "Server {{ $labels.instance }}: CPU usage is too high",
    description = "CPU usage on server {{ $labels.instance }} is too high; current value: {{ $value }}.",
  }

ALERT service_down
  IF up == 0
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} is down",
  }

# Adjust the device regex below to match your own disks.
ALERT system_disk_path
  IF (100 * (1 - (node_filesystem_free{device=~"^/dev/xvd[abc]1$",mountpoint="/"} / node_filesystem_size{device=~"^/dev/xvd[abc]1$",mountpoint="/"}))) > 70
  ANNOTATIONS {
    summary = "Root filesystem usage has exceeded 70%; please deal with it as soon as possible",
    description = "Server {{ $labels.instance }} is running low on free space; current usage: {{ $value }}.",
  }

The rules above alert on CPU usage, on whether a server is up, and on disk usage, respectively.
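
Both the CPU and the disk rule use the same shape of expression, 100 * (1 - free / total), to turn a free share into a percentage in use. The arithmetic can be tried out with a small sketch; `used_percent` is a hypothetical helper and the input numbers are made up for illustration.

```shell
# Percentage in use, given the free amount and the total,
# mirroring 100 * (1 - free / size) from the rules.
used_percent() {
    awk -v free="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * (1 - free / total) }'
}

# Disk: 10 GiB free on a 40 GiB root filesystem.
used_percent 10 40     # prints 75.0, above the 70 threshold, so the alert fires

# CPU: 0.5 idle CPU-seconds per second, averaged per instance.
used_percent 0.5 1     # prints 50.0, not strictly above 50, so no alert
```

The second case shows that the comparison is strict: a value exactly at the threshold does not trigger the alert.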

That completes the setup. Other checks can be added as your needs require; after that, just start the programs.
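
The start commands themselves are not spelled out here. Below is a minimal sketch, assuming the binaries sit in the current directory and the configuration files are named alertmanager.yml and prometheus.yml (neither is stated in this article). `start_if_present` is a hypothetical helper, and the flag spelling is worth confirming with each binary's -h output for the versions listed in the environment table.

```shell
# Start a binary in the background if it exists here;
# otherwise say which one is missing instead of failing silently.
start_if_present() {
    bin="$1"
    shift
    if [ -x "$bin" ]; then
        "$bin" "$@" &
        echo "started: $bin (pid $!)"
    else
        echo "missing: $bin"
    fi
}

# Assumed file names; adjust to your own layout.
start_if_present ./alertmanager --config.file=alertmanager.yml
start_if_present ./prometheus --config.file=prometheus.yml
```

Starting Alertmanager first means Prometheus has somewhere to send alerts as soon as its rules begin to fire.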

That's all. I hope this article is of some help to you.

End