Ceph active+undersized+degraded

A typical report from the ceph-users list: after configuring the OSD server, the admin created a pool "data" and removed the default pool "rbd", and the cluster got stuck in active+undersized+degraded:

    pgmap v95: 64 pgs, 1 pools, 0 bytes data, 0 objects
          107 MB used, 22335 GB / 22335 GB avail
                64 active+undersized+degraded

$ ceph osd dump | grep 'replicated size'
pool 2 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 52 flags hashpspool stripe_width 0

Another cluster shows the same state after OSD failures:

# ceph osd map rbd test
osdmap e426 pool 'rbd' (0) object 'test' -> pg 0.40e8aab5 (0.b5) -> up ([4], p4) acting ([4], p4)
# ceph -s
    cluster e4d48d99-6a00-4697-b0c5-4e9b3123e5a3
     health HEALTH_WARN
            96 pgs degraded
            31 pgs stuck unclean
            96 pgs undersized
            recovery 10/42 objects degraded (23.810%)
            1/9 in osds are down
            ...

The state also shows up inside other products. One user upgraded Deis to v1.0.1 on a three-node cluster (2 GB RAM per node, hosted on Digital Ocean), nse'd into a deis-store-monitor service, ran ceph -s, and found the cluster in active+undersized+degraded, never returning to active+clean.

A related warning involves unfound objects:

ceph health detail
HEALTH_WARN 1 pgs degraded; 78/3778 unfound (2.065%)
pg 2.4 is active+degraded, 78 unfound

This means the storage cluster knows that some objects (or newer copies of existing objects) exist, but it hasn't found copies of them. One example of how this can happen for a PG whose data is on ceph-osds 1 and 2 is that osd 1 goes down while osd 2 keeps accepting writes.

The undersized+degraded state itself is easy to reproduce. In one walkthrough, the author went to node ceph-2, manually stopped osd.4, and then looked at PG 0.44: its state was active+undersized+degraded. When an OSD hosting a PG goes down, the PG enters the undersized+degraded state, and the acting set [0,7] means two replicas of 0.44 still survive on osd.0 and osd.7.

Small home labs hit the same warning. One newcomer built a cluster from four ODroid H2+ boards, each with two 16 TB hard drives, and created an erasure-coded pool with k=4, m=2, assuming four devices with two OSDs each would be safe; as soon as the pool was created it went HEALTH_WARN with PGs active+undersized. Another self-built cluster with three OSD nodes, replicas set to 3, and very little data stored sat permanently in active+undersized+degraded instead of the expected HEALTH_OK.

For reference, the deployment walked through below used ceph version 15.2.9 and ceph-deploy 2.0.1. Preparation before installing Ceph included upgrading the kernel to the 4.x series or newer (4.17 in this case; the upgrade steps are omitted) and disabling firewalld, iptables, and SELinux.

Day-to-day monitoring starts with the overall cluster status:

# ceph -s
  cluster:
    id:     8230a918-a0de-4784-9ab8-cd2a2b8671d0
    health: HEALTH_WARN
            application not enabled on 1 pool(s)
  services:
    mon: 3 daemons, quorum cephnode01,cephnode02,cephnode03 (age 27h)
    mgr: cephnode01(active, since 53m), standbys: cephnode03, cephnode02
    osd: 4 osds: 4 up (since 27h), 4 in (since 19h)
    rgw: 1 daemon active (cephnode01)

Ceph is a clustered storage solution that can use any number of commodity servers and hard drives, which it then makes available as object, block, or file system storage through a unified interface to your applications or servers.

To stop a specific daemon instance on a Ceph node, execute one of the following:

sudo systemctl stop ceph-osd@{id}
sudo systemctl stop ceph-mon@{hostname}
sudo systemctl stop ceph-mds@{hostname}

For example:

sudo systemctl stop ceph-osd@1
sudo systemctl stop ceph-mon@ceph-server
sudo systemctl stop ceph-mds@ceph-server

The Ceph component service logs are the first place to look when a stopped or crashed daemon is the suspected cause.
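On a systemd-based installation those logs can usually be read either through journalctl or from the flat files under /var/log/ceph. This is a generic sketch rather than part of the original reports; the OSD id 1 and the hostname expansion are placeholders.

# Follow the log of one OSD daemon (id 1 is a placeholder)
sudo journalctl -u ceph-osd@1 -f

# Or read the log files the daemons write themselves
sudo tail -f /var/log/ceph/ceph-osd.1.log
sudo tail -f /var/log/ceph/ceph-mon.$(hostname -s).log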
A minimal instance of the health warning looks like this:

ceph health detail
HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized
pg 17.58 is stuck unclean for 61033.947719, current state active+undersized+degraded, last acting [2,0]
pg 17.16 is stuck unclean for 61033.948201, current state active+undersized+degraded, last acting ...

The test environment behind several of the examples here was deployed with ceph-deploy:

Hostname     IP address     OS          Roles and notes
ceph-node1   10.153.204.13  CentOS 7.6  mon, osd, mds, mgr, rgw, ceph-deploy; chronyd time sync (primary)
ceph-node2   10.130.22.45   ...

A write-up titled "Notes on OSDs failing to start after a node reboot" (2018-08-30) starts from a common trigger: the machine room had scheduled power maintenance, so every machine had to be shut down, and after powering back on some OSDs would not start. Another report of the same flavour: running ceph -s showed many PG states stuck and not moving:

cluster 1a1d374a-c6e9-48cb-9b45-525a6fdaa91e
 health HEALTH_WARN
        64 pgs degraded
        64 pgs stale
        64 pgs stuck degraded
        64 pgs stuck stale
        64 pgs stuck unclean
        64 pgs stuck undersized
        64 pgs undersized
 monmap e1: 1 mons at {twin-storage-01=172.16.91.1:6789/0}
        election epoch 2, quorum 0 twin-storage-01
 mdsmap e5: 1/1/1 up {0=twin-storage-01=up:active}
 osdmap e92: 7 osds: 7 up, 7 in
 pgmap v685: 832 ...

In the ceph-users thread "PG status is active+undersized+degraded", the fix was simply more capacity. The reporter wrote back: "Hi Burkhard, thanks for your explanation. I created a new OSD with 2TB from another node; it truly solved the issue, and the status of the Ceph cluster is health HEALTH_OK now."

Not every alert is about PGs. One admin who installed a new Ceph Pacific cluster with ceph-ansible (which worked very well) reported that one of the mon servers then sent an email every 600 seconds with the subject "ALERT localhost/trap: trap timeout" and a body of "Summary output: trap timeout, Group: localhost, Service: trap, Time noticed: Wed Aug 11 21:06:16 2021, Secs until next alert:, Members: localhost".

Expansion can trigger the state too. After new OSDs were added to an existing Ceph cluster, several placement groups failed to rebalance and recover, leaving the cluster flagged HEALTH_WARN with PGs stuck in a degraded state:

cluster xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
 health HEALTH_WARN
        2 pgs degraded
        2 pgs stuck degraded
        4 pgs stuck unclean
        2 pgs stuck undersized
        2 pgs undersized
        recovery 35 ...
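When PGs sit in these states, a reasonable first pass (a generic sketch, not taken from the threads above) is to list the stuck PGs, query one of them, and check whether any OSDs are down or out; the PG id 17.58 is reused from the output above.

# List PGs stuck in problem states
ceph pg dump_stuck unclean
ceph pg dump_stuck undersized

# Inspect a single PG: its up/acting sets and why peering or recovery stalls
ceph pg 17.58 query

# See which OSDs exist, their hosts, and whether any are down or out
ceph osd tree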
What do the two words actually mean? From a November 2014 ceph-users reply: pg 6.0 says [1] is acting and [1] is up, and it seems strange that different PGs have a different idea of which OSDs are up. The web site hadn't been updated to state what "undersized" means, but the code says it means there are fewer replicas than requested. That's different from degraded, which is when not all of the replicas are available.

A cluster can be nearly empty and still show the warning:

$ ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    120 GiB  120 GiB  278 MiB   278 MiB       0.23
TOTAL  120 GiB  120 GiB  278 MiB   278 MiB       0.23

--- POOLS ---
POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1    1      0 B        0      0 B      0     38 GiB
mypool                  3   32      0 B        0      0 B      0     38 GiB
cephfs-metadata         4   32  3.6 KiB       41  168 KiB      0     38 GiB
cephfs-data             5   64      0 B        0      0 B      0     38 GiB

For unfound objects that keep a PG stuck, one option is to delete the stuck portion outright with ceph pg {pgid} mark_unfound_lost delete. PGs stuck in undersized+degraded are also common after OSDs go down and out: in a large cluster (more than about 1000 OSDs) some disks fail quietly, their OSDs are marked out by default, and if nobody intervenes for a long time the cluster can end up in this state.
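A sketch of the unfound-object workflow, reusing pg 2.4 from the earlier output as a placeholder; revert is the gentler option when an older version of the objects is acceptable, delete discards them.

# Show which objects in the PG are unfound and which OSDs might still hold them
ceph pg 2.4 list_unfound

# Check whether any of those OSDs are merely down and could be brought back first
ceph health detail

# Last resort: give up on the unfound objects
ceph pg 2.4 mark_unfound_lost revert
ceph pg 2.4 mark_unfound_lost delete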
A down OSD is the most common trigger. Here osd.0 is down, and its host carries no other OSD to take over:

# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-1         0.39067  root default
-3         0.09769      host ceph-node01
 0    hdd  0.09769          osd.0           down   1.00000  1.00000
-5         0.09769      host ceph-node02
 1    hdd  0.09769          osd.1             up   1.00000  1.00000
-7         0.19530      host ceph-node03
 2    hdd  0.19530          osd.2             up   1.00000  1.00000

Recovery shows up as the numbers shrinking over time. One user reported: "Yes, ceph health is getting better. Since yesterday:

# ceph health
HEALTH_WARN 1987253/8010258 objects misplaced (24.809%); Degraded data redundancy: 970715/8010258 objects degraded (12.118%), 187 pgs degraded, 187 pgs undersized

less misplaced and degraded data."

Sometimes only the undersized flag is set, without degraded:

# ceph health detail
HEALTH_WARN Degraded data redundancy: 7 pgs undersized
PG_DEGRADED Degraded data redundancy: 7 pgs undersized
    pg 39.7 is stuck undersized for 1398599.590587, current state active+undersized+remapped, last acting [10,1]
    pg 39.1e is stuck undersized for 1398600.838131, current state active+undersized, last acting [1,10]
    pg 39.2d is stuck undersized for 1398600.848232, current state active+undersized, last acting [10,1]
    pg 39.58 is stuck undersized for 1398600.850871, ...

Don't treat the warning as cosmetic. A February 2018 report notes that when Ceph has a status with PGs undersized (and degraded) it is effectively non-functional for the services that rely on it: both Glance and ceph-radosgw were broken in that testing environment, with ceph status showing HEALTH_WARN, reduced data availability, 132 pgs inactive, and degraded data redundancy.

Remapped variants appear when CRUSH has chosen a new home for the data but the move has not finished:

$ ceph pg stat
1416 pgs: 6 active+clean+remapped, 1288 active+clean, 3 stale+active+clean, 119 active+undersized+degraded; 74940 MB data, 250 GB used, 185 TB / 185 TB avail; 1292/48152 objects degraded (2.683%)
$ ceph pg dump | grep remapped
dumped all
13.cd  0  0  0  0  0  0  2  2  active+clean+remapped  2018-07-03 20:26:14.478665  9453'2  20716:11343  [10,23] ...

Full ratios interact with all of this. In a 2014 test case (to find the affected objects you can check the PG directory), a PG was left in:

0.3  9  0  9  9  0  36864  6  6  active+undersized+degraded+remapped  2014-10-17 16:02:23.710060  11'9  16:34  [2,1]  2  [2]  2  0'0  2014-10-17 16:01:48.864991  0'0  2014-10-17 16:01:48.864991

The OSD should notice that the full_ratio has changed back to 0.8, but for some reason it does not.
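When PGs are held back by the full thresholds (backfill_toofull), checking and, temporarily, raising the ratios is the usual response on Luminous or later; this is a generic sketch, and the 0.90/0.85 values are illustrative only.

# Show the currently configured full / backfillfull / nearfull ratios
ceph osd dump | grep ratio

# Temporarily raise the thresholds so backfill can proceed
ceph osd set-backfillfull-ratio 0.90
ceph osd set-nearfull-ratio 0.85

# Watch the PGs leave backfill_toofull
ceph -w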
While backfill is running, a small amount of degradation is expected and clears on its own:

$ ceph health detail
HEALTH_WARN Degraded data redundancy: 6/57927 objects degraded (0.010%), 1 pg unclean, 1 pg degraded
PG_DEGRADED Degraded data redundancy: 6/57927 objects degraded (0.010%), 1 pg unclean, 1 pg degraded
    pg 3.7f is active+undersized+degraded+remapped+backfilling, acting [21,29]
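To watch this kind of transient degradation drain away, something like the following is enough on recent releases (a sketch, not taken from the quoted reports):

# Refresh the summary every couple of seconds
watch -n 2 ceph -s

# List only the PGs that are currently backfilling or degraded
ceph pg ls backfilling
ceph pg ls degraded

# Object-level totals
ceph pg stat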
The warning also appears when an OSD is up but not yet (or no longer) "in", for example right after a disk replacement:

cluster:
  id:     7912846f-a2bd-407d-8032-0bdb9adf2c50
  health: HEALTH_WARN
          Degraded data redundancy: 139/651 objects degraded (21.352%), 29 pgs degraded
services:
  mon: 1 daemons, quorum node01 (age 96m)
  mgr: node01(active, since 95m)
  mds: 1/1 daemons up
  osd: 4 osds: 4 up (since 3m), 3 in (since 16s); 17 remapped pgs
  rgw: 1 daemon active (1 hosts, 1 zones)
data:
  volumes: 1/1 healthy
  pools:   8 pools ...

A Luminous cluster in a comparable state reported 155 pgs unclean, 155 pgs degraded, 155 pgs undersized, with mon quorum cephsvr-128040, cephsvr-128214, cephsvr-128215.

(As a historical aside: the first Ceph release, 0.1, goes back to January 2008. The version numbering scheme stayed unchanged for years, until after the release of 0.94.1, the first Hammer point release, in April 2015, when a new policy was adopted to avoid versions like 0.99, 0.100 or 1.00.)

Progress during recovery is easiest to follow with ceph -w:

# ceph -w
....
2017-07-27 10:59:05.779734 mon.0 [INF] pgmap v22116: 744 pgs: 70 active+undersized+degraded, 17 active+remapped, 657 active+clean; 1532 MB data, 5062 MB used, 544 GB / 549 GB avail; 41/1290 objects degraded (3.178%); 7/1290 objects misplaced (0.543%)
2017-07-27 10:59:37.916361 mon.0 [INF] pgmap v22117: 744 ...

When Ceph restores an OSD, performance may seem quite slow. This is due to the default settings, which are quite conservative; depending on your application workload, and especially if you are running workloads with many small objects (files), the defaults may feel too slow.
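If recovery speed is the bottleneck, the usual knobs are the per-OSD backfill and recovery limits. The values below are purely illustrative and should be raised (and later reverted) with care; this is a sketch, not a recommendation from the original text.

# Inspect the current values (config database, Mimic and later)
ceph config get osd osd_max_backfills
ceph config get osd osd_recovery_max_active

# Allow more parallel backfill/recovery work per OSD
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 8

# Older releases: inject the values into the running OSDs instead
ceph tell 'osd.*' injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'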
Undersized PGs are also what you get when the cluster simply cannot satisfy its own placement rules. One user asked: "Does Ceph have high availability? I configured 2 nodes like this:"
cluster:
  id:     07df97db-f315-4c78-9d2a-ab85007a1856
  health: HEALTH_WARN
          Reduced data availability: 32 pgs inactive
          Degraded data redundancy: 374/590 objects degraded (63.390%), 18 pgs degraded, 32 pgs undersized
services:
  mon: 2 daemons, quorum ceph1,ceph2
  mgr: ceph1(active), standbys: ceph2
  mds: mycephfs-1/1/1 up {0=ceph1=up:active}, 1 ...

If the pools keep the default size of 3 with a host failure domain, a two-host cluster leaves every PG permanently one replica short, so the undersized warning cannot clear until a third host is added or the pool size is reduced.

Version-specific bugs can produce confusingly similar symptoms. In the thread "mon hang when run ceph -s command after execute 'ceph osd in osd.<x>' command", Neha Ojha replied that the bug had already been fixed in 14.2.22, while the reporter was testing with 14.2.21.

Rebalancing after CRUSH changes tends to leave objects misplaced rather than degraded: "For the most part this seemed to be working, but then I had 1 object degraded and 88xxx objects misplaced:

# ceph health detail
HEALTH_WARN 11 pgs stuck unclean; recovery 1/66089446 objects degraded (0.000%); recovery 88844/66089446 objects misplaced (0.134%)
pg 2.e7f is stuck unclean for 88398.251351, current state active+remapped, last acting ..."

Back in the 2014 full-ratio test quoted earlier, the scheduled RequestBackfill event does happen as expected: the OSD log shows pg 0.2 in state active+undersized+degraded+remapped+backfill_toofull handling the peering event (epoch_sent: 16, epoch_requested: 16, RequestBackfill), so the PG is waiting on the full ratio rather than on peering.
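To see why such a cluster can never become clean, compare the pool's replication requirements with its CRUSH rule; a generic sketch, with 'mypool' as a placeholder pool name.

# How many replicas does the pool want, and how many must be present to serve I/O?
ceph osd pool get mypool size
ceph osd pool get mypool min_size

# Which CRUSH rule does it use, and what is the failure domain?
ceph osd pool get mypool crush_rule
ceph osd crush rule dump

# On a permanent two-node lab (not advisable for production) the pool can be shrunk:
ceph osd pool set mypool size 2
ceph osd pool set mypool min_size 1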
The author of the three-OSD-node report above was stuck on this for some time and only got an answer after mailing the community [1]. The PG state documentation [2] puts it concisely: "undersized" means the placement group has fewer copies than the configured pool replication level, and "degraded" means Ceph has not yet replicated some objects in the placement group the correct number of times. Put differently: after a failure such as an OSD going down, Ceph marks every PG on that OSD as degraded; a degraded cluster can still read and write data normally, so a degraded PG by itself is a minor problem rather than a serious one. Undersized means the number of surviving copies (for example 2) is below the pool's replication level (for example 3).

The stuck PGs can be listed directly:

# ceph pg dump_stuck unclean
ok
pg_stat  state                       up     up_primary  acting  acting_primary
0.28     active+undersized+degraded  [1,2]  1           [1,2]   1
0.27     active+undersized+degraded  [1,2]  1           [1,2]   1
0.26     active+undersized+degraded  [1,2]  1           [1,2]   1

Individual PGs can also be scrubbed on demand with ceph pg scrub {pg-id} or ceph pg deep-scrub {pg-id}.
Another health detail capture from a small cluster:

# ceph health detail
HEALTH_WARN Degraded data redundancy: 183/4017 objects degraded (4.556%), 15 pgs degraded, 16 pgs undersized
PG_DEGRADED Degraded data redundancy: 183/4017 objects degraded (4.556%), 15 pgs degraded, 16 pgs undersized
    pg 4.0 is stuck undersized for 782.927699, current state active+undersized+degraded, last acting ...

Mixed hardware raises the same questions. One user asked whether it is OK to build a Ceph cluster from four servers with different disk sizes (Server A: 2x 4TB, Servers B and C: 2x 8TB, Server D: 2x 4TB); a later capture from that discussion shows PGs such as 21.6f stuck undersized for 62453 seconds in state active+undersized+degraded+remapped+backfill_wait, last acting [2,10].
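With unevenly sized disks it is worth checking how full each OSD is and whether the CRUSH weights match the disk capacities. A generic sketch; osd.3 and the weight 7.28 are placeholders, and the balancer module is only available on Luminous and later.

# Per-OSD utilisation, CRUSH weight and PG count
ceph osd df tree

# CRUSH weight normally equals the disk size in TiB; correct it if it does not
ceph osd crush reweight osd.3 7.28

# Let the balancer even out PG placement automatically
ceph balancer status
ceph balancer on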
The general diagnostic command is:

ceph pg dump_stuck [unclean|inactive|stale|undersized|degraded]

If you create a cluster and it remains in active, active+remapped, or active+degraded status and never achieves active+clean, you most likely have a problem with the configuration rather than a transient failure.

Why active+degraded at all? A PG is active+degraded when an OSD is active but does not hold all of the PG's objects. When an OSD goes down, Ceph marks every PG stored on it as degraded; once the failed OSD comes back online, the OSDs must re-peer before those PGs can become clean again.

The failure can be reproduced deliberately, and undone again as sketched after these steps:

a. Stop one OSD daemon (the id is whichever OSD you want to test):
   $ systemctl stop ceph-osd@<id>
b. Check the PG state:
   $ bin/ceph pg stat
   20 pgs: 20 active+undersized+degraded; 14512 kB data, 302 GB used, 6388 GB / 6691 GB avail; 12/36 objects degraded (33.333%)
c. Check the cluster health:
   $ bin/ceph health detail
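The recovery side of the exercise, using the same placeholder OSD id (on the dev build above the client binary lives under bin/, on a normal install it is just ceph):

# Bring the OSD back
systemctl start ceph-osd@<id>

# Watch the PGs peer and recover to active+clean
ceph -w

# Confirm the end state
ceph pg stat
ceph health detail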
Backfill activity can be filtered straight out of the event stream:

# ceph -w | grep backfill
2017-06-02 04:48:03.403872 mon.0 [INF] pgmap v10293282: 431 pgs: 1 active+undersized+degraded+remapped+backfilling, 28 active+undersized+degraded, 49 active+undersized+degraded+remapped+wait_backfill, 59 stale+active+clean, 294 active+clean; 72347 MB data, 101302 MB used, 1624 GB / 1722 GB avail; 227 kB/s ...

On a newer cluster the same condition looks like this; here exactly a third of the object copies are degraded, consistent with one of three replicas being unavailable everywhere:

# ceph -s
  cluster:
    id:     240a5732-02e5-11eb-8f5a-000c2945a4b1
    health: HEALTH_WARN
            Degraded data redundancy: 3972/11916 objects degraded (33.333%), 64 pgs degraded, 65 pgs undersized
            65 pgs not deep-scrubbed in time
            65 pgs not scrubbed in time
  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 8d)
    mgr: ceph02.zopypt (active ...
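The "not deep-scrubbed in time" warnings usually clear once the cluster is healthy again and the OSDs catch up, but scrubs can also be driven by hand. A sketch: 4.0 and mypool are placeholders, and the pool-wide command exists on Nautilus and later.

# Deep-scrub one PG immediately
ceph pg deep-scrub 4.0

# Deep-scrub every PG of a pool (Nautilus and later)
ceph osd pool deep-scrub mypool

# Or widen the deep-scrub interval (here 14 days) so the warning stops firing
ceph config set osd osd_deep_scrub_interval 1209600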
The PG state glossary is worth keeping at hand:

degraded      Ceph has not replicated some objects in the placement group the correct number of times yet.
inconsistent  Ceph detects inconsistencies in one or more replicas of an object in the placement group (e.g. objects are the wrong size, or objects are missing from one replica after recovery finished).
peering       The placement group is going through peering: the OSDs that store it are reaching agreement on the state of its objects.

CRUSH rules matter as much as pool size. One experiment recorded the effect of switching a pool between rules and sizes: after one change the PGs quickly went back to active+undersized+degraded and the data generated on the new node was deleted; ceph osd pool set volumes crush_ruleset 3 quickly moved them to active+remapped; ceph osd pool set volumes crush_ruleset 2 quickly brought them back to active+clean; and ceph osd pool set volumes size 2 also reached active+clean quickly, with the old node's data deleted in the background.

Another capture, with a dozen PGs stuck undersized shortly after they were created:

PG_DEGRADED Degraded data redundancy: 12 pgs undersized
    pg 2.1d is stuck undersized for 115.728186, current state active+undersized, last acting [3,7]
    pg 2.22 is stuck undersized for 115.737825, current state active+undersized, last acting [6,3]
    pg 2.29 is stuck undersized for 115.736686, current state active+undersized, last acting [6,5]
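For the inconsistent state in the glossary above, the usual follow-up is to locate the damaged objects after a scrub and then request a repair. A sketch; mypool and the PG id 2.1d are placeholders.

# Find PGs flagged inconsistent after scrubbing
rados list-inconsistent-pg mypool

# Show which objects and shards disagree
rados list-inconsistent-obj 2.1d --format=json-pretty

# Ask Ceph to repair the PG from the authoritative copy
ceph pg repair 2.1d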
A two-node test bed shows the same pattern at a tiny scale:

$ ceph -s
  cluster:
    id:     ce766f84-6dde-4ba0-9c57-ddb62431f1cd
    health: HEALTH_WARN
            Degraded data redundancy: 6/682 objects degraded (0.880%), 5 pgs degraded, 32 pgs undersized
  services:
    mon: 2 daemons, quorum testbed-node-0,testbed-node-1 (age 61m)
    mgr: testbed-node-1(active, since 60m), standbys: testbed-node-0
    mds: cephfs:1 {0=testbed-node-1=up:active} 1 up:standby
    osd: ...
Erasure-coded pools add their own twist: the CRUSH failure domain has to fit the cluster's shape. The 4+2 pool on four hosts mentioned earlier can never place all six shards if the failure domain is host (the default), which is why its PGs sit undersized. One operator went the other way and deliberately chose crush-failure-domain=osd so that the cluster could tolerate the loss of a full host (12 OSDs); all OSDs there have the same capacity and the same device class (hdd):

# ceph osd erasure-code-profile get hdd_k22_m14_osd
crush-device-class=hdd
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=22
m=14
plugin=jerasure
technique=reed ...
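A sketch of how the 4+2 home-lab pool could be recreated with an OSD-level failure domain so that all six shards fit on four hosts. The profile and pool names are placeholders, and note the trade-off: with failure-domain=osd a single host can hold several shards of one PG, so losing a host may cost more shards than the profile's m.

# Define an EC profile whose shards are spread across OSDs rather than hosts
ceph osd erasure-code-profile set ec42_osd k=4 m=2 crush-failure-domain=osd

# Create a pool that uses it
ceph osd pool create ecpool 32 32 erasure ec42_osd

# Check where the shards of each PG actually land
ceph pg ls-by-pool ecpool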
At larger scale the percentages shrink but the mechanics are the same:

Degraded data redundancy: 97087/326349843 objects degraded (0.030%), 1092 pgs degraded, 1048 pgs undersized
4 daemons have recently crashed
1 slow ops, oldest one blocked for 1916198 sec, mon.ceph-osd-140 has slow ops

Creating new services can expose a latent placement problem. In one setup each node had a Monitor, Manager, and Metadata service running and everything was green; as soon as a CephFS was created and added as storage, the yellow warning appeared: Degraded data redundancy: 22/66 objects degraded (33.333%), 13 pgs degraded, 160 pgs undersized. The newly created CephFS pools were the ones whose PGs could not be fully replicated.
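The "daemons have recently crashed" part of that output has its own small workflow on Nautilus and later (a sketch; the crash id is a placeholder):

# List and inspect recent daemon crashes
ceph crash ls
ceph crash info <crash-id>

# After reviewing them, archive the reports so the health warning clears
ceph crash archive-all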
A cluster with no active PGs at all is a different and worse problem than undersized PGs. One fresh deployment (kept on the default cluster name) showed, right after the Ceph OSDs were added: 1 pg inactive, degraded data redundancy, 1.0 GiB used, 651 GiB / 652 GiB avail, pgs: 100.000% pgs not active. Until its PGs become active a pool cannot serve I/O at all, whereas active+undersized+degraded PGs keep serving reads and writes with reduced redundancy.
In short: undersized means a PG currently has fewer copies than the pool's replication level, and degraded means some objects have not yet been replicated the required number of times. The state clears by itself once recovery or backfill finishes; it stays forever when the cluster cannot satisfy its placement rules, whether because there are too few hosts or OSDs for the pool size, because a CRUSH rule or failure domain does not match the hardware, or because OSDs are down or out. Fix the capacity or the rules, let recovery run, and the PGs return to active+clean.