Entire proxmox Xen Node has grey question marks + database container gone


First, I've recently taken over the management of a Proxmox cluster that I have no previous experience with (I'm completely new to cluster management, but not too bad at Linux).

pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.13-6-pve)

I have 2 xen nodes which run a number of containers and VMs. Yesterday, a container on xen2, which runs a MySQL database, stopped responding. I was able to log in to the container via SSH and attempted to restart MySQL, only to receive an error along the lines that it was unable to connect to mysql.sock. So I decided to simply shut down the container and start it back up. I chose 'Shutdown' in the Proxmox UI for the container, and it shut down. Then I clicked 'Start', at which point the Proxmox logs recorded:

CT 110 - Start ERROR: command 'systemctl start pve-container@110' failed: exit code 1

So I tried running the 'systemctl start ...' command via SSH. It takes a while, and then I get the following:

Job for pve-container@110.service failed because a timeout was exceeded.
See "systemctl status pve-container@110.service" and "journalctl -xe" for details.

Here is the output of 'systemctl status ...':

● pve-container@110.service - PVE LXC Container: 110
 Loaded: loaded (/lib/systemd/system/pve-container@.service; static; vendor preset: enabled)
 Active: failed (Result: timeout) since Thu 2018-06-07 08:35:22 BST; 43s ago
 Docs: man:lxc-start
 man:lxc
 man:pct
 Process: 1603366 ExecStart=/usr/bin/lxc-start -n 110 (code=killed, signal=TERM)
 Tasks: 1 (limit: 4915)
 CGroup: /system.slice/system-pvex2dcontainer.slice/pve-container@110.service
 └─1532500 [lxc monitor] /var/lib/lxc 110

Jun 07 08:33:52 xen2 systemd[1]: Starting PVE LXC Container: 110...
Jun 07 08:35:22 xen2 systemd[1]: pve-container@110.service: Start operation timed out. Terminating.
Jun 07 08:35:22 xen2 systemd[1]: Failed to start PVE LXC Container: 110.
Jun 07 08:35:22 xen2 systemd[1]: pve-container@110.service: Unit entered failed state.
Jun 07 08:35:22 xen2 systemd[1]: pve-container@110.service: Failed with result 'timeout'.

and 'journalctl -xe':

Jun 07 08:35:22 xen2 systemd[1]: pve-container@110.service: Start operation timed out. Terminating.
Jun 07 08:35:22 xen2 systemd[1]: Failed to start PVE LXC Container: 110.
-- Subject: Unit pve-container@110.service has failed
-- Defined-By: systemd
--
-- Unit pve-container@110.service has failed.
--
-- The result is failed.
Jun 07 08:35:22 xen2 systemd[1]: pve-container@110.service: Unit entered failed state.
Jun 07 08:35:22 xen2 systemd[1]: pve-container@110.service: Failed with result 'timeout'.

Shortly after I attempted to restart the container the first time, the entire xen2 node started displaying grey question marks alongside all its VMs/containers, and they lost their labels (see screenshot):

[Screenshot: the xen2 node and its VMs/containers shown with grey question marks and no labels]

Despite this, all the other VMs/containers within xen2 are still functioning fine. I then ran the following commands to see what would happen:

service pvedaemon restart (nothing changed)
service pveproxy restart (nothing changed)
service pvestatd restart (The VMs started showing names within proxmox UI (but not containers), but this only lasted 10-15 minutes)

I'm hesitant to upgrade or restart the entire xen node because I don't know how it is configured or what pitfalls may lie ahead, and it's business critical to keep at least something running. Furthermore, I've gone through /var/log/syslog and didn't see anything that indicated why the container crashed.

Ideally, I want to:
1. Determine why the database container (110) crashed
2. Successfully start up the database container again
3. Determine why the xen2 node isn't reporting data to the UI about its VMs/containers
4. Fix the reporting data in the UI for the node

Again, please appreciate that I'm new to Proxmox, but I do know my way around Linux.

Thank you for any tips/knowledge on troubleshooting this problem. If there is any other info you'd like me to share, please let me know.

Cheers,
David

asked Jun 7 '18 at 8:08

David

linux mysql cluster proxmox percona


1 Answer
I just stumbled over the same problem (one cluster node only showed grey question marks and the containers lost their labels). In my case this was shortly after a Proxmox update (from 5.3 to 5.4). After trying similar things to the OP, I finally figured out that my sshd was not listening on port 22 anymore. After restarting sshd things were not fixed immediately; it took about 15 minutes or so. Then everything was fine again.
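A minimal sketch of how the "is sshd still listening on port 22" check could be scripted, assuming the `ss` tool from iproute2 is installed on the node. The helper name `port_is_listening` is hypothetical and not part of Proxmox; it only inspects `ss -tln` output passed on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): reads `ss -tln` output on stdin
# and exits 0 if some TCP socket is listening on the given port, 1 otherwise.
port_is_listening() {
    port="$1"
    # In `ss -tln` output, column 1 is the socket state and column 4 is the
    # local "address:port" pair, e.g. "0.0.0.0:22" or "[::]:22".
    awk -v p=":$port" '
        $1 == "LISTEN" && $4 ~ (p "$") { found = 1 }
        END { exit !found }
    '
}

# Example use on the affected node (left commented out so that sourcing this
# file has no side effects):
#   ss -tln | port_is_listening 22 || echo "nothing is listening on port 22"
```

If nothing is listening, restarting the SSH daemon (for example with `systemctl restart ssh`, assuming the Debian default unit name) matches the fix described above. It may still take several minutes before the node status recovers in the UI.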

answered 1 hour ago

Argl Bargl
