The server diagnostics mode checks the status of the server hardware, adds this information to DCImanager, and prepares the server for a new user.
Preparing for server diagnostics
To diagnose the server:
- Enter the server's IP and MAC addresses in DCImanager;
- Make sure the DCImanager server is available for diagnostics;
- Set up network boot;
- Connect the server to a PDU or IPMI;
- Select the diagnostics template in Settings → OS templates:
  - Diag-x86_64: for network boot with iPXE;
  - Diag-x86_64-noipxe: for network boot without iPXE;
- Select the interfaces that the DHCP server runs on in Settings → Global settings → Check before releasing → the Interfaces field.
Server diagnostics
Manual start
Go to Main menu → Servers → Operations and fill in the fields:
- Operation type: select "Run diagnostics";
- Run diagnostics: select a diagnostics template;
- Clear discs: select the checkbox to clear the hard drives during diagnostics. This zeroes the first 512 bytes of the hard drive. The option can be used only if the diagnostics template supports it;
- Full hard drive erase: select the checkbox to erase the hard drives completely. The whole hard drive is zeroed, which may take a few hours depending on the drive size and speed. This option is available only when "Clear discs" is selected;
- Inform upon completion: select the checkbox to be notified when the operation is completed or the server becomes accessible via SSH.
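The difference between the two erase options can be illustrated with a minimal Python sketch. This is not the code the diagnostics template runs: the helper names `clear_disc` and `make_image` are hypothetical, and an ordinary temporary file stands in for a hard drive so the example is safe to run.

```python
import os
import tempfile

SECTOR_SIZE = 512  # "Clear discs" zeroes the first 512 bytes, as described above


def clear_disc(path: str, full_erase: bool = False, chunk: int = 1 << 20) -> None:
    """Zero the first 512 bytes of `path`, or all of it if full_erase is set.

    Hypothetical helper: `path` is a regular file standing in for a drive.
    """
    size = os.path.getsize(path)
    to_zero = size if full_erase else min(SECTOR_SIZE, size)
    with open(path, "r+b") as dev:  # open for in-place overwrite from offset 0
        written = 0
        while written < to_zero:
            n = min(chunk, to_zero - written)
            dev.write(b"\x00" * n)
            written += n


def make_image(size: int = 4096) -> str:
    """Create a temporary file filled with 0xFF bytes to act as a fake drive."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\xff" * size)
    return path


quick = make_image()
clear_disc(quick)                  # "Clear discs" only
full = make_image()
clear_disc(full, full_erase=True)  # "Clear discs" plus "Full hard drive erase"

with open(quick, "rb") as f:
    quick_data = f.read()
with open(full, "rb") as f:
    full_data = f.read()

os.remove(quick)
os.remove(full)
```

In the first image only the leading 512 bytes change, which is quick but leaves the rest of the data in place. In the second every byte is overwritten, which is why a full erase of a real drive can take hours.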
Auto start
Diagnostics run automatically:
- during the server search. Learn more in the article Server search;
- when a server is released, if the option Check before releasing is selected in Settings → Global settings. Also in the Global settings, you can set the auto diagnostics options: Clear discs, Full hard drive erase, Diagnostics templates. Learn more in the article Global settings.
How it works
Server diagnostics algorithm:
- The system creates a block in the DHCP server configuration file that allows the DHCP server to work with the server's MAC address.
- The server is authorized through DHCP.
- The server loads the diagnostics template.
- The server check script starts.
- The "Server has hardware issues" status is set for the server.
- The system determines:
  - The processor model.
  - The amount of RAM.
  - Whether a hardware RAID controller is present.
  - Which HDDs are present (detection may not work properly if the server has a hardware RAID controller).
  - The hard drive slots.
- The system checks:
  - The local connection speed.
  - The read speed and SMART data of the HDDs.
- If IPMI is detected, the system configures:
  - Network settings (IP address, mask, gateway).
  - A new user and a new password.
- If the Add IPMI automatically option is enabled, IPMI will be added to the server.
- All the information is sent to DCImanager.
- The server is powered off if the Power off servers upon checking option is enabled in Settings → Global settings. Otherwise, the server reboots as usual.
- DCImanager processes the diagnostics results:
  - DCImanager checks that the platform matches the detected server equipment:
    - The number of processors must be greater than 0 and must not exceed the value specified for the platform type.
    - The amount of RAM must be greater than 0 and must not exceed the value specified for the platform type.
    - The number of HDDs must be greater than 0 and must not exceed the value specified for the platform type.
    If the results differ from the values specified for the platform type, DCImanager creates a new platform and assigns it to the server.
  - HDDs are detached from the server. If a hardware RAID is found, only the HDDs that were added during the previous diagnostics are detached. HDDs that were specified manually remain. Generally, if a hardware RAID is found on the server, DCImanager cannot obtain correct HDD information.
  - The read speed and SMART data of the HDDs are checked. Check parameters are specified in Types of equipment → HDD → HDD types.
  - The local connection speed is checked.
  - If sockets and scalability are not set for the CPU in Types of equipment → Processors, the administrator is asked to specify the missing data.
  - The system checks whether the "Server has hardware issues" status should be removed. The status is removed if the following requirements are met:
    - The local connection speed is between <LocalSpeedThreshold*Port_Speed/100> and <Port_Speed>. LocalSpeedThreshold is a parameter, in percent, in the DCImanager configuration file (/usr/local/mgr5/etc/dcimgr.conf by default). The default value is 80%. For example, for a 100 MB/sec port the default threshold is 80 MB/sec, so the local connection speed must be between 80 and 100 MB/sec.
    - No hardware RAID is present.
    - The HDD parameters (read speed and SMART criteria) are within the limits.
    - The platform corresponds to the detected server equipment.
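The requirements for removing the "Server has hardware issues" status can be sketched in Python as follows. This is an illustration under stated assumptions, not DCImanager's implementation: all class, field, and function names are hypothetical, and treating the platform comparison as one of the requirements is an interpretation of the list above.

```python
from dataclasses import dataclass


@dataclass
class PlatformLimits:
    """Maximum values specified for a platform type (hypothetical structure)."""
    max_cpus: int
    max_ram_mb: int
    max_hdds: int


@dataclass
class Detected:
    """Results reported by a diagnostics run (hypothetical structure)."""
    cpus: int
    ram_mb: int
    hdds: int
    local_speed: float        # measured speed, same unit as port_speed
    port_speed: float
    has_hw_raid: bool
    hdds_within_limits: bool  # read speed and SMART criteria passed


def platform_matches(d: Detected, p: PlatformLimits) -> bool:
    """Each count must be greater than 0 and not exceed the platform value."""
    return (0 < d.cpus <= p.max_cpus
            and 0 < d.ram_mb <= p.max_ram_mb
            and 0 < d.hdds <= p.max_hdds)


def speed_ok(local_speed: float, port_speed: float,
             threshold_percent: float = 80.0) -> bool:
    """Check LocalSpeedThreshold*Port_Speed/100 <= local speed <= Port_Speed."""
    lower = threshold_percent * port_speed / 100
    return lower <= local_speed <= port_speed


def can_remove_issue_status(d: Detected, p: PlatformLimits,
                            threshold_percent: float = 80.0) -> bool:
    """Combine the four requirements from the list above."""
    return (speed_ok(d.local_speed, d.port_speed, threshold_percent)
            and not d.has_hw_raid
            and d.hdds_within_limits
            and platform_matches(d, p))


limits = PlatformLimits(max_cpus=2, max_ram_mb=65536, max_hdds=4)
healthy = Detected(cpus=1, ram_mb=32768, hdds=2, local_speed=85,
                   port_speed=100, has_hw_raid=False, hdds_within_limits=True)
slow = Detected(cpus=1, ram_mb=32768, hdds=2, local_speed=60,
                port_speed=100, has_hw_raid=False, hdds_within_limits=True)

healthy_result = can_remove_issue_status(healthy, limits)
slow_result = can_remove_issue_status(slow, limits)
```

With the default threshold of 80% and a port speed of 100, the lower bound is 80, so a measured speed of 85 passes and a measured speed of 60 does not.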
To view the latest diagnostics results, go to Main menu → Servers → Edit → the Diagnostic results block.
If the diagnostic process is interrupted on the server, the server will have the "Server has hardware issues" status.
To remove the status after diagnostics, go to Main menu → Servers → Edit and fill in the required fields that are empty. For example, if the server platform type was not defined, the server edit form shows "No platform" in the "Platform type" field and the warning "A platform type is not selected for this server".