In an era of highly complex system configurations, high server load spikes are not unusual. Good server optimization and load mitigation practices can halve the troubleshooting work. There are three general steps for troubleshooting high server load on a Linux server:

  1. Look for the resource that is overloaded: first identify which resource is being hogged. atop is an ideal tool when troubleshooting a physical server, while the regular top command will do inside an OS virtualization environment. If a VPS node is loaded, you might start with vztop.
  2. Look for the service that is hogging that resource: once the resource is identified, use specialized tools to find the services that are exhausting it.
  3. Look for the virtual host that is using that service: once the service is known, the remaining work is specific to that service, examining its activity to trace the load to a particular virtual host.
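
As a rough illustration of step 1, the sketch below flags overloaded resources from the text of `/proc/loadavg` and `/proc/meminfo`. It is only a minimal sketch: the function names and both thresholds are assumptions made for this example, the `MemAvailable` field is assumed to be present (older kernels do not report it), and it covers only load and memory, not disk or network.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def overloaded_resources(loadavg_text, meminfo_text, cpu_count,
                         load_per_cpu_limit=1.0, mem_used_limit=0.9):
    """List which of 'load' and 'memory' exceed the assumed thresholds."""
    flagged = []

    # Load average is compared per CPU; the 1.0 limit is an assumption.
    one_minute, _, _ = parse_loadavg(loadavg_text)
    if one_minute / cpu_count > load_per_cpu_limit:
        flagged.append("load")

    # Memory pressure is estimated from MemAvailable (assumed to exist).
    mem = parse_meminfo(meminfo_text)
    used_fraction = 1 - mem["MemAvailable"] / mem["MemTotal"]
    if used_fraction > mem_used_limit:
        flagged.append("memory")

    return flagged
```

On a live server the two text arguments would come from reading those two files, and `cpu_count` from `os.cpu_count()`. A flagged resource only says where to look next; tools such as atop or top are still needed to find the service behind it.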

While optimizing, knowing which services are not hogging a resource is as important as knowing which ones are. Make a habit of regularly checking the command outputs on a healthy server, so that you can see immediately when something is wrong.
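
One way to put that habit to work is to record a baseline on a healthy server and compare later readings against it. The sketch below is a hypothetical example: the metric names, the sample numbers, and the 50% tolerance are all assumptions, and collecting the readings themselves is left to whatever tools are already in use.

```python
def split_by_baseline(baseline, current, tolerance=0.5):
    """Split metrics into suspects and ruled-out names.

    A metric is a suspect when its current value exceeds the baseline
    by more than `tolerance` (0.5 means 50% above normal).
    """
    suspects = {}
    ruled_out = []
    for name, normal in baseline.items():
        value = current.get(name)
        if value is None or normal <= 0:
            continue  # nothing meaningful to compare against
        ratio = value / normal
        if ratio > 1 + tolerance:
            suspects[name] = round(ratio, 2)  # how many times above normal
        else:
            ruled_out.append(name)
    return suspects, ruled_out


# Hypothetical readings taken on a healthy day and during a load spike.
baseline = {"load_1m": 1.0, "apache_procs": 40, "mysql_threads": 10}
current = {"load_1m": 6.0, "apache_procs": 45, "mysql_threads": 35}
suspects, ruled_out = split_by_baseline(baseline, current)
```

With these sample numbers, the load and the MySQL thread count stand out while Apache is ruled out, which narrows the search in the same way the paragraph above describes.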

Data traffic has grown steeply owing to the proliferation of connected devices, which has increased the demands on data centers run by web hosting service providers. As it scales up network infrastructure, the industry prefers standard Intel architecture for consistency across technology solutions and the data center. Intel has released its first Xeon processor with SoC (system on chip) technology, built on its 14nm process, which integrates I/O capabilities on the same chip. The new processor delivers up to 3.4 times faster performance per node compared to the Intel® Atom™ processor C2750, part of Intel's second-generation 64-bit system on chip product family. Initially, Intel offers two Xeon D processors, the Xeon D-1520 and Xeon D-1540, targeted at data center uses including dedicated hosting and storage. The initial products are optimized for hosting providers running a variety of workloads.

Enhancing Uptime with VMware HA and DRS rules:

VMware HA and DRS can improve server performance and uptime when they work together, provided they are planned and configured correctly. VMware's HA (High Availability) and DRS (Distributed Resource Scheduler) are two of the company's virtualization management features.

DRS uses vMotion to balance virtual machines across multiple hosts, and HA restarts virtual machines in case of a host failure. ESX/ESXi has had these features for some time, but they do not operate in isolation, so they have to be set up to work in unison. DRS is not concerned with the number of virtual machines per host, so it makes no adjustments based on virtual machine count.

DRS and HA rules fall into two categories, depending on whether the virtual machines should be kept apart or kept together.

  • Separating VMs seems easy and appears to protect against failure, but as the environment grows, keeping VMs separate becomes more complex. It is not a complete safeguard either, because network issues can still cause trouble. In a blade server environment, separating VMs across the required hardware can be beneficial. A rack outage may be catastrophic to a virtual environment because blades put a higher density of virtual machines in each rack. In a blade system, limiting the number of blade enclosures in a single rack may work better with HA and DRS.
  • We may want to keep some VMs on the same host. For example, a front-end server and a database server communicate frequently, so it is preferable to keep them on the same host. Placing them on the same VLAN keeps their network traffic internal to the host and reduces physical network traffic. The problem arises if both go down. If they depend on each other, there is little point in starting one without the other, and they should start in the correct order. The start-up order problem can be solved by placing the VMs in a vApp.
    With so much emphasis on the start-up priority of VMs, having some VMs that do not start automatically is an attractive option. HA can also allow VMs to violate resource constraints. When the constraints are enforced, a VM will not be started if the cluster does not have the resources it needs. Leaving development and test VMs powered off frees up additional resources, while overriding the resource constraints may allow the VM to start. The cluster might then run slower, but that is better than not running the system at all.
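
The start-up order idea can be sketched outside of VMware as a dependency ordering problem. The example below does not call any VMware API. It assumes Python 3.9 or later for the standard `graphlib` module, and the VM names and the `startup_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def startup_order(depends_on, auto_start):
    """Return the auto-start VMs so each one follows the VMs it depends on.

    depends_on maps a VM name to the set of VMs that must be up first.
    auto_start is the set of VMs allowed to start automatically.
    Raises CycleError if the dependencies are circular.
    """
    ordered = TopologicalSorter(depends_on).static_order()
    return [vm for vm in ordered if vm in auto_start]


# Hypothetical inventory: the front end needs the database,
# and the test VM is deliberately left out of automatic start-up.
depends_on = {
    "web-frontend": {"database"},
    "database": set(),
    "test-vm": set(),
}
auto_start = {"web-frontend", "database"}
order = startup_order(depends_on, auto_start)
```

Here the database is ordered ahead of the front end and the test VM is skipped. A circular dependency raises `CycleError`, which is a useful signal that the start-up plan itself needs rethinking.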

Playing it safe with documentation:

There is no simple way to extract the rules from an HA and DRS configuration. The only way to start over is to disable DRS and HA completely. Unlike other tools, there is no undo button. Keeping notes on the rules for HA start-up priority, resource pool settings, and VM separation is therefore the safe course of action.
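
Such notes can be kept as a plain text file under version control. The sketch below is one possible way to do it, not an export from VMware: the notes are maintained by hand, and every field name and value shown is a hypothetical example.

```python
import json


def snapshot_to_text(snapshot):
    """Serialize hand-kept rule notes to stable, diff-friendly JSON."""
    return json.dumps(snapshot, indent=2, sort_keys=True)


def changed_sections(old_text, new_text):
    """Return the top-level sections that differ between two snapshots."""
    old, new = json.loads(old_text), json.loads(new_text)
    return sorted(key for key in old.keys() | new.keys()
                  if old.get(key) != new.get(key))


# Hypothetical notes covering the three areas mentioned above.
notes = {
    "ha_restart_priority": {"database": "high", "web-frontend": "medium"},
    "resource_pools": {"production": {"cpu_shares": "high"}},
    "keep_apart": [["dc-01", "dc-02"]],
    "keep_together": [["web-frontend", "database"]],
}
saved = snapshot_to_text(notes)
```

Saving the text before each change and comparing it afterwards with `changed_sections` gives a manual substitute for the missing undo button, since the previous settings are always written down.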