Bolt on Instance Diagnostic for Amazon EC2 | Netflix TechBlog

http://techblog.netflix.com/2017/04/introducing-bolt-on-instance-diagnostic.html

Introducing Bolt: On-Instance Diagnostics for AWS Components

Every day, thousands of Netflix engineers build, test, and release software and services on AWS. To support their work, we've developed a range of tools and services that help them quickly diagnose and resolve issues.

One of the most popular tools is Chaos Monkey, which randomly terminates instances in production to test our systems' resilience. However, Chaos Monkey can sometimes cause problems that are difficult to diagnose, especially when the instance is running a variety of services.

To address this problem, we've built Bolt, a new tool that provides on-instance diagnostics for AWS components. Bolt can be used to collect a variety of information from an instance, including:

  • System metrics (CPU, memory, disk I/O, etc.)
  • Process information (list of processes, CPU and memory usage, etc.)
  • Network information (list of open ports, network traffic, etc.)
  • AWS component information (list of running AWS services, their configuration, etc.)

Bolt can be used to diagnose a wide range of problems, such as:

  • High CPU or memory usage
  • Slow network performance
  • Failed AWS services
  • Application crashes

Bolt is easy to use. It can be installed on any instance with an internet connection. Once installed, Bolt can be run from the command line or through a web interface.

Bolt is open source and available on GitHub.

How Bolt Works

Bolt is a Python application that uses a variety of techniques to collect information from an instance. These techniques include:

  • The Python psutil library to collect system metrics and process data.
  • The Python netifaces library to collect network information.
  • The AWS Python SDK to collect information regarding AWS components.
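
As a rough illustration of the first technique, the sketch below gathers a few system metrics into a dictionary. It is not Bolt's actual code: the function name collect_system_metrics and the field names are assumptions made for this example. It uses psutil when the library is installed and falls back to a smaller set of standard-library values otherwise.

```python
import os
import shutil
import socket
import time

try:
    import psutil  # third-party library named in the post
except ImportError:  # keep the sketch runnable without it
    psutil = None


def collect_system_metrics():
    """Return a dictionary of basic system metrics (hypothetical field names)."""
    root = os.path.abspath(os.sep)  # filesystem root, "/" on Linux
    disk = shutil.disk_usage(root)
    metrics = {
        "hostname": socket.gethostname(),
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
    }
    if psutil is not None:
        # cpu_percent blocks for `interval` seconds while it samples CPU usage.
        metrics["cpu_percent"] = psutil.cpu_percent(interval=0.1)
        metrics["memory_percent"] = psutil.virtual_memory().percent
    return metrics


if __name__ == "__main__":
    for name, value in collect_system_metrics().items():
        print(f"{name}: {value}")
```

Returning a plain dictionary keeps the result easy to serialize, for example as JSON, whether it is shown on the command line or in a web page.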

Bolt can be run in two modes:

  • Diagnostic mode: This mode collects a snapshot of information from the instance. The snapshot can be used to diagnose problems that are occurring at the time Bolt is run.
  • Monitoring mode: This mode collects information from the instance over time. This information can be used to track the performance of the instance and identify trends that may indicate potential problems.
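
The difference between the two modes can be sketched in a few lines. This is an illustration only, not Bolt's implementation: run_diagnostic, run_monitoring, and example_collector are hypothetical names, and the collector is a stand-in for whatever data is actually gathered.

```python
import time


def run_diagnostic(collect):
    """Diagnostic mode: take a single timestamped snapshot."""
    return {"taken_at": time.time(), "data": collect()}


def run_monitoring(collect, samples, interval_seconds):
    """Monitoring mode: take `samples` snapshots, `interval_seconds` apart."""
    history = []
    for i in range(samples):
        history.append(run_diagnostic(collect))
        if i < samples - 1:  # no need to wait after the last sample
            time.sleep(interval_seconds)
    return history


def example_collector():
    # Stand-in for a real collector; reports CPU time used by this process.
    return {"process_time": time.process_time()}


if __name__ == "__main__":
    print(run_diagnostic(example_collector))
    for snapshot in run_monitoring(example_collector, samples=3, interval_seconds=0.1):
        print(snapshot)
```

Monitoring mode here is just diagnostic mode repeated on a timer, which is what makes trends visible: each snapshot carries its own timestamp, so values can be compared across samples.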

Bolt can be configured to collect different types of data depending on the needs of the user. For example, a user may choose to collect only system metrics and process information, or they may choose to collect all of the information that Bolt can provide.
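
One way to picture this kind of selective collection is a registry that maps a category name to a collector function. The sketch below is an assumption about how such a configuration could look, not a description of Bolt's configuration format. The names COLLECTORS and collect, and the two toy collectors, are hypothetical.

```python
import os
import platform


def collect_system():
    # Toy system collector built from the standard library.
    return {"cpu_count": os.cpu_count(), "platform": platform.system()}


def collect_process():
    # Toy process collector describing the current process.
    return {"pid": os.getpid(), "cwd": os.getcwd()}


# Hypothetical registry: category name -> collector function.
COLLECTORS = {
    "system": collect_system,
    "process": collect_process,
}


def collect(categories=None):
    """Run the selected collectors; run all of them when none are named."""
    selected = list(COLLECTORS) if categories is None else list(categories)
    unknown = [name for name in selected if name not in COLLECTORS]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return {name: COLLECTORS[name]() for name in selected}


if __name__ == "__main__":
    print(collect(["system"]))  # only system metrics
    print(collect())            # everything that is registered
```

Rejecting unknown category names up front avoids silently returning less data than the user asked for.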

Using Bolt

Bolt can be used to diagnose a wide range of problems. Here are a few examples:

High CPU or memory usage: Bolt can be used to identify the process or processes that are using the most CPU or memory. This information can be used to troubleshoot performance problems.
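
A minimal sketch of that kind of ranking follows. It is an illustration rather than Bolt's code: top_consumers and snapshot_processes are hypothetical names, and the sample records are made up. The ranking itself needs only the standard library; snapshot_processes shows how the records could be built with psutil, assuming that library is installed.

```python
import heapq


def top_consumers(processes, metric, limit=3):
    """Return the `limit` process records with the highest `metric` value."""
    # A missing or None reading (for example, access denied) counts as 0.
    return heapq.nlargest(limit, processes, key=lambda p: p.get(metric) or 0.0)


def snapshot_processes():
    """Build process records with psutil (third-party; must be installed)."""
    import psutil

    attrs = ["pid", "name", "cpu_percent", "memory_percent"]
    return [proc.info for proc in psutil.process_iter(attrs)]


if __name__ == "__main__":
    # Made-up records so the ranking can be tried without psutil.
    sample = [
        {"pid": 101, "name": "web", "cpu_percent": 12.5, "memory_percent": 4.0},
        {"pid": 202, "name": "batch", "cpu_percent": 91.0, "memory_percent": 9.5},
        {"pid": 303, "name": "cache", "cpu_percent": None, "memory_percent": 40.2},
    ]
    for proc in top_consumers(sample, "cpu_percent", limit=2):
        print(proc["pid"], proc["name"], proc["cpu_percent"])
```

Note that psutil reports a per-process CPU percentage relative to the previous reading, so the first snapshot of a process typically shows 0.0. A second snapshot taken shortly afterwards gives a more useful ranking.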

Slow network performance: Bolt can be used to identify the source of slow network performance. This information can be used to troubleshoot network problems and improve performance.

Failed AWS services: Bolt can be used to identify the cause of failed AWS services. This information can be used to troubleshoot AWS problems and restore service.

Application crashes: Bolt can be used to identify the cause of application crashes. This information can be used to troubleshoot application problems and improve stability.

Bolt is a powerful tool that can be used to diagnose a wide range of problems. It is easy to use and can be installed on any instance with an internet connection.

Summary

Bolt is a valuable tool for diagnosing problems on AWS instances. It is easy to use and provides a wealth of information that can be used to troubleshoot problems and improve performance.

We encourage you to try Bolt and see how it can help you improve the reliability and performance of your AWS applications.

Additional Resources