Incident.MOOG Product Frequently Asked Questions (FAQ)

What is IT Incident Management?

Incident Management is an IT service management (ITSM) process area. An incident is defined as “an unplanned outage”. The primary goal of the incident management process is to restore normal service operation as quickly as possible and to minimize the impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained. Incidents that cannot be resolved quickly by the help desk are assigned to specialist technical support groups. A resolution or workaround should be established as quickly as possible in order to restore the service.

What is IT Event Management (or Manager of Managers)?

Event Management, as defined by ITIL (Information Technology Infrastructure Library), is the process that monitors all events that occur throughout the IT infrastructure. It supports normal operation and also detects and escalates exception conditions. An event can be defined as any detectable occurrence that has significance for the management of the IT infrastructure or the delivery of IT services. An event, if not managed proactively, might develop into an incident, which is an “unplanned outage”.

In recent years, IT Event Management has also been called “Manager of Managers (MoM)”, reflecting an attempt to expand from the original domain-focused events (such as those from servers) to events from all domains, including applications, servers, databases and networks.

What is IT Operations Analytics?

IT Operations Analytics (ITOA) (also known as Advanced Operational Analytics, or IT Data Analytics) technologies are primarily used to discover complex patterns in high volumes of often “noisy” IT system availability and performance data. Forrester Research defines IT analytics as “The use of mathematical algorithms and other innovations to extract meaningful information from the sea of raw data collected by management and monitoring technologies.” In recent years, ITOA tools have received significant interest from IT operations teams seeking to improve the overall Event and Incident Management process.

What product does Moogsoft offer?

Moogsoft’s flagship product is Incident.MOOG.

What is Incident.MOOG?

Incident.MOOG is a real-time Manager of Managers (MoM) that helps IT Ops and DevOps teams detect, collaborate on, and resolve service incidents as they develop and unfold.

Why do I need Moogsoft’s solutions?

If your customers frequently report service incidents before you even know about them, Moogsoft can help: A top-tier web-scale company with hundreds of millions of active users was able to detect incidents more than 24 hours before the corresponding tickets were created by IBM Netcool.

If your business increasingly depends on a dynamic infrastructure, Moogsoft can help: A $450M SaaS company cut its DevOps time to software release from 14 days to 1 day.

If your IT war room operators are overloaded with “spam” events and don’t know where to look first, Moogsoft can help: A Fortune 100 manufacturer cut its raw events from 115 million a day to 250 actionable situations, with cross-domain causes and impacts presented on a single pane of glass.

How effective is Incident.MOOG?

Based on our results from customer production environments*, Incident.MOOG:

  • Detected outages 12-24 hours before IBM Netcool and similar systems;
  • Reduced events to actionable incidents (called “situations” in Moogsoft) by up to 99.998% after a brief learning/priming period, or by 90% with minimal configuration;
  • Identified 100% of actionable issues, of which 30% are usually recurring incidents that can be permanently remediated.

(*Source: Trials conducted in 2014, with event feeds taken from up to 100,000 managed elements. These results were achieved in less than 30 days. A detailed report from a Fortune 100 manufacturer is available.)

Why do IT Ops teams like it?

With Incident.MOOG, IT Ops teams can spend less time manually removing “spam” events and alerts, clearly see cross-domain root causes and impacts as they unfold in real time, collaborate instantly with the most relevant stakeholders, and automatically capture and recycle knowledge of remediation. Incident.MOOG helps IT Ops restore customer confidence in IT, faster.

Why do DevOps teams like it?

With Incident.MOOG, DevOps teams can rapidly and continuously pinpoint the root causes of software failures throughout the DevOps process. Incident.MOOG helps DevOps teams cut the time needed to launch production software and services.

How does Incident.MOOG work?

The Incident.MOOG platform consists of a patented, real-time machine learning engine, a just-in-time virtual war room, and an open RESTful API.

The machine learning engine can contextualize hundreds of millions of events into just hundreds of real, actionable incidents in real time.

The virtual war room can instantly present cross-domain root causes and impacts in a single pane of glass for the most relevant stakeholders, and automatically capture and recycle knowledge of remediation.

The open API allows rapid, two-way, and custom integrations with popular IT Service Management (ITSM) systems.

Unlike other MoMs that analyze one domain silo at a time by relying on preset rules and postmortem analysis, Incident.MOOG excels in real-time detection of incidents across the entire IT infrastructure spectrum, and in real-time presentation of cross-domain root causes and impacts in a single pane of glass.

As a result, IT Ops and DevOps teams can significantly reduce their manual, ineffective incident management workloads, Mean Time to Repair (MTTR), and time to launch new services.

How is Incident.MOOG different from IBM Tivoli-Netcool, CA Spectrum, BMC TrueSight Event Manager, and EMC Smarts?

In the words of Phil Tee, inventor of Netcool and Incident.MOOG, “Incident.MOOG is what Netcool would have been, had Netcool been invented today instead of in 1993.”

These legacy Incident Management solutions still rely on a 1990s-era static, manual, rule-based approach, processing events one at a time and ranking them using static rules and severity levels. However, many IT organizations have migrated to virtualization, BYOD, mobility, cloud, and DevOps practices. Within these software-defined environments, there is no simple and direct relationship between application problems and the underlying hardware or software, because infrastructure is provisioned dynamically. The rules that legacy MoMs use only work if the infrastructure remains static. Today, application behavior is too complex and dynamic to model with rules.

Moogsoft provides Incident.MOOG, a next-generation manager of managers (MoM) that sits across the entire IT environment and its domain-specific event management tools. It is designed to process billions of events across domains (public cloud and private cloud) and across layers of the stack (apps, middleware, databases, compute, storage, network).

What features and capabilities does Incident.MOOG offer vs. legacy MoM (e.g. IBM Tivoli-Netcool, CA Spectrum, BMC TrueSight Event Manager, and EMC Smarts)?

The following table compares the key differences in capabilities between Incident.MOOG and legacy MoM:

[Table: Key Capabilities]

How is Incident.MOOG different from log analyzers (e.g. Splunk and Sumo Logic)?

Splunk and its various add-on modules apply an individual algorithm to index and analyze log file data after the fact. These tools are very useful in forensic analysis and can illuminate what has occurred historically in large IT infrastructures. To the extent that the infrastructure doesn’t change, this analysis may also be useful in predicting some future scenarios (disk full, etc.). However, these tools do not work in real time, and they cannot process millions of events per day to detect problems as they unfold. Splunk, Sumo Logic, etc. can function as data feeds to Incident.MOOG.

How is Incident.MOOG different from Application Performance Management (APM) tools (e.g. AppDynamics, New Relic, Compuware)?

Application Performance Monitoring (APM) solutions detect issues at the application layer, but they do not provide visibility into the status of the underlying infrastructure (compute, storage and network); hence they do not provide root causes and impacts that cut across layers 1 through 7. APM products such as AppDynamics, New Relic and Compuware can be good data feeds into Incident.MOOG, providing application-domain-specific events.

How does Incident.MOOG fit into the Information Technology Infrastructure Library (ITIL) methodology?

Incident.MOOG is unique in that it empowers the ITIL processes of incident management and problem management to be robust and effective in production, even if the configuration management database (CMDB) is incomplete. Incident.MOOG leverages ITIL processes and documentation that may exist in the enterprise while continually adapting to changes that may not be documented.

Where does Incident.MOOG fit architecturally vs. Monitoring Tools and IT Service Management?

When events and alarms are unfolding in real-time, IT war room operators need to know immediately where to look first: Which domain experts do I engage immediately: APM? VM cluster? Database cluster? Network? Public Cloud Provider?

Incident.MOOG sits above domain-specific monitoring tools and provides cross-domain, real-time root causes and impacts in a single virtual situation room. By integrating bi-directionally with ITSM trouble ticket systems, it ensures that fewer, real tickets are recorded in those systems, because the remaining events are filtered out as “spam”. If customers call during root cause analysis, the ticket systems will have the latest root causes and impacts, so helpdesk representatives are well informed. And when tickets are cleared, they are instantly synchronized to Incident.MOOG.

Support Incident.MOOG

How do I feed event sources into Incident.MOOG? Do I need to deploy agents or other types of data-collecting software in order to run Incident.MOOG?

No. Incident.MOOG taps into your existing data feeds and does not require any additional software or re-tooling of your event stream.

What types of data feeds are supported by Incident.MOOG?

Incident.MOOG receives any event feed with any textual data, time stamps and clearly defined fields (examples: SNMP, syslog, IMAP, SMTP, any Unix file/socket descriptor), and unstructured data such as customer sentiment on social media (example: Twitter). These event sources can be taken from the entire IT infrastructure spectrum, including:

  • Application Performance Monitoring/Management (APM) tools: AppDynamics, New Relic, Compuware and others;
  • Network Performance Monitoring and Diagnostics tools: JDSU, Riverbed, Fluke Networks and others;
  • Event Managers: IBM Tivoli Netcool, BMC Event Manager, CA Spectrum, EMC Smarts, Microsoft System Center, SolarWinds and others;
  • Log files: Splunk and others;
  • Open source based infrastructure monitoring tools: Nagios and others.

In summary, Incident.MOOG takes event feeds from domain-specific monitoring tools spread across public cloud, private cloud, applications, databases, automation, compute (physical and virtual), networks (physical and virtual), and storage (physical and virtual). At customer sites today, Incident.MOOG is processing hundreds of millions of events per day from these sources.
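
To illustrate what a feed with “clearly defined fields” can look like, the sketch below maps a BSD-style syslog line onto a small, normalized event record. It is a hypothetical example, not Incident.MOOG’s ingestion code. The `Event` fields and the `parse_syslog` helper are assumptions made for illustration, and the regular expression only covers the classic “Mmm dd hh:mm:ss host tag: message” layout.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical normalized event record (field names are illustrative only)."""
    timestamp: str
    host: str
    source: str
    description: str


# Classic BSD-style syslog layout: "Mmm dd hh:mm:ss host tag: message".
SYSLOG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:\s]+): (?P<msg>.*)$"
)


def parse_syslog(line: str) -> Optional[Event]:
    """Map one syslog line onto the event fields, or return None if it does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    return Event(timestamp=m["ts"], host=m["host"], source=m["tag"], description=m["msg"])


event = parse_syslog("Mar  3 14:02:11 db01 kernel: EXT4-fs error on /dev/sda1")
print(event)
```

Lines that do not fit the layout return `None`. A real feed would need a separate rule for each source format it has to handle.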

What does Moogsoft use machine learning technologies for?

Moogsoft uses machine learning for real-time detection of IT incidents as they unfold across the entire IT infrastructure spectrum. This coverage runs wide across static and dynamic, software-defined infrastructure and across private and public clouds. It also runs deep across all domains, including apps, middleware, databases, compute, storage, network, and IoT.

Why does Moogsoft use machine learning?

Having invented and commercialized IBM Tivoli Netcool in the 90s, our founders sat back and observed the advent of web-scale, virtualization, software-defined X, BYOD, mobility, cloud, big data, and lately IoT.

They came to realize that the sheer volume, velocity, and variety of events and alerts emitted by modern dynamic IT infrastructure simply couldn’t be sustained by legacy event managers’ rule-based architecture. In large enterprise and service provider environments, these rules have grown to the thousands, now requiring heavy computing power.

The only way to detect issues at wire speed among these dynamically generated events, characterized as “unknown unknowns”, is to adopt a fundamentally different technology. And so Moogsoft machine learning was born.

How long has Moogsoft been developing machine learning?

Moogsoft started researching machine learning algorithms about 8 years ago in the UK, working closely with universities. Our founders, scientists and product engineers then developed a prototype in close collaboration with actual customers, including a web-scale company and a global financial services company. Both provided the seed funding for the initial technology. The result was a highly innovative product that delivered breakthrough results, leading to the incorporation of the company in 2012.

What is machine learning?

Machine learning is a class of algorithms that gives computers the ability to identify meaning in data without being explicitly programmed to find predefined features. Machine learning focuses on the development of computer programs that can adapt when exposed to new data.

How advanced is Moogsoft’s machine learning?

Moogsoft’s machine learning is most advanced in three distinct areas:

  • Architected from the ground up for unsupervised machine learning, with no reliance on rules.
  • Focused on real-time clustering of related events into real, actionable incidents, which we call “situations”. This is not retrospective or forensic analysis.
  • Built with multiple unsupervised and supervised algorithms to cluster related events into situations. We use multiple algorithms to inspect each event in many ways, including time, linguistic similarity, topology, Ops-team-defined templates, and deterministic cookbooks.

How does Moogsoft machine learning work?

During the initial cleaning process, our machine learning engine removes noise, blacklists unwanted events, and de-duplicates many others. This typically reduces hundreds of millions of raw events per day down to 1 million alerts. The engine then contextualizes this still-large volume of alerts into far fewer incidents (again, we call them “situations”). The engine looks at multiple variables to assess how “surprising or abnormal” an event is, and how it relates to other events. These variables are:
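
The cleaning step can be pictured with a small sketch that drops blacklisted events and collapses duplicates into alerts that carry an occurrence count. This is an illustrative assumption, not Moogsoft’s engine. The blacklist is a simple substring list, and the de-duplication key (host, source, description) is a hypothetical choice.

```python
def clean(events, blacklist):
    """Discard blacklisted events and fold duplicates into one alert per key."""
    alerts = {}  # dedup key -> alert; dicts keep insertion order
    for ev in events:
        # Blacklist: drop any event whose description contains a blacklisted phrase.
        if any(phrase in ev["description"] for phrase in blacklist):
            continue
        # Hypothetical de-duplication key.
        key = (ev["host"], ev["source"], ev["description"])
        if key in alerts:
            alerts[key]["count"] += 1
            alerts[key]["last_seen"] = ev["time"]
        else:
            alerts[key] = {**ev, "count": 1, "first_seen": ev["time"], "last_seen": ev["time"]}
    return list(alerts.values())


events = [
    {"time": 1, "host": "web1", "source": "apache", "description": "HTTP 500"},
    {"time": 2, "host": "web1", "source": "apache", "description": "HTTP 500"},
    {"time": 3, "host": "web1", "source": "cron", "description": "heartbeat ok"},
    {"time": 4, "host": "db1", "source": "mysql", "description": "replication lag"},
]
alerts = clean(events, blacklist=["heartbeat"])
```

In this example, four raw events become two alerts. The heartbeat event is blacklisted, and the two identical HTTP 500 events merge into one alert with a count of 2.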

Time: The engine uses unsupervised learning to identify clusters of alerts that are temporally correlated, identifying underlying service outages or situations. The engine spots unusual patterns in the timestamps of events that may indicate that these events are related.
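
One simple way to picture temporal clustering is single-linkage clustering on timestamps. Sorted alerts stay in the current cluster as long as the gap to the previous alert is at most a chosen window. This is a stand-in technique chosen for illustration, not Moogsoft’s patented algorithm, and `max_gap` is a hypothetical parameter.

```python
def cluster_by_time(alerts, max_gap):
    """Group alerts whose consecutive timestamps are at most max_gap seconds apart."""
    clusters = []
    for alert in sorted(alerts, key=lambda a: a["time"]):
        # Extend the current cluster if this alert follows the previous one closely enough.
        if clusters and alert["time"] - clusters[-1][-1]["time"] <= max_gap:
            clusters[-1].append(alert)
        else:
            clusters.append([alert])
    return clusters


alerts = [{"time": t} for t in (400, 0, 5, 120, 8, 125)]
groups = cluster_by_time(alerts, max_gap=30)
print([[a["time"] for a in g] for g in groups])  # [[0, 5, 8], [120, 125], [400]]
```

A fixed window is the crudest possible choice. The FAQ describes spotting *unusual* timestamp patterns, which would require a model of normal event rates rather than a constant gap.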

Linguistic: The engine uses unsupervised learning to detect linguistic relationships in events. It groups alerts according to the similarity of linguistic attributes.
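
As a rough illustration of grouping by linguistic similarity, the sketch below tokenizes alert descriptions and runs a simple leader-clustering pass based on Jaccard similarity. Jaccard similarity and the `threshold` value are assumptions chosen for clarity. The FAQ does not say which similarity measure Incident.MOOG uses.

```python
import re


def tokens(text):
    """Lower-case alphanumeric word tokens of a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_text(descriptions, threshold=0.5):
    """Leader clustering: each description joins the first group whose leader is similar enough."""
    groups = []  # list of (leader_tokens, member_descriptions)
    for d in descriptions:
        t = tokens(d)
        for leader, members in groups:
            if jaccard(t, leader) >= threshold:
                members.append(d)
                break
        else:
            groups.append((t, [d]))
    return [members for _, members in groups]


descs = [
    "Link down on interface eth0",
    "Link down on interface eth1",
    "Disk full on /var",
    "Disk full on /tmp",
]
print(cluster_by_text(descs))
# [['Link down on interface eth0', 'Link down on interface eth1'], ['Disk full on /var', 'Disk full on /tmp']]
```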

Topology: The engine uses unsupervised machine learning to cluster events based on their network proximity, treating events from nearby locations as potentially correlated.
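
Network proximity can be sketched as hop distance in a topology graph: alerting hosts that are within a few hops of each other end up in the same group. The adjacency map, host names, and `max_hops` below are hypothetical. Breadth-first search plus connected components is a stand-in technique, not a description of Moogsoft’s algorithm.

```python
from collections import deque


def hop_distances(graph, start):
    """Breadth-first search: number of hops from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def cluster_by_topology(graph, alerting_hosts, max_hops=1):
    """Group alerting hosts into components, linking hosts at most max_hops apart."""
    hosts = list(alerting_hosts)
    seen, clusters = set(), []
    for h in hosts:
        if h in seen:
            continue
        component, stack = [], [h]
        seen.add(h)
        while stack:
            cur = stack.pop()
            component.append(cur)
            near = hop_distances(graph, cur)
            for other in hosts:
                # Unreachable hosts default to a distance just beyond max_hops.
                if other not in seen and near.get(other, max_hops + 1) <= max_hops:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters


# Hypothetical topology: two top-of-rack switches under one core switch.
TOPOLOGY = {
    "core-sw": ["tor-1", "tor-2"],
    "tor-1": ["core-sw", "web1", "web2"],
    "tor-2": ["core-sw", "db1"],
    "web1": ["tor-1"], "web2": ["tor-1"], "db1": ["tor-2"],
}
print(cluster_by_topology(TOPOLOGY, ["web1", "web2", "db1"], max_hops=2))  # [['web1', 'web2'], ['db1']]
```
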

Ops-Team-Defined-Template: IT Ops teams can create a template from a discovered situation, which can then be compared against future situations. If there is a close match, IT Ops can use the template to reject the future situation as noise, kick off a specific remediation script or process, or do something in between.
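
A template match can be sketched as a set-similarity comparison between a new situation’s alert signatures and those of stored templates. The function returns the action attached to the closest template above a threshold. The signature strings, the action names, and the use of Jaccard similarity are all assumptions for illustration.

```python
def match_template(situation, templates, threshold=0.8):
    """Return the action of the best-matching template, or None if none is close enough."""
    best_action, best_score = None, 0.0
    for tpl in templates:
        sig = tpl["signatures"]
        union = situation | sig
        score = len(situation & sig) / len(union) if union else 0.0
        if score >= threshold and score > best_score:
            best_action, best_score = tpl["action"], score
    return best_action


# Hypothetical templates saved by an Ops team from earlier situations.
templates = [
    {"name": "nightly-backup-noise",
     "signatures": {"backup01:cron:job started", "backup01:cron:job finished"},
     "action": "reject_as_noise"},
    {"name": "db-failover",
     "signatures": {"db1:mysql:replication lag", "db1:mysql:connection refused", "app1:jdbc:timeout"},
     "action": "run_failover_runbook"},
]
situation = {"db1:mysql:replication lag", "db1:mysql:connection refused",
             "app1:jdbc:timeout", "app2:jdbc:timeout"}
print(match_template(situation, templates, threshold=0.7))  # run_failover_runbook
```

The new situation shares 3 of 4 distinct signatures with the failover template (a score of 0.75). It therefore matches at a threshold of 0.7 but not at the default of 0.8.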

Moogsoft Machine-Learned-Feedback: Our engine can automatically learn from what the IT Ops team did with a previous situation and re-apply those actions, for example ignoring the situation or executing a set of remediation scripts.
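
In its simplest form, learning from operator feedback can be pictured as remembering which action was taken for a given situation signature. When that signature recurs, the most frequent action is suggested. The `FeedbackMemory` class is hypothetical and far simpler than a machine-learned model. It only shows the record-and-reapply loop described above.

```python
from collections import Counter, defaultdict


class FeedbackMemory:
    """Hypothetical store of operator actions keyed by a situation's alert signatures."""

    def __init__(self):
        self._history = defaultdict(Counter)  # frozenset of signatures -> action counts

    def record(self, signatures, action):
        """Remember that an operator applied `action` to a situation with these signatures."""
        self._history[frozenset(signatures)][action] += 1

    def suggest(self, signatures):
        """Return the most frequently recorded action, or None for an unseen situation."""
        counts = self._history.get(frozenset(signatures))
        if not counts:
            return None
        return counts.most_common(1)[0][0]


mem = FeedbackMemory()
mem.record({"web1:apache:HTTP 500", "db1:mysql:replication lag"}, "ignore")
mem.record({"db1:mysql:replication lag", "web1:apache:HTTP 500"}, "ignore")
mem.record({"web1:apache:HTTP 500", "db1:mysql:replication lag"}, "restart_service")
print(mem.suggest({"web1:apache:HTTP 500", "db1:mysql:replication lag"}))  # ignore
```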

Deterministic Cookbook-based (optional): It gives you complete control over which alerts get clustered into Situations. It allows you to create Situations according to a pre-defined Recipe (streaming SQL filters trigger the application of selected algorithms to events). The Cookbook gives you the power to create situations in a fully deterministic fashion, while retaining the power of the machine learning algorithms.
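
A recipe can be sketched as a filter that selects alerts, combined with a deterministic rule that groups the selected alerts into situations. In this hypothetical sketch, a Python predicate stands in for the streaming SQL filter and a grouping-key function stands in for the selected algorithm. Neither reflects the actual Cookbook syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Recipe:
    """Hypothetical recipe: a filter plus a deterministic grouping rule."""
    name: str
    matches: Callable[[dict], bool]    # stand-in for a streaming SQL filter
    group_key: Callable[[dict], str]   # stand-in for the selected clustering algorithm


def apply_recipes(alerts, recipes) -> Dict[str, List[dict]]:
    """Assign each alert to the situation of the first recipe whose filter matches it."""
    situations: Dict[str, List[dict]] = {}
    for alert in alerts:
        for recipe in recipes:
            if recipe.matches(alert):
                key = f"{recipe.name}:{recipe.group_key(alert)}"
                situations.setdefault(key, []).append(alert)
                break  # first matching recipe wins; unmatched alerts are left out
    return situations


storage = Recipe("storage", lambda a: a["class"] == "storage", lambda a: a["array"])
alerts = [
    {"class": "storage", "array": "A"},
    {"class": "network", "array": ""},
    {"class": "storage", "array": "A"},
    {"class": "storage", "array": "B"},
]
situations = apply_recipes(alerts, [storage])
print({k: len(v) for k, v in situations.items()})  # {'storage:A': 2, 'storage:B': 1}
```

Because both the filter and the grouping key are fixed functions, the same input always yields the same situations. This mirrors the determinism the Cookbook is described as providing.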

Moogsoft founders, technologists and engineers devised an elegant, algebraic algorithm that makes traditionally offline techniques computable in real time. Rather than analyzing every possible combination of events and situations, this approach narrows down the combinations dramatically, minimizing computational complexity. This is the source of our advantages in processing speed and scale.

What advantages does Moogsoft have over other machine learning techniques that sound similar at the surface?

Because Moogsoft machine learning has been designed to work in real time, using multiple patented algorithms to detect “anomalies” in multiple ways, we have excelled in three areas:

Speed: Works in real time as incidents unfold, leading to shorter Mean Time to Detect (MTTD) and MTTR.

Scale: Proven to scale to 115 million raw events a day from the most diverse event feeds across the entire IT infrastructure, with no false negatives, and with only 10% false positives.

Simplicity: The Moogsoft machine learning engine is simple to use. Because our engine reduces “noise” significantly, it has less data to handle and is therefore more manageable. The result is fewer, actionable root causes and impacts presented in a single virtual war room for cross-domain stakeholders, faster creation and recycling of knowledge articles, and faster synchronization with ticketing systems.

Is Incident.MOOG available for free trial?

Yes. Send us an email or fill out our contact form to arrange a phone call with a member of Team Moogsoft.

What are the options for deploying Incident.MOOG?

Incident.MOOG is typically installed on the customer site as licensed software. A cloud-resident SaaS solution is a work in progress.

What hardware do I need to support Incident.MOOG?

A typical installation requires two Intel servers, each with 64GB of RAM and a 2TB disk, running MySQL and Apache.

Which browsers does the Incident.MOOG UI support?

These are the browsers that are currently supported.

Operating System | Recommended Browser | Supported Versions | Not Recommended
OS X 10.9 | Chrome 39/40 | Safari 7.x, Firefox 34/35 |
Windows 7 | Chrome 39/40 | Firefox 34/35 | IE 8/9/10/11
CentOS 6.4/6.5 | Firefox 31.4 ESR | |
RHEL 6.4/6.5 | Firefox 31.4 ESR (not tested) | |

Browser Support

Why was Moogsoft founded?

Having invented and commercialized IBM Tivoli Netcool in the 90s, our founders sat back and watched the advent of web-scale, virtualization, software-defined X, BYOD, mobility, cloud, big data, and lately IoT.

They came to realize that the sheer volume, velocity, and variety of events and alerts emitted by modern dynamic IT infrastructure simply couldn’t be sustained by legacy event managers’ rule-based architecture. In large enterprise and service provider environments, these rules have grown to the thousands, now requiring heavy computing power.

The only way to detect issues at wire speed among these dynamically generated events, characterized as “unknown unknowns”, is to adopt a fundamentally different technology. And so Moogsoft machine learning was born.

Who founded Moogsoft?

Moogsoft was founded by Phil Tee and Mike Silvey to bring sorely-needed innovation to service management and thereby enable IT operations to meet the challenges of the 21st century service economy. More than 20 years ago, Phil Tee invented Netcool (now known as IBM Tivoli Netcool) and Mike Silvey brought it to market. Now they are delivering the next-generation solution that is long overdue.

Where is Moogsoft?

Moogsoft, Inc. is a privately held software firm headquartered in San Francisco, California, with additional offices in Surbiton, UK (London) and Hoboken, New Jersey (New York City area).

How open and customizable is Incident.MOOG?

Incident.MOOG can be customized extensively. The event ingestion mechanism uses an industry-standard language (JavaScript) together with our own extension APIs (REST, JDBC, etc.), allowing incoming data to be manipulated and processed if needed. The advanced machine learning algorithms can be tuned and tailored to a customer environment. An open, RESTful interface supports rapid, two-way, and custom integrations with third-party IT Service Management systems (e.g. BMC Remedy, ServiceNow and others).