Gain visibility into user activity and discover Shadow IT cloud services

The Company

Skyhigh Networks, USA

The Project

The Enterprise Connector is on-premises software installed on the customer's hardware. It periodically checks for new log files created by firewalls and processes them, extracting information such as the IP address, the type of action (ingress or egress), the number of bytes transferred, and the event timestamp. This information is then summarized, batched, compressed, and securely transmitted to cloud servers for further processing.
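The per-event extraction step can be sketched in Java 8. The pipe-delimited format and the LogEvent field names below are assumptions for illustration only, since real firewall log formats vary by vendor:

```java
// Minimal sketch of log-event extraction. The "timestamp|ip|action|bytes"
// format and the field names are illustrative assumptions; real firewall
// log formats vary by vendor.
final class LogEvent {
    final String ip;
    final String action;   // e.g. "ingress" or "egress"
    final long bytes;
    final long timestamp;

    LogEvent(String ip, String action, long bytes, long timestamp) {
        this.ip = ip;
        this.action = action;
        this.bytes = bytes;
        this.timestamp = timestamp;
    }

    // Parses one line of the assumed form "timestamp|ip|action|bytes".
    // Returns null for malformed lines so that one bad record never
    // crashes the whole processor.
    static LogEvent parse(String line) {
        String[] parts = line.split("\\|");
        if (parts.length != 4) return null;
        try {
            long ts = Long.parseLong(parts[0].trim());
            long bytes = Long.parseLong(parts[3].trim());
            return new LogEvent(parts[1].trim(), parts[2].trim(), bytes, ts);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Treating a malformed line as a skippable record, rather than an exception, keeps a single corrupt log entry from taking down the processor.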

Tech Stack
  • Java 8

  • Scala

  • Akka

  • Akka Streams

  • LevelDB

  • ReactJS

The Challenges

We faced various challenges with this approach, including:

  • If a log file was large, the processor would load a large chunk of it into memory instead of streaming it. A resulting out-of-memory error would crash the entire process.

  • Memory leaks in the application code for on-premises batching and summarizing would cause the application to crash.

  • Additionally, when the processor crashed, no one would know until days later, when customers found that no reports were available.

The Approach
  • Diagnosing issues by reading thread dumps and reproducing them in-house in simulated environments matching the customer's production environment.

  • Writing unit tests to reproduce bugs in isolation whenever possible, so that once a fix was in place there was a way to validate it.

  • Working with the product team to capture customer issues and their expectations.

The Solution
  • Applying a streaming approach when processing individual log files generated by firewalls. This ensured that, even on limited hardware, the program would not fail because of memory issues, as it relied on the back-pressure technique described in the Reactive Manifesto.

  • Creating a Master-Worker relationship in which the master acts as an orchestrator and multiple workers pull log-file processing work. This balanced the workload of parsing, processing, summarizing, and batching across multiple actors, so even if a worker (actor) failed, the entire process did not crash and the impact on the system was minimal.

  • Redesigning and implementing the system to watch proactively for failures and report them to engineers for diagnosis, instead of customers discovering empty reports a few days later.
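The streaming idea in the first bullet can be illustrated in plain Java 8. The production system used Akka Streams, which adds back-pressure between stages; the lazy Stream below models only the element-at-a-time part of that design, and the line format and summarization logic are simplified assumptions:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

// Sketch of constant-memory log processing: lines are consumed one at a
// time from a lazy Stream and folded into a running per-IP byte count,
// instead of loading the whole file into memory first. The real system
// used Akka Streams with back-pressure; this only models the
// element-at-a-time processing.
final class StreamingSummarizer {
    // Assumed line format: "timestamp|ip|action|bytes" (illustrative only).
    static Map<String, Long> bytesPerIp(Stream<String> lines) {
        Map<String, Long> totals = new HashMap<>();
        lines.forEach(line -> {
            String[] parts = line.split("\\|");
            if (parts.length != 4) return; // skip malformed lines
            try {
                long bytes = Long.parseLong(parts[3].trim());
                totals.merge(parts[1].trim(), bytes, Long::sum);
            } catch (NumberFormatException ignored) {
                // malformed byte count: skip the record, don't crash
            }
        });
        return totals;
    }
}
```

In production, a lazy line source such as Files.lines(path) would feed this method, so only a bounded window of the file is resident in memory at any time.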

Open-Source Contribution
  • While investigating a customer issue, there was an opportunity to isolate and reproduce the problem with the LevelDB database. The code is available on GitHub under the MIT license.

  • While implementing work distribution using the Master-Worker relationship, there was an opportunity to contribute to the Akka project. The merged pull request is available on GitHub.
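The work-pulling distribution used in the Master-Worker design can be sketched in plain Java. The Master and Worker shapes here are illustrative assumptions: the real implementation used Akka actors with supervision, while this version models only the pull-based distribution, not fault handling:

```java
import java.util.Collection;
import java.util.Optional;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Sketch of the master/worker "work pulling" idea: the master holds a
// queue of log files and each worker pulls one item at a time, so a slow
// or failed worker never stalls the others, and its loss affects only the
// file it was holding. Illustrative only; the real system used Akka actors.
final class Master {
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();

    Master(Collection<String> files) {
        queue.addAll(files);
    }

    // Workers call this to pull the next unit of work; empty means "done".
    Optional<String> nextFile() {
        return Optional.ofNullable(queue.poll());
    }
}

final class Worker {
    private final Master master;

    Worker(Master master) {
        this.master = master;
    }

    // Pulls and processes files until the master's queue is drained;
    // returns how many files this worker handled.
    int run(Consumer<String> process) {
        int handled = 0;
        Optional<String> file = master.nextFile();
        while (file.isPresent()) {
            process.accept(file.get()); // parse, summarize, batch one file
            handled++;
            file = master.nextFile();
        }
        return handled;
    }
}
```

Because workers pull work instead of having it pushed to them, a fast worker naturally handles more files and no central dispatcher needs to track per-worker load.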

Responsibilities
  • Lead product development efforts by coordinating with remote teams, Product Management, QA, and Customer Success.

  • Design, implement, test, and deploy new features and bug fixes.

  • Lead UI/UX development for the Enterprise Connector Admin Dashboard.

  • Conduct sprint planning, backlog refinement, and retrospective meetings.

  • Work with customers to debug and resolve issues with their installations.

  • Work with customers to gather new requirements for the Enterprise Connector, in collaboration with the Customer Success team.

  • Conduct interviews for Software Engineering and Product Management roles across the Engineering organization.

  • Work with the Developer Productivity team to create automation tools that speed up software development and delivery.

  • Mentor new engineers and college grads.

The Impact

The work resulted in positive outcomes such as:

  • Reduced failures in systems deployed on-premises, thanks to balanced workloads combined with stream processing of log files and fixes for memory leaks in the application code.

  • Improved performance due to stream processing instead of reading an entire file at once.

  • Better monitoring, enabling proactive error resolution. This also improved customer satisfaction and reduced the number of customer issues.

Team Structure
  • India → 2 Developers, 2 QA

  • California → 2 Engineers, 1 QA, 1 Product Manager, 1 Scrum Master