PicScout's Engineering Blog: January 2016

Thursday, January 28, 2016

Logentries - Multi-platform Centralized Logging

Background:

We often develop websites that receive a request and perform multiple operations, including calls to other services that are not hosted on the same website.

The problem:

When we analyze the causes of errors and unexpected results, we need to track the flow from the moment the request is sent, through the different processing stages, up to the point where the response is received.

This requires inspecting the logs of each of the system's components.

We wanted centralized logging, where we could see all of these logs in one place.

The problem was that the system components run on different platforms and are written in different languages (Windows/Linux, JS/.Net/C++).

We already use Kibana for centralized logging of our in-house applications, but this website is accessed from all over the world, and sending its log data to Kibana would require exposing an external endpoint.

The solution:

We chose Logentries.com, a service where log data from any environment is centralized and stored in the cloud, where it can then be searched and monitored.

It is very easy to use and offers two ways to send logs: either by integrating a library directly into the application (e.g. the JavaScript library, or a log4net appender for .NET), or by installing an agent that monitors the application's log file and forwards its contents to the centralized logging site.
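
The agent option boils down to "follow a file, forward each new line". The Python sketch below illustrates that idea; it is not the Logentries agent. It assumes Logentries' token-based TCP input, in which each line is sent prefixed with the log's token. The host, port and token are left as parameters rather than guessed. Polling the file is a simplification of how a real agent follows it.

```python
import os
import socket
import time


def read_new_lines(path, offset):
    """Return (lines, new_offset) for text appended to `path` since `offset`.

    Only complete lines (ending in a newline) are returned, so a line that
    is still being written is picked up on a later call instead.
    """
    if os.path.getsize(path) < offset:  # file was truncated or rotated
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # bytes up to and including the last newline
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def tcp_sender(host, port, token):
    """Assumption: token-based TCP input, i.e. "<token> <line>\\n" per line.

    Check host, port and format against the Logentries documentation.
    """
    sock = socket.create_connection((host, port))

    def send(line):
        sock.sendall(f"{token} {line}\n".encode("utf-8"))

    return send


def follow(path, send, poll_seconds=1.0, max_polls=None):
    """Poll `path` from its beginning and pass every new complete line to `send`."""
    offset, polls = 0, 0
    while max_polls is None or polls < max_polls:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            send(line)
        polls += 1
        time.sleep(poll_seconds)
```

Keeping the sender as a plain callable means the same `follow` loop can forward to Logentries, to another collector, or to a list in a test.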

Logentries has a simple, user-friendly UI with some useful features, such as Aggregated Live-Tail Search, Custom Tags, Context View and Graphs.

This solution certainly meets our requirements.

Monday, January 18, 2016

.NET Compiler Platform "Roslyn"

At PicScout, we use the .NET Compiler Platform, better known by its codename "Roslyn", an open-source compiler and code analysis API for C#.

(https://roslyn.codeplex.com/)

The compilers are available both through the traditional command-line programs and as APIs that can be called natively from .NET code.

Roslyn exposes modules for syntactic (lexical) analysis of code, semantic analysis, dynamic compilation to CIL, and code emission.

Recently, I used Roslyn to update hundreds of files to support an addition to our logging framework.

I needed to add a new member to each class that used the logger and modify all the calls to the logger, of which there were several hundred.

We came up with two ways of handling this challenge.

After parsing a class into an object that represents it, one way to add a member is to supply a syntactic definition of the new member and then regenerate the source code of the class.

The problem with this approach was that it is relatively difficult to build the definition of the new member correctly.

Here is how it might look:

Generating the code:

private readonly OtherClass _b = new OtherClass(1, "abc");
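
Roslyn's API for this is C# only: its `SyntaxFactory` class builds the syntax nodes. The Python sketch below illustrates the same parse, add a node, regenerate idea using Python's standard `ast` module on a toy Python class. The member name `_logger` and the factory `Logger` are hypothetical. It shows why even a one-line member has to be spelled out node by node. Note that Python's `ast.unparse` also discards comments and the original formatting. Roslyn, by contrast, keeps trivia such as comments in its syntax trees.

```python
import ast

# A toy class standing in for one of the files to update.
SOURCE = '''
class Worker:
    def run(self):
        log("started")  # this comment is lost by ast.unparse
'''


def add_logger_member(source, member_name="_logger", factory="Logger"):
    """Add `<member_name> = <factory>('<ClassName>')` to every class, then
    regenerate the source text (ast.unparse needs Python 3.9+)."""
    tree = ast.parse(source)
    classes = [n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
    for cls in classes:
        # Each piece of the member (target, call, argument) is its own node.
        member = ast.Assign(
            targets=[ast.Name(id=member_name, ctx=ast.Store())],
            value=ast.Call(
                func=ast.Name(id=factory, ctx=ast.Load()),
                args=[ast.Constant(value=cls.name)],
                keywords=[],
            ),
        )
        cls.body.insert(0, member)
    ast.fix_missing_locations(tree)  # synthesized nodes have no line numbers
    return ast.unparse(tree)
```

Calling `add_logger_member(SOURCE)` returns the class with `_logger = Logger('Worker')` as its first statement, and without the comment.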

Another, more direct option was to simply get the properties of the class, such as its position in the file, and use them.

For example, we know where the class definition ends and we can append a new line containing the member definition.

Here is how it looks:

Get class details:

Insert the new line (new member):
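
Once the class positions are known, the insertion itself is plain text manipulation. The sketch below is in Python rather than C#. It assumes the offsets have already been obtained from Roslyn, for example `OpenBraceToken.Span.End` of each `ClassDeclarationSyntax` in the parsed tree. Indentation and line-ending parameters are illustrative defaults.

```python
def insert_members(source, brace_ends, member_line, indent="    ", newline="\n"):
    """Insert `member_line` on its own line right after each class's `{`.

    `brace_ends` holds the offset just past each class's opening brace
    (assumed to come from Roslyn, e.g. OpenBraceToken.Span.End).
    """
    # Work from the end of the file backwards so that text inserted into a
    # later class does not shift the offsets of earlier classes.
    for pos in sorted(brace_ends, reverse=True):
        source = source[:pos] + newline + indent + member_line + source[pos:]
    return source
```

Applying the insertions in descending offset order is the design choice that matters here. It lets all offsets be collected from a single parse and used unchanged.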

After that, replacing the old calls with calls to the new logger is a simple matter of search and replace.
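
A minimal sketch of such a search-and-replace pass, using Python's standard `re` module. The post does not show the old or new logger calls, so the pattern is purely hypothetical: it rewrites `LogManager.Info(...)` to `_log.Info(...)`.

```python
import re
from pathlib import Path

# Hypothetical old call shape: static calls such as LogManager.Info(...).
OLD_CALL = re.compile(r"\bLogManager\.(Debug|Info|Warn|Error)\(")
# Hypothetical new call shape: calls on the member added to each class.
NEW_CALL = r"_log.\1("


def migrate_source(text):
    """Return (new_text, number_of_calls_replaced) for one file's contents."""
    return OLD_CALL.subn(NEW_CALL, text)


def migrate_tree(root):
    """Rewrite every .cs file under `root` in place; return the total count."""
    total = 0
    for path in Path(root).rglob("*.cs"):
        # newline="" keeps the file's original line endings (e.g. CRLF).
        with open(path, encoding="utf-8", newline="") as f:
            text = f.read()
        new_text, count = migrate_source(text)
        if count:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(new_text)
            total += count
    return total
```

The `\b` word boundary keeps identifiers that merely end in `LogManager` (say, a `MyLogManager` class) from being rewritten. The returned count makes it easy to check the result against the number of calls expected.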

Wednesday, January 6, 2016

PhantomJS

The Problem:

Given a website and an image on that website, the task is to take a screen capture of the image on the page where it appears.

Solution 1:

Manually: enter the website, find the given image (scrolling if needed), take the screenshot and save it to disk.

But what can you do when you have thousands of screenshots to take per hour?

You can employ hundreds of people to handle this scale, but...

This is rather expensive, and what do you do when the required scale increases or decreases?

Solution 2:

Automate it: if only we could write a piece of software that could do exactly what we need...

So what do we actually need? Something that can:

1) Imitate a browser

2) Find an image on a webpage

3) Take the screenshot

Let me introduce PhantomJS:

PhantomJS is commonly described as a headless WebKit with a JavaScript API.

Headless refers to the fact that the program can be run from the command line without a window system.

JavaScript API means that we can easily write scripts that interact with PhantomJS, which is useful if, for instance, we need to find an image on a webpage.

WebKit is the open-source web browser engine that powers Safari; Chrome's Blink engine is a fork of it.

How to use it?

There are many ways to use PhantomJS; here at PicScout we use Selenium WebDriver to run it. Selenium can control PhantomJS in the same way that it controls any other browser.

How does it help me to take a screen capture?

As mentioned above, PhantomJS can run JavaScript, so all that is left is to write a short script that finds the image's location on the page and have PhantomJS run it.

After receiving the location, we can use PhantomJS to take the screen capture. Even though PhantomJS is a headless browser, it still renders web pages just like any other browser driven through WebDriver.

Code sample:
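
A minimal sketch of that flow, using Selenium's Python bindings. Several assumptions apply:
- The image is matched by its resolved `src` URL, which is a hypothetical matching rule.
- PhantomJS screenshots cover the whole page, so page coordinates can be used directly as the crop box.
- `driver` is any Selenium WebDriver object (with Selenium 3.x, `webdriver.PhantomJS()`).

```python
import math

# Runs inside the page. Hypothetical matching rule: the <img> whose resolved
# src equals the target URL. Returns its box in page (not viewport) coordinates.
FIND_IMAGE_JS = """
var target = arguments[0];
var images = document.getElementsByTagName('img');
for (var i = 0; i < images.length; i++) {
  if (images[i].src === target) {
    var r = images[i].getBoundingClientRect();
    return {x: r.left + window.pageXOffset, y: r.top + window.pageYOffset,
            width: r.width, height: r.height};
  }
}
return null;
"""


def crop_box(rect):
    """Turn the script's result into a (left, top, right, bottom) pixel box."""
    left = max(0, math.floor(rect["x"]))
    top = max(0, math.floor(rect["y"]))
    right = math.ceil(rect["x"] + rect["width"])
    bottom = math.ceil(rect["y"] + rect["height"])
    return left, top, right, bottom


def capture_image(driver, page_url, image_url):
    """Return (screenshot_png, crop_box), or None if the image is not found.

    Assumes the screenshot covers the full page (as with PhantomJS), so the
    box can be used to crop it.
    """
    driver.get(page_url)
    rect = driver.execute_script(FIND_IMAGE_JS, image_url)
    if not rect:
        return None
    return driver.get_screenshot_as_png(), crop_box(rect)
```

Cropping the PNG needs an image library. For example, Pillow's `Image.crop` takes a box in the same (left, top, right, bottom) order.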

Running several instances of PhantomJS – problems and solutions

Problem #1: zombie processes. Our app creates and kills PhantomJS processes, and we noticed that after a while many zombie PhantomJS instances were left running.

Solution #1: for unknown reasons, creating a new PhantomJS instance occasionally fails: an exception is thrown, yet a new PhantomJS process has already started. In that case we need to find the process ID ourselves and kill it.
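
One way to automate that cleanup is sketched below in Python; it is an illustration, not PicScout's code. It records which PhantomJS process IDs exist before starting a new instance. If startup throws, it kills any that appeared in the meantime. The process listing shown reads Linux's `/proc`; on Windows something like the .NET `Process` class or a library such as psutil would be needed instead. Startup is serialized with a lock so that one thread does not kill an instance another thread has just started.

```python
import os
import signal
import threading

_start_lock = threading.Lock()


def phantomjs_pids(name="phantomjs"):
    """Linux-only: PIDs of running processes whose command name is `name`."""
    pids = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.add(int(entry))
        except OSError:
            pass  # the process exited while we were scanning
    return pids


def kill_pid(pid):
    try:
        os.kill(pid, signal.SIGTERM)
    except OSError:
        pass  # already gone, or not ours to kill


def start_with_cleanup(start, list_pids=phantomjs_pids, kill=kill_pid):
    """Call `start()` (e.g. a function that creates a PhantomJS driver).

    If it raises, kill every PhantomJS process that appeared during the call,
    then re-raise the original exception.
    """
    with _start_lock:  # concurrent starts would otherwise look like orphans
        before = set(list_pids())
        try:
            return start()
        except Exception:
            for pid in set(list_pids()) - before:
                kill(pid)
            raise
```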

Problem #2: low success rate under high CPU load. When CPU usage reached 100%, we received a lot of errors from PhantomJS.

Solution #2: the number of PhantomJS instances should be set according to the machine's computing power. Note that most of the time PhantomJS does not consume much CPU, but for some websites it does; take this into account when deciding how many PhantomJS processes to run.
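
A simple way to apply this is to cap concurrent capture jobs with a fixed-size worker pool. The Python sketch below assumes each `capture` call uses one PhantomJS instance. Under that assumption, the pool size bounds how many instances run at once. The instances-per-core ratio is a hypothetical tuning knob; it would be lowered if CPU-heavy sites make up a large share of the work.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def instance_limit(instances_per_core=1.0):
    """Hypothetical sizing rule: a fixed number of PhantomJS instances per core."""
    cores = os.cpu_count() or 1
    return max(1, int(cores * instances_per_core))


def run_captures(jobs, capture, max_instances=None):
    """Run `capture(job)` for each job, with at most `max_instances` at once.

    If each call uses one PhantomJS instance, this is also the maximum
    number of PhantomJS instances working at the same time.
    """
    workers = max_instances or instance_limit()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map returns results in the same order as the input jobs
        return list(pool.map(capture, jobs))
```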

Problem #3: sometimes the screen capture fails for no apparent reason.

Solution #3: we were able to increase the screen capture success rate by using a retry mechanism.
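
A retry wrapper of this kind can be very small, as in the Python sketch below. The post does not say how many retries were used or whether there was a wait between them. The attempt count and the exponential backoff here are assumptions for illustration.

```python
import time


def with_retries(action, attempts=3, first_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` calls have failed.

    Waits first_delay, first_delay * backoff, ... seconds between attempts and
    re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(first_delay * backoff ** attempt)
```

A capture would then be wrapped as `with_retries(lambda: take_screenshot(url))`, where `take_screenshot` is a hypothetical capture function. The injectable `sleep` makes the wrapper easy to test without real delays.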