Sunday, July 17, 2016

Google Analytics

Google Analytics lets us measure our website’s traffic by viewing statistical reports and detailed information about the site’s performance.

Here are a few of the many questions that we can answer about our site by using Google Analytics:


•  How many people visit the site?
•  Where do the visitors come from?
•  What directed them to our site?
•  Which pages are popular and viewed most?

How to:

• Install Google Analytics by creating an account and signing up. 

• Paste the tracking code you receive at the bottom of the HTML content of each page you plan to track. (Any site that hasn’t been configured yet will show Tracking Unknown until you add the code to your website.) This allows Analytics and your website to talk to one another and interpret information about visits to your site.

• Set up Goals. A Goal is a webpage that a visitor reaches once they have completed an action you desire. Goals help you make smarter decisions about your design by telling you which page was visited most, the geographic location of converted visitors, the keywords that stand out on your page, and so on.

• For websites with a search box, set up Site Searches. This will track searches made on your website so you can learn more about what your visitors are looking for on specific pages.

• Finally, you will be able to view data by opening the Audience Overview report:

Audience reports: information about your visitors, such as age, gender, interests, location, and behavior (how often they visit the site).




Acquisition reports: information about what drove visitors to your website. You will see your traffic broken down by main categories (All Traffic > Channels) and specific sources (All Traffic > Source/Medium).

Behavior reports: information about your content, particularly the top entry pages on your website (Site Content > Landing Pages) and the top exit pages (Site Content > Exit Pages).



You can also learn how fast your website loads (Site Speed) as well as find specific suggestions from Google on how to make your website faster (Site Speed > Speed Suggestions).

At PicScout:


By using Google Analytics we can work out how to improve our site’s content and design, understand where visitors may be losing interest and dropping off along the way (known as “pain points”), and learn how to convert more visitors.

Tuesday, June 28, 2016

Deploying Entity Framework Model First




While working on a project that uses Entity Framework, we noticed that the auto-generated script created from the model overwrites the old schema and, moreover, erases all the data.

Since the project was relatively small and the team working on it was not big, this was not a problem at first. However, once QA started testing the project manually, we had to find a simple deployment process that updates the DB without starting it over.

The problem:

We didn't want to use EF migrations, for two reasons:
  1. It requires making each change twice: first in the DB, and then in the code.
  2. We didn’t want to manage every small change.

The solution:

We found a simple and easy solution that satisfies our needs:

Step 1

We ran the auto-generated script on a side DB (which is erased once the process is done).
At that point we had two DBs:
Old DB: the original DB, which exists in production and has not been updated.
New DB: the new, temporary DB, which has the updated schema.

Step 2

We ran the SqlPackage command-line tool (part of SQL Server) against the New DB and created a .dacpac file.
A "dacpac" is a file with a .dacpac extension that holds a collection of the object definitions one could find in a SQL Server database, such as tables, stored procedures, views, etc.

Step 3

Using the created .dacpac file, we ran the Publish command of the SqlPackage tool, which compares the two DBs and allows actions such as adding, removing, and updating fields, types, SPs and more.
The main problem with the Publish command is that its implementation can cause a serious performance issue. In some cases, the original table is copied with all its data to a side table, then a new table is created with the new schema, and finally the data is copied from the side table into the new table. With a large data set, this can take a long time.

Summary

DB migration is a well-known problem, and there are a lot of good solutions out there. In our particular case we decided to use a simple and easy solution that can be implemented with the basic tools that ship with the SQL Server version we had. This solution does not fit every situation; for us, however, it did the job.

Thursday, April 28, 2016

Writing Web-Based Client-Side Unit-Tests with Jasmine and Blanket

Preface

When writing a website, or more often a single-page app, there is a need to test it, just like any other piece of code.

There are several types of tests, of course, including unit tests and integration tests.
While integration tests exercise flows of the entire application, end to end, and thus simulate user interaction (which requires a special browser-based testing package), unit tests run specific functions.
However, when writing an entire application in JavaScript, running pieces of code is a bit more tricky.

On one hand, we are not used to writing unit tests in JavaScript and running the tests completely in the browser. On the other hand, calling JavaScript code and then testing various members for values is much easier when the tests themselves are written in JavaScript.

Luckily, the good people of the web have given us several JavaScript-based packages for writing unit tests. I'll talk about Jasmine, and add a few words about Blanket, which integrates with Jasmine to measure code coverage.

Jasmine

Jasmine is a JavaScript-based library for running unit tests. It consists of several parts:
  1. The Runner
  2. Tests Framework
  3. Plug-ins

1. The Runner

The runner is an HTML file with base code that loads the tests framework and runs the tests. It will not do anything when you take it out-of-the-box. You have to add your own scripts in there, so consider it a template.

The base HTML looks like this:
<link rel="shortcut icon" type="image/png" href="jasmine/lib/jasmine-2.0.0/jasmine_favicon.png">
<link rel="stylesheet" type="text/css" href="jasmine/lib/jasmine-2.0.0/jasmine.css">

<script type="text/javascript" src="jasmine/lib/jasmine-2.0.0/jasmine.js"></script>
<script type="text/javascript" src="jasmine/lib/jasmine-2.0.0/jasmine-html.js"></script>
<script type="text/javascript" src="jasmine/lib/jasmine-2.0.0/boot.js"></script>

Next, you need to add your own application scripts:
<script type="text/javascript" src="src/myApp.js"></script>

And finally come your test scripts:
<script type="text/javascript" src="tests/myAppTestSpec.js"></script>

2. Tests Framework

Jasmine has several files that make up the tests framework. The most basic ones are those listed above, in the basic HTML example. Let's go over them quickly:

jasmine.js

The most basic requirement. This is the actual framework.

jasmine-html.js

This one is used to generate HTML reports. It is a requirement, even if you don't want HTML reports.

boot.js

This one was added in version 2.0 of Jasmine, and it performs the entire initialization process.


Writing Tests


Structure

The unit tests in Jasmine are called "Specs", and are wrapped in "Suites". It looks like this:
describe("A suite", function() {
  it("contains spec with an expectation", function() {
    expect(true).toBe(true);
  });
});

The describe function describes a test suite, while the it function specifies a test.
Note that both functions take a name and a function as parameters, and that the it function is called inside the body of the describe function. This means you can store members that are shared by every spec in a test suite. It also means that the tested code goes inside the it block, along with any assertions.
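
As a small illustration of suite-level members, here is a sketch of a spec file that keeps one variable in the describe body and resets it before every spec. Counter is a hypothetical piece of application code, not something from a real project, and the typeof jasmine guard is only there so the file can also be loaded outside a Jasmine runner:

```javascript
// Hypothetical code under test; in a real project it would live in src/myApp.js.
function Counter() {
  this.value = 0;
}

Counter.prototype.increment = function () {
  this.value += 1;
  return this.value;
};

// Register the suite only when a Jasmine runner is present.
if (typeof jasmine !== "undefined") {
  describe("Counter", function () {
    var counter; // visible to every spec in this suite

    beforeEach(function () {
      counter = new Counter(); // fresh instance before each spec
    });

    it("starts at zero", function () {
      expect(counter.value).toBe(0);
    });

    it("increments by one", function () {
      expect(counter.increment()).toBe(1);
    });
  });
}
```

Because beforeEach runs before each it block, the specs do not depend on the order in which they run.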

Expectations (a.k.a. Asserts in other test suites)

When writing a unit test you expect something to happen, and you assert if it doesn't. While in other test suites you usually use the term Assert for such an operation, in Jasmine you simply Expect something.

The syntax for expectations is straightforward:
expect(true).toBe(true);

There are many "matchers" you can use with the expect function, including but not limited to:
  • toBe - tests that the value actually IS some object (using '===').
  • toEqual - tests that the value EQUALS some other value.
  • toMatch - tests a string against a regular expression.
  • toBeDefined / toBeUndefined - compares the value against 'undefined'.
  • toBeTruthy / toBeFalsy - tests the value for JavaScript's truthiness or falsiness.
  • toThrow / toThrowError - if the object is a function, expects it to throw an exception.
You can also negate the expectation by adding not between the expect and the matcher.
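
To make the difference between some of these matchers concrete, here is a sketch that uses toEqual, a negated toBe, toBeDefined and toThrowError in one suite. parseVersion is a hypothetical function written only for this example, and the typeof jasmine guard simply keeps the file loadable outside a Jasmine runner:

```javascript
// Hypothetical code under test: parses "major.minor.patch" into an object.
function parseVersion(text) {
  var match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) {
    throw new Error("Invalid version: " + text);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3])
  };
}

// Register the suite only when a Jasmine runner is present.
if (typeof jasmine !== "undefined") {
  describe("parseVersion", function () {
    it("returns an object that is equal, but not identical, to the expected one", function () {
      var expected = { major: 2, minor: 0, patch: 0 };
      expect(parseVersion("2.0.0")).toEqual(expected); // same content
      expect(parseVersion("2.0.0")).not.toBe(expected); // different object, so '===' fails
      expect(parseVersion("2.0.0").patch).toBeDefined();
    });

    it("rejects malformed input", function () {
      // toThrowError needs a function to call, not the result of calling it.
      expect(function () {
        parseVersion("two");
      }).toThrowError("Invalid version: two");
    });
  });
}
```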

Spies

You can also use Jasmine to test whether a function has been called. In addition, you can (actually, need to) define what happens when the function is called. The syntax looks like this:
spyOn(someObject, "functionName").and.callThrough();
spyOn(someObject, "functionName").and.returnValue(123);
spyOn(someObject, "functionName").and.callFake( ... alternative function implementation ... );
spyOn(someObject, "functionName").and.throwError("Error message");
spyOn(someObject, "functionName").and.stub();
Then, you can check (expect) if the function was called using:
expect(someObject.functionName).toHaveBeenCalled();
or
expect(someObject.functionName).toHaveBeenCalledWith(... comma separated list of parameters ...);
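
Put together, a spec that uses a spy might look like the sketch below. The mailer object and the notifyUser function are hypothetical and exist only for this example; the spy replaces mailer.send, so the real implementation never runs during the test. The typeof jasmine guard keeps the file loadable outside a Jasmine runner:

```javascript
// Hypothetical collaborator that we do not want to call for real in a test.
var mailer = {
  send: function (address, body) {
    throw new Error("would send real mail to " + address);
  }
};

// Hypothetical code under test.
function notifyUser(user, mailerService) {
  if (!user.email) {
    return false;
  }
  return mailerService.send(user.email, "Hello, " + user.name);
}

// Register the suite only when a Jasmine runner is present.
if (typeof jasmine !== "undefined") {
  describe("notifyUser", function () {
    it("sends a greeting to the user's address", function () {
      spyOn(mailer, "send").and.returnValue(true);

      var result = notifyUser({ name: "Ada", email: "ada@example.com" }, mailer);

      expect(result).toBe(true);
      expect(mailer.send).toHaveBeenCalledWith("ada@example.com", "Hello, Ada");
    });

    it("does not call the mailer when there is no address", function () {
      spyOn(mailer, "send").and.stub();

      notifyUser({ name: "Ada" }, mailer);

      expect(mailer.send).not.toHaveBeenCalled();
    });
  });
}
```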

More info

There are many features you can use with Jasmine. You can read all about it in the official documentation at http://jasmine.github.io/

3. Plug-ins

Well, I'll only talk about Blanket, the code coverage utility that integrates with Jasmine.

In the runner, add the following line before the test spec scripts, but after the application scripts:
<script type="text/javascript" src="lib/blanket.min.js" data-cover-adapter="lib/jasmine-blanket-2_0.js"></script>

and that's it!

Below the test results report there will be the code coverage report.

The blanket.js package can be found at http://blanketjs.org/ and the adapter for Jasmine 2.x can be found at https://gist.github.com/grossadamm/570e032a8b144ec251c1 (unfortunately, blanket.js only comes pre-packaged with an adapter for Jasmine 1.x).




Happy Coding!

Sunday, April 17, 2016

Profiling .NET performance issues


In this post I want to talk about a frustrating problem most developers will encounter at some point in their career: performance issues.
You write some code, you test and run it locally and it works fine, but once it is deployed, bad things start to happen.
It just refuses to give you the performance you expect...
Besides doing the obvious (which is calling the server some bad names), what else can you do?

In the latest case we encountered, one of our software engineers was working on merging the code from a few processes into a single process. We expected the performance to stay the same or improve (no need for inter-process communication), and in all of the local tests it did.

However, when deployed to production, things started getting weird:
At first the performance was great, but then it started deteriorating for no apparent reason.
CPU usage started to spike, and the total throughput dropped to about 25% below the original throughput.

The software engineer who was assigned to investigate the issue started by digging into the process performance indicators, using ELK.

Now, we are talking about a deployment of multiple processes per server and of multiple servers, so careful consideration should go into aggregating the data.
Here is a sample of some interesting charts:



Analyzing the results, we realized the problem happened on all of the servers intermittently.
We also realized that some inputs will cause the problem to be more serious than others.
We ran the ANTS profiling tool on a single process and fed it some "problematic" inputs. Surprisingly, the results were not very revealing:

a. There were no unexpected hotspots.
b. There were no memory leaks.
c. Generation 2 was not huge, but it held a lot of data: more than gen1 (but less than gen0).


Well this got us thinking, might our problem be GC related?
We now turned to the Perfmon tool.
Analyzing the "% Time in GC" metric revealed that some processes spent as much as 50% of their time doing GC.



Now the pieces started falling into place:
One of our original processes used to do some bookkeeping, holding some data in memory for a long duration. Another type of process was a typical worker: doing a lot of calculations using some byte arrays and then quickly dumping them.
When the two processes were merged, we ended up with a lot of data in gen2 and also with many garbage collection operations because of the byte arrays, and that resulted in a performance hit.

Well, once we knew what the problem was, we had to resolve it, but that is an entirely different blog post altogether...


Sunday, April 3, 2016

Challenges of learning ordinary concepts

In the last four years, convolutional neural networks (CNNs) have gained vast popularity in computer vision applications.

Basic systems can be built from off-the-shelf components, making it relatively easy to solve problems of detection ("what is the object appearing in the image?"), localization ("where in the image is a specific object?"), or a combination of both.


Above: Two images from the ILSVRC14 Challenge


Most systems capable of product-level accuracy are limited to a fixed set of predetermined concepts, and are also limited by the inherent assumption that a representative database of all possible appearances of the required concepts can be collected.
These two limitations should be considered when designing such a system, as concepts and physical objects used in everyday life may not fit them easily.

Even though CNN based systems that perform well are quite new, the fundamental questions outlined below relate to many other Computer Vision systems.

One consideration is that some objects may have different functionality (and hence a different name) even though they have the same appearance.


For example, the distinction between a teaspoon, tablespoon, serving spoon, and a statue of a spoon is related to their size and usage context. We should note that in such a case, the existence and definition of the correct system output depend heavily on the system's requirements.





In general, plastic art raises the philosophical question of what the shown object is (and hence what the required system output is). For example, is there a pipe shown in the image below?


When defining a system to recognize an object, another issue is the definition of the required object. Even for a simple everyday object, different definitions will result in different sets of concepts. For example, considering a tomato, one may ask which appearances of a tomato must be identified as a tomato.
Clearly, this is a tomato:

But what about the following? When does the tomato cease to be a tomato and become a sauce? Does it always turn into a sauce?

Since this kind of Machine Learning system learns from examples, different systems will behave differently. One may use all examples of all states of a tomato as one concept, whereas another may split it into different concepts (that is, whole tomato, half a tomato, rotten tomato, etc.). In both cases, a tomato that has a different appearance and is not included in any of the concepts (say, shredded tomato) will not be recognized.
Other everyday concepts have a functional meaning (i.e. they are defined by the question "what is it used for?") whereas the visual cues may be limited. For example, all of the objects below are belts. Except for the typical context (a possible location around the human body, below the hips) and/or function (can hold a garment), there is no typical visual shape. We may define different types of belts that interest us, but then we may need to handle cases of objects that are similar to two types yet distinctly belong to one.

Other concept definition considerations that should be addressed may be:
- Are we interested in the concept as an object, location, or both? As an object (on the left) it can be located in the image, whereas the question "where is the bus" is less meaningful for the right image.


These ambiguities are not always an obstacle. When the concepts have vague definitions or a smooth transition from one concept to the other, most system outputs may be considered satisfactory. For example, if an emotion detection system's output on the image below is "surprise", "fear" or "sadness", it is hard to argue that the output is wrong, no matter what the true feelings of the person were when the image was taken.


Written by Yohai Devir

Thursday, January 28, 2016

Logentries - Multi-platform Centralized Logging

Background:

Many times we develop websites that receive a request and perform multiple operations, including using other services which are not located under the same website.

The problem:

When we analyze the causes of errors and unexpected results, we need to track the flow from the moment the request is sent, through the different processing stages, up to the point where the response is received.
This requires inspecting the logs of the various system components.
We wanted centralized logging, where we could see all the different logs in one location.
The problem was that the system components run on different platforms and are written in different languages (Windows/Linux, JS/.Net/C++).
We already use Kibana for centralized logging of our in-house applications, but here we have a website that is accessed from all over the world, and logging data to Kibana would require exposing an external endpoint.

The solution:

We chose to use the Logentries.com site, where log data from any environment is centralized and stored in the cloud. One can then search and monitor it.
It is very easy to use and provides two ways to achieve this: either by integrating libraries directly into the application (e.g. JavaScript, .Net log4net Appender) or by adding an agent that listens to the application log file and sends it to the centralized logging site.
Logentries has a simple and user-friendly UI with some useful features such as Aggregated Live-Tail Search, Custom Tags, Context View, and Graphs.

This solution certainly meets our requirements.

Monday, January 18, 2016

.NET Compiler Platform "Roslyn"

At PicScout, we use .NET Compiler Platform, better known by its codename "Roslyn", 
an open source compiler and code analysis API for C#.

The compilers are available via the traditional command-line programs and also as APIs, which are available natively from within .NET code.

Roslyn exposes modules for syntactic (lexical) analysis of code, semantic analysis, dynamic compilation to CIL, and code emission.

Recently, I used Roslyn for updating hundreds of files to support an addition to our logging framework.
I needed to add a new member to each class that was using the logger and modify all the calls to the logger, of which there were several hundred.

We came up with two ways of handling this challenge.

After reading a class into an object representing it, one possible way of adding a member is to supply a syntactical definition of the new member, and then regenerate the source code for the class.
The problem with this approach was the relative difficulty of configuring a new member correctly.

Here is how it might look:

Generating the code:
private readonly OtherClass _b = new OtherClass(1, "abc");

Another option, which is more direct, was to simply get the properties of the class and use them.
For example, we know where the class definition ends and we can append a new line containing the member definition.

Here is how it looks:

Get class details:





Insert the new line (new member):



After that, replacing the calls to the new logger is a simple matter of search and replace.


Wednesday, January 6, 2016

PhantomJS

The Problem:
Given a website and an image on that website, the task is to take a screen capture of that image as it appears on the page.

Solution 1:
Manually: enter the website, find the given image (scroll if needed), take the screenshot and save it to the disk.
But what can you do when you have thousands of screenshots to take per hour?
You can employ hundreds of people to handle this scale, but...
This is kind of expensive and what should you do if your scale increases or decreases?

Solution 2:
Automate it: if only we could write a piece of software that could do exactly what we need...
So what do we actually need? Something that can:
1) Imitate a browser
2) Find an image on a webpage
3) Take the screenshot

Let me introduce PhantomJS:
PhantomJS is commonly known as a headless WebKit with a JavaScript API.
Headless refers to the fact that the program can be run from the command line without a window system.
JavaScript API means that we can easily write scripts that interact with PhantomJS, which is useful if one needs to find an image on a webpage, for instance.
WebKit is the open-source web browsing engine that powers Safari, and that Chrome used until it moved to its own fork, Blink, in 2013.

How to use it?

There are many ways to use PhantomJS. Here at PicScout we use Selenium WebDriver to run it; Selenium can control PhantomJS in the same way that it controls any other browser.

How does it help me to take a screen capture?

As we said before, PhantomJS can run JavaScript, so all we have left to do is write a short script that searches for the image location on the page and let PhantomJS run it.
After receiving the location, we can use PhantomJS to take the screen capture. Although PhantomJS is a headless browser, it still renders the web page just like any other browser controlled through WebDriver.
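
As an illustration of the "find the image location" step, here is a minimal sketch of a function that could be run inside the page. It is not the script we use in production: the function name, its parameters and the exact-match rule on src are assumptions made for this example. The document and window are passed in as arguments so that the function can also be tried outside a browser:

```javascript
// Returns the page-relative rectangle of the first <img> whose src equals
// imageUrl, or null when no such image exists on the page.
function findImageRect(doc, win, imageUrl) {
  var images = doc.getElementsByTagName("img");
  for (var i = 0; i < images.length; i++) {
    if (images[i].src === imageUrl) {
      // getBoundingClientRect is relative to the viewport, so add the
      // current scroll offsets to get coordinates within the whole page.
      var rect = images[i].getBoundingClientRect();
      return {
        left: rect.left + win.pageXOffset,
        top: rect.top + win.pageYOffset,
        width: rect.width,
        height: rect.height
      };
    }
  }
  return null;
}
```

Inside the page this would be called as findImageRect(document, window, url), for example through WebDriver's script-execution facility, and the returned rectangle can then be used to crop the full-page screenshot down to the image.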


Code sample:

Running several instances of PhantomJS – problems and solutions

Problem #1: zombie processes. In our app we create and kill PhantomJS processes, and we noticed that after some time there are many zombie instances of PhantomJS.

Solution #1: for unknown reasons, we are occasionally unable to create a new PhantomJS instance: an exception is thrown, yet a new PhantomJS process has already started. In that case we need to find the process ID ourselves and kill it.

Problem #2: low success rate on high CPU usage - when CPU reached 100% we were receiving a lot of errors from PhantomJS.

Solution #2: the number of PhantomJS instances should be set according to the available computing power. Note that most of the time PhantomJS won't consume much CPU, but there are websites for which this isn't the case. You should take this into consideration when you decide how many PhantomJS processes to run.

Problem #3: sometimes the screen capture fails without any apparent reason.

Solution #3: we were able to increase the screen capture success rate by using a retry mechanism.
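
A retry mechanism can be as simple as the sketch below. This is only one possible shape, not necessarily the one we run in production: the retry helper, its parameters and the captureScreenshot name in the usage comment are assumptions made for this example.

```javascript
// Calls action() until it succeeds or maxAttempts attempts have been made.
// action is expected to throw on failure and to return its result on success.
function retry(action, maxAttempts) {
  var lastError = new Error("maxAttempts must be at least 1");
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return action(attempt);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError; // every attempt failed
}

// Hypothetical usage, where captureScreenshot stands in for the real capture call:
// var image = retry(function () { return captureScreenshot(url, rect); }, 3);
```

Keeping the attempt limit small matters here, since every retry costs another page render on a machine whose CPU is already a bottleneck (see Problem #2).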