Excel reporting from web apps using EPPlus

I have recently been working on a new feature request for the ability to generate excel reports dynamically from a web application. This kind of feature can add a lot of value to a system as it enables businesses to analyse crucial trends in data on demand.

However it can be challenging to generate reports in an Excel spreadsheet format from a web application. Firstly you need to have the program itself installed on the web server, which can be difficult if you are using third party hosting. Also coding the solution itself is difficult as you either need intimate knowledge of the Excel API or, for versions of office after 2007, in depth knowledge of the OpenXML file format. But fear not there is a third option, open source to the rescue! Some clever chaps have developed the EPPlus library which abstracts away all the nasty low level OpenXML calls and provides a nice object oriented API for dealing with directly the spreadsheet.

Modelling the report data

I won’t go into too much detail on the features of the library as you can read all about it on the EPPlus FAQ page and download the samples. However one of the methods that I found really useful was the LoadFromArrays method. This enables a table of data to be output to the worksheet from a list of object arrays. So you can construct a model of the data you want to display then feed it straight to the worksheet through the list. It may sound quite complicated but here is a simple example that can be easily expanded with more columns / rows:

string firstName = "Joe";
string lastName = "Bloggs";
var dateOfBirth = new DateTime(2000, 1, 1);
var testData = new List<object[]>()
                   {
                       new object[] {"First name", firstName},
                       new object[] {"Last name", lastName},
                       new object[] {"Date of birth", dateOfBirth}
                   };

using (var excelPackage = new ExcelPackage())
{
    ExcelWorksheet excelWorksheet = excelPackage.Workbook.Worksheets.Add("Test worksheet");
    //Loads the summary data into the sheet, starting from cell A1. Prints the column names on row 1
    excelWorksheet.Cells["A1"].LoadFromArrays(testData);
}

Don’t get duped

I generally found that the EPPlus library was reliable, however I did hit a snag during the development. The file would get generated OK but when I tried to open it in Excel a nasty message was shown:

Excel found unreadable content in ‘Test export.xlsx’. Do you want to recover the contents of this workbook?

The file did open OK if you chose to recover, however this obviously isn’t ideal and not acceptable for a client release. The tricky part was that no error was being thrown so I had to use a bit of guess work to track down the issue. Initially there wasn’t much to go on, however after running some more tests on sample data I noticed that the export always worked fine if the file only included a single worksheet. However sometimes multiple worksheets would cause the ‘unreadable content’ error. From there I examined each of the properties that were being set on the worksheets and narrowed the problem down to be the worksheet name itself.

It turns out that excel requires each worksheet to have a unique name. This is backed up in excel itself if you try and manually name to worksheets the same you get a reasonably friendly error message saying:

Cannot rename a sheet to the same name as another sheet

After ensuring my worksheets were uniquely named the reports always opened up without a problem. As an aside I just tried to replicate this bug through EPPlus and this time it returned a much more helpful error message, I wish it had told me this when I was originally working on the feature!

Add worksheet Error: attempting to create worksheet with duplicate name

EPPlus for the win

In conclusion I would recommend EPPlus to anybody who needs to generate an excel report from C# code and doesn’t want to interface with the murky depths of the Excel API. It is a great open source library that does just what you need in the way you would expect it to work.

Quartz makes scheduling jobs easy with EPiServer Commerce

Quartz rockA common requirement for e-commerce applications is the ability to schedule jobs or tasks. I am currently working on a project which uses EPiServer Commerce and I needed to schedule an order export job.

I was pleasantly surprised to find that EPiServer Commerce is supplied bundled with the Quartz.NET enterprise code library to facilitate these job scheduling requirements.

Gentlemen, choose your weapons

I had three main options available for implementing the export job, a separate windows service, an EPiServer CMS scheduled job or a Quartz job. I decided that the Quartz service was better than the option of a separate windows service because the framework was already there and available so would not increase the complexity of deployments. I also felt it was a better option than a EPiServer CMS scheduled job because Quartz runs independently to the website app pool so should not put the web application under greater load while it is running and potentially degrade the performance of the end user website.

A 1000 foot example

I’m not going to go into a massively detailed explanation of how to implement the jobs as there is a already a good example available in the EPiServer Commerce dev guide along with a detailed tutorial on the Quartz .NET site. However a quick overview demonstrates how intuitive it is to implement.

The first thing you need to do is create your job class which will run when Quartz triggers the execution. The class must inherit from the IJob interface which just has a single Execute method to implement. Here is a sample skeleton class:

    public class CommerceOrderExportJob : IStatefulJob
    {
        private readonly ILog _log;

        public CommerceOrderExportJob()
        {
            _log = LogManager.GetLogger(base.GetType());
            //Any other init code
        }

        public void Execute(JobExecutionContext context)
        {
            //Your export methods
        }
    }

You then need to extend the config file to configure Quartz to call your job. I think all the parameters are pretty self explanatory:

<job>
    <job-detail>
      <name>OrderExportJob</name>
      <group>all</group>
      <description>This job exports orders from commerce</description>
      <job-type>Your.Assembly.CommerceOrderExportJob, Your.Assembly</job-type>
      <volatile>false</volatile>
      <durable>true</durable>
      <recover>false</recover>
    </job-detail>
    <trigger>
      <simple>
        <name>OrderExportJobTrigger</name>
        <group>eCommerceFramework</group>
        <description>Fires Order Export Job</description>
        <misfire-instruction>SmartPolicy</misfire-instruction>
        <volatile>false</volatile>
        <job-name>OrderExportJob</job-name>
        <job-group>eCommerceFramework</job-group>
        <repeat-count>RepeatIndefinitely</repeat-count>
        <repeat-interval>300000</repeat-interval>
      </simple>
    </trigger>
  </job>

Hit run on the service and check the event log for any problems. Make sure to include some useful event/error logging in the job otherwise it will just be a mysterious black box! There are a couple of snags you may run into when running the job against EPiServer Commerce:

  • ensure that your compiled assembly is actually available in the Quartz service directory, this is different from the main commerce site bin folder so it is not enough just to include it in the commerce app references
  • Make sure the data context has been initialised and the config file in the Quartz service directory is pointing to your database

Mining for more gems

Ultimately only time will tell if Quartz was the best choice but I feel it has the best chance of high reliability and performance in the long term. I hope that, where appropriate, EPiServer can continue to expand the incorporation of proven open source libraries with their product offerings.

Behaviour Driven Design encourages shared understanding

Most recent projects I have worked on have used unit tests to define the requirements of our code and ensure they are met. Typically the tests are written by the developer while they are coding the solution. However this means that the tests are based entirely on the developers understanding of the requirements, which may not always be the same as the stakeholders understanding.

To combat this issue we are introducing behaviour driven development, to agree on a set of requirements for each feature with the stakeholders before starting development. The requirements are captured using the Gherkin syntax (Given, When, Then) based on real world examples. These examples can then be wired up to the system directly as automated tests using Specflow.

OK, show me the example already

Here is a fictionalised scenario which demonstrates the value that BDD adds to a project. An abbreviated version of the initial scenarios drafted by the developer are as follows:

Feature: Process 3D secure authentication response
    As a logged in user
    I would like to place a 3D secure authenticated order
    so that I have assurance of the websites security

Scenario: Handle a 3D secure response of AUTHORISED
    Given a 3D secure response of AUTHORISED
    When i view the order confirmation page
    Then the order status will be Complete

Scenario: Handle a 3D secure response of NOT AUTHORISED
    Given a 3D secure response of NOT AUTHORISED
    When i view the order confirmation page
    Then the order status will be Cancelled

Scenario: Handle a 3D response of NOT PROCESSED
    Given a 3D secure response of NOT PROCESSED
    When i view the order confirmation page
    Then the order status will be Cancelled

Evolving the scenario

The assumptions shown in these scenarios are perfectly reasonable, an order will only be completed if the 3D secure security check returns an authorised response. However it should really be a decision for the stakeholder’s business whether the system should behave in this way. There are several valid scenarios where all 3D responses should actually result in a completed order, for example:

Scenario: Handle a 3D secure response of NOT AUTHORISED
Given a 3D secure response of NOT AUTHORISED
And a total order value order value under the following amount: £25
When i view the order confirmation page
Then the order status will be Complete

The stakeholders calculated that the risk of charge-backs was so low on orders under £25 that they could disregard the 3D secure response completely and allow ‘not authorised’ transactions to be completed.

Here we can see a clear benefit in the stakeholder sharing their understanding of the system requirements with the implementation team. Rules which are beneficial to their business can be incorporated into the system, resulting in a more effective solution from the outset.

If you are interested in reading more about using BDD for web applications I can recommend Steven Sanderson’s excellent blog post.

Discovering the original inventors of the internet

I’m sure many of you have heard about a chap called Tim Berners Lee who is widely credited for inventing the world wide web as we know it today. The story is well known and I’m not going to re-hash it here, however less is known about who may have come up with the first concept of the Internet. You may be surprised to hear that the origins of the concept date back several centuries and take place on an entirely different continent, among a race of people commonly considered quite primitive and barbaric. Here’s a little hint of who these people may have been.

Machu Picchu

The internet laid bare

You may well ask how the internet could exist centuries before computers had been invented or even the discovery of electricity. Naturally there was no way the internet as we know it today could be possible, however if we look at the abstract concept there are strong parallels. At it’s heart the internet is a communications system made up of a group of connected devices. The power of the internet lies in its ability to efficiently send messages through the network from one node to another. When you open your web browser and type in http://www.google.co.uk your computer sends a message to the web server hosting google.co.uk asking for the web page. However this message is not sent directly from your computer to the google web server, this would be very inefficient as each person that wanted to view the google web page would first need to establish a connection to their web server. As more people tried to connect to the same page the response would get slower and slower until the number of connections reached its limit.

Routing to the rescue

The internet has an elegant solution to this problem. Instead of attempting to establish a direct connection between the two end points, each of the nodes in the network just connects to their nearest nodes. So the end points are indirectly connected through the network in a similar way to this lovely node diagram:

Network nodes

Network nodes

Each message is sent with the address of its final destination, each node only needs to know what direction to send the message in order to get it closer to its destination. In this way the message is naturally routed on the most efficient path from its source to the destination. It is this concept if a nodal network and message routing that has strong parallels with a very old method of communications.

An ancient Inca network

I recently took a trip to Peru to hike the Inca trail to Machu Picchu. So this post is a (hopefully interesting) way to shoehorn some of my holiday snaps onto this blog!

Machu Picchu, Peru

However quite apart from that I learnt a great deal about the fascinating Inca culture. Before the trip I had thought that there was essentially one Inca trail in Peru leading to Machu Picchu, how wrong I was! The Inca trails actually comprised of thousands of miles of pathways connected across the whole of the Inca empire. They were the key to enabling the Inca rulers establish and run the largest single empire in South America.

One of the main uses of the trails was as a communications network. There were small guard posts dotted across the network, often high up in remote mountain areas. If the Inca ruler wanted to send a message they would encode it using a ‘quipus‘, a device made up of multi-coloured plied rope, threads and knots.   The exact method of encoding has unfortunately been lost in time but it may well of been a binary system quite similar to one used by computers today. It would have been quite possible to encode fairly sophisticated messages in this way.

An Inca Quipus

The ruler would then pass the quipus to a messenger to deliver the message. However the messenger would only take the message to the nearest checkpoint, then pass it onto another messenger who would carry the message to the next checkpoint. Each checkpoint would be around 5-10 KM apart, more distance for flat sections and less for mountainous terrain. In this way each of the messengers could run between the checkpoints as fast as possible and the message itself would reach its destination far quicker than would otherwise be possible.

To give you some indication of how efficient this system was during Inca times the capital city was Cuzco, situated at around 3,300 metres in the middle of the Andes mountains, several hundred miles from the coast. However the Inca king was able to eat fresh fish in the capital because it was sent over the Inca trail network. It is estimated that the same delivery made by road transport today takes longer than an equivalent delivery made over the Inca network in the 15th century!

So now all the pieces of the puzzle are in place we can see that the Inca trail network used the some communication method as today’s internet. The trail checkpoints are equivalent to an internet router, the messengers have now been replaced with fibre optic cables and of course the quipus is directly equivalent to an IP network packet. So there you have it, the Inca empire invented the alpha version of the internet.

The beauty of cherry pick

I’m loving the cherry pick command in git right now, this is the scenario that it really helps me out for.

I’ve fixed a bug on the live branch, however before it can be published to the live website it needs to be tested. So I can first publish it to our UAT (user acceptance testing) site so the client can test the changes. However the UAT site already has a bunch of other unreleased features on it. If I merge my bug fix branch into the UAT branch it will overwrite some of the other feature changes.

The solution here is to ‘cherry pick’ just my changes from my bug fix branch and commit those to the UAT branch. In this way git doesn’t merge all the other differences between the live and UAT branch.

Nhibernate generator class for SQLite databases

I’ve been developing a mobile app for Android which needs to query data in an SQLite database. So the first task I needed to get done was to import data into the SQLite database. As I’m more used to writing .NET applications which use the NHibernate ORM I decided to write an import program on this technology stack.

For my first attempt at the import program I just wanted to focus on the import logic, so choose to use a ‘known working’ NHibernate configuration with a MS SQL 2005 database. While this worked fine for me it did mean that I had to manually copy all of the data from the MS SQL database into the SQLite database before it could be used in the Android app. This was quite laborious so when I needed to modify the importer at a later date I decided to change the NHibernate configuration to point directly to the SQLite database instead. This was the updated config file used:

<hibernate-configuration  xmlns="urn:nhibernate-configuration-2.2" >
    <session-factory>
        <property name="connection.provider">NHibernate.Connection.DriverConnectionProvider</property>
        <property name="connection.driver_class">NHibernate.Driver.SQLiteDriver</property>
        <property name="connection.connection_string_name">FootyLinks</property>
        <property name="dialect">NHibernate.Dialect.SQLiteDialect</property>
        <property name="query.substitutions">true=1;false=0</property>
    </session-factory>
</hibernate-configuration>

And this is an example of the connection string format:

<configuration>
    <connectionStrings>
        <add name="FootyLinks" connectionString="Data Source=C:\path_to_the_SQLite_database_file;Version=3"/>
    </connectionStrings>
</configuration>

This configuration worked OK and the import program could connect to the SQLite database, however I hit a problem when trying to save new records to the database. This was the inner exception reported by NHibernate:

Abort due to constraint violation Club.Id may not be NULL

The problem was that the import program was using the default generator class of identity for in the NHibernate entity mappings. So NHibernate was expecting the database to generate the identity keys for the new records being saved to the database, however SQLite does not appear to support this. After some reading up on the options available for the NHibernate generator setting I decided to try the native setting. As this should select the implementation supported by the database. However this didn’t work either, I think because, of all the options native can select, none are actually supported by SQLite. So my next preferred selection was to use increment, this setting comes with a disclaimer that it should not be used in a cluster, which is absolutely fine as its very unlikely that I will ever be deploy the importer to a load balanced environment.

The increment setting worked fine with the SQLite database and should streamline my workflow nicely for updating the app data .

FootyLinks roadmap

  1. Make lowest difficulty level easier (squad numbers?, exclude older players?)
  2. Correct answer page
  3. Display score in game
  4. Increment score on correct answer
  5. Fix occasional crashing on view changes (db init to game startup event?)
  6. Only allow user to select roles available at their level
  7. Release one
  8. Level up on score thresholds
  9. Change game rules depending on difficulty level
  10. Post scores to twitter
  11. Update data with latest transfers
  12. Release two