Brosteins

Developers, Technology Evangelists, Bros.

Generating Test Data in Entity Framework

I will be the first to admit that I have a love-hate relationship with Entity Framework.  One thing I do love is that it provides a seed method that it can call for you automatically after it runs migrations.  When implemented, this method can be used to create test data, and that is how I use it in many of the Entity Framework solutions I work on.

A simple way to generate test data in Entity Framework is to set up a static class that returns some objects, add them to the appropriate set in your context during the seed method, and save the records.

Here’s what this scenario might look like in code:

public static class ProductFactory
{
    public static Product[] All()
    {
        return new Product[]
        {
            new Product() { Name = "Pipe Bender" },
            new Product() { Name = "Blender" },
            new Product() { Name = "Game Console" },
            new Product() { Name = "Frying Pan" },
            new Product() { Name = "TV" },
            new Product() { Name = "Ice Maker" },
            new Product() { Name = "Bookcase" }
        };
    }
}

And the corresponding seed method may look like this:

context.AddRange(ProductFactory.All());
context.SaveChanges();

Pretty simple, right?  Yup…but…(there’s always a but, isn’t there!)  As our application grows and we add more data models, the overall complexity of the model grows with it and we begin to create lots of dependencies between our data.  Pretty soon our seed method starts to look like this:

context.AddRange(ProductFactory.All());
context.AddRange(OrderFactory.All());
context.AddRange(SomeFactory.All());
context.AddRange(AnotherFactory.All());
context.AddRange(ShouldThisBeCalledFirst.All());
context.AddRange(ThisDependsOnALotOfThings.All());
context.AddRange(AmIDoingThisRight.All());

OK, maybe I got a little carried away there, but hopefully you get the point.  Every time we add a model we need to make sure its factory is called in the correct order.  What if we update an old model?  We may have to reorganize everything!  Let’s take a look at something I came up with for generating test data in Entity Framework that is simple and much more maintainable in the long run.

The Easy Way

For this example I set up a simple data model of Companies, Products, Orders, and Users.  I have also created an interface called ITestDataFactory<>, which forces all of our test data factories to implement an All method.

public interface ITestDataFactory<T> where T : IEntity
{
    T[] All();
}

public class OrderTestDataFactory : ITestDataFactory<Order>
{
    private Order[] _orders;
    private readonly DbContext _context;

    public OrderTestDataFactory(DbContext context)
    {
        _context = context;
    }

    // Generate the orders once and hand back the cached array on later calls.
    public Order[] All()
    {
        return _orders ?? (_orders = Generate());
    }

    private Order[] Generate()
    {
        // Load both sets up front so no query runs while another is being enumerated.
        var companies = _context.Set<Company>().ToList();
        var products = _context.Set<Product>().ToList();

        // Guid.NewGuid() gives each order a unique number; new Guid() is always all zeros.
        return companies
            .Select(c => new Order() { Company = c, Number = $"{c.Name}-{Guid.NewGuid()}", Products = products.ToList() })
            .ToArray();
    }
}

The important thing to note here is that Companies and Products are both created and stored in my database before my Order factory is called.  This is done by my TestDataGenerator class, which has one public method called Generate that automatically calls the All method of each factory in the correct order and stores the data.  I’ve even included C# 6 string interpolation in my example.  If you haven’t had a chance to look at it yet, head over to Mike’s post on it to learn more: https://brosteins.com/2015/08/11/string-interpolation-in-c-6/

public void Generate()
{
    // Find every non-interface type in this assembly that implements ITestDataFactory<>.
    var factories = Assembly.GetAssembly(typeof(ITestDataFactory<>))
        .GetTypes()
        .Where(t => !t.IsInterface && t.GetInterfaces().Any(i => i.Name == typeof(ITestDataFactory<>).Name))
        .ToList();

    // Keep the greatest depth recorded for each entity type and visit the deepest types first.
    var maxDepthList = _entityList
        .GroupBy(e => e.Item1)
        .Select(g => new { Type = g.Key, Depth = g.Max(m => m.Item2) })
        .OrderByDescending(o => o.Depth)
        .Select(i => i.Type);

    foreach (var t in maxDepthList)
    {
        // Skip entity types that have no factory or whose set already contains rows.
        var factory = factories.FirstOrDefault(f => f.GetInterfaces()[0].GetGenericArguments()[0] == t);
        if (factory == null || _context.Set(t).ToListAsync(CancellationToken.None).Result.Count != 0)
        {
            continue;
        }

        // Build the factory through its (DbContext) constructor and call All() by reflection.
        var factoryObject = factory.GetConstructor(new Type[] { typeof(DbContext) }).Invoke(new object[] { _context });
        var result = factory.GetMethod("All").Invoke(factoryObject, null);

        _context.Set(t).AddRange((object[]) result);
        _context.SaveChanges();
    }
}

And here is how we use it:

protected override void Seed(WebDbContext context)
{
    // Reuse the context that is passed in rather than creating a second one that is never disposed.
    var generator = new TestDataGenerator(context);

    generator.Generate();
}
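The factory lookup inside Generate matches interfaces by name and position. As a rough sketch of the same idea, the lookup can also compare generic type definitions, which keeps working when a factory implements more than one interface. Everything below is an illustration, not code from the project: FactoryLocator, CompanyTestDataFactory, and the sample company names are made up, and the factory uses a parameterless constructor (instead of the DbContext constructor used in this post) so the example runs without Entity Framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public interface IEntity { }
public interface ITestDataFactory<T> where T : IEntity { T[] All(); }

// Hypothetical model and factory, used only to exercise the lookup.
public class Company : IEntity { public string Name { get; set; } }

public class CompanyTestDataFactory : ITestDataFactory<Company>
{
    public Company[] All()
    {
        return new[] { new Company { Name = "Contoso" }, new Company { Name = "Fabrikam" } };
    }
}

public static class FactoryLocator
{
    // Maps each entity type to the concrete factory type that produces it.
    public static Dictionary<Type, Type> Find(Assembly assembly)
    {
        var map = new Dictionary<Type, Type>();
        foreach (var type in assembly.GetTypes().Where(t => t.IsClass && !t.IsAbstract))
        {
            foreach (var contract in type.GetInterfaces())
            {
                // Compare generic type definitions rather than interface names or positions.
                if (contract.IsGenericType && contract.GetGenericTypeDefinition() == typeof(ITestDataFactory<>))
                {
                    map[contract.GetGenericArguments()[0]] = type;
                }
            }
        }
        return map;
    }

    // Creates the factory with its parameterless constructor and calls All() by reflection.
    public static object[] Create(Type factoryType)
    {
        var factory = Activator.CreateInstance(factoryType);
        return (object[]) factoryType.GetMethod("All").Invoke(factory, null);
    }
}
```

Calling FactoryLocator.Find(typeof(CompanyTestDataFactory).Assembly) returns a dictionary keyed by entity type, and FactoryLocator.Create(map[typeof(Company)]) returns the generated objects ready to be added to a set.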

What is it Doing?

When we instantiate the TestDataGenerator, it scans our assembly for anything that implements IEntity (this could be something different in your project).  Once all of our models are found, a dependency tree of sorts is built.  When the Generate method is called, we scan our assembly again, this time for anything that implements ITestDataFactory<> (again, this can be something different in your project), and call these factories in the appropriate order (based on the dependencies found previously).
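To make the "dependency tree of sorts" more concrete, here is a minimal sketch of one way such a list of (type, depth) pairs could be built. It is an assumption about the approach, not the code from the repository: DependencyDepth, Walk, and CreationOrder are hypothetical names, and the models are simplified stand-ins. Each entity starts at depth 0, every entity it references is recorded one level deeper, and a set of types on the current path stops the walk when two models refer to each other.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IEntity { }

// Hypothetical models: an Order references a Company and a list of Products.
public class Company : IEntity { public string Name { get; set; } }
public class Product : IEntity { public string Name { get; set; } }
public class Order : IEntity
{
    public Company Company { get; set; }
    public List<Product> Products { get; set; }
}

public static class DependencyDepth
{
    // Returns the entity type a property points at, or null if it does not reference an entity.
    private static Type ReferencedEntity(Type propertyType)
    {
        if (typeof(IEntity).IsAssignableFrom(propertyType)) return propertyType;
        if (propertyType.IsGenericType)
        {
            var argument = propertyType.GetGenericArguments()[0];
            if (typeof(IEntity).IsAssignableFrom(argument)) return argument;
        }
        return null;
    }

    // Records (type, depth) for the type and, one level deeper, for everything it references.
    // The path set stops the recursion when a type refers back to one of its ancestors.
    private static void Walk(Type type, int depth, HashSet<Type> path, List<Tuple<Type, int>> found)
    {
        if (!path.Add(type)) return;
        found.Add(Tuple.Create(type, depth));
        foreach (var property in type.GetProperties())
        {
            var target = ReferencedEntity(property.PropertyType);
            if (target != null) Walk(target, depth + 1, path, found);
        }
        path.Remove(type);
    }

    // Orders the types by their greatest recorded depth, deepest first, then by name.
    public static List<Type> CreationOrder(IEnumerable<Type> entityTypes)
    {
        var found = new List<Tuple<Type, int>>();
        foreach (var type in entityTypes) Walk(type, 0, new HashSet<Type>(), found);
        return found
            .GroupBy(e => e.Item1)
            .Select(g => new { Type = g.Key, Depth = g.Max(m => m.Item2) })
            .OrderByDescending(o => o.Depth)
            .ThenBy(o => o.Type.Name)
            .Select(o => o.Type)
            .ToList();
    }
}
```

With these three models, DependencyDepth.CreationOrder(new[] { typeof(Order), typeof(Company), typeof(Product) }) puts Company and Product ahead of Order, because Order references both. When two models reference each other, the path check only prevents endless recursion; the resulting order between those two types is then not meaningful and would need to be decided another way.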

That’s it, nothing else to it.  If you’d like to take a look at the full source code, it is over on GitHub. Go ahead and try adding some additional models and factories yourself!

6 comments for “Generating Test Data in Entity Framework”

  1. Avatar
    January 18, 2016 at 8:27 pm

    Hi Brosteins – I was reading your site, and thought you might also like to know – I also blog with my brother – Dave – on our blog site about technology and interesting things. I’m a .NET developer, and he works in the games industry – we’re both in the UK.

    Dave’s an artist, and only really got into coding in the last few years. He writes code now for a UK based games company and has written some .NET stuff before (interfacing to 3DS Max). I’ve been writing code since 1982 but only ‘commercially’ since 2000 or so – our latest project was using continuous integration (jenkins) to build an animated film – the idea is that you take a large film, split it into scenes. Each animator checks in their scene. The CI system integrates their video clip into the whole film.

    Anyway – ramble over : I was wondering :

    1. Would Autofixture[https://github.com/AutoFixture] (or something similar) be useful for creating test data (possibly at volume) within Entity Framework? It potentially allows you to spin up thousands of records and might be useful for load testing, performance monitoring etc.

    2. Excluding Test data from Production – If I change my model, and I create a load of test data to confirm that the performance is good – how do I deploy that migration to production without pushing a load of test data to the DB? I’m thinking some form of debug flag around the seed method – or is there a better way?

    Thanks

    • Mike Branstein
      January 19, 2016 at 8:14 am

      Thanks, Mike. I’ll let Nick respond. I’m sure he’ll have some good ideas.

    • Nick Branstein
      January 19, 2016 at 9:03 am

      Mike – thanks for reading our blog. Sounds like you and your brother have done some really awesome stuff!

      1. In my brief 10 minutes in looking at Autofixture it looks like it can definitely be useful for creating test data. Part of our job as developers is picking the right tool for the job so my favorite consultant answer to your question is really it depends. There are a lot of other tools and nuget packages that will do this sort of thing as well. If you’re looking for load testing and performance monitoring I would definitely look at using a tool such as Autofixture. The thing that a lot of these tools don’t necessarily give you is contextual data that would make sense to an end user of the system. i.e. meaningful first names, address, etc. If you’re looking to setup some simple test data that a QA team could use while testing then you could go about it in a similar fashion to how I outlined in this post.

      2. That is the same way I have done it as well. At the end of day you could create some over abstracted way of handling this or litter your web.config with additional configuration variables etc. but isn’t it just easier and more simple to wrap it in a preprocessor directive? I like to keep things simple and I think that helps to keep things simple.

  2. Avatar
    June 3, 2016 at 5:45 am

    Hi Nick,

    If I try implementing this with project that has little complex entities. It throws this error

    An unhandled exception of type ‘System.StackOverflowException’ occurred in mscorlib.dll

    at line –

    if (!type.GetProperties()
    .Any(p =>
    typeof(IEntity).IsAssignableFrom(p.PropertyType) ||
    (p.PropertyType.IsGenericType &&
    (typeof(IEntity).IsAssignableFrom(p.PropertyType.GetGenericArguments()[0]))) &&
    _entityList.Select(t => t.Item1).All(pType => pType != type)))

    Not sure what’s wrong..

  3. Keith
    July 16, 2016 at 8:48 pm

    What is _entityList? I don’t think this code can be implemented without knowing what that is.
