I had a really interesting conversation on Facebook today with a friend I knew from school. We are both professional developers now, and we occasionally exchange ideas on software development. I think this conversation is worth documenting because of how naturally the ideas progress: it turns a lot of very complicated ideas into a simple narrative.
Edgar: What’s your opinion on this hilarious discussion?
Me: I’ve actually seen this. We use both MongoDB and MySQL at work, and there is a religious war between different teams here.
Edgar: I think MySQL is better in situations where you need reliability above all else.
Me: You can get reliability with MongoDB; you just don’t have transactions across documents. Relational databases like MySQL support transactions that span multiple rows and tables, while MongoDB only supports transactions within a single document. That difference matters when you design your application.
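Here is a minimal sketch of what a multi-table transaction buys you, using Python’s built-in sqlite3 module as a stand-in for MySQL. The schema and the place_order helper are hypothetical. The order row and the stock update either both commit or both roll back:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER NOT NULL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL, qty INTEGER NOT NULL);
        INSERT INTO inventory VALUES ('widget', 5);
        """
    )
    return conn


def place_order(conn: sqlite3.Connection, item: str, qty: int) -> None:
    """Record an order and decrement stock as one transaction spanning two tables."""
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if it raises, so neither write survives a failure.
    with conn:
        conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
        conn.execute("UPDATE inventory SET stock = stock - ? WHERE item = ?", (qty, item))
        (stock,) = conn.execute("SELECT stock FROM inventory WHERE item = ?", (item,)).fetchone()
        if stock < 0:
            raise ValueError(f"insufficient stock for {item!r}")


conn = make_db()
place_order(conn, "widget", 3)       # succeeds: stock 5 -> 2, one order row
try:
    place_order(conn, "widget", 10)  # fails: the order insert is rolled back too
except ValueError:
    pass
print(conn.execute("SELECT stock FROM inventory").fetchone())  # (2,)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone())  # (1,)
```

In a document store limited to single-document atomicity, the usual workaround is to model the data so that anything that must change together lives in the same document.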
Me: The problem is that people want to use their pet technologies with MySQL or MongoDB. If their pet technology doesn’t support one or the other, then from their point of view that database is automatically bad.
Edgar: You sound biased. hehe
Edgar: Wouldn’t it be beneficial to simply keep everything in one massive tree?
Me: No. Saying you must always use a database amounts to saying you must always use a b-tree, because internally that is what most database engines use to persist information. Imagine if someone told you that your application code could only ever store information in a b-tree. How painful would that code be?
Me: Data structures and algorithms are interconnected, which means the way you access information and the way you store it depend on each other.
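To make that concrete: if your main question is "which events happened between time A and time B", you want keys kept in sorted order, which is what a b-tree index provides; if you only ever look things up by exact key, a hash table is simpler. A toy Python sketch, where the sorted list is only a stand-in for a b-tree and the names are hypothetical:

```python
import bisect


class SortedIndex:
    """Keys kept in sorted order: a toy stand-in for a b-tree index, not how any real engine is built."""

    def __init__(self) -> None:
        self._keys: list = []
        self._values: list = []

    def insert(self, key, value) -> None:
        # Finding the slot is O(log n); the list insert itself is O(n),
        # which a real b-tree avoids by splitting fixed-size nodes.
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, value)

    def range(self, lo, hi) -> list:
        """Values whose keys fall in [lo, hi), located with two binary searches."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return self._values[start:end]


def range_from_hash(table: dict, lo, hi) -> list:
    """A hash table has no key order, so the same range query must visit every entry."""
    return [table[k] for k in sorted(table) if lo <= k < hi]


events = {30: "logout", 10: "login", 20: "post"}
idx = SortedIndex()
for ts, name in events.items():
    idx.insert(ts, name)
print(idx.range(10, 30))                # ['login', 'post']
print(range_from_hash(events, 10, 30))  # ['login', 'post']
```

Both return the same answer; the difference is how much work each structure does to get it, which is exactly the access-versus-storage trade-off.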
Me: The reason we used databases is that we used to store information on rotating disk platters. That made secondary storage very slow, because every access paid for seek time and rotational latency on top of the actual transfer time. So we looked for ways to structure our data that minimized seeks and rotational delays, and database engines were built around those structures.
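To put rough numbers on the rotating-disk problem, here is the arithmetic for a drive spinning at an assumed 7,200 RPM, a speed chosen only for illustration. On average the head waits half a revolution for the right sector to come around, and every random read pays the seek and rotation terms before any data moves:

```latex
t_{\text{rev}} = \frac{60\ \text{s}}{7200} \approx 8.33\ \text{ms},
\qquad
\bar{t}_{\text{rot}} = \tfrac{1}{2}\, t_{\text{rev}} \approx 4.17\ \text{ms}

t_{\text{access}} \approx \bar{t}_{\text{seek}} + \bar{t}_{\text{rot}} + t_{\text{transfer}}
```

Because the first two terms are paid per random access, any on-disk layout that replaces many random reads with a few sequential ones pays off handsomely.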
Me: On top of that, we used relational theory to define formal rules that guarantee data integrity. That is where the first, second, third, fourth, and fifth normal forms come from.
Me: Depending on your data, you might want a certain kind of data integrity, so you choose the normal form that guarantees that level of integrity.
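A small illustration of the kind of integrity problem normal forms prevent. Below, a customer’s city is copied into every order row; because the city depends on the customer rather than on the order, this is roughly the transitive dependency that third normal form rules out. The data is made up, and plain Python dicts stand in for tables:

```python
# Denormalized: the customer's city is repeated on every order row.
orders_flat = [
    {"order_id": 1, "customer": "alice", "city": "Boston"},
    {"order_id": 2, "customer": "alice", "city": "Boston"},
]


def cities_of(rows: list, customer: str) -> set:
    """Every city the flat table claims for one customer (there should be exactly one)."""
    return {row["city"] for row in rows if row["customer"] == customer}


# Update anomaly: changing one row but not the other leaves contradictory data.
orders_flat[0]["city"] = "Denver"
print(sorted(cities_of(orders_flat, "alice")))  # ['Boston', 'Denver']

# Normalized: the city is stored once, and orders refer to the customer by key.
customers = {"alice": {"city": "Boston"}}
orders = [{"order_id": 1, "customer": "alice"}, {"order_id": 2, "customer": "alice"}]
customers["alice"]["city"] = "Denver"  # a single write updates every order's view


def city_for_order(order: dict) -> str:
    return customers[order["customer"]]["city"]


print({city_for_order(o) for o in orders})  # {'Denver'}
```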
Edgar: I see
Me: Now we have solid-state drives. They have no moving parts, and therefore no mechanical seek time or rotational latency, so the motivating factor behind databases no longer exists. We are now free to think of new ways of persisting data.
Edgar: That makes sense.
Me: Sometimes a database is the best tool for a job; sometimes it isn’t. So, going back to my original point: when I see people fight over MySQL vs. MongoDB, I think they are being stupid. What matters is your data and your application; that is what determines how you should persist your information.
Edgar: At that point it is just a convenience thing for the developer I guess, and a few features missing here or there.
Me: No. Convenience should not be what determines your choice of technologies; you are likely to make very bad decisions if you think that way. Putting convenience (I am assuming you mean convenience in application code) ahead of things like scalability and data integrity will increase your technical debt.
Me: One of my biggest complaints about some of our engineers at work is their love of ORM tools. They love the convenience of automatic object persistence, but it comes at a very high price. Whenever I see a new project starting with an ORM, I always tell the development team to write apology letters to their future selves.
Edgar: haha. Really? Why do you dislike ORMs? (Entity Framework is pretty awesome)
Me: Actually, Entity Framework is one of the ORMs that get it right. Microsoft did a good job of learning from the mistakes of other implementations.
Edgar: How is it better than, say, Hibernate?
Me: Entity Framework’s Fluent API gives developers the flexibility they need to define proper mappings.
Edgar: So your problem is with the annotations and the lack of control?
Me: Not necessarily. Hibernate definitely went overboard with annotations, but it is possible to work around that.
Edgar: I see.
Me: The problem is when you use Hibernate to automatically persist objects from your domain layer. If you don’t create an indirection layer between your domain layer and your persistence layer, then you have essentially married yourself to Hibernate. Entity Framework makes it easy to create that indirection layer.
Edgar: So you’re saying the problem is needing certain attributes and classes just to make the persistence layer work, even though you probably wouldn’t need them in your actual domain model. Am I understanding you correctly?
Me: Pretty much, although that indirection layer isn’t always necessary; it depends on your application. For example, Ruby on Rails uses the Active Record pattern, which essentially merges the domain model with the persistence model. That is fine for a lot of applications.
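For contrast, here is a minimal hand-rolled sketch of the Active Record idea in Python with sqlite3. It is not Rails’s API; the User class, its save and find methods, and the schema are hypothetical. The point is that business logic (display_name) and persistence (save, find) live on the same class:

```python
import sqlite3
from typing import Optional


class User:
    # Shared connection, set once at startup; real Active Record libraries
    # manage this for you, this sketch just keeps it on the class.
    db: Optional[sqlite3.Connection] = None

    def __init__(self, name: str, email: str, user_id: Optional[int] = None) -> None:
        self.id = user_id
        self.name = name
        self.email = email

    # Persistence: the object saves and loads itself.
    def save(self) -> "User":
        if self.id is None:
            cur = User.db.execute(
                "INSERT INTO users (name, email) VALUES (?, ?)", (self.name, self.email)
            )
            self.id = cur.lastrowid
        else:
            User.db.execute(
                "UPDATE users SET name = ?, email = ? WHERE id = ?",
                (self.name, self.email, self.id),
            )
        User.db.commit()
        return self

    @classmethod
    def find(cls, user_id: int) -> Optional["User"]:
        row = cls.db.execute(
            "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return None if row is None else cls(row[1], row[2], user_id=row[0])

    # Business logic on the same class.
    def display_name(self) -> str:
        return self.name.title()


User.db = sqlite3.connect(":memory:")
User.db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)")
ada = User("ada lovelace", "ada@example.com").save()
print(User.find(ada.id).display_name())  # Ada Lovelace
```

This is why the pattern is so quick for simple apps, and also why it gets awkward as the domain grows: every domain class is also a table row.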
Me: The problem is when you need a complicated domain model and a proper database schema at the same time. A complicated domain model relies on polymorphism, while a database schema relies on normal forms. That tension is called the object-relational impedance mismatch.
Me: When you combine a complicated domain model with a persistence model, you end up with one set of classes that has to handle polymorphism and normalization at the same time. That’s not possible, which is why you need the indirection layer.
Me: One layer does the polymorphism you need to power the domain model; one layer does the data persistence using normal forms; one layer does the mapping between the two sets of classes.
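Here is a minimal Python sketch of those three layers, assuming a hypothetical social-network domain with two kinds of posts. The domain classes use polymorphism, the row classes mirror a normalized table-per-type schema, and a mapper is the only code that knows about both. All names are illustrative:

```python
from dataclasses import dataclass


# Domain layer: polymorphism, no persistence concerns.
@dataclass
class Post:
    post_id: int
    author: str

    def preview(self) -> str:
        raise NotImplementedError


@dataclass
class TextPost(Post):
    body: str

    def preview(self) -> str:
        return self.body[:20]


@dataclass
class PhotoPost(Post):
    url: str
    caption: str

    def preview(self) -> str:
        return f"[photo] {self.caption}"


# Persistence layer: one row shape per table (table-per-type), no behaviour.
@dataclass(frozen=True)
class PostRow:
    post_id: int
    author: str
    kind: str  # discriminator, so the mapper knows which subtype table to read


@dataclass(frozen=True)
class TextPostRow:
    post_id: int
    body: str


@dataclass(frozen=True)
class PhotoPostRow:
    post_id: int
    url: str
    caption: str


# Mapping layer: the only code that knows about both sides.
class PostMapper:
    def to_rows(self, post: Post) -> tuple:
        if isinstance(post, TextPost):
            return PostRow(post.post_id, post.author, "text"), TextPostRow(post.post_id, post.body)
        if isinstance(post, PhotoPost):
            return (PostRow(post.post_id, post.author, "photo"),
                    PhotoPostRow(post.post_id, post.url, post.caption))
        raise TypeError(f"unmapped post type: {type(post).__name__}")

    def from_rows(self, base: PostRow, detail) -> Post:
        if base.kind == "text":
            return TextPost(base.post_id, base.author, detail.body)
        if base.kind == "photo":
            return PhotoPost(base.post_id, base.author, detail.url, detail.caption)
        raise ValueError(f"unknown kind: {base.kind}")


mapper = PostMapper()
post = PhotoPost(1, "ada", "https://example.com/cat.jpg", "my cat")
base, detail = mapper.to_rows(post)
print(base)                                      # PostRow(post_id=1, author='ada', kind='photo')
print(mapper.from_rows(base, detail).preview())  # [photo] my cat
```

The type checks are confined to the mapper, so persistence changes never touch the domain classes.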
Edgar: Well, I’m not sure how other frameworks handle it, but out of the box, Entity Framework just puts all classes and subclasses in one big table.
Me: Out of the box, yes, but it can be configured to perform the data mappings properly. That is where the Fluent API comes in. In Java, I have to manually create a set of classes to handle the impedance mismatch; .NET gives you a framework that makes it easy.
Edgar: Oh, right.
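Roughly what that one-big-table layout looks like, written by hand in SQLite rather than generated by Entity Framework. The table, the discriminator column name, and the CHECK constraint are assumptions for illustration. Each subtype’s columns must be nullable, so a rule like "photo posts must have a URL" has to move into a CHECK constraint, if anyone writes it at all:

```python
import sqlite3


def make_single_table_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # One table for the whole hierarchy: a discriminator column plus the union of
    # every subtype's columns, most of which are NULL on any given row.
    conn.execute(
        """
        CREATE TABLE posts (
            id            INTEGER PRIMARY KEY,
            author        TEXT NOT NULL,
            discriminator TEXT NOT NULL,
            body          TEXT,  -- text posts only
            url           TEXT,  -- photo posts only
            caption       TEXT,  -- photo posts only
            -- NOT NULL cannot be per-subtype, so the rule becomes a CHECK
            CHECK (discriminator <> 'PhotoPost' OR url IS NOT NULL)
        )
        """
    )
    conn.execute("INSERT INTO posts (author, discriminator, body) VALUES ('bob', 'TextPost', 'hello')")
    conn.execute(
        "INSERT INTO posts (author, discriminator, url, caption) "
        "VALUES ('ada', 'PhotoPost', 'https://example.com/cat.jpg', 'my cat')"
    )
    conn.commit()
    return conn


conn = make_single_table_db()
for row in conn.execute("SELECT discriminator, body, url, caption FROM posts ORDER BY id"):
    print(row)
# ('TextPost', 'hello', None, None)
# ('PhotoPost', None, 'https://example.com/cat.jpg', 'my cat')
```

A table-per-type layout instead gives each subtype its own table, where those columns can be declared NOT NULL directly.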
Me: Going back to my original point about MongoDB vs. MySQL: if you use MongoDB as your persistence engine, the data mapping becomes dead simple. If you use MySQL, the data mapping gets really hard; you could spend more time on the data mapping than on the actual application code. So you should make damn sure that you really need a SQL database.
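A rough sketch of why the mapping effort differs, assuming a hypothetical post with tags and comments. Turning it into a MongoDB-style document is essentially one call to dataclasses.asdict; turning it into normalized rows means one list per table, and reading it back means reassembling the object from all of them. Plain dicts and tuples stand in for real database drivers here:

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class Post:
    post_id: int
    author: str
    body: str
    tags: list[str] = field(default_factory=list)
    comments: list[Comment] = field(default_factory=list)


def to_document(post: Post) -> dict:
    # Document store: the nested object graph becomes one nested document.
    return asdict(post)


def to_rows(post: Post) -> dict:
    # Relational store: the same object is split across three normalized tables.
    return {
        "posts": [(post.post_id, post.author, post.body)],
        "post_tags": [(post.post_id, tag) for tag in post.tags],
        "comments": [(post.post_id, i, c.author, c.text) for i, c in enumerate(post.comments)],
    }


def from_rows(tables: dict, post_id: int) -> Post:
    # Reading back means gathering rows from every table and reassembling them (a join, in SQL).
    [(pid, author, body)] = [r for r in tables["posts"] if r[0] == post_id]
    tags = [tag for owner, tag in tables["post_tags"] if owner == post_id]
    comment_rows = sorted(r for r in tables["comments"] if r[0] == post_id)
    comments = [Comment(a, t) for _, _, a, t in comment_rows]
    return Post(pid, author, body, tags, comments)


post = Post(1, "ada", "hi", ["cats"], [Comment("bob", "nice")])
print(to_document(post))
# {'post_id': 1, 'author': 'ada', 'body': 'hi', 'tags': ['cats'], 'comments': [{'author': 'bob', 'text': 'nice'}]}
print(from_rows(to_rows(post), 1) == post)  # True
```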
Edgar: Unless, of course, you use an ORM and accept that it will be suboptimal.
Me: Well, using an ORM is not necessarily suboptimal. My problem is that people just assume they need an ORM without actually thinking about why. An ORM is just a tool.
Edgar: Right. To quickly persist data.
Me: Yes. Sometimes you really just want to persist information quickly, and that’s fine for simple apps.
Me: If all I wanted was to power a data entry system, you can bet I would use an ORM where my domain model and persistence model are the same thing. I would have that done in less than a week. However, if I wanted to power a social network, I would never use an ORM; or, if I did, I would put it off to the side somewhere, away from my domain model.
Me: A social network would likely have a lot of conditional logic, and I would want to use polymorphism to simplify my app development. However, that means I would have to spend time mapping to my persistence layer.
Edgar: Like using a lot of factories and things of that nature.
Me: Yup. Making persistence easy is not as important as getting my business logic right. If you do not have a lot of business logic, then by all means use an ORM to power your domain.
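A minimal sketch of both ideas in Python, assuming a hypothetical post-visibility rule in a social network. Instead of an if/elif chain on a visibility code scattered through the app, each rule is a class, and a small factory is the single place that turns the stored code into the right object. The codes and class names are made up:

```python
from abc import ABC, abstractmethod


class Visibility(ABC):
    @abstractmethod
    def can_view(self, viewer: str, owner: str, friends: set[str]) -> bool: ...


class Public(Visibility):
    def can_view(self, viewer: str, owner: str, friends: set[str]) -> bool:
        return True


class FriendsOnly(Visibility):
    def can_view(self, viewer: str, owner: str, friends: set[str]) -> bool:
        return viewer == owner or viewer in friends


class Private(Visibility):
    def can_view(self, viewer: str, owner: str, friends: set[str]) -> bool:
        return viewer == owner


# Factory: the only place that maps a stored code to a behaviour object.
_VISIBILITY_BY_CODE = {"public": Public, "friends": FriendsOnly, "private": Private}


def visibility_from_code(code: str) -> Visibility:
    try:
        return _VISIBILITY_BY_CODE[code]()
    except KeyError:
        raise ValueError(f"unknown visibility code: {code!r}") from None


rule = visibility_from_code("friends")       # e.g. a value read from a database column
print(rule.can_view("bob", "ada", {"bob"}))  # True
print(rule.can_view("eve", "ada", {"bob"}))  # False
```

Adding a new visibility rule then means adding one class and one factory entry, rather than hunting down every conditional.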