Monday, March 26, 2007

Entity Naming Conventions

It seems as though as software developers mature they develop consistency in their approach to just about every aspect of their work, regardless if there is a good reason for adopting a particular practice or not.

For instance, in data modeling I developed the habit of always naming my tables in the plural – Employees instead of Employee, and such. There’s no reason for this convention, other than perhaps I copied what I saw from the Northwind database.

But it’s important to question these practices from time to time, and after over seven years of doing things the same way I have decided to make a change. And for the second time now (see my post The Importance of a Logical Data Model), it was a colleague: Steve Dempsey who initiated the change. So why would one opt for singular names over plural ones?

Developers might chose singular names because they are shorter and require less typing, but this argument never held for me because of tools like intellisense and code generation (not to mention touch typing). But Steve is extremely adamant about singular names for a different reason: because of relationship readability.

For instance, in Sharepoint, workflows relate to events. Specifically, a workflow (singular) is initiated by one and only one event, and an event (singular) can initiate multiple workflows, as is expressed below:



The objective of modeling is thus to express the relationship of a single entity (a workflow, an event, or whatever) to zero or one or many of another entity. So why not just name your entities appropriately in the first place: by making them singular?

Of course now the problem is getting an old dog to remember his new trick. Or is it tricks?

9 comments:

Anonymous said...

works for plural too: workflows are initiated by events. I really would like the 'final' answer on this question though...

Lee Richardson said...

While the statement "workflows are initiated by events" is both true and correctly expresses the fact that there is a relationship between the entities, it hides the true nature of the relationship.

For instance regardless if the true nature of the relationship is:

1. A workflow can be initiated by multiple events; or
2. A workflow is initiated by one and only event

The statement "workflows are initiated by events" still holds. In other words it hides the important part of what you are trying to express by data modeling.

Lee Richardson said...

By the way, there's a nice discussion on this topic on my double posted company blog at:

http://www.nearinfinity.com/blogs/page/
lrichard?entry=entity_naming_conventions

alex said...

About this singular or plural question:
I always take into account that a table in a database is not a single entity. It withholds more then one entity. A table is an entity collection. This is way I give tables always a plural name. The SQL syntax is much clearer and logical when a table has a plural name.
i.e.
1) Select Name FROM Employees WHERE Id IN (1,2); here you are selecting employees from an employees data repository; Ok, clear syntax.
2) Select Name FROM Employee WHERE Id IN (1,2); here you are selecting employees from an employee; Not a nice syntax.

What I always say is that: ’a table is not an entity but withholds multiple entities and has therefore a plural name’.

In my ERD diagrams I always use the singular names for entities also because of relationship readability. I always use ERD’s for modeling data in a conceptual level.
When I use an ERD to describe the logical implementation of it in an OO world I call it an ORD (Object Relationship diagram) or class diagram. When I use an ERD to describe the logical schema in a database I call it a DRD (Database Relationship Diagram).

What we also have to consider is that an ERD is not a DRD but a DRD is an ERD. A DRD is a logical implementation of an ERD on a relational database in which the entities are the records and in which you are showing the relationships between the records that reside in tables, with the name of the tables correctly in plural form.

In Dutch we have a saying: ‘Do not compare apples to pears’.

Try to remember this trick now, old dog! :-)

J Bryan Price said...

I, too, have waffled on this question. Presently I use singular names for my SQL tables, yet as mentioned, my SELECT statements seem contrived.

I usually un-contrive them by using abbreviated aliases:

SELECT Title FROM tblPart P WHERE...

I chose the singular form for another reason: I look at a table as a class of identically-structured entities. My records are instances of these classes.

The table is a plural entity - a fact you run into when you think of the table itself, as you do when "thinking" in SQL. My Hungarian tbl prefix eases the linguistic flinch I feel when I fail to find the 's' at the end of of my table name: I translate it to "[FROM] the Part Table"; "Table" becomes my pluralizing suffix.

Perhaps the key here is to realize that, in general, we're not using SQL exclusively in most projects. Within SQL, think of tables, and pluralize them. Outside, in whatever data-receiving language you may use, when manipulating a record as an entity, call the record instance by its singular.

I think I've convinced myself that I need to rename everything in my database. I gotta go...

Lee Richardson said...

Alex,

I really like your differentiation in naming convention between logical and physical data models. Doing that would address all of the points raised in my post. I also agree that SQL readability is clearer with them in the plural. Hmmm, I'll have to think about this some more.

J Bryan Price,

While I can appreciate recognizing tables because of prefix and then ignoring odd naming convention, I have to ultametely disagree with the hungarian notation in table names. Not only does it add unnecessary length to tables, but it seems inconsistent with you seeing a table as a class. You don't name your classes with hungarian notation do you? Just instances right? So why name tables with it?

bsrd said...

Agree with Alex,

I think, since OO paradigm has a goal to unify terminology since requirement process until development process.

Contradiction comes when designing an ERD (that usually using plural naming convention - since one entity actually stores a set of entity), but a Class Diagram always using singular naming convention.

In order to achieve OO goal, then the easiest way is to change the plural into singular (in ERD/table naming convention).

If we try to look at a sample of a class, let saya class called Employee, what will we name a collection variable of a set of Employee? if not Employees, it must be EmployeeList, so where's the different with table naming convention? if we want to mapp this sample into a table, of course we will give the table name as Employees (usually we're not using EmployeeList :) ), and one record of that table, that exactly one Employee.

Conclussion: I think, it would be OK if we use plural name in database level (table), but don't use it to a class that represented an entity of data. Anyway, I got this perseption from an article I got from goole, I forgot the url, maybe you can search by your self in google.

Blinkerfish said...

It's easier to think and model Entity Classes in the singular form. In diagramming exercises, we tend to draw the Entity Class, "student" even though the table itself stores many instances of the"student" entity class.

i tend to like the singular because it makes the translation of the paper model to the database a little more straightforward. It's not a great reason, because you can argue that the physical representation of the data-model can be considered as a separate exercise.

Mark Brady said...

@Alex

I have a much bigger issue with your use of ID as a column name.

if you used a much more useful name for that column you'd see why the plural for tables are less than optimal. Employees_ID is rather non-sensical. Employee_ID makes complete sense.

Besides, it seems silly to optimize code so that it sounds pleasent to the ear. I write to optimize maintainability. Having a column which means the same thing, change its name from table to table is less maintainable. "So I named it ID in this table cause I like the sound of FROM employees better than FROM employee but in THAT table I had to change the name to Employee_ID because there's already an ID field in that table." Really? The table, its PK and all FKs have no simple pattern. All for a little aural appeal for one query... puh-leaze