Labstory - Lab documentation and database

Theory

The basics

Guides

Document sections

For reference

LIMS - Laboratory Information Managements Systems

Documents are good for keeping track of day to day tasks in the lab. Their free structure allows any activity to be properly documented. But if you keep doing the same kind of task over and over, you might want to be able to properly organize the data such that it easily can be summarized. To do so you need to enforce a higher level of structure on your data, taking the first step toward setting up a laboratory information management system (LIMS). Labstory has all the facilities needed to set up both very small and very large structured databases, linked to ELN records.

Database concepts

To simplify life, disregard that you are working with a document-oriented ELN system. Instead, think that want to set up a set of cross-linked tables. Two simple tables are given below:

Strains
Strain name	Species	Description
ce-1	C.elegans	Worms. These are green
ce-2	C.elegans	Worms. These are red

Freezer
Freezer rack	Tube position	Frozen by	Strain name
5	A7	Johan	ce-1
3	J3	Niklas	ce-1

The first table shows which strains have been worked on in the lab. The second table shows what exists in the freezer. The first concept to introduce is that of relations and keys. Shortly, the two tables can be described as:

Strains:
   Strain name → Species, Description

Freezer:
   Freezer rack, Tube position → Frozen by, Strain name
   Strain name REFERENCES Strains(Strain Name)

In the terminology we will be using here, columns are also called Attributes. The other concepts you need to know are Relations, Primary keys and Foreign keys.

The rows above including the attributes are called Relations, the first key concept. The importance of the relation is that it explains what information can be derived from the information you have. For example, the first relation tells you that if you know the Strain name, you can look up which species it is, and any further description of it. The second relation shows that you need to know both the Freezer rack and Tube position to tell what is located there – which strain and who froze it. The information on the left side of an arrow, what is needed to derive all information, is called the primary key of the relation.

The second table has an attribute that references the first table. This is called a Foreign key. What this says is that by knowing the freezer position, it is also possible to derive all the information in the first table by combining them. That is, when combining the relations, you get:

Freezer rack, Tube position → Frozen by, Strain name, Species, Description

To make a relationship truly useful there must be constraints on the content. The keys now encountered are the first and most important constraint. In a table there cannot be two rows having the same value of the key. For the first table it means that there cannot be two strains with the same name. For the second table it means that there cannot be two tubes at the same location. The foreign key imposes constraints between two table as the value of foreign key has to match any value in the referenced table. Thus in the example it is not possible to freeze a strain that does not exist in the first table.

If you have followed thus far then you know the fundamentals of a modern relational database!

Attributes: Types and constraints

Another fundamental type of constraint is what values a certain attribute can have. Is it a text, or is it a number? If it is a text, can it be on multiple lines? For numbers, maybe they cannot be negative. All of this is encoded in the “Type” of an attribute. The following types are supported:

Texts
Integer numbers
Decimal numbers
Boolean values (True/False)
Time points
Files (strictly speaking, references to files **)

In addition, for each type it is possible to tell if it can be NULL. Such a record means that the information is missing (N/A). By default this is not allowed, but sometimes it is useful. An attribute is referenced by the short name. This name will primarily be used by programmers accessing the data, using for example R, who benefit from less typing. For all other users, the long name will be displayed. Since the long name is only used for display you may modify it at any time without affecting the stored data. The short name, on the other hand, should preferably never be changed. So pick it wisely. In addition to the names, there is a description of the field. It is only used to present users with a help text when they are entering the data, and has no other function.

** To be able to retrieve the data of a file, you must have access to the original ELN record inserting the file

Designing a new database

Creating a new database is simple following these steps:

Start by writing down all the attributes that will be involved
Figure out the relationships. Break them into pieces such that there is no redundancy. In the example, the two tables could have been written as one, but then for example the description field would have to be duplicated every time a strain is frozen. This is really bad design. If you wish, you can look up third normal form/3NF, which is preferred in databases.
Create the tables. See if the design makes sense

Modifying an existing database

There will be a "duh" moment when you realize that you forgot something in the design of your database. This need not be a problem – if you for example just add a new attribute, a new foreign key, a new constraint, change a default value of an attribute etc, chance is Labstory will just cope with this. Old inserted entries will have a null value of missing attributes. If null is not allowed according to your design, the default value will be used instead. If instead an attribute is dropped, the attribute will just be ignored in old entries.

Relational databases vs ELN - How it works in Labstory

The tricky part comes when the relational database viewpoint meets the ELN viewpoint. In Labstory it works the following way: A table is created in a document. The document ID and position within the document is used as the name for the table. This guarantees that the table ID is world-unique (or as unique as the document ID at least). Insertions to a table, or deletions, are all done in ELN documents through special objects. Labstory then takes all documents, finds the declarations of tables, and all modifications, and assemble them into a final populated table. This table can then be viewed separately from the documents. Further, the assembled table can be accessed from external programs, and various statistics can be collected using SQL expressions. Thus the data moves from an ELN point of view to a relational point of view, after processing of the software.

SQL database integration

Labstory can use a database in two different ways:

An existing external table can be viewed and cross-referenced after it is first imported as a special Labstory table. Labstory will never modify it
Labstory tables are assembled in an SQL database. By pointing Labstory to use a database of your choice, it is possible to cross-reference Labstory tables externally. But they cannot be modified, as Labstory may (or will) overwrite changes at any moment.

Labstory can interact with any JDBC-compliant database, that is, almost all common database implementations.