Barcodes: ubiquitous, straightforward message encoding

Barcodes have been around for a while. They were patented in 1952, under what is one of the broadest patents I have ever read. Despite being a huge part of our daily lives, I realized I didn't understand how they work.

So, here's a deep technical dive into the mechanism that drives everything from shipping containers to airplane boarding passes.

Types and examples of barcodes

When I think of barcodes, I imagine the black and white code on the back of the groceries we buy. But that corresponds to just one category of barcodes. Here's a list of barcode types and some of their use cases:

  • 1D Barcodes (a.k.a. Linear)
    • UPC-A: Universal Product Code used in retail/groceries
    • Code 39: Industrial/military use
    • Interleaved 2 of 5: Shipping/warehouses
  • 2D Matrix
    • QR Code: you've seen this one.
    • Data Matrix: Small items marking, electronics, healthcare instruments
    • Aztec Code: Travel documents, airline/train tickets
  • Stacked Barcodes
    • PDF417: ID cards, shipping labels
    • Code 49: Compact alternative when linear codes are too long
  • Composite Barcodes
    • RSS Composite: Retail supply chain, pharmaceutical products
    • EAN.UCC Composite: Links linear retail codes with additional data

Technical Components

In the following sections, I'll walk through the technical components of barcodes using the Universal Product Code (UPC) as the running example. It's a simple 1D barcode you will see on the back of most things you purchase at grocery stores.

The mapping between messages and barcodes is known as a symbology. A symbology specifies the technical components of a barcode.

Character Set

Character sets in barcodes define the kind of data that can be encoded (numeric only, alphanumeric, ASCII, etc.).

In addition to the typical character sets like the ones above, specialized barcodes can restrict or customize their character sets. For instance, LOGMARS codes (used in military / government) and HIBC codes (used in healthcare) both limit letters to uppercase (alongside digits and a handful of symbols) for improved readability in their respective applications.

The character set used in UPCs is numeric only. In other words, something like "052302" can be encoded as a UPC, but something like "Banana" cannot.
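To make the idea concrete, here's a minimal sketch that treats a character set as a plain Python set and checks whether a message fits in it. The names `NUMERIC`, `UPPER_ALPHANUMERIC` and `fits_charset` are my own illustrative choices, not part of any barcode standard.

```python
import string

# Two illustrative character sets (names are hypothetical, for this sketch only).
NUMERIC = set(string.digits)
UPPER_ALPHANUMERIC = set(string.digits + string.ascii_uppercase)


def fits_charset(message: str, charset: set[str]) -> bool:
    """Return True if every character of `message` belongs to `charset`."""
    return all(ch in charset for ch in message)


print(fits_charset("052302", NUMERIC))  # True
print(fits_charset("Banana", NUMERIC))  # False
```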

Encoding Schemes

The high-level idea of an encoding scheme is to convert data from one format to another. In the context of barcodes, the idea is to convert characters into patterns of bars and spaces. There are quite a few types of encoding schemes; this section discusses binary encoding, a class of schemes where the end product is binary data.

Encoding Scheme in UPCs: The 7-module system

UPCs use what's known as a 7-module system. Note that the 7-module system is just one way to use binary encodings, and other such systems exist.

Think of a module as the smallest unit of space in the barcode — each one can either be a black bar or light space. As the name implies, a 7-module system is a barcode where each digit is represented using 7 modules.

Each digit is encoded using 2 bars and 2 spaces, and their total width must add up to 7 modules. The pattern for a digit can be written in the form L1-B1-L2-B2 where:

  • L1/L2 = width of light space (1-4 modules)
  • B1/B2 = width of black bar (1-4 modules)

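Following this, 0 is represented as 3-2-1-1 (written as bits, with 0 for a light module and 1 for a black module, that's 0001101), 1 is represented as 2-2-2-1 (0011001), and so on. These are the patterns used in the left half of the barcode; the full list for both halves can be found in the appendix.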
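Here's a small sketch of how a width pattern expands into modules. It assumes the L1-B1-L2-B2 notation from above and writes a light module as 0 and a black module as 1; the function name `widths_to_modules` is just something I made up for illustration.

```python
def widths_to_modules(pattern: str) -> str:
    """Expand an L1-B1-L2-B2 width pattern into 7 modules (0 = space, 1 = bar)."""
    widths = [int(w) for w in pattern.split("-")]
    valid = (
        len(widths) == 4
        and sum(widths) == 7
        and all(1 <= w <= 4 for w in widths)
    )
    if not valid:
        raise ValueError(f"not a valid 7-module pattern: {pattern!r}")
    # Elements alternate space, bar, space, bar, starting with a light space.
    return "".join(("0" if i % 2 == 0 else "1") * w for i, w in enumerate(widths))


print(widths_to_modules("3-2-1-1"))  # 0001101  (digit 0)
print(widths_to_modules("2-2-2-1"))  # 0011001  (digit 1)
```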

Structure Requirements

Each symbology has its own set of structure requirements. These can come in a few forms, but can be broadly categorized as:

  1. Boundary patterns: Patterns that help recognize start points, end points and orientation for barcodes.
  2. Quiet zones: Mandatory blank spaces around the barcode symbol that prevent interference from surrounding elements or graphics.
  3. Internal Structure Elements: Built-in reference points and patterns that help scanners maintain orientation and module alignment during reading.
  4. Module Organization: The fundamental arrangement of data elements that defines physical layout.

Structure of a UPC: Left / Right encoding in UPCs

The items in pink are boundary patterns for UPCs. All UPCs have a few shared boundary patterns:

  1. Start Pattern (Left Guard): Three modules in the pattern: bar-space-bar. Always encoded as "101".
  2. Middle Pattern (Center Guard): Five modules in the pattern: space-bar-space-bar-space. Always encoded as "01010". The middle pattern divides the barcode into left and right halves.
  3. End Pattern (Right Guard): Three modules in the pattern: bar-space-bar. Always encoded as "101".

The Quiet Zone consists of blank space before and after the code. For a 12-digit UPC it MUST be:

  1. At least 9 times the width of the narrowest bar (one module) on the left side
  2. At least 9 times the width of the narrowest bar on the right side

And lastly, you'll notice that every digit in a UPC occupies the same width (7 modules) and every bar or space is a whole number of modules wide, which speaks to the module organization.
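Putting the pieces together, here's a rough sketch of how the module sequence for a full 12-digit UPC could be assembled: start guard, six left-half digits, middle guard, six right-half digits, end guard. The left-half patterns come from the appendix; the function name `upc_modules` is my own, and quiet zones are left out since they are just blank space around the symbol.

```python
# Left-half digit patterns, indexed by digit (0 = space, 1 = bar).
LEFT = ["0001101", "0011001", "0010011", "0111101", "0100011",
        "0110001", "0101111", "0111011", "0110111", "0001011"]
# Right-half patterns are the bit-for-bit inverse of the left-half ones.
RIGHT = ["".join("1" if b == "0" else "0" for b in p) for p in LEFT]

START, MIDDLE, END = "101", "01010", "101"


def upc_modules(digits: str) -> str:
    """Return the 95-module bar/space sequence for a 12-digit UPC string."""
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    left = "".join(LEFT[int(d)] for d in digits[:6])
    right = "".join(RIGHT[int(d)] for d in digits[6:])
    return START + left + MIDDLE + right + END


modules = upc_modules("725272730706")
print(len(modules))  # 95
print(modules[:10])  # 1010111011  (start guard, then the digit 7)
```

Note that this sketch does not check that the 12th digit is a valid check digit; it only lays out whatever digits it is given.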

Data Characteristics

When designing a barcode system, two key characteristics determine how much information we can pack into a symbol: capacity and density.

Capacity

A barcode's capacity refers to the maximum amount of data it can hold. For example, a UPC barcode we discussed earlier can only contain 12 numeric digits. While limited, this is great for the retail use case.

As you might imagine, modern applications often need more storage space. For these purposes, a QR code, for instance, can store up to 4,296 alphanumeric characters – which is much less limiting than the 12 digits a UPC can store.

Density

Data density addresses the question: how efficiently can we pack information into a given space?

In UPC barcodes, each digit requires exactly seven modules, plus additional space for structural elements like guard patterns: 95 modules in total for 12 digits. At the nominal printing size, a UPC symbol is roughly 1.5 inches wide including its quiet zones, so it stores only about 8 digits per linear inch.
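The module bookkeeping is simple enough to spell out. This is just arithmetic on the numbers already given (12 digits, 7 modules per digit, guard patterns of 3, 5 and 3 modules); the variable names are mine, and quiet zones are not counted.

```python
DIGITS = 12
MODULES_PER_DIGIT = 7
GUARD_MODULES = 3 + 5 + 3  # start, middle and end patterns

data_modules = DIGITS * MODULES_PER_DIGIT
total_modules = data_modules + GUARD_MODULES

print(data_modules)                            # 84
print(total_modules)                           # 95
print(round(total_modules / DIGITS, 2))        # 7.92 modules spent per digit
print(round(data_modules / total_modules, 3))  # 0.884 of the symbol carries data
```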

Modern 2D barcodes dramatically improve on this efficiency. By using both horizontal and vertical dimensions (like a crossword puzzle instead of a single line of text), QR codes can achieve much higher data density. This is why a QR code can contain a full website address in roughly the same space that a UPC uses for 12 digits.

Capacity / Density trade-offs

The fixed structure of UPCs makes them reliable but relatively space-hungry. On the other hand, denser barcodes require more sophisticated scanning equipment, higher quality printing, and more precise alignment during scanning among other things.

This explains why simple UPC codes remain prevalent in retail environments where speed and reliability are paramount.

Error Detection and Correction

Every time we scan a barcode, we're asking a machine to interpret patterns of light and dark in the real world. We'd like to make our barcodes robust to errors in interpretation.

But first: what causes errors in barcodes? They might get damaged during shipping, partially obscured by dirt, faded from sun exposure, or poorly printed to begin with. Even a small coffee stain or scratch could potentially change how a scanner interprets the patterns of bars and spaces.

Simple Error Detection: Check Digits

The most basic form of error detection uses check digits, and our familiar UPC barcode provides a perfect example.

In a UPC, the last digit isn't actually part of the product code – it's a check digit calculated from all the other digits using the following method:

  • Starting from the rightmost digit (excluding check digit), multiply odd positions by 3 and even positions by 1. Add all these numbers together.
  • The check digit is whatever number needs to be added to make the sum divisible by 10.
  • Append the check digit at the end.

Example

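Start with the first 11 digits of the UPC above: 72527273070. With 11 digits, the first and last digits both fall on odd positions, so both are multiplied by 3.

7×3 + 2×1 + 5×3 + 2×1 + 7×3 + 2×1 + 7×3 + 3×1 + 0×3 + 7×1 + 0×3 = 94

Check digit = 100 - 94 = 6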

Final UPC: 725272730706 (you see 6 at the end of that UPC)

When a scanner reads the barcode, it performs the same calculation and compares its result with the check digit. If they don't match, the scanner knows something went wrong, which is when you see the person processing your checkout enter the product details manually.
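Here's a minimal sketch of both halves of that process: computing the check digit from the first 11 digits, and validating a full 12-digit code the way a scanner would. The function names are hypothetical, and real scanner firmware will look nothing like this.

```python
def upc_check_digit(first_eleven: str) -> int:
    """Compute the check digit for the first 11 digits of a UPC."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(ch) for ch in first_eleven]
    # Odd positions (1st, 3rd, ..., 11th) carry weight 3, even positions weight 1.
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    # Whatever must be added to reach the next multiple of 10 (0 if already there).
    return (10 - total % 10) % 10


def is_valid_upc(code: str) -> bool:
    """Recompute the check digit and compare it with the 12th digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_check_digit(code[:11]) == int(code[11])


print(upc_check_digit("72527273070"))  # 6
print(is_valid_upc("725272730706"))    # True
print(is_valid_upc("725272730700"))    # False
```

The trailing `% 10` covers the case where the weighted sum is already divisible by 10, in which case the check digit is 0 rather than 10.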

Advanced Error Correction: Reed-Solomon Codes

The biggest downside of simple methods like check digits is that they only detect errors; they can't fix them. Modern barcodes like QR codes use more advanced methods like Reed-Solomon error correction to get around this limitation.

The high-level idea of Reed-Solomon error correction involves converting data into polynomials, then generating extra "redundancy" values by evaluating these polynomials at additional points. If parts of the message get corrupted during transmission, the mathematical properties of polynomials allow us to reconstruct the original data as long as we have enough uncorrupted points.
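To get a feel for the polynomial idea, here's a toy sketch. It makes several simplifying assumptions that are mine, not QR's: arithmetic is done modulo the small prime 257 (QR codes work in the finite field GF(256) instead), the message symbols are treated as the polynomial's values at x = 0, 1, 2, ..., and only erasures (lost symbols whose positions are known) are handled. Real Reed-Solomon decoders can also locate errors at unknown positions, which takes considerably more machinery. The names `encode` and `recover` are hypothetical.

```python
from typing import Optional

P = 257  # a small prime, so integers mod P form a field (toy assumption)


def interpolate_at(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(message: list[int], extra: int) -> list[int]:
    """Treat message[i] as the polynomial's value at x = i; append `extra` more evaluations."""
    k = len(message)
    points = list(enumerate(message))
    return message + [interpolate_at(points, x) for x in range(k, k + extra)]


def recover(received: list[Optional[int]], k: int) -> list[int]:
    """Rebuild the k message symbols from any k surviving symbols (None marks a lost one)."""
    known = [(x, y) for x, y in enumerate(received) if y is not None][:k]
    if len(known) < k:
        raise ValueError("too many symbols lost")
    return [interpolate_at(known, x) for x in range(k)]


message = [72, 105, 33]              # three symbols, each in range(P)
codeword = encode(message, extra=2)  # five symbols: the message plus 2 redundant values
damaged = [None, codeword[1], None, codeword[3], codeword[4]]
print(recover(damaged, k=3) == message)  # True
```

Two of the five symbols are gone, including two of the three original message symbols, yet the three survivors pin down the same degree-2 polynomial, and evaluating it at x = 0, 1, 2 gives the message back.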

I hope to cover this topic in depth at some point, but until then, you can read all about this fascinating method here.

Space vs. Robustness trade-off

Adding error correction capabilities requires extra space in the barcode. The more error correction capability you want, the more space you need to dedicate to parity data instead of actual message content.
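For a Reed-Solomon style code this trade-off can be put into numbers. A codeword of n symbols carrying k data symbols has n - k parity symbols, and can correct up to (n - k) / 2 symbol errors at unknown positions (rounded down), or up to n - k erasures at known positions. The sketch below just does that arithmetic; the function name and the (255, 223) example are illustrative choices, not the parameters of any particular barcode.

```python
def rs_tradeoff(total_symbols: int, data_symbols: int) -> dict:
    """Summarise the space cost and correction power of an (n, k) Reed-Solomon code."""
    if not 0 < data_symbols <= total_symbols:
        raise ValueError("need 0 < data_symbols <= total_symbols")
    parity = total_symbols - data_symbols
    return {
        "parity_symbols": parity,
        "overhead": round(parity / total_symbols, 3),  # share of space not carrying data
        "correctable_errors": parity // 2,             # errors at unknown positions
        "correctable_erasures": parity,                # losses at known positions
    }


print(rs_tradeoff(255, 223))
# {'parity_symbols': 32, 'overhead': 0.125, 'correctable_errors': 16, 'correctable_erasures': 32}
```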

A UPC barcode on a cereal box might only need basic check digit verification because it's scanned in controlled conditions. But a QR code on an outdoor billboard might need high-level error correction to remain readable despite weather exposure and viewing angle variations.

With that, we should have all the tools we need to break down a simple UPC.
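As a way of tying the sections together, here's a sketch of that breakdown in code: given the 95 modules of a UPC as a string of 0s (spaces) and 1s (bars), check the guard patterns, look up each 7-module digit, and verify the check digit. It assumes a clean, correctly oriented scan that has already been sampled into modules, which is the hard part a real scanner has to solve. The function name `decode_upc` is hypothetical.

```python
# Left-half digit patterns from the appendix, indexed by digit.
LEFT = ["0001101", "0011001", "0010011", "0111101", "0100011",
        "0110001", "0101111", "0111011", "0110111", "0001011"]
# Right-half patterns are the bit-for-bit inverse of the left-half ones.
RIGHT = ["".join("1" if b == "0" else "0" for b in p) for p in LEFT]

LEFT_DIGIT = {pattern: digit for digit, pattern in enumerate(LEFT)}
RIGHT_DIGIT = {pattern: digit for digit, pattern in enumerate(RIGHT)}


def decode_upc(modules: str) -> str:
    """Decode a 95-module UPC string into its 12 digits, or raise ValueError."""
    if len(modules) != 95:
        raise ValueError("expected 95 modules")
    # Layout: 3 start + 42 left + 5 middle + 42 right + 3 end.
    if modules[:3] != "101" or modules[45:50] != "01010" or modules[92:] != "101":
        raise ValueError("guard pattern mismatch")
    left = [modules[3 + 7 * i: 10 + 7 * i] for i in range(6)]
    right = [modules[50 + 7 * i: 57 + 7 * i] for i in range(6)]
    try:
        digits = [LEFT_DIGIT[c] for c in left] + [RIGHT_DIGIT[c] for c in right]
    except KeyError:
        raise ValueError("unrecognised digit pattern") from None
    # Check digit: odd positions weigh 3, even positions weigh 1.
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    if (10 - total % 10) % 10 != digits[11]:
        raise ValueError("check digit mismatch")
    return "".join(str(d) for d in digits)


# Build the modules for 725272730706 from the tables, then decode them again.
modules = ("101" + "".join(LEFT[int(d)] for d in "725272") + "01010"
           + "".join(RIGHT[int(d)] for d in "730706") + "101")
print(decode_upc(modules))  # 725272730706
```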

Conclusion

That about does it for the technical components of barcodes. In summary, we covered:

  1. Types and use cases of barcodes
  2. Character sets: what data can be encoded
  3. Encoding schemes: how the character sets are encoded
  4. Structure requirements: how a given barcode is structured
  5. Data characteristics: tradeoffs in capacity and density of a barcode
  6. Error correction: how do we make our barcodes robust?

I also left a few interesting tidbits about barcode development in the appendix below.

I appreciate you making it this far in the blog! Have a cookie: 🍪.


Appendix

Digit encoding in UPC

Digit   Left Encoding          Right Encoding
 0       0001101                1110010
 1       0011001                1100110
 2       0010011                1101100
 3       0111101                1000010
 4       0100011                1011100
 5       0110001                1001110
 6       0101111                1010000
 7       0111011                1000100
 8       0110111                1001000
 9       0001011                1110100
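One pattern worth noticing in the table: each right encoding is the left encoding with every bit flipped. As a consequence, left patterns always contain an odd number of bar modules and right patterns an even number, which is commonly cited as what lets a scanner tell which direction it is reading the symbol in. Here's a quick sketch that checks both properties against the table above (the helper name `flip` is mine).

```python
LEFT = ["0001101", "0011001", "0010011", "0111101", "0100011",
        "0110001", "0101111", "0111011", "0110111", "0001011"]
RIGHT = ["1110010", "1100110", "1101100", "1000010", "1011100",
         "1001110", "1010000", "1000100", "1001000", "1110100"]


def flip(bits: str) -> str:
    """Swap every 0 for a 1 and vice versa."""
    return bits.translate(str.maketrans("01", "10"))


print(all(flip(l) == r for l, r in zip(LEFT, RIGHT)))  # True
print(all(l.count("1") % 2 == 1 for l in LEFT))        # True: odd number of bar modules
print(all(r.count("1") % 2 == 0 for r in RIGHT))       # True: even number of bar modules
```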

Practical Considerations of Barcode development

A few practical considerations that come up when developing barcodes:

  • The environment of deployment needs to be considered when printing barcodes. For example, in industrial settings, barcodes need to be printed on materials that are resistant to high heat.
  • Physical Requirements
    • Size Constraints: Barcodes must maintain minimum dimensions to ensure scanner readability, typically requiring at least 0.25 inches in height for standard applications.
    • Print Quality Standards: Print resolution should be at least 300 DPI to ensure clear, sharp bars with minimal bleeding or distortion. This precision is crucial because scanners rely on the precise width of bars and spaces to decode information - even slight distortions can cause misreads or complete scan failures.
    • Contrast Requirements: A minimum contrast ratio of 70% between dark and light elements is essential for reliable scanning. This high contrast is necessary because barcode scanners work by measuring the reflected light from the surface - dark bars absorb light while light spaces reflect it. Without sufficient contrast, scanners struggle to differentiate between bars and spaces, leading to read errors or failed scans.

Barcode Trivia

In 1981 the US Department of Defense adopted the use of Code 39 for marking all products sold to the United States military.

Further Reading

Text encoding with UTF-8: https://blog.hubspot.com/website/what-is-utf-8