Randomization—An Interview with Ken Traub—Part 1: GS1 Serial Number Considerations

Ken Traub

Over the next two weeks I have a very special treat for RxTrace readers.  It is an interview with Ken Traub, GS1 standards expert and independent consultant.  The subject is GS1 serial number randomization, something so important that I think pharma companies ought to give deep thought to it before they turn on their serial number applications.

Pharma manufacturers who sell into the E.U. and/or Brazil markets will be forced to randomize their serial numbers because of regulatory requirements, but even those who only sell into the U.S. market should strongly consider randomization.  I’ll have more to say about why in a follow-up essay after this series is over.

Because the interview with Ken covers the topic so thoroughly, it is long.  That’s good, because it provides readers with an easy-to-understand explanation of everything they need to know about randomizing.  But it also makes for a very long essay, so I have broken the interview down into five RxTrace essays.  Read sequentially, they contain the complete interview.  The subtopics covered by those essays include:

  1. GS1 Serial Number Considerations (this essay)
  2. Properties of Randomization
  3. Threat Analysis
  4. Algorithmic Approach
  5. Other Approaches to Randomization

Almost two years ago I began to think about the special problems and benefits of randomizing the GS1 serial numbers that are applied to drugs.  I began to do some investigating in preparation for an RxTrace essay and I found the topic to be very complex.  About that time I overheard Ken answer someone’s question on a GS1 call about serial number randomization.

Ken is a frequent and key contributor to the development of many of the GS1 traceability-related standards.  In fact, he is the editor of many of them, including the Electronic Product Code Information Services (EPCIS) standard, the Core Business Vocabulary standard and the Tag Data Standard (TDS), to name only a few.  I met Ken quite a few years ago and I’ve learned a lot from him over those years (see “Writing Is Thinking. For Example, Ken Traub”).

His clear and concise answer about randomization on that call convinced me that he would be better at explaining the complexities of randomization than I would, so I asked him if he would be willing to do this interview for publication in RxTrace.  We recorded the interview in late July of 2013 and it took me this long to prepare it for publication.

I think you will agree, it is worth the wait. — Dirk.


Dirk Rodgers:  Welcome Ken, and thanks for agreeing to this interview.  What we’re going to be talking about is serial numbers that are associated with GS1 GTINs, of course, and so if you could just start out explaining the limitations of those serial numbers in general, without even considering randomization yet.

Ken Traub:  Certainly.  The GS1 General Specifications is the name of the standard that defines all of GS1 identifiers and also defines various attributes that are associated with identifiers that occur within bar codes and in other data carriers, and so it’s in that standard that they define the identification for trade items.  The identification for a trade item is a Global Trade Item Number, or GTIN, that identifies a class of trade items.  If you want to identify individual instances, then the GTIN can be accompanied by a serial number.  It could also be accompanied by a lot ID to indicate a lot or a batch.

The serial number, as well as the lot ID, in the GS1 standards is defined to be any alphanumeric string of between one and twenty characters, and the characters that are allowed are all the digits, all the upper and lower case letters, and a handful of punctuation characters, for a total of 82 characters altogether: not quite all the characters in ASCII, but most of them.  And so if you consider the number of unique combinations of all those characters, from one to twenty characters in length, it’s a pretty enormous number: 1.91 x 10^38.

But most people, when they assign serial numbers, aren’t going to make it twenty characters long because that gets to be difficult for manual data entry and also consumes a lot of space in the bar code, so usually people use a shorter string.  And usually people wouldn’t use all those characters in the character set, particularly not a lot of the punctuation characters and things like that, but it’s still a pretty enormous number of combinations that are possible.
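The count Ken quotes can be reproduced with a few lines of Python.  This is only a sketch of the arithmetic.  The second figure uses a hypothetical policy (exactly 12 characters, digits plus upper case letters) that is an assumption for illustration and not something Ken proposed.

```python
# Count every possible GS1 serial number: strings of 1 to 20 characters
# drawn from the 82 characters the standard allows.
CHARSET_SIZE = 82
MAX_LENGTH = 20

total = sum(CHARSET_SIZE ** length for length in range(1, MAX_LENGTH + 1))
print(f"{total:.2e}")  # prints 1.91e+38

# Hypothetical narrower policy (illustration only): exactly 12 characters,
# each one a digit or an upper case letter (36 choices per position).
restricted = 36 ** 12
print(f"{restricted:,}")
```

Even the restricted policy leaves a very large space, which is the point Ken makes about shorter strings and smaller character sets.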

One thing to bear in mind is that, because the serial number is treated as just a string of characters, there’s nothing special about “0”.  It’s another character like any other, so “7”, “07” and “007” are all different serial numbers, even though if you were thinking of them as pure numbers you might think they were all the same.  Because they are just strings of characters, those are different serial numbers.
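A small sketch of why this matters in software: if a system converts serial numbers to integers, distinct serial numbers silently collapse into one.  The variable names here are illustrative only.

```python
serials = ["7", "07", "007"]

# Compared as character strings, all three are distinct serial numbers.
distinct_as_strings = len(set(serials))

# Converted to integers, they collapse into a single value, which is why a
# serial number should be stored as text and not in a numeric field.
distinct_as_numbers = len({int(s) for s in serials})

print(distinct_as_strings, distinct_as_numbers)  # prints 3 1
```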

Now, both bar codes and RFID tags have some limitations regarding serial numbers, or some considerations that might lead an end user to restrict the range of serial numbers that they are willing to employ.  For example, in bar codes, the number of characters in a serial number has an impact on the size of the bar code.  So whether you’re doing a linear bar code like GS1-128 or DataBar, or a 2-D bar code like a DataMatrix or a QR Code, the more characters you try to stuff into that bar code the larger the symbol becomes.  For the 1-D bar codes it expands in width and for the 2-D bar codes both the height and the width expand.  And so companies that mark bar codes on products may be concerned about the size that the symbol occupies on their package, so they may want to limit it.  And also, they may find it much easier for their quality assurance of the printing if they know the bar code is always going to be of a fixed size.  That may lead them to say, well even though the number of characters in a serial number can vary, we want it to always be the same number of characters for the serial numbers we assign, so that we can count on the bar code symbol to always be of constant size.  So those are limitations that people may self-impose in assigning serial numbers.

If they are planning on using RFID tags there is another kind of limitation that can come into play.  If you want to encode a full alphanumeric serial number of up to 20 characters, the encoding for the GTIN plus the serial number that gives you complete freedom to choose any of those available serial numbers requires 198 bits of memory for the EPC.  Now, up until recently it wasn’t actually possible to get a tag with that much memory.  Even today when you can get that much memory, the longer the EPC is, the longer it takes the reader to read the tag, and longer read times…I mean, the time to read the tag is a fraction of a second so it’s not the actual read time itself that is so relevant, but the longer it takes to read a tag the less reliably you read a given tag within a given interval of time, particularly if it is moving or if you’re trying to read several tags at once.  So between the cost of the tag and the availability of tags and read performance, there are some motivations to avoid using 198 bits for the EPC.  And so there is a 96 bit encoding available for EPCs which, indeed, was the only one you could use back in the days when that was the maximum memory size in the RFID tags.  But if you’re using a 96 bit tag, there’s simply not enough possible combinations of 96 bits to accommodate all those possible combinations of up to 20 alphanumeric characters, so some limitations have to come into play.  The specific limitations that are defined in GS1’s Tag Data Standard are that in a 96 bit RFID tag:

  • the serial number is restricted to only digits—so no letters or punctuation;
  • it’s not allowed to have any leading zeroes, unless the serial number itself just consists of a single zero character; and
  • if you pretended the string of digits was a decimal numeral, its magnitude has to be less than 2^38, which is about 274 billion and change.

So what that means is with the 96 bit tags you have 2^38, or approximately 274 billion, different serial numbers to choose from, which is still a pretty large amount, and I think most manufacturers would be delighted if that placed some constraint on the number of products they could sell; most companies don’t sell that many!  However, when we get to randomization you will see that that capacity limit may be something to be concerned about.
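The three rules for a 96-bit tag can be sketched as a simple check.  The function name `fits_96_bit_tag` is hypothetical, and this is an illustration of the rules as Ken states them, not code taken from the Tag Data Standard.

```python
# The serial value must be below 2^38 (274,877,906,944); to my understanding
# this is because the serial field in the 96-bit encoding is 38 bits wide.
SERIAL_LIMIT_96_BIT = 2 ** 38


def fits_96_bit_tag(serial: str) -> bool:
    """Return True if the serial number satisfies all three rules."""
    # Rule 1: digits only, so no letters or punctuation.
    if not (serial.isascii() and serial.isdigit()):
        return False
    # Rule 2: no leading zeroes, unless the serial is the single character "0".
    if len(serial) > 1 and serial.startswith("0"):
        return False
    # Rule 3: read as a decimal numeral, the value must be less than 2^38.
    return int(serial) < SERIAL_LIMIT_96_BIT


print(fits_96_bit_tag("0"))             # prints True
print(fits_96_bit_tag("007"))           # prints False (leading zeroes)
print(fits_96_bit_tag("274877906943"))  # prints True  (2^38 - 1)
print(fits_96_bit_tag("274877906944"))  # prints False (equals 2^38)
```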

Now look what happens when you put the RFID and the bar code limitations together.  You remember with bar codes, people want to have a constant number of digits.  In the RFID tag, the serial number can’t have leading zeroes, so if you want to have a constant number of digits and stay within the constraints of the RFID tag where it doesn’t begin with leading zeroes, then you end up constraining yourself a little more.  One way to take the intersection of that is to say I’m only going to use 11 digits, it’s always going to contain 11 digits, but because I can’t have any leading zeroes, the first digit is always going to be one of the digits 1 through 9.  And so, any 11-digit number would give you a total of 10^11, which is 100 billion serial numbers, but if you narrow that down to the first digit being 1 through 9, then instead of 100 billion, you have 90 billion serial numbers available.  The other possibility is to do a 12-digit serial number, again starting where the first digit is a “1”, but instead of going up to all 9’s (that would exceed the 2^38 limit) you would go from 100 billion to 274 billion and change, which gives you about 174 billion different serial numbers.  You just have to remember where that funny upper limit is.
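The two fixed-length options work out as follows.  This is just the arithmetic from Ken’s answer written out; the variable names are mine.

```python
LIMIT = 2 ** 38  # serial numbers on a 96-bit tag must be below this value

# 11 digits, first digit 1 through 9:
# values 10,000,000,000 through 99,999,999,999.
eleven_digit_count = 10 ** 11 - 10 ** 10

# 12 digits: values 100,000,000,000 up to (but not including) 2^38.
twelve_digit_count = LIMIT - 10 ** 11

print(f"{eleven_digit_count:,}")  # prints 90,000,000,000
print(f"{twelve_digit_count:,}")  # prints 174,877,906,944
```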

All of these constraints, and the typical rules that result from them, such as an 11-digit or 12-digit serial number, are detailed in the GS1 RFID Bar Code Interoperability Guideline, which GS1 published a few months ago.  I was the editor for that document and that’s why I am able to recite all that information to you in such polished form.

DR:  Ha!  OK, but I don’t know if I heard you mention anything about the serial number field in the GS1 definition being variable length.  You said anything up to 20 characters…

KT:  Yes.  One to 20 characters so that implies variable.

DR:  Right, and being variable length, the choice is up to the user, correct?

KT:  Yes.  And the user doesn’t have to be consistent.  He could put a 3-character serial number on one instance of a product and 5 characters on the next instance of the product.  And, in fact, if they were assigning numeric serial numbers without leading zeroes and starting from “1” and counting upwards, then the first nine products would have one character (one digit), the next 90 products would have 2 characters and the next 900 would have 3 and so forth…

DR:  Yeah, and the reason people probably don’t like that is, as you mentioned, the size of the bar code grows…

KT:  Actually, it would be very natural to do that in the RFID tag because, in a 96-bit RFID tag the thing is encoded as an integer where there are no leading zeroes and so if you’re counting up from “1” it will have variable length.  In the bar code world, people tend not to do things that way because they’re trying to keep the number of characters consistent.  That’s why when you take both constraints into account you end up with an even more restricted set from which you’re going to choose.

DR:  Right, and so, ignoring the difference between RFID and the limitations of the 96-bit tags, if you really are trying to keep a constant number of digits and you select something less than 20, as I see it, there is a trade-off between the number of characters used in a serial number that you do choose and the character set that you choose to use, because, as you pointed out, you could choose to use all numeric digits, or you could choose alphanumerics upper case only, and so on.  In other words, you can limit the character set yourself, beyond what the spec says.

KT:  Yeah, that’s right.  The general rule is, that any system that is receiving a serial number should be prepared for any character string that fits the standard.  So any character string that’s between 1 and 20 characters, composed of any of the 82 characters that are allowed, in any combination.  On the other hand, the system that is creating serial numbers can choose to operate in a more restricted mode so that, within the scope of serial numbers issued by that system, you will have less variation.  It’s really important to not build such assumptions into other parts of your system that are receiving serial numbers because someday, if you want to change, or if you merge with another company and they have already assigned serial numbers under a different policy, you don’t want to be in a situation where you are not able to input those serial numbers that have already been assigned into systems that need to have that information.
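A receiving system that follows Ken’s rule might validate incoming serial numbers along these lines.  This is a minimal sketch: the function name is hypothetical, and the list of punctuation characters is my recollection of the 82-character set in the GS1 General Specifications, so it should be checked against the current edition before anyone relies on it.

```python
import string

# Assumption: the 82 allowed characters are the digits, the upper and lower
# case letters, and these 20 punctuation characters.  Verify against the
# current GS1 General Specifications.
GS1_CHARSET_82 = frozenset(
    string.digits + string.ascii_letters + "!\"%&'()*+,-./:;<=>?_"
)


def is_valid_gs1_serial(serial: str) -> bool:
    """Accept any 1 to 20 character string drawn from the allowed set."""
    return 1 <= len(serial) <= 20 and all(ch in GS1_CHARSET_82 for ch in serial)


print(len(GS1_CHARSET_82))            # prints 82
print(is_valid_gs1_serial("007"))     # prints True
print(is_valid_gs1_serial("aB-9/x"))  # prints True
print(is_valid_gs1_serial("12 34"))   # prints False (space is not allowed)
```

Note that the check deliberately knows nothing about the issuing system’s own, narrower policy, so serial numbers assigned under a different policy are still accepted.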

DR:  Right, so I guess my point about this trade-off is that, if you choose, for instance, a 3-character serial number, and you choose to limit it to numeric digits only you can go up to 999…actually you can have 1,000 serial numbers since you can start with zero, correct?

KT:  That’s right…

DR: But…

KT:  If you then allowed digits and upper case letters, then you’ve got 10 digits and 26 letters to choose from, that gives you a total 36 choices for each of the 3 digit (character) positions, so then the total number of serial numbers you can assign would be 36 * 36 * 36, … which works out to 46,656.

DR:  Now do you have a calculator that you just used or did you have that memorized?

KT:  No that one I calculated.  I have a lot of powers of 2 memorized but not powers of 36…

So, in general, if you’ve got “n” positions in your serial number, and in each of those “n” positions you have “m” characters to choose from, then “m” to the “nth” power, or m^n, is the total number of combinations.

If it’s variable length then you have to sum that up for every length from 1 up to your maximum length “n”, which works out to (m^(n+1) – m)/(m – 1).
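Both formulas are easy to check in code.  The function names below are mine, and the sketch simply restates Ken’s formulas, using the two 3-character examples from the conversation as a sanity check.

```python
def fixed_length_capacity(m: int, n: int) -> int:
    """Serial numbers of exactly n characters, m choices per position."""
    return m ** n


def variable_length_capacity(m: int, n: int) -> int:
    """Serial numbers of 1 to n characters, m choices per position."""
    if m < 2:
        raise ValueError("the closed form needs at least 2 characters")
    # Closed form of m + m^2 + ... + m^n; the division is always exact.
    return (m ** (n + 1) - m) // (m - 1)


print(fixed_length_capacity(10, 3))     # prints 1000
print(fixed_length_capacity(36, 3))     # prints 46656
print(variable_length_capacity(10, 3))  # prints 1110
```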

DR:  But now if you look at it in terms of the resulting bar code size (and for the purposes of this question let’s assume a GS1 DataMatrix bar code), when you go from that first example where you just used numeric digits, 3 characters, to the second example where you used the 3 characters, but this time you include all the upper case alphas, would you expect to see a change in the size of the image that results from those two examples?

KT:  It depends.  In the example you gave with three characters, it’s probably not going to make a difference.  With more characters, it might.

GS1 DataMatrix has several encoding modes, and some of them result in a smaller symbol when only digit characters are used.  On the other hand, the GTIN is always digits, and so is the expiration date if you’re including that in the bar code, too, so the choice of digits versus alphas in the serial number only affects a part of the overall bar code size.  So it’s worth running tests with examples of the serial numbers you want to use to see what the size is.  You don’t need to try all possible combinations.  The size is determined by how many characters there are and which are digits vs non-digits, so you can try encoding a value with “9” in each place you plan to use a digit and “A” for each place where you plan to use a non-digit, and that will tell you the size.  If you’re looking to keep the bar code size constant, make sure all your serial numbers fit the same pattern.
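Ken’s “9” and “A” trick is easy to automate.  The sketch below only builds the test value and checks that a batch of serial numbers shares one pattern; it does not generate or measure a bar code, and both function names are hypothetical.

```python
def size_test_pattern(serial: str) -> str:
    """Replace each digit with "9" and each non-digit with "A"."""
    return "".join("9" if ch in "0123456789" else "A" for ch in serial)


def same_pattern(serials) -> bool:
    """True if every serial number maps to the same digit/non-digit pattern."""
    return len({size_test_pattern(s) for s in serials}) <= 1


print(size_test_pattern("4021BX77"))   # prints 9999AA99
print(same_pattern(["12AB", "98ZZ"]))  # prints True
print(same_pattern(["12AB", "1234"]))  # prints False
```

Encoding the pattern string in your bar code software then gives the symbol size for every serial number that fits that pattern.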

DR:  Well that’s very interesting.  The reason I raise this is that in the pharmaceutical supply chain today there are some companies that would prefer to only use numeric digits because they are just used to “serial numbers” being a “number”.  And I suspect that they’re really kind of missing the point, that the better choice under the full standard, unless you choose to use a 96-bit RFID tag, today or in the future, would seem to be to NOT restrict yourself in terms of character set, at least not that severely, because using only numeric digits is a serious restriction.

KT:  And if it’s only a computer that’s ever going to look at it, then I think what you’re saying is true.  However, there are situations where humans have to deal with those serial numbers.  You may be trying to diagnose a counterfeit situation or problem and you may be on the phone with someone and you may want to read a serial number to them.  Or you may be in the warehouse and you’re jotting down some serial numbers on a piece of paper and you’re going to take it back to another system to investigate.  You can imagine there are a lot of situations where a human may be involved.  Once there’s a human involved, then there are reasons to prefer a simpler character set.

For example if you also allow upper case letters, then you’ve got the issue that letter “O” and the numeral zero are often very difficult to distinguish—same thing with the letter “I” and the numeral one.  If you were to allow lower case letters, that’s even worse because upper case “O”, lower case “o” and zero all look rather similar to each other and, depending on the font, a lower case “a” may look kind of similar to that as well.  And then if you throw in all the punctuation characters which would give you the maximum capacity, then you’ve got things like a colon and a semi-colon which look pretty similar and so forth.

Now what sometimes people do is, they say, well, we’re going to use the 10 digits, and then we’re going to use 24 upper case letters…we’re going to exclude “I” and “O” because they look similar to one and zero.  That’s a fairly common thing to do, and if you do that, you’re much less likely to have a transcription error.  However, a 10 character random alphanumeric string is a lot harder for a human to work with than a string of numbers, or even a string of letters.  And the familiar example of that is if you’ve ever dealt with the reservation locator code that airlines use, it’s a string of alphanumeric characters, I think they exclude “I” and “O” so they avoid that, but if you’ve ever had to read one of those to a reservations agent you know how tedious that is, and that’s only six characters.  Whereas if you were reading off a 7-digit phone number you’d read it much faster than you can read that six character alphanumeric string in your reservation.  So there are human ergonomic reasons for restricting yourself to all digits.  But you set yourself up with a smaller capacity limit (and as we’ll get to when we talk about randomization), so that may be one of the places where you need to compromise because those trade-offs collide with each other.
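To put a number on that compromise, here is a rough sketch comparing the 34-character set Ken describes with digits only.  The 10-character length comes from his example; the question of how many digits would match it is my own framing.

```python
import string

# 10 digits plus 24 upper case letters, excluding "I" and "O".
alphabet = [c for c in string.digits + string.ascii_uppercase if c not in "IO"]

LENGTH = 10
alphanumeric_capacity = len(alphabet) ** LENGTH

# Smallest all-digit length whose capacity is at least as large.
digits_needed = 1
while 10 ** digits_needed < alphanumeric_capacity:
    digits_needed += 1

print(len(alphabet), digits_needed)  # prints 34 16
```

In other words, matching a 10-character string from the reduced alphanumeric set takes 16 digits, which is the kind of capacity versus readability trade-off Ken is pointing at.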

DR:  Exactly, and I wanted to have this discussion about the serial number in general prior to getting to randomization, so I think we’re ready to talk about randomization.

For the next installment of this interview, see Properties of Randomization.