Randomization—An Interview with Ken Traub—Part 5: Other Approaches

This is the last of a five-part interview with Ken Traub, GS1 standards expert and independent consultant, on GS1 serial number randomization.  The full series includes essays covering:

  1. GS1 Serial Number Considerations
  2. Properties of Randomization
  3. Threat Analysis
  4. Algorithmic Approach
  5. Other Approaches to Randomization (this essay)

This week Ken talks about other approaches to serial number randomization.  – Dirk.

__________________________

Dirk Rodgers:  Now, Ken.  Do you know if this kind of algorithm-based system can be incorporated into an SAP system?  It would have to be done by SAP, wouldn’t it, rather than by an external system?

Ken Traub:  The key thing is that, if SAP had designed their serial number interface to provide a list of serial numbers, rather than the first and last number in a range, then, yes, it would be easy to integrate a randomization algorithm.  The problem is, they built an interface that by its nature assumes that serial numbers are numeric, and assigned sequentially.  Because if all they’re providing to the edge system is a min and a max, the only way you can make sense of that is to understand that the numbers are to be filled sequentially in-between.

DR:  So users of SAP either cannot use the SAP serial number generator, or they’re forced to use numeric-only, sequential serial numbers.

KT:  Well, the issue is not so much the generator, it’s the interface to the edge system.  The problem is, if you go to an edge system vendor such as Systech, Antares, etc., and you ask if they can receive numbers from SAP, they say “sure, we support their interface.”  And you say, “well, what if I want to modify that so SAP gives you a list of serial numbers?”  Systech or whoever will say, “we don’t have an interface for that.  We only have an interface for a numeric range.”

There are two things you can do to overcome that.  One is to define a new interface to allow the central systems to supply a list, which could be in random order.   The other solution is, have the central system still supply a range as it does today, but then take that randomization logic—the permutation function—and push that into the edge.  So, going to my English and French example, you’d be delivering the English to the packaging line, and then the packaging line would have to know how to translate to French.
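The second option can be sketched in a few lines of Python.  This is only an illustration under loud assumptions: the six-digit number space, the MULTIPLIER and OFFSET values, and all function names are hypothetical, and the simple affine permutation used here is a stand-in that is not cryptographically secure.  A real deployment would use a keyed permutation of the kind discussed in Part 4.  What the sketch shows is the division of labor: the central system still delivers only a first and last number, and the edge maps each sequential number to the value that is actually printed.

```python
from math import gcd

SPACE = 10**6          # hypothetical serial number space: six decimal digits
MULTIPLIER = 387_421   # hypothetical secret; must share no factor with SPACE
OFFSET = 271_828       # hypothetical secret

# The affine map below is a permutation only if MULTIPLIER and SPACE are coprime.
assert gcd(MULTIPLIER, SPACE) == 1

def permute(n):
    """Map a sequential number to the number that is printed (stand-in, NOT secure)."""
    return (MULTIPLIER * n + OFFSET) % SPACE

def unpermute(p):
    """Recover the sequential number from a printed number (needs Python 3.8+)."""
    return (pow(MULTIPLIER, -1, SPACE) * (p - OFFSET)) % SPACE

def serials_for_range(first, last):
    """Edge side: turn the delivered range [first, last] into printable serials."""
    return [f"{permute(n):06d}" for n in range(first, last + 1)]

# The central system delivers the range 1000..1009; the edge prints these:
printed = serials_for_range(1000, 1009)
```

Because a permutation is one-to-one, no two sequential numbers produce the same printed number, and anyone holding the parameters can map a printed number back to its sequential origin with unpermute.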

DR:  Yeah, but then your central system wouldn’t know which French serial numbers were valid or not, because they can’t go backwards, right?

KT:  Actually, cryptographic functions are reversible, it turns out.  However, you raise a very important point.  Normally, you should never use your number allocation system to check the validity of serial numbers.  Usually what happens is, your number range management system delivers numbers to the manufacturing line, and then when you ship products, you gather EPCIS events, and those go in through your EPCIS database…or at commissioning time when you print the serial number and apply it to the package you may generate an EPCIS event.  And you later use that EPCIS event to decide whether a serial number is valid or not.  That’s because the manufacturing system can ask for a range of serial numbers from the central allocation system, but until it has actually manufactured the products, those serial numbers are not valid.  So the system that is allocating the serial numbers doesn’t really know which serial numbers are used, and in fact if you were to rely on that, that would open up a vulnerability where somebody who has access to the manufacturing equipment could have it allocate a range from the central system, but then avoid the production of that lot, write down the numbers and hand them to his confederate who now has a range of numbers that are “valid” according to the number range management system but haven’t actually been put on products.

So number range management systems should never be used as the source of truth for what numbers are on products.  You want something that ties to an actual observation of a product that has shipped…that has been legitimately manufactured, inspected and approved and shipped.  So going back to your question, moving the randomization logic to the edge doesn’t affect the use of EPCIS data because that data is based on the serial number actually on the product, not the number the numbering system is using to keep track of allocations.
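The distinction between "allocated" and "actually on a product" can be made concrete with a small Python sketch.  Everything here is hypothetical: the two in-memory sets stand in for a number range management system and an EPCIS repository, and the function names are invented for illustration rather than taken from any real EPCIS interface.

```python
# Hypothetical in-memory stand-ins; a real deployment would query an EPCIS repository.
allocated = set()      # numbers handed out by number range management
commissioned = set()   # numbers observed on real, manufactured product

def allocate_range(first, last):
    """Central system hands out a range; nothing has been produced yet."""
    allocated.update(range(first, last + 1))

def record_commissioning(serial):
    """Called when a serial number is actually printed and applied to a package."""
    commissioned.add(serial)

def is_valid(serial):
    """Validity follows from an observed event, never from allocation alone."""
    return serial in commissioned

# The line requests ten numbers but only three packages are actually produced.
allocate_range(5000, 5009)
for serial in (5000, 5001, 5002):
    record_commissioning(serial)
```

In this sketch, 5007 has been allocated but is_valid(5007) is false, which is exactly the case of numbers written down from an allocated range that never reached a product.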

This approach—using a permutation cipher to randomize—that’s one type of approach.  Now I think that people are not doing it for a couple of reasons.  One, they find it too hard to understand.  Two, the commercial products that offer this carry high prices and the end users can’t imagine implementing a permutation algorithm themselves.  Three, people will say, “well if we do it centrally then we have this interface problem to the edge, but if we do it on the edge, then it’s hard to update our manufacturing equipment to do it.”  So they run into a lot of practical problems.

The big part of the reason that the BayCoders of the world charge a lot of money is they’ll go in and do the system configuration and help overcome those issues.  But it is still seen as a kind of a big deal, and people are also a little leery about being locked into a proprietary solution or something like that.

So for a variety of reasons, a lot of people are not using the algorithmic approach.

DR:  What’s an approach that people are using?

KT:  The other approach people are using is, allocate a sequential block, and then select one serial number from the block and throw the rest on the floor.  So, if I wanted one number, instead of asking for one number I’ll ask for 10,000.   And then SAP, or the central system will give me a range of 10,000 sequential numbers, and then what I will do is randomly select one of those 10,000 and just drop all the rest on the floor.  And when I want the next number I get the next block of 10,000 and I’ll pick one at random and drop the rest on the floor.  That gives you sparseness.  It’s not fully random so it doesn’t actually thwart the attack of figuring out sales volume, but it does allow you to meet the 1 in 10,000 EFPIA requirement.
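A minimal Python sketch of this block-and-discard technique follows.  The function names are hypothetical, and Python's secrets module is one reasonable choice of random source, not something the interview specifies.  The last function illustrates why sparseness alone does not hide volume: serial numbers still grow over time, so dividing by the block size estimates how many units have been made.

```python
import secrets

BLOCK_SIZE = 10_000  # one serial number is kept from each block of this size

def pick_from_block(block_start):
    """Keep one randomly chosen number from the block; the rest are discarded."""
    return block_start + secrets.randbelow(BLOCK_SIZE)

def sparse_serials(range_start, count):
    """Consume `count` consecutive blocks and keep one number from each."""
    return [pick_from_block(range_start + i * BLOCK_SIZE) for i in range(count)]

def estimated_units_made(serial, range_start=0):
    """What an observer can infer from one serial: roughly how many came before it."""
    return (serial - range_start) // BLOCK_SIZE + 1

serials = sparse_serials(0, 5)   # five serials, one from each of five blocks
```

Each serial is a 1-in-10,000 guess within its own block, yet the list as a whole is still in increasing order.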

Now, our previous discussion of threats may lead one to question whether that EFPIA requirement is sufficient if the only attack they are really preventing is the one they don’t actually care about, but that’s an exercise left for the reader I guess.

So that’s one way of doing it.  Now, an equivalent way of doing the same thing would be, for each number you want, to get the number from the range management system and then append four random digits to it.  Then you would have a number that is like a random number chosen from a block of 10,000; it’s just that in the central system you’re keeping track of the blocks one at a time instead of individual numbers.  They lead to the same result; the only difference is what the numbers in the database reflect.  And Systech and companies like that, I think, offer out-of-box solutions to do that, and in reality, the way it works is, if they need 1,000 numbers they’ll ask the central system for a block of 10 million, and then when the 10 million numbers come back…or the range of size 10 million comes back, they go through each block of 10,000 and pick one at random and throw the rest on the floor.  So that is the more commonly used technique.
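The equivalence of the two views is simple arithmetic, shown here as a hedged Python sketch with hypothetical function names.  Appending four random digits to sequence number k yields a value somewhere in the block from k × 10,000 to k × 10,000 + 9,999, which is the same as picking one number at random from the k-th block of 10,000.

```python
import secrets

RANDOM_DIGITS = 4  # four appended digits correspond to blocks of 10,000

def append_random_digits(sequence_number):
    """Take one centrally managed number and append random low-order digits."""
    suffix = secrets.randbelow(10**RANDOM_DIGITS)
    return sequence_number * 10**RANDOM_DIGITS + suffix

def range_size_needed(count):
    """Size of the sequential range consumed to obtain `count` sparse serials."""
    return count * 10**RANDOM_DIGITS

serial = append_random_digits(123)   # lands somewhere in 1,230,000..1,239,999
needed = range_size_needed(1_000)    # 10,000,000, matching the figure quoted above
```

Dividing any resulting serial by 10,000 and discarding the remainder recovers the centrally managed sequence number, which is what the central database tracks in this variant.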

Those are basically all the techniques that I’m aware of that people are doing out there, although I’m sure there are other ways of doing things as well.

DR:  That’s great.

KT:  Another thing to point out is that, even though in all these examples we’ve been using numeric serial numbers, the same principles apply if you’ve got a bigger character set.  You just have to think in base 36, or some other base rather than base 10.  Again, I think people don’t do it because if you’re not a computer scientist you’re not trained to think in different bases, unless you really retained your “new math” from fourth grade when that topic was taught to you for no apparent purpose…if you’re of a certain age.  But it turns out that multi-base arithmetic is actually very handy for this type of work.
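For readers who want to see what "thinking in base 36" means in practice, here is a small Python sketch.  The 36-character alphabet (digits followed by uppercase letters) and the function names are assumptions chosen for illustration; an actual serial number scheme may allow a different character set.  The idea is that any of the numeric techniques above can run on plain integers, with conversion to and from the alphanumeric form happening only at the edges.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed character set
BASE = len(ALPHABET)                               # 36

def to_base36(n, width):
    """Render integer n as a fixed-width base-36 string, padded with leading zeros."""
    if not 0 <= n < BASE**width:
        raise ValueError("number does not fit in the requested width")
    chars = []
    for _ in range(width):
        n, digit = divmod(n, BASE)   # peel off the lowest-order digit
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

def from_base36(text):
    """Parse a base-36 string back into an integer."""
    return int(text, BASE)

# Six characters cover 36**6 = 2,176,782,336 values, versus 1,000,000 for six digits.
capacity = BASE**6
```

With these two helpers, a block of 36 × 36 = 1,296 values corresponds to "the last two characters" of a serial number in the same way that a block of 10,000 corresponds to the last four decimal digits.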

This is the stuff I think anybody ought to know if they are going to delve into this, and the fact it took us an hour and a half to go through it suggests the challenge people face because it actually is a fairly complicated topic.

DR:  Right.

KT:  And this was an hour and a half of fairly intense technical stuff that, again, I think, comes naturally to a computer scientist, particularly people who have dealt more in the mathematical side of computer science, but for the average supply chain person, this is…you know…I may as well be talking organic chemistry…

DR:  That’s all right.  You do a great job of simplifying, and if someone really wants to understand it, and they read this, I think they can understand it, because of your way of describing things.  It’s great.  Thanks, Ken.

KT:  Great, well the take-away for everybody should be what they really need to do is hire a good business and a good technical consultant.

DR:  Yeah, that’s right.  Exactly!

_____________

This concludes our five part interview series with Ken Traub on serial number randomization, but it is not the end of the RxTrace coverage of randomization.  Watch for new essays in the coming weeks covering the EFPIA serialization guidelines and the value of randomization in the U.S., but first we have a few other topics to catch up on.  — Dirk.