Physical Attacks (PA) Basics
Welcome back. This week we will first talk about physical attacks. From studying various ways of physical attacks, you will see the threats from hardware, or, how to break security from hardware side. You will also learn some of the countermeasures to these attacks. Probably one thing that you may find a very use for is how industry and government define and evaluate the security levels of a product. Most of the physical attacks require special skills and knowledge in addition to sometimes very expensive equipment and the tools. However, you do not need to have any special background to understand most of these attacks. Our goal in this lecture is to let people be aware of these attacks, not to teach them step by step, how to do such attacks. Next week, we will discuss in more details about side-channel attack, which is a very powerful physical attack. Over the years of lecturing on hardware security, I find it is very difficult to deliver this part of the material without a lab for demonstrations. For this Coursera course, I will also take away some of the images. Part of this course is developed based on a book chapter by Sergei Skorabogatov. On his web page, Sergei has also posted his lecture notes Physical Attacks and Tamper Resistance. If you are interested in this topic, you should definitely read this book chapter. Let's start with the question, what are the physical attacks? I cannot give you an accurate definition, but all the physical attacks have the following characteristics, or require the following conditions. First, to launch a physical attack, the attacker needs physical access to the chip, or at least should be able to contact, to contact, to connect certain wires in order to measure the target's signals. With more and more devices being connected wirelessly nowadays, physical attacks without physical access to the system have also been reported. For example, acoustic attacks, near field communication attacks, and some attacks to the RFID can all be performed close to the device without touching them. In addition to the necessary equipment and the tools, the attacker must also have the required skills and the knowledge to perform physical attacks. The second common thing about all the physical attacks is that they all have two phases. First, the Interaction phase where the attackers interact with some physical characteristics of the device to collect data. And then, the Exploitation phase, where the attacker analyze the collected information to reveal the secret and break the security. From this, without discussing any physical attack in details, what can we say about physical attack and hardware security? First, we can imagine that this requires, requirements make it harder to attack the system from hardware than from software or network. We have said that Physical attacks needs Physical access to the system. And those will most of the time need Specialized equipment, tools, and the knowledge to perform the task the, the attack. And normally, for software attack or network attack the, the bar is much lower. On the other hand, we know that the attackers loss will always be our gain. In this content, contexts this means that with the same costs and efforts investing on protection from hardware will most likely make the system more secure than adding security from network and software layers. This is also one of the reasons why hardware security has gained a lot of attention in the recent years. However, the downside of building security at hard lev, hardware is that this will also add another attacking surface. We have to make sure that the hardware is secure and trusted. Which means that we should consider the vulner, the vulnerabilities of hardware design and implementations. And make sure that the back doors will not become new threats. Attackers, not necessarily who launch physical attacks, can be categorized into three groups. Class I, clever outsiders. These attackers are often very intelligent, but they may not have sufficient knowledge about a system. They may not have access to the equipment and the tools necessary to break the system either. Class II, knowledgeable insiders. These attackers are familiar with the system, and have highly sophisticated tools and equipment to analyze and break the system. Class III, funded organizations. They normally have a team of expert, experts with all the necessary skills, tools, knowledge, and equipment to invent new attacks if necessary to break the target system. So what are the motivations for all these kind, kind of different attackers to break a system? There's one single motivation, money. Actually, this is also the motivation for almost all the attacks. Based on how money can be made from successful physical attacks, we can further partition the motivation for physical attacks into several groups. First, steal money or a service directly. Example of these type of physical attacks include, breaking smart cards and making bogus deposit, or breaking the TV set top box to watch programs without subscribing. Or, breaking game consoles to play games without paying. Second, make profit fits by sailing, selling or reselling products illegally. IP piracy, integrated circuits, overbuilding, and counterfeiting, are all examples of this kind. Third, post their own sales by interrupting or damaging the service provided by the competitors. For example, an attacker can insert malicious updates, patches or hardware tojan to make competitors device or system malfunction or suffer bad performance. This will give their own product an end, an unfair competitive edge. We have discussed IP protection last week, and we will cover hardware trojan later. This week our focus will be on how physical attacks can break the system. Particularly cryptosystems to steal service or money directly. When we say breaking a system, we do not mean physically destroying the system. Instead, the attackers want to learn information about the system that he doesn't have ac, the authority to access. In the case of cryptosystems, such information could be the secret key, or the secret data. In the case of design IP protections, it would be the design details. If on further discuss, discussing physical attacks, we would like to point out the difference between Physical Attacks and Cryptanalysis. They both have the goal of breaking the security. However, classic Cryptanalysis focuses on breaking the cryptosystem by mathematical analysis of the crypto, cryptographic algorithms. It had success in the past, for example, the birthday attack and the differential Cryptanalysis. But now, Cryptanalysis is becoming harder and harder. This is because most of the modern crypto systems are theoretically sound and unbreakable. Instead, there may exist flaws in the implementation of the crypto systems. Physical attacks attempts to find such flaws and break the system. Based on whether the targeted system or device will be damaged during or after the attack, Physical Attacks can be classified in three groups. Invasive attacks, this kind of attack requires direct access to the inside of the chip or device. Depends on whether after you attack, the system can be reassembled. The sys, the, the attack can be either Reversible, otherwise it called irreversible. Apparently, for irreversible attacks, it cannot be repeated. Normally, the device will be damaged after the attack. Sometimes tamper evidence will be, will left. The cost and required skills to perform invasive attack varies based on how the attacks will be performed on what kind of systems. But normally, the cost and effort will be very, very high. The second category is called a Non-invasive attack. In this case, the attacker will interact with the device, with a chip, with its interface, such as voltage, current, clock, input-output interface, etc. And such attack can also be Passive or active. Depending on whether the attacker only monitors and measures this physical measure, physical characteristics. Or, the attacker also injects input to the system to cause the system malfunction. Normally, such attack will not damage the device, and it will not leave any tamper evidence. Most of these thing, these attack, have very low cost and can be repeated. And in the middle, between invasive attack and non-invasive attack, there's another category called semi-invasive attacks. These attacks, normally they require access to the surface of the chip. But they do not need to create contacts with the internal wires. And usually did not damage the system, and depends on how the attacker is performed, they may or may not leave tamper evidence. And the cost of this depends on what kind of surface, what kind of interact they will have with the system. They can be high or low. But still be definitely much lower than the very expensive invasive attacks. And they also require some special skills. And most of the cyber invasive attacks they are repeatable. There are a few ways to perform physical attacks. And based on how physical attacks are performed, it can also be put into several different categories. Reverse engineering, this type of attack will study the chip's in, inside structure. Then trying to decide, determine the functionality of the system. Normally, this has very high cost, and the attacker should have similar capabilities of the designer. And normally have, also have the capability of manufacturing. And this is in general is in, invasive attack. The second group is called Microprobing. In this case it requires to direct access to the chip surface to observe, manipulate, and interact with the chip, and they are also invasive attacks. The third category is called a Fault generation. In this case, the attacker will generate, create f, Fault, normally a faulty input, or make the chip run eh, of normal environment conditions, and in the hope that the chip will malfunction, to leak information, or give the attacker additional access to the system. Depending how the fault is generated, the attack can be either be semi-invasive or non-invasive. And Side-channel attack, which we'll discuss next week, they are normally non-invasive attacks, and this type of attack has two phases. First, the attacker, while monitoring, will measure a chip's physical characteristics, such as power, current, timing, EM radiation, etc, during the chip's normal operation mode. And then, with the data collected, they will perform a data analysis to learn the hidden information. And finally, there's a category called Software attack. This again, it is non-invasive and the attacker will use normal input, output interface trying to explore known security vulnerabilities in the se, in the security protocols or algorithms were there software implementations to perform an attack. This attack, they are repeatable and non-invasive.
Physical Attacks and Countermeasures
We have learned who the physical attackers are, their motivations to attack, and the classification of physical attacks. Now we study the details of these attacks and some of the countermeasures. We start from invasive attack. Invasive attacks start with depackaging the chip or device to expose the inside of the silicon die. Decapsulation is the process of removing chip package. For complicated chips with multiple layers, deep processing will remove layer by layer to allow attackers to study features in each layer. Once the chip is opened up, reverse engineering can be performed to discover the inner structure of the chip and its functionality. This can be done with a optical microscope with digital camera to capture high resolution image of the chip's surface in each layer. In order to obtain more detailed information about a chip, microprobing station will be needed. This is perhaps the most expensive equipment required for invasive attacks. It can reach submicron precision. So electrical contacts with single on-chip bus line can be established without damaging them. This means that the attacker will be able to monitor on-chip bus activities, and they can also inject test signals and observe the chip's response. For example, secret keys with sensitive data can be extracted from memory by this method. Finally, with all this detailed information about the chip, a knowledgeable design team, and the silicon fabrication capability, the attackers will be able to rebuild or modify the chip to complete the invasive attack. For example, cutting certain internal interconnections can disable on-chip components such as encryption blocks. The cost to attack to low end chips such as old smart cards could be very low. But attacking modern chips requires expensive equipment and a very skillful attacking team. In addition, as the feature size keeps on shrinking for modern chips, and its design complexity keeps on increasing, the cost of invasive attacks also increases very fast. We have mentioned some equipment for invasive attacks. Here's the list for your reference. Next in the semi-invasive attack, the attackers also need to depackage the silicon to open up the chip. However, they will not need to create contacts with any internal signals. Therefore, the expensive microprobing station and the other equipment may not be required for semi-invasive attacks. Instead, the attackers will have to use other equipment to launch various semi-invasive attacks. For example, in the imaging attack, with the help from special cameras and the photon pro, probing, attackers can read the layout of the chip, localize the active regions, or read the logical state of certain CMOS transistors. In fault injection attacks, after locating the security fuses, the attacker can use UV light to reset these fuses to their unprotected states. In optical fault injection attack, the illumination of a transistor can make it conduct and thus introduce a transient fault or set to the value of a single bit in the memory. This can be done with some cheap laser pointers. Local heating can also permanently change some of the memory cells. With lasers, attackers can also write into SRAM or disable the write operation in the embedded flash memory. This is called memory masking. We will discuss side channel attacks in details next week. Optical equipment and techniques can make such analysis much more accurate and deadly. For example, laser can be pointed to a particular transistor. This allows the attacker to collect the power trace before and after using laser and then compare. These are the set of tools and equipment that might be useful for non-invasive attacks. There are four types of non-invasive attacks. Side channel analysis. This is the only passive attack as it only monitors the chip's execution and it collects some measurement. Brute force attack is a strategy for search information on the chip. It is normally an organized search, instead of a random search. For example, attackers can search for secret keys and password if they are not sufficiently long when their memory address can be predicted or are restricted to certain areas. They can also try to rebuild the truth table of the input or output pairs to reveal the functionality of the design. Of course, this will be limited to relatively small designs. By applying high voltage, normally it doesn't have to be very high, just about twice as the normal supply voltage or injecting random signals or commands, attackers may gain access to factory testing or programming mode. These access are known sometimes as backdoors. The other two types of non-invasive attacks are known as data remanence, remanence and fault injection. We will discuss them late, next. These last attacks all need interaction with the chip. And therefore they are all active attacks. When secret keys or data are stored in SRAM, they may leak in several ways because of the physical characteristics of, of SRAM. First. After power is down, SRAM will still retain the data for a short period of time, so the attacker can try to steal the data during this period. On the other hand, when data has been stored in the same location for a long time, it may appear after the chip is powered up again. This gives the attacker the chance to view the data. Finally, data in SRAM can be frozen in low temperature. Most commercial devices consider negative 20 degrees Celsius or below as tempering stage. But there are ways to make data freeze at higher temperature and then read it out. Data rem, remanence can also happen in EEPROM or flash memory. The threshold voltage will change after each write and then erase in such memory. It has been reported that data can be extracted after multiple write and erase cycles. The idea behind fault injection attacks is to manage the chip to run with faulty data or unexpected instructions, and then observe the execution to gain unauthorized access to systems or learn secret data. It is possible to input faulty signal directly from the input/output interface. But secure systems can easily detect such faulty input before the execution starts. So how can attackers inject faults? There are actually many ways to do that. We will talk about clock and voltage glitches in details on the next slides. By changing the temperature, or exposing the chip to white light or laser, the states of the transistor or of, or the flip flops may change. This can also be done by using X-ray, ion beams, or EM flux. A glitch is a fast, fast change in chip's supply signals, such as supplying voltage and the clock, clock signals. It may affect some random transistors or flip-flops. Although the attacker may not have control on which transistor or flip-flop will change, it is not very hard to do a systematic search to find security holes. For example, a shorter clock pulse cause incorrect instruction fetch in Motorola controllers. And power supply glitches can break 128 bit AES key in 2 to the power 12 attempts. Which is about eight hours at a 100 millisecond cycle. If we do it selectively at carefully chosen time slots, not continuously, the number of attempts can be reduced to further to 2 to the power 7, or only about a couple of minutes. Fault injection attacks are based on faults. So fault-tolerant computing techniques can be used to defend both at software level and hardware level. However, this does come with overhead in hardware and performance. Also, sometimes, detecting fault injection attacks after the attack might be too late, the damage may have already been made. For example, attackers may have already learned information about sensitive data. It is preferred to detect the faulted condition before the system executes under such conditions. This is a very challenge problem. Now we talk about some of the countermeasures to invasive attacks. Some of them may also be applicable to semi-invasive attacks. First, recall that attackers can probe at a single base line to steal data. Almost all of the databases that connect the CPU and the memory are either in the increasing order from zero page to the highest page, or in the opposite order, from the highest page to page zero. This makes it easy for the attackers to locate the baseline they want to probe. We can change this order to confuse the attacker, which is known as bus scrambling. We can also encrypt sensitive data to prevent an attacker from probing it. Of course, the data of the encryption needs to be decrypted in a trusted zone before it can be used. Another way to defend invasive attack is the new glue logic design approach. In this design style, standard building blocks such as register files, instruction decoders, arithmetic and logical units, and input/output circuits are glued together like a ASIC instead of individual components. And this hides the data, the data bus, so it becomes impossible for attackers to find the signal to attack. To protect from invasive attacks, most of the smart card, smart cards have a sensor mesh implemented in the top metal layer. All the paths in the mesh are monitored continuously. Microprobing attempts will, will cause short circuits. This will trigger an alarm and the memory contents will be reset to protect sensitive data.
Building Secure Systems
After learning of physical attacks, and some counter measures, we discuss how temporary resistance levels are evaluated and also how to build an enhanced system security. The tamper resistance level or system security level have always been measured by the time, cost and the knowledge that the attacker has to have to break the system. From high to low a device tamper resistance can be put into six different levels. The lowest level is zero, where the attacker does not need any special equipment or tools to break the system. The attacker can be done in minutes or a couple of hours. These systems do not have any security features and all of their components are open. Examples include microcontrollers or FPGA chips with external memory. And the second lowest level is called low. This systems has some basic security features that are easy to break. The microcontroller with internal memory and proprietary programming algorithms, but no security mechanisms to protect the memory fall into this, this category. The attacker does need some basic equipment and tools to break these systems. It may take them several hours or a couple of days. The cost was under $1,000 in the original 1991 document. But the price of the equipment keeps on dropping. And nowadays an attacker can obtain all the necessary equipments with probably less of $500. The next level is called moderate low. These other systems that have security features designed against the low cost attacks. It will take more expensive equipment and longer time to break the security. System in the moderate temporary resist, resistance level normally are protected against most of the common attacks, special tools, skills, and the knowledge are needed to break such systems. Rather expensive equipment is also required. The time to break the systems can be several months. Finally, systems designed for security applications such as military chips, banking systems, complex ASIC and FPGAs with high or moderate high level of temporary resistance. They are supposed to be secure against all the known attacks. It is normally impossible or extremely difficult for any individual to break these systems. Sometimes, new attack models or new tools have to be invented to break these systems. As we have already seen, the cost for attacks drops as the price of equipment and the tools goes down. So the classification of these levels are relative. Some device with moderate to high tamper protection levels may one day become slow level low while a new low cost attack is found. Also technology advances such as the use of new materials with a new design process will increase the tamper resistance level of certain devices. The US Department of Commerce has also published its standards of how secure a crypto module is implemented. Level one is the lowest level. It must specify the basic security requirements for the crypto module. Level two improves level one by adding physical security mechanisms such as coating, seals or locks. Level three requires more advanced physical security to keep attackers away from critical security parameters in the crypto modules. Level four is the highest security level. It should ensure the security of the crypto module against all kinds of physical attacks. Here are some examples of the security failures. Your microchip PIC microcontroller, a security fuse bug is found, was found, where the security fuse could not be reset, could be reset without erasing the code or data. And this has been fixed in newer devices. In Hitachi smartcards, the information leak on a product CD was reported. Where the full data sheet about the smartcard is accidentally included in the CD. Actel secure FPGA chip, a programming software bug was, was reported. Of such chip, devices were always programmed with a specific passkey. A software update can fix the problem. Also a programming software bug was found on Xilinx secure CPLD. The security fuses is incorrectly programmed, and they need a software update to fix the problem. Finally, in the Dallas SHA-1 secure memory, a factory initialization bug was reported. Some security features were not activated and it failed to provide the required protection. It ends up that batch of chips were all record, recalled. From this set of slides, we have learned that there are different types of attackers and the different ways to attack a system. Before you build your secure system, you have to ask yourself what will be the motivations for others to attack my system. Who might be the attackers, and how would they attack my system? More specifically, building a secure system start with understanding the attackers and to perform, a system security evaluation and threat estimation. You need to take into account the attacker's motivation to attack, the tools, equipment, skills, time, and money they need to break the system. And also, the probability they can be successful. A rule of thumb is that if the cost that they have to pay to break the system is higher than the benefit they can get, then a rational attacker will not attack, attempt to break the system. Once you find the weakest security link in your system and decided, decided to improve it, you may find that there are too many products on the market. Most of them, claim to be secure or with a security, sec, certific, cert, certification. However, claims and certification are not guarantees. In many cases, there's no security benchmark, and the attackers will always keep on bringing new and a powerful attacks. So we cannot test the system against all the possible scenarios to verify its security. Even worse, it may not make much sense to compare the security features of components from different vendors. Cheap manufacturers, they know more about the security of their products. However, they will only ad, advertise the good features. Just like most of the sales persons like car dealers and realtors will do. Good luck if you trust 'em. So you may find that sometimes you have to redesign for security and the trust of the system. This is clearly not what you want because of the cost of redesign. Particularly when you face the pressure of released deadlines. That is one of the reasons we have seen, and will continue to see, many products with defect on the market. In the worst case, when, when recall have to be made, and the cost of recall may be higher than redesign. This brings us to the conclusion that building secure system needs a sys, system engineer approach. And you have to build security and the trust into the system, not onto an already built system by adding security features.
Modular Exponentiation (ME) Basics
Welcome back to Hardware Security. In this lecture, we will learn a mathematical operation called modular exponentiation. It is a very popular operation in modern cryptography. However, it is also very computationally expensive in terms of hardware implementation. So what we'll learn today is, we're going to learn the definition of modular exponentiation. We will show a couple examples of how it can be applied for security. We will show a couple ways to see how it is computed and how it is implemented. And we'll try to explore whether there will be any security vulnerabilities in such implementation. For you to understand this lecture, you only need to know how to multiply two integers. However, for you to complete the quiz, the weekly assignment, you need to know how to convert a decimal number into binary. The modulo operator is always about trying to find the remainder. So when we do a division, for example, 7 divided by 2, we say the 3 is a quotient, but we have a 1 as a remainder. So mathematically, we write this as 7 is congruent to 1 mod or modulo 2. Sometimes, I also call this one 7 equals to 1 mod 2 or 7 mod 2 is 1. mathematically, we call, we define a is congruent to b modular n if and only if the difference of a and b, which is a minus b, can be evenly divided by n. So once we understand what modular is, so what is the modular exponentiation? So the goal is in this operation is to try to find out what is the value of a to the power e mod n. For example, once we have a to the power 4, we say a to the power 4 equal to 16. So a to the power 4 mod 10 equal to 6. So that is a simple, fairly simple operation we're, we're doing here. So now let's see a couple of simple ways to implement the modular exponentiation operation. So first it is the straightforward way by definition. First we compute a to the power e, which we call it a variable b. And then we do b mod n. And this gives us the result. For example, when I calculate 2 to the power 10 mod 10. What I do first is, I raise 2 to the power 10. I get this 1024 as the result. Then I do a 1024 mod 10 where I know the remainder, remainder is 4. So that is the result, okay. So the second way I do is, this is the one I call derivative modular and exponentiation. It is based on the following observation. So you have two numbers x and y. And if they are congruent to each other mod n and then if I multiply them both by a, they are still congruent to each other. So, following this observation, I can limit the value of the numbers, which I do multiplication. For example, I'm not going to multiply anything larger than n. Whenever they are larger than n, I'm going to do a modular operation. For the same example here, let's say I try to calculate 2 to the power 10 mod 10. I start with this, which is 2 to the power 1, which is 2. So this is fine, this is less than n, so I continue doing multiplication. The second 2, 2 times 2 is 4. This is still less than 10, so I do the multiplication. 4 times 2, which is 8, it is the third one. And then 8 multiplied by 2 again becomes 16. So now 16 is greater than n, which is 10. What I do now is I do a 16 mod 10. So this give me a 6, okay. So this is 2 to the power 4. And then I continue doing this for 2 the power 5, 6, 7, 8, 9, and then 10. So what I do is, I continue from here. This 6 multiplied by 2 give me 12. And 12, again, it is larger than 10. So I do a mod by 10, and I get the remainder, which is 2. So starting from this 2, I multiply by 2 again. So 2 becomes 4. 4 multiplied by 2 becomes 8. 8 multiplied by 2 becomes 16. 16, again, becomes larger than 10, so I do another modular operation. And this gives me a 6. From 6, I multiply it, becomes 12. I do a modular, becomes 2. And then from this 2, I multiply by 2 again, give me a 4. Okay? So by this method, I also get a 4 here. So both methods give me the correct answer. And you will see I can, this is much better because, for example, I don't really have to find this large number. The largest number I have used here is 16 for a couple of times. So this is, is on, is going to be a very, very memory efficient implementation. However, the problem comes if we are trying to calculate a large number raised to a large power. For example, in this case, what is the value of 34,987,317 raised to the power of 10,357,198 and then mod 510,926,533,897. So it only, it only, it's already taken me some time to read all these numbers, so how can we do these multiplications? In particular, if you think about this, no matter we're doing the first method or this second method, we have to multiply these things e minus 1 times. So once you have a large exponents like this, how can you multiply this many times efficiently? That is one of the problems we need to solve.
ME in Cryptography
So now we have seen what is the definition of modular exponentiation and a couple of simple ways to implement it. And in the couple, the next couple of slides, I'm going to show you two examples of how it can be applied for modern cryptography. And I'm sure that professor Jonathan Card has showed you more examples about this in his course on cryptography. The first example is on the Diffie-Hellman Key Exchange. In this case, there are two people, Alice and Bob. They want to talk to each other, but they want their message to be encrypted. So in that way they want to share the same key. So what Alice is going to do is she's going to find a random number x sub A. Then she's going to raise a base a to the power X sub A, and doing the modulo q operation. And she's going to call this one Y sub A. What Alice is going to do is, she's going to send Y sub A to Bob, the partner she's going to talk with. And then meanwhile keep X sub A as a secret for herself. On the other side Bob is going to do exactly the same. And which I highlighted everything to green, just trying to distinguish what Bob is doing compared to what Alice is doing. So Bob gets X sub B and then raise A to the power of X sub B, and call this result Y sub B. And he's going to send the Y sub B to Alice and then keep X sub B as a secret for himself. Now, Alice has Bob's Y sub B and she has his own X sub A. So how Alice is going to generate her shared key here? So here is the key generation protocol. So what Alice is going to do is, she's going to raise Y sub B, which is the message coming from Bob, to the power X sub A, which is the secret she keeps for herself. And I'm doing this things under the mod operation. And on the other side, Bob is going to do the same. And the y just gave them the same key share the key. This is the underline mass for this one to be true here. So from Alice side what she calculated is a to the power X sub B. Which is this Y sub B, raised to the power X sub A. Then from Bob's side, what he has computed is Y sub A to X sub B. And where Y sub A is actually a to the X sub A, and then raised to the power X sub B. So what you see here is, for this inequality on the left side, left hand side, you have a to the power X sub B times X sub A. On the right side you have a to the X sub A and then X sub B. And we know these two actually they are the same, so that is the reason in this way how they can share the same, the same key here. So however we're realizing this protocol, both Alice and Bob, they have to do this modular exponentiation many, many times. Here, here, and here. Both for them to generate to the message to send to each other and also trying to generate their key. So this is a very, very quick going operation for both parties. And the second example we're going to show is the, is the public key encryption scheme, called RSA, which is named under its three inventors. So this is an asymmetric encryption scheme, which means the encryption key and the decryption key, they're different. So in this protocol, what each user has is a pair of keys. A public key and a private key. So one user want to do an encryption. So, for example, I want to send a message to a certain user, let's say Bob, who has this pair of key. So what I'm going to do is, I'm going to encrypt a message using Bob's public key, which everybody knows. And then this message comes from P, become encrypted, becomes the ciphertext C here. When Bob receives the ciphertext C, he will be able to decrypt it using his private key and the decryption algorithm. So he decrypted the ciphertext C and then get the message the P back. For this part of it to work, we need the following requirement here. For the same message P, after we, after I encrypted it using Bob's public key, and then after Bob decrypted it using his private key, this message should remain the same. So that is the underlying protocol for this, RSA public key, public key encryption. So currently the key size for RSA is about 10,000, 1,024 and, but very soon it's going to be doubled or even I mean quadrupled. So let's see how, what, what is the encryption and the decryption algorithms, what schemes RSA is using here. So in the RSA algorithm, what user is going to do here is, is going to select two prime numbers, p and q, and then they're going to multiply these two prime numbers. And then they are going to choose a small number, or maybe not that small, smaller than n for sure, so it's a number e which is relatively prime to p minus 1 times q minus 1. And the next step is a little bit of challenge. So what you need to do is, you need to find another integer d, such that e times d is equal to 1 mod p minus 1 times q minus 1. 'Kay? And once it, this is done, the user will have a pair of public key and private key. And with this public key and private key pair, we can do encryptions, we can do decryptions. 'Kay. And what you would realize in this case, the encryption function is a modular exponentiation. The decryption function, again, it is a modular exponentiation. And why it works in this case? So, we see what happens here is, so if I have a message X, I do the encryption, and then I do the decryption. And from the fact, from number theory, we know that when x raised to the power e times d, and as long as this e times d, mod p minus 1 times q minus 1 equal to 1, and then this x to the power e minus d mod n equal to x. Which means you can get the original message back. So again from this example we see that we need to calculate modular p exponentiation. We need to compute modular exponentiation. So next we are going to show how we can, we can compute this one efficiently. So this is another small example to show, how this works in real, real time. So we select p equal to 11, q equal to 17, is our two prime numbers, and then following the definition, n is the product of these two which is 187. And then p minus 1 which is 10, q minus 1 which is 16, so their product will be 160. I choose 7, which is relatively prime to 160 and then I find out 23 is the magic number, that's 7 times 23 mod 160 is, is 1. So this will be my public key and this will be my private key. So now if I want to do an encryption. I want to encrypt the message x equal to 88. All I need to do is, I raise x to the 88 to the power 7, which is the public key. 'Kay? And then this give me, and when I do the mod, 187, give me the ciphertext 11. So, I'm going to send this message 11 to whoever is the receiver, who has this 7 as the public key. So when this person receives the ciphertext y equal to 11, he or she is going to raise this to the power of their public, of their private key, 23. And then do this modular exponentiation operation and then find out the message is 88. [SOUND] So now the question remains is how do we compute 88 to the power 7, or 11 to the power 23 mod 187? And in general when RSA has a large key size of 1000 bits, or maybe 2048 bits how can we compute this modular exponentiation efficiently?
ME Implementation and Vulnerability
We have define the modular exponentiation and we showed how it can be applied into first security applications. Now we are going to show how it can be implemented efficiently. And we are going to sh, show that in such efficient implementation whe, whether there'll be any security leak vulnerabilities. So these are the two simple ways we have seen before how we can implement a to the power e mod n. The first one is the one exponentiation and modular, where we first do the exponentiation and then we do the modular. And the second one is the one we called iterative exponentiation and modular. We keep on multiplying by a, and whenever we got a product larger than n, we'd do a modular operation to keep the value less than n. So, for software engineers, or for people who design crypto-algorithms, what they care is, they care about the correctness of the algorithm. They, what they want is, they say, I want to calculate a to the power e mod n. And it is the job of hardware designers to implement this one efficiently. Efficiency in the terms of computation time, how fast you can find out the results. In terms of energy consumption, how much energy, or how much power it's going to take to do such operation. And for hardware designers, we have been doing our job in terms of how to optimize design all the time. I mean, this particular time, one question we want to ask is, can we do multiplication less than e minus 1 times? Because in both these two approaches you have to multiply a with some number for e minus 1 times. And the answer for this is yes. And I'm going to show you the motivation behind this algorithm here. So first, let's see how do we do, well, do we compute a to the power of 16? And you could say okay, I'll do these things, use 15 multiplications. With 16 a's, I'm going to multiply them together, with 15 of these multiplication operators. And then you also realize, a to the power 16 actually equals a to the power 8, and then to the power 2 again, will square again. And then if you try to generalize this, you realize a to the power of 16 is a squared to the power of 2, to the power of 2, to the power of 2 again. So basically, now what you are doing here, if you follow this operation, you're doing square operation four times. Or you multiply the number with itself four times. So we have, we have reduced the number of multiplications from 15 to 4 by doing this implementation. And however, you are going to say okay, so all [INAUDIBLE], because 16 is a power of 2. What happens if it is not? So for example, if I ask you to compute a to the power 23 instead of a to the power 16. [SOUND] In this case I can say okay, I can also try to make this one more efficient, in the sense that I realize a to the power 23 equals a to the power 16 times a to the power 4 times a to the power 2 times a. So there's no need for me to do 22 multiplications. I can calculate a to the power 16. I can compute a to the power 4. And I can compute a to the power 2. And I multiply them together. So, I want to see, I mean, how many multiplications we are doing here, but definitely less than 22. Okay so this motivates us this algorithm which is called a square and multiply, as I showed you earlier, so try to do all this squares, again the multiplication of the power of 2s, and then multiply them together. And depends on the way how you implement this. There are two ways to implement. One is called the left to right implementation. The other one is called the right to left. So the next two slides I'm going to show you, I mean for both of these two approaches. So the first one is the left to right implementation. And our goal is trying to compute a to the power e mod n. The first step what we need to do is, we want to convert the exponent e to binary, so in this case it is s plus 1 bit number. What we do is we set the result b equal to 1. And then we are going to run a loop here. And the loop is going to start from bit s, which is the left-most bit. Or sometimes, people will call it most significant bit. And then it's going to go down all the way to the last bit, the right-most bit, which people call the least significant bit. So what we do for each page is, doing a loop, we do a square of b, and then we check, if this bit k of, k sub i is 1. If it is 1, we're going to do one more multiplication. If the bit is 0, we're not going to do it. And then we do the loop for this many times. At the end, we're going to return b as the result. So I'm going to demonstrate the of this algorithm by showing a very small example here. Let's say a to the power of 23, as we mentioned from the previous slides. So the first thing I'd do is, I'd convert 23 to binary. Which is 16 plus 4 plus 2 plus 1. So following this pseudo code here I start b equal to 1, and I'm going to start i from the most significant bit, which is this bit 1 here. So when this b is 1, what I do I come inside the four loop, I do a square of b, so it become still 1. And then because this bit is 1, so I'm going to do a multiplication, a times 1 gives me a a. So this is the value of b. So after the first bit, b becomes a. As I move on to the second bit. So I'm going to do a square, a square, from this operation. And then since this bit is 0 I'm going to stop here. Move on to the next iteration. So for the next iteration I'm going to do a square of the current b which is a square, so this become a to the power 4. And because now this bit is 1, so what I'm going to do is, I'm going to multiply this by another a, so now I got a to the power 5. And then I move to the next bit, which I first to do a square operation. So this becomes a to the power 10. And then because this bit is 1, so I multiply this by a again. So it becomes a to the power of 11. As I had reach the last bit, I do the square of a to the power 11. This gives me a to the power 22. And this last bit is also a 1. So I need to multiply this into the power 22 by a again, which gives me a to the power 23, which is exactly what you want to compute. And for each of these steps we are doing these things under the modular operation, which is similar to the one I taught, iterative exponentiation and modular. Whenever I get a number which is larger than n, you do the modular operation. So next, we show the other implementation, which is the one called a right to left, square and multiplication algorithm. The first step is the same. Convert e to a binary number, s plus 1 bit. And the difference between the previous one is highlighted in green. In the next step instead of setting b equal to 1 I set b equal to the power of the last bit, the least significant bit. Initially the starting value of b will either be 1 or a depends on the value of k sub 0. If k sub zero is 1 we start with a. If k sub zero is 0, b starts with a to the power 0, which is 1. Okay? And now also put c, another variable I may need in the middle, because initial value to be a. So what I do now is I start i from 1, the second least significant bit from the right, and move all the way, to the, to the left. That is why it's called right to left implementation, and I move all the way to the most significant bit, okay? So at each bit position what I do is I do a square of c, which is the intermediate variable with, we initialized here. And then if the bit value of k equal to, k sub i equal to 1, I'm going to multiply b by c here. And then I move on to the next iteration. Okay? So let's see from the same example again, a to the power 23. How we calculate this? 23 we know it is 10111 in binary. And then in this example here, c starts with a, and then b starts with a to the power of k sub 0. And our k sub 0 is 1, so b also starts with a. And then I count, start from this bit here. What I do first, I do, I do c equal to c square. So c becomes a square. And because this bit is 1, so I do a a, which is b, the value of b here. So a times c, which becomes a cubed here. Okay, I then move on to the next bit. And I do a square of c, so c is a square, now it becomes a to the power 4. And this again, is a bit 1, so I'm going to multiply the current b with c. The current b is a cube. The current c is a to the power of 4. So what I've got is, I've got a to the power of 7. The next bit is a 0, so I'm going to square the current value of c, which becomes a to the power of 8, and I'm not going to do anything on b. And the last bit is a 1, so I'm going to square the current value of c. So a to the power 8 becomes a to the power 16. And because this is a bit 1 here, so I'm going to multiply the current value of b, which is a to the power 7, by the current value of c, which is a to the power 16. If you multiply them up, you get a to the power 23, which is exactly what we're trying to compute. Okay? So this is a different approach, but they have the same as, they are doing these things which we call the square and multiply, which shows, inside a for loop, you have a square, you have a multiplication. 'Kay? So how efficient this will be? So think about in this case, we do a to the power of 23. If I do the naive implementation, I need to do 22 multiplications. However, if I convert this 23 into binary, I know this loop is going to run one, two, three, four times. So each time I'm going to do a square. So I'm going to do four squares. And then how many multiplications I'm going to need to do? That depends on how many 1s I have here. In this case I have one, two, three 1s. So three multiplications. I did not count the least significant bit, because that has been used here. In my for loop, I start from the second bit. So in this particular example, by doing this square and multiplication algorithm, I need only seven multiplications. And compared to this 22, I use less than one-third of it. So now let's see what is any potential security vulnerabilities in such implementation. So one important thing we observe here is, in this case, what we have here is we have this particular operation, this block, which is going to be executed only if, and only if the bit, the key bit equal to 1. And this operation is multiplication and the modular. And we mentioned that if in the size of key size, in the size of n and the oldest members for for Diffie-Hellman, they're O large numbers. So this will be a non-trivial operation here. 'Kay. And this operation is only going to be done when the key bit is equal to 1. And this leads us to the funnel-, follo-, following thinking here. So what happens if you implement this on your smart card, on numerical devices? And what, your taggers, they can observe from side channel. They can observe, for example, the execution information about this algorithm, whether the key bit is 1 or 0. If it is 1, it's going, probably going to take longer time to do this computation. Is probably going to consumes higher power. It have to have higher current to do that. If it is a 0, probably it doesn't take any time. It doesn't consume that much power. So this is least to us what people call side channel attacks. And what you have seen here is, we have this modular exponentiation operation, which is required for crypto algorithms or crypto primitives. And they're proven to be strong enough. However, when we implement this we implement for performance. Trying to make it faster, trying to make it energy efficient. However we've realized it makes some difference when the key bit equal to 1, or the key bit equal to 0. This leaves a loophole, or leaves vulnerability for attackers. They can attack through what people call side channel attacks.
Montgomery Reduction
We now learn another way to do fast modular multiplication. This method is known as Montgomery Reduction. Let R and N be two integers that are relatively prime to each other. And R is greater than N. For any integer T, which is greater than or equal to 0 and less than the product of N and R, the Montgomery reduction of T mod N with respect to R is defined as the modular multiplication of T and R inverse mod N. Where R inverse it the multiplicative inverse of R mod N. Which means R times R inverse equal to one, mod N. From this definition, we see that Montgomery reduction is nothing but a modular multiplication and, mod, mod N. The following algorithm actually tells us there are some other ways to compute this quantity. We start with some modular modification between T and the negative and inverse mod R. And we call this value m. Next, we add T and m times N, and the sum will be divided by R. The quotient, t, will be compared with N. If N is less than or equal to t, N will be subtracted from t, and the final value of t will be report, reported as the Montgomery reduction. Before we show this claim is true, let's see what is interesting in the Montgomery reduction algorithm. The goal of the algorithm is to compute the Montgomery reduction, which is the modular multiplication and the mod N. But in this algorithm you don't see any modular multiplication in the mod N. What we do see is one modular multiplication in the mod R. This feature actually gives us a way to speed up the modular multiplication, which we'll show later on. Now let's prove this result t is indeed the Montgomery reduction. To prove this is true we have to show two things. First t times R must equal to capital T and the mod N, and second, t must be between zero and N. To see the first equation, we start with the definition of t, and we multiply both sides by capital R. This gives us t times R equal to T plus m times N. When we do a mod N operation, the second term disappeared, so the only thing left is T as required. If t is obtained by subtracting N from the original t, we see this equation still holds. Because it is entered a (mod N) operation. So subtracting or adding a N doesn't change anything. To see the second condition which tells the range of t, we start with the definition of m. So, m is defined as the remainder of T times megative N inverse divided by R, or the mod R operation. From the definition of remainder or mod operation we know m must be between zero and R. So if we multiply this inequality by N, we know zero is less than or equal to m times N and less than N times R. This combined with the range, we select a T, so we have the following inequality. T plus m times N must be greater than or equal to 0, and also less than one N times R plus another N times R, so total less than 2 times N times R. So if we divided this inequality by R, what we get is T plus m times N divided by R will fall into the range of zero and 2 times N. And this quotient, it is exactly the definition of T. So we know that one T is less than, less than N, it is what we need. And if t is greater than or equal to N, and following the definition, N will be subtracted from t, again putting the, the value of t into the right range. This completes the proof. It tells us, indeed, the Montgomery reduction algorithm does give us the correct answer from Mon, from Montgomery reduction. When we plug in the definition of m into the equation for t, indeed we get the Montgomery Reduction algorithm's formula. Where the reduction equal to T plus T times negative N inverse mod R times N divided by R. And pay attention here, at the end we have a mod n operation. And remember in the procedure we do have mod N. So this mod N here is actually a shorthand in the notation. Remember, when we do this division divided by R, we know that if the quotient is less than N we don't do anything. If the quotient of t is greater than or equal to N, we subtract the N from t. This is exactly the modular operation. So that is why we use mod N as a short-handed notation to indicate this condition and this subtraction operation. Next we'll show how we can use Montgomery Reduction to simplify, or to speed up, the computation of modular multiplication, of modular multiplication. So to apply Montgomery reduction, so let's pick integer R, which is greater than N and relatively prime to N. And then we compute the following quantities in order. First, we compute an inverse mod N, and then we do the modular multiplication between a and R mod N, and similarly we do another modular multiplication between b and R mod N. And the last of the two operations, are the Montgomery reductions. First, we compute the Montgomery reduction of a prime times b prime mod N, with respect to R, and then this new result is called c prime. We can calculate the Montgomery reduction of c prime. After we do this, we return the value of c, and the claim is c is indeed the modular multiplication of a times b mod N. And to see this, we start from the definition of c and then plug in the definition of c prime. So what we get is c prime times R inverse equal to a prime times b prime times R inverse times another R inverse. And because this multiplication, they are commutative, so we pair up a prime with one of the R inverse to get a prime times R inverse, which is a, following the definition here. And similarly, b prime times R prime gives us b, which shows this is indeed the modular multiplication. So what we see here is we pick this R satisfies only two conditions. First R must be greater than N. And second, R and N must be relative prime to each other. And so, as long as these two condition are satisfied, we can pick any R we want. So when R happens to be a power of two. So what we know that in digital computer, the multiplication of R in this case will be a shift of k back to the left by k bits. And the division by R with the modular R operation will be a shift to the right by k bits. So all these operations will be trivial operations. So that is why, if we are following the Montgomery reduction algorithm and pick R as a power of two, then this gives us a great potential to optimize the modular exponentiation. So now let's see example to see how this can be performed. Let's see, we want to compute 68 times 57, modular 109. So in this case, we have a equal to 68, b equals to 57, and N equal to 109. So we pick a R which we want to have a power of 2 and greater than N. So we pick R equal to 128, which is 2 to the power 7. And then we compute an inverse and their mods R, which give us 27 101 here. Because 109 times 101 equals to 1 modulo 128. And when we put a negative sign here, it becomes 27. So now we compute a prime, which is the modular multiplication between a and R, in this case 68 times 128 which is 93. And similarly, b prime will be defined as 57, which is the value of b times R, which is 128. And the result is 102. And next we compute using the Montgomery reduction formula to compute the Montgomery reduction of a prime times b prime. So a prime times b prime which is the T here, plus the T times negative N inverse which is 27 mod 128 times N, which is 109, and then divided by R which is 128. And if we simplify this, we realize the result is 178. However, this is greater than N, which is 109. So we subtract 109 from here. We got the result, 69. Next, we do another round of Montgomery Reduction of c prime. So what we do is, we plug in the value of c prime into the formula here, we've got c equal to c prime, which is 69, plus 69 times 27 mod 128 times 101, 109 divided by 128. And this give us a result of 61. To verify this indeed is the result, the correct result, so we used the naive, straightforward multiplication and modular. So when we multiply 68 and 57, we got 3876. And then when we do a mod 109, we got 61. Which is exactly the same value here.
howdy, your websites are really good. I appreciate your work. hardware shop online
ReplyDeleteThank you for another great article. Where else could anyone get that kind of information in such a perfect way of writing? I have a presentation next week, and I am on the look for such information. hardware store in singapore
ReplyDeleteVery interesting blog. Alot of blogs I see these days don't really provide anything that I'm interested in, but I'm most definately interested in this one. Just thought that I would post and let you know. best hardware store Singapore
ReplyDelete