MyClassNotes: Hardware Security - Side Channel Attacks and Countermeasures

Introduction to Side Channel Attacks

Side channel attacks are perhaps the most successful attacks to modern cryptographic systems for two main reasons. First, they target the weakness of the implementation of the crypto-algorithms, not the algorithms themselves. Therefore, the mathematically sound algorithms, can become vulnerable against side channel attacks. Second, these attacks are non invasive. Passive and to most of time will not leave any trace of attack. They use the signals leaked from side channels during system's normal execution, to reveal the secret information. So it is hard to detect and catch such, such attacks. This week, we will start a section of attacks. We will start with what are set channels, why they may mix information, how attackers can take advantage of this vuln, vulnerabilities to launch side channel attacks. We will discuss in details on four representative attacks. Cache attacks, power analysis, including both sink power analysis, and a differential power analysis. Timing attacks, and scan chain attacks. Of course, we will also talk about the counter measures from software, hardware and algorithms. Some background on electrical engineering will be helpful to understand the mechanisms of information leak through different types of side channels. To follow the example we're going to use, you need a good understanding of the con, concept of modular exponentiation and Montgomery reduction that we have to, learned last week. Some basics of assembling programming will be helpful for a couple of examples, and some entry level knowledge about the cache and memory structure are needed, too. I'll try to cover this background as needed during the lecture. Side channel attacks, normally have two phases, a marrying phase and a data analysis phase. During the first phase, the attacker will measure, will monitor the systems physical characteristics when the system is running at the normal mode. This physical, physical features can be power consumption, current, timing or delay, electro magnetic ignition, acoustic information, optical information etc. Then during the second phase the attacker will perform data analysis on the collected side channel data to determine the on-chip secret information of interest. Normally, further measurement will be collected to conf, confirm the deduced secret information. Side channel attacks, are non invasive because they do not require to open up the chip. Some of the attacks do not need to have physical access to the chip either. For example, the EM attacks and acoustic attacks. Side channel attacks are also passive as the attacker will be monitoring the normal operation of the chip. However, there is the trained of combining side channel attack, with active attacks, to improved efficiency and effectiveness of the attack. For example, by doing input control, the attacker can put the chip in the normal operation with his imputed data, and the attacker can use faulty ejection techniques to force the system to run at some of normal conditions. This will help the attacker to collect more valuable section information that are not necess, are not easily available through a passive monitoring process. There are many set channels that can be utilized by the attackers. The power consumption and execution time during the normal operation. The electromagnetic information, the optical, and acoustic emission, as well as the system's output signals. Next, we'll briefly discuss each of this. First, the power and the current side channel is perhaps, that the, the side channel that has been started extensively in the past twenty years or so. Chips power consumption mainly comes from three sources. The dynamic power, power that is needed to charge and a discharge the capacitors. The leakage current that will occur even when the system is idle. And short circuits and others. Information may leak from dynamic power consumption, because it is proportional to the effective capacitors of the switching activity. A switching happens, when we change a logical 0 to a logical 1 or logical 1 to a 0. This will incur higher power consumption than the case when the logical value remains at zero or one. On the other hand, leakage current may leak information about the system because the leakage current of a logical device is related to the input value to this device. For example, for this two input nano gate the leakage current on both input signals are one is about 454 nanoamp, about 12 times higher than the leakage when the input vector is 00. Naturally, timing or delay is related to the execution time of an operation. The execution time of different operations may be different. The execution time of the same operation with different operants under different conditions may also be different. This will give attackers some insight, about the system's internal information. For example, the control flow will leak data when different branches execute very different instructions. Take this if-else statement, for example. It will take longer time to perform the operation x equal to c minus d in the false branch, than the operation x equal to 8 in the true branch. If the attacker can manage to measure the execution of this portion of the code, whether a and b have the same value can be deduced. The data dependency nature of many operations can also leak information. Consider a very simple multiplication operation, x times y and assign the product to x. The execution time when y equal to a number such as 190, will be very different, in particular will be much longer than the case when y equal to zero, y equal to one, or y equal to 64. In the last case, because 64 is 2 to the power 6, so this multiplication normally is implemented by a logical shift which takes much shorter time. In addition monitoring execution time can also reveal whether cache misses and pipeline stores have happened. This may also leak the cache memory content, in particularly, the cache memory content. EM emissions originate from the acceleration of charges near a conductor or an, antennas. In the near field range, which is about two wavelengths from the antenna, the electric and magnetic fields are dominated by EM waves. However, their intensity drops quickly as the distance gets further away. Why data may leak from EM emissions? This is because they can modulate other on-chip signals through near-field inductive and capacitive couplings. This makes it possible to use the EM tracers as set channel signals to these cover operations inside the chip. Optical emissions come from the movement of hard carriers including both electrons and holes. When they move in a fat channel, they can cause a visible or infrared light emission. This can be captured by a charge-coupled device cameras and used to help IC testing and the debugging. If this information about that circuit can be used for testing and the debugging. It is also possible for an attacker to use the same information to extract information about the system. This type of attack can be used to detect the data from all kinds hardware devices, from microcontrollers and smartcards to FPGAs and ASICs. Acoustic information has been used to attack hardware systems for several decades. When a key is entered from the keyboard, or when certain system components are running, sound might be captured. For example, in the 1950s, British intelligence agents determined, the starting position of the Egyptian encryption machine by listening to the resetting of the key wells. In old spy movies we have seen spy movies we have phone numbers leaked from the dial tone. Text printed on certain dot matrix printers can also be detected from the acoustic side channel information. The reason for information leak from acoustic channel is fairly straightforward. The acoustic traces of different key being keyed on the computer are different. Microphones can capture this and other sound information through the system's components during execution. For instance, it was reported that on a home-built PC with RSA encryption implemented, the mic, a micro, when a microphone is placed close by, it can capture the acoustic information to tell when the RSA encryption algorithm is running. Furthermore the runs of the RSA with different key values give different acoustic traces. It is speculated that, the acoustic emission is the result of the piezoelectric properties of the ceramic capacitors used on the mother board of this system. This is related to the power and the current draw on the chip. Finally, we talk about the scan chain sidechannel. This is another perfect example to show how design features that are introduced for good intention can make the system vulnerable. This figure shows a core and a test, and the scan channel built to facilitate the testing. The original five D flip flops, have been modified and connected by this chain through the Q port, of the, of flip-flop to the input of the next flip-flop. Testing value, testing control signal, which is called the TC here can choose the input to the D flip-flop. When TC equal to zero the signal coming from core and the test, will be used to keep the system running at a normal mode. However, when we change the TC value to one, the system changes to the testing mode, and the testing input vector will come from the scanning port and forge to each of the flip flops through this scan chain. For example, if we want to tested the system with stage information zero, zero, one, one, one from left to right, we can push the input vector one, one, one zero zero, one bit at each clock cycle from the the scan in input report. And this, at the end, these, these values will go through the scan chain to the designated flip flops. For example, the first bit one will travel through the first scan flip flop to the second on the second clock cycle, and third one and the fourth one and eventually reach the last flip flop. After that, we can change the test control signal to zero to let the system run at a normal mode, and then the result will be written back to the flip flop. Now we change the test control signal back to, to one for testing mode and then we can get, the results from each flip-flop from the scan out signal. This is one of the most important inventions in the test. It allows the tester to test any system stage at any time. However, this scan flip-flops can be turned into side channel that will leak the system stage information directly, from the sent out signals when the attacker set the TC value to be one for the tasking mode.

Memory Vulnerabilities and Cache Attacks

As the first example of side channel attacks, we use the developing cycle of encipher to show the vulnerabilities in memory access. As I demonstrate several attacks, related it to cache memory. The development of a cipher starts with the algorithm designed by mathematicians or cryptographers. A good cipher, needs to be mathematically solid. For example, the RSA encryption, which is based on the heart integer factorization problem. Once the cipher is proven mathematically strong, software engineers can implement the algorithm with high, high level programming languages such as C. Sometimes, this C code will be fine tuned at assembling level to improve the performance. Then, the code can be put onto the appropriate computing devices ready for execution. Alternatively, the cryptoalgorithms will secure the particles. Can also be implemented directly on hardware. The hardware designers would write HDL code. Synthesize it, and then send it to the foundering for AC fabrication or run it directly on FPGA chips. HDL stands for hardware description language, such as Verilog. AS, ASIC is application specific integrated servers. Such as the, the chip viewed for cell phones. FPGA stands for field programmable gate arrays, which is prefabricated chips that can be programmed to implement different applications. Today we focus on the vulnerabilities of the software implementation of the cipher. This is a simplified architecture of a standard microprocessor or a computing device. It has it's memory hierarchy of the main memory, instruction cache, data cache and register files. This is the central processing unit we call CPU's. It has functional, functional blocks, has controllers and it has the arithmetic and the logical units. In a typical flow of the execution of a software or program, instructions in the data will be loaded from main memory, to instruction cache and the data cache. The register files, are the closest to storage to the CPU and thus have the fast action time. The CPU will take inputs, both instructions and the data, from this, caches and register files, and process them accordingly. The result will be read them back to the memory, either the cache or the main memory. Now assume that we store some of the secret data in the register files doing execution. And that see, that we visit the typical execution flow to see what will be the security vulnerabilities. First, memory load operation will get data from D cache to the register files. This needs the memory access, the memory address of the data. If the memory address is determined by whether it is related to the secret data the secret might leak from the memory address. When a secret data value, the, when the value of the secret data is overwritten by the data from the main memory, there will also be information leaked. For example, when the register file is reset to o 0s, it requires power to override over the ones in the previous registers. But there's almost zero power consumption on the bits that was previously zero. So this may review some of the old information which might be the secret data in the register files. Similarly while memory store operation is performed. Information might leak from ei, from either the memory address or the data to be written to the memory. During the arithmetic and the logical u, u, the logic operations, particularly when the operation is performed at bit level. The secret data may be exposed through that channel. We have seen this earlier from this example of this integer multiplication. Finally, we have also shown earlier that's the date, the data might leak from the control flow of the execution. For example, like a if else statement. Now we use a concrete example to show these vulnerabilities. This is an if, else statement which is very similar to the one we have seen earlier. The difference is, we changed the condition from a not equal to B to A less than b. Here is the control flow graph for this instruction. When variable a is less than b, it will take the true branch and assign constant value eight to x. Otherwise, it will be greater than or equal to b. And it, it will take the false branch, and assign the different between C and the D to X. Now we show the assembling code for this, this statement. The first is, the first line starts with a semicolon. This is the sign of commenting assembly. The next line is to get the address for variable a and store this address in register called r4. This load instruction will load the value of a from this register, from address that is stored in r4 and the value will be stored in register r0. Similarly, the next two lines will get an address of b, put it in r4, and then fetch the value of the address r4 and then store the value in r1. The CMP instruction is to compare the data stored in these two register, r0 and r1. Which stores the value of A and B. And, the next instruction it means that, when the register r0 is greater than or equal to the value stored in register r1. The execution flow goes to the branch. Which is labeled as F block. This slice shows the implementation of the true branch and the false branch. In the true branch, the move statement will move the constant of value X eight. To register r0. This is going to be the value we are going to assign to variable x. And, this is going to get address of x. And then store operation will store value which is occurring at 0 the value 8 to the address, which is the address of x. In the fast branch, we are going to execute the instruction x equal to c minus d. So first like before, we'll cache the address of c, and then get a value of c, store it in zero. And then we get the address of d, and just get the value of d, store it in r1. The next instruction sub is going to, go perform the subtraction. Is going to subtract the value of, in r1 from the value stored in r0. And the difference will be then stored in register r0. The next two statements are very similar to these two, which is the memory store that's going get the address of x. In that store, the valuing are zero, which is the difference between c and the d to add to x. Now, we use this portion of the assembling code to show the memory access vulnerabilities, we have mentioned in a previous slide. First, we see there's memory load operations. There are load on constant, or load the value from the main memory. And if either the address or the value are related to secret data, there might be information leak here. And similarly, we have seen memories do operations. Where we store the value to certain registers and then, then memory address. When, when either the address or the data value carries, carries secret data, these data may get leaked. And next, we have this instruction which is the subtraction, and as we have seen earlier. This sub subtraction that execution time may be different. It depends on the data value. For example always subtract this binary number, from this number. It will take much longer time, than with subtract this one zero which is binary two from this number. because there's no borrow here, but there's a lot of borrow here. Finally, if we count the number of instructions in the true blocks, in the true block, we have one, two, three instructions, but in the false branch, we have one, two, three, four, five, six, seven instructions. So if the attacker can get a hold of this part of the code. And then can monitor the execution time. He will be able to discover, whether A is less than B or not. If A is less than B, it is a very short execution. If A is not less than B, the execution time, might be much longer. Next, we will see how these vulnerabilities, can become attacked through cache memory. Cache attacks can happen when lookup tables are used in the ciphers. And also, the processor uses, cache memory. There are three types of cache attacks. In the trace-driven attacks, the adversary monitors the power consumption of the. Electron magnetic re, reduction traces. In order to figure out whether a cache miss or a cache hit has occurred. This can be used to attack small embedded devices such as smart cards. In time driven attacks, the attacker uses the execution time of that, the encryption operation to determine the secret key or other data. This normally requires a large amount of obs, observations. However, it can be done remotely. This types of attacks have been reported on sm, small micro controllers to large servers. The last attack is called access driven attacks, this happen on processors. The cache memory is shared by multiple processes. The attacker can use process running on the same host of the encryption process to reveal the memory access paten of the cipher, will steal the secret data stored in the cache directly. Now, we'll see a couple of examples. The first one is a trace-driven cache attack on the S-box. In the implementation of the DES algorithm in each round, there will be eight S-box access. The usage of each S-box depends on the input and the encryption key. Assume that each S-box is implemented as a look up table. The idea of this attack is based on the cache hit of these look up tables. In the first round, there's no S-box stored in the cache, so it will be eight cache misses. Like we have seen here, O to eight apps. However in the second round, there might be some cache hit such as the second and the fifth table in this case. Because the usage of the S boxes depends on the input data and the encryption key. Such collision on the lookup tables. Between two rounds, in particular the first of two rounds of the memory access can review information about the, the encryption key. This leads to the following attack where the attacker monitors the cash trace of the first two rounds of the cipher, assuming that the cash is cleared initially. First the attacker uses a random input in the first round. To break the table, to break the table for one S-box, the attacker selects input for the second round such that there will be a cash hit on the target tab, lookout table. Not all the secret keys can result in your cache key on this S-box or the two inputs in these two rungs. Therefore, these keys can be filtered out as the potential candidate for the secret key The attacker can repeat this process again and then again until there's only one valid key left. And that key will be the secret key. It was reported in 2003 that a 56-bit DES key can be revealed with only 2 to the power of 10 inputs. And a key search space of 2 to the power of 32. Much less than the boot first search space of 2 to the power of 56. Similar attacks have successfully broke, broken AES as well. The second attack is also based on, on tracing the cache miss and the cache hit. However, unlike the previous attack which compares the cache activities in the first two rungs of the cache with different input and requires old cache assisting the first rung. This attack attempts to introduce cache misses. First. A random input x is encrypted. We know that if the same x will be encrypted again, there should be a lot of cache hits of the x boxes because all the x boxes are now loaded in the cache. What the attacker does next is to invalidate a cache line occupied by the S-box. Then, when the same input adds is encrypted again. If the cache access to the invalidated cache line is required, there will be a cache miss. More details of this attack and how it has successfully broken AES can be found in this 2005 paper. Timing attacks can break a software limitation of a cipher. It is based on the fact that the execution time of an operation is related to the input and the secret key. Following the requirements for timing attack to be the successful. First, there must be operations whose execution time has timing variations. Second, these operations, and particularly the timing variations, must depend on the values of the secret key. The adversary should be able to measure the timing variations. The number of measurements may not be a constant. It, it depends on the, the amount of reviewed information from these measurement, and how much, how many measurements we need to break a cypher varies. The attackers should have a synchronization signal to identify the start, the starting and completion of the encryption. Finally, the design of the crypto-system and the implementation of the cipher should be known to the attack. There are many practical ways to prevent cache attacks. At application level, because cache attacks rely on the look-up table implementation of S boxes. If we do not use look-up tables, these attacks will fail. Also, most of the cache tags are unable to distinguish data in the same cache block. If we use small tables that can fit into one single memory block, less information will be leak. Cache warming is a technique that loads the look-up table to the cache before the execution. Or before them, in particular before the encryption to avoid cache misses. There are several approaches to change the path of memory access so it becomes ob, oblivious to the order that data is passed to the encryption algorithms. For example, when multiple access to different memory locations I needed for the encrypts, for the execution, we can use an arbitrary order of memory access to confuse the attacker. Another way is to request a data from each memory block and use only the one needed. This will leave almost identical memory activities for the attacker to monitor. However, it does in, introduce overhead in runtime and power consumption. The execution time of certain operations depends on the key but it can be modified to hide this, this dependency for example, in cache warming. Warm memory access will be cache me heat and this will give almost a constant execution time. Meanwhile it is also possible to introduce random delay to help us gage the execution time that can be measured by the attacker. Hardware can be modified as well to prevent cache attacks or make them much more difficult. For example, we can change certain memory pages to non-cacheable memory. This will make all the cache access miss and to prevent all cache attacks. Special instructions can be added to the instruction set in architecture, or ISA, or specialized cache can be designed to counter node cache attacks. Data prefetching is a popular performance enhancement tech, technique. It will fetch data from main memory to the cache before it is used. By pre pa, by pre fetching we can increase cache hit and confuse the adversary. This completes the discussion on cache attacks. Next time, we will talk, talk about power analysis attacks.

Power Analysis

We now discuss the power analysis attacks. We will cover both simple power analysis and differential power analysis. In simple power analysis, the attacker attempts to visually exam, the system's current or power waves forms. Do, during execution, your ordered to discovery information about the data operation. This is based on the fact that the system will consume different amount of power when perform different operations or the same operation with different input data. The power and the current it traces can be upturned by oscilloscopes, high frequency components that do not reflect the data induced variations can also be filtered out. For devices such as smart cards, the power and the current information can be read directly from the terminals. The oscilloscopes and other equipment to measure power or current are widely used in the design, testing, and the debugging of the systems. They are relatively inexpensive and they can provide a pretty high precision, of the measurement. This is the pseudo code, of the square and the multiply algorithm, that we have learned last week. It is one of the most efficient ways to compute the value of a to the power e mod n. Remember, that's this multiplication operation, will only be executed when ki equals to one. This multiplication is a non trivial operation and its data dependent execution variation, can be observed through such channels, such as power, current and timing. We have mentioned that this might lead to side channel attacks that can reveal the value of the secret key. Now, we will see how this can happen. Square, is a special case of multiplication. Since it is a very popular operation in signal processing and other applications. Many systems, have special implementation of square that is normally more efficient than multiplication. This results in measurable shorter time and less power consumption to compute B square than A times B. From this simplified power trace, we can distinguish the square and the multiply op, operations. The x-axis is time, the y-axis can be seen as either the power or delay. Square is short, and the two bars are multiplication. Now recall that multiplication is only executed, when the key value is one. We can easily retrieve the key value, from this trace. First, the two bars, corresponds to the power or current, for multiplication. Therefore, they, these bars and the shorter square bars immediately in front of them, are related to logical ones in, in the secret key. The other ones only square operations, they are corresponds to logical zero in the key. And this is a way, how we can retrieve, the secret key that has been used to compute this ex, modular exponentiation. Simple power analysis, not only can reveal the key data value. It can also leak information, such as which round of the encryption is performed. However, as we have said earlier today, simple power analysis relies on the visible information from the power or current traces without using analyzing tools. Large number of traces, do not help much. This also implies that the precision of this side channel information, should be relatively high. To deduce secret data about the decrypted algorithm, from the very limited visual evidence in the power and the current traces. A good understanding of the crypto algorithm and it's hardware, software implementation, will be critical to the success of a simple power analysis. Simple power analysis attacks, are noninvasive, because the attackers, will only collect the power and the current traces, during system's normal execution. They do not leave much evidence, of the attacking either. However, it is possible to compare power analysis attacks, with other passive attacks. For example, if the attackers can control the input signals, they can have the sys, system running with inputted data of their choice to obtain or verify the deduced information from power trace. A more effective way, is to use fault injection message. So, the system will run at an abnormal mode, and the power the traces, may leak more valuable information. Because of the simplicity of simple power analysis, there are many countermeasures, that can effectively prevent simple power analysis attacks, with, with little or no cost. Therefore, advanced power analysis method, known as the differential power analysis, has been proposed. Differential power analysis attacks, require a large amount of power or current wave forms. However, the precision of these wave forms does not need to be very high. Then, with this data, the attacker will build a model, with some hypothesis about the secret key information, advance the data analysis techniques, such as statistical message. Nobody in the form of computer softwares will be used to reveal the key information. Differential Power Analysis, can be performed in any algorithms, that use the S box with the exclusive or of the segmented key and some known data are for as input. For differential power analysis attack to be successful, the following conditions are needed. First, the attacker must denote the crypto algorithm under attack. It is well known that the security of for cipher or cryptosystem is based on how hard it is to break the secret key, not on which crypto algorithm the system uses. But in practice, knowing which crypto algorithm the system uses can be very, very helpful for the attacker. Second, as we have mentioned in the last two slides, differential power analsi, analysis attacks need a large amount of power traces, this normally implies that the attacker will have physical control of the system for some amount of time to collect such data. Third, unlike the superpower analysis attacks, that do not need much help from software tools, most d, d, differential power analysis attacks require some statistical analysis tools to study the data and extract the secret information. The two big advantages of differential power analysis attacks compared to single power analysis are that they do not need to know the hardware and the software implementation details of the crypto devices. And they do not need to the power and the current waveforms to be very accurate either. Now, we see something else about differential power analysis attacks starting with the data collection phase. The algorithm here in the bl, the black box or the key that is using this algorithm is the target of the differential power analysis attack. The attacker will run algorithm with the piece of input data M sub i, and to obtain output C sub i. Meanwhile, the power wave form W i, during the execution, will be captured. The attacker will then repeat this process to collect more power waveforms W sub i. Normally, from hundreds to hundreds of thousands of measurements might be needed for successful differential power analysis attack. This number depends on the crytosystem and their attack and also the. The quality of the [INAUDIBLE] information. After the data collection, now the attacker has a lot of triples of the inputs text M sub i, [INAUDIBLE] text C sub i, and the power [INAUDIBLE] curved waveform W sub i. Assuming that the attacker knows the cryptoalgorithm being targeted, and wants to figure out the key only, that function f be the operation, or transformation, this algorithm is performing on the inputed data N, with hidden secret key. The attacker can compute output L sub i when the input and M sub i is supplied or plugged to the function F with any possible key value of K. He then can select a bit position j in the output L i, and let's call this bit Ls of I J. The large amount of data the attacker has collected will then be petitioned into two disjoined, which means now overlapping subset as sub 0 and as sub 1. In S sub 0 it has all the triplers of Mi, Ci and Wi, where L sub ij is 0. And in Si, it has all the data where Lij has values 1, that the j speed position. Then the attacker can compute the average wave formation, if each of these two subsets, which is calculated here by summing them up and divided by number of piece of data's in each of the set. And then we can com, calculate the difference between these two average, we can them data, and also we call them as the DPA value. Based on this value, the attacker can build model, or they can do the hypothesis test to deduce to correct key. If an incorrect key K is used to compute the, the output L sub i, the values of L i and the new output, C of i will not have any correlation. Recall that the two subsets, S0 and S1 there are partitions based on the bit value of Lij. But the power wave form, Wi's, they come from the real computation of Ci's. In this case because the Li's and the Ci's can not correlate it. Therefore, in each of this set when we compute the average the power wave form collected will, will up zero. And those will, will, when hubbard is one. So, when we do the average we will have very similar values. So, their difference, which is the DPA value will be very close to zero. However, when the correct key K is used, the computage of L sub I and the real output C sub I should be the same. Therefore, the power wave forms, W I in the same subset should be similar. And the wave forms from different subsets should be different. This will result in a large gap between the average power of, for zero and the average power, or average current for one. The difference is the delta, is the delta, or the DPA value. So, we can locate the correct key with high probability by looking at the key values that cause peak values in DPA. This slide shows how DPA breaks, how differential power analysis attacks breaks 128-bit AES key or 16 bytes. As described earlier, first large amount of input. Text x sub i will be encrypted to generate the ciphertext C sub i. And then the power wave forms will be collected. Then for each byte K sub i, which has 8 bits, there will be 2 to the power h, with 256 possible key values. We use each of them as the key to compute the output decipher text, L sub i, then to the partition of the collected power wave forms based on L sub i, and to compute the DPA values. Because there are 16 bytes, so the total number of DPA values we have to compute will be 256 times 16, which is 4,096. The following figure, figures shows the result. Where we can clearly see there are a few picks. For example, the guess 1 here has the largest to pick, which indicates this, most likely, will be the correct key. It is not-trivial to compute the DPA value. Given that the power wave forms we will have to collect may be tens or hundreds of thousands. So, doing DPA type computation for 4,096 times is not an easy task. However this can lead us to break the 128 AES key. If we do the brute math search, the key space will be 2 to the power 128, this is much larger then 4,096, which is only 2 to the power of 12.

More Attacks and Countermeasures

We have started cache attacks and power analysis. In this lecture, we will talk about two other kinds of side channel attacks. Timing attacks and scan chain attacks, then we will conclude with some common countermeasures for side channel attacks. We will continue to use the RSA algorithm as an example to show these attacks. Here is a short review of the RSA algorithm. During the key generation phase, we pick two large prime numbers p and q, we take their product as n, and choose the encryption key, e to be relatively primed with e me p minus 1 time q minus 1. That we compute for the value of d such that e times d is equivalent to 1 mi p minus 1 times q, q minus 1. Given the value of p minus 1 times q minus 1, and the secret times the key e, the value of d will be unique. Next time we will talk more about this and give a practical message to compute d. The pair e and n, will be used as the public key, and and the d, value of d will be kept as the private key. The security of RSA algorithm relies on the assumption that it is difficult to factor a large integer number, n, into p and q. To encrypt a message m we compute m to the power e mod n and use this as the cypher text we call c. To decrypt the cypher text c, we compute c to the power d mod n and this will give us the original message, m. If we use the naive square and the multiply all of them to increment the modular exponentiation, we know that there will be security vulnerabilities. Based on this observation, a timing attack was proposed to detect the exponent which is the private key. The attacker will guess some bits of the exponent and use this to predict the execution time for decryption. Then he will try to monitor the decryption and compare the actual runtime with his prediction. If the guess is correct, there will be some correlation in the execution time between the prediction and the real execution time. Otherwise the prediction will look like random. It is also suggested to start at the guess from the leading bits or the significant bits. And use the correlation for each guess to select new guesses for the next round, until the exponent is found. This is the Montgomery reduction method we have learned the last week. It is a fast implementation of modular multiplication, and it can be used to compute modular exponentiation. When T is less then N times R, we need to do this, we need to this sub the, this is Montgomery Reduction will fall between 0 and N. However for large T when T is larger then N times R we need to do this subtraction multiple times to get its Montgomery reduction to be less then N. Schindler observes that the probability of this subtraction step is proportional to the value of c mod q. If c is close to q, a lot of subtractions might be needed. If c mod q is 0, very few subtractions will be needed. Knowing this, an attacker can guess the value of q by monitoring the decryption time for different ciphertext c. When the decryption time is short, the ciphertext will be close to the multiple of q and this will help him to find the value of q. Remember that the security of the RSA algorithm is based on the hard integer factorization problem. When q is discovered the algorithm will be broken because when we have q we can divide M by q to find the value of P. Here is a more detailed description of how to find the value of p. Suppose a, 1024 bit RSA system is used. Our initial guess of q of g will be between 2 to the power of 51 and 2 to the power 512, I'm sorry, will be between 2 to the power of 511 and 2 to the power of 512. Then we will try both the possible guesses for the top few bits. Now suppose we know the top I minus 1 bits of q are 101001. To guess the next bit which is the ice bit, we set g equal to 1010010 followed by all zeroes. And the g high to be 1010011 followed by all zeroes. If q is less than g high, then the i speed of q must be 0. If g, if q, is greater then g high, then the i-th bit of q must be 1. So another question is how can we determine rather q is less than or greater than, than ghi. We measured the decryption time with two guesses, g and a ghi. If q is greater than ghi like in this case we know that the two decryption times will be relatively close to each other. However when q is less than g high because most attractions will be performed for the case of g. Then the case of g high, we will expect a much bigger gap like this. So by comparing the decryption time, we can determine the next bid for q. By, by checking whether, whether q is larger than or smaller than g high. Then use this information to de, decide whether the next bit for two is 0 or 1. Now, we show two types of scan chain attacks very briefly. This is the scan chain and it's test of control signal TC we have discussed last time. We call that one TC equal to 0, the system will be added to normal running mode when TC equal to 1 it will be at detached mode. Attackers can, can, can make a tag based on the observability of the scanned art board as follows. First, he will set TC equal to 0, and provide certain primary inputs to the system and let the system run for one or more clock cycles. Then he can change the TC signal to 1 to have the system enter the test mode to capture the internal state of the system from the scan out port. And the attacker could repeat this process to collect as many information as he needed. So if this attack takes advantage of the of the ability of the Sget out port. The next one shows how attacker can take advantage of the controllability of the scan in port launch a different type of attack. So, this is another one based on the scan, another scan chain based attack that takes advantage after controllability of the scanning port. Notice that the only difference between these two is in the first step. If the attacker now can change the system to the test mode and use the scan import to inject what have whichever stage of the system to these scan flip flops. Note that this can also be used as a fault injection attack. Now, we discuss some com, common countermeasures for side channel attacks. Hiding based approaches, try to hide the secret information from the side channel so it will become more difficult for the attacker to extract use for data. Noise, noise generators can add noise to the side channel at execution time. For example, electron magnetic noise can be added. Additional power or delay can be added randomly to disguise the data dependence side channel information. On the other hand special logic units can also be used during the design phase to make the power with delay independent of the input data or the secret key to prevent side channel attacks. Adding as, asynchronous logic components in the design will also eliminate clock, and that will make many scan, scan, many, side, scan but many side channel attacks fail. Low power design techniques can reduce the power consumption of the system, and it can make the make the signals weaker. In some sense this will increase the difficulty for side channel attacks. Side channels can, can be shielded physically to prevent information leak. For example, upper level metal layer can prevent EM radiation and sound dampening materials can reduce acoustic ignition [NOISE] Another category of countermeasures is known as masking or blending. These approaches try to remove the correlation between input data or the secret key of from the side channel emissions. For instance at gate level, exclusive or in the output of a logical unit with some pre-selected data will make the real output will can mask the real output. It can also be applied at work level to give the logical units randomized input the data. However this will increase the design complexity because the internal design of the logical units now needs to be modified in order to generate the correct output. Design partitioning work separation can also help against the side channel attacks. Traditionally plan tags and the cipher tags should be separated in the memory which is known as the red black separation. On arm chip infrastructures like the power supply rails, clock networks and the testing facilities should also be separated for crypto our operations and other applications. The emergence 3D IC design add the split fabrication methods that we have discussed for IP protection can also be used to build separation at fabrication level. Finally physical sa, security can be used to keep the attackers away from the proximity, access, and possession of the system under attack. For example, acoustic shielding can protect acoustic emission. The secure construction zooming has been used for, for decades to prevent potential em emission attacks. We have seen many countermeasures from both hardware and software as well as system level design. Most of this will encourage that overhead with performance degradation. Sometimes a counter measure against the one type of stack strain of attack will make the system vulnerable to another type of stack strain of attack. Next time we will talk about counter measures directly from the design of the cryptography. So we don't have to change the well established de, de, the design and implementation process-

Modified Modular Exponentiation

We have learned modular exponentiations. And its security vulnerabilities. Today we're going to talk about, how to fix these security vulnerabilities from algorithm perspective. And this approach is called Randomized Modular Exponentiation. We know in modular exponentiation, our goal is to compute x to the power of d, mod N. And we know the attacker's goal is trying find the value of the exponents d, d. And the most popular implementation of this is the called square and multiply algorithm. And we have seen the vulnerabilities in that algorithm. And in the randomized modular exponentiation algorithm, there are several steps. First, we will chose three random numbers, r1, r2, and r3. And then we're going to change the three important numbers x, d and N in the modular exponentiation to x prime, d prime, N prime, following the following formulas. X prime equal to x plus r1 times N. d1 equal to, d-prime equal to d plus r2 time phi of N. Phi is the unitless phi function, which we're going to define next. N prime equal to r3 times N. And then instead of computing the modular exponentiation of x to the power d mod N. We compute x prime to the power d prime, mod N prime. And we call this result y prime. And after this, we do a modular y prime mod N, and we call the result y double prime. The claim we have here is. This y double prime is the same as y, which is the modular exponentiation we're trying to compute. Let's first do a quick review of the Square and Multiply algorithm for modular exponentiation. And then see, what is the vulnerability here. And the example we have here is we want to compute 15 to the power of 47, mod 26. The first step is to convert the decimal 47 to binary. And here's the standard procedure how to do the decimal, to binary conversion. We divide 47 by 2, the quotient is 23, and has a 1 as remainder. And then we divide the quotient 23, by 2 again it's a new quotient 11 and a remainder 1. Then we keep on dividing. The quotient by 2, and then get the new quotient get a remainder, and then keep on doing this. 5 divided by 2 equal the 2 with 1 as a remainder. And 2 divided by 2 with 1 as the quotient, and there's no remainder. And finally, 1 divided by 2, with 0 as quotient and 1 as a remainder, then we stop. So now with all these remainders we try to write these remainders from bottom to top, bottom up and then this new sequence is the binary representation of 2. Let me repeat again. So to convert a decimal number, 47 in this case, to binary, all we need to do is keep on dividing 47 by 2, get the quotient, get the remainder. And then keep on dividing the quotient by 2. Get a new quotient. And then keep on doing the division. And then, we keep track of all the remainders and write down them backwards from the first, last one to the first one. And this is the binary number for binary equivalent of 47. Now we use the left to right Square and the Multiply algorithm to compute 15 to the power 47 mod 26. We first write down the binary representation of 47. Start from the left most bit 1, 0, 1, 1, 1, 1. Remember when we have, when we do this square in the multiply algorithm, and when we move one page to another page, we square the current value we have. And then once the bit is a 1, we do a multiply by 15. So we start with 15, then we do 15 square because this is 0, so we don't do any further multiplication. 15 squared is 225, or modular 26. This is negative 9, or it is positive 17. Next, we do a square of negative 9, and because the bit is 1, so we multiply this by 15. The square of negative 9 is 81. 81 mod 26 is 3, because 78 divided 26 equal to 3. So 3 times 15 equals 45 and double 26 is 52, so 45 is 7 away from 52. So 45 equal to negative 7 mod 26. So next, we have another 1, so we square the negative 7 we have, and then multiply by 15. This gives us 49 times 15. 49 mod 26 is negative 3. And negative 3 times 15 is negative 45. And because 45 mod 26 is negative 7, so negative 45 mod 26 must be positive 7. And we keep on doing this. Square the 7 here. Multiplied by 15 because the big 1. So we've got 49 times 15 again which is 7. And we do this again. So eventually, we know that's 15 to the power of 47 mod 26 is equal to 7. However, we have pointed out there is a vulnerability in this calculation. Because, whenever the bit is 1, we do one step, which is the multiplication. Hence, if the the attacker, they can monitor the stat channel information, they can figure out this extra step. And then, reveal what is the secret key here, or the value of the exponent. Before we talk about, how to fix this with the randomized modular exponentiation? We're going to introduce the Euler's phi function as we have promised. Euler's phi function, is actually the number of positive integers less than or equal to n and relatively prime to n. And this stands for the great common divisor between k and n is 1, which means. They are relatively prime to each other. We'll see a few examples here. So first, phi of 2 is only a number 1, which is relative prime to 2, so phi of 2 equal to 1. For 3, we know both 1 and 2, they are relative prime to 3, so phi of 3 equal to 3. Phi of 5, all the numbers less than 5, 1, 2, 3, 4, 5, 4. They are relative prime to 5. So phi of 5 is 4. At this point, we can reasoning that for any prime number p, phi of p should be p minus 1. Because all the numbers from 1 to p minus 1, they are relatively prime to p. Next we see phi of 10. And we see number 1, 3, 7, and 9. These are the four numbers, that are relatively prime to 10. So phi of 10 equal to 4, which happens to be phi2 time phi of 5. Which is 1 times 4. And the last example is phi of 15. So we know that 3, is relative prime to 15. 5, 6, 7, 9 I'm sorry 9, 10, 12 and 15. Those are the numbers that are not relative prime to 15. These remaining numbers, they're all relative prime to 15. And if count these, we have 1, 2, 3, 4, 5, 6, 7, 8 numbers, which happens to be the product of phi of 3, and phi of 5. So for these two observations here, it happens they are not a coincidence. Indeed we have this formula here. So if we have two numbers m and n, they're relatively prime to each other. And so phi of m times n equal to phi of m times phi of n. Actually to prove this is pretty straightforward. So we first count how many numbers between 1 and, n times m and not, are not relatively prime to n time m. So from 1 to m, there m minus phi of m numbers, that are not relative prime to n from the definition of phi of m. And for each of these numbers, and if they multiply by n, all these numbers. And multiply from 1 to n, and the product will not be relative prime to n times n. So these are the numbers, that will not be relatively primed to m times n. And for the numbers between 1 and m, that are a relative prime to m. If the second number, which are not relative prime to n, their product will not be relative prime to m times n. It's follows the si, similar argument as the first term. So if we multiply this and then we see the phi of m times n with a minus. And phi of m times n with plus these two terms, they cancel each other. So, what we have is m times n, minus phi of m times phi of n. So these are the numbers between 1 and m times n, that are not relatively prime to m times n. So the number that are relative prime to m over n, will be m times m minus this quantity, which happens to be phi of m times phi of n which is what we have claimed here. To see a concrete example, we consider phi of 10. So, three of 2 which is 1 number, 1 here. Phi 5, which is 4, contains the number 1, 2, 3 and 4. So for the numbers that are not relative prime to 10 from the first term, we have 1 here. And this 1 can be multiplied by all the numbers between 1 and 5. So we have this 2 here, which is not relative prime to 2. 2 times 1, 2 times 2, 2 times 3, 2 times 4, 2 times 5. These are the five numbers that are not relative prime to 10. And then for the second term, for the phi m, which is this 1 here. And n minus phi, phi of n. In this case, there's only one number, which is 5, not relative prime to 5. So we get the number 5 here. So in total, we have one, two three, four, five, six numbers that are not relatively prime to 10. Which tells us phi of 10 equals 10 minus 6 which is 4. Indeed, we have the Euler's Product Formula, which we're not going to keep the proof. It says that, if we can do the factorization of n, we go to p1 to the power of k1 times p2 to the power of k2. Times et cetera to pm times, to the k, km where all this pi's, p1, p2, p of pm, they are distinct prime numbers. If we have this prime factorization of n, Euler says phi of n equals to n times 1 minus 1 over p1, times 1 minus 1 over p2, times...until 1 minus 1 over pm. The next important thing about Euler's phi function, is the one called Euler's Theorem. Which happens to be one of the most elegant theorem and number theories. So it says, we have two numbers, a and n. They are relatively prime to each other. And then we this phi of n which is the phi function. It says a is to the power of phi of n mod n equal to 1. And, let's see one very simple application of this Euler's Theorem. So let's see, we want to find the modular multiplicative inverse. If we have two n and m, a and n they are relative prime to each other. From Euler's theorem, we claim that the modulo inverse of a, which is a negative 1, equal to a of phi n minus 1. And to see this is relatively easy. So what we do a multiplied by a to the power phi n minus 1. This give us a to the phi n. And followed Euler's Theorem, this tells us this is one. So, we see a times a to the power of phi n minus 1 equal to 1 mod n. That is exactly the definition of modular inverse. To see example of this. We know 7 and 10, they are relative prime to each other. And we have phi of 10 equal to 4. From this we know that the modulo inverse of 7 mod 10, is indeed 7 to the power 4 minus 1, which is 3. So 7 cubed is 7 squared, which is 47 times 7, 49 mod 10 is 9, so this 9 times 7 which is 63. And modular 10, mod, mod 10 is 3. And to verify this, we see 7 times 3 equal to 21, which mod 10 is 1. So now we have all the necessary background we need. To learn the randomized modular exponentiation. So let's first see example, see how it works, and then give a formal proof of this. So this is the, the procedure to do the Randomized Modular Exponentiation where we select the three then, random numbers. In this case, let's pick r1 equal to 4, r2 equal to 7, and r3 equal to 5, randomly. [SOUND] Following the definition of x prime, d prime and N prime, we can compute these quantities. Oh, by the way, phi of 26 equal to phi of 2 times phi of 13 which is 12. So x primes equal to x, which is 15 plus r1, which is 4. Multiply by N which is 26. This give us 119. d prime equal to d, which is 47 here. Plus r2, which is 7 times phi of N which is 12. So this is 47 plus 81 give us 131. Finally N prime. Is defined as r3 times N. So in this case 5 times 26 is 130. So now we can compute the modulo exponentiation. Instead of doing 15 to the power 47 mod 26, we compute 119, which is x prime. To the power of d prime which is 131. And the mod m prime which is 130. So again, the first step, is to convert 131 in to binary. So this is a procedure, as we have outlined earlier. And I'm not going to go through the details again. And we do the, keep on divided by 2 and record all the remainders and write them bottom up, so we have binary representation. And then we can use the left to right square and multiply algorithm to compute and find out 119 raised to the power of 131, mod 130 is indeed 59. And now, we follow the last step to do a mod N, which is 26. So 59 mod 26 is 7. And what we see here, this is the same 7. As we have computed earlier. So when we take a look of this, we do compute a modular exponentiation during this procedure. However, this modular exponentiation to take us may see some information about 131. And this, from this they did not get much information about the secret of D here, which is 47. Let's now give a few more proof, of the randomized modular exponentiation. To show its correctness. So let's say we have two numbers a which is equal to c to mod p. So we can write a as c plus p times the integer k1. And similarly, if b equal to d mod p. We can write b as d plus p times k2. So the product of a and the b can be written as c plus p times k1 times d plus p times k2. And if we ex, if, if we break the parentheses, and when we do the mods operation of mod p. All these terms disappeared except cd, which means if a and, a and c they're equivalent mod p, b and d are equivalent mod p. And then their product, a times b, will be equivalent, will be the same as c times d mod p. So now let's see the modified or randomized modular exponentiation x prime, to the power d prime. So when we plug in the definition of x prime, which is x plus r1 times N, hence the definition of d prime, which is d plus r2 time phi of N. So we this, this equation here. And then once we separate the exponents, we get the first term times the second term. So let's denote the first term with a, and then let's do a mod N prime operation here. So if a equal to x plus r1 times N to the power d mod N prime, we can write a equal to this plus N prime, which is r3 times N times the integer k1. And if we mod N of this quantity. You see this term disappeared after mod n, and for this binomial exponent, by following the binomial equation formula here, for this term when we do a mod n the only thing remaining will be x to the power d for the second term. That's column b which is x plus r times N, raised to the power r2 times phi of N, mod N prime. And so b can be written as this plus N prime, which is r2 times N times k2. And if we do a mod N operation, so this term disappeared. And for this term, and because we do mod N. So the only thing left is x to the power r2 times phi of n. Now because of Euler's Theorem, we know this x to the power of phi of N mod N equal to 1. So eventually, what we get here is x double prime will equal to x prime to the power, raised to the power d prime, will equal to a times b. And a will be x to the power d times b, which is 1 mod N. So this eventually is x to the power d. Which is y mod N. So this shows that the randomized model exponentiation, gives the correct answer after- After model exponentiation adds to the power d. However, the secret, which is the exponence d will be hiden. The modular exponentiation we're doing here is, x prime to the power of d prime, mod N prime.

MyClassNotes

Monday, August 31, 2015

Hardware Security - Side Channel Attacks and Countermeasures - Week 4

1 comment: