FACULTY OF INFORMATION TECHNOLOGY
TON DUC THANG UNIVERSITY
THE MIDTERM OF PROBABILITIES AND STATISTICS
RESEARCH ON ENCRYPTION AND
DECRYPTION
Instructor: MR. NGUYEN QUOC BINH
Student: MAI BAO THACH - 520H0490
Class: 20H50304 Course:
HO CHI MINH CITY, 2021
VIETNAM GENERAL CONFEDERATION OF LABOR
TON DUC THANG UNIVERSITY
FACULTY OF INFORMATION TECHNOLOGY
THE MIDTERM OF PROBABILITIES AND STATISTICS
RESEARCH ON ENCRYPTION AND
DECRYPTION
Instructor: MR.NGUYEN QUOC BINH
Student: MAI BAO THACH - 520H0490
Class: 20H50304 Course: 24
HO CHI MINH CITY, 2021
After working for half a semester with the enthusiastic help and support of Mr. Nguyen Quoc Binh, I was able to complete this report as fully and effectively as I could. His teaching has given us students a great deal of knowledge as well as the skills needed in this specialized subject. Although a couple of months is quite short, that time has helped me approach the major step by step on a solid foundation, especially with the encouragement and help of seasoned lecturers.
I sincerely thank
REPORT COMPLETED AT TON DUC THANG UNIVERSITY
I hereby declare that this is my own report, carried out under the guidance of Mr. Nguyen Quoc Binh. The research contents and results in this topic are honest and have not been published in any form before. The data in the tables used for analysis, comments and evaluation were collected by the author himself from different sources, which are clearly stated in the reference section.
In addition, the project also uses a number of comments and assessments, as well as data from other authors, agencies and organizations, with citations and source annotations.
If any fraud is found, I take full responsibility for the content of my report. Ton Duc Thang University is not responsible for any copyright violations caused by me during the implementation process (if any).
Ho Chi Minh city, 09 April, 2022
Author (sign and write full name)
Mai Bao Thach
Confirmation section of the instructors
Ho Chi Minh city, day month year (sign and write full name)
The evaluation part of the lecturer marks the report
Ho Chi Minh city, day month year (sign and write full name)
REPORT COMPLETED AT TON DUC THANG UNIVERSITY
TEACHER'S CONFIRMATION AND ASSESSMENT SECTION
SUMMARY
LIST OF DIAGRAMS, CHARTS AND TABLES
CHAPTER 1: INTRODUCTION
1.4 Asymmetric cryptosystem:
CHAPTER 2: MONOALPHABETICAL SUBSTITUTION CIPHER
2.1 Definition of substitution cipher:
2.2 State the problem:
2.3 Idea of solution and algorithm:
2.4 Example and evaluation with analysis:
CHAPTER 3: FREQUENCY ANALYSIS
3.1 Definition of Frequency Analysis:
3.2 State the problem:
3.3 Idea of solution and algorithm:
3.4 Example and evaluation:
CHAPTER 4: EXPERIMENTS ON PYTHON
AES: Advanced Encryption Standard
LIST OF DIAGRAMS, CHARTS AND TABLES
Picture 4 Example of cipher keys
Picture 5 Monoalphabetic Substitution Cipher Illustration
Picture 6 Frequency table
Picture 7 Values for Encryption
Picture 8 Source code of Encryption algorithm
Picture 9 Validation of letters
Picture 10 Algorithm
Picture 11 Importing testcases
Picture 12 Generated cipher keys 1
Picture 13 Testcase 2
Picture 16 Testcase
Picture 17 Generated cipher keys 3
Picture 18 Testcase
Picture 19 Generated cipher keys 4
Picture 20 Frequency table in Python
Picture 21 Import from input values file
Picture 22 Source code for Decryption algorithm
Picture 23 Code of checking existence
Picture 25 Code of replacing letter with frequency table
Picture 26 Testcase
The process operates like this:
Sender -> Plain text -> Encrypt -> Transfer -> Decrypt -> Authorized users
Moreover, encryption is crucial for any individual or organization that wants to prevent hackers from stealing sensitive information or code.
Here is an example:
When a bank wants to deliver someone's credit card and account numbers, that information needs to be encrypted in order to reduce the possibility of theft. The study of how to encrypt, together with its applications and experiments, is called cryptography.
How does it work?
Research has shown that the strength of encryption depends heavily on the length of the security key. In the 20th century, developers first used 40-bit encryption, with 2^40 possible keys, and then the 56-bit format. Today, however, attackers are powerful enough to break these formats easily with a brute-force attack, which led to 128-bit systems becoming the standard key length for encryption.
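To make the effect of key length concrete, the short sketch below computes the number of possible keys for each length mentioned above. The guessing rate of one billion keys per second is an assumed figure chosen only for illustration, not a measurement, and the function names are hypothetical.

```python
# Number of possible keys for a given key length, and a rough
# brute-force estimate at an ASSUMED rate of 10**9 guesses per second.
SECONDS_PER_YEAR = 365 * 24 * 3600
ASSUMED_GUESSES_PER_SECOND = 10**9  # hypothetical attacker speed


def keyspace(bits):
    """Return how many distinct keys exist for a key of `bits` bits."""
    return 2 ** bits


def years_to_try_all(bits, rate=ASSUMED_GUESSES_PER_SECOND):
    """Return the years needed to try every key at the given rate."""
    return keyspace(bits) / rate / SECONDS_PER_YEAR


for bits in (40, 56, 128):
    print(f"{bits}-bit: {keyspace(bits)} keys, "
          f"about {years_to_try_all(bits):.3g} years to try them all")
```

Each extra bit doubles the number of keys, which is why moving from 56 to 128 bits changes a brute-force search from feasible to practically impossible.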
For instance, the Advanced Encryption Standard (AES) is a specification for data encryption established in 2001 by the U.S. National Institute of Standards and Technology. AES uses a 128-bit block size, and key lengths of
What is Decryption?
This concept is the reverse of encryption. It is part of a cyber security scheme that makes it hard for hackers or thieves to steal information they are not allowed to read. Decryption transforms the cipher text back into the original text, so that people who hold the decryption key can read and understand it with the right tools. Encryption relies on a coding function to make the data unreadable. Encryption protects the data, but whoever accesses it must have the authorized tools to reach the plain data; decryption can therefore be done manually or with a decoding application.
What are the types of Decryption?
In this section I will introduce just one type, AES. AES is exceptionally efficient in its 128-bit form, and it also uses 192-bit and 256-bit keys for heavy-duty encryption. AES is widely believed to be resistant to all attacks except brute force, which attempts to decipher messages by trying every possible combination of a 128-, 192-, or 256-bit key. Cyber security experts nevertheless maintain that AES will eventually be regarded as the de facto standard for encrypting data in the private sector.
What is the advantage of Decryption?
There are many reasons to use decryption, but the main one is still clean and reliable management within an organization. This method helps cyber security reach a whole new level of information protection, as it reduces confusion when reading and understanding the data.
What is the process of Decryption?
The data, in the form of cipher text, is delivered to the receiver. After that, it can be converted from the random-looking code back into the original form of the data.
The simplest way to model the probability of a system is through symmetry. For instance, the idea of a "fair" coin means that there are two possible outcomes that are indistinguishable. Since each outcome is equally likely, the result is 50/50 heads or tails.
Similarly, for a fair die there are 6 possible outcomes, all equally likely. This means each of them has probability 1/6. The idea of symmetry is also behind random sampling. If we want to understand a population, we can take a number of random cases, and they tell us something about the whole. However, this is only true if the sample is random with respect to the properties we are measuring. That is, if we swapped individuals at random, we would be equally likely to measure them.
Another model is a spinner, similar to a roulette wheel. The model is that a fair spin is equally likely to land anywhere on the circumference of the circle. So by symmetry the probability of an outcome is proportional to the length of the arc it subtends on the circle. This is also proportional to the angle of the arc, which is proportional to the area of the sector.
What is asymmetric cryptography?
Asymmetric cryptography is a cryptographic scheme that pairs a public key with a private key in order to encode and decode information and to prevent hackers from stealing access or sensitive data.
The public key is a cryptographic key used by the sender to encrypt data, which the receiver then decrypts with their private key (the private key is kept secret by the receiver and is never shared).
Many protocols depend on asymmetric cryptography, including TLS (Transport Layer Security) and SSL (Secure Sockets Layer), which make HTTPS possible.
The purpose of using asymmetric cryptography is to increase information protection. This technique does not require the private key to be published when encrypting, so we can protect our information and keep the data out of cybercriminals' reach.
How does asymmetric cryptography work?
Asymmetric cryptography is commonly used to authenticate data using digital signatures. A digital signature is a mathematical technique used to validate the authenticity and integrity of a message, software or digital document. It is the digital equivalent of a handwritten signature or stamped seal.
Based on asymmetric cryptography, digital signatures can provide assurances of evidence of the origin, identity and status of an electronic document, transaction or message, as well as acknowledge informed consent by the signer.
What are examples of asymmetric cryptography?
The RSA algorithm, the most widely used asymmetric algorithm, is embedded in SSL/TLS, which is used to provide secure communications over a computer network. RSA derives its security from the computational difficulty of factoring large integers that are the product of two large prime numbers.
Multiplying two large primes is easy, but the difficulty of determining the original numbers from the product (factoring) forms the basis of public-key cryptography security. The time it takes to factor the product of two sufficiently large primes is beyond the capabilities of most attackers.
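The asymmetry between multiplying and factoring can be seen in a toy example. The sketch below uses deliberately tiny primes (61 and 53) and a naive trial-division helper named `factor_by_trial_division`, both chosen here only for illustration. Real RSA keys use primes hundreds of digits long, for which this kind of search is infeasible, and real implementations also add padding that is omitted here.

```python
# Toy RSA with tiny primes, for illustration only (NOT secure).
p, q = 61, 53                 # illustrative small primes
n = p * q                     # public modulus: multiplying is easy
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent (needs Python 3.8+)


def encrypt(message, public_exponent=e, modulus=n):
    """Encrypt an integer smaller than the modulus with the public key."""
    return pow(message, public_exponent, modulus)


def decrypt(cipher, private_exponent=d, modulus=n):
    """Decrypt an integer with the private key."""
    return pow(cipher, private_exponent, modulus)


def factor_by_trial_division(modulus):
    """Recover the two primes by brute force; feasible only for tiny numbers."""
    candidate = 2
    while candidate * candidate <= modulus:
        if modulus % candidate == 0:
            return candidate, modulus // candidate
        candidate += 1
    raise ValueError("modulus is prime")


cipher = encrypt(65)
print(cipher, decrypt(cipher))          # the ciphertext, then 65 again
print(factor_by_trial_division(n))      # (53, 61)
```

Anyone who can factor `n` can recompute `phi` and therefore `d`, which is why the size of the primes matters so much.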
are moving to a minimum key length of 2048 bits
Encryption (used to protect sensitive information)
CHAPTER 2: MONOALPHABETICAL SUBSTITUTION CIPHER
2.1 Definition of substitution cipher:
A substitution cipher is a type of encryption in which characters or units of text are replaced by others in order to encrypt a text sequence. Substitution ciphers are part of early cryptography, predating the development of computers, and are now relatively obsolete.
In a substitution cipher, a letter such as A or T is replaced by another letter, which effectively hides the sequence from a human reader. The problem is that simple substitution ciphers do not encrypt effectively against computer analysis: with the rise of the computer, substitution ciphers became relatively easy to crack. However, some of the ideas behind the substitution cipher live on: some forms of modern encryption may use a very large text set and a highly sophisticated substitution to encrypt information effectively.
2.2 State the problem:
Our requirement is to use the monoalphabetical substitution cipher algorithm to encode a plaintext into a script that hackers or thieves cannot easily read. The case we focus on is the English alphabet, with texts of about 50 to 5000 words depending on the testcase.
Moreover, the letters of the cipher alphabet must be distinct and each replacement is fixed: a letter is always replaced by exactly one specified letter, with no exception.
The monoalphabetical substitution cipher algorithm is designed to do exactly that. It receives the original text and turns it into cipher text, meaning a code that not everyone can read and understand. Each letter in the plaintext is replaced by a fixed letter of the alphabet, which means we need to generate the cipher alphabet before executing the algorithm.
For example, if the generated key replaces 'a' with 'r', then every time 'a' appears in the plaintext it changes into 'r'. The cipher alphabet is generated again whenever a new script is encoded, so each new script gets its own permutation of the alphabet.
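One way to generate such a cipher alphabet is to shuffle the 26 letters. The sketch below is only an illustration: the function name `generate_cipher_key` and the optional `seed` parameter are assumptions made here, not names taken from the report's own source code.

```python
import math
import random
import string


def generate_cipher_key(seed=None):
    """Return a dict mapping each lowercase letter to one cipher letter.

    The mapping is a random permutation of the alphabet, so every plain
    letter gets exactly one distinct cipher letter.
    """
    rng = random.Random(seed)
    plain = list(string.ascii_lowercase)
    shuffled = plain[:]
    rng.shuffle(shuffled)
    return dict(zip(plain, shuffled))


key = generate_cipher_key()
print(key)
# Number of possible cipher alphabets: 26 factorial.
print(math.factorial(26))
```

Passing a fixed `seed` makes the key reproducible, which is convenient for testcases; leaving it as `None` gives a fresh key for every script.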
2.3 Idea of solution and algorithm:
My idea is to convert the text to lowercase before replacing the letters, so that uppercase letters need no special handling and the method works for every letter in the plaintext.
[Table: the plain alphabet A to Z in the first row, with the randomly generated cipher letter for each one in the second row]
Picture 4 Example of cipher keys
The cipher alphabet is generated randomly whenever a new script is encoded (there are 26 factorial possible arrangements of the cipher alphabet).
In the table above, each letter of the default alphabet is tied to one letter of the cipher alphabet and is always replaced by it for as long as we work on the same script. The idea is to store the default alphabet in a list and pair it with the cipher alphabet, so that each letter stays fixed to its key letter. The cipher text is then obtained by looping over the plaintext and replacing each letter with its key letter.
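The idea above can be sketched as follows. The function name `encrypt` is chosen here for illustration, and the key is simply the reversed alphabet so that the output is reproducible; the report generates its key randomly instead.

```python
import string

# Illustrative fixed key: the reversed alphabet. A real run would use a
# randomly generated permutation instead.
PLAIN = string.ascii_lowercase
EXAMPLE_KEY = dict(zip(PLAIN, reversed(PLAIN)))


def encrypt(plaintext, key):
    """Replace every letter with its key letter; keep other characters."""
    result = []
    for char in plaintext.lower():      # work in lowercase, as described
        result.append(key.get(char, char))
    return "".join(result)


print(encrypt("Hello, World!", EXAMPLE_KEY))  # svool, dliow!
```

Characters that are not in the key, such as spaces and punctuation, pass through unchanged because `key.get(char, char)` falls back to the character itself.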
2.4 Example and evaluation with analysis:
Given the IELTS reading passage below, encode it with a random cipher alphabet:
“Not many people have mental imagery as vibrant as Lauren or as blank as Niel. They are the two extremes of visualisation. Adam Zeman, a professor of cognitive and behavioural neurology, wants to compare the lives and experiences of people with aphantasia and its polar-opposite hyperphantasia. His team, based at the University of Exeter, coined the term aphantasia this year in a study in the journal Cortex.”
With the fixed letters of the cipher alphabet, it transforms into this:
“lih nvlp cuicsu rvtu nulhvs anvyugp vb taxgvlh vb svzgul ig vb xsvlo vb laus. hrup vgu hru hwi uehgunub id tabzvsabvhail. vmvn kunvl, v cgidubbig id qiylahatu vim xurvtaizgvs luzgisiyp, wvlhb hi qincvgu hru satub vlm uecugaulqub id cuicsu wahr vervlhvbav vlm ahb cisvg-iccibahu rpcugervlhvbav. rab huvn, xvbum vh hru zlatugbahp id ueuhug, qialum hru hugn vervlhvbav hrab puvg al v bhzmp al hru jizglvs gighue.”
Note that all commas, dots and dashes are kept in their original form.
We can see that the monoalphabetical substitution cipher is very simple to use, because you only need to replace the letters one by one using your generated key table (which contains only letters, without any special symbols). This leads to many security problems, because the cipher is very easy to break with some methods. One of them is frequency analysis, which counts the appearances of each letter to build a table suggesting which plain letter each cipher letter encodes. In English (and in other languages) there is a huge variation in how frequently the different letters appear: "e" is the most common one, accounting for around 13% of all letters in a text, next is "t" at 9%, "a" at 8% and so on.[1] To crack the code, I simply count how often each letter shows up in the ciphertext, and then I guess that the letter that shows up most often is an "e", the second most frequent one is a "t", and so on. After doing this for some of the letters, it becomes possible to recognize words. If, for instance, "t?e" shows up frequently, the ? is likely to be an "h". With each new letter I guess correctly, guessing the remaining ones becomes easier, and pretty soon the code is broken.
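Counting how often each letter appears takes only a few lines. The sketch below uses `collections.Counter` from the Python standard library; the function name `letter_frequencies` and the short sample string are assumptions made for this illustration.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, share) pairs, most frequent first.

    Only the letters a to z are counted; ties are ordered alphabetically.
    """
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    counts = Counter(letters)
    total = len(letters)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(letter, count / total) for letter, count in ordered]


sample = "hru hwi uehgunub id tabzvsabvhail"
for letter, share in letter_frequencies(sample)[:3]:
    print(letter, f"{share:.1%}")
```

On a sample this short the ranking is unreliable; the method only becomes useful on longer ciphertexts.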
CHAPTER 3: FREQUENCY ANALYSIS
3.1 Definition of Frequency Analysis:
In cryptography, frequency analysis, also known as frequency counting or counting letters, is the study of how often each letter appears in a cipher text. It is used to break some substitution ciphers.
Frequency analysis relies on the fact that in any stretch of written language, certain letters and combinations of letters occur with characteristic frequencies.
Frequency analysis consists of counting the occurrences of each letter in a text. It is based on the fact that, in any given piece of text, certain letters and combinations of letters occur with varying frequencies. For example, in a given section of English text, the letters E, T, A and O are the most common, while the letters Z, Q and X are rarely used.
A 7.9%   B 1.4%   C 2.7%   D 4.1%   E 12.2%   F 2.1%   G 1.9%
H 5.9%   I 6.8%   J 0.2%   K 0.8%   L 3.9%   M 2.3%   N 6.5%
O 7.2%   P 1   Q 0.1%   R 5.8%   S 6.1%   T 8.8%   U 2.7%
V 1.0%   W 2.3%   X 0.2%   Y 1.9%   Z 10
Picture 6 Frequency table
We can expect most samples of text written in English to have a similar distribution of letters. However, this is only true if the sample of text is long enough. A very short text may lead to a very different distribution.
When trying to decrypt a cipher text based on a substitution cipher, we can use frequency analysis to help identify the most recurring letters in the cipher text and hence make hypotheses about what these letters have been encoded as (for example E, T, A, O, and so on). This helps us decrypt some of the letters in the text. We can then recognize patterns and words in the partly decoded text to identify more substitutions.
3.2 State the problem:
Given a cipher text, we have to decrypt it into the original text. To do so, we count the appearances of each letter and use the default frequency table to decode it.
However, suppose we replace a letter such as 'a' with 'e' while the text already contains another 'e'. The two become indistinguishable: the cipher 'e' may not be replaced, or the new 'e' may be replaced again, and an error occurs. So we have to mark whether each position has been replaced or not.
3.3 Idea of solution and algorithm:
The solution is to create a visited list containing the indexes or values that have already been replaced, so that we neither replace them again nor miss those that have not been replaced yet. Besides that, we need to sort letters with equal counts in ascending alphabetical order to make the method more suitable in practical situations. This means that when two letters have the same frequency, we order them by the alphabet. For instance, if 'a' = 2 and 'h' = 2, then 'a' is assigned its replacement from the frequency table before 'h'.
So the algorithm is: count exactly how many times each letter appears and sort the counts in descending order. The most frequent letter is replaced with 'e'.
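A minimal sketch of this procedure is shown below. It assumes the common English frequency order `etaoinshrdlcumwfgypbvkjxqz`, which is one of several published orderings and is not taken from the report; the function name `guess_plaintext` is likewise hypothetical. Instead of an explicit visited list, the sketch builds the whole substitution table first and translates the text in a single pass, which achieves the same goal because no character can be replaced twice.

```python
from collections import Counter

# ASSUMED English letter order, most to least frequent.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def guess_plaintext(ciphertext):
    """Guess the plaintext by matching letter ranks with ENGLISH_ORDER."""
    text = ciphertext.lower()
    counts = Counter(c for c in text if "a" <= c <= "z")
    # Sort by count descending; equal counts fall back to alphabetical order.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    table = {cipher: plain for cipher, plain in zip(ranked, ENGLISH_ORDER)}
    # One pass over the text: each character is replaced at most once.
    return "".join(table.get(c, c) for c in text)


print(guess_plaintext("xxx yy z"))  # eee tt a
```

On real ciphertexts the first guess is usually only partly right, so the result normally serves as a starting point for manual correction.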
3.4 Example and evaluation:
Given the cipher passage below, which was encoded with a random cipher alphabet, decode it:
“caj uzfmq-19 tsiqyymu asn uspnjq ysih njfjdj qsysxjn cz tjztwj'n wmfjn mc sgegjyucjq sww sntjucn zg jemncjiuj: yjqmumij, juzizymun, siq tzwmcmun vmcazpce sih qzpoc,
mc swnz migwpjiujq ntzden siq tdsucmuswwh sww bmiqn zg tahnmusw gsumwmemjn sn gmcijnn ujicjdn, xhyn, siq ntzde uwpon vjdj uwznjq cajdjgzdj, ysih