Module7-18-11-2024

The document discusses the principles and implementation aspects of Low-Density Parity-Check (LDPC) codes, particularly in the context of error correction for NAND flash memories. It covers the fundamentals of error correction codes, LDPC encoding and decoding methods, and decoder architectures. Additionally, it explains the significance of error correction in flash memories due to the variability in threshold voltage distributions as memory cells undergo programming and erasing cycles.


LDPC Codes: Principles and Implementation Aspects
Overview
• Why Is Error Correction Needed in Flash Memories?
• Error Correction Codes Fundamentals
• Low-Density Parity-Check (LDPC) Codes
• LDPC Encoding and Decoding Methods
• Decoder Architectures for LDPC Codes
NAND Flash Basics
• Information is stored in a NAND flash cell by inducing a certain voltage on its floating gate.
• To read the stored bit(s), the cell voltage is compared with a set of threshold voltages and a hard-decision bit is sent out.
Threshold Voltage Distribution with Increasing Number of P/E Cycles
• Cell voltages change randomly over time and with memory wear-out, so the read voltage is a random variable.
• As the number of P/E cycles increases, the threshold voltage distributions widen and shift.

[Figure: threshold voltage distribution, number of cells vs. voltage]
Error Correction Codes
• Linear block codes: u1, …, uk → c1, …, cn, with n > k
  k: data block size; n: codeword size; m = n − k: number of parity bits
• Example: k = 4, n = 7, m = 3 → a (7, 4) block code

  c1 = u1
  c2 = u2        (information bits; the code is systematic)
  c3 = u3
  c4 = u4
  c5 = u1 + u2 + u3
  c6 = u2 + u3 + u4      (parity bits)
  c7 = u1 + u2 + u4

• Every codeword satisfies the parity-check equations:

  c1 + c2 + c3 + c5 = 0
  c2 + c3 + c4 + c6 = 0
  c1 + c2 + c4 + c7 = 0
Error Correction Codes
• The encoding equations above can be written as c = uG with the generator matrix

        1 0 0 0 1 0 1
  G  =  0 1 0 0 1 1 1        (generator matrix)
        0 0 1 0 1 1 0
        0 0 0 1 0 1 1

  G = [ I_k×k , P_k×(n−k) ]

• The parity-check equations can be written as cH^t = 0 with the parity-check matrix

        1 1 1 0 1 0 0
  H  =  0 1 1 1 0 1 0        (parity-check matrix)
        1 1 0 1 0 0 1

  H = [ P^t , I_(n−k)×(n−k) ]

• Encoding: c = uG = [u, p]. Every codeword satisfies cH^t = 0, and consequently GH^t = 0.
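As a quick check of these relations, here is a short sketch (assuming NumPy is available) that encodes a data block with G and verifies cH^t = 0 and GH^t = 0 over GF(2):

```python
import numpy as np

# Generator and parity-check matrices of the (7, 4) code from the slides
G = np.array([[1,0,0,0,1,0,1],
              [0,1,0,0,1,1,1],
              [0,0,1,0,1,1,0],
              [0,0,0,1,0,1,1]])
H = np.array([[1,1,1,0,1,0,0],
              [0,1,1,1,0,1,0],
              [1,1,0,1,0,0,1]])

u = np.array([1, 0, 1, 1])   # example data block (k = 4)
c = (u @ G) % 2              # systematic codeword c = uG over GF(2)
print(c)                     # first 4 bits equal u: [1 0 1 1 0 0 0]
print((c @ H.T) % 2)         # syndrome cH^t = [0 0 0]
print((G @ H.T) % 2)         # GH^t = 0 (all-zero 4x3 matrix)
```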
Tanner Graph Representation of Block Codes
Parity-check matrix:

        1 1 1 0 1 0 0
  H  =  0 1 1 1 0 1 0
        1 1 0 1 0 0 1

[Figure: bipartite Tanner graph with variable nodes v1, …, v7 on the left and check nodes c1, c2, c3 on the right]

  c1: v1 + v2 + v3 + v5 = 0
  c2: v2 + v3 + v4 + v6 = 0
  c3: v1 + v2 + v4 + v7 = 0

• Variable nodes: left nodes
• Check nodes: right nodes
• Edges connect variable nodes and check nodes
• Each edge represents a '1' in the H matrix
• The degree of a node is the number of edges connected to it
• The highlighted subgraphs on the slides show a length-4 cycle, a length-6 cycle, and a (2, 1) trapping set (TS)
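The edge, degree, and cycle statements above translate directly into a few lines of code. A minimal sketch (assuming NumPy) that recovers the node degrees from H and finds the length-4 cycles, i.e., pairs of variable nodes sharing two check nodes:

```python
import numpy as np
from itertools import combinations

H = np.array([[1,1,1,0,1,0,0],
              [0,1,1,1,0,1,0],
              [1,1,0,1,0,0,1]])

# Node degrees: row sums = check-node degrees, column sums = variable-node degrees
print("check-node degrees:   ", H.sum(axis=1))   # [4 4 4]
print("variable-node degrees:", H.sum(axis=0))   # [2 3 2 2 1 1 1]

# A length-4 cycle exists whenever two variable nodes share two (or more)
# check nodes, i.e. two columns of H overlap in at least two positions.
for a, b in combinations(range(H.shape[1]), 2):
    overlap = np.sum(H[:, a] & H[:, b])
    if overlap >= 2:
        print(f"length-4 cycle through v{a+1} and v{b+1}")
```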
LDPC Codes
• Linear block codes with low-density (sparse) parity-check matrices
• The number of nonzeros grows only linearly with the block length (sparseness)
• Decoded with iterative message-passing decoders
• Decoding complexity grows linearly with the number of nonzeros, and hence with the block length
• The generator matrix is constructed from a sparse parity-check matrix
Example: Gallager's LDPC Code
The example used by R. G. Gallager:

      1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
      1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
      0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0
H  =  0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0
      0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0
      0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
      1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0
      0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0
      0 0 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0
      0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0
      0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1

n = codeword length = 20
dv = column degree = 3
dc = row degree = 4

This code is called a regular LDPC code.
If dv (dc) is not constant over all columns (rows), the code is called an irregular LDPC code.
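Checking regularity is a small computation on top of H. A sketch (assuming NumPy; `degree_profile` is a name chosen here), demonstrated on the (7, 4) H from the earlier slides, which is irregular since its column degrees vary:

```python
import numpy as np

def degree_profile(H):
    """Column (variable-node) and row (check-node) degrees of H."""
    dv, dc = H.sum(axis=0), H.sum(axis=1)
    regular = len(set(dv.tolist())) == 1 and len(set(dc.tolist())) == 1
    return dv, dc, regular

H = np.array([[1,1,1,0,1,0,0],
              [0,1,1,1,0,1,0],
              [1,1,0,1,0,0,1]])
print(degree_profile(H))   # dv = [2 3 2 2 1 1 1], dc = [4 4 4], False -> irregular
```

For Gallager's matrix above, every column degree is 3 and every row degree is 4, so the same check returns regular.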
QC-LDPC Matrix: Example 1

Example H matrix, specified by a 3 × 5 array S of circulant shift values:

  S  =  2    3  110  142  165
        5   64   96  113  144
        7   50   75  116  174

r / row degree = 5
c / column degree = 3
Sc / circulant size = 211
N = Sc × r = 1055

QC-LDPC Matrix: Example 2

Let σ be the Sc × Sc cyclic-shift permutation matrix:

       0 0 … 0 1
       1 0 … 0 0
  σ =  0 1 … 0 0
       ⋮
       0 0 … 1 0

Example H matrix of an array LDPC code:

       I  I        I        …  I
       I  σ        σ²       …  σ^(r−1)
  H =  I  σ²       σ⁴       …  σ^(2(r−1))
       ⋮
       I  σ^(c−1)  σ^(2(c−1)) …  σ^((c−1)(r−1))

r / row (check node) degree = 5
c / column (variable node) degree = 3
Sc / circulant size = 7
N = Sc × r = 35
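The array-code construction in Example 2 is easy to reproduce. A minimal sketch (assuming NumPy; `array_ldpc_H` is a name chosen here) that builds H for Sc = 7, r = 5, c = 3 and confirms its size and degrees:

```python
import numpy as np

def array_ldpc_H(Sc, r, c):
    """Array LDPC parity-check matrix: block (i, j) is sigma^(i*j),
    where sigma is the Sc x Sc cyclic-shift permutation matrix."""
    sigma = np.roll(np.eye(Sc, dtype=int), 1, axis=0)   # maps e_k -> e_(k+1 mod Sc)
    blocks = [[np.linalg.matrix_power(sigma, (i * j) % Sc) for j in range(r)]
              for i in range(c)]
    return np.block(blocks)

H = array_ldpc_H(Sc=7, r=5, c=3)
print(H.shape)          # (21, 35): N = Sc * r = 35 columns
print(set(H.sum(0)))    # {3}: column degree c
print(set(H.sum(1)))    # {5}: row degree r
```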
LDPC Encoding
• For systematic codes: c = [u, p].
• Suppose H = [H_u, H_p], where H_p is (n − k) × (n − k) and invertible. Then

  cH^t = 0
  [u, p] × [H_u^t ; H_p^t] = 0
  uH_u^t + pH_p^t = 0
  pH_p^t = uH_u^t            (over GF(2), addition and subtraction coincide)
  p = uH_u^t (H_p^t)⁻¹
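A minimal sketch of this encoder (assuming NumPy, and taking H_p as the last three columns of the (7, 4) H from the earlier slides): it solves pH_p^t = uH_u^t, equivalently H_p pᵗ = H_u uᵗ, by Gaussian elimination over GF(2).

```python
import numpy as np

def gf2_solve(A, b):
    """Solve A y = b over GF(2) by Gaussian elimination (A square, invertible)."""
    A, b = A.copy() % 2, b.copy() % 2
    n = A.shape[0]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r, col])   # find a pivot row
        A[[col, pivot]] = A[[pivot, col]]
        b[[col, pivot]] = b[[pivot, col]]
        for r in range(n):
            if r != col and A[r, col]:
                A[r] ^= A[col]
                b[r] ^= b[col]
    return b

H = np.array([[1,1,1,0,1,0,0],
              [0,1,1,1,0,1,0],
              [1,1,0,1,0,0,1]])
Hu, Hp = H[:, :4], H[:, 4:]        # split H = [Hu, Hp]; Hp is 3x3 and invertible
u = np.array([1, 0, 1, 1])
p = gf2_solve(Hp, (Hu @ u) % 2)    # p Hp^t = u Hu^t  <=>  Hp p^t = Hu u^t
c = np.concatenate([u, p])         # systematic codeword c = [u, p]
print(c, (H @ c) % 2)              # syndrome is all-zero
```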
LDPC Decoding (The Bit-Flipping Algorithm), 1/2
• Bit flipping is a hard-decision (HD) decoding method.
• Example: check nodes m = 0, 1, 2 each XOR their three connected bits; bit nodes n = 0, …, 6 hold the received word 0 0 0 1 0 0 0, with the bit at n = 3 in error.
• All three checks are unsatisfied: 0 + 0 + 1 = 1 at m = 0, 0 + 1 + 0 = 1 at m = 1, and 1 + 0 + 0 = 1 at m = 2.
• Each unsatisfied check sends a "flip" message to its connected bits; a bit is flipped when most of its checks vote to flip. Bit n = 3 participates in all three unsatisfied checks, so it is flipped.

LDPC Decoding (The Bit-Flipping Algorithm), 2/2
• After the flip, the word is 0 0 0 0 0 0 0.
• All checks are now satisfied: 0 + 0 + 0 = 0 for m = 0, 1, 2. Every check sends "stay", no bit is flipped, and decoding terminates.
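A compact sketch of this rule follows; one common variant flips the bit(s) with the most unsatisfied checks. The 3 × 7 H below is an assumption chosen so that each check has three inputs and bit n = 3 joins all three checks, matching the figure; the slides do not print the matrix itself.

```python
import numpy as np

# Hypothetical 3x7 parity-check matrix consistent with the figure:
# each check has three inputs and bit n = 3 participates in every check.
H = np.array([[1,0,0,1,1,0,0],
              [0,1,0,1,0,1,0],
              [0,0,1,1,0,0,1]])

def bit_flip_decode(H, y, max_iter=10):
    y = y.copy()
    for _ in range(max_iter):
        syndrome = (H @ y) % 2                # 1 = unsatisfied check
        if not syndrome.any():
            return y, True                    # all parity checks satisfied
        votes = H.T @ syndrome                # unsatisfied-check count per bit
        y[votes == votes.max()] ^= 1          # flip the worst bit(s)
    return y, False

y = np.array([0, 0, 0, 1, 0, 0, 0])           # received word, bit 3 in error
print(bit_flip_decode(H, y))                   # -> (array([0,0,0,0,0,0,0]), True)
```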
Message Passing Decoding of LDPC Codes, 1/2
• The decoding is successful when all the parity checks are satisfied (i.e., zero).

Message Passing Decoding of LDPC Codes, 2/2
• There are four types of LLR messages:
  o Message from the channel to the n-th bit node, L_n
  o Message from the n-th bit node to the m-th check node, Q_n→m^(i), or simply Q_nm^(i)
  o Message from the m-th check node to the n-th bit node, R_m→n^(i), or simply R_mn^(i)
  o Overall reliability information for the n-th bit node, P_n

[Figure: Tanner graph with check nodes m = 0, 1, 2 and bit nodes n = 0, …, 6 fed by the channel detector, annotated with R_0→3^(i), R_2→3^(i), Q_3→1^(i), P_6, and the channel message L_3]
The Min-Sum Algorithm, 1/3
Notation used in the equations:
• x_n is the transmitted bit n.
• L_n is the initial LLR message for bit node (also called variable node) n, received from the channel/detector.
• P_n is the overall LLR message for bit node n.
• x̂_n is the decoded bit n (hard decision based on P_n). [How often P and the hard decision are updated depends on the decoding schedule.]
• M(n) is the set of neighboring check nodes for variable node n.
• N(m) is the set of neighboring bit nodes for check node m.
For the i-th iteration:
• Q_nm^(i) is the LLR message from bit node n to check node m.
• R_mn^(i) is the LLR message from check node m to bit node n.
The Min-Sum Algorithm, 2/3

(A) Check-node processing: for each m and n ∈ N(m),

  R_mn^(i) = δ_mn^(i) · κ_mn^(i)                              (1)

  κ_mn^(i) = |R_mn^(i)| = min_{n′ ∈ N(m)\n} |Q_n′m^(i−1)|     (2)

The sign of the check-node message R_mn^(i) is defined as

  δ_mn^(i) = ∏_{n′ ∈ N(m)\n} sgn(Q_n′m^(i−1))                 (3)

where δ_mn^(i) takes the value +1 or −1.
The Min-Sum Algorithm, 3/3

(B) Variable-node processing: for each n and m ∈ M(n),

  Q_nm^(i) = L_n + Σ_{m′ ∈ M(n)\m} R_m′n^(i)                  (4)

(C) P update and hard decision:

  P_n = L_n + Σ_{m ∈ M(n)} R_mn^(i)                           (5)

A hard decision is taken: x̂_n = 0 if P_n ≥ 0, and x̂_n = 1 if P_n < 0.

If x̂H^t = 0, the decoding process is finished, with x̂ as the decoder output; otherwise, steps (A) to (C) are repeated.

If the decoding process does not end within some maximum number of iterations, stop and output an error message.

Scaling or an offset can be applied to the R messages and/or Q messages for better performance.

The Min-Sum algorithm can be used in both hard-decision (HD) and soft-decision (SD) modes. In HD mode, all input LLRs have the same magnitude, allowing a lower bit resolution.
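The steps above map directly to code. Below is a minimal flooding-schedule Min-Sum sketch (assuming NumPy; unit scaling; the dictionary message layout is one simple choice among many):

```python
import numpy as np

def min_sum_decode(H, L, max_iter=20):
    """Flooding-schedule Min-Sum decoder following eqs. (1)-(5)."""
    M, N = H.shape
    edges = [(m, n) for m in range(M) for n in range(N) if H[m, n]]
    Q = {(m, n): float(L[n]) for (m, n) in edges}   # bit-to-check, init with channel LLRs
    R = {(m, n): 0.0 for (m, n) in edges}           # check-to-bit
    x_hat = (np.asarray(L) < 0).astype(int)
    for _ in range(max_iter):
        for (m, n) in edges:                        # (A) check-node processing
            others = [Q[(m, n2)] for n2 in range(N) if H[m, n2] and n2 != n]
            sign = 1.0
            for q in others:
                sign *= 1.0 if q >= 0 else -1.0     # eq. (3): product of signs
            R[(m, n)] = sign * min(abs(q) for q in others)   # eqs. (1)-(2)
        P = np.array([L[n] + sum(R[(m, n)] for m in range(M) if H[m, n])
                      for n in range(N)])           # eq. (5)
        x_hat = (P < 0).astype(int)                 # hard decision on P
        if not ((H @ x_hat) % 2).any():             # syndrome check
            return x_hat, True
        for (m, n) in edges:                        # (B) variable-node processing
            Q[(m, n)] = P[n] - R[(m, n)]            # eq. (4): exclude own R message
    return x_hat, False
```

With the 3 × 7 H assumed in the bit-flipping sketch and the channel LLRs L = [+3, +9, +5, −7, +3, +8, +2] of the next slides, this sketch converges to the all-zero codeword within two iterations; the intermediate message values on the slides depend on the actual H, which is not printed there.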
LDPC Decoding Example, 1/3
[Figure: first Min-Sum iteration on the Tanner graph with check nodes m = 0, 1, 2 and bit nodes n = 0, …, 6]
• Channel LLRs: L = [+3, +9, +5, −7, +3, +8, +2].
• Variable-node processing: in the first iteration, each bit node forwards its channel LLR to its neighboring check nodes.
• Check-node processing: on each edge, a check node returns the minimum input magnitude among its other edges, with sign equal to the product of the other signs.
• P and HD update: P = [−4, +6, +2, +1, −3, +1, −5], giving hard decisions 1 0 0 0 1 0 1.
LDPC Decoding Example, 2/3
[Figure: second decoding iteration on the same graph]
• Variable-node processing: each bit node now sends Q = L plus the incoming R messages from its other checks.
• Check-node processing: the R messages are recomputed with the min-sum rule.
• P and HD update: P = [−1, +6, +4, +1, +2, +7, +1], giving hard decisions 1 0 0 0 0 0 0.
LDPC Decoding Example, 3/3
[Figure: third decoding iteration on the same graph]
• Variable-node and check-node processing proceed as in the previous iteration.
• P and HD update: P = [+1, +7, +3, +1, +1, +7, +1], giving hard decisions 0 0 0 0 0 0 0.
• The hard-decision word now satisfies all parity checks, so decoding terminates successfully.
Decoder Architectures

• Parallelization is good, but it comes at a steep cost for LDPC decoders.

• Fully parallel architecture:
  - All the check updates in one clock cycle and all the bit updates in one more clock cycle.
  - Huge hardware resources and routing congestion.

• Serial architecture:
  - Check updates and bit updates in a serial fashion.
  - Huge memory requirement; memory in the critical path.
  - Very low throughput.
Semi-Parallel Architectures

• Check updates and bit updates using several processing units.

• Partitioned memory, enabled by imposing structure on the H matrix.

• A practical solution for most applications.

• Several semi-parallel architectures have been proposed.

• Complexity differs based on architecture and scheduling.
Layered Decoder Architecture
• Optimized layered decoding with algorithm transformations for reduced memory and computations:

  R_l,n^(0) = 0, P_n = L_n          [initialization for each new received data frame]
  ∀ i = 1, 2, …, it_max             [iteration loop]
    ∀ l = 1, 2, …, j                [layer loop]
      ∀ n = 1, 2, …, k              [block column loop]
        Q_l,n^(i) = P_n^S(l,n) − R_l,n^(i−1)                        (Q_new = P_new − R_old)
        R_l,n^(i) = f( Q_l,n′^(i) , ∀ n′ = 1, 2, …, d_c(l) − 1 )    (R_new = f(Q_new) = R_Select(FS, Q_sign))
        P_n^S(l,n) = Q_l,n^(i) + R_l,n^(i)                          (P = Q_old + R_new)

  P_new is then computed by applying the delta shift on P.

• Q and R messages are computed for each p × p block of H, where p is the parallelization.
• f(·) is the check-node processing unit.
• S(l, n′) is the upward (right) shift for block row (layer) l and block column n′.
• d_c(l) is the check-node degree of layer l.
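A schematic rendering of this update order follows (a sketch only: it assumes a plain NumPy H with one row per layer, and abstracts away the FS compression, the shifters, and the delta-shift step of the real architecture):

```python
import numpy as np

def layered_min_sum(H, L, max_iter=10):
    """Layered Min-Sum sketch: per layer, Q = P - R_old, then R_new, then P = Q + R_new."""
    P = np.asarray(L, dtype=float).copy()
    R = np.zeros_like(H, dtype=float)            # R_old, one message per H entry
    for _ in range(max_iter):
        for l in range(H.shape[0]):              # each row of H acts as one layer
            cols = np.flatnonzero(H[l])
            Q = P[cols] - R[l, cols]             # Q_new = P_new - R_old
            for j, n in enumerate(cols):         # R_new: min-sum over the other edges
                others = np.delete(Q, j)
                sign = np.prod(np.where(others >= 0, 1.0, -1.0))
                R[l, n] = sign * np.min(np.abs(others))
            P[cols] = Q + R[l, cols]             # P = Q_old + R_new
        x_hat = (P < 0).astype(int)
        if not ((H @ x_hat) % 2).any():          # syndrome check after each iteration
            return x_hat, True
    return x_hat, False
```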
Block Serial Layered Decoder Architecture with On-the-Fly Computation
• Proposed for irregular H matrices.
• Goal: minimize memory and re-computation by employing just-in-time scheduling.
• Advantages compared to other architectures:
  1) A single Q (or L/P/Q) memory can store L, Q, or P instead of three separate memories; memory is managed at the circulant level, since at any time a given circulant needs only L, Q, or P.
  2) Only one shifter.
  3) Value reuse is effectively used for both R_new and R_old.
  4) Low-complexity data path design, with no redundant data path operations.
  5) Low-complexity CNU design.
  6) Out-of-order processing at both layer and circulant level for all processing steps, such as R_new and PS processing, to eliminate the pipeline and memory-access stall cycles.
• See [8, P1-P6] and references therein for more details on features and implementation.
Data Flow Diagram
[Figure: decoder data flow]
• R selection for R_new operates out of order to feed the data for the PS processing of the next layer.
Illustration for Out-of-Order Processing

• Rate-2/3 code; 8 layers, 24 block columns. d_v (column weight) varies from 2 to 6; d_c (row weight) is 10 for all layers.
• Non-zero circulants are numbered from 1 to 80. There is no layer reordering in processing; out-of-order processing is applied to R_new and to partial-state (PS) processing.
• The illustration is for the 2nd iteration, with focus on PS processing of the 2nd layer.
• R_old processing is based on the circulant order 11, 16, 17, 18, 20, 12, 13, 14, 15, 19, indicated in green.
• R_new is based on the circulant order 72, 77, 78, 58, 29, 3, 5, 6, 8, 10, indicated in blue.
• Q memory and HD memory access addresses are based on the block column index to which the green circulants are connected.
• The Q-sign memory access address is based on the green circulant number.
• A superscript indicates the clock cycle number, counted from 1 at the beginning of layer-2 processing.
Out-of-Order Layer Processing for R Selection

• Normal practice is to compute R_new messages for each layer right after that layer's CNU PS processing.
• Here, the generation of each layer's R_new messages is decoupled from that layer's CNU PS processing. Rather than simply generating R_new messages per layer, they are computed on the basis of circulant dependencies.
• R selection is out of order so that it can feed the data required for the PS processing of the second layer. For instance, the R_new messages for circulant 29, which belongs to layer 3, are not generated immediately after layer 3's CNU PS processing.
• Rather, R_new for circulant 29 is computed when the PS processing of circulant 20 is done, since circulant 29 is a dependent circulant of circulant 20.
• Similarly, R_new for circulant 72 is computed when the PS processing of circulant 11 is done, since circulant 72 is a dependent circulant of circulant 11.
Out-of-Order Block Processing for Partial State

• Block processing is reordered: while processing layer 2, the blocks that depend on layer 1 are processed last, to allow for the pipeline latency.
• In the above example, the pipeline latency is 5; since the vector pipeline depth is also 5, no stall cycles are needed while processing layer 2. In implementations that do not reorder, stall cycles are introduced, which can reduce throughput by a large margin.
• The operations in one layer are sequenced such that the block whose dependent data has been available for the longest time is processed first.
Memory Organization

• Q memory: width = circulant size × 8 bits; depth = number of block columns.
• HD memory: width = circulant size × 1 bit; depth = number of block columns.
• Q-sign memory: width = circulant size × 1 bit; depth = number of non-zero circulants in the H matrix.
• FS memory: width = circulant size × 15 bits (= 4 bits for Min1 + 4 bits for Min2 + 1 sign bit + 6 bits for the Min1 index).
• FS memory access is expensive; the number of accesses can be reduced with scheduling.
• Decoder for regular mother matrices (no zero blocks and no out-of-order processing): FS access is needed once per layer for R_old, and once per layer for R_new.
• Decoder for irregular mother matrices: FS access is needed once per layer for R_old, and once per non-zero circulant in each layer for R_new.
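For concreteness, a small helper that turns these width/depth rules into total bit counts (the widths and depths are from the bullets above; the FS depth of one entry per layer is an assumption, and the example parameter values are hypothetical, taken from the earlier illustration):

```python
def ldpc_memory_bits(p, n_block_cols, n_nonzero, n_layers):
    """Total storage implied by the width/depth rules above, in bits.
    p = circulant size. FS depth = number of layers is an assumption;
    the slides specify only the FS width."""
    q_mem     = p * 8  * n_block_cols   # Q memory
    hd_mem    = p * 1  * n_block_cols   # HD memory
    qsign_mem = p * 1  * n_nonzero      # Q-sign memory
    fs_mem    = p * 15 * n_layers       # FS memory: Min1, Min2, sign, Min1 index
    return {"Q": q_mem, "HD": hd_mem, "Qsign": qsign_mem, "FS": fs_mem}

# Example: the rate-2/3 code from the illustration slides
# (8 layers, 24 block columns, 80 non-zero circulants), with p = 128 assumed.
print(ldpc_memory_bits(p=128, n_block_cols=24, n_nonzero=80, n_layers=8))
```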
From Throughput Requirements to Design Specification
• Requirements:
  - Throughput in bits per second
  - BER
  - Latency
• The BER target dictates the number of iterations and the degree profile (check-node degrees and variable-node degrees).
• Circulant size (Sc)
• Number of circulants processed in one clock (NSc)
• Number of bits processed per clock: Nb = Throughput / clock frequency
• Sc × NSc = Nb × Iterations × Average variable-node degree
  - Sc is usually set to less than 128 for a smaller router.
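A worked instance of the sizing formula (all target numbers below are hypothetical, chosen only to make the arithmetic concrete):

```python
# Hypothetical targets: 6.4 Gb/s at 400 MHz, 10 decoding iterations,
# average variable-node degree 3.2 (illustrative values only).
throughput_bps = 6.4e9
clock_hz       = 400e6
iterations     = 10
avg_dv         = 3.2

Nb = throughput_bps / clock_hz           # bits processed per clock = 16
Sc_times_NSc = Nb * iterations * avg_dv  # = 512 circulant bits per clock
print(Nb, Sc_times_NSc)                  # satisfied by, e.g., Sc = 128 with NSc = 4
```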
References
1. Y. Cai, E. F. Haratsch, et al., "Threshold voltage distribution in MLC NAND flash memory: Characterization, analysis, and modeling," Proc. Conference on Design, Automation and Test in Europe (DATE), 2013.
2. R. Tanner, "A recursive approach to low complexity codes," IEEE Trans. Inf. Theory, vol. 27, no. 5, pp. 533-547, 1981.
3. R. G. Gallager, "Low density parity check codes," IRE Trans. Inf. Theory, vol. IT-8, pp. 21-28, Jan. 1962.
4. R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
5. W. Ryan and S. Lin, Channel Codes: Classical and Modern, Cambridge University Press, 2009.
6. Levine et al., "Implementation of near Shannon limit error-correcting codes using reconfigurable hardware," IEEE Symp. on Field-Programmable Custom Computing Machines, 2000.
7. E. Yeo, "VLSI architectures for iterative decoders in magnetic recording channels," IEEE Trans. Magnetics, vol. 37, no. 2, pp. 748-755, March 2001.
8. K. K. Gunnam, "LDPC decoding: VLSI architectures and implementations," Flash Memory Summit, 2013.
References
Several features presented in this tutorial are covered by the following patents by the Texas A&M University System (TAMUS):
[P1] U.S. Patent 8,359,522, Low density parity check decoder for regular LDPC codes.
[P2] U.S. Patent 8,418,023, Low density parity check decoder for irregular LDPC codes.
[P3] U.S. Patent 8,555,140, Low density parity check decoder for irregular LDPC codes.
[P4] U.S. Patent 8,656,250, Low density parity check decoder for regular LDPC codes.
[P5] U.S. Patent 9,112,530, Low density parity check decoder.
[P6] U.S. Patent Application Publication 2015/0311917, Low density parity check decoder.
