Charlie Viet — Data, ML & GenAI

Key takeaways

Treat a credit-risk label as a business agreement, not just a bad_flag column.
A good label always needs four parts: an observation point, a bad event, an outcome window, and a maturity rule.
Do not use the same observation period for every product. BNPL, cash loans, credit cards, and behavior scorecards have very different risk rhythms.
Check that a cohort is mature, meaning it has had enough time for risk to show up, before splitting train/test.
Fraud, negotiated settlements, write-offs, and restructuring need their own rules; lumping them all into one "bad" flag makes monitoring and policy review harder.

The diagram below shows the minimal view of a credit label. The key point is that the label does not exist at disbursement: you must wait for the full outcome window so that risk has a chance to surface.


Figure 1. The outcome window connects the observation point to the point at which the label is mature; if the label is fixed too early, the training data will miss true bads.

A label is a risk contract

A credit model does not start with XGBoost, a feature store, or hyperparameter tuning. It starts with a business contract: which risk is being modeled, where observation starts, how long outcomes are measured, and when a case is mature enough to be called Good or Bad.

If that contract is vague, a model can still show a strong AUC while answering the wrong decisioning question. For example, Risk may be reviewing 60+ DPD within 12 months and Product may care about early delinquency, while the data science team trains on 30+ DPD within 6 months.

This post covers the minimum set of decisions to settle before modeling: DPD, outcome windows, label maturity, Ever-90/FPD/MOB/roll-rate labels, and the traps that make a model look clean in a notebook but fail policy review.

A credit label is a contract, not a column

In a dataset, the label may look like a simple bad_flag. In production, that flag represents an agreement: where observation starts, how long outcomes are measured, which event counts as bad, and which cases should be excluded from credit-risk modeling.
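To make that agreement explicit, it can help to record it as data instead of leaving it in people's heads. The sketch below uses a hypothetical `LabelSpec` dataclass (not from any library; all field names are illustrative) that holds where observation starts, the bad event as a DPD threshold, the outcome window, a simple maturity lag, and the excluded events.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LabelSpec:
    """Hypothetical record of a label 'contract'; field names are illustrative only."""

    name: str
    observation_point: str          # where observation starts, e.g. "origination_date"
    bad_dpd_threshold: int          # which event counts as bad: DPD >= this value
    outcome_window_months: int      # how long outcomes are measured
    maturity_lag_months: int = 0    # extra months to wait after the window before trusting labels
    excluded_events: frozenset = field(default_factory=frozenset)  # cases removed from the population

    def is_mature(self, months_since_observation: int) -> bool:
        """True once the full window plus any lag has elapsed for a case."""
        return months_since_observation >= self.outcome_window_months + self.maturity_lag_months

    def describe(self) -> str:
        """Plain-language definition to paste into model documentation."""
        excluded = ", ".join(sorted(self.excluded_events)) or "none"
        return (
            f"{self.name}: bad if DPD >= {self.bad_dpd_threshold} within "
            f"{self.outcome_window_months} months of {self.observation_point}; "
            f"excluded: {excluded}"
        )


spec = LabelSpec(
    name="Ever60_12M",
    observation_point="origination_date",
    bad_dpd_threshold=60,
    outcome_window_months=12,
    excluded_events=frozenset({"fraud"}),
)
print(spec.describe())
# Ever60_12M: bad if DPD >= 60 within 12 months of origination_date; excluded: fraud
```

Freezing the dataclass means a spec cannot be edited silently after sign-off; a changed definition becomes a new object with a new name.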

DPD — Days Past Due

DPD measures how many days a borrower is overdue relative to their payment due date.

DPD = Current date − Due date of the oldest unpaid installment

Example: due date is March 1st; as of March 15th, no payment has been made → DPD = 14.
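A minimal Python sketch of the calculation. It counts from the oldest unpaid due date, the usual convention when more than one installment is missed; with a single missed installment, as in the example, that is simply the missed due date. The function name and the year used in the example are assumptions.

```python
from datetime import date


def days_past_due(as_of: date, unpaid_due_dates: list[date]) -> int:
    """DPD measured from the oldest unpaid due date that is already in the past."""
    overdue = [d for d in unpaid_due_dates if d < as_of]
    if not overdue:
        return 0  # nothing overdue yet
    return (as_of - min(overdue)).days


# Example from the text (year assumed): due March 1, still unpaid on March 15
print(days_past_due(date(2024, 3, 15), [date(2024, 3, 1)]))  # 14

# Two missed installments: counting starts at the older one
print(days_past_due(date(2024, 4, 15), [date(2024, 3, 1), date(2024, 4, 1)]))  # 45
```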

Common conventions:

Label	DPD Definition	Notes
Bad-30	DPD ≥ 30 at any point in window	More lenient; higher bad rate
Bad-60	DPD ≥ 60	Balances signal strength against the number of bads
Bad-90	DPD ≥ 90	Strict; close to typical write-off policy
Ever-90	Ever reached DPD 90+	Does not require consecutive months at 90+
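The threshold labels can all be derived from one DPD history. This sketch assumes each threshold is read as "at any point in the window", as the Bad-30 row states, and that DPD is available as periodic snapshots; the function name and the data are hypothetical.

```python
def threshold_flags(dpd_history: list[int], thresholds: tuple[int, ...] = (30, 60, 90)) -> dict[str, int]:
    """1 if DPD reached the threshold at any snapshot in the window, else 0."""
    worst = max(dpd_history, default=0)
    return {f"Bad-{t}": int(worst >= t) for t in thresholds}


# Hypothetical month-end DPD snapshots for one account inside its outcome window.
# The account cured at the end, but the flags still record the worst point reached.
history = [0, 0, 15, 45, 75, 20, 0]
print(threshold_flags(history))  # {'Bad-30': 1, 'Bad-60': 1, 'Bad-90': 0}
```

Month-end snapshots can miss an intra-month peak, so the snapshot frequency is effectively part of the label definition too.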

No definition is correct in isolation. A label is only correct when it matches the portfolio risk appetite, product mechanics, and decision the model will support.

The “label zoo” in practice: choosing the right target

In real projects, “bad” is usually defined along three axes: (1) delinquency severity, (2) horizon/window, and (3) observation point (application vs after-booking).

Here are the most common label families you’ll see in the market:

1) Ever-delinquency within an outcome window (application PD-style)

Ever-30/60/90 in 12M/24M: within 12/24 months from origination, the account reaches DPD ≥ 30/60/90 at least once.
Used when the main objective is application scoring (approval decisions).
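A sketch of the Ever-X by N-month grid, assuming dated DPD observations per account and a half-open window [origination, origination + N months). The helper names are hypothetical, and the code assumes the data already covers each window in full; labeling only mature windows is a separate check.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    years, month0 = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def ever_label_grid(origination: date, dpd_obs: dict[date, int],
                    thresholds=(30, 60, 90), windows=(12, 24)) -> dict[str, int]:
    """Ever-X within N months of origination for every (X, N) pair."""
    labels = {}
    for n in windows:
        end = add_months(origination, n)
        worst = max((dpd for d, dpd in dpd_obs.items() if origination <= d < end), default=0)
        for t in thresholds:
            labels[f"Ever{t}_{n}M"] = int(worst >= t)
    return labels


# Hypothetical account originated 2023-01-15; the 95 DPD point falls after month 12
obs = {date(2023, 6, 30): 35, date(2023, 12, 31): 0, date(2024, 6, 30): 95}
print(ever_label_grid(date(2023, 1, 15), obs))
```

Here the account is an Ever-30 bad at 12 months but only becomes an Ever-60/Ever-90 bad in the 24-month window, which is exactly why the window has to be part of the label name.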

2) FPD / EPD (early-warning / early performance)

FPD (First Payment Default): the borrower misses the first scheduled payment and becomes delinquent past a set threshold (public sources often use 30+ DPD).
EPD (Early Payment Default): delinquency/default occurring very early after origination, commonly described as within the first 3–6 months or 90–180 days.

On names like FPD10/FPD15/FPT15

Labels such as FPD10/FPD15/FPT15 are often internal naming conventions. A common interpretation is “first-payment delinquency at x+ DPD” (e.g., 10+ or 15+ days late) or a “first payment test” rule specific to a platform. In documentation, always include the plain-language definition and the formula, not just the label name.

A few definition patterns (templates; fill parameters per product):

FPD(x) (first-installment delinquency threshold):
- FPD_x = 1 if the first installment reaches DPD ≥ x within [first_due_date, first_due_date + grace_days]
- Parameters: x (10/15/30…), grace_days (policy-dependent), and whether partial payments count
FPT(x) (first-payment test / first-cycle test):
- FPT_x = 1 if by a cutoff_date (e.g., end of cycle 1 or due_date + k days) the borrower has not met minimum payment, mapping to an x+ DPD-equivalent status under platform rules
- Parameters: cutoff_date, minimum-payment rules, and platform-status → DPD-bucket mapping
MOBk_Ever(t) (early performance in first k months):
- MOBk_Ever_t = 1 if within the first k months on book the account ever reaches DPD ≥ t
- Parameters: k (3/6/12), t (30/60/90), and whether the label is cumulative vs point-in-time at MOBk
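The FPD(x) template can be sketched as follows. All of these are assumptions: the first installment counts as paid only when paid in full, grace_days marks the end of the observation window for that installment (so it must be at least x), and the label stays empty until that window has closed at the data snapshot. The function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def fpd_flag(first_due: date, first_paid: Optional[date], x: int,
             grace_days: int, snapshot: date) -> Optional[int]:
    """Sketch of FPD_x: 1 if the first installment reaches DPD >= x by first_due + grace_days.

    Returns None while the window is still open at `snapshot` (label not mature).
    """
    if x > grace_days:
        raise ValueError("grace_days must be >= x, otherwise DPD >= x cannot occur in the window")
    if snapshot < first_due + timedelta(days=grace_days):
        return None
    # DPD equals x on first_due + x; a full payment on or before that day avoids FPD_x
    # (this boundary convention should be confirmed with Risk).
    hit_date = first_due + timedelta(days=x)
    return int(first_paid is None or first_paid > hit_date)


due = date(2024, 3, 1)
print(fpd_flag(due, date(2024, 3, 10), x=15, grace_days=30, snapshot=date(2024, 4, 30)))  # 0
print(fpd_flag(due, date(2024, 3, 20), x=15, grace_days=30, snapshot=date(2024, 4, 30)))  # 1
print(fpd_flag(due, None, x=15, grace_days=30, snapshot=date(2024, 3, 20)))               # None
```

Returning None rather than 0 for open windows keeps immature first installments from being counted as goods.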

3) MOB-based labels (months-on-book)

MOB3 / MOB6 / MOB12: assign labels based on behavior within the first 3/6/12 months on book.
Typical examples: “Ever-30 within MOB3” or “60+ by MOB6”.
Useful when you need faster feedback loops or you want a model aligned to early performance.
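The difference between a cumulative label ("Ever-30 within MOB3") and a point-in-time label (the DPD status at MOBk) is easy to blur. This sketch computes both for the same account, assuming month-end DPD indexed by months on book; the function name, keys, and data are hypothetical.

```python
from typing import Optional


def mob_labels(dpd_by_mob: dict[int, int], k: int, t: int) -> dict[str, Optional[int]]:
    """Cumulative ("ever DPD >= t by MOBk") vs point-in-time ("DPD >= t at MOBk") labels.

    dpd_by_mob maps months on book (1, 2, ...) to the DPD observed at that month's snapshot.
    """
    if k not in dpd_by_mob:
        # The account has not reached MOBk yet, so neither label is mature
        return {"ever_within_mobk": None, "at_mobk": None}
    ever = int(any(dpd >= t for mob, dpd in dpd_by_mob.items() if mob <= k))
    at_k = int(dpd_by_mob[k] >= t)
    return {"ever_within_mobk": ever, "at_mobk": at_k}


history = {1: 0, 2: 35, 3: 0, 4: 10, 5: 40, 6: 70}
print(mob_labels(history, k=3, t=30))   # {'ever_within_mobk': 1, 'at_mobk': 0}
print(mob_labels(history, k=6, t=60))   # {'ever_within_mobk': 1, 'at_mobk': 1}
print(mob_labels(history, k=12, t=30))  # {'ever_within_mobk': None, 'at_mobk': None}
```

The account cured by MOB3, so the two MOB3 labels disagree; a model trained on one and monitored on the other would look unstable for no real reason.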

4) Roll rate / next-cycle delinquency (revolving / collections)

Roll rate measures the % of accounts that migrate from one delinquency bucket to a worse one in the next cycle (30→60, 60→90, etc.). This is common in credit cards and loss forecasting.
Targets like “next-cycle 30+” or “roll 30→60” are often used for behavior scoring and collection strategies.
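A minimal account-count sketch of a one-cycle roll rate. The bucket edges, function names, and data are assumptions, and many lenders compute roll rates on balances rather than account counts.

```python
BUCKETS = ["current", "1-29", "30-59", "60-89", "90+"]


def bucket(dpd: int) -> str:
    """Map DPD to a delinquency bucket (bucket edges are an assumption)."""
    if dpd <= 0:
        return "current"
    if dpd < 30:
        return "1-29"
    if dpd < 60:
        return "30-59"
    if dpd < 90:
        return "60-89"
    return "90+"


def roll_rate(prev_dpd: dict[str, int], curr_dpd: dict[str, int], from_bucket: str) -> float:
    """Share of accounts in `from_bucket` last cycle that moved to a worse bucket this cycle.

    Accounts missing from `curr_dpd` (e.g. closed) are treated as not rolled,
    which is itself a policy choice.
    """
    base = [acc for acc, dpd in prev_dpd.items() if bucket(dpd) == from_bucket]
    if not base:
        return float("nan")
    start = BUCKETS.index(from_bucket)
    rolled = [acc for acc in base
              if acc in curr_dpd and BUCKETS.index(bucket(curr_dpd[acc])) > start]
    return len(rolled) / len(base)


prev = {"A": 35, "B": 40, "C": 10, "D": 50}
curr = {"A": 65, "B": 0, "C": 40, "D": 80}
print(roll_rate(prev, curr, "30-59"))  # A and D rolled to 60-89, B cured: 2 of 3 accounts
```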

Outcome window: how long are you waiting for risk to show up?

The outcome window is the period from the observation point (usually loan origination) to when you assign the label.


Example with a 12-month window: a loan originated in Jan 2023 is observed through Jan 2024. If DPD ≥ 60 occurs during that period → label = bad.
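The same 12-month example with the maturity rule built in: the loan gets no label until the snapshot month has reached the end of its window. Months are (year, month) tuples, the window covers the 12 months after origination, and the function names, defaults, and data are hypothetical.

```python
from typing import Optional


def month_index(year: int, month: int) -> int:
    """Convert (year, month) to a running month count so windows can be added."""
    return year * 12 + (month - 1)


def window_label(origination: tuple[int, int], dpd_by_month: dict[tuple[int, int], int],
                 snapshot: tuple[int, int], window: int = 12, threshold: int = 60) -> Optional[int]:
    """Bad if DPD >= threshold in the `window` months after origination; None if not mature."""
    start = month_index(*origination)
    end = start + window
    if month_index(*snapshot) < end:
        return None  # the window has not fully elapsed: do not label this loan yet
    return int(any(dpd >= threshold
                   for ym, dpd in dpd_by_month.items()
                   if start < month_index(*ym) <= end))


# Loan originated Jan 2023, 12-month window observed through Jan 2024
history = {(2023, 8): 65, (2023, 9): 20}
print(window_label((2023, 1), history, snapshot=(2024, 1)))   # 1
print(window_label((2023, 1), history, snapshot=(2023, 10)))  # None
```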

This is a real modeling trade-off, not a cosmetic parameter:

Short window (3–6 months): more cohorts can be labeled sooner and feedback is faster, but late defaults are missed. Suitable for short-tenor products (BNPL, consumer loans under 6 months).
Long window (12–24 months): richer signal, but you must wait longer for data to mature. Suitable for personal loans and mortgages.
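The "more data" side of the trade-off can be made concrete: with a fixed data snapshot, only cohorts whose full window has elapsed can be labeled. The function name, cohorts, and snapshot below are hypothetical.

```python
def labelable_cohorts(cohorts: list[tuple[int, int]], snapshot: tuple[int, int],
                      window_months: int) -> list[tuple[int, int]]:
    """Origination months (year, month) whose whole outcome window has elapsed by the snapshot."""
    snap = snapshot[0] * 12 + snapshot[1] - 1
    return [(y, m) for y, m in cohorts if y * 12 + m - 1 + window_months <= snap]


# Hypothetical book: monthly cohorts from Jan 2022 to Dec 2023, data available through Jan 2024
cohorts = [(y, m) for y in (2022, 2023) for m in range(1, 13)]
for w in (6, 12, 24):
    print(w, len(labelable_cohorts(cohorts, (2024, 1), w)))
# 6 19
# 12 13
# 24 1
```

The same two years of originations give 19 trainable cohorts under a 6-month window but only one under a 24-month window.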

Label maturity: do not split data before outcomes are ready

A cohort is considered mature when most cases have had sufficient time to exhibit bad behavior (if they are going to).

Signs a cohort is not yet mature:

Bad rate is still rising steadily by observation month (not yet flattening).
A high number of cases are still "open" or "pending outcome."

Practical check: plot the cumulative bad rate by months on book, one curve per origination cohort (vintage). If a curve is still sloping upward at its right edge, that cohort is not yet mature.
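The vintage check can also be run as a rough automated screen before anyone looks at the plot. This sketch builds a cumulative bad-rate curve per cohort and calls it mature if it rose less than a tolerance over the last few months; the function names, `lookback`, and `tol` are assumptions to tune per portfolio, not a standard rule.

```python
from typing import Optional


def vintage_curve(first_bad_mob: list[Optional[int]], age_months: int) -> list[float]:
    """Cumulative bad rate at MOB 1..age_months for one origination cohort.

    first_bad_mob[i] is the month on book at which account i first met the bad
    definition, or None if it has not gone bad so far.
    """
    n = len(first_bad_mob)
    return [sum(1 for b in first_bad_mob if b is not None and b <= m) / n
            for m in range(1, age_months + 1)]


def looks_mature(curve: list[float], lookback: int = 3, tol: float = 0.002) -> bool:
    """Heuristic: mature if the curve rose less than `tol` over the last `lookback` months."""
    if len(curve) <= lookback:
        return False  # too young to judge
    return curve[-1] - curve[-1 - lookback] < tol


# Two hypothetical 10-account cohorts, both 12 months old
flat = vintage_curve([2, 4, 5] + [None] * 7, age_months=12)
rising = vintage_curve([3, 9, 11, 12] + [None] * 6, age_months=12)
print(looks_mature(flat), looks_mature(rising))  # True False
```

A screen like this only flags candidates; real cohorts are far larger, and the plot against older, fully seasoned vintages remains the check that matters.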

Other bad events: do not hide everything inside one flag

Event	Meaning	Typical timing / treatment
Charge-off	Lender writes the debt off its books	Usually after DPD 90–180
Write-off	Similar to charge-off; policy-dependent	Varies
Settlement	Borrower pays part of the balance; lender closes the account	After serious delinquency
Bankruptcy	Personal/corporate insolvency	May occur without DPD history
Fraud	Identity fraud, not a credit default	Must be excluded from the credit-risk label
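One way to keep these events separate is to give every case an outcome plus a reason code instead of a bare 0/1. The policy table below is hypothetical: which events count as bad, and in which priority, has to come from the agreed definition. The names `Outcome`, `EVENT_POLICY`, and `assign_outcome` are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    GOOD = "good"
    BAD = "bad"
    EXCLUDE = "exclude"  # removed from the credit-risk modeling population


# Hypothetical policy: (event, outcome, reason code), checked in this priority order.
# Fraud comes first because it is not a credit default, whatever the DPD history says.
EVENT_POLICY = [
    ("fraud", Outcome.EXCLUDE, "fraud_not_credit_default"),
    ("bankruptcy", Outcome.BAD, "bankruptcy"),
    ("charge_off", Outcome.BAD, "charge_off"),
    ("write_off", Outcome.BAD, "write_off"),
    ("settlement", Outcome.BAD, "settlement_after_delinquency"),
]


def assign_outcome(events: set[str], max_dpd: int, bad_dpd: int = 60) -> tuple[Outcome, str]:
    """Return (outcome, reason) so each event stays traceable in monitoring and review."""
    for event, outcome, reason in EVENT_POLICY:
        if event in events:
            return outcome, reason
    if max_dpd >= bad_dpd:
        return Outcome.BAD, f"dpd_{bad_dpd}_plus"
    return Outcome.GOOD, "performing"


print(assign_outcome({"fraud"}, max_dpd=120))  # (<Outcome.EXCLUDE: 'exclude'>, 'fraud_not_credit_default')
print(assign_outcome(set(), max_dpd=75))       # (<Outcome.BAD: 'bad'>, 'dpd_60_plus')
```

Keeping the reason code means a policy reviewer can ask "how many bads are settlements?" without re-deriving the label.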

Pre-label checklist before modeling

Before starting model training, answer all 7 questions. If the answers are unclear, the problem is not the algorithm yet.

Bad definition (DPD threshold) signed off by Risk?
Outcome window chosen to match product tenor?
Cohort is mature? (verify with vintage curve)
Fraud cases removed?
Charge-off / write-off cases: include or exclude? (per definition)
Same definition used in the training set and in production monitoring, so bad rates and stability metrics are comparable?
Definition documented and shared with the full team before EDA begins?

References

Siddiqi, N. (2017). Intelligent Credit Scoring, 2nd ed. — Ch. 3: Bad Definition.
Thomas, L. C. et al. (2017). Credit Scoring and Its Applications, 2nd ed. — Ch. 2.
Anderson, R. (2007). The Credit Scoring Toolkit — Ch. 7: Data Preparation.
Experian. What Lenders Need to Know About First Payment Default (FPD). https://experian.com/blogs/insights/first-payment-default
CreditCards.com Glossary. Roll rate definition. https://www.creditcards.com/glossary/term-roll-rate/
Oracle OFS Analytical Applications Docs. Delinquent Roll Rate Computation. https://docs.oracle.com/en/industries/financial-services/ofs-analytical-applications/loan-loss-forecasting/8.1.2.0.0/llfpug/delinquent-roll-rate-computation.html
Bank of England (2024). Credit risk: definition of default (Supervisory Statement). https://www.bankofengland.co.uk/-/media/boe/files/prudential-regulation/supervisory-statement/2024/credit-risk-definition-of-default-supervisory-statement.pdf

Credit Risk Labels: Before You Train a Model, Agree on What “Bad” Means