How We Investigated Racial Disparities in Federal Mortgage Data

August 25, 2021 06:50 ET

I. Introduction

In the United States, homeownership is synonymous with building wealth. Homeowners are wealthier than those who rent. Owning a home is often less expensive than renting. And the wealth accumulated through a home is generational: Children of homeowners are more likely to own a home than those of renters.

But people of color, particularly Black Americans, have historically been shut out of homeownership.

In the 1930s, the federal government encouraged lending institutions to deny mortgages to prospective homebuyers who lived in neighborhoods with high populations of Black people or immigrants. The practice became known as “redlining” because the government-sponsored Home Owners’ Loan Corporation drew red lines around these neighborhoods, deeming them a hazardous credit risk.

It took more than 30 years for the federal government to outlaw redlining. The Fair Housing Act of 1968 made it illegal to deny someone housing based on race or other protected categories.

Despite the law, people of color continue to be denied mortgages at higher rates than their White counterparts. In 1988, The Atlanta Journal-Constitution found lenders in Atlanta made five times as many loans to people living in White neighborhoods compared to those living in Black neighborhoods, even when the applicants made the same amount of money.

The trend persists today, more than 50 years after the passage of The Fair Housing Act. One of the authors of this investigation, Emmanuel Martinez, reported in 2018 that Black applicants in Philadelphia were almost three times as likely to be denied a mortgage compared to White borrowers there, even when they had similar financial characteristics. That investigation looked at denial rates in major metropolitan areas across the country, rather than computing a national rate, because of the limitations of the data available at the time.

The lending industry criticized these findings, saying the analysis lacked key variables that lenders use to make mortgage decisions and would explain the lending gap: debt-to-income ratio, combined loan-to-value ratio, and credit score. None of those variables were attainable for analysis because they were not included in the public data at the time.

Starting in 2019, the federal government began releasing two of those three variables—debt-to-income and combined loan-to-value ratios—in federal mortgage data. Those two metrics measure how well applicants manage their debt and how much of the property’s overall value is being financed.

We sought to investigate whether including these factors in a statistical analysis would eliminate well-documented lending disparities between White applicants and applicants of color. We found that they did not.

Our new analysis shows that, even when accounting for debt-to-income and combined loan-to-value ratios in addition to other financial characteristics, lenders were still more likely to deny people of color home loans than White applicants. Applicants’ credit scores are still not released publicly, though lenders are required to report them to the federal government as part of their Home Mortgage Disclosure Act (HMDA) reporting.

The improved data also allowed us to discern statistically valid odds of denial nationwide for people of different races and ethnicities. To our knowledge, this is the first time that the public data has been robust enough to do so.

We found that nationwide, Black applicants are nearly twice as likely to be denied conventional mortgages as similarly qualified White applicants. Latino, Asian/Pacific Islander, and Native American applicants are also more likely to be denied mortgages than White ones, even when they have the same financial characteristics.

The most common reason listed for denials was that applicants had too much debt relative to their income, which is in line with previous findings by the Consumer Financial Protection Bureau (CFPB) and a survey of mortgage bankers that found they were most concerned about applicants’ high debt ratios.

In addition to looking at the data nationally, we analyzed lending rates at the local level. We found lending disparities vary not only by location but also by racial and ethnic demographic. We found 89 metropolitan areas, spanning every region of the country, where lenders were more likely to deny people of color conventional mortgages than White people with similar financial characteristics. Of the biggest metropolitan areas, Chicago had one of the widest disparities: Black applicants there were 150 percent more likely to be denied than White applicants. In Minneapolis, lenders were more likely to deny Black, Latino, Asian/Pacific Islander, and Native American applicants than White applicants.

Lastly, HMDA’s volume of applications also allowed us to look at lending patterns of various financial institutions. Seven lenders showed wide disparities between the lending rates to people of color and similarly qualified White applicants: DHI Mortgage Company, Lennar Mortgage (formerly known as Eagle Home Mortgage), Pulte Mortgage, Freedom Mortgage Corporation, Movement Mortgage, Fairway Independent Mortgage Corporation, and Navy Federal Credit Union.

Despite our including the two new key variables, mortgage lending groups and individual lenders said our analysis is still incomplete and doesn’t accurately reflect their lending patterns. Their main critique was the lack of credit scores and credit histories; those factors, they said, can explain the differences in lending outcomes.

II. Data Acquisition

Analyses of lending disparities have traditionally relied on data released through the Home Mortgage Disclosure Act of 1975, known as HMDA data. It’s currently kept and maintained by the CFPB, and its purpose is to ensure that lenders are meeting the housing needs of the communities they serve and to help identify potential discriminatory lending patterns.

Over the years, lenders have been required to report more data about their mortgage applications and prospective borrowers to the federal government. HMDA’s latest expansion came from the 2010 Dodd-Frank Act, which was in response to the 2008 housing crisis. The federal law enacted a series of reforms of the financial industry—among them, requiring lenders to report and publicly disclose more information, including borrowers’ debt-to-income and combined loan-to-value ratios. Those disclosure rules went into effect in 2019 and were first applied to 2018 mortgage data. Lenders are also required to disclose applicants’ credit scores, but the government still does not release those, citing privacy concerns. The public database does not contain applicants’ names or home addresses.

HMDA data is loan-level data containing many details of individual mortgage applications. We downloaded the 2019 dataset multiple times (as it is updated from time to time), most recently on Aug. 10, 2021. It contained more than 17 million applications, nearly 90 percent of all loans made in the country from more than 5,500 financial institutions. Smaller lenders are not required to report the applications they receive. Of those financial institutions that report their data, some don’t have to disclose the new fields required by the Dodd-Frank Act.

The dataset provides detailed information about the lender, the borrower, the loan, and the property. It contains the borrowers’ income, sex, race and ethnicity, among other characteristics. It details whether the mortgage was secured through conventional means or insured by the government, the reason the mortgage is being sought, the size of the loan, etc. It lists information about the property that’s being purchased, like the number of units and its general location (census tract).

The Dodd-Frank Act expanded the number of fields in the HMDA data that are publicly available, adding more than 50 columns. This makes the 2018 and 2019 public datasets the most expansive ever released. We used the 2019 dataset, which was the most recent available when we began our investigation.

One of the most significant additions to the data is the applicant’s debt-to-income ratio, which banks and federal regulators say is the most important of a handful of key factors lenders consider.

Banks say they place more emphasis on debt-to-income ratios than credit scores. A survey conducted for the Fair Isaac Corporation, otherwise known as FICO, the company that produces credit scores, found that almost 60 percent of bankers are most concerned about high debt-to-income ratios when approving loans. Only 10 percent of them said that low credit scores were their biggest concern.

The CFPB arrived at a similar conclusion in a 2020 report. When looking at why lenders denied prospective borrowers in 2019, the agency said, debt-to-income ratio was the most common reason.

This ratio tells financial institutions whether a person can afford the mortgage. It’s calculated by dividing a borrower’s total monthly payments on all of their lines of credit by their monthly income. A high debt-to-income ratio means a person devotes most of his or her income to paying debts. Lenders also have to report credit scores, but the CFPB publicly reveals only the type of credit model used to assess each applicant’s creditworthiness, not the score itself.
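As a rough illustration of the calculation described above, here is a minimal Python sketch. The function name and the dollar figures are hypothetical, chosen only to show the arithmetic; they are not drawn from HMDA data or from any lender's actual formula.

```python
def debt_to_income_ratio(monthly_debt_payments, monthly_income):
    """Return the debt-to-income ratio as a percentage of monthly income."""
    if monthly_income <= 0:
        raise ValueError("monthly income must be positive")
    return 100.0 * sum(monthly_debt_payments) / monthly_income


# Hypothetical applicant: proposed mortgage payment, car loan, student loan.
example_payments = [1500.0, 350.0, 250.0]
example_dti = debt_to_income_ratio(example_payments, monthly_income=6000.0)
print(f"DTI: {example_dti:.1f} percent")
```

In this made-up case the applicant's monthly payments total $2,100 against $6,000 in income, so the ratio falls just under the 36 percent mark.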

III. Method of Analysis

For this analysis, we focused on mortgages that reflect everyday homeownership and where the government does not insure the loan. Therefore, we limited our data to so-called first-lien conventional mortgages for home purchase on one- to four-unit properties, where the borrower intends to live in the property. We then filtered those further, using only those mortgages with the clearest outcome: loans either made or denied. We excluded all other outcomes, including applications that were withdrawn and applications that were approved by the lender but not ultimately accepted by the applicant.
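The filters described above could be sketched roughly as follows. The field names and numeric codes are assumptions based on our reading of the public HMDA data dictionary, and the sample rows are invented for illustration; anyone reproducing the analysis should verify the codes against the file they download. The sketch also leaves out the records removed for statistical reasons.

```python
# Field names and codes are assumptions to verify against the HMDA data
# dictionary. Values are strings, as csv.DictReader would return them.
UNIVERSE_FILTERS = {
    "loan_type": {"1"},                    # conventional (not FHA, VA, etc.)
    "lien_status": {"1"},                  # first lien
    "loan_purpose": {"1"},                 # home purchase
    "occupancy_type": {"1"},               # borrower's principal residence
    "total_units": {"1", "2", "3", "4"},   # one- to four-unit properties
    "action_taken": {"1", "3"},            # loan originated, or application denied
}


def in_universe(application):
    """Return True if an application row passes every filter above."""
    return all(
        application.get(field) in allowed
        for field, allowed in UNIVERSE_FILTERS.items()
    )


# Hypothetical rows for illustration only.
base_row = {
    "loan_type": "1", "lien_status": "1", "loan_purpose": "1",
    "occupancy_type": "1", "total_units": "1", "action_taken": "3",
}
sample_rows = [
    base_row,                               # conventional, denied: kept
    {**base_row, "loan_type": "2"},         # FHA loan: excluded
    {**base_row, "action_taken": "4"},      # withdrawn: excluded
]
kept_rows = [row for row in sample_rows if in_universe(row)]
print(len(kept_rows), "of", len(sample_rows), "sample rows kept")
```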

Researchers and government analysts and regulators often analyze conventional loans separately from Federal Housing Administration (FHA) mortgages and other government-backed loans, such as those from the U.S. Department of Veterans Affairs (VA). Conventional mortgages are the best measure of the mortgage industry’s behavior on individual decisions without government intervention.

Limiting our analysis to these specific loans and removing other records for statistical reasons reduces our universe of data to about 2.4 million mortgage applications nationwide in 2019.

To determine whether the inclusion of debt-to-income and combined loan-to-value ratios explains away the lending disparities between people of color and their White counterparts, we used a statistical technique called binary logistic regression. This type of regression allows us to assess and quantify the relationship between multiple independent variables against a single binary outcome—in this case, the yes or no decision: whether a lender made the loan or denied the application.
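To make the idea of an odds ratio concrete, consider the simplest possible case: a model with a single yes-or-no predictor. There, the odds ratio a logistic regression reports equals the odds of denial in one group divided by the odds of denial in the reference group. The sketch below uses hypothetical counts, not HMDA figures, and unlike our models it does not hold any other variables constant.

```python
import math


def odds(denied, originated):
    """Odds of denial: denied applications per originated loan."""
    return denied / originated


def odds_ratio(group, reference):
    """Ratio of one group's odds of denial to the reference group's odds."""
    return odds(*group) / odds(*reference)


# Hypothetical (denied, originated) counts for illustration only.
group_counts = (180, 820)
reference_counts = (100, 900)

ratio = odds_ratio(group_counts, reference_counts)
# With one binary predictor, the fitted logistic coefficient is log(ratio),
# so exponentiating the coefficient recovers the odds ratio.
coefficient = math.log(ratio)
print(f"odds ratio: {ratio:.2f}, coefficient: {coefficient:.3f}")
```

A regression with many independent variables produces one such coefficient per variable, each describing that variable's relationship to the outcome while the others are held constant.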

We built one main regression model for all conventional mortgages in our universe and applied it to the entire country. We then created two derivative models from that main one: one to analyze major metropolitan areas and another to analyze individual lenders.

Our national model for conventional applications contained 17 variables:

When we analyzed metro areas, we created more than 950 subsets based on the metro area location of the property (metropolitan statistical areas, metropolitan divisions, micropolitan statistical areas). This model has 16 variables because we excluded the metro area size variable. We also combined some of the categories within the remaining variables because of small sample sizes.

We also applied our regression model to lenders that reported more than 5,000 conventional home purchase applications in 2019. We removed three variables from this regression equation: the type of lender, the size of the lender, and the metro area size flag. Most financial institutions lend in specific markets and tend to stay within those geographic boundaries.

IV. Findings

Nationwide Findings

When holding 17 independent variables constant against the dependent variable of being denied a mortgage, we found that lenders are more likely to deny applicants of color compared to White ones with similar financial characteristics, with Black applicants faring the worst.

National odds of denial by race and ethnicity

Race/ethnicity | P-value | Likelihood of denial for a conventional mortgage compared to White applicants
Black | 0 | 1.8 times as likely to be denied
Latino | 0 | 1.4 times as likely to be denied
Native American | 0 | 1.7 times as likely to be denied
Asian/Pacific Islander | 0 | 1.5 times as likely to be denied
Number of applications: 2,433,071; McFadden's pseudo r-squared: 0.2256. Source: 2019 HMDA data

Our analysis shows that financial institutions were almost twice as likely to deny Black applicants conventional mortgages in 2019 compared to White applicants who had the same debt-to-income ratios, made the same amount of money, and shared other important financial characteristics.

Lenders were also more likely to deny Latino, Asian/Pacific Islander, and Native American applicants than their White counterparts when we held the key financial characteristics constant. Applicants in these groups were 40 to 70 percent more likely to be denied.
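The "times as likely" and "percent more likely" figures in this report are two ways of stating the same number. A minimal sketch of the conversion, using the national figures from the table above (the function name is ours, for illustration):

```python
def percent_more_likely(times_as_likely):
    """Convert a 'times as likely' figure into 'percent more likely'."""
    return round((times_as_likely - 1.0) * 100)


# National figures reported in the table above.
national_results = {
    "Black": 1.8,
    "Latino": 1.4,
    "Native American": 1.7,
    "Asian/Pacific Islander": 1.5,
}

for group, times in national_results.items():
    print(f"{group}: {percent_more_likely(times)} percent more likely to be denied")
```

The same conversion turns Chicago's figure for Black applicants, 2.5 times as likely, into 150 percent more likely.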

Debt-to-Income Findings

Various lenders say they are more comfortable making loans to applicants with low debt-to-income ratios. Some banks say applicants with ratios of 50 percent or higher should reduce their debt before applying for a mortgage. The federal government says applicants with ratios of 43 percent or higher may qualify for only nontraditional mortgages, such as those with balloon payments.

HMDA data displays debt-to-income ratios for each loan application as either categorical or continuous data. Ratios below 36 percent are divided into three categories: below 20 percent, 20 to 30 percent, and 30 to 36 percent. The data splits ratios of 50 percent and above into two buckets: those between 50 and 60 percent and those that are more than 60 percent. Ratios that fall between 36 and 49 percent are displayed as the raw percentage.

To standardize this column, we created buckets using the most consistent definitions among lenders. Bank of America, JPMorgan Chase, and Wells Fargo agree that DTI ratios below 36 percent are the best. These banks, among other lenders, also say that ratios above 50 percent need the most work.

Because lenders define ratios between 36 and 50 percent differently, we split the raw percentages into two buckets at 43 percent. The CFPB says borrowers can end up with different loan products if they are above or below that ratio. By doing this, we created four categories for debt-to-income ratio.
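The bucketing described above might look something like the following sketch. The string labels for the pre-binned ranges are assumptions about how the public file encodes them, and the boundary handling (43 percent counted as "nearing unmanageable") is inferred from the category descriptions in this report rather than taken from published code.

```python
# Labels for pre-binned ranges are assumed; check them against the data file.
HEALTHY_LABELS = {"<20%", "20%-<30%", "30%-<36%"}
STRUGGLING_LABELS = {"50%-60%", ">60%"}


def dti_category(raw_value):
    """Map a raw debt-to-income field value to one of four categories.

    Returns None when the value is missing or not a recognizable ratio.
    """
    value = raw_value.strip()
    if value in HEALTHY_LABELS:
        return "healthy"
    if value in STRUGGLING_LABELS:
        return "struggling"
    # Ratios from 36 to 49 percent arrive as raw numbers such as "41".
    if not value.replace(".", "", 1).isdigit():
        return None  # e.g. a missing or exempt value
    percent = float(value)
    if percent < 36:
        return "healthy"
    if percent < 43:
        return "manageable"
    if percent < 50:
        return "nearing unmanageable"
    return "struggling"


for example in ["<20%", "36", "43", "49", "50%-60%", "NA"]:
    print(example, "->", dti_category(example))
```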

In our nationwide model, the “struggling” debt-to-income ratio category was the most important factor predicting mortgage denial.

Applicants with the worst debt-to-income ratio in 2019 (“struggling” category) were nearly 46 times as likely to be denied compared to those with “healthy” ratios, when we controlled for other factors.

Those in the “nearing unmanageable” category (between 43 and 49 percent) were 1.4 times as likely to be denied as those who have the best DTI ratios.

The result for applicants with “manageable” DTIs was not statistically significant.

Debt-to-income ratio categories

DTI Category | P-value | Likelihood of denial for a conventional mortgage compared to healthy DTI ratios
Manageable | 0.845 | Not statistically significant
Nearing Unmanageable | 0 | 1.4 times as likely
Struggling | 0 | 45.6 times as likely
Source: 2019 HMDA data

Our regression analysis takes into account only loans and denials, excluding the other outcomes, which are more ambiguous.

However, even when we included approvals in our analysis—those where the applicant chose not to accept the loan, for instance—we still found that people of color had worse outcomes. They were still approved at lower rates than their White counterparts.

Approval rates by race/ethnicity and debt-to-income ratio

Source: 2019 HMDA data

Lenders approved Black applicants with “healthy,” “manageable,” and “nearing unmanageable” DTIs about 80 percent of the time. But they approved White applicants in those three categories about 90 percent of the time.

The starkest disparity was in loan approval rates for Black applicants who are in the “struggling” category compared to White applicants with the same amount of debt. Lenders approved White applicants with a debt-to-income ratio of 50 percent or more at more than twice the rate of Black applicants in that category. Lending rates are also much lower for Latino, Native American, and Asian/Pacific Islander applicants than White ones in the “struggling” category.

Taking the analysis one step further, the differences in approval rates are stark for some applicants of color compared to White applicants in worse financial shape. We compared approval rates for high-earning applicants of color—those who earn $100,000 or more—with less affluent White applicants—those earning less than $100,000.

Lenders approved poorer White applicants with the same debt-to-income ratio at higher rates than richer Black applicants across the “healthy,” “manageable,” and “nearing unmanageable” categories. Only in the “struggling” category do lenders approve loans to higher earning Black applicants at the same rate as poorer White applicants.

Approval rating for richer people of color vs. poor White applicants

Source: 2019 HMDA data

Other applicants of color—Asian/Pacific Islander, Latino, and Native American—have the same or slightly higher approval rates compared to poorer White applicants.

Metro Area Findings

We applied our model to 959 metropolitan areas, metropolitan divisions, and micropolitan areas across the country, testing for racial and ethnic differences in mortgage lending.

The majority of those metro areas, 709 of them, did not produce meaningful results because they’re too small (most of these had fewer than 500 applications). We tossed out another 122 areas because they either lacked heterogeneity among the variables in our equation or the regression model produced a poor fit. That left 128 metro areas with meaningful results.

In 37 of those 128 metro areas, the four racial and ethnic variables (Black, Latino, Asian/Pacific Islander, and Native American) were unreliable because they were not statistically significant or there were not enough applications. That doesn’t mean that there are no lending disparities for applicants of color in these areas; we would need more data to draw a more definitive conclusion.

The remaining 91 metro areas produced reliable findings. People of color in those metro areas were between 20 percent and 230 percent more likely to be denied than similar White applicants. In the Los Angeles metro area, for example, our regression showed that Latino applicants are 20 percent more likely to be denied than their White counterparts who have similar financial characteristics.

For this story, we are defining metro areas as having a “statistically significant disparity” if a group is at least 50 percent more likely to be denied than similar White applicants.

In two metro areas—the Fort Lauderdale/Pompano Beach/Sunrise area and the Las Vegas/Henderson/Paradise area—the only reliable results were for Black and Asian/Pacific Islander applicants, and those disparities were less than 50 percent. While those are valid results, they are not statistically significant disparities under our definition.

That left 89 metro areas where we found at least one of the four racial and ethnic groups was more likely to be denied than similarly situated White applicants. In all of these metro areas, the model had a pseudo r-squared of 0.1 or above, the race and ethnicity variables had p-values below 0.05, and the odds ratio was 1.5 or above.
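The three thresholds above can be expressed as a simple check. This is a sketch of the stated criteria only, with a function name of our choosing; the example values are the Chicago and Minneapolis figures reported in the tables later in this section.

```python
def meets_disparity_criteria(pseudo_r_squared, p_value, odds_ratio):
    """Apply the three stated thresholds to one group's result in one metro area."""
    return (
        pseudo_r_squared >= 0.1
        and p_value < 0.05
        and odds_ratio >= 1.5
    )


# (metro, group, pseudo r-squared, p-value, odds ratio) from the findings tables.
examples = [
    ("Chicago", "Black", 0.2251, 0.0, 2.5),
    ("Chicago", "Asian/Pacific Islander", 0.2251, 0.0, 1.4),
    ("Minneapolis", "Native American", 0.26663, 0.039, 2.1),
]

for metro, group, r2, p, ratio in examples:
    verdict = "meets" if meets_disparity_criteria(r2, p, ratio) else "does not meet"
    print(f"{metro}, {group}: {verdict} the criteria")
```

Note that a result can be statistically valid yet fall short of the 1.5 odds ratio cutoff, as with Asian/Pacific Islander applicants in Chicago.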

These lending disparities appeared in every region of the country, from the South—including Nashville and Birmingham—to the Northeast (Boston), the Midwest (Minneapolis), and the West (Riverside, Calif.).

Disparities were present in some of the nation’s largest cities, including New York, Los Angeles, Chicago, and Houston, and in smaller cities such as Florence, S.C., and Waco, Texas.

Our analysis showed Black applicants are more likely to be denied in 71 of the 89 metro areas (80 percent). Of all the metro areas where Black applicants were more likely to be denied, they fared the worst in the Dayton-Kettering metro area of Ohio. They were nearly three times as likely to be denied as similarly qualified White applicants there.

Lenders were more likely to deny Latino applicants in 39 metro areas (44 percent), Asian/Pacific Islander applicants in 55 metro areas (62 percent), and Native American applicants in one metro area. (There were 15 metro areas where Native American applicants are more likely to be denied, but we are not including those because of the small number of applications.)

Striking disparities in denial rates in the nation’s largest metro areas

Metro area | Race/ethnicity | Likelihood of denial compared to White applicants | 2019 overall population
New York–Jersey City–White Plains, NY-NJ | Black | 1.6 times as likely to be denied | 11,915,488
Los Angeles–Long Beach–Glendale, CA | Black | 1.8 times as likely to be denied | 10,081,570
Chicago-Naperville-Evanston, IL | Black | 2.5 times as likely to be denied | 7,175,030
Houston–The Woodlands–Sugar Land, TX | Black | 2.0 times as likely to be denied | 6,884,138
Atlanta–Sandy Springs–Alpharetta, GA | Black | 1.7 times as likely to be denied | 5,862,424
Dallas-Plano-Irving, TX | Black | 2.1 times as likely to be denied | 4,903,580
Washington-Arlington-Alexandria, DC-VA-MD-WV | Black | 2.1 times as likely to be denied | 4,901,633
Phoenix-Mesa-Chandler, AZ | Black | 1.9 times as likely to be denied | 4,761,603
Riverside–San Bernardino–Ontario, CA | Black | 1.6 times as likely to be denied | 4,560,470
Minneapolis–St. Paul–Bloomington, MN-WI | Black | 2.2 times as likely to be denied | 3,573,609
Source: 2019 HMDA data

In the 10 most populous metro areas in the country, Black prospective homebuyers fared the worst in Chicago. Lenders there were 2.5 times as likely to reject Black applicants as White ones with similar financial circumstances.

They were also 1.6 times as likely to deny Latino applicants in Chicago as similarly qualified White applicants there and 1.4 times as likely to turn away Asian/Pacific Islander applicants in Chicago as their White counterparts. The regression didn’t produce statistically significant results for Native American applicants.

Chicago findings

Race/ethnicity | P-value | Likelihood of denial for a conventional mortgage compared to White applicants
Black | 0 | 2.5 times as likely to be denied
Latino | 0 | 1.6 times as likely to be denied
Native American | 0.341 | Not statistically significant
Asian/Pacific Islander | 0 | 1.4 times as likely to be denied
Number of applications: 57,155; McFadden’s pseudo r-squared: 0.2251. Source: 2019 HMDA data

Minneapolis also showed a rare distinction: Of all the metro areas with statistically significant lending disparities, it was the only one where lenders were more likely to turn away applicants of all four racial and ethnic groups than their White counterparts.

Minneapolis findings

Race/ethnicity | P-value | Likelihood of denial for a conventional mortgage compared to White applicants
Black | 0 | 2.2 times as likely to be denied
Latino | 0.006 | 1.5 times as likely to be denied
Native American | 0.039 | 2.1 times as likely to be denied
Asian/Pacific Islander | 0 | 1.7 times as likely to be denied
Number of applications: 42,162; McFadden’s pseudo r-squared: 0.26663. Source: 2019 HMDA data

In Minneapolis, all four racial and ethnic groups were more likely to be denied conventional mortgages than similarly qualified White applicants.

Lender Findings

More than 5,500 lenders reported at least one mortgage application to the government in 2019. Wells Fargo was the largest, reporting more than one million applications. The next closest lender, Quicken Loans, reported 20,000 fewer applications. Nearly three-quarters of financial institutions reported fewer than 900 applications that year, a relatively small number for purposes of statistical analysis.

We applied our regression equation to the nation’s biggest lenders—those that reported 5,000 or more conventional home loan applications in our dataset. There were 72 financial institutions that met this criterion.

Of those, the regression equation did not produce meaningful results for 30 lenders, because of a lack of variance for some of the variables or a poor fit produced by the model. We tossed out another 12 lenders because of collinearity issues and an additional four lenders because none of the race and ethnicity variables were statistically significant. This left 26 lenders with statistically significant results.
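The exclusions above can be tallied with a few lines of arithmetic. The sketch below uses only the counts stated in this report; the helper name and the short labels for each exclusion reason are ours. The same tally applies to the metro area counts reported earlier.

```python
def remaining_after(start, exclusions):
    """Subtract each exclusion count from the starting total."""
    return start - sum(exclusions.values())


lender_exclusions = {
    "lack of variance or poor model fit": 30,
    "collinearity issues": 12,
    "no significant race or ethnicity variables": 4,
}
lenders_left = remaining_after(72, lender_exclusions)

metro_exclusions = {
    "too small for meaningful results": 709,
    "lack of heterogeneity or poor model fit": 122,
    "unreliable race and ethnicity variables": 37,
    "disparities below 50 percent": 2,
}
metros_left = remaining_after(959, metro_exclusions)

print(f"lenders with statistically significant results: {lenders_left}")
print(f"metro areas with statistically significant disparities: {metros_left}")
```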

Our analysis showed these 26 lenders were between 30 and almost 260 percent more likely to deny applicants of color than similarly situated White borrowers. All of these financial institutions consistently denied Black borrowers at higher rates; some lenders were 30 percent more likely to deny Asian/Pacific Islander and Latino applicants than similar White borrowers. Our sample did not include enough Native American applicants to draw a conclusion.

Seven lenders received more than 1,000 applications from Black and Latino borrowers, and these lenders were at least twice as likely to deny applicants of those races and ethnicities compared to their White counterparts. These were some of the widest disparities among those lenders that produced statistically significant results.

DHI Mortgage Company, Lennar Mortgage, Pulte Mortgage, Freedom Mortgage Corporation, Movement Mortgage Corporation, Fairway Independent Mortgage Corporation, and Navy Federal Credit Union were all at least twice as likely to deny applicants of color when we held 14 variables constant—including income, debt-to-income ratio, and combined loan-to-value ratio.

Lenders with the widest disparities for applicants of color

Lender | Race/ethnicity | Number of apps | P-value | Likelihood of denial for a conventional mortgage compared to White applicants
DHI Mortgage Company | Black | 1,276 | 0 | 2.6 times as likely to be denied
DHI Mortgage Company | Latino | 2,154 | 0 | 2.0 times as likely to be denied
Lennar Mortgage | Black | 1,281 | 0 | 2.3 times as likely to be denied
Lennar Mortgage | Latino | 2,837 | 0 | 2.1 times as likely to be denied
Freedom Mortgage Corporation | Latino | 1,043 | 0 | 2.2 times as likely to be denied
Pulte Mortgage | Latino | 1,296 | 0 | 2.2 times as likely to be denied
Fairway Independent Mortgage Corporation | Black | 2,014 | 0 | 2.1 times as likely to be denied
Movement Mortgage | Black | 1,289 | 0 | 2.1 times as likely to be denied
Movement Mortgage | Latino | 2,228 | 0 | 2.1 times as likely to be denied
Navy Federal Credit Union | Black | 1,467 | 0 | 2.1 times as likely to be denied
Source: 2019 HMDA data

Denial Reasons

Lenders can deny mortgage applications for multiple reasons. They can list up to four reasons in HMDA. The options include:

Across most racial and ethnic groups, with the exception of Native American applicants, the reason lenders most often reported for rejecting applications was that applicants’ debt would be too high relative to their income.

Three most common reasons for denial for each racial and ethnic group

Source: 2019 HMDA Data

Lenders listed debt-to-income ratio as one of the reasons they refused to make loans to Black applicants 35 percent of the time. Debt was also the most common reason lenders listed in denying White applicants—33 percent of their rejections.

When lenders listed “credit history” as the reason for denial in 2019, it was cited more often for Black applicants than White ones: 33 percent versus 21 percent.

V. Limitations

Credit Scores

Lenders say an applicant’s credit score is an essential variable they use to evaluate mortgage applications. A credit-scoring algorithm takes a person’s history of making payments and attempts to predict that person’s likelihood of paying back a loan in the future, with a numerical score. That score is a snapshot in time; a person’s credit score can fluctuate from one day to the next based on outstanding bills and balances on lines of credit.

Credit scores usually range from 300 to 850, though each credit-scoring company uses a slightly different range of scores. A higher score indicates a higher likelihood of repayment. FICO says a score above 670 in its model is considered good, and anything above 800 is exceptional. About two-thirds of people have a good or better credit score, according to Experian.

In its analysis of the 2019 HMDA data, the CFPB found that for all applications, Black applicants had, on average, the lowest credit score of all racial and ethnic groups—a median credit score of 694. For conforming conventional mortgages, the Black median credit score jumped to 724. The median credit score for White applicants was nearly 60 points higher for all applications and 35 points higher for conforming conventional mortgages.

Researchers said people of color are given lower credit scores because some of the factors that go into the algorithm have a disproportionate effect on people of color.

Despite the importance of credit scores in lending decisions, we could not control for this variable in any of our models. As mentioned earlier, the CFPB shares credit scores with other government regulators but not with the public, saying this data point could reveal too much about a prospective homebuyer’s identity. The agency says it must weigh applicants’ privacy with the benefits of releasing data.

Other variables the CFPB chose to exclude from the public version of the data for privacy concerns include the property address, the date the application was submitted, the date the decision was made, and the decision made by the automated underwriting system, which lenders can override. Applicant names are not reported under HMDA.

Conventional Loans

HMDA data includes various types of mortgages that are either conventional loans, where the lender assumes the risk if a borrower defaults, or government-backed loans, where one of several federal agencies would be on the hook.

Conventional mortgages typically require a higher down payment and a higher credit score than government-guaranteed mortgages and are often less expensive for the borrower in the long run.

FHA loans, which are insured by the Federal Housing Administration, require a lower down payment and credit score in exchange for higher fees and interest rates for the life of the loan. Previous studies have shown borrowers in neighborhoods of color receive FHA loans at higher rates than conventional loans, raising concerns that lenders push applicants of color into government-backed loans. The City of Philadelphia sued and settled with Wells Fargo over this matter. Other studies have argued that FHA loans “have long been implemented in a manner that promotes segregation.”

We limited our analysis to only conventional loans, to gauge lenders’ actions without the safety net of this government backing.

The lending industry criticized our analysis for not including government-backed loans, arguing that FHA loans, for example, are used to bridge the gap between prospective borrowers who don’t qualify for conventional loans and homeownership.

Our analysis of 2019 HMDA data does show Black, Latino, and Native American applicants have more success securing FHA loans than conventional ones. However, White borrowers still get FHA loans at higher rates than applicants of color.

Lending rates for conventional vs. FHA loans by race/ethnicity

Source: 2019 HMDA data

Black applicants receive conventional loans 61 percent of the time, while they are able to secure FHA loans almost 65 percent of the time. White borrowers receive both conventional and FHA loans about 75 percent of the time.

Race and Ethnicity

Racial and ethnic identities are rarely straightforward, and the data reflects that complexity. Each applicant and co-applicant can list up to five different racial and ethnic categories.

To streamline the analysis, we focused on the first racial and ethnic identity listed for the main applicant. This unfortunately meant we didn’t focus on mixed-race individuals, as that would have added another complex layer to the analysis.

Since 2018, HMDA data has disaggregated the ethnic identities for Asians (Filipino, Vietnamese, Korean, etc.), Latinos (Mexican, Puerto Rican, Cuban, etc.), and Pacific Islanders (Native Hawaiian, Guamanian, Samoan, etc.). But along with those specific ethnicities, HMDA still provides the general umbrella term for each race and ethnicity as an option: Lenders can still choose Latino, Asian, or Native Hawaiian/Pacific Islander.

We re-aggregated these ethnicities to have a statistically significant sample size for each racial and ethnic group, with the full understanding that no racial or ethnic group is a monolith.

According to our analysis, Latino applicants nationwide are 1.4 times as likely to be denied conventional mortgages compared to similarly qualified non-Hispanic White applicants. That number may be an underestimate because of the diversity within the Latino population. The 2019 HMDA data shows higher denial rates for the specific Latino identities as opposed to the general umbrella “Latino” category. Studies have also shown that lending patterns for Latino applicants vary by geography.

We also aggregated Asian and Pacific Islander applicants for the sake of applying the most consistent model across the nationwide dataset, the individual metro area subsets, and on the lender analysis. It’s important to note that on a national level, Pacific Islander applicants have higher denial rates than Asian applicants.

VI. Industry Response

We sent our methodology to industry leaders—American Bankers Association (ABA), the Mortgage Bankers Association (MBA), The Community Home Lenders Association, and The Credit Union National Association. They all criticized our analysis, saying we didn’t include credit scores and credit histories and that our scope was too narrow because we focused on conventional mortgages.

But credit scores are stripped from public HMDA data, and credit histories are not reported. Lending industry representatives said that because of this, HMDA data is not complete enough to explain why disparities between people of color and their White counterparts exist.

“Any meaningful review of mortgage lending practices for possible discrimination, as regulators and the courts have made clear, must also consider individual factors such as a borrower’s credit score and credit history, which lenders are required by law to take into account,” said Blair Bernstein, director of public relations for the ABA, in a written statement. “An individual’s credit history can help explain why seemingly comparable applicants may not always end up with the same lending outcome.”

In addition to credit scores and histories, the MBA argued that our analysis should have included government-backed loans because it would have painted a more complete picture of homeownership and lending rates.

“It [the analysis] purposely excludes mortgages guaranteed by the Federal Housing Administration (FHA), designed to help borrowers with lower credit scores and small down payments,” said Mike Fratantoni, chief economist at the MBA, in a written statement.

When we sent our findings to the seven lenders with large lending disparities, all of them responded with written statements saying that they follow the law. In addition:

VII. Conclusion

In the past, the lending industry argued that HMDA data couldn’t provide an accurate picture of lending disparities because it lacked key variables: credit scores, debt-to-income ratios, and combined loan-to-value ratios.

Because of the Dodd-Frank Act, debt-to-income and combined loan-to-value ratios are now included in the latest data. We found that debt-to-income ratio was the strongest predictor in the public data of whether lenders make or deny mortgages.

However, holding debt-to-income and combined loan-to-value ratios constant, along with 15 other variables, we still found applicants of color are more likely to be denied a mortgage than their White counterparts. Nationwide, Black applicants fared the worst: Financial institutions were nearly twice as likely to deny them when compared to similarly qualified White applicants.

Lenders even made loans at higher rates to poorer White applicants than richer applicants of color with the same debt ratios.

While we calculated statistically significant odds of denial for applicants of color for the entire country, we found regional differences.

In 89 metro areas, financial institutions were significantly more likely to deny mortgages to applicants of color than to their White counterparts. Black applicants were more likely to be denied in 71 of those areas, Latinos in 39, Asian/Pacific Islander in 55, and Native Americans in one area.

The top 10 most populous metro areas, from New York and Los Angeles to Houston and Atlanta, all showed statistically significant disparities for at least one racial or ethnic group. Among the largest metropolitan areas, disparities for Black applicants were highest in Chicago. In Minneapolis, lenders were more likely to deny all four racial and ethnic groups—Black, Latino, Asian/Pacific Islander, and Native American applicants—than similar White applicants, the only metro area with this distinction.

Of the lenders we were able to test, seven stand out with the widest lending disparities between people of color and White people: DHI Mortgage Company, Lennar Mortgage (formerly known as Eagle Home Mortgage), Pulte Mortgage, Freedom Mortgage Corporation, Movement Mortgage Corporation, Fairway Independent Mortgage Corporation, and Navy Federal Credit Union. These all had received more than 1,000 applications from Black and Latino applicants in 2019 and were at least twice as likely to deny them as their White counterparts, even though they had similar financial characteristics.

VIII. Appendix

How we filtered specific fields

Loan Type

HMDA data denotes four types of loans:

Loan type categorizes the institution at risk of losing money if borrowers default. In a conventional loan, the lender is at risk; in guaranteed loans, one of several federal agencies would be on the hook.

Conventional mortgages typically require a higher down payment and a higher credit score than government-guaranteed mortgages. Although they are considered more difficult to get than government home loans, these mortgages are often less expensive for the borrower in the long run.

We limited our analysis to conventional loans, in order to gauge lenders’ actions without the safety net of this government backing.

Loan Purpose

An applicant can seek a mortgage to buy a home, or to refinance an existing mortgage to get better terms or take out a bigger loan.

We limited our data to mortgages sought for home purchase because we wanted to measure people’s access to new homeownership. Those who want to refinance already own a home.

Property Type

Applicants can get mortgages for different types of property, from a single-family home to an entire apartment complex. We included only applications for properties that had one-to-four units, which would capture single-family homes and duplexes, triplexes and quadplexes.

Occupancy Type

Occupancy type indicates whether the applicant is going to live in the home, use it as a second home, or use it as an investment property. Because we are looking at principal homeownership, we narrowed the data to applications where the home would be the applicant’s primary residence.

Construction Method

In the HMDA data, properties are either site-built, meaning they were originally built from the ground up at the location of the property, or they are “manufactured” homes, like mobile homes. We focused on site-built homes.

Business or Commercial Purpose

Commercial and business property loan applications are also included in the HMDA data. We excluded these from our analysis.

Lien Status

A lien refers to the order in which a person or entity would be repaid if a borrower defaults on a loan. Whichever entity holds the first lien on a loan is scheduled to be repaid first. For this analysis, we focused on first-lien mortgages.

Action Taken

Action Taken refers to the outcome of the application. These can be:

We focused on the two most straightforward outcomes: where the lender approved the application and the loan was made (originated), or where the lender denied it. In terms of looking for disparities in lending, researchers and advocates we interviewed told us the other outcomes are less clear.

For example, a lender can approve the loan, but the applicant can turn it down because they shopped around and received multiple offers, or the terms of the loan were too expensive or otherwise unsatisfactory. Researchers and advocates also told us that lenders have tried to artificially reduce their rejection rates by purposefully sitting on applications for so long that the applicants have been forced to withdraw them.

Combined Loan-to-Value Ratio

This data point looks at all the money used to finance the property—the primary mortgage, as well as any additional lines of credit or secondary mortgages—in proportion to the property value. It tells lenders how much of the overall home’s value an applicant is financing. This figure is represented as a percentage.

While it’s not uncommon for CLTVs to be above 100 percent, most lenders prefer them to be under. In our data, CLTV was above 100 percent in more than 60,000 applications, with the highest being listed at more than 61 million percent.

Because it’s hard to distinguish between correctly reported CLTVs and erroneous ones, we decided to filter out those records where the CLTV was above 100 percent as well as those where the data is missing.
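The filters described in this section can be sketched as a single pass/fail check on each application. This is a minimal illustration, not the code used for the analysis: the field names and string values are hypothetical labels chosen for readability, not the official HMDA column names or numeric codes.

```python
# Hypothetical field names and values; the public HMDA file uses its own codes.
REQUIRED_VALUES = {
    "loan_type": "conventional",
    "loan_purpose": "home purchase",
    "occupancy_type": "principal residence",
    "construction_method": "site-built",
    "business_or_commercial": "no",
    "lien_status": "first lien",
}
KEPT_OUTCOMES = {"originated", "denied"}


def keep_application(app):
    """Return True if a record passes every filter described in this section."""
    if any(app.get(field) != value for field, value in REQUIRED_VALUES.items()):
        return False
    if app.get("total_units") not in {1, 2, 3, 4}:  # one-to-four unit properties
        return False
    if app.get("action_taken") not in KEPT_OUTCOMES:
        return False
    cltv = app.get("cltv")  # percent; None when the value is missing
    return cltv is not None and cltv <= 100


example = {
    "loan_type": "conventional",
    "loan_purpose": "home purchase",
    "occupancy_type": "principal residence",
    "construction_method": "site-built",
    "business_or_commercial": "no",
    "lien_status": "first lien",
    "total_units": 1,
    "action_taken": "denied",
    "cltv": 95.0,
}
print(keep_application(example))  # True
```

Keeping the required values in one dictionary makes it easy to see at a glance which slice of the data the models were run on.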

The independent variables in the regression model

Race and Ethnicity

HMDA data has five columns for race and ethnicity because applicants and co-applicants can list multiple racial and ethnic identities on their applications. In determining the race and ethnicity of the applicant, we relied on the first race and ethnicity of the main applicant. This was the most straightforward approach.

Applicants who marked their ethnicity as Latino regardless of race were grouped together to form a Latino group. Non-Hispanic applicants, including those who didn’t report an ethnicity, were grouped together according to their race: Non-Hispanic White, Non-Hispanic Black, etc. Those applicants who didn’t report a race or an ethnicity were grouped separately. The racial and ethnic categories we used are:
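As a rough illustration of the grouping rule described above, the sketch below assigns one group from the main applicant's first listed ethnicity and race. The function name, the labels, and the use of None for an unreported field are assumptions made for this example. The text does not say how a non-Latino applicant with no reported race was handled, so the sketch places that case in the separate "not reported" group.

```python
def derive_group(first_ethnicity, first_race):
    """Assign a single racial/ethnic group to the main applicant.

    None stands for a field that was not reported (an assumption of this sketch).
    """
    if first_ethnicity == "Latino":
        return "Latino"  # Latino regardless of race
    if first_race is None:
        return "Not reported"  # assumption: no race means the separate group
    # Non-Hispanic, or ethnicity missing but race reported: group by race
    return f"Non-Hispanic {first_race}"


print(derive_group("Latino", "White"))  # Latino
print(derive_group(None, "Black"))      # Non-Hispanic Black
print(derive_group(None, None))         # Not reported
```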

Sex

We created three categories for sex: male, female, and not applicable. Those records in the not applicable category were those that either didn’t provide any information or where both the male and female options were marked.

Co-applicant

HMDA data doesn’t have a single column that flags whether there’s a co-applicant or not. Instead, that information is scattered across five columns: co-applicant’s race, ethnicity, sex, age, and credit model used. Each of those columns includes an option that states there’s no co-applicant.

We determined there was no co-applicant when all five columns were consistent in listing the “no” option or when one column marked the no co-applicant option and the other four columns didn’t provide any information.

We determined there was a co-applicant when at least one of the five columns provided clear information about the co-borrower and none of the rest were marked in the negative (no co-applicant).

Age and credit model are new fields, and not every lender is required to report that information. But all lenders must report an applicant’s race, ethnicity, and sex, where applicable. Because of those requirements, we gave more weight to the race, ethnicity, and sex fields when looking for co-applicants, meaning that when race, ethnicity, or sex clearly state there’s a co-applicant, but age or credit models say there isn’t, we marked those records as having a co-applicant. Of the 17.5 million records in the 2019 HMDA dataset, there were only 20,000 instances (less than one percent) where those sets of columns contradict each other.

We created a third bucket for those cases where we did not have enough information to unequivocally say whether there was a co-applicant or not. This included records that did not report any information on a co-applicant’s race, ethnicity, sex, age, or credit model. Additionally, we included those records where information in various columns contradicted each other: for example, records where race, ethnicity, and sex said no co-applicant, but age and credit model said there was a co-applicant.
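The co-applicant rules above can be sketched as a small decision function. This is one possible reading, not the original code: each of the five columns is assumed to have been reduced beforehand to "yes" (clear co-applicant information), "no" (the no co-applicant option), or "unknown" (nothing reported), and the names are hypothetical. Combinations the text does not spell out fall into the third bucket in this sketch.

```python
CORE = ("race", "ethnicity", "sex")  # required of all lenders
NEWER = ("age", "credit_model")      # not required of every lender


def classify_co_applicant(flags):
    """flags maps each of the five columns to 'yes', 'no', or 'unknown'."""
    values = [flags[col] for col in CORE + NEWER]
    core = [flags[col] for col in CORE]
    n_no = values.count("no")
    n_unknown = values.count("unknown")

    # All five say "no", or one says "no" and the other four are empty
    if n_no == 5 or (n_no == 1 and n_unknown == 4):
        return "no co-applicant"
    # At least one column has clear information and none says "no"
    if "yes" in values and n_no == 0:
        return "co-applicant"
    # Race, ethnicity, and sex outweigh age and credit model
    if "yes" in core and "no" not in core:
        return "co-applicant"
    return "not applicable"


record = {"race": "yes", "ethnicity": "yes", "sex": "yes",
          "age": "no", "credit_model": "no"}
print(classify_co_applicant(record))  # co-applicant
```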

Our co-applicant variables included:

For the metro area and lender regression model, we filtered out applications where the co-applicant field was “not applicable” because they accounted for less than 0.5 percent of the total data and that variable would present issues when analyzing individual metro areas and lenders.

Age

The data displays an applicant’s age as categorical data, in buckets, rather than as a continuous variable. Those buckets are constructed in 10-year increments, as well as buckets for the youngest and oldest applicants:

For the most part, we left the HMDA age categorizations intact and used them as variables, with one exception. For those applicants whose age is greater than 74, we folded those records into the 65–74 category because they accounted for a small percentage of the data. This created a variable we called 65 or greater.

Any records where the age was missing we put into their own category.

Our final age variable groups:

When analyzing metro areas and lenders, we combined the “less than 25” and “between 25 and 34” into one category, merged “between 35 and 44” and “between 45 and 54” into one bucket and used it as the reference variable, and consolidated the “between 55 and 64” and “65 or older” into one variable. We also filtered out applications where age was not provided because they account for less than a 10th of one percent of the data.
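The two levels of age recoding described in this section could look something like the sketch below. The bucket labels are hypothetical stand-ins for the values in the data, None stands for a missing age, and the names of the merged metro-level groups are invented for this example.

```python
# National model: fold the oldest bucket into the 65-74 bucket.
NATIONAL_MERGE = {
    "between 65 and 74": "65 or greater",
    "greater than 74": "65 or greater",
}

# Metro and lender models: three broader groups (labels are hypothetical).
METRO_MERGE = {
    "less than 25": "under 35",
    "between 25 and 34": "under 35",
    "between 35 and 44": "35 to 54",  # reference group
    "between 45 and 54": "35 to 54",
    "between 55 and 64": "55 or older",
    "65 or greater": "55 or older",
}


def national_age_group(bucket):
    """Recode for the national model; missing ages get their own category."""
    if bucket is None:
        return "missing"
    return NATIONAL_MERGE.get(bucket, bucket)


def metro_age_group(bucket):
    """Recode for metro and lender models; returns None for dropped records."""
    group = national_age_group(bucket)
    if group == "missing":
        return None
    return METRO_MERGE[group]


print(national_age_group("greater than 74"))  # 65 or greater
print(metro_age_group("less than 25"))        # under 35
```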

Income

This is the annual income of the applicant and co-applicant. Income is skewed to the right, meaning the majority of applicants are clustered at the lower end of the income spectrum, with a few outliers with drastically high incomes, forming a long tail on the right side of the histogram. Because of this and the fact that there is a wide gap between the lowest and highest incomes, we took the logarithm of income to shrink the gap. We used this variable as a continuous one. We also included only those records where income was greater than zero.

Loan Amount

Loan amount has the same characteristics as income, skewed to the right and with a large gap between the lowest loans and the highest ones. We also took the logarithm of the loan amount to handle these issues. We used this variable as a continuous one.
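To see why the logarithm helps with right-skewed values such as income and loan amount, here is a small sketch with made-up numbers. The text does not say which base was used, so the natural logarithm here is an assumption, as are the function and variable names.

```python
import math


def log_transform(values):
    """Natural log of each strictly positive value; other values are dropped."""
    return [math.log(v) for v in values if v > 0]


incomes = [0, 40_000, 85_000, 2_500_000]  # made-up annual incomes
logged = log_transform(incomes)            # the zero income is excluded

positive = [v for v in incomes if v > 0]
raw_gap = max(positive) / min(positive)    # highest income is 62.5x the lowest
log_gap = max(logged) - min(logged)        # on the log scale the gap is about 4.1
print(round(raw_gap, 1), round(log_gap, 1))
```

On the log scale, a few very large values no longer dominate the range, which is the point of the transformation.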

Property Value Ratio

Like income and loan amount, property value is skewed to the right and has extreme outliers. Property values differ depending on their location: An $800,000 home in San Francisco is different from a home of the same value in Fresno, Calif.

To normalize the data, we decided to calculate a ratio between the property value and median property value of the county. We chose the median property values of a county over the metro area because not every county in the HMDA data is associated with a metro area. We used 2019 American Community Survey data for median property values for each county in the country.

Property value data is closer to having a normal distribution when the ratios are considered. The normal distribution becomes more apparent when removing ratios that are greater than 10.

We first tried using property values as a categorical variable, calculating the z-scores of the property value ratios to create six distinct buckets. The last bucket, the sixth, contained the outliers, the super expensive homes relative to the county they are situated in. But we realized those types of homes don’t exist in abundance in most metro areas.

We settled on using the property value ratio as a continuous variable.
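The ratio itself is a simple division, shown here as a sketch. The county medians below are invented for illustration and are not American Community Survey figures; the function and variable names are likewise hypothetical.

```python
def property_value_ratio(property_value, county_median_value):
    """Property value relative to the median property value of its county."""
    return property_value / county_median_value


# Made-up medians, for illustration only
county_median = {"San Francisco County": 1_200_000, "Fresno County": 300_000}

sf_ratio = property_value_ratio(800_000, county_median["San Francisco County"])
fresno_ratio = property_value_ratio(800_000, county_median["Fresno County"])
print(round(sf_ratio, 2), round(fresno_ratio, 2))  # 0.67 2.67
```

With these invented medians, the same $800,000 home sits below the local median in one county and well above it in the other, which is the distinction the ratio is meant to capture.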

Mortgage Term

This variable describes the length of time a borrower has to pay off the mortgage. We turned this variable into categorical data with four buckets:

When looking at metro areas and individual lenders, we combined the “less than 30 years” and “more than 30 years” variables into one category and filtered out the records where the data was missing.
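A sketch of how the mortgage term could be bucketed follows. It assumes the term is recorded in months, that None marks a missing value, and that the four buckets are "30 years," "less than 30 years," "more than 30 years," and "missing," which is inferred from the paragraph above rather than stated outright. The name of the merged metro-level bucket is invented.

```python
def term_bucket(term_months):
    """Bucket a loan term given in months (an assumption of this sketch)."""
    if term_months is None:
        return "missing"
    if term_months < 360:
        return "less than 30 years"
    if term_months > 360:
        return "more than 30 years"
    return "30 years"


def metro_term_bucket(term_months):
    """Metro and lender models: merge the non-30-year terms, drop missing."""
    bucket = term_bucket(term_months)
    if bucket == "missing":
        return None  # record would be filtered out
    return bucket if bucket == "30 years" else "not 30 years"


print(term_bucket(180))        # less than 30 years
print(metro_term_bucket(480))  # not 30 years
```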

Credit Model

Lenders have 10 options for indicating which credit scoring model was used to assess an applicant’s creditworthiness:

Because TransUnion has two different models, and the 98 version only accounts for a small fraction of the total data, we combined the two into a single TransUnion category. The two Vantage models are not widely used either, so we folded those two models with the “other” credit model option. We grouped the “not applicable” and “exempt” options together.

This gave us five categories for credit models:

Debt-to-Income Ratio

This variable looks at the relationship between an applicant’s monthly debt compared to his or her monthly income. HMDA data lists the variable as both categorical and continuous data, meaning some of the ratios are grouped into buckets, while others have the exact ratios listed.

We standardized this variable by turning it into categorical data. There were many ways we could have categorized debt-to-income ratios. Bank of America says the best debt-to-income ratio for a mortgage applicant to have is less than 36 percent. JPMorgan Chase breaks the ratio into four distinct buckets: “healthy,” “manageable,” “nearing unmanageable,” and “struggling to manage debt.” The CFPB says a borrower should be able to get a mortgage with the most stable features and consistent terms, like interest rates, with a ratio of monthly debt to income up to 43 percent.

We created four categories based on the most consistent definitions and also relied on the CFPB for the middle categories:

Combined Loan-to-Value Ratio

For home purchases, the combined loan-to-value ratio represents all aggregated loans being taken to purchase the property in relation to the property’s appraised value. This figure is represented as a percentage.

We first included this data point as a categorical variable, split at 80 percent. But we found a stronger fit in our model when we used the raw percentages. We used this variable as a continuous one.

The ratio between the median income of a census tract and of the metro area

This column describes how rich or poor the neighborhood (census tract) where the property is located is, compared to the metro area where the census tract is located. It’s calculated by dividing the median income of the neighborhood by the median income of the metro area. The variable is expressed as a percentage.

We converted this data into categorical data, as defined by government regulators and lending requirements. These categories are used to look at lending in poor and working-class neighborhoods.
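The calculation and categorization might be sketched as follows. The cut points used here (50, 80, and 120 percent) are the income-level thresholds commonly used by federal banking regulators; treating them as the categories used in this analysis is an assumption, and the function name and dollar figures are made up.

```python
def tract_income_category(tract_median_income, metro_median_income):
    """Categorize a census tract by its median income relative to its metro area.

    Thresholds are assumed for this sketch, not taken from the analysis.
    """
    pct = 100 * tract_median_income / metro_median_income
    if pct < 50:
        return "low"
    if pct < 80:
        return "moderate"
    if pct < 120:
        return "middle"
    return "upper"


# Made-up incomes: a tract at 75 percent of its metro area's median
print(tract_income_category(60_000, 80_000))  # moderate
```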

Lender Type

We divided lenders into three primary categories: banks, credit unions, and independent mortgage companies.

These categories were based on another dataset, called HMDA panel, that comes from the Federal Housing Finance Agency. That dataset details more specifically the different types of lenders.

For lenders that didn’t fall into those three categories, we looked them up in the Federal Deposit Insurance Corporation databases and researched their websites to find out what categories they belonged to. We created a “not applicable” group for lenders we could not fit into the categories.

Lender definition categories:

We removed the lenders that were classified as “not applicable” for the metro area regression analysis because they accounted for less than half of one percent of the data.

The size of the lender

The CFPB puts out a dataset called Public Panel that details specific information about the lenders that report their applications, including the regulator of the financial institution, their location, and their assets. We would have liked to use assets as a variable for size, but asset information is missing for 20 percent of all lenders. However, the total number of applications strongly correlates with a lender’s assets, so we used the number of applications as a proxy for size. We used this variable as a continuous one.
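Using application counts as a size proxy amounts to tallying records per lender and attaching the tally to each record. A minimal sketch with made-up records and a hypothetical "lender_id" field:

```python
from collections import Counter

# Made-up records; "lender_id" is a hypothetical field name
applications = [
    {"lender_id": "A"},
    {"lender_id": "A"},
    {"lender_id": "B"},
    {"lender_id": "A"},
]

# Count applications per lender, then attach the count to every record
size_by_lender = Counter(app["lender_id"] for app in applications)
for app in applications:
    app["lender_size"] = size_by_lender[app["lender_id"]]

print(size_by_lender["A"], size_by_lender["B"])  # 3 1
```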

The Automated Underwriting System

Financial institutions input all the information they have collected about a prospective homebuyer into an algorithm, called the automated underwriting system, and that spits out a decision about whether they should deny or approve the loan. Sometimes lenders will manually override that decision.

Each federal guarantee entity has its own underwriting system. Fannie Mae uses Desktop Underwriter, Freddie Mac uses Loan Prospector, the Federal Housing Administration uses TOTAL Scorecard, and the United States Department of Agriculture uses Guaranteed Underwriting System. Individual banks and lenders may also use their own, internal, underwriting algorithms when assessing a mortgage application.

Financial institutions can run an application through multiple automated underwriting systems. HMDA data offers five options for lenders to indicate which systems they used.

Because Fannie Mae’s Desktop Underwriter was used about half of the time for all conventional mortgages, we broke up all five columns into three categories:

Non-Hispanic White demographic percentage of the property’s location

Redlining has historically been based not only on the race and ethnicity of the applicant but also the demographic composition of the neighborhood. Based on old redlining maps, the federal government and lenders viewed White neighborhoods as the most favorable ones to invest in, and the risk they associated with financing a neighborhood increased as its residents became less White.

Because of this, we included the non-Hispanic, White population of each census tract in the country. We used the 2019 American Community Survey data from the census and created four variables that took into account the decreasing White population. Those groups were split at every 25th percentage point:
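As an illustration of splitting tracts at every 25 percentage points, the sketch below computes the non-Hispanic White share of a tract and assigns one of four groups. The labels, the handling of values that fall exactly on a boundary, and the treatment of tracts with no population are all assumptions of this example.

```python
def white_share_group(white_population, total_population):
    """Group a census tract by its non-Hispanic White population share."""
    if total_population == 0:
        return None  # assumption: unpopulated tracts get no group
    share = 100 * white_population / total_population
    if share >= 75:
        return "75 to 100 percent"
    if share >= 50:
        return "50 to 75 percent"
    if share >= 25:
        return "25 to 50 percent"
    return "0 to 25 percent"


print(white_share_group(900, 1000))  # 75 to 100 percent
print(white_share_group(100, 1000))  # 0 to 25 percent
```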

The size of the metro area where the property is located

To control for larger metro areas, we downloaded 2019 American Community Survey population data. Population data is skewed to the right, with a few large outliers. We decided to break up metro areas by population using percentiles. The population range in the 90th percentile was wide, so we separated those areas in the 99th percentile from the rest. Micropolitan statistical areas tend to be smaller than their metropolitan statistical area counterparts, so we kept those in a single category.

Our categories for the metro areas are as follows:

Acknowledgements

We thank Calvin Bradford (Calvin Bradford & Associates, Ltd.), Rebecca Goldin (Sense About Science USA and George Mason University), Jennifer LaFleur (Center for Public Integrity), and José Loya (UCLA) for reviewing an earlier draft of this methodology.
