University of South Carolina
Scholar Commons
Senior Theses Honors College
Spring 2024
Barriers to Baseball: A Comprehensive Analysis of Factors That
Impact Contract Value in Major League Baseball
Ryan A. Crowl
University of South Carolina
Director of Thesis: Dr. Matthew Brown
Second Reader: Dr. Johan Rewilak
Follow this and additional works at: https://scholarcommons.sc.edu/senior_theses
Part of the Labor Relations Commons, and the Sports Management Commons
Recommended Citation
Crowl, Ryan A., "Barriers to Baseball: A Comprehensive Analysis of Factors That Impact Contract Value in
Major League Baseball" (2024). Senior Theses. 686.
https://scholarcommons.sc.edu/senior_theses/686
This Thesis is brought to you by the Honors College at Scholar Commons. It has been accepted for inclusion in
Senior Theses by an authorized administrator of Scholar Commons. For more information, please contact
BARRIERS TO BASEBALL
ABSTRACT
The landscape of Major League Baseball’s free agency is an evolving one with numerous
factors impacting negotiated contracts as player salaries escalate and teams adopt new strategies
of luring players to their organizations. This study delves into the intricate dynamics of this free
agent market, exploring the factors that influence contract values and negotiating power.
Through empirical analysis and mathematical regressions, we explore the enduring significance
of multiple factors on contract value, encompassing demographic indicators as well as on-field
performance. Subsequently, we propose our own explanations of any observed inequities or
significance and provide rational approaches for players and their agents to follow in entering
future free agencies. By shedding light on these complex phenomena, this research seeks to
equip players and their representatives with a deeper understanding of their relative value in the
marketplace, creating a method through which we can observe and understand baseball’s unique
labor market and advocating for fairer, more transparent contract negotiations in Major League
Baseball free agency.
TABLE OF CONTENTS
Introduction
Literature Review
Methodology
Results
Explanation of Findings
Conclusion
Appendix
Bibliography
INTRODUCTION
Since the death of the reserve clause and inception of the current system of free agency in
Major League Baseball (MLB) in 1974, the ability for players to negotiate for contracts based on
their perceived worth has defined the game. Immediately following the conclusion of the World
Series in November, baseball pundits shift from focusing on the play on the field to the closed
door meetings between players and teams that occur off the field, pondering where superstar free
agents will sign and for how much, how teams will find players to fill the gaps and take them to
the next level, and which teams will go on this offseason’s spending spree. At the same time, salaries
awarded to free agents are on the rise—the MLB’s average salary has risen 23% in the past two
seasons and looks to be rising ever higher with the minimum annual salary in 2024 being
increased to $740,000 (Blum, 2024; NBC Sports Staff, 2023). A 7.1% rise between 2022 and
2023 helped spur this salary increase, yet teams’ revenues and income have also ballooned far
higher to an average team value of $2.32B, resulting in player payroll constituting a mere 7% of
overall value (Blum, 2024; Ozanian & Teitelbaum, 2023; NBC Sports Staff, 2023). Questions
have arisen in recent years around the worth of players as well as the proportionality of teams’
spending, especially when viewing a severe dichotomy between the approaches of teams in free
agency. Teams like the Oakland Athletics are slashing payroll to save money while teams like
the Mets, Padres, and Dodgers consistently engage in lucrative free agent spending in the hopes
of winning an elusive World Series Championship.
As players approach this dynamic free agent market, they and their agents are constantly
trying to evaluate the market, to determine their worth to a team, to emphasize factors of value,
and to ultimately garner higher salaries in negotiations, but doing so is no easy task. There exists
no systematic and calculable method through which these players, their agents, or their teams can
cohesively quantify value. Our research seeks to empirically answer these questions surrounding
the creation of value that dominate the baseball landscape today to build a foundation for
analysis of current and future contracts negotiated in free agency. Past studies have attempted to
quantify these factors that significantly affect contract values; through our work, we will expand
upon this past research and answer questions about these factors.
First, we can approach analysis and quantification of the only consistent finding of
previous research: performance is a significant and prevailing indicator of contract value.
Intuitively, players who perform better in-game add value to their teams and should be rewarded
with a higher salary proportional to their marginal revenue product, but we will explore if this
trend persists in today’s game. Beyond this preeminent theme, however, we will also address
many other questions, first with an aspect of performance. Have the rise of social media, extensive
media coverage of the game, and increased presence of other outside factors diminished the
impact of performance in negotiations and shifted the narrative at the bargaining table to account
for off-the-field revenue generation? Does there exist a significant difference in pay between
players of different positions, and does this difference signify a league-wide view of one position
as more important? Is there a correlation between contract value and a player’s age, and are
teams thus able to manipulate deals based on where a player lies in the progression of their
career? Are players engaging in long-term deals likely to receive a higher annual salary than
those receiving short-term or one-year deals? Do certain teams around the league significantly
overpay or underpay free agents, and how and why are they able to do so? Do Scott Boras and
other large agencies’ sizeable asking prices in free agency actually lead to significantly larger
contracts for their clients, or do they adversely affect negotiations and hurt these players’
markets? In the increasingly globalized game of baseball, is there inherent discrimination
present, leading to significantly different contract values between players of different races or
from various countries?
Through our research, we seek to answer these questions and explain the existence of
their answers in the context of Major League Baseball. With our analysis, we hope to return
power to the players and their agents in negotiations in understanding relative value and being
able to negotiate for fair contracts.
LITERATURE REVIEW
1. Overview
Being one of the four major sports leagues in the United States, Major League Baseball
experiences extensive coverage in the media landscape, which comes with its own scrutiny and
subsequent conversation on social media, but it also has found itself as the subject of research
and analysis with regard to numerous aspects of the game: the labor market surrounding free
agency, teams’ actions in foreign markets, discrimination inherent in the game, and more. In a
review of the literature and research surrounding Major League Baseball, we may segment our
review into two sections that help to contextualize the selected factors we seek to explore. First,
we must establish a brief foundational understanding of the history of discrimination present in
Major League Baseball from three facets: the original discrimination, the exploitation of
minority players, and possible discrimination today. This will help to understand social trends
that impact free agency as well as factors the public cares about. Then, we will review the
literature conducted previously on the unique labor market that drives negotiations in baseball,
beginning with Gerald Scully’s initial analysis of the relationship between pay and performance
in 1974. Such research and knowledge establish a foundation through which we can
understand the objectives of analysis of recent free agency transactions and contextualize any
significant findings.
2. History of Discrimination
Baseball’s history is not complete without the story of Jackie Robinson. Without Robinson,
many superstars within today’s league would never have set foot on a Major League
Baseball field; he charted the course for the diversity and internationalization we view in the
game today. Prior to 1947, a non-White player had never set foot on a Major League Baseball
field, and instead, premier black players like Jackie Robinson and Satchel Paige were found
exclusively in the Negro Leagues. However, Branch Rickey, whose “interest in integrating
baseball began early in his career,” changed this when he signed Jackie Robinson to a major
league deal for the Brooklyn Dodgers and effectively broke baseball’s color barrier (Breaking
the color line). Since then, we have witnessed a complete integration of Major League Baseball
not only with African-American players but with those from all backgrounds including Latin
American countries, Asia, and even Australia, yet over this period of integration, we have
continually witnessed prejudice and bias in multiple facets, beyond the initial backlash of
integration. While we could cite numerous incidents of racism and prejudice, which would only
encompass the views and beliefs of individuals and teams, we approach this historical analysis
with a view toward measurable discrimination in wages, hiring practices, and free agency.
As Major League Baseball desegregated, it saw a prime opportunity to pull players in
free agency from “a pool of readily available, proven baseball talent [that] existed in the Negro
leagues. The contracts of most of these players could be purchased at a fraction of the cost that
would be necessary to obtain comparable white players” (Gwartney & Haworth, 1974). As such,
teams were able to procure and underpay black players as they entered the league, and those
“’low discriminators’… obtained a competitive advantage in major league baseball relative to
other teams” (Gwartney & Haworth, 1974). In this way, discrimination was not only promoted
by teams but was advantageous in building competitive teams; however, we subsequently
observed a complete reversal of this trend as baseball approached the 1960s and 1970s and black
players received comparatively higher returns relative to their white counterparts (Kahn, 1991).
This trend slowly faded as wages corrected to become more in-line with performance, yet in
today’s game, we see continued instances of less overt exploitation.
A journal article by Arturo Marcano and David Fidler establishes a background with
which to view the topic of Latin-born players’ contract negotiations with a forward-looking view
of the globalization we have seen dominate the game of baseball. Written in 1999 at a time when
the game was beginning to experience an influx of Latino talent, they explore a broad definition
of globalization in relation to baseball before delving into issues with contract negotiations and
interaction with Latin-American players and prospects: violations of international contract and
labor laws, exploitation of talent, a discouragement of the use of agents, and much more.
Examples include, but are not limited to, the following statements by the authors:
“None of the protections available to American baseball prospects are generally
available to Latino children and teenagers recruited by MLB teams… MLB teams
generally make little effort to afford Latino baseball prospects with some
minimum standards of treatment and take advantage of poverty and ignorance in
signing Latino baseball prospects… more Latino children are being brought into
MLB’s system without any improvement in their chances to play baseball
professionally in the United States. The number of children brought into the
academies and then released back into poverty after one or two years is
increasing… scouts and baseball academies are often less than family oriented in
the attempt to protect prospects from being pilfered by other MLB scouts and
teams… [and] the larger problem with Latino baseball prospects is not, however,
agent abuses; but, rather, the lack of agent involvement in the signing of Latino
ball players by MLB teams” (Marcano & Fidler, 1999).
Exposing the MLB’s historical precedent of mistreatment and unjust labor practices
regarding Latin-born players in this way provides context for the trends we may see when
exploring present-day treatment of Latino players and their subsequent negotiation processes.
With the continued existence of baseball academies, we may posit that these abuses continue,
and while Latin-born players perform in the league at the highest level, there are numerous
young players left behind due to these teams’ exploitative practices.
While the MLB has moved away from true segregation as well as seemingly eliminated
discriminatory practices in wages, forms of discrimination may still be present, and questions
regarding this discrimination, diversity, and more occasionally arise on social media as well as in
reporter commentary. Media coverage recently has discussed the diversity of Major League
Baseball players and managers with “the lack of African American players [continuing] to be a
glaring issue” even though the “overall diversity of MLB rosters remains strong with 40.5%
being players of color” (Lapchick, 2023). With the MLB’s history, it seems that these questions
and this scrutiny will never cease to exist, yet researchers have continually attempted to apply
statistical analysis to prove or disprove these hypotheses of a race issue in the league. We
attempt to summarize this research below before providing our own analysis to advance the
discussion.
3. Prior Research
Based on the historical precedent set in the league, discrimination is seemingly ingrained into
the culture of Major League Baseball in numerous ways. However, general sentiment largely
agrees that performance should form the basis of contract value in negotiations across sports, and
extensive research by scholars such as Lawrence Kahn and Gerald Scully has
explored this concept through analysis of baseball’s unique labor market. Kahn summarizes that
which makes this market unique and readily decomposable:
“There is no research setting other than sports where we know the name, face, and
life history of every production worker and supervisor in the industry. Total
compensation packages and performance statistics for each individual are widely
available, and we have a complete data set of worker-employer matches over the
career of each production worker and supervisor in the industry… sports salary
and performance data, and the knowledge of the race of each player in a sport,
allow us to estimate the extent of discrimination in a much more detailed way
than is possible in other industries” (2000).
Previous literature inspiring our own has effectively compiled this publicly available information
to evaluate the determinants of wage differences in baseball’s labor market as well as the
possible existence of wage discrimination among its productive workers, the players. The
research of Gerald Scully marks the advent of this research in 1974 as Major League Baseball’s
Players Association first unionized and established a system of free agency for players with
service time in excess of six years (Kahn, 2000).
Before exploring the seminal analysis of Scully, we can use the literature of Lawrence
Kahn to provide a foundation to understand and approach analysis of baseball’s labor market
with his numerous research pieces on discrimination, whether due to owner, co-worker, or
customer prejudice (1991). He compiles research from numerous scholars, including Scully, to
elaborate on the existence of such discrimination, finding that “racial performance differentials
were less noticeable in baseball, as Hill and Spellman (1984) found that blacks outperformed
whites in four of six positions… [but] there is evidence that blacks are underrepresented at
pitcher, catcher, and infield positions…” (Kahn, 1991). This leads to questions surrounding this
underrepresentation, as these are positions of critical importance and decision-making; it could
additionally imply that teams are reluctant to put black players in positions of significance.
However, while evidence exists of this discrimination in positional assignments, Kahn finds that
overall, there exists little racial discrimination against minorities across baseball with no
significant evidence of wage discrimination (2000). With this foundational understanding of
wage discrimination and the apparent lack thereof, we can move to research that identifies
factors that are significant determinants in contract negotiations.
While Kahn represents one of the most prominent researchers of baseball’s labor market,
research into this space began with the formative research of the aforementioned Scully who
analyzes the relationship between Pay and Performance in Major League Baseball, testing a
hypothesis that a “perfectly competitive” labor market would result in “player salaries… equated
with player marginal revenue products (MRP)” (1974). In labor economic theory, MRP identifies
the additional revenue generated by a one unit increase in labor input. His model first determines
MRP for a player by crafting a linear equation including team winning percentage as well as
other market characteristics. Subsequently, he designates four factors as being significant in
baseball salaries: hitting/pitching performance, the weight of contribution to team performance,
number of years spent in the major leagues, and a player’s star status. The hitting and pitching
metrics utilized were slugging percentage for hitters as well as strikeout-to-walk ratio and
innings pitched for pitchers. Each of these measures could be considered independent of overall
team performance but also indicative of a player’s contribution to team play, and the inclusion of
a factor for a player’s heightened status was implemented to control for players that may receive
disproportionately higher salaries than what they would conceptually deserve based on their
performance. After conducting analysis based on these factors as well as outside factors such as
market size, fan intensity, and residence in the National League, Scully finds that the main
determinants of strikeout-to-walk ratio, service time, innings pitched, slugging percentage, and
lifetime at-bats all display significance with the multiple outside factors proving insignificant
(1974). The implication of these findings is the establishment of our foundational approach to
today’s labor market in which we assume that performance is a leading indicator of contract
value and forms the basis of a player’s worth in the labor market; future research in the space
generally branches from this research, taking its findings on the relation between salary and
performance metrics as fact.
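For readers less familiar with the notation of labor economics, the definition of MRP and the competitive-market hypothesis described above can be written compactly. The symbols used here (R for team revenue, L for labor input, and w_i for the salary of player i) are illustrative notation assumed for this sketch and are not taken from Scully's paper.

```latex
\[
  \mathrm{MRP} = \frac{\Delta R}{\Delta L},
  \qquad
  \text{competitive hypothesis: } w_i = \mathrm{MRP}_i .
\]
```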
The next major piece influencing our research is that of Walters, von Allmen, and
Krautmann who evaluate the variance in contract value received by players on long-term
contracts and “whether it is relatively more risk-averse employees (players) who insure
themselves against loss of income by accepting a discounted long-term agreement, or relatively
more risk-averse employers (teams) who insure themselves against the loss of player services by
offering wage premiums” (2017). They choose the latter argument and hypothesize that long-
term agreements result in a premium awarded to players as a risk aversion strategy meant to
“protect [teams] against market volatility,” or the unpredictable yearly inflation of baseball’s
labor market, and “potential inability to replace a key player on the open market” (Walters et al.,
2017). In reviewing data collected from contracts signed in the free agency periods of 2012 to
2014, their models reveal two key findings: First, the free agency market reveals lower long-term
contract premiums awarded to younger players engaging in these long-term deals, as their lack of
experience demands less of a premium. Remaining consistent with Scully’s research, their
second finding confirms the continued significance of performance and on-field production in
generating higher salaries; in this case, marginal revenue product is
measured by Wins Above Replacement (WAR), which we will also employ in our research, and
their research even finds that “larger-market teams pay more for each win produced…” (Walters
et al., 2017). Their most crucial conclusion, however, is the confirmation of their hypothesis that
teams generally pay a premium on long-term deals in an apparent risk aversion strategy intended
to:
“…secure the continuing employment of players that they evidently regard as
keys to their success… We suspect that such payments, which are comparable to
those seen in external markets for insurance against disabling injury to players,
are seen by teams as reasonable internal insurance premiums that protect them
against volatility in product and labor markets, risks associated with arbitration-
determined salaries, and inability to replace a player by hiring on the open
market” (Walters et al., 2017).
Based on the empirical evidence provided by Walters, von Allmen, and Krautmann in a more
recent period, we may enter our collection and analysis with a reasonable expectation of
continued premiums on contracts and a significant impact of performance on salary.
Beyond performance and contract length, Matthew Palmer and Randall King address the
role of racial/ethnic discrimination in wages and modify the research of previous writers to
explore wage discrimination in different classes of salaries, dividing players’ salaries in the 2001
season into classes of high, middle, and low (2006). In this extended analysis, they find that
wage discrimination is not widespread in Major League Baseball but is present in the lowest salary
group of players making $2M or less, claiming that “baseball fans are more interested in having
a winning team than they are in having a white team” (Palmer & King, 2006). In previous
research, this is the only finding of significance surrounding wage discrimination based on race.
However, one other research piece finds that racial discrimination may be present on the
field, impacting the player performance metrics found to be significant in all the aforementioned
research. Looking at racial discrimination from the perspective of the relationship between
umpires and pitchers, the authors of this piece explore how discrimination in this relationship
could affect pitching outcomes and thus affect the productivity measures that play into wage
negotiations. They find that “pitchers who match race/ethnicity of the home-plate umpire appear
to receive slightly favorable treatment, as indicated by a higher probability that a pitch is called a
strike…” (Parsons et al., 2011). Interestingly, we see the likelihood of this discrimination
decrease in instances where umpires are under more scrutiny: times when more fans occupy the
stadium, pitches that are “terminal” and would end the at bat, or in stadiums where umpire
evaluation software is in place. More important than the findings of mid-game discrimination is
how this favoritism may affect performance and thus influence performance metrics. “Indirect
effects on players’ strategies may, however, have larger impacts on the outcomes of plate
appearances and games,” meaning that the pitcher or batter who is aware of this preferential
treatment may alter their strategy or approach to an at-bat (Parsons et al., 2011). These
alterations may have numerous outcomes to affect performance, either positively or negatively.
For example, a pitcher with a racial match to the umpire may throw more pitches close to the
strike zone which consequently could increase their strikes and strikeouts (a positive outcome)
but could also leave more pitches for batters to put into play (a negative outcome). While we see
in multiple studies that performance acts as a leading factor in determining salary for players,
this additional research implies that we may want to be wary when approaching analysis
regarding these performance measures, considering discriminatory practices could affect on-field
performance and, subsequently, these metrics. We believe in the empirical evidence that such
discrimination exists, yet we contend that it is not widespread enough to significantly skew these
performance measures and, in turn, impact the significance (or lack of it) in our data analysis.
METHODOLOGY
Between the free agency periods of 2017 and 2023, Major League Baseball and the 30
teams therein signed 425 free agents to total contracts of $5,000,000 USD or more. We selected
this period as it encompasses the years from the signing of the most recent collective bargaining
agreement (CBA) between the Major League Baseball Players Association (MLBPA) and the league to the
completion of the most recent free agency period as of the time of data collection, controlling for
any change in team attitudes toward negotiations that may have arisen as a result of CBA terms.
Additionally, the 425 free agents captured here include only those who officially entered free
agency and thus do not include the contracts of drafted players, minor league contracts,
contract extensions, or executed options.
Data on these 425 free agents was sourced from Cot’s Baseball Contracts, a repository of
contract information produced and distributed by Baseball Prospectus, and numerous qualitative
and quantitative variables were selected for analysis as factors that could act as determinants of
negotiated contract value. The following variables were selected and included from the data
available on Cot’s Baseball Contracts (Baseball Prospectus, 2024):
Total Contract Value: Negotiated pay for a free agent’s contract over the term of the
contract, not including incentives or options included in the deal. Additionally used to
define parameters for data collection in which only players with a Total Contract Value
of over $5,000,000 USD are included.
Average Annual Value: Total contract value divided by the number of years of the
contract. It is important to note that not all contracts pay a consistent amount from year to
year, but Average Annual Value is commonly used to explain contract value in sports
media.
Team: Team the player signs with.
Position: Primary position of the player.
Age: Age of the player at the time of contract’s signing.
Years: Number of years of the contract’s term, not including any player or team options.
Initial Year: First year of the contract’s term.
Player Agent: Agent or agency representing the player in negotiations.
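As a minimal sketch of how Total Contract Value, Years, and Average Annual Value relate, and of how the $5,000,000 inclusion threshold could be applied, consider the following Python example. The player names, dollar figures, and function names are hypothetical and are not records from Cot's Baseball Contracts; the threshold is treated as "$5,000,000 or more," which is an assumption about how the boundary case was handled.

```python
from dataclasses import dataclass

# Inclusion threshold in USD. Assumption: contracts of exactly $5,000,000 qualify.
MIN_TOTAL_VALUE = 5_000_000


@dataclass
class Contract:
    player: str
    total_value: float  # USD, excluding incentives and options
    years: int          # guaranteed years, excluding player or team options


def average_annual_value(contract: Contract) -> float:
    """Total Contract Value divided by the number of contract years."""
    return contract.total_value / contract.years


def meets_threshold(contract: Contract) -> bool:
    """True if the contract is large enough to enter the sample."""
    return contract.total_value >= MIN_TOTAL_VALUE


# Hypothetical contracts, for illustration only.
contracts = [
    Contract("Player A", 90_000_000, 5),
    Contract("Player B", 8_000_000, 1),
    Contract("Player C", 3_500_000, 1),  # below the threshold, so excluded
]

sample = [c for c in contracts if meets_threshold(c)]
aav_by_player = {c.player: average_annual_value(c) for c in sample}
```

Because Average Annual Value is a simple quotient, it does not reflect backloaded or frontloaded payment schedules, which is the caveat noted in the variable definition above.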
To supplement the basic information provided by Cot’s Baseball Contracts, we
additionally include variables that identify players based on their ethnicity and country of origin.
In order to maintain consistency with prior research that finds performance as a significant
indicator of wage, measures of Wins Above Replacement (WAR) are also included to introduce
a modern performance metric into the models; we address WAR in both the year preceding
contract negotiations as well as throughout the career of the player prior to signing the new
contract (Scully, 1974). Wins Above Replacement is a recently developed model that may vary
between sources but “want[s] to know how much better a player is than a player that [is] typically
available to replace that player… comparing the player to average in a variety of venues…”
(Baseball-Reference.com WAR Explained, n.d.). While statistical measures of WAR may vary,
two main producers of WAR metrics exist which are generally accepted in the baseball
community: Baseball Reference and FanGraphs. The information built into the upcoming variables of
WAR, Country of Origin, and Ethnicity is all sourced from Baseball Reference, an online
database of baseball statistics and player information (Baseball Reference, 2024). The variables
used are as follows:
Country of Origin: Birth country of the player.
Ethnicity: Ethnic background of the player, categorized into White, Latino, African-
American, Asian/Pacific Islander, and Mixed.
Prior Year WAR: Statistical measure of WAR as developed by Baseball Reference for the
season preceding free agency (colloquially referred to in the baseball community as
bWAR).
Career WAR: Total WAR according to Baseball Reference of the player throughout their
career prior to contract signing.
Appendix A provides an example of the structure of the data analyzed, reporting the
aforementioned variables for each player and contract signed in the defined period. Such tables
were compiled for all thirty teams individually, for each of the six MLB divisions (AL East, NL
East, AL Central, NL Central, AL West, NL West), and for the MLB in totality.
Appendixes B and C provide summary statistics for the data collected, displaying these
statistics for both Total Contract Value and Average Annual Value. For Total Contract Value, we
see the largest contract in the designated period coming from Aaron Judge with a value of
$400M, but overall, data shows an average contract value of about $33.5M, a median value of
$15M, and a standard deviation of $52.9M. For Average Annual Value, Judge similarly claims
the highest AAV on a contract with about $44.4M per year. In the period, the average AAV on a
contract was about $11.2M with a median of $8.7M and a standard deviation of $7.2M.
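The summary statistics reported above can be reproduced with a few lines of Python. The sketch below uses a small set of hypothetical contract values rather than the thesis data, and it assumes the sample (n - 1) standard deviation, since the text does not state which form was used.

```python
import statistics

# Hypothetical Total Contract Values in millions of USD (not the thesis data).
total_values_m = [5.0, 8.0, 15.0, 40.0, 100.0]

mean_value = statistics.mean(total_values_m)
median_value = statistics.median(total_values_m)
# Assumption: sample standard deviation (divides by n - 1).
std_dev = statistics.stdev(total_values_m)

# A mean well above the median signals a right-skewed distribution,
# where a few very large contracts pull the average upward.
is_right_skewed = mean_value > median_value
```

The same pattern appears in the collected data, where the average Total Contract Value of about $33.5M sits well above the $15M median.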
Each of the variables present was then categorized as continuous or qualitative, with
Total Contract Value, Average Annual Value, Age, Initial Year, Prior Year WAR, and Career
WAR deemed continuous variables and Team, Position, Player Agent, Country of Origin, and
Ethnicity categorized as qualitative variables. Continuous variables could be left in the
data table unaltered, but qualitative variables were altered to render them useful in creating
regression models and determining significance of the variables. As such, the data analysis here
employs the use of dummy variables to transform the qualitative variables into binary variables
that correspond with a certain qualitative value.
For the Team variable, each division’s data tables contained five columns to represent
each team; a 1 in that column signified that the player of that row had signed with that team
while a 0 indicated that the player of that row had signed with another team in the division. This
allows the model to identify significant differences between teams in divisions who generally
exist in the same geographical vicinity of the United States (and Canada for the AL East).
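The five-column team encoding described above can be sketched as follows. The team abbreviations and the function name are illustrative assumptions; the thesis does not specify how the columns were labeled.

```python
# AL East clubs, using illustrative abbreviations as column labels.
AL_EAST = ["BAL", "BOS", "NYY", "TBR", "TOR"]


def team_dummies(signing_team: str, division_teams: list[str]) -> dict[str, int]:
    """Return one 0/1 column per team in the division for a single player row."""
    if signing_team not in division_teams:
        raise ValueError(f"{signing_team} is not in this division")
    # Exactly one column is 1: the team the player signed with.
    return {team: int(team == signing_team) for team in division_teams}


# A hypothetical free agent who signed with Toronto.
row = team_dummies("TOR", AL_EAST)
```

The text does not state whether one team column was omitted as a reference category. In a least-squares regression that includes an intercept, one column is usually dropped, because the five columns always sum to 1 and would otherwise be perfectly collinear with the intercept.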
The binary effect imposed on the Position variable separates players into pitchers and
position players with a 1 identifying the player as a position player (1B, 2B, 3B, SS, OF, or C)
and a 0 being a pitcher (SP or RP). This distinction was made to identify if there existed a
discrepancy between contract value for position players and pitchers, leaving open the potential for future
in-depth analysis of statistical differences between individual positions.
For the Player Agent variable, players are categorized based on whether they are
represented by a top six sports agency or by another party. Brett Knight at Forbes identifies
CAA, Wasserman, WME Sports, Excel Sports Management, Octagon, and Boras Corporation as
the top six most valuable sports agencies (Knight, 2022). Players represented by these agencies
are denoted with a 1, and those not represented by one of the six are given a 0. The basis of this
distinction was to identify if larger sports agencies make a significant difference in bargaining
for higher contracts for their clients as compared to other agents.
The Country of Origin variable seeks to explore statistical differences that may arise in
the value of contracts between players born inside or outside of the United States. In Major
League Baseball, the public generally thinks of non-US born players originating from Latin
American countries such as the Dominican Republic and Cuba, but in the period of 2017 to
2023, we see players coming from thirteen countries outside the United States, including the
aforementioned countries as well as others such as the Netherlands, Australia, and Japan. In this
category, findings of statistical significance, or implied discrimination, would signify the need
for further analysis of difference between these individual countries, but in our initial analysis,
players born in the United States are represented by a 0 while those born outside are denoted
with a 1.
Similarly, the Ethnicity variable seeks to identify potential discrimination in contract
negotiations based on ethnic background. For the purposes of our analysis, players are separated
into White and Non-White in order to identify if there exists discrimination between these
categories of players. White players are denoted with a 0 while those of alternate ethnic
backgrounds are denoted with a 1.
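The coding scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`position`, `agency`, `birth_country`, `ethnicity`), the country string "USA", and the sample record are hypothetical and are not drawn from the study's data table. The agency list is the one cited from Knight (2022).

```python
# Top six agencies as listed in the text (Knight, 2022).
TOP_SIX_AGENCIES = {
    "CAA", "Wasserman", "WME Sports",
    "Excel Sports Management", "Octagon", "Boras Corporation",
}
PITCHER_POSITIONS = {"SP", "RP"}


def encode_player(record):
    """Map one player's qualitative fields to the 0/1 dummies described in the text.

    `record` is a hypothetical dict; its key names are assumptions for illustration.
    """
    return {
        # 1 = position player (1B, 2B, 3B, SS, OF, C); 0 = pitcher (SP or RP)
        "position": 0 if record["position"] in PITCHER_POSITIONS else 1,
        # 1 = represented by one of the top six agencies; 0 = any other party
        "player_agent": 1 if record["agency"] in TOP_SIX_AGENCIES else 0,
        # 0 = born in the United States; 1 = born elsewhere
        "country_of_origin": 0 if record["birth_country"] == "USA" else 1,
        # 0 = White; 1 = Non-White
        "ethnicity": 0 if record["ethnicity"] == "White" else 1,
    }


# Hypothetical record, not a player from the dataset
example = {"position": "SP", "agency": "Octagon",
           "birth_country": "Japan", "ethnicity": "Non-White"}
print(encode_player(example))
```

The Team variable would follow the same pattern, with one 0/1 column per team in the division rather than a single column.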
1. General Models and Expectations
After the collection of data and identification of variables, we analyzed the data using
multiple regression analysis to build a variety of models around the factors that may influence
contract negotiations and subsequent contract values. Players were compiled into seven datasets:
six based on division (AL East, NL East, AL Central, NL Central, AL West, NL West) and one
comprising all 425 free agency signings in the designated term. Each compiled set of data was
subsequently divided into tables formatted for analysis of the designated variables against
measures of contract value; as such, statistical significance in the resulting regression statistics
would imply that said variable carries weight in determining contract value in free agency
negotiations. Within each segmented dataset, two separate tables were created to compare the
designated variables against Total Contract Value and Average Annual Value. In this way, the
fourteen possible regressions created would produce equations that could be used to model the
value of contracts signed in the free agency period in terms of both total contract value as well as
average annual value.
Using the master database of all free agents as an example, regressions against Total
Contract Value used the following variables: Division (separated as AL East, NL East, AL
Central, NL Central, AL West, NL West), Position, Age, Initial Year, Player Agent, Country of
Origin, Ethnicity, Prior Year WAR, and Career WAR. In the regressions against Total Contract
Value, the Years variable was excluded from the analysis, as the number of years on a
contract would mechanically increase the total contract value. Equation 1 below presents the
resulting equation that models Total Contract Value of all free agency contracts from 2017 to
2023 as a function of the aforementioned variables:
(1) $Y_{TCV} = \alpha + \beta_1(X_{ALE}) + \beta_2(X_{NLE}) + \beta_3(X_{ALC}) + \beta_4(X_{NLC}) + \beta_5(X_{ALW}) + \beta_6(X_{NLW}) + \beta_7(X_{P}) + \beta_8(X_{A}) + \beta_9(X_{IY}) + \beta_{10}(X_{PA}) + \beta_{11}(X_{CoO}) + \beta_{12}(X_{E}) + \beta_{13}(X_{PWAR}) + \beta_{14}(X_{CWAR})$
The second segmented dataset sought to compare the designated variables against
Average Annual Value. Analyzing Average Annual Value as a function of the designated
variables standardizes player salaries, controlling for the increased contract value that comes
with long-term deals and instead viewing contract value in the context of seasonal value. Our
analysis through this lens aligns strongly with the previous research conducted by Walters, von
Allman, and Krautmann which finds that contract value may significantly increase in long-term
deals, reflecting a risk management strategy by teams seeking to protect against rising market
costs and inflation (2017). The variables used in this instance are Years, Division, Position, Age,
Initial Year, Player Agent, Country of Origin, Ethnicity, Prior Year WAR, and Career WAR.
Using the master database as an example, Equation 2 yielded by this model is as follows:
(2) $Y_{AAV} = \alpha + \beta_1(X_{Y}) + \beta_2(X_{ALE}) + \beta_3(X_{NLE}) + \beta_4(X_{ALC}) + \beta_5(X_{NLC}) + \beta_6(X_{ALW}) + \beta_7(X_{NLW}) + \beta_8(X_{P}) + \beta_9(X_{A}) + \beta_{10}(X_{IY}) + \beta_{11}(X_{PA}) + \beta_{12}(X_{CoO}) + \beta_{13}(X_{E}) + \beta_{14}(X_{PWAR}) + \beta_{15}(X_{CWAR})$
Lastly, a regression analysis was conducted to analyze differences that may exist between
contracts awarded by team. Within baseball, teams are often categorized by market size, and an
analysis of individual team contracts would help identify and spur subsequent research on the
correlation that may exist between market size and contract value. Seemingly, in the most recent
years, the largest markets housing teams such as the Los Angeles Dodgers, New York Yankees,
and Philadelphia Phillies have witnessed some of the largest free agency “superstar” signings
while smaller markets like Milwaukee or Kansas City typically acquire free agents on smaller
contracts (Ozanian & Teitelbaum, 2023). Equations 3 and 4 below describe this analysis with a
variable included for each of the thirty teams in the MLB:
(3) $Y_{TCV} = \alpha + \beta_1(X_{Y}) + \beta_2(X_{ARI}) + \beta_3(X_{ATL}) + \beta_4(X_{BAL}) + \cdots + \beta_{30}(X_{WAS})$
(4) $Y_{AAV} = \alpha + \beta_1(X_{Y}) + \beta_2(X_{ARI}) + \beta_3(X_{ATL}) + \beta_4(X_{BAL}) + \cdots + \beta_{30}(X_{WAS})$
Through Equations 1 through 4, we may identify the high impact factors of free agency
negotiations in recent years and may clarify conversations surrounding these recent free agency
cycles both in sports publications as well as on social media platforms. However, our expectation
is that not all of these variables will prove significant following a regression analysis. While
numerous conversations in the media have focused on the diversity of Major League Baseball,
we see no guarantee that either Country of Origin or Ethnicity will prove significant within a
macro-level analysis; however, further research after segmenting players into classes of contract value
may reveal continued wage discrimination targeted at non-White players in smaller contracts as
discovered previously by Palmer and King (2006). Similarly, we enter analysis with no
reasonable expectation of significance of the Player Agent, Position, Division, and Team
variables. However, we believe the possibility exists that the Player Agent variable positively
correlates with contract value, as major agencies like the Boras Corporation have the resources
available to bargain for higher wages and often sign premier players that demand higher contract
values.
Those that we do expect significance from include Age, Initial Year, Years, Prior Year
WAR, and Career WAR. For Age, we expect that contract values will decline as players increase
in age. Not only should Total Contract Value decline as teams sign older players for fewer years,
but both measures of contract value should also decline as teams associate older age with
lower productivity, higher risk of injury, and a higher probability of retirement during the period
of the contract. Initial Year is viewed as a variable that will adjust for inflation and yearly growth
in the free agent market as team revenues increase. When compared to Average Annual Value,
we believe that Years will prove significant due to two main factors: larger overall contracts and
risk management strategies. On the player and agent side of the negotiation, long-term contracts
are generally provided to premier players that demand a higher AAV, and on the team side,
empirical evidence exists to suggest teams pay higher wages on longer-term deals as a risk
management strategy meant to avoid the need for further free agency negotiations in subsequent
years (Walters et al., 2017). Lastly, we believe the variables of Prior Year WAR and Career
WAR will continue to prove significant as performance measures impacting contract value,
reaching the same conclusion about the link between performance and pay that Scully (1974)
reached almost fifty years earlier.
2. Interdivision Subsample Analysis
Within Major League Baseball exist six divisions that segment teams into leagues and
regional categories of East, Central, and West. As a part of our analysis, we also conduct an
interdivision subsample analysis in which we substitute individual teams within a division for the
previously used divisions. Doing so helps control for variation among regions, as more value is
concentrated among coastal teams that lie in the East and West divisions as compared to teams
from the Central divisions like the Kansas City Royals, Cincinnati Reds, and Milwaukee
Brewers who lie in the bottom quartile of market size (Ozanian & Teitelbaum, 2017).
Additionally, it allows us to identify differences in free agency approaches among teams within
divisions that are in direct competition with each other for a guaranteed playoff spot throughout
the season. Using the American League East as an example, Equations 5 and 6 below show the
model of this analysis:
(5) $Y_{TCV} = \alpha + \beta_1(X_{NYY}) + \beta_2(X_{TOR}) + \beta_3(X_{BOS}) + \beta_4(X_{TBR}) + \beta_5(X_{BAL}) + \beta_6(X_{P}) + \beta_7(X_{A}) + \beta_8(X_{IY}) + \beta_9(X_{PA}) + \beta_{10}(X_{CoO}) + \beta_{11}(X_{E}) + \beta_{12}(X_{PWAR}) + \beta_{13}(X_{CWAR})$
(6) $Y_{AAV} = \alpha + \beta_1(X_{Y}) + \beta_2(X_{NYY}) + \beta_3(X_{TOR}) + \beta_4(X_{BOS}) + \beta_5(X_{TBR}) + \beta_6(X_{BAL}) + \beta_7(X_{P}) + \beta_8(X_{A}) + \beta_9(X_{IY}) + \beta_{10}(X_{PA}) + \beta_{11}(X_{CoO}) + \beta_{12}(X_{E}) + \beta_{13}(X_{PWAR}) + \beta_{14}(X_{CWAR})$
3. Contextualizing Models for Individual Variables
We find it necessary as well to evaluate variables individually to analyze their impact on
contract value. While larger models may more comprehensively describe the variation in values
present during these years of free agency, those based on single variables may provide further
insight into the reasoning behind any found variation.
Traditional economic theory in labor markets posits that wages do not increase or decrease
linearly with age; rather, there may exist an age at which earnings are maximized or
minimized. In the scope of Major League Baseball, this theory corresponds to
the idea of a player’s “prime,” in which they are thought to be operating at their
maximum potential. Thus, we may view the free agency market within the MLB as a traditional
labor market in which salaries may be maximized when contracts are signed at a certain age:
age 26, according to FanGraphs’ age curve that uses ΔwRC+ to model production
increase/decline (Gutwein, 2021). As such, we introduce the Age² variable into our discussion to
analyze whether such a relationship exists in this labor market; a significant finding on this
variable would indicate that age relates to the dependent contract value quadratically rather
than linearly. Equations 7 and 8 below model this relationship and analysis:
(7) $Y_{AAV} = \alpha + \beta_1(X_{A}) + \beta_2(X_{A}^{2})$
(8) $Y_{TCV} = \alpha + \beta_1(X_{A}) + \beta_2(X_{A}^{2})$
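For a quadratic specification such as Equations 7 and 8, setting the derivative with respect to age to zero gives a turning point at age = −β₁ / (2β₂), which is a maximum when β₂ is negative. The short sketch below illustrates that calculation. The coefficient values are hypothetical, chosen only so that the peak lands near the age of 26 cited above; they are not estimates from this study.

```python
def turning_point_age(beta_age, beta_age_sq):
    """Age at which alpha + b1*age + b2*age**2 reaches its turning point."""
    if beta_age_sq == 0:
        raise ValueError("no turning point: the Age^2 coefficient is zero")
    # d/d(age) of (b1*age + b2*age**2) = b1 + 2*b2*age = 0
    return -beta_age / (2 * beta_age_sq)


def is_maximum(beta_age_sq):
    """A negative Age^2 coefficient means the turning point is a peak."""
    return beta_age_sq < 0


# Hypothetical coefficients (in $M per year of age), for illustration only
b1, b2 = 5.2, -0.1
print(turning_point_age(b1, b2), is_maximum(b2))
```

A positive β₂ would instead describe a trough, which is why both the sign and the significance of the Age² term matter when interpreting these models.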
While we noted previously that the Initial Year variable will likely prove significant as a
variable that adjusts our models for inflation and year-over-year growth in contract values, there
exists the possibility that this variable may not act linearly. From year to year, events such as
record free agent signings or the creation of a new regional television broadcast deal may cause
one year’s free agency to experience a larger change in value than another’s.
As such, another model in this analysis seeks to identify the specific variation in contract
values by year. To do so, we split the Initial Year variable into each year specifically, as shown
in Equations 9 and 10 below:
(9) $Y_{TCV} = \alpha + \beta_1(X_{2017}) + \beta_2(X_{2018}) + \beta_3(X_{2019}) + \beta_4(X_{2020}) + \beta_5(X_{2021}) + \beta_6(X_{2022}) + \beta_7(X_{2023})$
(10) $Y_{AAV} = \alpha + \beta_1(X_{2017}) + \beta_2(X_{2018}) + \beta_3(X_{2019}) + \beta_4(X_{2020}) + \beta_5(X_{2021}) + \beta_6(X_{2022}) + \beta_7(X_{2023})$
Next, we seek to identify how strongly pure performance measures may affect contract
negotiations and contract values and, in doing so, also seek to identify if recency bias in
performance, measured in the Prior Year WAR variable, significantly impacts value as compared
to lifetime performance measured in Career WAR. Equations 11 and 12 below demonstrate and
model this relationship between performance and contract values:
(11) $Y_{TCV} = \alpha + \beta_1(X_{PWAR}) + \beta_2(X_{CWAR})$
(12) $Y_{AAV} = \alpha + \beta_1(X_{PWAR}) + \beta_2(X_{CWAR})$
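To make the form of Equations 11 and 12 concrete, the sketch below fits an intercept and two slopes by ordinary least squares using only the Python standard library. It is a minimal illustration, not the software used in this study: the helper name `ols` and the six data rows are hypothetical, and the contract values are generated without noise from known coefficients so that the fit can be checked by eye.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations.

    X: list of rows of predictors (no intercept column); y: list of outcomes.
    Returns [alpha, beta_1, ..., beta_k].
    """
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    k = len(rows[0])
    # Build (A^T A) and (A^T y)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Solve with Gaussian elimination and partial pivoting
    m = [ata[i] + [aty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (m[i][k] - tail) / m[i][i]
    return coef


# Hypothetical (Prior Year WAR, Career WAR) pairs, not taken from the dataset
performance = [(1.0, 10.0), (2.0, 5.0), (3.0, 20.0),
               (4.0, 8.0), (0.5, 30.0), (5.0, 12.0)]
# Synthetic contract values in $M: alpha = 2, beta_1 = 3, beta_2 = 0.5
tcv = [2.0 + 3.0 * p + 0.5 * c for p, c in performance]

alpha, b_pwar, b_cwar = ols(performance, tcv)
print(round(alpha, 3), round(b_pwar, 3), round(b_cwar, 3))
```

In this setup a larger β₁ relative to β₂ would correspond to the recency bias discussed above, since it would mean that one additional win in the prior season moves the contract more than one additional career win.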
In contrast to a model that proposes performance as the sole metric that influences and causes
variation in contract value among free agents, we also explore the relationship between purely
qualitative factors and contract value, including only the variables of Age, Position, Player
Agent, Ethnicity, and Country of Origin. These are seen as the only variables which tie solely to
the player and eliminate variation that may stem from performance or contract-specific factors.
Equations 13 and 14 below show this model:
(13) $Y_{TCV} = \alpha + \beta_1(X_{A}) + \beta_2(X_{P}) + \beta_3(X_{PA}) + \beta_4(X_{CoO}) + \beta_5(X_{E})$
(14) $Y_{AAV} = \alpha + \beta_1(X_{A}) + \beta_2(X_{P}) + \beta_3(X_{PA}) + \beta_4(X_{CoO}) + \beta_5(X_{E})$
Lastly, as a part of our analysis of demographic factors’ influence, we separate the Ethnicity
and Country of Origin variables to examine their influence on contract values. As elaborated
previously in our review of the existing literature and conversations surrounding baseball, racial
discrimination continues to be embedded in the culture of Major League Baseball whether
through in-game decisions or exploitation in the academies of Latin American countries. As
such, we seek to identify any significance that may arise between White and Non-White players
as well as those born inside the United States versus those born outside, and in doing so, we may
identify if racial bias acts as a significant determining factor in contract negotiations. Equations
15 and 16 below model this relationship:
(15) $Y_{TCV} = \alpha + \beta_1(X_{CoO}) + \beta_2(X_{E})$
(16) $Y_{AAV} = \alpha + \beta_1(X_{CoO}) + \beta_2(X_{E})$
4. Approach to Analysis
After conducting linear regressions based on each of the equations elaborated above, we will
subsequently analyze the statistics generated with a specific focus on the p-value of each
independent variable, the adjusted R-squared for the model, and the generated coefficient. While
all statistics will be taken into account and contextualized, only those variables with a p-value of
less than 0.05 will be accepted as statistically significant, and the generated variable and its
coefficient will be inserted into the resulting model. Additionally, the adjusted R-squared of the
entire model will be considered in understanding how the resulting equation accounts for overall
variability in the dependent variables of either Total Contract Value or Average Annual Value.
An adjusted R-squared over 0.5 is generally accepted in research, but we will discretionarily
accept models as reliable when both the R-squared is high and multiple independent variables are
significant. Since the contextualizing models for individual variables break our broader variable
categories into their specific subsets, we can take the adjusted R-squared values for these models to
indicate which individual values are significant within those variables and how much variation
they account for when fed back into the overall model. After analysis, we will have a
comprehensive summary of tables to show the results of these regressions and the equations that
can be used to model free agency contracts in the period of 2017 to 2023 as well as model them
moving forward.
RESULTS
Main Models:
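Table 1: Divisionally-Stratified MLB Regression for Total Contract Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 6.16E+17 | 14 | 4.40E+16 |
| Residual | 5.74E+17 | 410 | 1.40E+15 |
| Total | 1.19E+18 | 424 | 2.81E+15 |

| Statistic | Value |
|---|---|
| Number of Observations | 425 |
| F(14,410) | 31.43 |
| Probability > F | 0.000 |
| R-Squared | 0.5177 |
| Adj. R-Squared | 0.5012 |
| Root MSE | 3.70E+07 |

| Total Contract Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| AL East | | | | | -0.417 |
| AL Central | | | | | -0.537 |
| AL West | | | | | -0.457 |
| NL East | | | | | -0.495 |
| NL Central | | | | | -0.424 |
| NL West | | | | | -0.012 |
| Position | | | | | -0.008 |
| Age | | | | | -0.342 |
| Initial Year | | | | | 0.097 |
| Player Agent | | | | | 0.007 |
| Country of Origin | | | | | -0.057 |
| Ethnicity | | | | | -0.032 |
| Prior Year WAR | | | | | 0.519 |
| Career WAR | | | | | 0.201 |
| Constant | | | | | - |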
In a multiple linear regression of all contracts including all divisions and selected factors,
we see statistical significance for all divisions except the National League West as well as
significance in Age, Initial Year, Prior Year WAR, and Career WAR at the p < 0.05 level. The
Position and Age variables are both assigned negative coefficients to show that an increase in
Age would reduce Total Contract Value, and a designation of the binary 1, signifying a position
player, would similarly decrease Total Contract Value; all other significant variables outside of
the divisions have a positive coefficient attached. Notably, all of the statistically significant
divisions have a negative coefficient attached, and with the NL West being the only division
without a significant coefficient, this suggests that the NL West pays free agents more than the
rest of the league. Additionally, our adjusted R-squared indicates that this resulting model accounts for
50.12% of the variation in Total Contract Value in the designated period, and though only
slightly above 50%, the model includes multiple significant explanatory variables. As such, we
accept this regression as a reliable foundational basis that may be used to model free agency
contracts in recent years.
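The summary statistics reported in Table 1 can be cross-checked with two standard formulas: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), and F = (model mean square)/(residual mean square). The sketch below applies them to the values reported in Table 1 (n = 425 observations, k = 14 predictors). The function names are our own and are used only for illustration.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared penalizes R-squared for the number of predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


def f_statistic(mean_square_model, mean_square_residual):
    """Overall F statistic as the ratio of model to residual mean squares."""
    return mean_square_model / mean_square_residual


# Values as reported in Table 1
adj = adjusted_r_squared(0.5177, n_obs=425, n_predictors=14)
f = f_statistic(4.40e16, 1.40e15)
print(round(adj, 4), round(f, 2))
```

Both results agree, to rounding, with the adjusted R-squared of 0.5012 and the F statistic of 31.43 reported in the table.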
Table 2: Divisionally-Stratified MLB Regression for Average Annual Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 1.46E+16 | 15 | 9.74E+14 |
| Residual | 7.20E+15 | 409 | 1.76E+13 |
| Total | 2.18E+16 | 424 | 5.14E+13 |

| Statistic | Value |
|---|---|
| Number of Observations | 425 |
| F(15,409) | 31.43 |
| Probability > F | 0 |
| R-Squared | 0.6697 |
| Adj. R-Squared | 0.6576 |
| Root MSE | 4.20E+06 |

| Average Annual Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| Years | | 157383.3 | | 0.000 | |
| AL East | | 3122466 | | 0.334 | |
| AL Central | | 3114220 | | 0.235 | |
| AL West | | 3128011 | | 0.255 | |
| NL East | | 3107668 | | 0.124 | |
| NL Central | | 3122264 | | 0.343 | |
| NL West | | 561273.1 | | 0.244 | |
| Position | | 452197.6 | | 0.000 | |
| Age | | 97123.63 | | 0.000 | |
| Initial Year | | 100650.1 | | 0.002 | |
| Player Agent | | 458377.4 | | 0.145 | |
| Country of Origin | | 674086.4 | | 0.465 | |
| Ethnicity | | 628230.5 | | 0.231 | |
| Prior Year WAR | | 168401 | | 0.000 | |
| Career WAR | | 19481.44 | | 0.000 | |
| Constant | | 2.03E+08 | | 0.002 | |
Our regression analysis of Average Annual Value against those same factors selected in
Table 1 shows statistical significance at the p < 0.05 level for Years, Position, Age, Initial Year,
Prior Year WAR, and Career WAR. The significance of the Years variable and its positive
coefficient align with our aforementioned expectations that an increase in years on a contract
would increase Average Annual Value, and similarly, the significance of Age with a negative
coefficient aligns with our expectation of a decline in value as a player ages. Position is
significant as well with a negative coefficient, implying that position players across the league
are underpaid relative to pitchers. Important to note also when comparing the resulting
regressions of Table 1 and Table 2 is the difference in adjusted R-squared; Table 2 demonstrates
a higher R-squared of 0.6697 and adjusted R-squared of 0.6576. Therefore, our resulting model
for Table 2 captures 65.76% of the variation in Average Annual Value through its independent
variables. While we deemed Table 1 to be a reliable explanation of Total Contract Value, this
higher R-squared implies an even more reliable model stemming from Table 2.
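The significance rule applied here, accepting only variables with a p-value below 0.05, can be expressed as a simple filter. The sketch below applies it to Table 2, reading the second reported value in each row as the p-value, which is an assumption about the table layout. The function name is hypothetical, and the constant term is left out because it is not one of the explanatory variables.

```python
ALPHA = 0.05  # significance threshold used throughout the analysis

# p-values for the Average Annual Value model, as read from Table 2
table2_p_values = {
    "Years": 0.000, "AL East": 0.334, "AL Central": 0.235, "AL West": 0.255,
    "NL East": 0.124, "NL Central": 0.343, "NL West": 0.244, "Position": 0.000,
    "Age": 0.000, "Initial Year": 0.002, "Player Agent": 0.145,
    "Country of Origin": 0.465, "Ethnicity": 0.231,
    "Prior Year WAR": 0.000, "Career WAR": 0.000,
}


def significant_variables(p_values, alpha=ALPHA):
    """Return the variable names whose p-value falls below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


print(significant_variables(table2_p_values))
```

The filter returns Years, Position, Age, Initial Year, Prior Year WAR, and Career WAR, the same six variables identified in the discussion of Table 2.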
Interdivision Subsample Analysis:
Table 3: AL East Regression for Total Contract Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 2.29E+17 | 12 | 1.91E+16 |
| Residual | 6.93E+16 | 53 | 1.31E+15 |
| Total | 2.98E+17 | 65 | 4.58E+15 |

| Statistic | Value |
|---|---|
| Number of Observations | 66 |
| F(12,53) | 14.58 |
| Probability > F | 0 |
| R-Squared | 0.7676 |
| Adj. R-Squared | 0.7149 |
| Root MSE | 3.60E+07 |

| Total Contract Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| New York Yankees | | 1.90E+07 | | 0.101 | |
| Toronto Blue Jays | | 1.85E+07 | | 0.127 | |
| Boston Red Sox | | 1.80E+07 | | 0.106 | |
| Tampa Bay Rays | | 2.05E+07 | | 0.909 | |
| Baltimore Orioles | | - | | - | |
| Position | | 1.09E+07 | | 0.238 | |
| Age | | 1991042 | | 0.014 | |
| Initial Year | | 2511460 | | 0.011 | |
| Player Agent | | 9800074 | | 0.575 | |
| Country of Origin | | 1.66E+07 | | 0.036 | |
| Ethnicity | | 1.46E+07 | | 0.171 | |
| Prior Year WAR | | 2834078 | | 0.000 | |
| Career WAR | | 486841.8 | | 0.810 | |
| Constant | | 5.08E+09 | | 0.012 | |
Table 4: AL East Regression for Average Annual Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 3.06E+15 | 12 | 2.35E+14 |
| Residual | 7.68E+14 | 53 | 1.48E+13 |
| Total | 3.83E+15 | 65 | 5.89E+13 |

| Statistic | Value |
|---|---|
| Number of Observations | 66 |
| F(13,52) | 14.58 |
| Probability > F | 0 |
| R-Squared | 0.7993 |
| Adj. R-Squared | 0.7491 |
| Root MSE | 3.80E+06 |

| Average Annual Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| New York Yankees | | 2072991 | | 0.771 | |
| Toronto Blue Jays | | 2027944 | | 0.457 | |
| Boston Red Sox | | 1957162 | | 0.609 | |
| Tampa Bay Rays | | 2176727 | | 0.836 | |
| Baltimore Orioles | | - | | - | |
| Years | | 469983.1 | | 0.000 | |
| Position | | 1169318 | | 0.364 | |
| Age | | 225775.3 | | 0.190 | |
| Initial Year | | 275527.9 | | 0.132 | |
| Player Agent | | 1069363 | | 0.845 | |
| Country of Origin | | 1830454 | | 0.743 | |
| Ethnicity | | 1556321 | | 0.659 | |
| Prior Year WAR | | 433946.5 | | 0.019 | |
| Career WAR | | 51967.88 | | 0.008 | |
| Constant | | 5.56E+08 | | 0.136 | |
Table 5: AL Central Regression for Total Contract Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 4.24E+16 | 12 | 3.53E+15 |
| Residual | 3.39E+16 | 49 | 6.91E+14 |
| Total | 7.62E+16 | 61 | 1.25E+15 |

| Statistic | Value |
|---|---|
| Number of Observations | 62 |
| F(12,49) | 5.11 |
| Probability > F | 0 |
| R-Squared | 0.5557 |
| Adj. R-Squared | 0.4469 |
| Root MSE | 2.60E+07 |

| Total Contract Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| Chicago White Sox | | 1.20E+07 | | 0.878 | |
| Minnesota Twins | | 1.16E+07 | | 0.760 | |
| Detroit Tigers | | 1.37E+07 | | 0.898 | |
| Kansas City Royals | | - | | - | |
| Cle. Guardians | | 1.57E+07 | | 0.982 | |
| Position | | 7911428 | | 0.318 | |
| Age | | 1559696 | | 0.002 | |
| Initial Year | | 1856806 | | 0.148 | |
| Player Agent | | 8120868 | | 0.399 | |
| Country of Origin | | 1.20E+07 | | 0.244 | |
| Ethnicity | | 1.23E+07 | | 0.250 | |
| Prior Year WAR | | 2845817 | | 0.000 | |
| Career WAR | | 349239 | | 0.031 | |
| Constant | | 3.75E+09 | | 0.159 | |
Table 6: AL Central Regression for Average Annual Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 1.95E+15 | 13 | 1.50E+14 |
| Residual | 5.48E+14 | 48 | 1.14E+13 |
| Total | 2.50E+15 | 61 | 4.10E+13 |

| Statistic | Value |
|---|---|
| Number of Observations | 62 |
| F(13,48) | 13.17 |
| Probability > F | 0 |
| R-Squared | 0.7811 |
| Adj. R-Squared | 0.7218 |
| Root MSE | 3.40E+06 |

| Average Annual Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| Chicago White Sox | | 1540809 | | 0.279 | |
| Minnesota Twins | | 1509354 | | 0.015 | |
| Detroit Tigers | | 1763307 | | 0.774 | |
| Kansas City Royals | | - | | - | |
| Cle. Guardians | | 2033197 | | 0.092 | |
| Years | | 448906.2 | | 0.000 | |
| Position | | 1027923 | | 0.590 | |
| Age | | 216570.4 | | 0.288 | |
| Initial Year | | 241795.3 | | 0.247 | |
| Player Agent | | 1063772 | | 0.926 | |
| Country of Origin | | 1556042 | | 0.092 | |
| Ethnicity | | 1587943 | | 0.008 | |
| Prior Year WAR | | 391539.5 | | 0.000 | |
| Career WAR | | 45091.23 | | 0.002 | |
| Constant | | 4.88E+08 | | 0.252 | |
Table 7: AL West Regression for Total Contract Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 1.08E+17 | 12 | 8.97E+15 |
| Residual | 1.01E+17 | 60 | 1.69E+15 |
| Total | 2.09E+17 | 72 | 2.90E+15 |

| Statistic | Value |
|---|---|
| Number of Observations | 73 |
| F(12,60) | 5.31 |
| Probability > F | 0 |
| R-Squared | 0.5148 |
| Adj. R-Squared | 0.4178 |
| Root MSE | 4.10E+07 |

| Total Contract Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| Houston Astros | | 2.16E+07 | | 0.930 | |
| Texas Rangers | | 1.96E+07 | | 0.466 | |
| Los Angeles Angels | | 1.98E+07 | | 0.569 | |
| Seattle Mariners | | - | | - | |
| Oakland Athletics | | 2.21E+07 | | 0.591 | |
| Position | | 1.18E+07 | | 0.862 | |
| Age | | 3018446 | | 0.001 | |
| Initial Year | | 2351172 | | 0.128 | |
| Player Agent | | 1.27E+07 | | 0.761 | |
| Country of Origin | | 1.62E+07 | | 0.177 | |
| Ethnicity | | 1.59E+07 | | 0.351 | |
| Prior Year WAR | | 3113925 | | 0.000 | |
| Career WAR | | 483117.5 | | 0.001 | |
| Constant | | 4.74E+09 | | 0.145 | |
Table 8: AL West Regression for Average Annual Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 2.94E+15 | 13 | 2.26E+14 |
| Residual | 7.21E+14 | 59 | 1.22E+13 |
| Total | 3.66E+15 | 72 | 5.09E+13 |

| Statistic | Value |
|---|---|
| Number of Observations | 73 |
| F(13,59) | 18.52 |
| Probability > F | 0 |
| R-Squared | 0.8032 |
| Adj. R-Squared | 0.7598 |
| Root MSE | 3.50E+06 |

| Average Annual Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| Houston Astros | | 1846758 | | 0.728 | |
| Texas Rangers | | 1669521 | | 0.268 | |
| Los Angeles Angels | | 1699349 | | 0.427 | |
| Seattle Mariners | | - | | - | |
| Oakland Athletics | | 1880445 | | 0.491 | |
| Years | | 370492.4 | | 0.000 | |
| Position | | 1002549 | | 0.007 | |
| Age | | 281659.7 | | 0.016 | |
| Initial Year | | 202132.4 | | 0.022 | |
| Player Agent | | 1084153 | | 0.833 | |
| Country of Origin | | 1391750 | | 0.419 | |
| Ethnicity | | 1352851 | | 0.330 | |
| Prior Year WAR | | 310470.4 | | 0.071 | |
| Career WAR | | 42086.1 | | 0.000 | |
| Constant | | 4.07E+08 | | 0.025 | |
Table 9: NL East Regression for Total Contract Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 1.64E+17 | 12 | 1.36E+16 |
| Residual | 1.46E+17 | 83 | 1.76E+15 |
| Total | 3.10E+17 | 95 | 3.26E+15 |

| Statistic | Value |
|---|---|
| Number of Observations | 96 |
| F(12,83) | 7.74 |
| Probability > F | 0 |
| R-Squared | 0.5279 |
| Adj. R-Squared | 0.4597 |
| Root MSE | 4.20E+07 |

| Total Contract Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| New York Mets | | 1.69E+07 | | 0.525 | |
| Phi. Phillies | | 1.72E+07 | | 0.230 | |
| Atlanta Braves | | 1.79E+07 | | 0.798 | |
| Was. Nationals | | 1.86E+07 | | 0.904 | |
| Miami Marlins | | - | | - | |
| Position | | 9862302 | | 0.938 | |
| Age | | 1810988 | | 0.000 | |
| Initial Year | | 2217057 | | 0.182 | |
| Player Agent | | 1.05E+07 | | 0.438 | |
| Country of Origin | | 1.47E+07 | | 0.713 | |
| Ethnicity | | 1.38E+07 | | 0.265 | |
| Prior Year WAR | | 3866044 | | 0.000 | |
| Career WAR | | 462145.2 | | 0.088 | |
| Constant | | 4.48E+09 | | 0.199 | |
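Table 10: NL East Regression for Average Annual Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 4.20E+15 | | 3.23E+14 |
| Residual | 1.64E+15 | | 2.01E+13 |
| Total | 5.84E+15 | | 6.15E+13 |

| Statistic | Value |
|---|---|
| Number of Observations | 96 |
| F(12,83) | |
| Probability > F | |
| R-Squared | |
| Adj. R-Squared | |
| Root MSE | |

| Average Annual Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| New York Mets | | 1820368 | | 0.181 | |
| Phi. Phillies | | 1840101 | | 0.059 | |
| Atlanta Braves | | 1913164 | | 0.180 | |
| Was. Nationals | | 1981620 | | 0.999 | |
| Miami Marlins | | - | | - | |
| Years | | 320872.8 | | 0.005 | |
| Position | | 1060505 | | 0.008 | |
| Age | | 205405.9 | | 0.000 | |
| Initial Year | | 238851.3 | | 0.528 | |
| Player Agent | | 1120678 | | 0.225 | |
| Country of Origin | | 1576969 | | 0.378 | |
| Ethnicity | | 1473739 | | 0.226 | |
| Prior Year WAR | | 464121 | | 0.006 | |
| Career WAR | | 49329.21 | | 0.000 | |
| Constant | | 4.83E+08 | | 0.564 | |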
Table 11: NL Central Regression for Total Contract Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 3.60E+16 | 12 | 3.00E+15 |
| Residual | 2.60E+16 | 42 | 6.19E+14 |
| Total | 6.20E+16 | 54 | 1.15E+15 |

| Statistic | Value |
|---|---|
| Number of Observations | 55 |
| F(12,42) | 4.84 |
| Probability > F | 0.0001 |
| R-Squared | 0.5802 |
| Adj. R-Squared | 0.4603 |
| Root MSE | 2.50E+07 |

| Total Contract Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| Chicago Cubs | | 1.23E+07 | | 0.427 | |
| St. Louis Cardinals | | 1.39E+07 | | 0.298 | |
| Mil. Brewers | | 1.34E+07 | | 0.089 | |
| Cincinnati Reds | | - | | - | |
| Pittsburgh Pirates | | 1.61E+07 | | 0.913 | |
| Position | | 9079029 | | 0.779 | |
| Age | | 1626627 | | 0.005 | |
| Initial Year | | 1943849 | | 0.955 | |
| Player Agent | | 7795283 | | 0.666 | |
| Country of Origin | | 1.12E+07 | | 0.451 | |
| Ethnicity | | 1.04E+07 | | 0.657 | |
| Prior Year WAR | | 3112909 | | 0.000 | |
| Career WAR | | 471247 | | 0.303 | |
| Constant | | 3.92E+09 | | 0.924 | |
Table 12: NL Central Regression for Average Annual Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 1.12E+15 | 13 | 8.59E+13 |
| Residual | 4.60E+14 | 41 | 1.12E+13 |
| Total | 1.58E+15 | 54 | 2.92E+13 |

| Statistic | Value |
|---|---|
| Number of Observations | 55 |
| F(13,41) | 7.67 |
| Probability > F | 0 |
| R-Squared | 0.7085 |
| Adj. R-Squared | 0.6161 |
| Root MSE | 3.30E+06 |

| Average Annual Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| Chicago Cubs | | 1656761 | | 0.985 | |
| St. Louis Cardinals | | 1903211 | | 0.807 | |
| Mil. Brewers | | 1886100 | | 0.737 | |
| Cincinnati Reds | | - | | - | |
| Pittsburgh Pirates | | 2164040 | | 0.398 | |
| Years | | 482984.6 | | 0.000 | |
| Position | | 1225326 | | 0.284 | |
| Age | | 241642 | | 0.023 | |
| Initial Year | | 263752.3 | | 0.462 | |
| Player Agent | | 1083409 | | 0.051 | |
| Country of Origin | | 1516584 | | 0.495 | |
| Ethnicity | | 1396239 | | 0.676 | |
| Prior Year WAR | | 526742.3 | | 0.136 | |
| Career WAR | | 63431.41 | | 0.026 | |
| Constant | | 5.33E+08 | | 0.485 | |
Table 13: NL West Regression for Total Contract Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 2.67E+15 | 13 | 2.05E+14 |
| Residual | 1.28E+15 | 59 | 2.17E+13 |
| Total | 3.95E+15 | 72 | 5.48E+13 |

| Statistic | Value |
|---|---|
| Number of Observations | 73 |
| F(13,59) | 8.96 |
| Probability > F | 0 |
| R-Squared | 0.6419 |
| Adj. R-Squared | 0.5703 |
| Root MSE | 3.60E+07 |

| Total Contract Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| San Diego Padres | | 1.59E+07 | | 0.491 | |
| LA Dodgers | | 1.50E+07 | | 0.518 | |
| SF Giants | | 1.45E+07 | | 0.150 | |
| Colorado Rockies | | - | | - | |
| Ari. Diamondbacks | | 1.83E+07 | | 0.185 | |
| Position | | 9492147 | | 0.216 | |
| Age | | 2137630 | | 0.000 | |
| Initial Year | | 2217348 | | 0.781 | |
| Player Agent | | 9909155 | | 0.073 | |
| Country of Origin | | 1.78E+07 | | 0.895 | |
| Ethnicity | | 1.40E+07 | | 0.883 | |
| Prior Year WAR | | 3321040 | | 0.000 | |
| Career WAR | | 380753.2 | | 0.049 | |
| Constant | | 4.47E+09 | | 0.840 | |
Table 14: NL West Regression for Average Annual Value

| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Model | 2.67E+15 | 13 | 2.05E+14 |
| Residual | 1.28E+15 | 59 | 2.17E+13 |
| Total | 3.95E+15 | 72 | 5.48E+13 |

| Statistic | Value |
|---|---|
| Number of Observations | 73 |
| F(13,59) | 9.45 |
| Probability > F | 0 |
| R-Squared | 0.6756 |
| Adj. R-Squared | 0.6041 |
| Root MSE | 4.70E+06 |

| Average Annual Value | Coefficient | Standard Error | t-value | p-value | Beta |
|---|---|---|---|---|---|
| San Diego Padres | | 2108278 | | 0.056 | |
| LA Dodgers | | 1940196 | | 0.184 | |
| SF Giants | | 1893291 | | 0.925 | |
| Colorado Rockies | | - | | - | |
| Ari. Diamondbacks | | 2371737 | | 0.102 | |
| Years | | 448331.3 | | 0.007 | |
| Position | | 1235738 | | 0.294 | |
| Age | | 308929.1 | | 0.006 | |
| Initial Year | | 287961.7 | | 0.090 | |
| Player Agent | | 1293991 | | 0.522 | |
| Country of Origin | | 2284773 | | 0.611 | |
| Ethnicity | | 1800241 | | 0.657 | |
| Prior Year WAR | | 494754.4 | | 0.010 | |
| Career WAR | | 48943.45 | | 0.002 | |
| Constant | | 5.82E+08 | | 0.101 | |
Our interdivision subsample analysis, conducted through multiple linear regressions,
analyzes the effects of individual teams and our previously selected factors on Total Contract
Value and Average Annual Value. Within these regressions, we see multiple consistencies in
statistically significant factors at the p < 0.05 level, indicating similarities in negotiations from
region to region.
For those regressions addressing Total Contract Value as a function of the selected
factors, all six divisions saw Age and Prior Year WAR as statistically significant, with a negative
coefficient for the Age variable and a positive one for Prior Year WAR, indicating that an
increase in Prior Year WAR would increase Total Contract Value while an increase in Age
would result in a value decline. Along with the performance metric of Prior Year WAR, Career
WAR is also significant in four out of the six divisions, with the AL East and NL Central being
the only two exceptions. The AL East is additionally notable, as it sees statistical significance for
the Initial Year in its regression, and the division also stands above the rest with an adjusted
R-squared almost 0.15 higher than any other division; the model generated for the AL East
accounts for 71.49% of the variation in contract value of signings within the division while the
NL West model is the next best with 57.03% accounted for. Otherwise, the remaining four
divisions have adjusted R-squareds below 0.5, indicating weakness in their models in accounting
for the variability of Total Contract Value in recent years.
Multiple linear regressions to model Average Annual Value all showed statistical
significance for the newly included Years variable as well as the Career WAR variable at the
p < 0.05 level, and all but the NL Central additionally included Prior Year WAR as a statistically
significant variable in analysis. All of these variables carry a positive coefficient, implying
that their increase would correspond with an increase in yearly contract value: more years on the
contract would increase average annual value, and better performance as measured in both WAR
statistics would raise that value as well. Another variable displaying significance in multiple
divisions is Age, which proves significant in the AL West, NL East, NL Central, and
NL West with a negative coefficient, just as it does in those regressions for Total Contract Value.
One outlier among the regressions is the AL West, in which six variables are significant at the
p < 0.05 level, with those not mentioned yet being Position with a negative coefficient and Initial
Year with a positive one. Additionally, the AL Central has Ethnicity as a statistically significant
variable with a negative coefficient assigned. While these models differ more than those
for Total Contract Value in which variables prove significant, the average adjusted R-squared for
all regressions of Average Annual Value is higher than that of the Total Contract Value
regressions. The adjusted R-squared implies that 75.98% of the variation in Average Annual
Value is captured by the AL West model, 74.91% by the AL East, 72.18% by the AL Central, 67.39%
by the NL East, 61.61% by the NL Central, and 60.41% by the NL West. As such, we may
view all of these divisional models as reliable foundational pieces for analysis of Average
Annual Value in contracts.
Lastly, it is important to note that in each regression, one team’s statistics have been
omitted due to collinearity, likely with another team present in the division; those omitted teams
are the following: the Baltimore Orioles, the Kansas City Royals, the Miami Marlins, the
Cincinnati Reds, and the Colorado Rockies. However, seeing as these teams are likely collinear
with another team in the model and no other team displays statistical significance, we cannot
reasonably expect any significance to be missing in the omitted statistics.
MLB Team Analysis:
Table 15: MLB-Wide Team-Based Regression for Total Contract Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       1.06E+17         29                   3.66E+15
Residual    1.08E+18         395                  2.74E+15
Total       1.19E+18         424                  2.81E+15

Number of Observations   425
F(29, 395)               1.33
Probability > F          0.1185
R-Squared                0.0893
Adj. R-Squared           0.0224
Root MSE                 5.20E+07

Total Contract Value    Coefficient   Standard Error   t-value   p-value   Beta
Ari. Diamondbacks                     2.91E+07                   0.719
Atlanta Braves                        2.47E+07                   0.754
Baltimore Orioles                     3.02E+07                   0.693
Boston Red Sox                        2.45E+07                   0.506
Chicago Cubs                          2.42E+07                   0.368
Cincinnati Reds                       3.02E+07                   0.566
Cle. Guardians                        3.02E+07                   0.709
Colorado Rockies                      2.66E+07                   0.275
Chicago White Sox                     2.49E+07                   0.499
Detroit Tigers                        2.70E+07                   0.499
Houston Astros                        2.51E+07                   0.604
Kansas City Royals                    2.76E+07                   0.896
Los Angeles Angels                    2.45E+07                   0.435
LA Dodgers                            2.45E+07                   0.156
Miami Marlins                         2.76E+07                   0.703
Mil. Brewers                          2.62E+07                   0.757
Minnesota Twins                       2.44E+07                   0.374
New York Mets                         2.35E+07                   0.237
New York Yankees                      2.49E+07                   0.012
Oakland Athletics                     2.66E+07                   0.996
Phi. Phillies                         2.40E+07                   0.024
Pittsburgh Pirates                    -                          -
San Diego Padres                      2.56E+07                   0.025
Seattle Mariners                      3.02E+07                   0.557
SF Giants                             2.41E+07                   0.616
St. Louis Cardinals                   2.70E+07                   0.428
Tampa Bay Rays                        2.91E+07                   0.831
Texas Rangers                         2.42E+07                   0.120
Toronto Blue Jays                     2.49E+07                   0.339
Was. Nationals                        2.49E+07                   0.343
Constant                              2.14E+07                   0.647
Table 16: MLB-Wide Team-Based Regression for Average Annual Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       2.43E+15         29                   8.39E+13
Residual    1.94E+16         395                  4.91E+13
Total       2.18E+16         424                  5.14E+13

Number of Observations   425
F(29, 395)               1.71
Probability > F          0.0138
R-Squared                0.1115
Adj. R-Squared           0.0463
Root MSE                 7.00E+06

Average Annual Value    Coefficient   Standard Error   t-value   p-value   Beta
Arizona Diamondbacks    1,321,627     3897047          0.340     0.735     0.023
Atlanta Braves                        3302040                    0.102
Baltimore Orioles                     4044157                    0.438
Boston Red Sox                        3280245                    0.239
Chicago Cubs                          3242539                    0.236
Cincinnati Reds                       4044157                    0.385
Cle. Guardians                        4044157                    0.283
Colorado Rockies                      3555012                    0.203
Chicago White Sox                     3326231                    0.170
Detroit Tigers                        3617204                    0.384
Houston Astros                        3353238                    0.120
Kansas City Royals                    3691793                    0.539
Los Angeles Angels                    3280245                    0.135
LA Dodgers                            3280245                    0.002
Miami Marlins                         3691793                    0.496
Mil. Brewers                          3502343                    0.372
Minnesota Twins                       3260504                    0.041
New York Mets                         3141579                    0.012
New York Yankees                      3326231                    0.004
Oakland Athletics                     3555012                    0.920
Phi. Phillies                         3211058                    0.005
Pittsburgh Pirates                    -                          -
San Diego Padres                      3417937                    0.173
Seattle Mariners                      4044157                    0.408
SF Giants                             3226121                    0.122
St. Louis Cardinals                   3617204                    0.237
Tampa Bay Rays                        3897047                    0.528
Texas Rangers                         3242539                    0.051
Toronto Blue Jays                     3326231                    0.074
Was. Nationals                        3326231                    0.238
Constant                              2859651                    0.037
In an analysis of all individual teams’ impact on Total Contract Value and Average
Annual Value, two teams stand out as being statistically significant at the p < 0.05 level in
predicting both measures of contract value: the New York Yankees and the Philadelphia Phillies.
However, in looking at those teams whose inclusion may reliably predict Total Contract Value,
we also view the San Diego Padres as significant, while we can also do so for the Los Angeles
Dodgers, New York Mets, and Minnesota Twins with regard to Average Annual Value. Assigned
to all significant variables are positive coefficients, characterizing this significance as an instance
of the team overpaying relative to their competitors in the league. Overall, both regressions have
a low adjusted R-squared, showing a low reliability attributable to these models. The adjusted R-
squared metric shows that the model created for Total Contract Value only captures 2.24% of the
variation, and that of the Average Annual Value regression captures only 4.63%. As such,
neither of these models is reliable in predicting contract values. However, this spurs an idea for
future research in which the researchers may substitute the thirty teams for divisions in our
general model in order to develop a more comprehensive comparison of pay between teams
when also interacting with quantitative and qualitative variables. We may assume that such a
model would prove more reliable, as this would result in a twenty-four variable increase and thus
an increase in the model’s degrees of freedom.
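The adjusted R-squared figures quoted above can be checked from the R-squared, the number of observations, and the number of predictors reported in Tables 15 and 16. The Python sketch below is illustrative only: the function name `adjusted_r_squared` is our own, and we assume the statistical software applied the standard adjustment shown in the docstring.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / residual_df


# Inputs taken from Tables 15 and 16: 425 contracts, 29 team indicators.
tcv_adj = adjusted_r_squared(0.0893, 425, 29)
aav_adj = adjusted_r_squared(0.1115, 425, 29)
print(f"Total Contract Value: {tcv_adj:.4f}")
print(f"Average Annual Value: {aav_adj:.4f}")
```

The penalty for carrying 29 team indicators is what pulls the raw R-squared values of 0.0893 and 0.1115 down to the much smaller adjusted figures discussed in the text.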
Age Variable Analysis:
Table 17: Age-Based Regression for Total Contract Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       1.89E+17         2                    9.47E+16
Residual    1.00E+18         422                  2.37E+15
Total       1.19E+18         424                  2.81E+15

Number of Observations   425
F(2, 422)                39.95
Probability > F          0
R-Squared                0.1592
Adj. R-Squared           0.1552
Root MSE                 4.90E+07

Total Contract Value    Coefficient   Standard Error   t-value   p-value   Beta
Age                                   1.33E+07                   0.000
Age²                                  197468.1                   0.000
Constant                              2.24E+08                   0.000
Table 18: Age-Based Regression for Average Annual Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       2.17E+15         2                    1.08E+15
Residual    1.96E+16         422                  4.66E+13
Total       2.18E+16         424                  5.14E+13

Number of Observations   425
F(2, 422)                23.3
Probability > F          0
R-Squared                0.0994
Adj. R-Squared           0.0952
Root MSE                 6.80E+06

Average Annual Value    Coefficient   Standard Error   t-value   p-value   Beta
Age                                   1867841                    0.000
Age²                                  27671.86                   0.000
Constant                              3.14E+07                   0.000
In both models, we see no significant difference between the effect of the previously
utilized Age variable versus the Age² variable, and neither model shows a high R-squared value
that implies effectiveness of the subsequently created model. As such, we see no reason to revise
previous models to include Age² in place of Age even when considering the age curve that may
exist in baseball with regard to players’ primes.
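To make the age-curve idea concrete: when Age and Age² enter a regression together, the fitted relationship is a parabola whose turning point sits at -b1 / (2 * b2), where b1 and b2 are the two coefficients. The sketch below uses hypothetical coefficients chosen purely for illustration; they are not estimates from Tables 17 or 18, and the function name is our own.

```python
def turning_point_age(b_age: float, b_age_sq: float) -> float:
    """Age at which b_age * age + b_age_sq * age**2 peaks or bottoms out."""
    if b_age_sq == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b_age / (2.0 * b_age_sq)


# Hypothetical coefficients (NOT estimates from Tables 17 or 18).
b_age, b_age_sq = 6.0e6, -1.0e5
peak = turning_point_age(b_age, b_age_sq)
print(f"Fitted contract value peaks at age {peak:.1f}")
```

A negative coefficient on the squared term would describe a prime followed by decline, which is the pattern an age curve is meant to capture.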
Year-by-Year Value Change:
Table 19: Year-Based Regression for Total Contract Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       2.56E+16         6                    4.27E+15
Residual    1.16E+18         418                  2.79E+15
Total       1.19E+18         424                  2.81E+15

Number of Observations   425
F(6, 418)                1.53
Probability > F          0.166
R-Squared                0.0215
Adj. R-Squared           0.0075
Root MSE                 5.30E+07

Total Contract Value    Coefficient   Standard Error   t-value   p-value   Beta
2017                                  -                          -
2018                                  1.05E+07                   0.757
2019                                  1.03E+07                   0.288
2020                                  1.03E+07                   0.149
2021                                  1.02E+07                   0.894
2022                                  9448032                    0.095
2023                                  9270589                    0.040
Constant                              7318414                    0.002
Table 20: Year-Based Regression for Average Annual Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       6.02E+14         6                    1.00E+14
Residual    2.12E+16         418                  5.07E+13
Total       2.18E+16         424                  5.14E+13

Number of Observations   425
F(6, 418)                1.98
Probability > F          0.0675
R-Squared                0.0276
Adj. R-Squared           0.0137
Root MSE                 7.10E+06

Average Annual Value    Coefficient   Standard Error   t-value   p-value   Beta
2017                                  -                          -
2018                                  1418251                    0.633
2019                                  1397029                    0.265
2020                                  1390424                    0.219
2021                                  1377847                    0.799
2022                                  1275307                    0.059
2023                                  1251356                    0.033
Constant                              987848.7                   0.000
The purpose of a breakdown of the Initial Year variable into each individual year
included in our free agency analysis was to see if any significant changes in value occurred from
year to year. Statistical significance would imply that the Initial Year variable did not act linearly
but instead saw distinct changes in value based on the year of the negotiated contract. Both the
regressions for Total Contract Value and Average Annual Value have 2023 as a statistically
significant variable at the p < 0.05 level, with 2022 significant only at the p < 0.10 level in both
(p = 0.095 and p = 0.059, respectively), but each model also displays an extremely low adjusted
R-squared around 1%, which implies that little to no variation in contract values is captured by
the breakdown of the Initial Year variable into different years. Based on these results, we can
conclude that a breakdown of the Initial Year variable in broader models is not necessary; rather,
we can accept Initial Year as a linear variable.
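The year-by-year breakdown amounts to replacing the single Initial Year number with one indicator per signing year, leaving one year out as the baseline (2017 appears without estimates in Tables 19 and 20, which is consistent with it serving that role). A minimal Python sketch follows; the helper name and the toy list of years are assumptions for illustration and do not come from our dataset.

```python
def year_indicators(years, baseline):
    """Return one 0/1 indicator column per year, omitting the baseline year."""
    levels = sorted(set(years) - {baseline})
    return {level: [1 if y == level else 0 for y in years] for level in levels}


# Toy signing years (illustrative only, not the thesis dataset).
signing_years = [2017, 2018, 2023, 2018, 2021]
columns = year_indicators(signing_years, baseline=2017)
for level, column in columns.items():
    print(level, column)
```

Under this coding, each year's coefficient measures the difference from the baseline year, and a contract signed in the baseline year has zeros in every column.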
Performance Impact Analysis:
Table 21: Performance-Based Regression for Total Contract Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       4.74E+17         2                    2.37E+17
Residual    7.16E+17         422                  1.70E+15
Total       1.19E+18         424                  2.81E+15

Number of Observations   425
F(2, 422)                139.85
Probability > F          0
R-Squared                0.3986
Adj. R-Squared           0.3957
Root MSE                 4.10E+07

Total Contract Value    Coefficient   Standard Error   t-value   p-value   Beta
Prior Year WAR                        1342895                    0.000
Career WAR                            158806.8                   0.347
Constant                              3421796                    0.197
Table 22: Performance-Based Regression for Average Annual Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       1.03E+16         2                    5.16E+15
Residual    1.15E+16         422                  2.73E+13
Total       2.18E+16         424                  5.14E+13

Number of Observations   425
F(2, 422)                189.14
Probability > F          0
R-Squared                0.4727
Adj. R-Squared           0.4702
Root MSE                 5.20E+06

Average Annual Value    Coefficient   Standard Error   t-value   p-value   Beta
Prior Year WAR                        170264.4                   0.000
Career WAR                            20134.96                   0.000
Constant                              433846.2                   0.000
In regressions analyzing the relationship of the performance-based variables of Prior
Year WAR and Career WAR, we see a statistically significant relationship between Prior Year
WAR and Total Contract Value at the p < 0.05 level. However, the model has a
relatively low adjusted R-squared of only 0.3957, showing that only 39.57% of the variation in
Total Contract Value is captured by the model. In a regression comparing these metrics
against Average Annual Value, we see statistical significance for Prior Year WAR as well as
Career WAR. The model carries a similarly low but more robust adjusted R-squared of 0.4702,
which indicates that it accounts for 47.02% of the variation, close to the generally accepted
threshold of 0.5. Our findings in these models indicate that in previous regressions analyzing
these performance-based metrics among other variables against contract value measures, Prior
Year WAR may be more impactful in determining this value. This assumption is confirmed when
looking at the values of Prior Year WAR’s coefficients in both regressions: the coefficient in both
models is larger than that of Career WAR. This translates into the practical application that a
one-point increase in WAR in the year preceding contract negotiations is more impactful than a
one-point increase in Career WAR when determining contract value. Additionally, we can
understand Prior Year WAR as being more impactful as it is factored into the model twice, being
accounted for in both variables.
Categorical Factor Analysis:
Table 23: Demographic Regression for Total Contract Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       1.60E+17         5                    3.19E+16
Residual    1.03E+18         419                  2.46E+15
Total       1.19E+18         424                  2.81E+15

Number of Observations   425
F(5, 419)                12.99
Probability > F          0
R-Squared                0.1342
Adj. R-Squared           0.1238
Root MSE                 5.00E+07

Total Contract Value    Coefficient   Standard Error   t-value   p-value   Beta
Age                                   898296.3                   0.000
Position                              5169506                    0.036
Player Agent                          5232929                    0.018
Country of Origin                     7859039                    0.060
Ethnicity                             7304886                    0.232
Constant                              3.00E+07                   0.000
Table 24: Demographic Regression for Average Annual Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       1.73E+15         5                    3.46E+14
Residual    2.01E+16         419                  4.79E+13
Total       2.18E+16         424                  5.14E+13

Number of Observations   425
F(5, 419)                7.21
Probability > F          0
R-Squared                0.0793
Adj. R-Squared           0.0683
Root MSE                 6.90E+06

Average Annual Value    Coefficient   Standard Error   t-value   p-value   Beta
Age                                   125429.9                   0.002
Position                              721822.3                   0.473
Player Agent                          730678.2                   0.000
Country of Origin                     1097364                    0.518
Ethnicity                             1019987                    0.533
Constant                              4185236                    0.000
Table 25: Ethnic Regression for Total Contract Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       1.32E+16         2                    6.62E+15
Residual    1.18E+18         422                  2.79E+15
Total       1.19E+18         424                  2.81E+15

Number of Observations   425
F(2, 422)                2.38
Probability > F          0.0942
R-Squared                0.0111
Adj. R-Squared           0.0064
Root MSE                 5.30E+07

Total Contract Value    Coefficient   Standard Error   t-value   p-value   Beta
Country of Origin                     8332915                    0.046
Ethnicity                             7523738                    0.043
Constant                              3252134                    0.000
Table 26: Ethnic Regression for Average Annual Value

Source      Sum of Squares   Degrees of Freedom   Mean Square
Model       6.77E+13         2                    3.38E+13
Residual    2.17E+16         422                  5.15E+13
Total       2.18E+16         424                  5.14E+13

Number of Observations   425
F(2, 422)                0.66
Probability > F          0.519
R-Squared                0.0031
Adj. R-Squared           -0.0016
Root MSE                 7.20E+06

Average Annual Value    Coefficient   Standard Error   t-value   p-value   Beta
Country of Origin                     1132880                    0.377
Ethnicity                             1022870                    0.254
Constant                              442135.5                   0.000
We see multiple statistically significant variables when we segment data into different
categories of demographic factors, yet in all generated models, we see very low adjusted R-
squared statistics which indicate very low reliability in accounting for any variation present in
contract values. Our “Demographic Regression” remains consistent with prior models that find
age to be statistically significant with an inverse relationship with the dependent variables of
Total Contract Value and Average Annual Value, but in both models, we also see Player Agent
as significant at the p < 0.05 level. It is notable as well that Position is significant in the
regression for Total Contract Value with a positive coefficient, meaning this model says
position players are overpaid relative to pitchers.
In our “Ethnic Regression” that focuses solely on factors that could explain racial/ethnic
discrimination in contract negotiations, we see both the Ethnicity and Country of Origin
variables as being significant for Total Contract Value, yet in a regression against Average
Annual Value, we see no significance in any explored variable. Interestingly, the coefficients of those variables in the
Total Contract Value regression seem incongruous, as the negative coefficient of Ethnicity
suggests lower contract value for non-White players but the positive coefficient of Country of
Origin suggests higher values for those born outside of the United States. This may provide an
avenue for further research; however, we provide little focus on this finding based on the low R-
squared statistics of these models.
EXPLANATION OF FINDINGS
After conducting extensive data analysis encompassing our broad models as well as
analysis on selected factors, our next step is to apply our findings within the scope of today’s
Major League Baseball landscape in order to understand what our models imply and how they
can be applied in free agency negotiations in the 2024 offseason and beyond. Based on the
reliability of our models as represented by R-squared, we focus our subsequent analysis of
findings on Tables 1 through 14, encompassing the divisionally-stratified MLB regressions and
interdivision subsamples that displayed the highest reliability in our research; in our discussion,
we will address only these tables and will exclude Tables 15 through 26, unless otherwise
specified. Our overall findings remain consistent with previous literature exploring baseball’s
labor market with two overarching themes: wages in Major League Baseball are positively
correlated with and determined by on-field performance and teams will pay a premium for
security in long-term contracts. However, our research expands upon the previous literature to
introduce new factors to model potential contracts in future negotiations including a yearly
adjustment in wages for inflation and an understanding of timing free agency relative to age,
among others.
While multiple variables show prominence in determining contract value throughout our
models, numerous factors prove to be insignificant in this determination. Beginning with the
Player Agent variable, no table sees this variable as significant at the p < 0.05 level to imply a
substantial impact of agent on negotiations (only in regressions for the NL Central and NL West
would it prove significant at the p < 0.10 level). Remember that the Player Agent variable
divided players into categories of those represented by a top-six sports agency versus those
represented by a firm outside of that top six. One question entering research was whether an
agent like Scott Boras could significantly impact contract value, either positively or negatively.
One can understand how the backing of a top-six agency and their greater resources could lead to
increased bargaining power in negotiations and a more lucrative contract; on the other hand, one
could also see how teams may develop a distaste for negotiations with these agencies for the
increased pressure they bring to the bargaining table. Our findings decisively answer this
question, showing that the choice of agent or agency is not a factor in contract negotiations.
A major original focus of our research was additionally to explore any racial
or ethnic discrimination that may exist in Major League Baseball in the form of wage
discrimination. Previous literature elaborates on in-game discriminatory practices as well as
exploitative measures taken in Latin American countries by MLB teams, warranting such a facet
in our analysis, yet few tables provide prominent evidence that Ethnicity or Country of Origin
significantly impact contract value in free agency negotiations. The only instances in which we see
either of these variables being statistically significant are the Ethnicity variable in relation to
Average Annual Value in the AL Central and the Country of Origin variable in relation to Total
Contract Value in the AL East. As such, we can conclude that wage discrimination is not broadly
present in Major League Baseball. However, agents and their clients of multicultural
backgrounds may want to be wary when approaching negotiations with teams in the AL East and
AL Central. The evidence shown here and the negative coefficients attached to significant
variables imply that teams in the AL East significantly underpay players born outside of the
United States over the course of a full contract; further research may confirm if this adverse
defrayal results from shorter contracts or a lower collective value. In the AL Central, teams
underpay non-White players in Average Annual Value. While these findings should not
discourage agents with non-White or non-American clients from approaching teams in these respective
divisions, they do provide an understanding that offers from these teams may be lower, and as
such, they should not be the first-choice divisions for players falling into these categories.
Our last explanatory variable demonstrating sporadic significance is that of Position
which segments players into pitchers and position players. Three regressions with significance
for Position demonstrate an inverse relationship between a position player and Average Annual
Value: MLB-wide, in the AL West, and in the NL East. Specifically, this translates to pitchers in
the two aforementioned divisions garnering higher annual salaries when compared to position
players, but in showing significance in our broader league analysis, it also exhibits a general
inclination to provide higher annual salaries to pitchers across the league, supported by the
evidence that four out of fourteen models pair Position with a negative coefficient. League-wide,
being a position player results in an annual value decrease of about $1.8M, while this decrease is
larger in the AL West and NL East at $2.8M and $2.9M, respectively. This difference
may be due to pitchers more frequently signing short-term deals with higher annual values
due to the unpredictable nature of their role on a yearly basis and the higher injury risk inherent
in the position. For example, in the AL West, the largest single-year contract signed in the
designated period for a pitcher was a one-year, $25M deal signed by Justin Verlander with the
Houston Astros, while the largest single-year deal for a position player was Carlos Beltran’s
$16M contract with the Astros. However, we may also see this discrepancy as teams placing a
premium on pitching and its value in today’s game which has seen elite pitching rise to
prominence and has witnessed a decline in hitting. From the perspective of the free agency market,
this translates to a more advantageous market for pitchers, especially in the AL West and NL
East. As such, agents and pitchers in future markets would be able to continually demand higher
average salaries relative to position players and would be wise to consider these two divisions in
free agency. Similarly, position players can enter free agency with an understanding that they
may garner a lower AAV relative to pitchers and may want to avoid contracts with the teams of
the NL East and AL West if they seek a higher annual income.
In discussing the Average Annual Value of free agency contracts, we see evidence
confirming our hypothesis that there exists a positive relationship between the number of years
on a contract and the AAV housed in said contract, the implication of this finding being that
longer-term contracts integrate higher annual contract values. While previous literature has
explored and established this concept, our research modifies this finding to insert it into a model
that quantifies the variable’s impact among other factors in contract negotiations. We see across
MLB that a one-year increase in contract length increases the AAV by $1.5M, and when dividing
the data among divisions, the value increase ranges from a $0.9M increase in the NL East to a
$2.4M increase in the opposite league’s East division, the AL East. Reiterating some of the
points established by Walters, von Allmen, and Krautmann, we agree that this substantial
increase in average value in longer-term contracts likely stems from a risk aversion strategy by
teams seeking to avoid turnover in positions and to reduce the need to enter free agency
negotiations more often, but we contend as well that the higher AAV in long-term contracts is
additionally driven by a desire to lock star players into a roster for years and combat the yearly
rise in free agent contracts. Additionally, it may reflect a desire by teams to build loyalty among
a fanbase to players that drive revenues through jersey sales, ticket sales, and more. Our
collected data shows that those players contracted to nine years or more constitute some of the
largest contracts in terms of AAV, encompassing top players such as Aaron Judge, Corey
Seager, and Bryce Harper. These findings thus have three major implications in the approach of
players, especially those in the upper echelon of the MLB ranks, when entering free agency and
evaluating long versus short-term deals. One, players expecting or being offered longer contracts
must expect and demand a higher AAV to receive their fair value, and two, these players may
want to demand an even greater value expansion on long-term deals above our calculated $1.5M
in order to combat teams’ approach of circumventing inflation and future market increases.
Conversely, players approaching free agency expecting short-term contracts must understand
that they will be underpaid annually relative to players on long-term deals and may take one of
three paths: negotiate for a higher AAV to combat this discrepancy, negotiate for more years to
naturally capture a greater AAV, or accept a lower AAV on a short-term deal and hope to
capture greater value in subsequent free agencies due to inflation and an inflated market.
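As a worked illustration of the Years effect, the sketch below applies the reported per-year AAV increments ($1.5M MLB-wide, $0.9M in the NL East, $2.4M in the AL East) to a hypothetical player. It assumes the relationship is linear and that every other variable is held fixed; the $10M one-year baseline and the helper name are assumptions for illustration, not figures from our data.

```python
# Reported AAV increase per additional contract year, in dollars.
AAV_PER_YEAR = {"MLB": 1_500_000, "NL East": 900_000, "AL East": 2_400_000}


def projected_aav(base_aav: int, years: int, market: str = "MLB") -> int:
    """AAV for a contract of `years` seasons, given a one-year baseline AAV."""
    if years < 1:
        raise ValueError("a contract must cover at least one year")
    return base_aav + AAV_PER_YEAR[market] * (years - 1)


base = 10_000_000  # hypothetical one-year AAV
for market in AAV_PER_YEAR:
    aav = projected_aav(base, 5, market)
    print(f"{market}: 5-year AAV ${aav:,}, total ${aav * 5:,}")
```

Because total value is AAV multiplied by years, the per-year premium compounds: under these assumptions a longer deal raises total value by more than the extra seasons alone would suggest.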
That third possible path leads to our second variable of significance: Initial Year. Prior to
analysis, we expected a positive correlation between both measures of contract value and the
Initial Year variable, as we believed the variable’s presence would capture inflation as well
as yearly increases in the amount of money spent by teams in free agency. While significant in
models for Total Contract Value in the AL East and Average Annual Value in the AL West,
Tables 1 and 2 are most important in our analysis of this yearly adjustment. The coefficients
paired with the significant variable in each of these models indicate that across the league, a free
agency occurring one year later will translate to a $2.5M increase in Total Contract Value as well
as a $315K increase in Average Annual Value. As a player, this simply translates to an
understanding that contract values overall increase from year to year, and players should expect
to receive a relatively higher contract offer compared to the free agent signings of the prior
offseason.
Though only significant in Table 1, another important finding is the significance of
almost all divisions in our MLB-wide regression for Total Contract Value. We see all divisions
but that of the NL West displaying statistical significance with a negative coefficient, indicating
that these divisions underpay players relative to those in the NL West. While not empirically
evident, we could assume that this finding is driven by the recent and continued spending habits
of teams like the Los Angeles Dodgers and San Diego Padres who have spent large amounts of
money in free agency in the pursuit of a World Series title (notably, an expansion of data in
future research or an extension to this research could prove even more impactful with the
inclusion of Shohei Ohtani’s signing with the Dodgers for $700M in the 2023-2024 offseason).
However, another possible explanation is that NL West teams are more inclined to offer long-
term contracts that naturally carry a higher total value; in either case, this may adjust an agent’s
approach to first consider NL West teams or be more willing to negotiate with the division’s
teams in order to garner a higher contract value.
Following the extensive data analysis above, however, our most significant finding, one
which remains consistent with the findings of previous research, rests in the indication that
performance remains a significant and prevailing factor in determining contract value. In MLB-wide
regressions for Total Contract Value and Average Annual Value, we see that a single-point
increase in Career WAR leads to a $795K increase in total value as well as a $230K increase in
annual value, while Prior Year WAR is even more impactful with a similar one-point increase
leading to increases of $17.3M overall and $1.2M per year. Similarly, the Prior Year WAR
variable proves significant in all but one divisionally-stratified model, with its one-point increase
ranging from a floor of $11M and $571K in total and annual value, respectively, to a ceiling of
$26M and $1.7M. Career WAR is shown to also be a significant determinant in four models for
Total Contract Value (AL Central, AL West, NL East, and NL West) with a one-point increase
ranging from $766K in extra value to $1.7M in the AL West; for Average Annual Value, these
increases range from $143K to $353K, with significance in all regressions. Additionally, we see
that the only contextualizing models that approach the 0.5 threshold for an adjusted R-squared
value are those in Tables 21 and 22; for example, Table 22 shows that with all else equal, the Prior
Year WAR and Career WAR variables can explain about 47% of the variation in Average Annual
Value, solidifying our assumption that these performance measures are impactful factors in
contract negotiations. As such, we can determine that not only is performance continually
influential in contract negotiations, but there exists recency bias in awarding larger contracts to
players performing well in the season prior to their free agency. Rationalizing this finding, we
propose that teams may be negotiating under many situations in which this Prior Year WAR is a
more important indicator of future productivity than Career WAR. While the baseball
community theorizes that players already strive for greater performance in the year preceding
free agency, these findings serve to solidify this prevailing sentiment that higher quality
performance in the year prior to free agency will create momentum and capture higher value.
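The MLB-wide Average Annual Value effects quoted in this section can be combined to compare two otherwise similar free agents. The sketch below assumes the reported figures ($1.2M per point of Prior Year WAR, $230K per point of Career WAR, $1.5M per contract year, $315K per later signing year, and -$1.8M for position players) act additively within a single model and that all unlisted variables, such as Age, are equal; the two player profiles are hypothetical and the helper name is our own.

```python
# Reported MLB-wide AAV effects, in dollars per one-unit change.
AAV_EFFECTS = {
    "prior_year_war": 1_200_000,
    "career_war": 230_000,
    "years": 1_500_000,
    "initial_year": 315_000,
    "position_player": -1_800_000,  # 1 = position player, 0 = pitcher
}


def aav_gap(player_a: dict, player_b: dict) -> int:
    """Predicted AAV of player_a minus player_b; the intercept cancels out."""
    return sum(
        effect * (player_a[name] - player_b[name])
        for name, effect in AAV_EFFECTS.items()
    )


# Hypothetical profiles, not drawn from the dataset.
hot_season = {"prior_year_war": 5, "career_war": 15, "years": 4,
              "initial_year": 2023, "position_player": 1}
long_career = {"prior_year_war": 2, "career_war": 30, "years": 4,
               "initial_year": 2023, "position_player": 1}
print(f"AAV gap: ${aav_gap(hot_season, long_career):,}")
```

In this example a three-point edge in Prior Year WAR slightly outweighs a fifteen-point deficit in Career WAR, which is the recency pattern described above.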
Through our explanation, we have proposed numerous ways to practically apply the
findings of our research, but while our research provides modern empirical evidence to explain
MLB free agency, it is not meant to be used as a tool to determine exact contract values and
player worth. The purpose of this research and the resulting models is not to define what a player
should make per season or in their total contract but, rather, to provide a foundational tool to
establish relative value when entering free agency and thus define a reasonable expectation
surrounding contract value. Similarly, our research contains limitations that may be expanded
upon in future research to improve the reliability of our models and more comprehensively
define these contract values.
One apparent deficiency of our research is the exclusion of contract extensions where a
player does not enter free agency but agrees to new contract terms beyond the years designated
in their current contract. Recent years have seen extremely large contract extensions coming
from superstar players like Mike Trout and Ronald Acuña which, if included in our dataset, may
produce altered results. Similarly, our dataset fails to include contracts generated through
arbitration, a contract dispute process between teams and players similar to typical arbitration
processes. In Major League Baseball arbitration, players between three and six years of service
time can opt to enter arbitration with their teams, and after both the team and player argue for a
one-year contract value, an independent arbitrator determines the assigned contract value to the
player for a one-year deal in the upcoming season (Tracy, 2023). Once again, the exclusion of
these players who, for example, numbered nineteen in the offseason leading to the 2023 season
limits our findings to only explain contract value for those players who enter the free agency
market (Tracy, 2023).
Numerous other factors also exist that cannot be cohesively measured or reported yet
may significantly affect a player in their negotiations. Attitude is a major factor that cannot be
quantified yet is certainly considered in contract negotiations, as it may extend to a player’s off-
the-field actions, their interactions with fans, their relationship with the team’s manager or
coaching staff, their camaraderie with teammates, and their communications with media. None
of these actions are quantifiably linked to individual performance metrics as captured in WAR
nor are they captured in the qualitative variables in our research. The aforementioned fanbase
relationship may also prove important in contracts, as teams may offer a premium in value or
years to those seen as franchise players or fan favorites. However, we would not view this
approach as a team unjustifiably increasing contract value, as this strategic approach may lead to
greater long-term revenues from ticket sales, jersey sales, and more. As such, we may also posit
that the revenue attributable to a player may significantly impact contract negotiations. While his
on-field performance indeed justifies one of the top contracts by value, a player like Shohei
Ohtani would be seen as even more valuable to a team because of his enormous popularity in
Japan, which could draw more fans to a team, and his standing as a leader in jersey sales. Based
on this argument, we propose that future research could use jersey sales as a proxy for
popularity among fans.
Lastly, we propose that future research could explore differences in the significant factors
in contract negotiations among various periods of baseball history, comparing our designated
six-year stretch to the prior six years of free agency, for example. Such research may reveal
trends emerging in the free agent market over time, such as the emergence of new factors or
changes in the increase or decrease in value associated with significant variables. Doing so
would assist in more effectively identifying the trajectory of the free agent market for
subsequent offseasons.
CONCLUSION
This paper comprehensively tested and identified the factors that impact contract
negotiations in Major League Baseball with an aim to identify factors that significantly influence
teams’ decisions in bargaining and consequently provide a tool for players entering free agency.
Previous literature, beginning with the analysis of Gerald Scully, identified one factor as
consistently impactful in free agency negotiations: performance. Beyond that, however, other
literature suggested that long-term contracts would yield higher annual salaries, and similar
literature denied the presence of racism and wage discrimination in the game. Our research does
not dispute the absence of such discrimination, and we likewise accept the identification of
performance metrics (using WAR) and the years on a contract as significant indicators of
contract value. Our findings show that performance, as measured in WAR, is a consistently
significant indicator of contract value, with an increase in either Prior Year WAR or Career WAR
generally resulting in an increase in both Total Contract Value and Average Annual Value.
However, our research expands upon these findings and modifies the previous research, presenting
further explanation for significant variables as well as ruling out other possible variables as
influences on contract negotiations.
While performance is consistently a significant indicator of contract value, we find that
Prior Year WAR is more impactful in determining both aspects of contract value than Career
WAR, which points to a recency bias in free agent negotiations not measured in previous
research. We may understand this finding through the example of a player late in their career:
though they may have performed consistently well over the course of their career, their recent
performance may have been lacking, resulting in a lower salary as they enter free agency.
Overall, this means that players must perform at a higher level in the year preceding their
free agency negotiations if they want to receive a loftier contract after the conclusion of the
season.
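
As a rough illustration of this comparison, and not a reproduction of the regression models estimated in this study, the following Python sketch computes simple Pearson correlations between Average Annual Value and each WAR measure for the seventeen Yankees signings listed in Appendix A. The correlation is a deliberately simpler stand-in for the regressions, the sample is a small single-team subsample rather than the full dataset, and the names ROWS and pearson are our own assumptions for the example.

```python
from math import sqrt

# (AAV in $ millions, Prior Year WAR, Career WAR), copied as listed in Appendix A.
# Illustrative subsample only: 17 Yankees signings, not the full dataset.
ROWS = [
    (40.0, 10.6, 37.0),   # Judge, 2023
    (36.0, 6.6, 22.8),    # Cole, 2020
    (27.0, 5.4, 16.6),    # Rodon, 2023
    (15.0, 3.0, 25.0),    # LeMahieu, 2021
    (20.0, 2.3, 39.1),    # Rizzo, 2023
    (13.0, 0.3, 11.1),    # Britton, 2019
    (17.0, 1.0, 18.4),    # Happ, 2019
    (16.0, 0.5, 36.8),    # Rizzo, 2022
    (8.0, 2.4, 8.4),      # Ottavino, 2019
    (12.0, 2.9, 16.4),    # LeMahieu, 2019
    (12.5, 4.3, 42.6),    # Gardner, 2020
    (5.75, 0.3, 2.3),     # Kahnle, 2023
    (11.0, 0.1, 32.5),    # Kluber, 2021
    (8.0, 1.8, 61.4),     # Sabathia, 2019
    (7.5, 3.3, 38.3),     # Gardner, 2019
    (2.525, 0.7, 43.3),   # Gardner, 2021
    (2.525, 0.2, 7.3),    # Wilson, 2021
]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


aav = [row[0] for row in ROWS]
prior_war = [row[1] for row in ROWS]
career_war = [row[2] for row in ROWS]

r_prior = pearson(aav, prior_war)
r_career = pearson(aav, career_war)

print(f"AAV vs Prior Year WAR: r = {r_prior:.2f}")
print(f"AAV vs Career WAR:     r = {r_career:.2f}")
```

On this subsample, AAV tracks Prior Year WAR much more closely than Career WAR, which is in line with the recency pattern described above. A bivariate correlation does not control for contract years, Initial Year, or the other variables in our models, so it should be read only as an intuition aid.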
Additionally, we extend the previous literature by including the variable Initial Year,
which controls for inflation and the yearly increase in contract values in free agency. Beyond
the typical indicators of value like performance, we may also assume that contract values will
naturally rise from year to year as teams earn a higher income and inflation renders previous
contracts less valuable.
What is most important in our research is the practical application of these findings. The
purpose of using data from 2017 to 2023 was to analyze the modern landscape of contract
negotiations and free agency in baseball rather than take a wholesale historical view. As
discussed previously, further research could focus on this historical view and how trends have
emerged in free agency negotiations over time, but by analyzing the modern game, we believe our
analysis can serve as an effective tool for players and agents to gauge relative value before
entering free agency. Understanding impactful factors, such as how recent performance
influences the value a player can expect to receive in a negotiated contract, may help players
better understand where they stand in the free agent market, opening three avenues for
negotiation. First, players may use this approach to recognize the value they deserve and avoid
passing on contracts that provide fair value or even an overpayment. Second, players may use
this tool to reject contract offers that represent an underpayment relative to recent free agent
contracts. Third, they may use it to understand their fair value yet leverage outside factors to
boost their contract value. As salary as a percentage of revenues dwindles in today’s league,
all of these approaches and the data housed within our findings return power to the players in
negotiations in the hopes of creating a more equitable labor market in baseball’s future.
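
The three avenues above can be sketched as a simple comparison between an offer and a model-based estimate of fair value. The Python function below is a hypothetical illustration only: classify_offer, its parameters, the 10% tolerance band, and the dollar figures in the usage lines are assumptions made for this example, and predicted_aav stands in for whatever estimate a player or agent derives from a model like ours.

```python
def classify_offer(offer_aav, predicted_aav, tolerance=0.10):
    """Label an offer relative to a model-predicted Average Annual Value.

    tolerance is a hypothetical band (10% by default) around the predicted
    value within which an offer is treated as fair value.
    """
    if predicted_aav <= 0:
        raise ValueError("predicted_aav must be positive")
    ratio = offer_aav / predicted_aav
    if ratio < 1 - tolerance:
        return "underpayment"   # candidate for rejection or a counteroffer
    if ratio > 1 + tolerance:
        return "overpayment"    # an offer the player should not pass on
    return "fair value"         # accept, or leverage outside factors for more


# Hypothetical figures, not drawn from our dataset.
low_offer = classify_offer(9_000_000, 12_000_000)
fair_offer = classify_offer(12_500_000, 12_000_000)
high_offer = classify_offer(15_000_000, 12_000_000)
print(low_offer, fair_offer, high_offer, sep=" | ")
```

The width of the tolerance band is a judgment call; a player or agent could set it from the prediction error of whichever model produces the estimate.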
APPENDIX
A: YANKEES FREE AGENT SIGNINGS 2017-2023

| Player | Contract Value | Position | Age | Years | Initial Year | AAV | Player Agent | Country of Origin | Ethnicity | Prior Year WAR | Career WAR (at time of signing) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Total | $1,229,800,000 | | | | | | | | | | |
| Judge, Aaron | $400,000,000 | OF | 31 | 9 | 2023 | $40,000,000 | PSI Spts | United States | African-American | 10.6 | 37.0 |
| Cole, Gerrit | $324,000,000 | SP | 29 | 9 | 2020 | $36,000,000 | Boras Corp. | United States | White | 6.6 | 22.8 |
| Rodon, Carlos | $162,000,000 | SP | 30 | 6 | 2023 | $27,000,000 | Boras Corp. | United States | Latino | 5.4 | 16.6 |
| LeMahieu, D.J. | $90,000,000 | 2B | 32 | 6 | 2021 | $15,000,000 | Excel | United States | White | 3.0 | 25.0 |
| Rizzo, Anthony | $40,000,000 | 1B | 33 | 2 | 2023 | $20,000,000 | Sports One | United States | White | 2.3 | 39.1 |
| Britton, Zack | $39,000,000 | RP | 31 | 3 | 2019 | $13,000,000 | Boras Corp. | United States | White | 0.3 | 11.1 |
| Happ, J.A. | $34,000,000 | SP | 36 | 2 | 2019 | $17,000,000 | Rogers Sports | United States | White | 1.0 | 18.4 |
| Rizzo, Anthony | $32,000,000 | 1B | 32 | 2 | 2022 | $16,000,000 | Sports One | United States | White | 0.5 | 36.8 |
| Ottavino, Adam | $24,000,000 | RP | 33 | 3 | 2019 | $8,000,000 | All Bases Covered | United States | White | 2.4 | 8.4 |
| LeMahieu, DJ | $24,000,000 | 2B | 30 | 2 | 2019 | $12,000,000 | Excel | United States | White | 2.9 | 16.4 |
| Gardner, Brett | $12,500,000 | OF | 36 | 1 | 2020 | $12,500,000 | Pro Star | United States | White | 4.3 | 42.6 |
| Kahnle, Tommy | $11,500,000 | RP | 33 | 2 | 2023 | $5,750,000 | Ballengee | United States | White | 0.3 | 2.3 |
| Kluber, Corey | $11,000,000 | SP | 35 | 1 | 2021 | $11,000,000 | Jet Sports | United States | White | 0.1 | 32.5 |
| Sabathia, CC | $8,000,000 | SP | 38 | 1 | 2019 | $8,000,000 | CAA / Roc Nation | United States | African-American | 1.8 | 61.4 |
| Gardner, Brett | $7,500,000 | OF | 35 | 1 | 2019 | $7,500,000 | Pro Star | United States | White | 3.3 | 38.3 |
| Gardner, Brett | $5,150,000 | OF | 37 | 1 | 2021 | $2,525,000 | Pro Star | United States | White | 0.7 | 43.3 |
| Wilson, Justin | $5,150,000 | RP | 33 | 1 | 2021 | $2,525,000 | ACES | United States | White | 0.2 | 7.3 |
B:
C:
MLB FREE AGENT SIGNINGS 2017-2023

| | Number of Observations | Mean | Median | Max | Min | Standard Deviation |
|---|---|---|---|---|---|---|
| Contract Value (Total) | 425 | $33,481,157 | $15,000,000 | $400,000,000 | $5,000,000 | $52,909,801 |

MLB FREE AGENT SIGNINGS 2017-2023

| | Number of Observations | Mean | Median | Max | Min | Standard Deviation |
|---|---|---|---|---|---|---|
| Contract Value (AAV) | 425 | $11,233,875 | $8,666,667 | $44,444,444 | $2,500,000 | $7,164,179 |
Bibliography
Baseball Prospectus. (2024). Cot’s Baseball Contracts.
https://legacy.baseballprospectus.com/compensation/cots/
Baseball Reference. (2024). https://www.baseball-reference.com/
Baseball-Reference.com WAR Explained. Baseball Reference. (n.d.).
https://www.baseball-reference.com/about/war_explained.shtml
Blum, R. (2024, February 29). MLB average salary rose 7% to record $4.5m last year but
several teams cutting payroll in 2024. AP News. https://apnews.com/article/mlb-average-
salary-d7df2745aee8dd3dd59bf7a7c747841b
Breaking the color line: 1940 to 1946. Library of Congress. (n.d.).
https://www.loc.gov/collections/jackie-robinson-baseball/articles-and-essays/baseball-the-
color-line-and-jackie-robinson/1940-to-1946/
Clark, I. (2019, May 3). Marginal revenue product of labour. Atlas of Public Management.
www.atlas101.ca/pm/concepts/marginal-revenue-product-of-labour/
Gutwein, C. (2021, October 4). Checking in on the aging curve. FanGraphs.
https://blogs.fangraphs.com/checking-in-on-the-aging-curve/
Gwartney, J., & Haworth, C. (1974b). Employer costs and discrimination: The case of baseball.
Journal of Political Economy, 82(4), 873–881. https://doi.org/10.1086/260241
Kahn, L. M. (1991). Discrimination in professional sports: a survey of the literature. Industrial
and Labor Relations Review, 44(3), 395–418. https://doi.org/10.2307/2524152
Kahn, L. M. (2000). The sports business as a labor market laboratory. Journal of Economic
Perspectives, 14(3), 75–94. https://doi.org/10.1257/jep.14.3.75
Knight, B. (2022, November 17). The most valuable sports agencies 2022: The rich get richer
amid a wave of consolidation. Forbes.
https://www.forbes.com/sites/brettknight/2022/11/17/the-most-valuable-sports-agencies-
2022-the-rich-get-richer-amid-a-wave-of-consolidation/?sh=1d933f55e560
Lapchick, R. (2023, June 15). MLB must continue improving racial, gender hiring practices.
ESPN. https://www.espn.com/mlb/story/_/id/37841932/mlb-continue-improving-racial-
gender-hiring-practices
Marcano, A. J., & Fidler, D. P. (1999). The globalization of baseball: Major League Baseball and
the mistreatment of Latin American baseball talent. Indiana Journal of Global Legal
Studies, 6(2), 511–577.
NBC Sports Staff. (2023, November 29). MLB free agency 2023-24: Start Date, largest
contracts, history, team payrolls and more. NBC Sports.
https://www.nbcsports.com/mlb/news/mlb-free-agency-2023-24-start-date-largest-
contracts-history-team-payrolls-and-more
Ozanian, M., & Teitelbaum, J. (2023, March 23). Baseball’s most valuable teams 2023: Price
tags are up 12% despite regional TV woes. Forbes.
https://www.forbes.com/sites/mikeozanian/2023/03/23/baseballs-most-valuable-teams-
2023-price-tags-are-up-12-despite-regional-tv-woes/?sh=7947249e6501
Palmer, M. C., & King, R. H. (2006). Has salary discrimination really disappeared from Major
League Baseball? Eastern Economic Journal, 32(2), 285–297.
http://www.jstor.org/stable/40326272
Parsons, C. A., Sulaeman, J., Yates, M. C., & Hamermesh, D. S. (2011). Strike Three:
Discrimination, incentives, and evaluation. American Economic Review, 101(4), 1410–
1435. https://doi.org/10.1257/aer.101.4.1410
Scully, G. W. (1974). Pay and Performance in Major League Baseball. The American Economic
Review, 64(6), 915–930. https://www.jstor.org/stable/1815242
Tracy, J. (2023, February 22). The uncomfortable reality of MLB arbitration. Axios.
https://www.axios.com/2023/02/22/mlb-baseball-arbitration-uncomfortable-reality
Walters, S. J. K., von Allmen, P., & Krautmann, A. (2017). Risk aversion and wages: Evidence
from the baseball labor market. Atlantic Economic Journal, 45(3), 385–397.
https://doi.org/10.1007/s11293-017-9545-7