Auditing data types and column sizes in SQL Server
Adding a few ounces of sanity to your database schema today can save you from a metric ton of trouble and confusion later.

In enterprise software, it's easy for a small decision to go unnoticed until it creates a big problem. When a long period of time elapses between a small (but poor) decision and the large, systemic issues it eventually causes, troubleshooting can be surprisingly difficult and time-consuming. This is especially true if the staff on your team has changed during that time.
Database performance optimization isn't trivial, but there are simple choices that can help to decrease the potential for future problems. One such choice is the data type and maximum character length defined for the columns in a database table that store text. Ultimately, this is a decision about provisioning, and if you are about to embark upon a long and arduous journey, then provisions matter, right?
Shift iQ is a large enterprise system, and the SQL Server database that drives the platform contains over 4,000 columns in more than 200 tables. Of these, over half (2,168 columns, to be exact) store text data. As you can imagine, the ever-growing volume of data introduces constant pressure on performance, maintainability, and cost.
Therefore, periodically throughout the year, I audit the database column types and sizes, and then identify specific, actionable opportunities to:
- improve query performance,
- decrease disk-space usage,
- reduce network load,
- ensure clean and consistent naming conventions, and
- confirm the alignment between the deployed implementation and the supporting technical documentation.
A small database doesn't need this level of care and attention, but if you haven't audited a large database in a while, you can expect significant improvements to its overall performance. In the rest of this article, I'll outline some of the reasons why.
Overuse of varchar(max) and nvarchar(max)
The data types varchar(max) and nvarchar(max) are designed for unlimited-length text. These types are convenient, but they come with limitations and performance penalties that are easy to underestimate:
1. Degraded query performance
When a varchar(max) value exceeds 8,000 bytes (or an nvarchar(max) value exceeds 4,000 characters), SQL Server moves the data into LOB (Large Object) storage. This introduces extra I/O operations and pushes the data off the main data row, leading to slower query execution.
2. Constrained indexing
A varchar(max) column cannot be used as an index key column, and this limits the optimizer's ability to generate efficient query plans. This becomes a bottleneck as your application grows.
3. Inefficient storage
Even when the full capacity isn't used, these types carry overhead: SQL Server must maintain pointers to out-of-row LOB data and cannot optimize storage for them as effectively as it can for bounded columns. When you don't need unlimited text, this leads to wasted storage and increased costs, especially when dealing with backups and restores.
4. Added pressure on memory
Queries involving large text fields require more memory buffers, which can reduce throughput and increase latency under load.
5. Weakened data integrity
When the size of a column is unbounded, it becomes more difficult to validate and enforce clean data. Fields that should have length constraints — like email addresses or URLs — can then become risk vectors for inconsistent input.
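A practical first step is simply finding every unbounded text column. The sketch below uses SQL Server's standard catalog views (sys.columns, sys.tables, sys.schemas, sys.types); it assumes nothing beyond that metadata.

```sql
-- List every varchar(max) / nvarchar(max) column in the current database
-- so each one can be reviewed for whether unlimited length is truly required.
SELECT
    s.name  AS schema_name,
    t.name  AS table_name,
    c.name  AS column_name,
    ty.name AS data_type
FROM sys.columns AS c
JOIN sys.tables  AS t  ON t.object_id  = c.object_id
JOIN sys.schemas AS s  ON s.schema_id  = t.schema_id
JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id
WHERE ty.name IN ('varchar', 'nvarchar')
  AND c.max_length = -1   -- sys.columns reports (max) columns as -1
ORDER BY s.name, t.name, c.name;
```

Each row returned is a candidate for the review process described later in this article.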
Overprovisioned character limits
Even when you avoid varchar(max) and nvarchar(max), it's common to define character limits that are much larger than necessary — often out of caution or habit. While this avoids the most extreme issues of unlimited-length fields, it still creates meaningful technical and business problems.
1. Poor query plan estimates
SQL Server's query optimizer makes assumptions about the average size of the data in variable-length columns; for memory grants, it typically budgets for roughly half the declared maximum length. For example, if a column is defined as varchar(4000) but most of the values it stores are only 100 characters long, then the optimizer assumes far more data than is actually stored. This can lead to inefficient query plans, excessive memory allocation, and suboptimal performance.
2. Inflated index and memory usage
Indexing a column with an overly generous limit means SQL Server allocates more space per index entry than needed. This bloats the size of the index, increases I/O, and adds unnecessary pressure to memory and CPU.
3. Reduced data quality
Generous field limits remove an important layer of data validation. For example, allowing 4,000 characters in an email address or postal code invites inconsistent, malformed, or unexpected data — leading to downstream complications in reporting, integrations, and user interface.
4. Hidden maintenance costs
Wide columns increase the size of data pages and backups, even if they rarely hold large values. This translates into longer maintenance windows for index rebuilds, higher backup storage needs, and longer restore times — all of which become more noticeable at scale.
5. Missed optimization opportunities
When realistic limits are not applied, SQL Server can't take advantage of features like row compression or optimized buffer usage as effectively. Every overly wide column represents a missed chance to optimize.
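Seeing the gap between a declared limit and the data it actually holds makes the problem concrete. The sketch below measures one column; the table and column names (dbo.Contact.Notes) are placeholders for your own schema.

```sql
-- Compare a column's declared limit with the data it actually holds.
-- Here, suppose Notes was declared as nvarchar(4000).
SELECT
    MAX(LEN(Notes)) AS longest_value,    -- the real-world maximum
    AVG(LEN(Notes)) AS average_length    -- what queries typically move
FROM dbo.Contact;
```

If the longest value is a few hundred characters, the declared limit of 4,000 is pure overprovisioning, and every query plan and index entry pays for it.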
Business Impact
The technical challenges I outlined above translate directly into business risks:
- Scalability limits: As the number of concurrent users interacting with the system increases, performance issues that relate to unbounded and overprovisioned text fields become more pronounced.
- Operational costs: Increased backup size, longer maintenance windows, and heavier I/O all translate to higher infrastructure and staffing costs.
- Poor user experience: Slow queries result in slower applications, which impacts customer satisfaction and user retention.
- Regulatory and data quality issues: Without enforced constraints, it becomes harder to maintain compliance or perform reliable data analytics.
Solution Strategy
We take a proactive, data-driven approach to this optimization.
Step 1: Audit and analyze
Inspect every text column to determine:
- The actual size of the largest value currently stored in the column.
- The context and purpose of the data stored in the column (e.g., user input, or external integration).
- Whether or not a varchar(max) type is a mandatory business and/or technical requirement.
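The audit in Step 1 can be driven from standard metadata. This sketch lists every text column alongside its declared limit, using only the INFORMATION_SCHEMA.COLUMNS view:

```sql
-- Step 1 sketch: inventory all text columns with their declared limits.
-- CHARACTER_MAXIMUM_LENGTH is -1 for (max) columns.
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    COLUMN_NAME,
    DATA_TYPE,
    CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE IN ('char', 'nchar', 'varchar', 'nvarchar')
ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME;
```

From this inventory, a MAX(LEN(...)) query per column reveals the actual size of the largest value currently stored.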
Step 2: Calculate optimal limits
For every varchar and nvarchar column:
- Given the actual number of characters in the largest value, add a 20% buffer.
- Round up to the nearest 100 characters.
- If the result is less than 8,000 characters (varchar) or 4,000 characters (nvarchar), then adjust the column's declared maximum character length accordingly.
For example, if the largest text value is 1,200 characters → Add 20% → 1,440 → Round up to the nearest 100 → Final column size = varchar(1500)
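The sizing rule above can be expressed directly in T-SQL. The table and column names (dbo.Contact.Notes) are placeholders:

```sql
-- Add a 20% buffer to the longest stored value,
-- then round up to the nearest 100 characters.
SELECT CEILING(MAX(LEN(Notes)) * 1.2 / 100.0) * 100 AS recommended_limit
FROM dbo.Contact;
-- A longest value of 1,200 characters yields 1,440, which rounds up to 1,500.
```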
Step 3: Apply industry standards
- Ensure columns that store email addresses are set to varchar(254), the effective maximum length of an email address under the SMTP standards (RFC 5321, as clarified by the RFC 3696 errata).
- Ensure columns that store URLs are set to varchar(2000), a widely used practical limit that accommodates what mainstream browsers and servers reliably support.
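Applying these standards is a straightforward schema change. The sketch below uses placeholder table and column names; note that ALTER COLUMN may require dropping and re-creating any index or constraint that references the column first.

```sql
-- Apply industry-standard limits; dbo.Contact and its columns are placeholders.
ALTER TABLE dbo.Contact ALTER COLUMN EmailAddress varchar(254)  NOT NULL;
ALTER TABLE dbo.Contact ALTER COLUMN WebsiteUrl   varchar(2000) NULL;
```

Shrinking a column this way fails if any existing value exceeds the new limit, so the Step 1 audit of actual maximum lengths must come first.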
Step 4: Upgrade and align the platform
After database schema changes are made, update Object-Relational Mapping (ORM) configuration classes in the source code for the persistence layer, and then synchronize business rules in the user interface layer with Maximum Length input validation attributes.
Provisioning Best Practices
Defining character limits should be intentional and based on actual usage, not guesswork. Here are some recommended guidelines to align your database schema design with real-world needs:
Use varchar(max) and nvarchar(max) only when truly necessary
Reserve these types for fields where values are expected to exceed 8,000 characters (varchar) or 4,000 characters (nvarchar). Most application data does not meet this requirement.
Define realistic, evidence-based character limits
Choose a reasonable size limit for each varchar and nvarchar column. If a column value is never expected to exceed 100 characters, for example, then specify a limit between 100 and 200 characters — and don't allow SQL Server to believe it can expect values with 400 characters!
Avoid multiple varchar(max) columns in the same table
If you really need a varchar(max) column or an nvarchar(max) column, then ensure only one such column exists per table. This prevents excessive weight in out-of-row LOB storage and maintains better row-level performance.
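Tables that violate this guideline are easy to flag from metadata. This sketch groups the earlier catalog-view query by table and keeps only tables with more than one (max) text column:

```sql
-- Flag tables that contain more than one (max) text column.
SELECT
    s.name   AS schema_name,
    t.name   AS table_name,
    COUNT(*) AS max_column_count
FROM sys.columns AS c
JOIN sys.tables  AS t  ON t.object_id  = c.object_id
JOIN sys.schemas AS s  ON s.schema_id  = t.schema_id
JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id
WHERE ty.name IN ('varchar', 'nvarchar')
  AND c.max_length = -1
GROUP BY s.name, t.name
HAVING COUNT(*) > 1
ORDER BY max_column_count DESC;
```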
Prefer nvarchar only when Unicode is required
Use nvarchar only if you need to support multiple languages or character sets. For English-only or single-language applications, varchar is more efficient.
Support validation at every layer
Enforce limits in the database schema, reflect them in ORM configuration, and surface them in the UI with appropriate validation messages.
Let data drive the limits
This is it in a nutshell. Know (and understand!) your data, and regularly review the database schema to ensure it enforces tight, data-informed constraints. You'll improve performance, strengthen data integrity, improve resource utilization, and simplify maintenance.
Closing Thoughts
Sometimes it's easy to believe provisioning database columns properly is a low-value, low-impact concern, but the shape of your data matters. Especially when it's big data.
When your platform grows, the consequences of poor and uninformed decisions echo across time and space — quite literally. Making deliberate and informed choices about database column types and size constraints is about much more than technology improvement: it’s a strategic business investment to help ensure a leaner, faster, and more robust platform.
When was the last time you did an audit of the text columns in your SQL Server database? You might be surprised by the positive impact of a small, well-informed change. And a few hours of analysis today could save you days of troubleshooting tomorrow!
#DatabaseDesign #DataIntegrity #Scalability #SoftwareArchitecture #PerformanceOptimization
This article was originally posted on LinkedIn. https://www.linkedin.com/pulse/why-we-routinely-audit-data-types-sizes-sql-server-you-daniel-miller-2em9c