Skip to content

Fix compare of data fields for NULL values#77

Closed
dev-jan wants to merge 2 commits intoDBDiff:masterfrom
dev-jan:fix-null-data-compare
Closed

Fix compare of data fields for NULL values#77
dev-jan wants to merge 2 commits intoDBDiff:masterfrom
dev-jan:fix-null-data-compare

Conversation

@dev-jan
Copy link
Copy Markdown

@dev-jan dev-jan commented Apr 26, 2018

The MySQL md5() function will return NULL if one of the given columns has
a NULL values in it. So if one field in the row is NULL, the whole MD5
will be NULL and the compare not actually work.

If the values is wrapped into a "IFNULL", all NULL values will be replaced
with '' (an empty string) for the compare of the values. This can maybe be
bad also, as the values '' and NULL are threaded the same. For this reason
all NULL values are joined together to check if there are still the same
as before.

Fixes #53

dev-jan added 2 commits April 26, 2018 22:33
The MySQL md5() function will return NULL if one of the given columns has
a NULL values in it. So if one field in the row is NULL, the whole MD5
will be NULL and the compare not actually work.

If the values is wrapped into a "IFNULL", all NULL values will be replaced
with '' (an empty string) for the compare of the values. This can maybe be
bad also, as the values '' and NULL are threaded the same. For this reason
all NULL values are joined together to check if there are still the same
as before.

Fixes DBDiff#53
Because the dependencies require php 7 installed.
Copy link
Copy Markdown

@twouters twouters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Feb 8, 2026

@bienvenuushindi
Copy link
Copy Markdown

bienvenuushindi commented Feb 26, 2026

Hi @jasdeepkhalsa

We use DBDiff in our project Tikiwiki CMS and I recently ran into the same issue in our fork and implemented a fix. When I was about to open a new PR upstream, I noticed that this one already exists.
Since this PR has been open for quite some time and currently has merge conflicts, I was wondering whether it is still being considered.

Would you prefer that this PR be updated, or should we open a new one

Thank you

jasdeepkhalsa added a commit that referenced this pull request Mar 25, 2026
MySQL CONCAT() returns NULL if any argument is NULL, causing SHA2(CONCAT(...))
to produce NULL for any row with nullable columns. Two genuinely different
rows both hash to NULL and compare as equal — silently dropping real
data differences from the diff output.

Fix: wrap each column in IFNULL(col, '\0') inside the CAST before
concatenation, and add a NULL-presence bitmap (IF(col IS NULL, '1', '0'))
as a secondary comparison column. The WHERE clause now checks both
hash1 <> hash2 OR nullmap1 <> nullmap2, so:
- NULL vs empty string are distinguished (different bitmaps)
- NULL vs non-NULL values are caught (different hashes + bitmaps)
- Regular value changes continue to work via hash comparison

Refs: bugs.md Bug #7, PR #77, PR #63
jasdeepkhalsa added a commit that referenced this pull request Mar 26, 2026
MySQL CONCAT() returns NULL if any argument is NULL, causing SHA2(CONCAT(...))
to produce NULL for any row with nullable columns. Two genuinely different
rows both hash to NULL and compare as equal — silently dropping real
data differences from the diff output.

Fix: wrap each column in IFNULL(col, '\0') inside the CAST before
concatenation, and add a NULL-presence bitmap (IF(col IS NULL, '1', '0'))
as a secondary comparison column. The WHERE clause now checks both
hash1 <> hash2 OR nullmap1 <> nullmap2, so:
- NULL vs empty string are distinguished (different bitmaps)
- NULL vs non-NULL values are caught (different hashes + bitmaps)
- Regular value changes continue to work via hash comparison

Related contributor PRs: #77, #63
jasdeepkhalsa added a commit that referenced this pull request Mar 26, 2026
## Fix 10 bugs — data integrity, SQL generation, and robustness

This PR fixes 10 bugs across the SQL generator, data diff pipeline, and
core runtime. Each fix has independent unit tests and, where applicable,
dedicated end-to-end scenarios.

---

### Changes

#### Bug 1 — Empty constraint name generates invalid SQL
`AlterTableDropConstraintSQL::getUp()` now throws a dedicated
`InvalidConstraintException` (extending `BaseException`) instead of
emitting `DROP CONSTRAINT \`\``.
New custom exception class:
`src/Exceptions/InvalidConstraintException.php`.

#### Bug 2 & 9 — `UpdateDataSQL` crashes on `DiffOpAdd` / emits empty
string for NULL
`getDown()` crashed with a fatal error when the diff contained
`DiffOpAdd` objects (which have no `getOldValue()`). Also, both
directions wrote empty string instead of SQL `NULL` for null column
values.
Closes contributor PR #93.

#### Bug 3 — Hyphenated database names cause MySQL syntax error 1064
`LocalTableData` data-diff queries now use backtick-quoted ``
`db`.`table` `` references. MySQL was interpreting hyphens as
subtraction operators.
Closes contributor PR #92.

#### Bug 4 — `TableIterator::next()` passes Illuminate `Collection` to
`array_merge()`
`->get()` returns a `Collection`; it is now converted with `->toArray()`
and `stdClass` rows are normalised to associative arrays before being
passed to `ArrayDiff`.

#### Bug 5 — Null engine value generates `ALTER TABLE … ENGINE = ;`
`AlterTableEngineSQL::getUp()` and `getDown()` return empty string when
either engine value is empty/null. `TableSchema::getDiff()` also skips
engine diff creation in this case.

#### Bug 6 — Views appear in table diff and generate `DROP TABLE`
instead of `DROP VIEW`
`MySQLAdapter::getTables()` now uses `SHOW FULL TABLES WHERE Table_type
= 'BASE TABLE'`. PostgreSQL and SQLite adapters were already correct.
Closes contributor PR #123.

#### Bug 7 — Rows with NULL columns are silently dropped from data diff
MySQL `CONCAT()` returns `NULL` if any argument is `NULL`, collapsing
all such rows to an identical hash. Fix wraps each column in
`IFNULL(col, '\0')` and adds a NULL-presence bitmap as a secondary
comparison field.
Closes contributor PRs #77 and #63.

#### Bug 8 — INSERT statements use positional `VALUES(...)` instead of
named columns
`InsertDataSQL::getUp()` and `DeleteDataSQL::getDown()` now emit
explicit column lists: `` INSERT INTO `t` (`col1`,`col2`) VALUES(...)
``. This prevents silent data corruption when column order differs
between source and target.

#### Bug 10 — Library hardcodes `memory_limit = 512M` via `ini_set`
Both `ini_set('memory_limit', '512M')` calls removed from
`DBDiff::run()` and `DBDiff::getDiffResult()`. The CLI entry points now
set a sensible 1G default instead (see below).

---

### Tests

| Area | What was added |
|------|---------------|
| Unit | `AlterTableDropConstraintSQLTest`, `UpdateDataSQLTest`,
`MySQLDialectQuoteTest`, `ArrayDiffTest`, `AlterTableEngineSQLTest`,
`InsertDataSQLTest`, `DropTableSQLTest`, `AddTableSQLTest`,
`MemoryLimitTest` |
| E2E | `End2EndTest::testHyphenatedDatabaseNames` (Bug 3) |
| Comprehensive | `AbstractComprehensiveTest::testViewsExcludedFromDiff`
(Bug 6), `::testNullableColumnDataDetected` (Bug 7) |
| Baselines | All PostgreSQL 14–18 and SQLite comprehensive/e2e
baselines updated for Bug 8's new column-list INSERT format |

---

### Configurable memory limit

The CLI entry points (`dbdiff`, `dbdiff.php`, PHAR) now set a default
PHP memory limit of **1G** on startup. PHP's built-in default of 128M is
too low for real-world database sizes. The limit is fully configurable
at three levels (highest wins):

1. `--memory-limit=<value>` CLI flag (e.g. `--memory-limit=2G`)
2. `memory_limit: <value>` top-level key in `.dbdiff` / `dbdiff.yml`
3. `1G` hard default in the entry point scripts

Any PHP shorthand is accepted (`512M`, `1G`, `2G`, `-1` for unlimited).
The `ini_set` lives only in the CLI entry points — library consumers
embedding DBDiff via Composer are unaffected.

---

### SonarQube
- Trailing whitespace removed from `LocalTableData.php` L413–414
- `\RuntimeException` replaced with dedicated
`InvalidConstraintException` in `AlterTableDropConstraintSQL`
- `$memory_limit` renamed to `$memoryLimit` in `DefaultParams` to match
camelCase convention
@github-actions github-actions bot added bug fixed Fixed but not yet released labels Apr 2, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

Thank you for raising this PR. This was independently fixed using the same IFNULL strategy plus an additional NULL-bitmap for more precise detection. See PR #157 (Bug 7). Closing as independently fixed.

@github-actions github-actions bot closed this Apr 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug fixed Fixed but not yet released

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Edited row in table does not compre in diff

4 participants