Fix compare of data fields for NULL values #77
Closed
dev-jan wants to merge 2 commits into DBDiff:master
The MySQL md5() function returns NULL if one of the given columns has a NULL value in it. So if one field in the row is NULL, the whole MD5 is NULL and the comparison does not actually work. If each value is wrapped in an "IFNULL", all NULL values are replaced with '' (an empty string) for the comparison of the values. This could also be a problem, as the values '' and NULL are then treated the same. For this reason, the NULL state of all values is also joined together to check whether it is still the same as before. Fixes DBDiff#53
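The behaviour described above can be reproduced in a small, self-contained sketch. It is not the PR's code: it uses Python's built-in SQLite driver as a stand-in for MySQL, on the assumption that SQLite's `||` operator propagates NULL the same way the concatenation inside md5() does here. The table `t`, its columns `a` and `b`, and the helper names are hypothetical, and `md5` is registered as a user-defined function because SQLite has none built in.

```python
import hashlib
import sqlite3


def md5_or_null(text):
    # Mirror the NULL-in, NULL-out behaviour described in the PR text.
    if text is None:
        return None
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_demo():
    """Create an in-memory table with rows that contain NULL values."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("md5", 1, md5_or_null)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(1, "x", None), (2, "y", None), (3, "", "z"), (4, None, "z")],
    )
    return conn


def row_hashes(conn):
    """Return {id: (naive_hash, guarded_hash, null_state)} for every row."""
    sql = """
        SELECT id,
               md5(a || b),                          -- NULL if a or b is NULL
               md5(IFNULL(a, '') || IFNULL(b, '')),  -- NULL replaced by ''
               (a IS NULL) || (b IS NULL)            -- joined NULL state, e.g. '01'
        FROM t
    """
    return {row[0]: row[1:] for row in conn.execute(sql)}
```

Rows 1 and 2 differ in `a`, but their naive hashes are both NULL, so a hash comparison cannot tell them apart. With IFNULL the hashes differ again, while rows 3 and 4 ('' versus NULL in `a`) only differ in the joined NULL state.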
Because the dependencies require php 7 installed.
twouters approved these changes on Jun 29, 2018
jasonschulte approved these changes on Nov 6, 2023
We use DBDiff in our project Tikiwiki CMS and I recently ran into the same issue in our fork and implemented a fix. When I was about to open a new PR upstream, I noticed that this one already exists. Would you prefer that this PR be updated, or should we open a new one? Thank you
jasdeepkhalsa added a commit that referenced this pull request on Mar 25, 2026

MySQL CONCAT() returns NULL if any argument is NULL, causing SHA2(CONCAT(...)) to produce NULL for any row with nullable columns. Two genuinely different rows both hash to NULL and compare as equal, silently dropping real data differences from the diff output.

Fix: wrap each column in IFNULL(col, '\0') inside the CAST before concatenation, and add a NULL-presence bitmap (IF(col IS NULL, '1', '0')) as a secondary comparison column. The WHERE clause now checks both hash1 <> hash2 OR nullmap1 <> nullmap2, so:
- NULL vs empty string are distinguished (different bitmaps)
- NULL vs non-NULL values are caught (different hashes + bitmaps)
- Regular value changes continue to work via hash comparison

Refs: bugs.md Bug #7, PR #77, PR #63
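The hash-plus-bitmap comparison in this commit message can be modelled outside the database. The following is only a sketch of the idea, not the SQL that DBDiff generates: `row_signature` and `rows_differ` are hypothetical names, `None` stands in for SQL NULL, and SHA-256 is an assumed choice of SHA2 variant.

```python
import hashlib


def row_signature(row):
    """Return (hash, nullmap) for a sequence of column values."""
    # Model of IFNULL(col, '\0'): NULL becomes a NUL character before joining.
    joined = "".join("\0" if value is None else str(value) for value in row)
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    # Model of IF(col IS NULL, '1', '0'): one character per column.
    nullmap = "".join("1" if value is None else "0" for value in row)
    return digest, nullmap


def rows_differ(source_row, target_row):
    """Model of the WHERE clause: hash1 <> hash2 OR nullmap1 <> nullmap2."""
    hash1, nullmap1 = row_signature(source_row)
    hash2, nullmap2 = row_signature(target_row)
    return hash1 != hash2 or nullmap1 != nullmap2
```

In this model, a NULL and a literal NUL-character string produce the same hash, and only the bitmap separates them, which is why the second comparison column is needed in addition to IFNULL.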
jasdeepkhalsa added a commit that referenced this pull request on Mar 26, 2026

MySQL CONCAT() returns NULL if any argument is NULL, causing SHA2(CONCAT(...)) to produce NULL for any row with nullable columns. Two genuinely different rows both hash to NULL and compare as equal, silently dropping real data differences from the diff output.

Fix: wrap each column in IFNULL(col, '\0') inside the CAST before concatenation, and add a NULL-presence bitmap (IF(col IS NULL, '1', '0')) as a secondary comparison column. The WHERE clause now checks both hash1 <> hash2 OR nullmap1 <> nullmap2, so:
- NULL vs empty string are distinguished (different bitmaps)
- NULL vs non-NULL values are caught (different hashes + bitmaps)
- Regular value changes continue to work via hash comparison

Related contributor PRs: #77, #63
jasdeepkhalsa added a commit that referenced this pull request on Mar 26, 2026
## Fix 10 bugs: data integrity, SQL generation, and robustness

This PR fixes 10 bugs across the SQL generator, data diff pipeline, and core runtime. Each fix has independent unit tests and, where applicable, dedicated end-to-end scenarios.

---

### Changes

#### Bug 1: Empty constraint name generates invalid SQL

`AlterTableDropConstraintSQL::getUp()` now throws a dedicated `InvalidConstraintException` (extending `BaseException`) instead of emitting `DROP CONSTRAINT \`\``. New custom exception class: `src/Exceptions/InvalidConstraintException.php`.

#### Bug 2 & 9: `UpdateDataSQL` crashes on `DiffOpAdd` / emits empty string for NULL

`getDown()` crashed with a fatal error when the diff contained `DiffOpAdd` objects (which have no `getOldValue()`). Also, both directions wrote empty string instead of SQL `NULL` for null column values. Closes contributor PR #93.

#### Bug 3: Hyphenated database names cause MySQL syntax error 1064

`LocalTableData` data-diff queries now use backtick-quoted `` `db`.`table` `` references. MySQL was interpreting hyphens as subtraction operators. Closes contributor PR #92.

#### Bug 4: `TableIterator::next()` passes Illuminate `Collection` to `array_merge()`

`->get()` returns a `Collection`; it is now converted with `->toArray()` and `stdClass` rows are normalised to associative arrays before being passed to `ArrayDiff`.

#### Bug 5: Null engine value generates `ALTER TABLE … ENGINE = ;`

`AlterTableEngineSQL::getUp()` and `getDown()` return empty string when either engine value is empty/null. `TableSchema::getDiff()` also skips engine diff creation in this case.

#### Bug 6: Views appear in table diff and generate `DROP TABLE` instead of `DROP VIEW`

`MySQLAdapter::getTables()` now uses `SHOW FULL TABLES WHERE Table_type = 'BASE TABLE'`. PostgreSQL and SQLite adapters were already correct. Closes contributor PR #123.

#### Bug 7: Rows with NULL columns are silently dropped from data diff

MySQL `CONCAT()` returns `NULL` if any argument is `NULL`, collapsing all such rows to an identical hash. Fix wraps each column in `IFNULL(col, '\0')` and adds a NULL-presence bitmap as a secondary comparison field. Closes contributor PRs #77 and #63.

#### Bug 8: INSERT statements use positional `VALUES(...)` instead of named columns

`InsertDataSQL::getUp()` and `DeleteDataSQL::getDown()` now emit explicit column lists: `` INSERT INTO `t` (`col1`,`col2`) VALUES(...) ``. This prevents silent data corruption when column order differs between source and target.

#### Bug 10: Library hardcodes `memory_limit = 512M` via `ini_set`

Both `ini_set('memory_limit', '512M')` calls removed from `DBDiff::run()` and `DBDiff::getDiffResult()`. The CLI entry points now set a sensible 1G default instead (see below).

---

### Tests

| Area | What was added |
|------|---------------|
| Unit | `AlterTableDropConstraintSQLTest`, `UpdateDataSQLTest`, `MySQLDialectQuoteTest`, `ArrayDiffTest`, `AlterTableEngineSQLTest`, `InsertDataSQLTest`, `DropTableSQLTest`, `AddTableSQLTest`, `MemoryLimitTest` |
| E2E | `End2EndTest::testHyphenatedDatabaseNames` (Bug 3) |
| Comprehensive | `AbstractComprehensiveTest::testViewsExcludedFromDiff` (Bug 6), `::testNullableColumnDataDetected` (Bug 7) |
| Baselines | All PostgreSQL 14–18 and SQLite comprehensive/e2e baselines updated for Bug 8's new column-list INSERT format |

---

### Configurable memory limit

The CLI entry points (`dbdiff`, `dbdiff.php`, PHAR) now set a default PHP memory limit of **1G** on startup. PHP's built-in default of 128M is too low for real-world database sizes. The limit is fully configurable at three levels (highest wins):

1. `--memory-limit=<value>` CLI flag (e.g. `--memory-limit=2G`)
2. `memory_limit: <value>` top-level key in `.dbdiff` / `dbdiff.yml`
3. `1G` hard default in the entry point scripts

Any PHP shorthand is accepted (`512M`, `1G`, `2G`, `-1` for unlimited). The `ini_set` lives only in the CLI entry points; library consumers embedding DBDiff via Composer are unaffected.

---

### SonarQube

- Trailing whitespace removed from `LocalTableData.php` L413–414
- `\RuntimeException` replaced with dedicated `InvalidConstraintException` in `AlterTableDropConstraintSQL`
- `$memory_limit` renamed to `$memoryLimit` in `DefaultParams` to match camelCase convention
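The three-level precedence for the memory limit described in the PR description above (CLI flag, then config key, then the 1G default) could be resolved roughly as follows. This is an illustrative sketch in Python rather than DBDiff's PHP entry point code; `resolve_memory_limit`, its parameters, and `DEFAULT_MEMORY_LIMIT` are hypothetical names, and the config is assumed to be an already-parsed mapping.

```python
DEFAULT_MEMORY_LIMIT = "1G"  # hard default named in the PR description


def resolve_memory_limit(cli_value=None, config=None):
    """Pick the effective limit: CLI flag first, then config key, then default."""
    if cli_value:
        # Highest priority: value passed as --memory-limit=<value>.
        return cli_value
    if config and config.get("memory_limit"):
        # Second priority: top-level memory_limit key from the config file.
        return config["memory_limit"]
    # Lowest priority: the hard default.
    return DEFAULT_MEMORY_LIMIT
```

For example, `resolve_memory_limit("2G", {"memory_limit": "512M"})` picks the CLI value, and a call with no arguments falls back to the default.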
Thank you for raising this PR. This was independently fixed using the same IFNULL strategy plus an additional NULL-bitmap for more precise detection. See PR #157 (Bug 7). Closing as independently fixed.