Fix CnosDB CI: replace insufficient sleep with readiness poll #1310

Open
mrigger wants to merge 6 commits into main from fix/cnosdb-ci-resource-error

mrigger commented Apr 4, 2026

The CnosDB CI setup only slept for 5 s after starting the container, which was not always long enough and caused "Resource temporarily unavailable" errors. Poll the /api/v1/ping endpoint until it responds instead. Also fix a copy-paste error in the CI step name.
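
As an illustration of the idea of polling instead of sleeping, here is a minimal Java sketch. It is not the code from this PR (the actual change lives in the CI workflow). The class name `ReadinessPoll`, both method names, the attempt and delay parameters, and the "HTTP 200 means ready" rule are all assumptions made for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the PR.
final class ReadinessPoll {

    // Calls the probe until it returns true or the attempts run out.
    // Returns false if the service never became ready or the thread was interrupted.
    static boolean waitUntilReady(BooleanSupplier probe, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (probe.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    // Probe that treats an HTTP 200 from the given URL as "ready" (an assumption).
    static BooleanSupplier httpProbe(String url) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        return () -> {
            try {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                return response.statusCode() == 200;
            } catch (IOException e) {
                return false; // connection refused or reset: not ready yet
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        };
    }
}
```

A caller might write `ReadinessPoll.waitUntilReady(ReadinessPoll.httpProbe("http://localhost:8902/api/v1/ping"), 30, 1000)`, where the host and port are assumed values. Because the probe is a plain `BooleanSupplier`, a stricter check can be swapped in without changing the loop.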

mrigger force-pushed the fix/cnosdb-ci-resource-error branch 3 times, most recently from 6347adf to d63f9c6 on April 5, 2026 06:33
mrigger and others added 3 commits April 5, 2026 14:44
Two issues caused CnosDB test failures:
1. HTTP ping responded before storage layer was ready, causing
   "Resource temporarily unavailable" on first SQL command. Use
   SQL-level readiness check instead.
2. Mismatched Log4j versions (log4j-api 2.10.0 vs log4j-core 2.18.0)
   from Hive transitive dependencies caused NoSuchMethodError. Exclude
   log4j-slf4j-impl from Hive deps, align Log4j2 at 2.24.3, and add
   log4j-slf4j2-impl for SLF4J 2.x compatibility.

Also fix copy-paste error in CI step name and upgrade HSQLDB to 2.7.4.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…and SQL readiness check

CnosDB cannot reliably drop a database when the API request targets that
same database. Use a separate client connected to "public" for DROP/CREATE
DATABASE operations. Also limit to single thread since CnosDB's storage
engine cannot handle concurrent database lifecycle operations, and replace
the CI port check with a SQL readiness poll.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
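
The separate-client idea in the commit above can be sketched as follows. This is an illustration only: `DatabaseLifecycle`, `recreate`, and the `runSql` callback (which takes the database the request targets, then the SQL text) are hypothetical names, and the `IF EXISTS` clause is an assumption about the statement used. The one point taken from the commit message is that lifecycle statements are sent while connected to "public", never to the database being dropped.

```java
import java.util.function.BiConsumer;

// Hypothetical helper, not part of the PR.
final class DatabaseLifecycle {

    // Database that lifecycle requests target, per the commit message.
    static final String ADMIN_DATABASE = "public";

    // Drops and recreates `target`. Both statements are sent through a
    // connection bound to "public", so the request never targets the
    // database that is being dropped.
    static void recreate(BiConsumer<String, String> runSql, String target) {
        runSql.accept(ADMIN_DATABASE, "DROP DATABASE IF EXISTS " + target);
        runSql.accept(ADMIN_DATABASE, "CREATE DATABASE " + target);
    }
}
```

Regular test queries would still go through a client bound to the target database. Only the DROP/CREATE pair is routed through the "public" connection.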
mrigger force-pushed the fix/cnosdb-ci-resource-error branch from d63f9c6 to e3a5407 on April 5, 2026 15:01
mrigger and others added 3 commits April 6, 2026 09:28
CnosDB's Tskv index storage may not be fully ready even after the HTTP
API responds to simple queries. On CI runners, DDL operations like
DROP/CREATE DATABASE consistently fail with "Resource temporarily
unavailable (os error 11)". Additionally, a newly created database may
not be immediately queryable, causing "Database not found" errors.

Add executeWithRetry() that retries these specific transient errors
during database setup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
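
The PR page does not show the body of executeWithRetry(), so the following is only a guess at its general shape. The signature, the `Supplier`-based action, the attempt and delay parameters, and the class name `TransientRetry` are assumptions. The two error fragments are the ones quoted in the commit message above.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch, not the PR's implementation.
final class TransientRetry {

    // Error fragments the commit message describes as transient.
    static final List<String> TRANSIENT_MARKERS =
            List.of("Resource temporarily unavailable", "Database not found");

    static boolean isTransient(Throwable error) {
        String message = error.getMessage();
        return message != null && TRANSIENT_MARKERS.stream().anyMatch(message::contains);
    }

    // Runs the action, retrying only when the failure looks transient.
    // Any other failure is rethrown immediately.
    static <T> T executeWithRetry(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!isTransient(e)) {
                    throw e;
                }
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last; // every attempt failed with a transient error
    }
}
```

Matching on message text is brittle. It is used here only because the commit message identifies the errors by their text.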
DROP DATABASE may appear to succeed before it has propagated through
CnosDB's storage layer, causing the subsequent CREATE DATABASE to
fail with "Database already exists".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Retrying just CREATE DATABASE on "already exists" is futile when the
preceding DROP hasn't propagated in CnosDB's storage layer. Instead,
retry the DROP+CREATE sequence together so the DROP is re-issued before
each CREATE attempt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
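
A minimal sketch of retrying the DROP+CREATE pair as one unit, as the last commit describes. All names (`DropCreateRetry`, `recreate`, the `runSql` callback), the `IF EXISTS` clause, and the retry parameters are assumptions for illustration. The PR's actual code is not shown on this page.

```java
import java.util.function.Consumer;

// Hypothetical sketch, not the PR's implementation.
final class DropCreateRetry {

    static boolean isAlreadyExists(RuntimeException error) {
        String message = error.getMessage();
        return message != null && message.contains("already exists");
    }

    // Re-issues DROP before every CREATE attempt, so a DROP that has not
    // yet taken effect gets another chance. Returns the number of attempts used.
    static int recreate(Consumer<String> runSql, String database,
                        int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                runSql.accept("DROP DATABASE IF EXISTS " + database);
                runSql.accept("CREATE DATABASE " + database);
                return attempt;
            } catch (RuntimeException e) {
                // Give up on unrelated errors or when attempts are exhausted.
                if (!isAlreadyExists(e) || attempt >= maxAttempts) {
                    throw e;
                }
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", ie);
            }
        }
    }
}
```

Retrying only the CREATE would keep hitting the same "already exists" error, which is why the whole pair sits inside the loop.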