add flash_attention on model chatglm_v2 #9296
Conversation
Thanks for your contribution!

huxinye seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff            @@
##           develop    #9296    +/-   ##
===========================================
- Coverage    53.11%   52.92%   -0.19%
===========================================
  Files          665      660       -5
  Lines       109041   106857    -2184
===========================================
- Hits         57918    56555    -1363
+ Misses       51123    50302     -821

☔ View full report in Codecov by Sentry.
        self.hidden_size_per_attention_head,
    ]
)
This reshape logic should not be added here: it breaks the original non-FA2 code path, and the sequence_parallel support further down performs the reshape again anyway. (A hedged sketch of the intended pattern follows this exchange.)
Modified as requested.
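For readers skimming the thread, here is a minimal sketch of the pattern the reviewer is asking for: the reshape that the FA2 kernel needs is done only inside the `use_flash_attention` branch, so the original non-FA2 logic (and the later sequence_parallel reshape) stays untouched. The function and variable names are hypothetical stand-ins for those in the chatglm_v2 attention module, not the PR's actual code.

```python
import paddle.nn.functional as F

def core_attention(query_layer, key_layer, value_layer, config,
                   num_heads, hidden_size_per_attention_head):
    # Hypothetical sketch; inputs are assumed to arrive as
    # [batch, seq_len, num_heads * head_dim].
    if config.use_flash_attention:
        # Reshape to [batch, seq_len, num_heads, head_dim] only on the FA2
        # path (0 keeps the corresponding input dimension in Paddle reshape).
        q = query_layer.reshape([0, 0, num_heads, hidden_size_per_attention_head])
        k = key_layer.reshape([0, 0, num_heads, hidden_size_per_attention_head])
        v = value_layer.reshape([0, 0, num_heads, hidden_size_per_attention_head])
        # Paddle dispatches this to a flash attention kernel when available.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # Merge heads back before the output projection.
        return out.reshape([0, 0, num_heads * hidden_size_per_attention_head])
    # Original non-FA2 attention continues here, unchanged (elided).
    ...
```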
)
version_check = False
if self.config.use_flash_attention and version_check:
    attention_mask = attention_mask
The QKV reshape can go under `if config.use_flash_attention`, and sequence parallel also needs to be taken into account. (See the sketch after this exchange.)
Modified as requested.
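To illustrate the second point, here is a sketch of gating the QKV reshape on `config.use_flash_attention` while keeping `sequence_parallel` in mind. The helper name, the config attribute, and the assumed activation layouts are illustrative assumptions, not PaddleNLP's actual implementation.

```python
import paddle

def split_heads_for_fa2(hidden: paddle.Tensor, config,
                        num_heads: int, head_dim: int) -> paddle.Tensor:
    """Hypothetical helper: reshape a projected q/k/v tensor for the flash
    attention kernel only when the FA2 path is active."""
    if not config.use_flash_attention:
        # The non-FA2 path keeps its original layout untouched.
        return hidden
    if getattr(config, "sequence_parallel", False):
        # Under sequence parallel the activations are assumed to arrive
        # flattened as [seq_chunk * batch, hidden], so batch/seq cannot be
        # kept with 0-placeholders; fold everything except the head dims
        # into one leading axis and restore it after the sequence is gathered.
        return hidden.reshape([-1, num_heads, head_dim])
    # Plain layout: [batch, seq_len, hidden] -> [batch, seq_len, heads, head_dim].
    return hidden.reshape([0, 0, num_heads, head_dim])
```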
lugimzzz left a comment
LGTM
PR types
others
PR changes
models
Description
Add flash_attention support to the chatglm_v2 model.
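As a usage note (not part of the PR diff), the new path would presumably be switched on through the model config. The checkpoint name, the `dtype` choice, and the exact loading call below follow PaddleNLP's usual AutoModel pattern and are assumptions rather than instructions from this PR.

```python
from paddlenlp.transformers import AutoConfig, AutoModelForCausalLM

model_name = "THUDM/chatglm2-6b"  # assumed ChatGLMv2 checkpoint identifier

config = AutoConfig.from_pretrained(model_name)
config.use_flash_attention = True  # take the flash attention path discussed above

# Flash attention kernels generally require half precision on GPU.
model = AutoModelForCausalLM.from_pretrained(model_name, config=config, dtype="float16")
```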