Clarification on Fine-tuning PE-Core Models using OpenCLIP Framework #32
Comments
Sounds great. Thank you for the update.
Hello @rakomar, before the official (HuggingFace) timm and open_clip integration, a quick PE + open_clip integration (draft) is here: https://github.com/berniebear/open_clip (experimental usage only; no TorchScript support for now). It uses the original CLIP class from open_clip, featuring a customized PE vision transformer (with attention pool, absolute + RoPE positional embeddings, etc.). The text transformer is identical to the one used in open_clip. The PE model configs are under src/model_configs/{PE-Core-B16-224, PE-Core-L14-336, PE-Core-G14-448}.json. You can use the standard open_clip workflow like so:

import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms('PE-Core-L14-336', pretrained=True)
tokenizer = open_clip.get_tokenizer('PE-Core-L14-336')

image = preprocess(Image.open("docs/cat.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.autocast("cuda"):
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", probs)  # prints: [[0., 0., 1.]]

Some open_clip functionality may not be supported before the official open_clip integration by HF. Hope this helps! Cheers!
Thank you very much for the clarification.
Hi @berniebear, thanks for sharing the repo (https://github.com/berniebear/open_clip) earlier. It has been very helpful. I encountered an issue with the embed_dim setting while using PE-Core-B16-224 and submitted a PR to address it. Just wanted to let you know in case it's useful.
Merged! Thank you!
Also, PE is now integrated into the latest timm (you need to clone and install the latest version), e.g.:

import timm
model = timm.create_model('vit_pe_core_gigantic_patch14_448', pretrained=True)

Hope this helps!
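As an aside, a minimal feature-extraction sketch with the timm model (num_classes=0 and the data-config helpers are standard timm idioms; "docs/cat.png" is just a placeholder path):

import timm
import torch
from PIL import Image

# num_classes=0 removes any classification head so the model returns pooled features.
model = timm.create_model('vit_pe_core_gigantic_patch14_448', pretrained=True, num_classes=0)
model = model.eval()

# Build preprocessing that matches the model's pretrained data config.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

image = transform(Image.open("docs/cat.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    features = model(image)  # (1, embed_dim) pooled image embedding
print(features.shape)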
Hi @rakomar, I have been using the PE + open_clip integration draft repo that Bernie shared earlier (https://github.com/berniebear/open_clip) to fine-tune PE-base on my own datasets, and have seen some gains. However, when it comes to more flexible operations like freezing some text / image layers, the draft hasn't incorporated functions like visual.freeze(), so it still needs some extra work - but that should be much easier now that PE has been integrated into timm. I look forward to seeing PE fully integrated into open_clip to make full use of the various features that open_clip supports. Is there any ongoing work on this, @berniebear? Thank you, and I appreciate your wonderful work and follow-up.
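For reference, freezing can be approximated with plain PyTorch; a minimal sketch, assuming the draft model exposes the image tower as model.visual and the text blocks as model.transformer.resblocks (as in upstream open_clip):

import open_clip

model, _, preprocess = open_clip.create_model_and_transforms('PE-Core-B16-224', pretrained=True)

# Freeze the entire image tower.
for p in model.visual.parameters():
    p.requires_grad = False

# Or freeze only the first N text transformer blocks (N is an arbitrary example).
N = 6
for block in model.transformer.resblocks[:N]:
    for p in block.parameters():
        p.requires_grad = False

# Give the optimizer only the parameters that remain trainable.
trainable_params = [p for p in model.parameters() if p.requires_grad]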
Issue: Fine-tuning PE-Core Models with OpenCLIP
Thank you for your excellent work on the PE-Core models and for open-sourcing them!
In the documentation you only refer to the OpenCLIP framework for training and evaluation of the PE-Core encoder models. We're attempting contrastive fine-tuning of these models (e.g., PE-Core-L14-336) using OpenCLIP with custom image-caption datasets, but encountered a few challenges:

1. Model Registration in OpenCLIP
PE-Core models (e.g., PE-Core-L14-336) aren't registered directly in OpenCLIP. This results in:
RuntimeError(f'Model config for {model_name} not found.')
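A minimal way to see this, assuming a stock open_clip install without any PE configs registered:

import open_clip

# No PE entries appear in the registry of a stock install...
print([name for name in open_clip.list_models() if 'PE-Core' in name])  # -> []

# ...so this raises: RuntimeError("Model config for PE-Core-L14-336 not found.")
model, _, preprocess = open_clip.create_model_and_transforms('PE-Core-L14-336')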
2. JSON Configuration Challenges
We attempted to register the model using a custom configuration, as suggested in OpenCLIP Discussion #1022. However, even with such a configuration, this approach led to issues (e.g., with the visual.attn_pool.* parameters).
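A sketch of the kind of registration being described, assuming open_clip's add_model_config helper (available in recent versions) and a hypothetical custom JSON config file:

import open_clip

# Register a custom config file or directory; the JSON filename becomes the model name.
# 'custom_configs/PE-Core-L14-336.json' is a hypothetical path to a hand-written config.
open_clip.add_model_config('custom_configs/PE-Core-L14-336.json')

# The model name should now be visible in the registry...
print('PE-Core-L14-336' in open_clip.list_models())

# ...but building it still requires a config (and checkpoint) whose keys match the
# PE architecture, which is where mismatches such as visual.attn_pool.* show up.
model, _, preprocess = open_clip.create_model_and_transforms('PE-Core-L14-336')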
Question
Could you clarify:
Your support on this would be greatly appreciated!
Thanks!