LiteLVLM

Efficient Large Vision-Language Model for Pixel Grounding

1. Upload an Image

Retain Visual Tokens

576 / 576

2. Text Instruction

Examples

Select a sample to fill the image and text instruction.

3. Output Result

Segmentation output will be shown here...

Token Pruning Animation

Visualize LiteLVLM's token pruning process.