FlowInOne
::: { .text-center }
Unified image-to-image generation via multimodal flow matching
Seamlessly fuse sketches, text, layouts, and symbols into photorealistic images with a single flow model.
:::
:::: {.grid .grid-3}
::: {.card}
Multimodal Visual Encoding
Encode freehand sketches, handwritten text, layout primitives, and symbolic instructions into a shared 2D visual latent space, preserving semantics and spatial structure without modality-specific decoders.
:::
::: {.card}
Geometry-Aware Flow Matching
Leverage geometry-preserving flow dynamics to generate high-fidelity images with accurate spatial alignment and structural coherence from fused visual prompts.
:::
::: {.card}
Unified Latent Space
Train a single denoisable latent space that supports diverse input modalities, eliminating the need for alignment losses or separate conditioning pathways.
:::
::: {.card}
End-to-End Generation
Generate photorealistic target images directly from multimodal visual prompts, with no cascaded models, no post-processing, and no compromise on quality.
:::
::::
::: {.grid .grid-2}
:::
::: { .text-center }
Get Started →
:::