The landscape of artificial intelligence is undergoing a radical transformation, particularly in large language models (LLMs), which have made significant strides in multimodal tasks. These models have shown great potential in understanding and generating language, yet most current multimodal models still rely on the autoregressive (AR) architecture, which generates output strictly left to right, one token at a time. This fixed order limits flexibility: the model cannot revisit tokens it has already produced, so its reasoning follows a single, one-directional path. Enter FUDOKI, a groundbreaking model developed by a research team from the University of Hong Kong and Huawei Noah's Ark Lab and designed to break through these limitations.
FUDOKI stands out for its innovative Discrete Flow Matching framework, a departure from traditional AR models. Instead of predicting one token after another, it refines all tokens in parallel through a denoising process in which every position can draw on context from both directions. According to the team, this bidirectional integration of information significantly improves the model's performance on complex reasoning and generation tasks. The model also brings image generation and multimodal understanding together, handling both within a single unified framework.
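To make the contrast with AR decoding concrete, here is a toy sketch in Python with NumPy of why a parallel denoiser can integrate information in both directions. It assumes, as "bidirectional" suggests, that every position attends to every other position, while an AR model applies a causal mask so that position i only sees positions 0 through i. The `attention` function is a hypothetical single-head layer without learned projections, not FUDOKI's actual architecture.

```python
import numpy as np


def attention(x, mask):
    """Toy single-head self-attention: no learned projections, boolean mask."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = np.where(mask, scores, -np.inf)         # block disallowed positions
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)                # row-wise softmax
    return w @ x                                     # all positions in one pass


n, dim = 5, 4
causal = np.tril(np.ones((n, n), dtype=bool))  # AR: position i sees 0..i
full = np.ones((n, n), dtype=bool)             # parallel denoiser: sees all

rng = np.random.default_rng(0)
x = rng.normal(size=(n, dim))
x_edit = x.copy()
x_edit[-1] += 1.0  # change only the last token

# Does the FIRST position's output react to a change in the LAST token?
first_changes_causal = not np.allclose(attention(x, causal)[0],
                                       attention(x_edit, causal)[0])
first_changes_full = not np.allclose(attention(x, full)[0],
                                     attention(x_edit, full)[0])
print("causal:", first_changes_causal, "full:", first_changes_full)
```

Under the causal mask the first position never sees the edited last token, so its output is unchanged; under the full mask it does change, and every position is still computed in a single pass rather than one token at a time.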
The power of FUDOKI lies in its mask-free design, which makes the generation process more flexible. Masked diffusion models start from placeholder [MASK] tokens and, once a token has been filled in, usually leave it fixed; FUDOKI instead works with real tokens throughout, so any token can still be revised at a later step. During reasoning, this lets FUDOKI dynamically adjust its output, much as people revise their thinking. Its image generation is particularly noteworthy: it scores 0.76 on the GenEval benchmark, surpassing AR models of the same size, with high visual quality and strong semantic accuracy.
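The practical difference between masked and mask-free generation can be sketched with a toy parallel sampler. For simplicity it uses the basic mixture path from discrete flow matching with schedule κ_t = t (at each step a position jumps to the denoiser's guess with probability h / (1 - t)) rather than the metric-induced path FUDOKI uses, and `toy_denoiser`, `TARGET`, and its 80% accuracy are hypothetical stand-ins for a trained network. The only switch is the starting state: all [MASK] tokens versus uniformly random real tokens.

```python
import numpy as np

VOCAB, LENGTH, STEPS = 8, 6, 20
MASK = VOCAB                           # extra token id, used only by the masked variant
TARGET = np.array([1, 4, 2, 7, 3, 5])  # hypothetical clean sequence


def toy_denoiser(seq, rng):
    """Hypothetical stand-in for the network: guesses the clean token at every
    position in parallel. A real model would condition on `seq`; this toy
    ignores it and is right about 80% of the time (made-up behavior)."""
    guess = TARGET.copy()
    wrong = rng.random(LENGTH) < 0.2
    guess[wrong] = rng.integers(0, VOCAB, size=int(wrong.sum()))
    return guess


def sample(mask_free, seed=0):
    rng = np.random.default_rng(seed)
    if mask_free:
        seq = rng.integers(0, VOCAB, size=LENGTH)  # random real tokens
    else:
        seq = np.full(LENGTH, MASK)                # all positions start as [MASK]
    changes = np.zeros(LENGTH, dtype=int)          # how often each position moved
    for k in range(STEPS):
        # Jump probability h / (1 - t) with t = k / STEPS and h = 1 / STEPS,
        # which simplifies to 1 / (STEPS - k) (exactly 1 at the last step).
        jump = rng.random(LENGTH) < 1.0 / (STEPS - k)
        if not mask_free:
            # A trained masked model would just copy already-unmasked tokens,
            # so they never change; enforce that directly here.
            jump &= seq == MASK
        guess = toy_denoiser(seq, rng)  # every position updated in parallel
        changes += jump & (guess != seq)
        seq = np.where(jump, guess, seq)
    return seq, changes


masked_seq, masked_changes = sample(mask_free=False)
free_seq, free_changes = sample(mask_free=True)
print("masked:   ", masked_seq, "changes per position:", masked_changes)
print("mask-free:", free_seq, "changes per position:", free_changes)
```

In the masked run each position changes at most once, because a filled-in token is frozen. In the mask-free run a position can be overwritten again whenever the denoiser proposes a different token, which is the kind of self-correction that mask-free generation allows.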
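FUDOKI is built on metric-induced probability paths with kinetic-optimal velocities. The probability path is defined by a distance between token embeddings, so at intermediate steps tokens that are semantically close to the target are more likely than unrelated ones, and the velocity moves probability only toward tokens that are closer to the target. This lets the model take the semantic similarity between tokens into account as it generates, yielding more natural text and images. Moreover, FUDOKI is initialized from a pre-trained AR model rather than trained from scratch, which lowers training costs and improves efficiency.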
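Both ingredients can be sketched numerically for a single token position. Following the form of metric-induced paths in the kinetic-optimal discrete flow matching literature, as we understand it, the path is p_t(x | x1) = softmax(-β_t d(x, x1)) and the rate of jumping from the current token z to a candidate x is p_t(x | x1) · β̇_t · max(d(z, x1) - d(x, x1), 0). The embeddings `EMB`, the Euclidean distance, and the schedule β_t = c·t / (1 - t) are hypothetical choices for illustration, and the target x1 is assumed known, as it is when constructing training paths.

```python
import numpy as np

# Hypothetical 2-D embeddings for a 6-token toy vocabulary (illustrative only).
EMB = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                [1.0, 1.0], [3.0, 3.0], [-2.0, 1.0]])


def distances_to(target):
    """d(x, x1) for every token x; Euclidean distance is an assumed metric."""
    return np.linalg.norm(EMB - EMB[target], axis=1)


def beta(t, c=3.0):
    """Hypothetical schedule: beta(0) = 0 (uniform path), beta -> inf as t -> 1."""
    return c * t / (1.0 - t)


def beta_dot(t, c=3.0):
    """Time derivative of beta(t)."""
    return c / (1.0 - t) ** 2


def path_probs(t, d):
    """Metric-induced path p_t(x | x1) = softmax(-beta_t * d(x, x1))."""
    logits = -beta(t) * d
    w = np.exp(logits - logits.max())  # shift for numerical stability
    return w / w.sum()


def rate_matrix(t, d):
    """R[x, z] = p_t(x | x1) * beta_dot_t * max(d(z, x1) - d(x, x1), 0).

    Mass flows from the current token z only to tokens x strictly closer to
    the target, so the diagonal (and every 'uphill' entry) is zero.
    """
    p = path_probs(t, d)
    gain = np.clip(d[None, :] - d[:, None], 0.0, None)  # gain[x, z]
    return p[:, None] * beta_dot(t) * gain


def euler_step(z, t, h, d, rng):
    """One Euler step of the jump process starting from current token z."""
    jump = h * rate_matrix(t, d)[:, z]  # probability of moving to each x
    total = jump.sum()
    if total > 1.0:                     # rates blow up near t = 1: jump for sure
        jump = jump / total
        total = 1.0
    probs = jump.copy()
    probs[z] += 1.0 - total             # leftover mass: stay on z
    return int(rng.choice(len(d), p=probs))


# Demo: move one position from token 4 toward target token 3.
rng = np.random.default_rng(0)
target = 3
d = distances_to(target)
token = 4
grid = np.linspace(0.0, 0.98, 50)
for t0, t1 in zip(grid[:-1], grid[1:]):
    token = euler_step(token, t0, t1 - t0, d, rng)
print("final token:", token)
```

At t = 0 the path is uniform over the vocabulary, and as β_t grows it concentrates on the target. Because rates are positive only toward strictly closer tokens, each jump moves the sampled token nearer to the target in embedding space, and the run should end on token 3. One can also check that these rates reproduce the path's time derivative (the continuity equation), which is what makes them a valid velocity for this path.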
FUDOKI offers a fresh perspective on multimodal generation and understanding, laying a more solid foundation for the development of general artificial intelligence. Looking ahead, we expect it to prompt further exploration of non-autoregressive approaches to unified multimodal modeling and to drive new breakthroughs in the field.