OpenAI Introduces Point-E: A Machine Learning System That Can Rapidly Generate 3D Images Based On Text Prompts
With a great deal of work already going into improving existing machine learning and deep learning techniques, one area that has drawn researchers’ attention is 3D geometry and computer graphics, and in particular 3D object generation, which has produced some promising results. The field is fairly broad and includes use cases such as generating 3D models from images, integrating 3D models, and creating 3D models from text prompts. Just as 2D art generators recently caused a frenzy among the general public, model-synthesizing AI could plausibly be the next big industry disruptor. However, despite their impressive results, current state-of-the-art techniques for text-conditional 3D object synthesis fall short in computational efficiency.
This inefficiency becomes even more apparent when compared with state-of-the-art generative image models. Those models can generate samples in a matter of seconds, whereas text-conditional 3D object generation models often require many GPU hours to produce a single sample. To address this problem, OpenAI recently released Point-E, an open-source machine learning system that can create a 3D object from a text prompt in one to two minutes on a single Nvidia V100 GPU.
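To put the gap between "many GPU hours" and "one to two minutes" in rough numbers, here is a minimal Python sketch. The 3 GPU-hour baseline is an assumption chosen purely for illustration, not a measured figure for any particular model, and the 90-second figure is simply the midpoint of the one-to-two-minute range quoted above.

```python
import math

def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new method is than the baseline."""
    return baseline_seconds / new_seconds

def orders_of_magnitude(ratio: float) -> float:
    """Express a speedup ratio as a power of ten."""
    return math.log10(ratio)

# Assumption for illustration only: a baseline needing 3 GPU-hours per sample.
baseline_seconds = 3 * 3600
# Midpoint of the "one to two minutes" quoted for Point-E on a single V100.
point_e_seconds = 90

ratio = speedup(baseline_seconds, point_e_seconds)
print(f"speedup: {ratio:.0f}x, about {orders_of_magnitude(ratio):.1f} orders of magnitude")
```

Under these assumed numbers the ratio is 120x, or roughly two orders of magnitude; a different baseline would shift the result accordingly.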
Point-E differs from other 3D object generation models in that it produces point clouds: discrete sets of data points in 3D space that represent the shape described by the input text prompt. Point clouds are simpler to synthesize, which is what makes Point-E computationally efficient. Their major drawback is that they often fail to capture an object’s finer details. To overcome this limitation, the team trained a second AI system that converts Point-E’s point clouds into meshes.
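To make the idea of a point cloud concrete, the following sketch builds one by hand in plain Python. It is not Point-E’s code: the function names, the sphere shape, and the count of 1,024 points are all assumptions chosen for illustration. The point is only that a point cloud is a bare list of coordinates with no edges or faces, which is why a separate step is needed to obtain a mesh.

```python
import math
import random

def sample_sphere_point_cloud(num_points: int, radius: float = 1.0, seed: int = 0):
    """Return a list of (x, y, z) tuples lying on the surface of a sphere."""
    rng = random.Random(seed)
    points = []
    for _ in range(num_points):
        # Normalizing a Gaussian vector gives a uniformly random direction.
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        points.append((radius * x / norm, radius * y / norm, radius * z / norm))
    return points

def bounding_box(points):
    """Return the (min, max) corners of the axis-aligned box enclosing the points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

# A point cloud is just coordinates: no connectivity between points is stored.
cloud = sample_sphere_point_cloud(1024)
print(len(cloud), "points; first point:", cloud[0])
print("bounding box:", bounding_box(cloud))
```

Because nothing links one point to its neighbors, surface detail between points is simply absent from the data, which matches the drawback described above.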
Apart from the mesh-generating model mentioned above, Point-E consists of two diffusion models: a text-to-image model and an image-to-3D model. The text-to-image model was trained on annotated visual data to learn the relationship between words and visual concepts. This underlying model is comparable to other text-to-image models such as Stability AI’s Stable Diffusion. The image-to-3D model that follows was trained differently, on a set of images paired with 3D objects.
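The two-stage structure described above can be sketched as a simple function composition. Everything in this example is a hypothetical stand-in: the stub functions, the `Image` class, and the prompt are invented for illustration and are not Point-E’s actual API. The stubs return fixed placeholder data rather than running any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]

@dataclass
class Image:
    """Placeholder for a generated view; this sketch stores only a caption."""
    description: str

def text_to_image_stub(prompt: str) -> Image:
    # Hypothetical stand-in for the text-to-image diffusion model.
    return Image(description=f"synthetic view of: {prompt}")

def image_to_point_cloud_stub(image: Image) -> List[Point]:
    # Hypothetical stand-in for the image-to-3D diffusion model.
    # It ignores the image content and returns the corners of a unit cube.
    return [(float(x), float(y), float(z))
            for x in (0, 1) for y in (0, 1) for z in (0, 1)]

def generate(prompt: str,
             text_to_image: Callable[[str], Image],
             image_to_3d: Callable[[Image], List[Point]]) -> List[Point]:
    """Chain the stages: in this sketch the 3D stage receives only the image."""
    image = text_to_image(prompt)
    return image_to_3d(image)

cloud = generate("a red traffic cone", text_to_image_stub, image_to_point_cloud_stub)
print(len(cloud), "points")
```

In a chain like this, any error in the intermediate image is passed straight to the 3D stage, since the second function never sees the original prompt.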
The researchers noted that although Point-E often produces point clouds that match the text prompt, it is not flawless. Occasionally the image-to-3D model fails to interpret the image generated by the text-to-image model, leading to a shape that does not correspond to the text prompt. A lot of work remains before its sample quality is on par with other state-of-the-art models. However, Point-E produces samples up to two orders of magnitude faster, which can be a useful trade-off for some use cases. According to OpenAI researchers, one such use might be fabricating real-world objects with techniques like 3D printing. The technology may also find use in the video game and animation industries.
3D models are used in several industries, including entertainment, interior design, architecture, and various scientific fields. However, creating them takes considerable time and effort, ranging from a few hours to many days. Innovations like Point-E are intended to reduce that time and effort. One significant concern is that Point-E may inherit biases from its training data. OpenAI therefore views Point-E as a starting point and has open-sourced the model to encourage the community to study text-to-3D synthesis further. This is also where much future development is expected to concentrate.
Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about Machine Learning, Natural Language Processing, and Web Development, and enjoys learning more about the technical field by participating in challenges.