Researchers at Apple Develop ‘RoomPlan’: An API for Representing Rooms in a 3D Parametric View
Machine learning (ML) research on 3D scene interpretation has been ongoing for over a decade. For the developer and computer vision communities, a new era in scene understanding has recently begun with the introduction of LiDAR sensor functionality in Apple’s iPhone and iPad. Now, breakthroughs in ML and fundamental scene comprehension research are influencing daily life. Different approaches are being used to tackle various aspects of the problem, including depth estimation, 3D reconstruction, instance segmentation, object identification, etc. For many applications like augmented reality, robotics, e-commerce, gaming, and real estate, developing a 3D floor layout is increasingly essential.
Apple introduced RoomPlan in 2022 to address the problem of automated 3D floor-plan generation. RoomPlan is a new Swift API that leverages the camera and LiDAR scanner on iPhone and iPad to produce a 3D floor plan of a space, including measurements and the types of furniture it contains. It is powered by ARKit and RealityKit, Apple’s software frameworks for building augmented reality apps and games. Simple on-screen suggestions guide the user to record (or scan) the planes and objects in the space more effectively, at the ideal speed, illumination, and distance. The finished room capture is delivered as a parametric representation and can be exported to the Universal Scene Description family of formats: USD, USDA, or USDZ.
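As a rough illustration of how the capture-and-export flow above looks in code, the sketch below wires up a `RoomCaptureView` and exports the finished scan as USDZ. This is a minimal, assumed view-controller setup, not Apple's reference implementation; the class name `ScanViewController` and the output filename are made up for the example.

```swift
import UIKit
import RoomPlan

// Minimal sketch (assumed wiring): start a RoomPlan scan with
// RoomCaptureView, then export the finished CapturedRoom as USDZ.
class ScanViewController: UIViewController, RoomCaptureViewDelegate {
    private var captureView: RoomCaptureView!

    override func viewDidLoad() {
        super.viewDidLoad()
        captureView = RoomCaptureView(frame: view.bounds)
        captureView.delegate = self
        view.addSubview(captureView)
        // Begin scanning; RoomPlan overlays live guidance for the ideal
        // speed, lighting, and distance during the capture.
        captureView.captureSession.run(configuration: RoomCaptureSession.Configuration())
    }

    // Let RoomPlan process and present the result when the scan ends.
    func captureView(shouldPresent roomDataForProcessing: CapturedRoomData,
                     error: Error?) -> Bool {
        return true
    }

    // Receive the parametric result and export it to USD.
    func captureView(didPresent processedResult: CapturedRoom, error: Error?) {
        let url = FileManager.default.temporaryDirectory
            .appendingPathComponent("Room.usdz")
        // .usd and .usda destinations work the same way.
        try? processedResult.export(to: url)
    }
}
```

Because the capture session depends on a LiDAR-equipped device, a sketch like this can only be exercised on physical iPhone or iPad hardware, not in the simulator.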
Two essential elements form the basis of RoomPlan:
- A 3D room-layout estimation pipeline
- A 3D object-detection pipeline
In the video below, one can see an illustration of the scanning procedure from the perspective of an iPhone user scanning a kitchen.
The 3D room-layout estimator uses two neural networks: one detects walls and openings, and the other detects doors and windows. The estimator lifts the walls and openings into three dimensions using the predicted wall height. Given the wall information and the camera pose, it localizes doors and windows on the 2D wall planes and then projects them into 3D space.
The 3D object-detection pipeline detects objects from 16 categories in 3D, covering the classic pieces of furniture that define a room, such as sofas, tables, and refrigerators. These two primary 3D components are covered in further detail in this article.
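Both pipelines feed into the `CapturedRoom` result that the API returns. The sketch below, assuming a scan has already produced a `CapturedRoom`, shows how its layout surfaces and detected objects can be read back; the helper name `summarize` is invented for illustration.

```swift
import RoomPlan

// Minimal sketch: inspecting RoomPlan's parametric output. CapturedRoom
// exposes the layout surfaces (walls, doors, windows, openings) and the
// detected objects; dimensions are simd_float3 extents in meters.
func summarize(_ room: CapturedRoom) {
    for wall in room.walls {
        print("wall: \(wall.dimensions.x) m wide, \(wall.dimensions.y) m high")
    }
    print("doors: \(room.doors.count), " +
          "windows: \(room.windows.count), " +
          "openings: \(room.openings.count)")
    for object in room.objects {
        // category is an enum spanning the furniture types RoomPlan
        // detects (.sofa, .table, .refrigerator, and so on).
        print("\(object.category): \(object.dimensions)")
    }
}
```

Each surface and object also carries a 3D transform, which is what makes the export directly usable as a measured floor plan rather than just a mesh.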
Owing to Apple’s commitment to user privacy, RoomPlan processing is performed entirely on the user’s device rather than on Apple servers. To achieve this, the Apple Neural Engine (ANE) serves as the sole compute platform for the neural networks described above. To meet the latency and memory requirements for RoomPlan to run correctly on device, Apple applied techniques including quantization, model pruning, and architecture search. The objective was to prevent CPU/GPU thermal throttling, which can cause unexpected behaviors such as frame drops while users scan large rooms. They found that a five-minute scan is sufficient to cover a variety of room sizes, including spaces up to 15 m × 15 m.
In their post, the Apple researchers go through the technical specifics of the RoomPlan API, which enables developers to construct a 3D floor plan of a space. This capability is critical for applications in augmented reality, robotics, e-commerce, gaming, and real estate. They detail the two main parts of RoomPlan, room-layout estimation and 3D object detection, and demonstrate how these pipelines operate on iPhone and iPad devices with low power consumption and latency.