INT-ACT is a probing suite for evaluating the generalization capability of robotic vision-language-action models (VLAs).
            It consists of three categories of tasks that probe the generalization boundaries of VLAs.
            
            
            Object Diversity: Ability to handle out-of-distribution objects.
            
            Language Complexity: Ability to understand complex language instructions.
            
            Vision-Language Thinking: Ability to perform commonsense reasoning and visual-language thinking.
          
          
            
              Truly generalist policies require perceptual ability beyond the object distributions encountered during training or fine-tuning. 
              
              In SimplerEnv, which assumes the fine-tuning dataset is BridgeV2, all manipulation tasks take the form Put {Source} on {Target}. We therefore introduce four out-of-distribution categories, using objects that resemble the original objects in affordances and grasping difficulty.
              
              
OOD Source: Source object not present in BridgeV2, but target object is.
              
OOD Target: Target object not present in BridgeV2, but source object is.
              
OOD Source + Target: Both source and target objects are not present in BridgeV2.
              
OOD Relation: Relation between objects is different from the training data. For example, if the training data has Put {Source} on {Target}, then the OOD relation can be Put {Target} on {Source}.
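
The four categories above can be read as a simple decision rule over the source and target of an instruction. The sketch below is only an illustration of that rule, not the INT-ACT implementation: the object names, the `TRAINING_PAIRS` set, and the `categorize` helper are all hypothetical placeholders, and the real BridgeV2 object lists are not reproduced here.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (source, target)

# Hypothetical stand-in for the (source, target) pairs seen during fine-tuning.
# These names are placeholders, not the actual BridgeV2 object list.
TRAINING_PAIRS: Set[Pair] = {("carrot", "plate"), ("spoon", "towel")}

# Every object that appears anywhere in the training pairs counts as "seen".
SEEN_OBJECTS: Set[str] = {obj for pair in TRAINING_PAIRS for obj in pair}


def instruction(source: str, target: str) -> str:
    """Fill the Put {Source} on {Target} template."""
    return f"Put {source} on {target}"


def categorize(source: str, target: str) -> str:
    """Assign a task to one of the four OOD categories (assumed decision rule)."""
    source_ood = source not in SEEN_OBJECTS
    target_ood = target not in SEEN_OBJECTS
    if source_ood and target_ood:
        return "OOD Source + Target"
    if source_ood:
        return "OOD Source"
    if target_ood:
        return "OOD Target"
    if (source, target) in TRAINING_PAIRS:
        return "In-distribution"
    # Both objects are seen, but their roles are swapped relative to training.
    if (target, source) in TRAINING_PAIRS:
        return "OOD Relation"
    # Seen objects in a new combination; outside the four categories above.
    return "Unseen pairing"


if __name__ == "__main__":
    # "pineapple" and "keyboard" are hypothetical unseen objects.
    for src, tgt in [
        ("carrot", "plate"),
        ("pineapple", "plate"),
        ("carrot", "keyboard"),
        ("pineapple", "keyboard"),
        ("plate", "carrot"),
    ]:
        print(f"{instruction(src, tgt)}: {categorize(src, tgt)}")
```

Note that in this sketch OOD Relation is detected by checking whether the reversed pair appears in training, which matches the Put {Target} on {Source} example; other notions of a changed relation would need a different check.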