Empowering Robotics through ChatGPT: Unveiling Design Principles and Model Capabilities

ChatGPT for robotics

Our team leveraged the potential of ChatGPT in the field of robotics, expanding its abilities to control various platforms, including robot arms and drones, through natural language.

Materials utilized in this project:

Hardware components:
Elephant Robotics myCobot 280 Pi (6 DOF collaborative robot)
Elephant Robotics Suction Pump

About ChatGPT:

This project draws on Microsoft’s report about ChatGPT’s contribution to the developing field of robotics. The following paragraphs are excerpted from that article, focusing primarily on ChatGPT’s control of robotic arms.

We extended the capabilities of ChatGPT to control multiple types of devices. By prompting the language model appropriately, we enabled it to converse with and command a robot, such as a robotic arm, a drone, or a domestic robot.

Are you interested in communicating with robots using natural language, just like you would with other people? Imagine instructing your home assistant robot to “heat up your lunch” and having it locate the microwave without any additional effort. Although speaking is the most natural way for us to convey our desires, robots are still primarily controlled through written code. Our team is committed to transforming this paradigm by utilizing OpenAI’s revolutionary language model, ChatGPT, to enable more human-like interactions with robots.

ChatGPT is a language model that has been extensively trained on a vast corpus of text and human interactions, enabling it to generate coherent and grammatically correct responses to a broad range of prompts and inquiries. Our research seeks to determine whether ChatGPT can surpass its text-based capabilities and employ reasoning to facilitate robotic tasks in the physical world. Our ultimate objective is to make human-robot interactions more accessible, removing the need for individuals to acquire proficiency in complex programming languages or understand intricate details about robotic systems. The key obstacle in this endeavor is training ChatGPT to solve problems within the confines of physics, the operational environment’s context, and the impact of the robot’s physical actions on the world’s state.

Although ChatGPT possesses impressive capabilities, it still necessitates some assistance. In our technical paper, we present a set of design principles that can steer language models in tackling robotic tasks. These principles encompass various elements, such as unique prompting structures, high-level APIs, and human feedback through text. Our research represents the initial step in transforming the way we approach the development of robotic systems, and we aspire to motivate other researchers to delve into this exhilarating field. For additional technical information on our methods and concepts, read on.

What are the current challenges in robotics, and how can ChatGPT help solve them?

At present, robotics pipelines typically commence with an engineer or technical user who must translate the task’s demands into code for the system. The engineer is “in the loop,” meaning they must compose new code and specifications to rectify the robot’s behavior. This process is generally slow (users must write low-level code), costly (it requires highly skilled users with extensive robotics knowledge), and inefficient (multiple interactions are needed to ensure proper functionality).


ChatGPT introduces a fresh robotics paradigm, enabling a user (even someone without technical expertise) to “sit in the loop,” providing high-level feedback to the large language model (LLM) while monitoring the robot’s performance. By adhering to our design principles, ChatGPT can generate code for various robotics scenarios. Leveraging the LLM’s knowledge, we control various robot form factors across a broad range of tasks, all without requiring any fine-tuning. Our research showcases multiple instances of ChatGPT resolving robotics puzzles, as well as complex robot deployments in the manipulation, aerial, and navigation domains.

Robotics with ChatGPT: design principles

Crafting prompts for LLMs is a tremendously empirical field. Via experimentation, we constructed a methodology and a collection of design principles for developing prompts tailored to robotics tasks:

  1. First, we establish a collection of high-level robot APIs or function libraries. These libraries may be specific to a given robot and should map to existing low-level implementations from the robot’s control stack or perception library. It is crucial to use descriptive names for the high-level APIs so that ChatGPT can reason about their behavior.
  2. Next, we write a text prompt for ChatGPT that describes the task we’re performing and provides detailed instructions on how to use the available libraries. The prompt can also include any mandatory information ChatGPT needs, such as task constraints.
  3. Finally, the user stays on the loop to evaluate ChatGPT’s code output, either through direct inspection or by running the code in a simulator. If needed, the user provides feedback on the answer’s quality in natural language and iterates with ChatGPT until the solution is satisfactory.
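To make the first two principles concrete, a robotics prompt can be assembled from an API list, constraints, and a task description. The following is a minimal sketch; the API descriptions mirror the functions used later in this article, but the helper name and prompt wording are illustrative, not part of any real toolchain:

```python
# Sketch: assembling a robotics prompt per the design principles above.
# The API descriptions and prompt text are illustrative placeholders.
API_DESCRIPTIONS = [
    "grab(): turn on the suction pump to grab an object",
    "release(): turn off the suction pump to release an object",
    "get_position(name): return [X, Y, Z, Yaw, Pitch, Roll] above the named object",
    "move_to(pose): move the suction pump to the given pose",
]

def build_prompt(task_description, constraints):
    """Combine the task, the API list, and the constraints into one prompt."""
    lines = ["You control a robot arm. You may only use these functions:"]
    lines += ["  - " + api for api in API_DESCRIPTIONS]
    lines.append("Constraints: " + "; ".join(constraints))
    lines.append("Task: " + task_description)
    lines.append('Ask clarifying questions with the tag "Question -".')
    return "\n".join(lines)

prompt = build_prompt(
    "pick up the red block and hold it",
    ["positions are in millimeters", "angles are in degrees"],
)
```

The point of structuring the prompt this way is that every capability the model may invoke is named explicitly, which discourages it from inventing functions that do not exist on the robot.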

Enough theory… What exactly can ChatGPT do?

We would like to take a look at a case study as an example. You can see more case studies in our code repository.

(We illustrate only one of the more interesting examples here; the original article includes other robot examples.)

User on the loop: when a complex task needs a conversation

Next, we used ChatGPT in a robot arm manipulation experiment. We used conversational feedback to teach the model how to compose the originally provided APIs into more complex, higher-level functions that ChatGPT coded by itself. Using a curriculum-based strategy, the model was able to chain these learned skills together in a logical way to perform operations such as stacking blocks.


The following is a conversation with ChatGPT about how to control the robot arm to arrange blocks in the Microsoft logo colors.

User:

The scene can be broken down into the following objects: white pad, box, blue block, yellow block, green block, red block, brown block 1, and brown block 2. The blocks are 40 mm cubes located inside a wooden box that is 80 mm deep, and they can only be picked from the top of the container. I want to teach you to pick up a single object and hold it.


You have access to the following functions:

  • grab(): Turns on the suction pump to grab an object
  • release(): Turns off the suction pump to release an object
  • get_position(object): Given a string with an object’s name, returns the coordinates and orientation at which the vacuum pump should touch the top of that object, as [X, Y, Z, Yaw, Pitch, Roll]
  • move_to(position): Moves the suction pump to a given position [X, Y, Z, Yaw, Pitch, Roll]

You may generate new functions using these APIs, but you may not use any other hypothetical functions.

It is recommended that you maintain a straightforward and unambiguous approach while developing solutions. The positions are denoted in millimeters and angles in degrees. Additionally, to obtain further clarity, you may use the tag “Question -” to ask any relevant queries. Here’s an illustrative scenario that showcases how you can use the tag to seek clarification.

Suppose there are two spheres present in a scene. Please retrieve the sphere.

ChatGPT: There are two spheres. Which one would you like me to retrieve?

User: Please retrieve Sphere 1. Can you provide a Python code solution?

ChatGPT: Certainly. Before we proceed, can you please describe the scene and task at hand?

User: In the scene, there are various objects, including a white pad, a box, a blue block, a yellow block, a green block, a red block, and two brown blocks. The blocks are cube-shaped and have a height of 40 mm. They are situated inside the 80 mm deep box and can only be accessed from the top. The task is for you to learn how to pick up a single object and hold it. To do this, you must move to a safe distance above the object (100 mm), grasp it, and lift it.

ChatGPT: Understood. Here’s a Python function that will take the name of the object, move the robotic arm above the object, grasp it, and then lift it to a safe distance of 100 mm above the object. Please keep in mind that the function assumes that the robotic arm is initially in a safe starting position.

import time

def pick_up_object(object_name):
    # get the position of the object
    object_pos = get_position(object_name)
    # move the robot arm to a safe position above the object
    safe_pos = [object_pos[0], object_pos[1], object_pos[2]+100,
                object_pos[3], object_pos[4], object_pos[5]]
    move_to(safe_pos)
    # move the robot arm to the object's position
    move_to(object_pos)
    # grab the object
    grab()
    time.sleep(1)  # wait for the object to be firmly attached
    # move the robot arm back to the safe position above the object
    move_to(safe_pos)

To utilize this function, you may directly call “pick_up_object” and provide the name of the object you wish to retrieve as a string, for instance: pick_up_object(“red block”).

User: Excellent! Next, I would like for you to acquire the skill of placing an object in a specific position. Assuming you are holding an object, you will initially need to shift to a secure position above the intended location, lower the object, and then release it.

ChatGPT: I understand your requirement. Here’s a Python function that will take the intended position where you want to place the object. The function will then move the robotic arm to a secure position above that position, lower the object to that location, and then release it. It’s important to note that the function assumes that the robotic arm is currently holding an object.

def place_object(position):
    # move the robot arm to a safe position above the target location
    safe_pos = [position[0], position[1], position[2]+100,
                position[3], position[4], position[5]]
    move_to(safe_pos)
    # move the robot arm to the target location
    move_to(position)
    # release the object
    release()
    # move the robot arm back to the safe position above the target location
    move_to(safe_pos)

To utilize this function, you can call “place_object” and provide the target position as a list of 6 numbers, for instance: place_object([100, 200, 50, 0, 0, 0]). It’s worth mentioning that the position is specified in millimeters, and the orientation angles are in degrees.

User: Excellent! Now, I would like you to place the blue, yellow, red, and green blocks on the white pad in the layout of the Microsoft logo.

ChatGPT: Here’s the Python code that will place the blocks on the white pad in the Microsoft logo layout:

# get the positions of the blocks and the white pad
white_pad_pos = get_position("white pad")
blue_pos = get_position("blue block")
yellow_pos = get_position("yellow block")
red_pos = get_position("red block")
green_pos = get_position("green block")
# pick up the blue block
pick_up_object("blue block")
# calculate the position to place the blue block
place_pos = [white_pad_pos[0]-20, white_pad_pos[1]-20, white_pad_pos[2]+40, 0, 0, 0]
# place the blue block on the white pad
place_object(place_pos)
# pick up the yellow block
pick_up_object("yellow block")
# calculate the position to place the yellow block
place_pos = [white_pad_pos[0]+20, white_pad_pos[1]-20, white_pad_pos[2]+40, 0, 0, 0]
# place the yellow block on the white pad
place_object(place_pos)
# pick up the red block
pick_up_object("red block")
# calculate the position to place the red block
place_pos = [white_pad_pos[0]-20, white_pad_pos[1]+20, white_pad_pos[2]+40, 0, 0, 0]
# place the red block on the white pad
place_object(place_pos)
# pick up the green block
pick_up_object("green block")
# calculate the position to place the green block
place_pos = [white_pad_pos[0]+20, white_pad_pos[1]+20, white_pad_pos[2]+40, 0, 0, 0]
# place the green block on the white pad
place_object(place_pos)

This code works by first obtaining the positions of the white pad and the blue, yellow, red, and green blocks using the get_position function. It then uses the previously defined pick_up_object and place_object functions to pick up each block in turn, compute a placement position offset by ±20 mm in X and Y from the center of the white pad, and set the block down in its own quadrant. It’s important to keep in mind that the block height (40 mm) is added to the pad’s Z coordinate so that each block is released resting on top of the pad rather than pressed into it.
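The offset arithmetic repeated inline in the code above can be factored into a small helper. This is an illustrative refactoring only; the helper name is invented, while the 20 mm half-spacing and 40 mm block height come from the scene description:

```python
# Sketch: compute the four 2x2 quadrant poses around a pad center,
# matching the inline +/-20 mm offsets and +40 mm height used above.
def quadrant_poses(pad_pos, half_spacing=20, block_height=40):
    """Return four [X, Y, Z, Yaw, Pitch, Roll] placement poses."""
    poses = []
    for dx in (-half_spacing, half_spacing):
        for dy in (-half_spacing, half_spacing):
            poses.append([pad_pos[0] + dx, pad_pos[1] + dy,
                          pad_pos[2] + block_height, 0, 0, 0])
    return poses

corners = quadrant_poses([0, 0, 0, 0, 0, 0])
```

Looping over the returned poses would shorten the generated code, at the cost of a less literal correspondence to the conversational instructions.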


Bringing robotics to the world

We aim to bring robotics to a wider audience through language-based control. However, please note that ChatGPT outputs should not be deployed on robots without careful analysis and safety precautions. We encourage users to use simulations to evaluate these algorithms. Our work represents a small fraction of what is possible, and we hope to inspire more research in this area.
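One concrete (and entirely illustrative) form of such pre-deployment evaluation is checking a recorded command sequence against simple safety invariants before running it on hardware. The workspace bounds and command format below are invented for the sketch:

```python
# Sketch of an offline sanity check for generated robot commands.
# Bounds and the ('move', pose) / ('grab',) / ('release',) command
# format are illustrative assumptions, not a real robot interface.
WORKSPACE_MM = {"x": (-300, 300), "y": (-300, 300), "z": (0, 400)}

def check_commands(commands):
    """Return True if the command list respects basic safety invariants."""
    holding = False
    for cmd in commands:
        if cmd[0] == "grab":
            holding = True
        elif cmd[0] == "release":
            if not holding:
                return False  # release without a prior grab
            holding = False
        elif cmd[0] == "move":
            for value, axis in zip(cmd[1][:3], ("x", "y", "z")):
                lo, hi = WORKSPACE_MM[axis]
                if not lo <= value <= hi:
                    return False  # pose outside the safe workspace
    return True
```

Checks like this are no substitute for full simulation, but they catch gross errors (out-of-range poses, unbalanced grab/release pairs) cheaply before any physical trial.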