DataHub is UC Berkeley’s implementation of JupyterHub. It is a free service that gives UC Berkeley students a small amount of computing and storage on a virtual machine running in the cloud.
This guide provides an overview of the tasks you will need to complete before the semester begins, during the semester, and at the end of the semester. This high-level information will help you navigate your coursework effectively using the UC Berkeley DataHub.
Before the Semester Begins
- Browser Compatibility: Ensure you are using a compatible web browser (Chrome, Firefox, Safari) that is updated to the latest version.
- Internet Connection: Verify you have a stable internet connection to access DataHub smoothly.
- Activate CalNet ID: Ensure your CalNet ID is activated and functioning properly.
- Access DataHub: Go to datahub.berkeley.edu and log in with your CalNet ID to verify that you can access the platform. When bCourses shows the prompt “DataHub is requesting access to your account”, authorize it.
- Overview: Review student resources to understand the varied features of DataHub.
- Interface Tour: Explore the DataHub interface, including the JupyterLab and RStudio environments.
During the Semester
- Course Hub: Use the nbgitpuller links shared by your course instructors to launch notebooks in DataHub.
- Notebooks and Scripts: Open and work on Jupyter Notebooks, R scripts, or other files as provided by your instructor.
- To manage files in JupyterHub:
  - To upload a file, click the "Upload" button in the JupyterHub interface and select the file from your local machine.
  - To download a file, right-click the file in the JupyterHub interface and select the "Download" option.
- Regular Use: Regularly log in to DataHub to complete assignments, run analyses, and work on projects.
- Save Work: Save your progress frequently. It's good practice to save manually as well rather than relying only on autosave.
- Check Storage Space: Delete unnecessary files and regularly check the size of your home directory. You can do this by opening a Terminal and running du -sh from your home directory.
- Don’t Duplicate Shared Directory Content: If your coursework uses shared directories where instructors store large datasets, don’t copy those files into your home directory. Always read data directly from the shared directory.
- Do Your Work in Subdirectories: Create a subdirectory for each assignment and do your work there. Avoid working on assignments in the root directory, as mistakes there can lead to data issues.
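The storage check described above can also be done from a notebook cell. The following is a minimal sketch, assuming you want a rough per-folder breakdown similar to du -sh; the helper names dir_size_bytes, human_readable, and largest_entries are hypothetical and not part of DataHub. It sums apparent file sizes, so its totals can differ slightly from the disk usage that du reports.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def human_readable(num_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def largest_entries(base, top=10):
    """Return (name, size_in_bytes) pairs for the largest top-level entries in base."""
    sizes = []
    for entry in Path(base).iterdir():
        if entry.is_symlink():
            continue
        if entry.is_dir():
            size = dir_size_bytes(entry)
        else:
            size = entry.stat().st_size
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Calling largest_entries(Path.home()) in a notebook cell and passing each size through human_readable gives a quick view of which assignment folders are worth cleaning up first.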
DataHub does not offer collaboration tools at this time, as we are still testing the latest updates.
- Instructor Feedback: Share your work with instructors or TAs for feedback by downloading and submitting your notebooks as required.
- Restart Kernel/Server: Try restarting your kernel as a classic troubleshooting step to see if the error goes away. If the problem persists, restart your server.
- Having too many things open on DataHub can cause issues. To check running processes and stop them, follow the instructions in the Curriculum Guide or use one of the approaches below.
  - UX approach
    - In Notebook, click the Jupyter icon in the upper right to get to the “tree” view.
    - In “tree”, click the running processes tab.
    - Right-click any process to kill it (or choose Kill all).
    - In JupyterLab, running processes are listed in a tab in the left sidebar.
  - CLI approach
    - Open the terminal in DataHub.
    - Use the command ps aux to list all running processes.
    - Find the process ID (PID) of each process you want to stop.
    - Use the command kill <PID> to terminate those processes.
    - Repeat these steps as necessary to manage your system resources.
- Support: Reach out to course TAs for technical help, and they will contact infrastructure staff if they are unable to resolve your issue.
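To make the CLI approach to running processes more concrete, here is a minimal sketch that parses the output of ps aux and ranks processes by memory share. It assumes the usual 11-column ps aux layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND); the function names are hypothetical, and nothing here is a DataHub-specific tool.

```python
import subprocess


def parse_ps_aux(output):
    """Parse the text printed by `ps aux` into a list of dicts."""
    processes = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        # Split into at most 11 fields because COMMAND may contain spaces.
        fields = line.strip().split(None, 10)
        if len(fields) < 11:
            continue
        processes.append({
            "user": fields[0],
            "pid": int(fields[1]),
            "cpu_percent": float(fields[2]),
            "mem_percent": float(fields[3]),
            "command": fields[10],
        })
    return processes


def top_by_memory(processes, count=5):
    """Return the `count` processes with the highest %MEM value."""
    return sorted(processes, key=lambda p: p["mem_percent"], reverse=True)[:count]


def current_processes():
    """Run `ps aux` and return the parsed process list."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return parse_ps_aux(result.stdout)
```

Running top_by_memory(current_processes()) in a notebook cell shows the PIDs most worth inspecting. Stopping a process is still best done deliberately with kill <PID> in the terminal, after confirming that the command shown is one you no longer need.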
End of the Semester
- Back Up Coursework: Back up your notebooks and data to your personal device or to an external storage service such as Google Drive or Dropbox. The course-specific hubs will go away by the end of the semester.
- User Home Directory Archiving: Files unused for 90 days will be archived and moved to low-cost storage. To retrieve archived files, you will need to open a request with the DataHub team.
- Clean Up: Clean up your DataHub workspace by deleting unnecessary files and folders.
- Feedback: Share feedback on your experience with the DataHub team to help them improve the service for future students.
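For the backup step above, bundling a whole course folder into one archive makes it easier to download. The following is a minimal sketch using Python's standard shutil.make_archive; the function name backup_folder and the dated file name are assumptions for illustration, not a DataHub feature.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_folder(folder, dest_dir=None):
    """Zip `folder` into <dest_dir>/<folder-name>-<YYYY-MM-DD>.zip and return its path.

    By default the archive is written next to the folder, not inside it,
    so the archive does not end up containing itself.
    """
    src = Path(folder).expanduser().resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    dest = Path(dest_dir).expanduser().resolve() if dest_dir else src.parent
    dest.mkdir(parents=True, exist_ok=True)
    base_name = dest / f"{src.name}-{date.today().isoformat()}"
    # root_dir/base_dir keep the top-level folder name inside the archive.
    archive = shutil.make_archive(
        str(base_name), "zip", root_dir=str(src.parent), base_dir=src.name
    )
    return Path(archive)
```

After the archive is created, it can be downloaded from the file browser like any other file and then copied to your own device or cloud storage.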
FAQ
- Login problems: Ensure your CalNet ID is active and try logging in again. If the problem persists, inform your TA or check out this guide for additional help.
- Installing packages: Use !pip install package-name in a Jupyter Notebook cell for Python packages, or install.packages("package-name") in the R console.
- Access from off campus: You can access DataHub from anywhere with an internet connection.
- Errors in a notebook: First, try restarting your kernel. If the issue persists, contact your course TA. If they can’t resolve it, they will reach out to DataHub staff.
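Before installing a Python package, it can help to check whether it is already available in your kernel. This is a minimal sketch using the standard library's importlib.util.find_spec; the function name missing_packages is hypothetical. Note that the import name can differ from the name you pass to pip (for example, scikit-learn is imported as sklearn).

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be imported in this kernel."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]
```

Anything the function returns is a candidate for !pip install in a notebook cell; an empty list means the listed modules are already importable.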