<aside>
⚠️ Your task is to develop a full-stack web application, either as a browser plugin or as a website, where proto users can log in and contribute data in a pre-specified format. The required format is described in this doc, and the steps, UX flow, and other details are also covered here. You may choose whichever UI stack you want to build with.
</aside>
1. Develop the Web Scraping Engine
- Objective: Build a system to scrape data from the internet using users' idle internet time.
- Tasks:
- Choose Scraping Tools: Select tools such as BeautifulSoup, Scrapy, or Puppeteer.
- Idle Time Utilization: Implement a mechanism to run the scraper during users' idle internet time.
- Data Extraction: Write scripts to extract place names, addresses, latitude/longitude coordinates, and the category of each place.
- Error Handling: Ensure the scraper handles errors and retries failed requests efficiently (a retry sketch follows this list).
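As a rough sketch of the extraction and retry pieces, assuming Puppeteer and illustrative CSS selectors (real pages would need per-site rules or schema.org/OpenGraph heuristics):

```typescript
import puppeteer from "puppeteer";

// Assumed record shape; adjust to the pre-specified CSV format.
interface PlaceRecord {
  name: string;
  address: string;
  lat: number;
  lng: number;
  category: string;
}

// Retry a flaky async operation with exponential backoff (1s, 2s, 4s...).
async function withRetries<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, 1000 * 2 ** i));
      }
    }
  }
  throw lastError;
}

export async function scrapePlace(url: string): Promise<PlaceRecord | null> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await withRetries(() => page.goto(url, { waitUntil: "domcontentloaded" }));

    // Hypothetical selectors for illustration only.
    const record = await page.evaluate(() => {
      const text = (sel: string) =>
        document.querySelector(sel)?.textContent?.trim() ?? "";
      const meta = (prop: string) =>
        document
          .querySelector(`meta[property='${prop}']`)
          ?.getAttribute("content") ?? "";
      return {
        name: text("h1"),
        address: text("address"),
        lat: parseFloat(meta("place:location:latitude")),
        lng: parseFloat(meta("place:location:longitude")),
        category: meta("og:type"),
      };
    });

    // Pages missing the required fields are rejected (see Note 2 below).
    if (!record.name || Number.isNaN(record.lat) || Number.isNaN(record.lng)) {
      return null;
    }
    return record;
  } finally {
    await browser.close();
  }
}
```

Returning `null` for pages that lack the required fields feeds directly into the rejection rule described in Note 2 at the end of this doc.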
2. Implement the Scraping Control Interface
- Objective: Create a browser plugin for users to control the scraping process.
- UX Flow:
- User starts scraping from our plugin / web page
- User opens web pages of places near them
- Only opened web pages are scraped
- User needs to keep a web page open until its scraping is complete
- The plugin / web page should show a status bar for scraping progress (a component sketch follows this list)
- Once a page is done scraping, the user can close it
- Show a running count of web pages that have finished scraping
- User needs to complete at least 100 web pages to finish their session
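A minimal sketch of the status bar described in this flow, as a React component in TypeScript; the component name, props, and plain `<progress>` element are illustrative, while the 100-page session target comes from the flow above.

```tsx
import React from "react";

const SESSION_TARGET = 100; // pages required to complete a session

interface ScrapeStatusBarProps {
  scrapedCount: number; // pages that have finished scraping
}

// Presentational status bar: shows pages done and progress toward
// the 100-page session target.
export function ScrapeStatusBar({ scrapedCount }: ScrapeStatusBarProps) {
  const percent = Math.min(100, (scrapedCount / SESSION_TARGET) * 100);
  return (
    <div>
      <progress value={percent} max={100} />
      <span>
        {scrapedCount} / {SESSION_TARGET} pages scraped
        {scrapedCount >= SESSION_TARGET ? " (session complete)" : ""}
      </span>
    </div>
  );
}
```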
- Tasks:
- Start/Stop Scraping: Develop functionality to start and stop the scraping process.
- Real-time Status: Display real-time status of the scraping process (e.g., number of places scraped).
- Frontend Development: Use React.js for the frontend to provide a responsive and user-friendly interface.
- Backend Integration: Use Node.js to handle the backend logic and communicate with the scraping engine (a route sketch follows this list).
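On the backend, the start/stop controls and real-time status could be exposed as a few routes; this sketch assumes Express and an in-memory, single-user session, both illustrative choices rather than requirements.

```typescript
import express from "express";

// In-memory session state; a real service would key this per user.
const session = { active: false, pagesScraped: 0, target: 100 };

const app = express();

app.post("/api/scrape/start", (_req, res) => {
  session.active = true;
  res.json({ status: "started" });
});

app.post("/api/scrape/stop", (_req, res) => {
  session.active = false;
  res.json({ status: "stopped" });
});

// Polled by the frontend to drive the status bar.
app.get("/api/scrape/status", (_req, res) => {
  res.json({
    active: session.active,
    pagesScraped: session.pagesScraped,
    complete: session.pagesScraped >= session.target,
  });
});

app.listen(3000);
```

The frontend can poll `/api/scrape/status` (or subscribe over WebSockets) to keep the status bar current.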
3. CSV Generation and Review Interface
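The details of this step are not spelled out in this doc. As a rough sketch, generating a CSV for user review from the records extracted in step 1 might look like the following; the column order is an assumption and must follow the pre-specified format.

```typescript
// Assumed record shape, matching the fields extracted in step 1.
interface PlaceRecord {
  name: string;
  address: string;
  lat: number;
  lng: number;
  category: string;
}

// Quote a field per RFC 4180: wrap in quotes, double embedded quotes.
function csvField(value: string | number): string {
  const s = String(value);
  return `"${s.replace(/"/g, '""')}"`;
}

// Build a CSV document for review/download. Column order is assumed.
export function toCsv(records: PlaceRecord[]): string {
  const header = "name,address,lat,lng,category";
  const rows = records.map((r) =>
    [r.name, r.address, r.lat, r.lng, r.category].map(csvField).join(",")
  );
  return [header, ...rows].join("\n");
}
```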
4. CSV Upload and Validation
- Objective: Enable users to upload their own CSV files and validate them.
- Tasks:
- CSV Upload Interface: Develop an interface for users to upload CSV files.
- Schema Validation: Implement a service to check the formatting and schema of uploaded CSV files (a validator sketch follows this list).
- Approval/Rejection Mechanism: Give users feedback on the validation results and allow resubmission if necessary.
- Integration with Database: Ensure that validated CSV data can be submitted and stored in the database.
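A minimal sketch of the schema validation service, assuming the same five-column format as the generation sketch above and naive comma splitting; a production validator should use a proper CSV parser that handles quoted fields.

```typescript
const EXPECTED_HEADER = ["name", "address", "lat", "lng", "category"];

interface ValidationResult {
  ok: boolean;
  errors: string[]; // human-readable messages fed back to the user
}

export function validateCsv(text: string): ValidationResult {
  const errors: string[] = [];
  const lines = text.trim().split(/\r?\n/);

  // Check the header matches the pre-specified schema exactly.
  const header = lines[0]?.split(",").map((h) => h.trim().toLowerCase());
  if (!header || header.join(",") !== EXPECTED_HEADER.join(",")) {
    errors.push(`Header must be: ${EXPECTED_HEADER.join(",")}`);
    return { ok: false, errors };
  }

  // Naive row checks; assumes fields contain no embedded commas.
  lines.slice(1).forEach((line, i) => {
    const fields = line.split(",");
    if (fields.length !== EXPECTED_HEADER.length) {
      errors.push(`Row ${i + 2}: expected ${EXPECTED_HEADER.length} fields`);
      return;
    }
    const lat = Number(fields[2]);
    const lng = Number(fields[3]);
    if (Number.isNaN(lat) || lat < -90 || lat > 90) {
      errors.push(`Row ${i + 2}: invalid latitude`);
    }
    if (Number.isNaN(lng) || lng < -180 || lng > 180) {
      errors.push(`Row ${i + 2}: invalid longitude`);
    }
  });

  return { ok: errors.length === 0, errors };
}
```

The `errors` list is what the approval/rejection mechanism would surface to the user before allowing resubmission.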
<aside>
⚠️ Note: The search and scraping process is automated, but opening the web pages must be done manually. This lets users select the places they want to scrape and add to the map: they need to find new places, locate their websites, and open those pages.
</aside>
<aside>
⚠️ Note 2: During the search process, a user might open several pages that do not contain the needed data. These must be rejected; we only accept entries that provide the required data.
</aside>