About CuyaCourts

About

A data scientist and a social scientist walk into a bar... and, well, that's not far off from how this project started.

It was becoming ever more important for me to display my skillset. Was I a data scientist? A data engineer? A machine learning engineer? The punchline here plays on a linguistic conundrum that both plagues us and is exacerbated by our own creations.

Whatever we may be, we're leaders in integrating tech with society and culture. This project has been featured in community meetups and upskilling events on Python, SQL, web scraping, tech ethics, and cloud architecture. Its full potential is yet to be discovered.

The CuyaCourts project is an ETL pipeline that engages only with the data acquisition and understanding step of the Data Science Life Cycle. It is a scraper that retrieves one case at a time from the Cuyahoga County Criminal Court Case Docket and stores the information in an analyzable format: a relational database, available for download.
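To make the extract-transform-load idea concrete, here is a minimal sketch of a one-case-at-a-time loop. It is not the project's code: `fetch_case` is a hypothetical stub standing in for the Selenium scraping step and returns canned text, the `cases` and `docket_entries` tables are a hypothetical schema, and an in-memory SQLite database stands in for the project's Postgres instance.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Case:
    case_number: str
    title: str
    docket: list = field(default_factory=list)


def fetch_case(case_number: str) -> dict:
    """Hypothetical extract step. The real pipeline scrapes the docket page
    with Selenium; this stub returns canned, page-like text instead."""
    return {
        "case_number": case_number,
        "title": f"  Example case {case_number}  ",
        "docket": ["  Case filed ", "Arraignment held  "],
    }


def transform(raw: dict) -> Case:
    """Transform step: trim stray whitespace and build a typed record."""
    return Case(
        case_number=raw["case_number"],
        title=raw["title"].strip(),
        docket=[entry.strip() for entry in raw["docket"]],
    )


def load(conn: sqlite3.Connection, case: Case) -> None:
    """Load step: write one case and its docket entries to relational tables."""
    with conn:  # commit on success, roll back if any statement fails
        conn.execute(
            "INSERT INTO cases (case_number, title) VALUES (?, ?)",
            (case.case_number, case.title),
        )
        conn.executemany(
            "INSERT INTO docket_entries (case_number, seq, entry) VALUES (?, ?, ?)",
            [(case.case_number, i, e) for i, e in enumerate(case.docket)],
        )


def make_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the Postgres database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cases (case_number TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE docket_entries (
            case_number TEXT REFERENCES cases(case_number),
            seq INTEGER,
            entry TEXT
        );
        """
    )
    return conn


def run(conn: sqlite3.Connection, case_numbers) -> None:
    """Extract, transform, and load each case in turn, one at a time."""
    for number in case_numbers:
        load(conn, transform(fetch_case(number)))
```

For example, `run(make_db(), ["1", "2"])` (hypothetical case numbers) writes two rows to `cases` and four to `docket_entries`, which can then be queried with ordinary SQL.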

Similar projects make electronic court records available through a command-line tool (Stanford's Big Local News), through an API (CourtListener and Harvard Law School's Caselaw Access Project), or through a paid subscription (Public Access to Court Electronic Records, PACER).

So if others have already done this, did I just duplicate their efforts? (No.) Is this really the most efficient way? (Today, yes.) I address these questions, and more, in depth in the FAQs below. Please let me know if you're curious about anything else, or have any additional information to share, by using this form.

One thing I am not is a front-end engineer; however, I'm closer now than I was yesterday because of this project. Thank you for your patience with this website.

Frequently Asked Questions

Besides CuyaCourts, which other ways can an individual access the same data?

The Criminal Court case docket is made available on the Cuyahoga County Clerk of Courts website, along with Civil/Domestic and Court of Appeals cases, searchable by at least case number or name. A bulk downloadable version was requested via a Freedom of Information Act (FOIA) request in 2022, which received no response. Programmatic access to this data, requested through the website's contact email, was also not made available. Requests for this data to other public civic data projects, such as the Marshall Project and presenters at Cleveland's Data Days 2022, likewise went unanswered. At the time of CuyaCourts' creation, a search of other sources for bulk downloadable or programmatic access to municipal and county court records yielded no coverage of Cuyahoga County Courts. Furthermore, open-source collections of county-level court scrapers, such as Stanford's Big Local News, did not include a scraper for Cuyahoga County. Paid subscription tools were not investigated as thoroughly.

At the Cuyahoga County Clerk of Courts website, an individual can obtain all the same information about each case at a rate of ~500 cases per day before their IP address is blocked. This consideration shaped the design of this project.
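The ~500-cases-per-day ceiling translates directly into a pacing budget. The sketch below works through that arithmetic; `daily_cap` is the observed limit, while `safety_fraction` is a hypothetical parameter for deliberately aiming below it, and the 10,000-case figure in the usage lines is purely illustrative.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def min_interval_seconds(daily_cap: int, safety_fraction: float = 1.0) -> float:
    """Average gap between requests needed to stay at or under
    daily_cap * safety_fraction requests per 24 hours."""
    if daily_cap <= 0 or not 0 < safety_fraction <= 1:
        raise ValueError("need daily_cap > 0 and 0 < safety_fraction <= 1")
    return SECONDS_PER_DAY / (daily_cap * safety_fraction)


def days_to_cover(n_cases: int, daily_cap: int, safety_fraction: float = 1.0) -> int:
    """Whole days needed to retrieve n_cases at the reduced daily rate."""
    per_day = math.floor(daily_cap * safety_fraction)
    if per_day < 1:
        raise ValueError("daily budget rounds down to zero requests")
    return math.ceil(n_cases / per_day)


print(min_interval_seconds(500))        # 172.8 seconds between requests
print(min_interval_seconds(500, 0.5))   # 345.6 seconds with 50% headroom
print(days_to_cover(10_000, 500))       # 20 days for a hypothetical 10,000 cases
```

In other words, staying under the limit means averaging roughly one request every three minutes, which is why a one-case-at-a-time design is a natural fit.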

What tools were used to create the CuyaCourts ETL pipeline?

Storage: AWS Relational Database Service (Postgres), GitHub, AWS Elastic Container Registry
Python Libraries: Selenium, boto3, SQLAlchemy ORM
Compute: AWS Lambda
Deployment: GitHub Actions, Docker, pgAdmin 4
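One way these pieces could fit together is a Lambda function that processes a small batch of sequential case numbers per invocation and reports where the next run should resume. This is a hypothetical sketch, not the project's function: the event keys `start` and `batch_size`, the default batch of 10, the 30-second safety margin, and the `scrape_and_store` placeholder are all illustrative assumptions. The Selenium extraction and SQLAlchemy loading are left out so the example stays self-contained; `get_remaining_time_in_millis()` is the method the Lambda Python context object provides.

```python
def scrape_and_store(case_number: int) -> None:
    """Hypothetical placeholder: in a real pipeline this is where Selenium
    would load the docket page and SQLAlchemy would write the records."""


def handler(event, context):
    """AWS Lambda entry point (Lambda calls Python handlers as handler(event, context)).

    Processes up to event["batch_size"] sequential case numbers starting at
    event["start"], then reports where the next invocation should resume.
    """
    start = int(event["start"])
    batch_size = int(event.get("batch_size", 10))
    processed = []
    for case_number in range(start, start + batch_size):
        # Stop early when the invocation nears its timeout; context is None in local tests.
        if context is not None and context.get_remaining_time_in_millis() < 30_000:
            break
        scrape_and_store(case_number)  # one case at a time
        processed.append(case_number)
    return {"processed": processed, "next_start": start + len(processed)}
```

Returning `next_start` keeps each invocation short and restartable; how the resume point is persisted and how invocations are scheduled are not described on this page, so the sketch leaves both out.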

How current is the database?

Each case was retrieved by case number, and case numbers are ordered chronologically. You can see the exact details for each case by querying the progress table, which records the exact time each case's records were retrieved and committed to the CuyaCourts database. If a case is retrieved twice, its existing records in the CuyaCourts database are deleted and new records are written, so there are no duplicates. If a case was still open at the time it was retrieved, its records in CuyaCourts are incomplete.
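The re-retrieval rule (delete whatever is already stored for a case, then write fresh records) makes each load idempotent. Below is a minimal sketch of that pattern, again using SQLite in place of Postgres; the `docket_entries` and `progress` table layouts and the `replace_case` helper are hypothetical, chosen only to mirror the description above. Wrapping the delete, the insert, and the progress update in one transaction means a failure partway through leaves the previous copy of the case intact.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE docket_entries (case_number TEXT, seq INTEGER, entry TEXT);
CREATE TABLE progress (case_number TEXT PRIMARY KEY, retrieved_at TEXT);
"""


def replace_case(conn: sqlite3.Connection, case_number: str, entries: list) -> None:
    """Delete any stored records for case_number, write the fresh ones, and
    record when they were retrieved, all inside one transaction."""
    retrieved_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commit on success; roll back everything if a statement fails
        conn.execute("DELETE FROM docket_entries WHERE case_number = ?", (case_number,))
        conn.executemany(
            "INSERT INTO docket_entries (case_number, seq, entry) VALUES (?, ?, ?)",
            [(case_number, i, e) for i, e in enumerate(entries)],
        )
        # SQLite upsert shorthand; Postgres would use INSERT ... ON CONFLICT instead.
        conn.execute(
            "INSERT OR REPLACE INTO progress (case_number, retrieved_at) VALUES (?, ?)",
            (case_number, retrieved_at),
        )
```

Retrieving the same case twice leaves only the second copy's rows in `docket_entries`, and `progress` keeps a single row per case holding the latest retrieval time.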

The information contained in the CuyaCourts database is provided “as-is” without any warranties or guarantees of accuracy. Please do not rely on this data set to solve personal legal problems.

Are there plans for updates to the CuyaCourts database?

Further technical development will hinge on community interest. There is currently no known urgency or budget to guide the development of database versioning.

Is it ethical to make this database public?

According to Ohio Revised Code 149.43, all court records and dockets are public information. That said, any patterns observed in this data reflect known biases, and any analysis must be performed mindfully so as not to misrepresent or harm individuals involved in the criminal litigation process.

Another ethical question is whether the process of obtaining these records harmed the host website. It did not. Each record was scraped one at a time, at a gentle pace, so as not to overwhelm the host server, and only after requests for the data via FOIA and email inquiry went unanswered.
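For readers adapting this approach, here is one way to enforce a gentle, one-request-at-a-time pace. It is an illustrative sketch rather than the project's code: the `PoliteThrottle` class, its `min_interval`, and the optional random `jitter` are assumptions, and the clock and sleep functions are injectable so the behavior can be checked without actually waiting.

```python
import random
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Ensures at least min_interval seconds (plus optional random jitter)
    elapse between consecutive calls to wait()."""

    def __init__(
        self,
        min_interval: float,
        jitter: float = 0.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.min_interval = min_interval
        self.jitter = jitter
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            target = self.min_interval + random.uniform(0, self.jitter)
            elapsed = self.clock() - self._last
            if elapsed < target:
                slept = target - elapsed
                self.sleep(slept)
        self._last = self.clock()
        return slept
```

Calling `wait()` immediately before each page request keeps retrieval sequential; a `min_interval` of at least 86,400 / 500 ≈ 173 seconds keeps a continuous run at or under roughly 500 requests per 24 hours, matching the limit described earlier.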

