After running this, we can expect the typical JSON response:
It worked! All is right in the world!
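For reference, a minimal sketch of that happy path might look like this (the endpoint URL is a placeholder, not a real API from this post):

```javascript
// Happy path: hit an API endpoint and parse the JSON body.
// The URL here is a placeholder, not an endpoint from the post.
function getJSON(url) {
  return fetch(url).then((res) => res.json());
}

// Example usage:
// getJSON("https://api.example.com/items").then((data) => console.log(data));
```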
Sometimes it isn't that simple, though. What happens when you get a response that isn't JSON? What if there's no API to get what you want? You have to come up with some other ways to extract the data you're looking for.
Now, I'm a full-time software engineering student at Flatiron School and I'm going to use their website for this example: flatironschool.com. While browsing the site, I noticed towards the bottom of the page they have a news feed. It looks like this today:
Of course, there's no API for this site, and it's certainly not going to give me a JSON response. How can we get to this information if I want to save it all and display it in a table or a list?
Let's see what happens when we try our normal fetch:
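The original snippet isn't shown here, but judging by the response shape described below, the request goes through a CORS proxy that wraps the page in a JSON object. AllOrigins is one such proxy, and using it here is my assumption, not something the post confirms. A sketch:

```javascript
// Sketch: fetching a full web page through a CORS proxy.
// AllOrigins is an assumption; the post doesn't name the proxy it uses.
function fetchPage(url) {
  return fetch(`https://api.allorigins.win/get?url=${encodeURIComponent(url)}`)
    .then((res) => res.json())
    .then((obj) => {
      console.log(obj); // the whole response object, giant HTML string and all
      return obj;
    });
}

// fetchPage("https://flatironschool.com/");
```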
If you guessed that we would get an object back with one massive line of HTML content, you were right. This isn't even a particularly large website, and look at this return:
The content_length is 239,147 characters! It's a massive block of text wrapped inside an object. So, let's see how we can work with this. The HTML content we need is inside the "contents" key. If we grab it, we'll have only the pure HTML from the website. That's exactly what we want.
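Sticking with the proxy-style response shape assumed above, pulling out just the HTML could look like this:

```javascript
// Sketch: same fetch as before, but return only the "contents" key,
// i.e. the page's raw HTML string. The proxy URL is an assumption.
async function fetchPageHTML(url) {
  const res = await fetch(
    `https://api.allorigins.win/get?url=${encodeURIComponent(url)}`
  );
  const data = await res.json();
  return data.contents; // just the pure HTML, nothing else
}
```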
If you're following along in your own console, you'll see that you now receive only the pure 239,147-character HTML string. So, how can we extract information from it? Well, you likely already know how to find things inside your own document, and it's really no different here. We do need to tell our code to treat this string as HTML, though. So let's add some more to our code and create a new function.
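That new function might be sketched like this, assuming `html` holds the string we just pulled out of the response (DOMParser is a built-in browser API):

```javascript
// Sketch: parse the raw HTML string into a Document we can query.
// `html` is assumed to be the string returned from the fetch above.
function parseHTML(html) {
  return new DOMParser().parseFromString(html, "text/html");
}

// const doc = parseHTML(html);
// doc.querySelector(...) now works just like it does on a live page
```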
We now have the ability to read the string we previously acquired as actual HTML data. That means we can use querySelector, querySelectorAll, getElementById, getElementsByTagName, getElementsByName, or getElementsByClassName to find what we want. I'm going to use querySelector and querySelectorAll.
Doing a visual inspection of the code in the Elements tab of our browser's developer tools, we can quickly find the news container we're looking for here:
It looks like this container is a section with the id of "id-ea783478-7bf2-5965-897f-a023bd94ac90".
If we look a little closer, we can see that the headlines we want are nested within that section, and each has the id "title-oE5vT4".
Let's find the container and all of the titles inside it:
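Using the ids found during inspection, that lookup might be sketched like this (`doc` is assumed to be the parsed Document from the previous step):

```javascript
// Sketch: grab the news section, then every headline element inside it.
// `doc` is assumed to be the Document produced by DOMParser above.
function getHeadlines(doc) {
  const container = doc.querySelector(
    "#id-ea783478-7bf2-5965-897f-a023bd94ac90"
  );
  // the page reuses this id on every headline, so querySelectorAll
  // scoped to the container still picks up all of them
  return container.querySelectorAll("#title-oE5vT4");
}
```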
That's that! We now have an object with all of our headlines. If we take it one small step farther, we can log them all out in the console.
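That logging step could be as small as a forEach over the NodeList (assuming `headlines` is what the previous lookup returned):

```javascript
// Sketch: print each headline's text. NodeLists support forEach directly.
function logHeadlines(headlines) {
  headlines.forEach((el) => console.log(el.textContent.trim()));
}
```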
Now you can take those headlines and add them all to a table, a spreadsheet, or anywhere else. If you want to go beyond the headline, you can easily use the methods above to also grab the hyperlink, the summary, or any other data you want!
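For instance, dropping each headline into a table row might be sketched like this (`table` is assumed to be an existing `<table>` element on your page):

```javascript
// Sketch: append one table row per headline element.
// `table` is assumed to be an existing <table> in the page;
// insertRow/insertCell are the standard HTMLTableElement APIs.
function renderHeadlines(headlines, table) {
  headlines.forEach((el) => {
    const row = table.insertRow();
    row.insertCell().textContent = el.textContent.trim();
  });
}
```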
Thanks for reading this! I hope it helped you out. If you enjoyed the content, please check back and also follow my journey on Twitter!
Go scrape your favorite website now that you know how to handle this kind of fetch response!